Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
Conference proceeding

Simon J Pennycook, Simon D Hammond, Gihan R Mudalige and Stephen A Jarvis
19/11/2010

Abstract

QA76 Electronic computers. Computer science. Computer software
We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. Execution times are reported for several different GPUs, ranging from low-end consumer-grade products to high-end HPC-grade devices, including the Tesla C2050 built on NVIDIA's Fermi processor. We also utilise recently developed performance models of LU to facilitate a comparison between future large-scale distributed clusters of GPU devices and existing clusters built on traditional CPU architectures, including a quad-socket, quad-core AMD Opteron cluster and an IBM BlueGene/P.
