Can large clusters of GPUs lead us to ExaScale?
William Sawyer, Swiss National Supercomputing Centre (CSCS)
The Swiss National Supercomputing Centre (CSCS) operates, among other systems, Piz Daint, a computing platform with 5,272 nodes, each with one Intel Xeon E5-2670 (Sandy Bridge) CPU and one NVIDIA Tesla K20X graphics processing unit (GPU). The accelerators help place this platform 6th (6.7 PFlops) on the www.top500.org list, and their low power consumption (3.1 GFlops/W on www.green500.org) makes it the most energy-efficient PetaFlop/s machine in the world. However, realizing the full potential of the machine is fraught with challenges: porting customer applications to the GPUs requires additional software development, and scaling applications to the full extent of the machine requires considerable forethought. The step to exascale will only exacerbate these challenges.
In this talk we present various techniques used in porting several of our mainstream user applications. These techniques include the development of a domain-specific embedded language (DSEL) to support finite-difference methods, combined with a full rewrite to use this DSEL; the use of the NVIDIA CUDA/C++ programming language; and the use of OpenACC accelerator directives to approach a "single-source" programming paradigm. We also present our experience in "jump-starting" application development by the users themselves during the first European OpenACC hackathon (EuroHack15). We report an honest mix of experiences: every existing paradigm has both strengths and deficiencies, and while certain applications scale strongly to petascale (and perhaps later to exascale), others can reach that scale only through weak scaling.