Accelerator-based computer systems
Traditionally, the common approach to augmenting the computing power of a machine was to increase the clock rate of its core processor. Due to both physical limitations and energy constraints, this method has reached its limits. Nowadays, computing power is gained by increasing the parallelism of the system, i.e., by raising the number of computing units, each with a moderate clock rate. A consequence of this evolution is visible in modern desktop and laptop computers, which are built with multi-core processors: single computing components with several general-purpose processing units designed to work in parallel.
In high performance computing this concept is taken even further by constructing supercomputing machines that integrate accelerators. An accelerator is a piece of computing hardware, separate from the CPU, that contains many processing units with reduced functionality but specifically designed to run computationally intensive code very fast. Examples of this kind of accelerator hardware are graphics cards (GPUs), which contain hundreds of graphics cores. GPUs were originally designed to efficiently process images, but their ability to perform floating-point operations at extremely high speed has brought them into the supercomputing world. Other examples of accelerating hardware are many-core processors: computing cards containing tens of cores, each with less silicon area than a standard CPU and reduced functionality, put together to concurrently perform compute-intensive operations.
The Jülich Supercomputing Centre (JSC) is interested in accelerators and their use in high performance computing, in particular because of their relatively low energy consumption per Flop. JSC's research centers on the evaluation of their usability for scientific application developers and on the investigation of innovative architectural concepts that can improve their usability and performance.
The key focus of the NVIDIA application lab, a multi-year collaboration between Forschungszentrum Jülich and NVIDIA, is to enable the efficient use of parallel multi-GPU architectures. This includes the analysis of new features of next-generation hardware and programming models. Scientific applications from different research areas, including astrophysics and astronomy, biology, elementary particle physics, and materials science, are used to evaluate the effort needed to port code to the latest generation of NVIDIA graphics cards in relation to the performance gain obtained. The NVIDIA application lab provides support for the optimization of applications using GPGPUs (general-purpose computation on graphics processing units) and aims at broadening the application base for such devices.
In the EU-funded Exascale project DEEP, 16 partners from 8 European countries, coordinated by JSC, join efforts to build the prototype of a new Exascale-enabling platform. DEEP takes the concept of compute acceleration to a new level: many-core accelerator cards (Intel's Xeon Phi) are integrated into a high-speed network (EXTOLL, from the University of Heidelberg), avoiding the fixed assignment between accelerator and host CPU that often leads to bottlenecks. This cluster of accelerators (the Booster) is connected to a conventional HPC cluster, increasing its compute performance. The cluster-level heterogeneity of DEEP will attenuate the consequences of Amdahl's law by allowing users to run highly scalable kernels alongside kernels of low scalability concurrently on different sides of the system, avoiding both over- and under-subscription. Together with a software stack focused on meeting Exascale requirements, comprising adapted programming models, libraries, and performance tools, the DEEP architecture will enable unprecedented scalability. The DEEP concept serves as a proof of concept for a next-generation 100 PFlop/s PRACE production system and has the potential to reach Exascale between 2018 and 2020.
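The role Amdahl's law plays here can be made concrete with a short calculation. Amdahl's law states that if a fraction p of a program can be parallelized across n processing units, the overall speedup is bounded by 1 / ((1 - p) + p/n), so the serial fraction dominates as n grows. The sketch below (function name illustrative, not from any DEEP software) shows why running low-scalability code on a conventional cluster while only the highly scalable kernels use the massively parallel Booster is attractive:

```python
def amdahl_speedup(p, n):
    """Speedup of a program whose parallelizable fraction p runs on n units.

    Amdahl's law: S = 1 / ((1 - p) + p / n).
    """
    return 1.0 / ((1.0 - p) + p / n)

# Even a small serial fraction caps the achievable speedup: with 10%
# serial code, 10,000 units barely outperform 100 units.
for p in (0.90, 0.99):
    for n in (100, 10_000):
        print(f"p={p}, n={n}: speedup = {amdahl_speedup(p, n):.1f}")
```

With p = 0.90 the speedup saturates near 10 no matter how many units are added, while p = 0.99 still benefits from thousands of units; a heterogeneous system lets each part of the application run where it scales best.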