
German and Russian Scientists Join Forces to Improve HPC Performance Tuning

High-performance computing is a key technology of the 21st century. However, exploiting the full power of HPC systems has always been hard and is becoming even harder as the complexity and size of systems and applications continue to grow. At the same time, application optimization offers enormous potential savings in energy and CPU hours.

Performance measurement is key to understanding and ultimately improving the performance of HPC applications. Unfortunately, many HPC systems expose their jobs to substantial amounts of interference (also known as noise), leading to significant run-to-run variation. This makes performance measurements generally irreproducible, which heavily complicates performance analysis and modelling. On noisy systems, performance analysts usually have to repeat performance measurements several times and then apply statistical analysis to capture trends. This is expensive, and extracting trends from a limited series of experiments is far from trivial because the noise can follow quite irregular patterns.
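The repeat-and-summarize practice described above can be illustrated with a minimal Python sketch. It is not the ExtraNoise method; the function names `measure` and `summarize` are hypothetical, and the choice of minimum, median, and interquartile range as summary statistics is an assumption made for illustration.

```python
import statistics
import time


def measure(workload, repetitions=10):
    """Run `workload` several times and return the wall-clock times in seconds.

    `workload` is any zero-argument callable (hypothetical stand-in for a
    real application run).
    """
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return summary statistics that are less sensitive to outliers than the mean."""
    q1, median, q3 = statistics.quantiles(times, n=4)
    return {"min": min(times), "median": median, "iqr": q3 - q1}


if __name__ == "__main__":
    runs = measure(lambda: sum(range(100_000)), repetitions=10)
    print(summarize(runs))
```

The median and interquartile range are used here because a single heavily disturbed run shifts them far less than it shifts the mean. With only a handful of repetitions and irregular noise, even such summaries can be misleading, which is the difficulty the article points to.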

Prof. Felix Wolf of TU Darmstadt, Dr. Bernd Mohr of the Jülich Supercomputing Centre, and Drs. Dmitry Nikitenko and Konstantin Stefanov of Moscow State University are now addressing this problem in a joint project called ExtraNoise. It is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and the Russian Foundation for Basic Research (RFBR). In addition, Prof. Torsten Hoefler of ETH Zurich is contributing his expertise as an associated partner. Besides making performance analysis more noise-resilient, the partners aim to achieve a better understanding of how applications respond to noise in general and which design choices raise or lower their active and passive interference potential. The project, which will run for three years, is coordinated by TU Darmstadt.

Contact: Dr. Bernd Mohr, b.mohr@fz-juelich.de

from JSC News No. 280, 26 April 2021