Interview: #1 Ranking on Green500
Background and outlook on the European exascale supercomputer JUPITER
13 May 2024
The European exascale supercomputer JUPITER at Forschungszentrum Jülich is set to take scientific simulations to a new level and enable breakthroughs in artificial intelligence. The system is also a real pioneer in terms of energy efficiency. Its precursor, the JUPITER Exascale Development Instrument (JEDI), was ranked number one on the Green500 list of the most energy-efficient supercomputers in May 2024. In this interview, Prof. Dr. Dr. Thomas Lippert, director of the Jülich Supercomputing Centre, explains what the new Jülich efficiency record is all about.
Prof. Dr. Dr. Thomas Lippert, how exactly do you make it onto the Green500 list? How do the measurements work?
The Green500 list is published alongside the TOP500 list of the world’s most powerful supercomputers. It only contains systems that feature in the TOP500 ranking. But the Green500 list has a different focus. What matters is not maximum performance, but maximum performance per watt, i.e. efficiency – taking into account the power consumption of the computer.
The decisive factor is the same benchmark as for the TOP500, the High-Performance Linpack (HPL). The task is to solve a large, dense system of linear equations in double precision. While the HPL is running, the power consumption of the respective components is measured at fixed intervals. From this, it is determined how many floating point operations per second can be achieved per watt. The exact duration of the test run depends on the size of the system; in the case of the JEDI system, it took about 10 minutes.
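To make the arithmetic behind such a run concrete, here is a minimal Python sketch, not the official Green500 tooling, that derives a performance-per-watt figure from the HPL problem size and power readings sampled at fixed intervals. All numeric values are hypothetical and are not JEDI's measured data.

```python
# Minimal sketch of the arithmetic behind a Green500 measurement.
# HPL solves one large, dense n-by-n linear system; its operation count
# is approximately (2/3)*n^3 + 2*n^2 double-precision operations.
# All numeric values below are hypothetical, not JEDI's actual data.

def hpl_flops(n: int) -> float:
    """Approximate floating point operation count of an HPL run of size n."""
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2

def gflops_per_watt(n: int, power_samples_w: list[float], interval_s: float) -> float:
    """Performance per watt: total FLOPs divided by total energy consumed.

    power_samples_w holds power readings taken at fixed intervals during
    the run, as in a Green500 measurement.
    """
    energy_j = sum(power_samples_w) * interval_s  # integrate power over time
    return hpl_flops(n) / energy_j / 1e9          # FLOPs per joule == FLOPS per watt

# Hypothetical example: a ~10-minute run sampled once per second.
n = 1_600_000                 # assumed HPL problem size
samples = [65_000.0] * 600    # assumed 65 kW average system power
print(f"{gflops_per_watt(n, samples, 1.0):.1f} GFLOPS/W")
```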
What are the reasons for this exceptional efficiency?
Besides the surrounding rack infrastructure, the most important role is played by the computing units used. The most efficient version of the NVIDIA Grace Hopper Superchip (GH200) was selected, which offers maximum performance per watt. Unlike much of the competition, the NVIDIA Grace CPU uses the ARM processor architecture. The chip has a very high number of processor cores and is therefore ideal for compute-intensive tasks, and it is also more efficient in standby mode than comparable processors. With LPDDR5X, a low-power DDR memory type, Grace also uses memory that consumes particularly little energy. The most energy-intensive part of the superchip, the Hopper GPU, can be managed more flexibly through the tight superchip integration than it could alongside a discrete CPU.
Even without optimizations, the JEDI hardware would already have been sufficient for first place on the Green500 list of November 2023. After this was demonstrated early on in the installation, various parameters were tested in order to evaluate which optimizations the hardware would allow. Typical parameters include, for example, switching processor cores off or allocating them statically (i.e. pinning them). Another interesting parameter is the clock rate of the processors and graphics accelerators (GPUs). During the energy crisis of recent years, we at JSC already conducted experiments to optimize performance per watt on our current flagship, the JUWELS Booster. We will now transfer these findings to the upcoming JUPITER supercomputer.
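As an illustration of what such a clock-rate experiment might look like in principle, here is a minimal Python sketch using NVIDIA's NVML bindings (the nvidia-ml-py package). The run_benchmark function and the candidate clock values are hypothetical placeholders, locking clocks requires administrative privileges, and this is a sketch of the general idea rather than JSC's actual tuning procedure.

```python
# Sketch of a GPU clock-rate sweep for performance-per-watt tuning via NVML.
# run_benchmark() is a hypothetical placeholder for a real workload, and the
# candidate clock values are illustrative; locking clocks needs admin rights.
import time
import pynvml

def run_benchmark() -> float:
    """Placeholder: run a workload and return its achieved GFLOPS."""
    time.sleep(5.0)   # stand-in for real work
    return 1000.0     # hypothetical result

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the node

try:
    for clock_mhz in (1980, 1500, 1200):  # candidate clock caps in MHz
        # Pin the GPU to a fixed clock, as in a static tuning experiment.
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, clock_mhz, clock_mhz)
        gflops = run_benchmark()
        # NVML reports power in milliwatts; a real experiment would sample
        # repeatedly during the run rather than reading once afterwards.
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        print(f"{clock_mhz} MHz: {gflops:.0f} GFLOPS at {watts:.0f} W "
              f"-> {gflops / watts:.2f} GFLOPS/W")
finally:
    pynvml.nvmlDeviceResetGpuLockedClocks(handle)  # restore default clocks
    pynvml.nvmlShutdown()
```

Reading the power once after the workload, as done here for brevity, is only an approximation; as described above for the Green500 run, power is normally sampled continuously while the workload executes.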
What was the purpose of installing the JEDI development system?
JEDI was installed in parallel to JUPITER in order to have a small system with hardware identical to the JUPITER Booster available at this early stage. The focus here is on supporting the JUPITER Research and Early Access Program (JUREAP). The programme is intended to ensure that the JUPITER hardware can be used efficiently from early on: both scientific simulations and AI models can be run at an early stage and continuously scaled up to exascale. JEDI also serves as a platform for the development of the JUPITER management stack, the software for managing JUPITER, which ensures that the system is ultimately available to users in a reliable manner.
Unlike JUPITER, which is being installed in an innovative modular HPC data centre (MDC), JEDI is located in JSC's existing data centre. For the Green500 run, half a BullSequana XH3000 cell was available, with 12 slots and thus 24 servers and 96 GH200 chips, connected via an NVIDIA Quantum-2 InfiniBand interconnect. Over the course of May, 12 additional slots will be added to complete the XH3000 cell and thus launch JUREAP.
What other measures are planned for the final exascale system to ensure it operates as sustainably as possible?
The most important measure is, of course, to optimize efficiency as far as possible. During the tendering process and in our contracts, we have also made sure that JUPITER is powered entirely by green electricity.
During the procurement process for JUPITER, great importance was attached to ensuring that the hardware used in the system was not geared exclusively towards maximum performance, but also offered advantages in terms of optimizing power consumption. The Eviden BullSequana XH3000 platform was therefore chosen for the final hardware of the JUPITER Booster module. It uses direct hot-water cooling, and for most of the year the water can be cooled by free cooling against the ambient air. This is significantly more efficient than a conventional cold-water supply or conventional air cooling.
The modular data centre in which JUPITER will be installed is designed to extract the waste heat and use it for the heat supply. In the medium term, a connection to the low-temperature heating network on the Jülich campus is planned. The electrical power that JUPITER consumes during operation will ultimately be predominantly converted into heat, which can then be reused.
What is the current development status of the overall system? When will JUPITER go into operation?
JUPITER is being installed in several phases. Owing to the size of the system and the complexity of the overall project, it is something of a marathon. The launch of JEDI, by contrast, is more like an intermediate sprint.
In April of this year, the concrete floor of the modular data centre container in which JUPITER will be installed was poured on the Jülich campus. In the coming weeks, production of the JUPITER hardware will begin at the Eviden factory in Angers, France. Various parts of the modular data centre are also manufactured there.
The first containers for the modular data centre are expected to be delivered in June. The delivery and construction phase will then continue throughout the summer and autumn, so that large parts of the JUPITER hardware will be on the Jülich campus by the end of this year. The end of the installation process and the start of user operation are planned for the beginning of next year, with scientists already able to access JEDI and the subsequently installed parts of JUPITER via the JUREAP early access programme during the set-up phase.