The Challenge of Exascale
As of May 2022, the first-ever exascale system made it onto the Top500 list of the fastest supercomputers in the world. Forschungszentrum Jülich is also working intensively on developing new technologies that will make such a leap possible. In an interview, Prof. Thomas Lippert and Prof. Estela Suarez from the Jülich Supercomputing Centre talk about the challenges associated with this type of system.
Jülich, 30 May 2022 – Finally, it has become reality! Following a number of setbacks over the last two years, a supercomputer has officially achieved exascale for the very first time. This was made public in the latest Top500 list of the fastest computers in the world, which was issued today at the ISC supercomputing conference in Hamburg. Frontier, the supercomputer deployed at the Oak Ridge National Laboratory in the USA, is the first supercomputer to achieve 10¹⁸ floating point operations per second. Building an exascale supercomputer of this kind has been seen as the next major step in high-performance computing for many years now.
Prof. Dr. Dr. Thomas Lippert, head of the Jülich Supercomputing Centre (JSC), and Prof. Dr. Estela Suarez (JSC), who coordinates the European DEEP projects that aim to develop an exascale-capable supercomputer ecosystem, discuss the latest developments in the following interview.
Building an exascale computer is considered to be a gigantic challenge. What are the difficulties involved?
Thomas Lippert: Hundreds of steps need to be taken at different levels to make that one big step towards exascale computing. Many difficulties stem from the fact that, since 2005, certain scaling laws such as Moore’s law no longer apply as they did in the 20 years before. The performance of processors used to improve a hundredfold every 10 years, for example. In the case of supercomputers, other degrees of freedom were also exploited, ultimately yielding a thousandfold increase in performance. This fundamental increase in processor performance no longer occurs today for various reasons, which makes it much more difficult to take a substantial step forward in the field of supercomputing.
Problems include energy consumption and heat generation. Today, a single rack has an electrical input of 150 kilowatts. The heat output of just one rack is therefore 10 to 15 times higher than that of a standard heating system in a single-family home. This heat must be dissipated somehow, otherwise the system will evaporate within minutes. In the past it was possible to use air cooling, but nowadays other, more efficient solutions are used, namely hot-water cooling. This cooling technology consumes less energy, and the generated heat can be used to keep buildings warm.
With regard to exascale computing, however, many other fundamental questions arise: How can so many processors and components be mastered at the same time? And how can such a system be administered? We are no longer able to manually install a software update on 10,000 machines. Orchestration software is required to administer a system of this size in its entirety.
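The point about administering thousands of nodes can be illustrated with a toy sketch. This is not JSC's actual tooling; the node names and the `update_node` helper are hypothetical stand-ins for what real orchestration software (e.g. a cluster manager driving a package manager over SSH) would do: fan the task out in parallel and collect per-node results automatically.

```python
# Illustrative sketch only: why updating 10,000 machines needs orchestration.
# "update_node" is a hypothetical stand-in for a real per-node action.
from concurrent.futures import ThreadPoolExecutor

def update_node(node: str) -> tuple[str, str]:
    # In reality this would run a package manager on the node via SSH
    # and report success or failure; here it just succeeds.
    return node, "updated"

nodes = [f"node{i:05d}" for i in range(10_000)]

# Fan out the update in parallel and gather one result per node.
with ThreadPoolExecutor(max_workers=64) as pool:
    results = dict(pool.map(update_node, nodes))

print(sum(status == "updated" for status in results.values()))  # -> 10000
```

The essential property is that the human specifies the task once and the tooling handles distribution, concurrency, and bookkeeping for every node.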
For what applications is an exascale computer needed?
Thomas Lippert: The scientific issues are extremely diverse. Most of today’s simulation applications represent a compromise between the available computing power and the size of the problem being simulated. Climate simulations, for example, achieve a resolution of 10 kilometres on a petaflop computer. However, the aim is to achieve a resolution of one kilometre. Furthermore, there are hotspots that must be very finely resolved to pick up certain phenomena that occur on a very small scale. This is only possible if the computing power of the machine is correspondingly higher.
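A back-of-envelope calculation (not from the interview, and deliberately simplified) shows why the jump from 10 km to 1 km resolution demands so much more compute: refining the horizontal grid spacing by a factor r multiplies the number of grid cells by roughly r², and stability constraints force roughly r times more, smaller timesteps, so total cost grows roughly as r³.

```python
# Rough cost-scaling estimate for refining a climate model's grid.
# Assumptions (simplified): cost ~ number of cells x number of timesteps,
# 2D horizontal refinement only, timestep shrinks linearly with spacing.
def cost_factor(coarse_km: float, fine_km: float) -> float:
    r = coarse_km / fine_km   # refinement ratio per horizontal axis
    cells = r ** 2            # r^2 more grid cells in the horizontal plane
    steps = r                 # ~r times more timesteps (CFL-type limit)
    return cells * steps      # total cost grows roughly as r^3

# From 10 km resolution to 1 km resolution:
print(cost_factor(10, 1))  # -> 1000.0
```

Under these assumptions, a simulation that fits a petaflop machine at 10 km needs on the order of a thousand times more compute at 1 km, which is exactly the petascale-to-exascale gap.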
“It is our philosophy to get better and better, to simulate bigger and more complex problems, in order to achieve an increasingly realistic picture of the world.”
The exascale computer represents the next step but is by no means the end. In the foreseeable future, the maximum system performance will always be used. The higher it is, the more reliable the result. This is especially true for weather and climate simulations and ecosystem research, but also for any other complex system such as molecular dynamics simulations and drug research.
Estela Suarez: In addition to problem size and accuracy, there is always the issue of what aspects can be taken into account. In climate research, the state of the oceans and the Earth’s surface is important as well as the atmosphere. Connecting them all computationally is incredibly complex. The same is true for other fields. Neuroscience, for example, aims at simulating the entire human brain. To understand certain diseases, you need to look at individual neurons and at the same time reproduce all the different functions they fulfil. You therefore need a level of computing power that is higher than that of today’s supercomputers.
As you mentioned before, it is becoming more and more difficult to make substantial progress in supercomputing. What approaches are there to further increase computing power?
Estela Suarez: In the past, the standard approach was to increase the clock frequency of the processors. At some point, this no longer worked because the energy consumption became too high. As a result, and to keep increasing compute performance, more processor cores were installed per unit. This sharply increasing parallelism could also be observed in ordinary computers and mobile devices. Subsequently, other processing units besides CPUs came into use, such as graphics cards, which were originally developed for other purposes, for example for applications in the gaming industry. They are now also used in supercomputers, since they offer a lot of computing power and require comparatively little energy.
This approach will continue to play an important role in the future. This means we will keep looking at technologies that were not originally intended for supercomputers but can be used in this field. As a next step, quantum computers and neuromorphic chips modelled on the human brain may be integrated as soon as they are fit for this purpose. At Jülich, our approach focuses on a modular supercomputer architecture. This makes it possible to define different clusters, each with different hardware properties, and connect them to each other. Users can then access all these cluster modules simultaneously, depending on what their code needs.
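The modular idea can be sketched in a few lines. This is a toy model, not JSC's scheduler; the module names and the `dispatch` function are hypothetical. The point it illustrates is the one from the interview: each part of a workload declares what hardware it needs, and the system routes it to the matching module, so one job can span several modules at once.

```python
# Toy sketch of modular-supercomputer scheduling (hypothetical names).
# Each module is a cluster with distinct hardware properties.
MODULES = {
    "cpu": "cluster",   # general-purpose CPU nodes
    "gpu": "booster",   # highly parallel accelerator nodes
}

def dispatch(workflow):
    """Route each (step, hardware_trait) pair to the matching module.
    Steps assigned to different modules could then run concurrently
    over a shared interconnect."""
    plan = {}
    for step, trait in workflow:
        if trait not in MODULES:
            raise ValueError(f"no module offers trait {trait!r}")
        plan[step] = MODULES[trait]
    return plan

plan = dispatch([("mesh_setup", "cpu"), ("dense_linear_algebra", "gpu")])
print(plan)  # {'mesh_setup': 'cluster', 'dense_linear_algebra': 'booster'}
```

A future quantum or neuromorphic module would, in this picture, simply be another entry in the table that suitable workflow steps can request.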
But exascale is not just about hardware; the software is at least as important. Systems are becoming more and more complex. The software must be able to handle the growing hardware heterogeneity and make the systems accessible to end users. The task is to further develop the corresponding software packages, provide new interfaces, and prepare application codes for future exascale machines. Among other efforts, we are working on this in the currently running project "DEEP-SEA".
What progress has been made with the plan to establish an exascale computer at Jülich?
Thomas Lippert: As things currently stand, we have submitted an application through the Gauss Centre for Supercomputing, which is an amalgamation of the three national high-performance computing centres HLRS (High-Performance Computing Center Stuttgart of the University of Stuttgart), JSC (Jülich Supercomputing Centre), and LRZ (Leibniz Supercomputing Centre, Garching near Munich), as part of the European Union’s call within the framework of EuroHPC. This call is for a European exascale computer, which will be based on European technology and should be ready to go into operation in 2024.
Needless to say, this type of system poses a great challenge. But we have to remember that these machines will benefit society as a whole. It is important to realize that in 20 to 30 years’ time, many of these machines will perform essential tasks for us, such as optimizing traffic and safety in entire cities or monitoring our environment. We are on the way towards autonomous driving and the digital twin. All of these things will have to be computed somewhere. The technologies we are developing today will play a crucial role in this.