Energy crisis: Exploiting savings potential in IT at ZEA-1
Forschungszentrum Jülich has set itself the goal of reducing its energy consumption by 20 percent compared to previous years. This goal can only be achieved if every area contributes.
The Simulation and Computation team uses the HPC cluster JuZEA1 at the JSC as well as a self-administered cluster at ZEA-1 for complex numerical calculations. In simplified terms, an HPC cluster consists of a central login node and the compute nodes connected to it. The login node uses a queue to distribute users' computation jobs to free compute nodes, which then perform the actual work. Until now, compute nodes were usually left switched on continuously, so during periods of low utilisation they consumed significant amounts of electrical energy while idle, without any benefit. This offered great potential for energy savings.
Modern HPC nodes are equipped with so-called baseboard management controllers (BMCs), which make it possible to control a node's power state via a separate network interface and to shut down or start individual nodes selectively. The configuration of the ZEA-1 cluster's queue was adapted so that compute nodes are switched off after a certain idle time (currently 2 hours). If pending computational tasks in the queue require them again, the login node automatically sends them the signal to start, and they are ready for use within a few minutes without the user having to intervene manually. The ZEA-1 cluster has 7 compute nodes with a total of 264 CPU cores. The actual saving depends on the utilisation of the cluster, but is estimated at several thousand kilowatt hours per year.
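The switch-off and wake-up rule described above can be illustrated with a minimal Python sketch. It is not the actual ZEA-1 queue configuration: the names `Node`, `plan_power_actions` and `IDLE_LIMIT_S` are hypothetical, and the sketch assumes for simplicity that each pending job occupies exactly one node. It only decides which nodes should be powered on or off. Sending the command to the BMC is deliberately left out.

```python
from dataclasses import dataclass

# Idle time before a node is powered off (2 hours, as stated in the article).
IDLE_LIMIT_S = 2 * 60 * 60


@dataclass
class Node:
    name: str
    powered_on: bool
    busy: bool
    idle_since: float  # timestamp in seconds when the node last became idle


def plan_power_actions(nodes, pending_jobs, now):
    """Return a list of (action, node_name) tuples for one scheduling pass.

    Assumption for illustration: one pending job needs exactly one node.
    """
    actions = []
    # Jobs that idle, powered-on nodes can absorb do not require a wake-up.
    free_on = sum(1 for n in nodes if n.powered_on and not n.busy)
    to_wake = max(0, pending_jobs - free_on)

    for n in nodes:
        if not n.powered_on:
            # Wake only as many powered-off nodes as the queue needs.
            if to_wake > 0:
                actions.append(("power_on", n.name))
                to_wake -= 1
        elif (not n.busy and pending_jobs == 0
              and now - n.idle_since >= IDLE_LIMIT_S):
            # Powered on, idle for at least the limit, and nothing is waiting.
            actions.append(("power_off", n.name))
    return actions
```

In such a scheme the login node would run this check periodically and pass each resulting action on to the BMC of the node concerned.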