Artificial intelligence on the supercomputer

Large language models trained on open data: the combination of AI and concentrated computing power is driving decisive progress.

One application set to benefit from JUPITER and its Nvidia GPUs is machine learning. Generative AI in particular, which produces images and text, has demonstrated impressively in recent years, through language programs such as ChatGPT, how remarkably close these algorithms can come to human language.

Large language models (LLMs) are trained on vast collections of text. From these texts they learn the probability of one word following another – and can thus form a meaningful sentence, word by word. To do so, a model must store huge amounts of information about linguistic building blocks and how they are interconnected.
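
To make this word-by-word idea concrete, here is a minimal sketch in Python (not part of the project's actual code): a toy bigram model that estimates how likely each word is to follow another by counting word pairs in a tiny sample text. Real LLMs learn such statistics with neural networks holding billions of parameters, but the underlying question is the same.

# A toy bigram model: count how often each word follows another in a
# sample text, then read off relative frequencies as probabilities.
# Real LLMs learn these statistics with billions of parameters;
# this sketch only illustrates the principle.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat saw the dog .".split()

# counts[w1][w2] = how many times w2 appeared directly after w1
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_probs(word):
    """Relative frequency of every word observed after `word`."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}

print(next_word_probs("the"))  # {'cat': 0.5, 'mat': 0.25, 'dog': 0.25}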

“A leap in the quality of these language models was only achieved when they were trained with numerous parameters on huge amounts of data. And that was only possible because high-performance computers were used,” explains Chelsea Maria John (JSC). In a team focused on algorithms and methods for computing accelerators, such as JUPITER’s GPUs from Nvidia, she works at the interface between AI and supercomputers.

Her main focus is an open-source language model that she is helping to develop in the OpenGPT-X project: “Behind this is a German network of ten project partners. They come from research, as well as from industry and the media. OpenGPT-X models should be able to handle different European languages, but especially German.” The software for training such LLMs should also run on JUPITER. Its processors are tailor-made for AI.

“This makes it possible to train large language models much faster and more efficiently. However, we also have to ensure that the tasks are distributed evenly across all processors,” says John. “Another challenge will be to minimize energy consumption – training language models consumes a great deal of electricity.” This is another reason why JUPITER was designed to be particularly energy efficient (see infobox).
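
The even distribution John mentions can be pictured with a small, hypothetical Python helper: it splits a global batch of training samples across a number of workers so that no chunk is more than one sample larger than any other. Production training frameworks handle this (and much more) internally; the sketch only illustrates the balancing idea.

# Hypothetical helper: partition a global batch across `num_workers`
# processors so that chunk sizes differ by at most one sample.
def split_evenly(samples, num_workers):
    base, extra = divmod(len(samples), num_workers)
    chunks, start = [], 0
    for rank in range(num_workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(samples[start:start + size])
        start += size
    return chunks

batch = list(range(10))  # 10 training samples, 4 workers
for rank, chunk in enumerate(split_evenly(batch, 4)):
    print(f"worker {rank}: {chunk}")  # chunk sizes 3, 3, 2, 2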

Chelsea Maria John needs JUPITER's computing power to develop open-source language models. (Photo: Forschungszentrum Jülich/Sascha Kreklau)

The 24,000 GPUs in the Booster module are designed to process data in a highly parallel manner. Conventional processors (CPUs), by contrast, excel at performing complex calculations in rapid succession; to do so, they rely on a small number of powerful processing cores. GPUs instead contain a much larger number of less powerful cores that work simultaneously, hand in hand. This parallel computing allows them to carry out the relatively simple individual operations typical of AI workloads faster than conventional processors.
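
A rough way to see the difference, sketched in Python with NumPy rather than actual GPU code: applying one simple operation to millions of numbers at once is far faster than handling them one at a time in a loop, which is essentially the advantage a GPU's many cores offer for AI workloads.

# Many simple operations at once (vectorized, GPU-style) versus
# one at a time (sequential, CPU-loop-style). NumPy stands in here
# for the parallel hardware; the timings are illustrative only.
import time
import numpy as np

x = np.random.rand(2_000_000)

t0 = time.perf_counter()
slow = [v * 2.0 + 1.0 for v in x]  # element by element
t1 = time.perf_counter()
fast = x * 2.0 + 1.0               # whole array in one parallel-friendly call
t2 = time.perf_counter()

print(f"loop:       {t1 - t0:.3f} s")
print(f"vectorized: {t2 - t1:.4f} s")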

Text: Arndt Reuning
