JUWELS in action: researchers develop model to predict enzyme function using supercomputer
An interdisciplinary team of experts from Forschungszentrum Jülich, Heinrich Heine University Düsseldorf (HHU) and Helmholtz AI at the Helmholtz Centre Munich has developed a new machine learning model that works at the molecular level. The AI model ‘TopEC’ analyses enzyme structures, learns which chemical reactions they catalyse and can derive their functions from this. This is a major step for enzyme technology and biocatalysis. The model was trained on the JUWELS supercomputer at the Jülich Supercomputing Centre (JSC).

Proteins are the basis of all cellular life. Since the first three-dimensional protein structure was determined in 1958, knowledge of protein structures has driven progress in molecular biology, medicine and biotechnology.
The shape of a protein is determined by the interactions of its atoms, and it is this structure that determines how the protein functions by interacting with other molecules. Advances in protein structure prediction, particularly since the introduction of AlphaFold, now make it possible to predict the structures of enzymes accurately. AlphaFold is an AI programme that uses neural networks to predict the three-dimensional structure of a protein from its amino acid sequence. This is an important step, because reliable structures are the starting point for predicting enzyme functions precisely, and such predictions are needed to develop sustainable, bio-based processes and to interpret genome data accurately.

New machine learning model recognises enzyme functions
Extensive databases already hold large numbers of structural models. Nevertheless, a structural model is available for only about 60 percent of all known enzyme functions. To close this gap, Prof. Holger Gohlke and a group of scientists from the Institute of Bio- and Geosciences at Forschungszentrum Jülich and from HHU Düsseldorf developed the ‘TopEnzyme’ database two years ago as part of a research project (DOI: 10.1093/bioinformatics/btad116). The aim now is to gradually add the information that is still missing.
The research team led by Prof. Gohlke has now developed the new machine learning model in collaboration with AI experts from Helmholtz AI (Helmholtz Munich). ‘TopEC’ predicts enzyme functions, expressed as Enzyme Commission (EC) classes, and draws on more than 250,000 structures from the Protein Data Bank and the AlphaFold database.
Computer-assisted methods to close data gaps
Providing a database is one thing; populating it with data is quite another. Precisely determining the molecular function of an enzyme from its predicted structure remains a challenge, because the function of an enzyme cannot always be derived one-to-one from its structure. Enzyme functions therefore often have to be determined experimentally, which is not only time-consuming but also prone to error. Unsurprisingly, existing databases sometimes contain incorrect function assignments.
Computational methods that work directly from the enzyme structure can help. By making functional predictions automatically and on a large scale, they can fill data gaps quickly and accurately, and thus make a fundamental contribution to the correct evaluation of biological data.
Development and training on the JUWELS supercomputer
To develop and train the model, the team used computing time on the JUWELS supercomputer at JSC, provided by the John von Neumann Institute for Computing (NIC). They reduced the computational requirements with a targeted approach: instead of the complete enzyme structure, the model uses a localised, atom-type-based 3D descriptor that covers only the hundred atoms nearest to the enzyme's active site. This significantly increased the training speed.
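The idea of a localised descriptor can be illustrated with a short Python sketch. It assumes, hypothetically, that a structure is given as a list of (element, coordinates) pairs and that the centre of the active site is already known; the function name `local_environment` and the default `k=100` (mirroring the hundred atoms mentioned above) are illustrative choices, not TopEC's actual code. The sketch keeps only the k atoms closest to the site, labelled by element type, with coordinates expressed relative to the site centre.

```python
import heapq
import math
from typing import List, Tuple

# Hypothetical input format: (element symbol, (x, y, z) coordinates in ångström).
Atom = Tuple[str, Tuple[float, float, float]]


def local_environment(
    atoms: List[Atom],
    site_center: Tuple[float, float, float],
    k: int = 100,
) -> List[Atom]:
    """Return the k atoms nearest to site_center, closest first.

    Coordinates are re-expressed relative to the site centre, so the result
    is a small, atom-type-labelled 3D point cloud that does not depend on
    where the enzyme sits in the original coordinate frame.
    """
    # nsmallest keeps only k candidates instead of sorting the whole structure.
    nearest = heapq.nsmallest(
        k, atoms, key=lambda atom: math.dist(atom[1], site_center)
    )
    return [
        (element, tuple(p - c for p, c in zip(pos, site_center)))
        for element, pos in nearest
    ]
```

Whatever the size of the full enzyme, a model fed this way only ever sees a fixed-size neighbourhood, which is one plausible reason why such a descriptor cuts the cost of training.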
By also extracting geometric information from the enzyme structure, such as distances and angles between atoms, TopEC predicts enzyme functions significantly more accurately than conventional methods. The model is also particularly robust to structural variations in enzyme binding sites and can recognise similar functions even when the underlying structural features differ.
One possible application is the targeted search for new enzymes: TopEC can be used to identify new enzyme variants by a purely computational approach. This opens up new possibilities, particularly in sustainable biotechnology.
The challenge scientists now face is that there are already more than 30 million enzymes with predicted functions, most of them assigned by sequence comparison, and the actual error rate of these predictions is not known. Refining these data, using as many automatically generated structural models as possible (for example from AlphaFold), could be the first major task for TopEC. Gohlke and his team now want to investigate the potential of this method in a follow-up project.
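According to the original publication, TopEC is a 3D graph neural network. A common way to supply such a network with distances and angles is to connect atoms that lie within a cutoff radius, then compute one length per edge and one angle per pair of edges meeting at an atom, as angle-aware models such as DimeNet do. The pure-Python sketch below shows only this preprocessing step; the cutoff value and function names are assumptions for illustration and do not describe TopEC's actual implementation.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def geometric_features(
    pos: List[Vec], cutoff: float = 5.0
) -> Tuple[Dict[Tuple[int, int], float], Dict[Tuple[int, int, int], float]]:
    """Build a radius graph and return (edge distances, bond angles).

    distances[(i, j)]: distance between atoms i and j (both directions stored).
    angles[(i, j, k)]: angle in radians at atom j between neighbours i and k.
    """
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(pos))}
    distances: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(pos)), 2):
        d = math.dist(pos[i], pos[j])
        # Connect atoms closer than the cutoff; skip coincident atoms (d == 0).
        if 0.0 < d < cutoff:
            neighbours[i].append(j)
            neighbours[j].append(i)
            distances[(i, j)] = distances[(j, i)] = d

    angles: Dict[Tuple[int, int, int], float] = {}
    for j, nbrs in neighbours.items():
        # Every pair of edges meeting at atom j defines one angle.
        for i, k in combinations(nbrs, 2):
            u, v = _sub(pos[i], pos[j]), _sub(pos[k], pos[j])
            cos = _dot(u, v) / (distances[(i, j)] * distances[(k, j)])
            # Clamp to [-1, 1] to guard against floating-point rounding.
            angles[(i, j, k)] = math.acos(max(-1.0, min(1.0, cos)))
    return distances, angles
```

For three atoms at the corners of a right angle and a cutoff of 1.5, for example, the angle at the corner atom is pi/2 and the three angles of the triangle sum to pi. Unlike raw coordinates, distances and angles do not change when the structure is rotated or shifted.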
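Enzyme functions are labelled with Enzyme Commission (EC) numbers, which have four hierarchical levels (for example 3.1.1.4: class, subclass, sub-subclass and serial number). One hypothetical way a structure-based predictor could help audit existing sequence-based annotations is to count how many leading EC levels the two assignments share and flag entries that disagree early in the hierarchy. The entry identifiers, the `min_depth` threshold and the function names below are illustrative assumptions, not part of TopEC.

```python
from typing import Dict, List, Tuple


def parse_ec(ec: str) -> Tuple[str, ...]:
    """Split an EC number such as '3.1.1.4' into its four levels.

    Incomplete assignments like '3.1.-.-' keep '-' for unassigned levels.
    """
    levels = tuple(ec.strip().split("."))
    if len(levels) != 4:
        raise ValueError(f"expected four EC levels, got {ec!r}")
    return levels


def agreement_depth(a: str, b: str) -> int:
    """Number of leading EC levels on which two assignments agree (0-4)."""
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        # Stop at the first mismatch or the first unassigned level.
        if x == "-" or y == "-" or x != y:
            break
        depth += 1
    return depth


def flag_conflicts(
    annotations: Dict[str, str],
    predictions: Dict[str, str],
    min_depth: int = 3,
) -> List[str]:
    """IDs whose existing annotation and structure-based prediction share
    fewer than min_depth EC levels (hypothetical review threshold)."""
    return [
        entry_id
        for entry_id, ec in annotations.items()
        if entry_id in predictions
        and agreement_depth(ec, predictions[entry_id]) < min_depth
    ]
```

For example, with annotations `{"P1": "3.1.1.4", "P2": "2.7.11.1", "P3": "1.1.1.1"}` and predictions `{"P1": "3.1.1.3", "P2": "3.1.1.4"}`, `flag_conflicts` returns `["P2"]`: P1 differs only in the serial number, P2 already disagrees at the top-level class, and P3 has no prediction to compare against.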
Helmholtz School for Data Science in Life, Energy, and Earth (HDS-LEE) / Helmholtz AI
The HDS-LEE Graduate School is an international, English-language graduate school for excellent graduates in mathematics, computer science, natural sciences and engineering who want to advance data science methods and use cutting-edge data science technologies to solve challenging scientific problems.
Helmholtz AI is an application-oriented platform for artificial intelligence that accelerates science throughout the Helmholtz Association. It enables the development and implementation of AI solutions, promotes collaboration and ensures access to resources and expertise.
Original publication: TopEC: prediction of Enzyme Commission classes by 3D graph neural networks and localized 3D protein descriptor, by van der Weg, K., Merdivan, E., Piraud, M., Gohlke, H. Nat. Commun. 2025, 16, 2737. DOI: 10.1038/s41467-025-57324-5
Contact at JSC: Prof. Dr. Holger Gohlke
Further information:
HDS-LEE (Helmholtz Information & Data Science Academy)
Helmholtz AI
Bioinformatics at FZJ: Bioinformatik (IBG-4)