Machine Learning in Earth System Science: Insights from the European Geosciences Union (EGU) General Assembly 2025

From 27 April to 2 May 2025, the European Geosciences Union (EGU) General Assembly gathered over 20,000 participants from more than 120 countries in Vienna. Among them were Carsten Hinz, Michael Langguth, and Erik Pavel, representing the Earth System Data Exploration (ESDE) research group of the Jülich Supercomputing Centre (JSC). Their participation focused on scientific exchange – particularly around the increasingly discussed WeatherGenerator project – and on strengthening networks within the international geoscience community.

Geoscientists on their way to the conference center in Vienna.
Michael Langguth, JSC

Machine Learning Was at the Center of EGU 2025

This year, the ESDE group's machine learning (ML) research was featured prominently through contributions linked to four closely related initiatives: WeatherGenerator; RAINA, which focuses on extreme weather events; HClimRep; and WarmWorld, a project centred on the further development of ICON, the numerical model currently used for weather prediction in Germany. These efforts align closely with EGU's growing emphasis on ML. In recent years, ML has evolved from a niche topic into one of the key themes of the assembly. In 2025, entire sessions and subsessions were dedicated to ML-based approaches for forecasting, for modelling (including the ML-assisted development of numerical models), and for infrastructure, far exceeding the level of engagement seen just a few years ago.

Connecting Communities: Machine Learning Tools That Bridge Disciplines

The role of ML at EGU extended beyond scientific applications: it even helped attendees navigate the conference itself. A tool developed by researchers at the Helmholtz Centre for Environmental Research (UFZ) allowed attendees to identify thematically related abstracts across different sessions. This is an important step toward bridging the disciplinary gaps between EGU's 22 divisions and enabling more cohesive interdisciplinary discussions.

Scientific Deep Dive: Modelling, Forecasting, and Technical Progress

The scientific program offered a wide-ranging overview of recent developments in ML for weather and climate. A significant focus lay on integrating physical constraints into ML-based climate models and on advancing ML-supported forecasting systems, including updates on the new Artificial Intelligence Forecasting System (AIFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF). Discussions around technical infrastructure highlighted the ongoing development of key data formats such as netCDF and zarr. For netCDF, several talks addressed metadata standardisation efforts, such as those led by Hereon, and new approaches to facilitate cloud-based data access. For zarr, recent versions that aim to reduce inode use, i.e. the number of files a dataset occupies on the file system, were presented as particularly relevant for large-scale model output.

The technical ecosystem supporting Earth system science also saw the introduction of innovative tools. uxarray was presented as a way to work more easily with unstructured grids while keeping the familiar xarray interface. kerchunk emerged as a lightweight solution for accessing data stored in diverse formats through a zarr-compatible interface. Another recurring topic was data compression, especially lossy methods. Here, contributions from ECMWF, the German Climate Computing Centre (DKRZ) and the University of Helsinki sparked interest, notably those connected to the ESiWACE project, which addresses scalability in Earth system data workflows.

The RAINA poster attracted a lot of interest. RAINA develops ML-based forecasts for extreme weather events such as heavy precipitation. In doing so, it contributes to the EU project WeatherGenerator.
Michael Langguth, JSC

From Generative Models to Data Access: Highlights from the Research Floor

The WeatherGenerator project itself attracted strong attention through a poster session, particularly due to its use of diffusion models for simulating weather variables. The approach reflects a broader trend: generative modelling is gaining momentum within the geoscience community, as other contributions at the conference confirmed. For example, a talk by Gabriele Franch introduced RUSH, a nowcasting system that combines remote sensing observations with AIFS forecasts to produce high-resolution 24-hour predictions. This work follows similar modelling ideas while additionally incorporating observational data at a finer temporal resolution.

Another noteworthy contribution came from Jonathan Schmidt, whose poster addressed temporally coherent, multivariate downscaling using diffusion models. His work also tackled a pressing methodological challenge: the handling of distribution shifts between the training and application phases, which often limit the generalisability of ML models in practice.

In the domain of data access and infrastructure, a study on STAC catalog backends for the Copernicus Data Space presented performance comparisons between OpenSearch- and PostgreSQL-based solutions. It highlighted potential bottlenecks and deadlocks when ingesting large, continuous streams of satellite imagery, which raises critical questions for operational data systems.

Beyond the scientific contributions, EGU also served as a platform for broader discussions on research infrastructure, FAIR data principles, and the future of software ecosystems in geoscience. Presentations from the Barcelona Supercomputing Center (BSC) introduced custom tools and libraries for processing ICON and IFS model output, developed in the context of the Destination Earth initiative. These tools were positioned as potential enhancements or alternatives to ECMWF’s existing software stack, underscoring a shared momentum toward more efficient and interoperable modelling frameworks.

Conclusion: Machine Learning as a Core Element of Geoscientific Progress

EGU 2025 made it clear that machine learning has become a central driver of innovation in climate sciences. From high-resolution forecasting and downscaling to metadata harmonisation and data compression, the integration of ML into research workflows is transforming how scientists study and simulate the Earth system. The conference served not only as a showcase of technical progress, but also as a space for connecting ideas, infrastructures, and communities across disciplinary boundaries. The momentum seen this year suggests that machine learning will continue to play an increasingly foundational role in Earth system science – one that is interdisciplinary, data-driven, and deeply collaborative.

Last Modified: 23.06.2025