PUNCH4NFDI - JSC Activities
TA2: Lattice QCD
The long-term archive at JSC is a key component of the International Lattice Data Grid (ILDG), which is currently being revised as part of the PUNCH4NFDI project. The ILDG provides a federated infrastructure of regional grids (Europe, Japan, USA and UK) that enables Lattice Quantum Chromodynamics (LQCD) researchers to store lattice gauge field configurations on multiple storage elements (SEs) around the world, JSC being a central SE in Europe. The data stored in the ILDG can be used to calculate the properties of quarks and gluons and how they bind into hadrons such as protons and neutrons.
Lattice gauge field configurations are the result of expensive, large-scale simulations performed on the world's most powerful supercomputers. Configurations generated with the same physical parameters, typically as a Markov chain produced by a Monte Carlo method, are grouped into so-called ensembles.
The ILDG provides a metadata catalog (MDC) for the ensembles and configurations, a file catalog (FC) that specifies the storage location of a specific configuration on one or more SEs, and an access control server (ACS) that specifies the access rights to the files. This allows the working groups that created the ensembles to decide independently on their storage and distribution policy.
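To illustrate how these components interact, the following is a minimal sketch of resolving and fetching a configuration. The endpoint URLs, JSON layout and function names are hypothetical placeholders; the actual ILDG catalogue interfaces are defined by the community.

```python
import requests

# Hypothetical catalogue endpoints; the real ILDG services define their own APIs.
MDC_URL = "https://mdc.example-ildg.org/api"   # metadata catalogue (ensembles/configurations)
FC_URL = "https://fc.example-ildg.org/api"     # file catalogue (logical -> physical locations)

def resolve_configuration(logical_file_name, token):
    """Look up all storage-element replicas of a configuration via the file catalogue."""
    headers = {"Authorization": f"Bearer {token}"}  # access rights come from the ACS
    reply = requests.get(f"{FC_URL}/replicas", params={"lfn": logical_file_name},
                         headers=headers, timeout=30)
    reply.raise_for_status()
    return reply.json()["replicas"]  # e.g. one URL per storage element holding a copy

def fetch_configuration(logical_file_name, token, destination):
    """Download a configuration from the first storage element that answers."""
    for url in resolve_configuration(logical_file_name, token):
        reply = requests.get(url, headers={"Authorization": f"Bearer {token}"},
                             stream=True, timeout=300)
        if reply.ok:
            with open(destination, "wb") as f:
                for chunk in reply.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
            return url
    raise RuntimeError(f"no storage element could serve {logical_file_name}")
```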
Thanks to new exascale machines such as JUPITER, LQCD calculations will generate an ever-increasing amount of data that needs to be stored. Currently, the entire ILDG contains approximately 2 petabytes of data; this volume is expected to grow rapidly as ongoing efforts to modernise the infrastructure allow more researchers to use it effectively.
Development of the ILDG began in the early 2000s; as a crucial component of task area 2 (TA2), it is currently being modernised, updated and improved as part of PUNCH4NFDI. One goal of the new developments is to redesign the ILDG infrastructure so that it becomes LQCD-agnostic, allowing other communities to deploy similar federated systems for their field-specific data and metadata.
TA2: Long-term archiving (LOFAR)
The Low Frequency Array (LOFAR) is an international project to build and operate an interferometric array of radio telescopes without moving parts. The electric signals from the distributed LOFAR antenna fields are digitised, transported to a central processor, and combined in software in order to map the sky. The data collected at the radio telescope are sent to the central processor in Groningen and then on to the three sites in Europe that together form the Long Term Archive (LTA): SURF in the Netherlands, JSC in Germany, and PSNC in Poland.
The LOFAR LTA at JSC consists of a distributed heterogeneous storage system based on the dCache software. dCache allows storing and retrieving huge amounts of data, distributed among a large number of different server nodes, under a single virtual file system tree with a variety of standard access methods. The distributed storage system is made up of spinning disks for “online” or “hot” data and tapes for “offline” or “cold” data.
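Because dCache exposes its virtual file system through standard protocols such as WebDAV, data can be listed and retrieved with ordinary HTTP tooling. The sketch below is a minimal illustration assuming a hypothetical endpoint, path and token; actual doors, paths and authentication are site-specific.

```python
import requests

# Hypothetical WebDAV door of a dCache instance; real endpoints are site-specific.
BASE = "https://dcache.example-lta.org:2880"
TOKEN = "..."  # e.g. a macaroon or bearer token issued by the site

headers = {"Authorization": f"Bearer {TOKEN}"}

# List a directory via the WebDAV PROPFIND verb (Depth: 1 = immediate children only).
listing = requests.request("PROPFIND", f"{BASE}/lofar/ops/projects",
                           headers={**headers, "Depth": "1"}, timeout=30)
listing.raise_for_status()
print(listing.text)  # XML multistatus response describing the entries

# Retrieve a single file with a plain GET; "offline" files held on tape may
# first need to be staged back to disk by the system.
data = requests.get(f"{BASE}/lofar/ops/projects/obs123/measurement.tar",
                    headers=headers, timeout=300)
data.raise_for_status()
open("measurement.tar", "wb").write(data.content)
```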
TA2-WP2: Compute4PUNCH
JSC compute resources in the JURECA-DC cluster are planned to be accessible as one of the compute sites of the Compute4PUNCH common computing grid. The integration is planned to be based on the HTCondor job manager and the COBalD/Tardis software framework. However, the JSC site has to meet special security requirements, which imposes additional restrictions on the integration layout. To find an appropriate solution, we are elaborating custom approaches based either on so-called HTCondor grid universes or on proxy virtual machines. Both approaches are being developed in parallel so that they can be compared with each other, and the better one will be chosen as the integration scheme.
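To make the grid-universe idea concrete, here is a minimal sketch using the official htcondor Python bindings. The remote resource name and file names are placeholders, not the actual JSC configuration, which is still being worked out.

```python
import htcondor

# A grid-universe job forwards execution to a remote batch system instead of
# running on a local worker node. The grid_resource value is a placeholder.
submit = htcondor.Submit({
    "universe": "grid",
    "grid_resource": "condor remote-ce.example.org remote-ce.example.org",
    "executable": "analysis.sh",
    "arguments": "ensemble_042",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()      # local scheduler daemon
result = schedd.submit(submit)  # queue one job
print("submitted cluster", result.cluster())
```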
TA3
To best meet the needs of the community, a user-oriented choice of codes will be made that shall be further developed and optimised. In the first stage, the codes used by the largest number of users at the HPC centres will be optimised. In the second stage, PUNCH4NFDI will also include the most resource-intensive codes. All users will benefit from both measures because the overall available resources will be used more efficiently. The optimised data production tools will include large-scale simulations and analysis software packages used in astrophysics and LQCD, as well as hydrodynamical simulations performed e.g. in the cosmology, high-energy astrophysics and nuclear physics communities. In astrophysics, the focus will be on state-of-the-art N-body and hydrodynamical codes. In QCD applications, the focus will be on solvers for large sparse matrices and improved discretisation schemes for solving Hamiltonian equations of motion on four-dimensional space-time lattices.
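To give a flavour of the sparse-solver workloads mentioned above, the following toy sketch solves a sparse linear system with the conjugate gradient method from SciPy. Production LQCD solvers operate on far larger, structured matrices (e.g. the lattice Dirac operator) and are heavily tuned for the target hardware; the 1D Laplacian here is only a stand-in.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Build a sparse, symmetric positive-definite test matrix: a 1D discrete
# Laplacian. Lattice Dirac operators play the analogous role in LQCD,
# just in four dimensions and at much larger scale.
n = 10_000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Conjugate gradient: the archetypal Krylov solver for such systems.
x, info = cg(A, b)
assert info == 0, "solver did not converge"
print("residual norm:", np.linalg.norm(A @ x - b))
```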
TA3-WP2: Numerical methods and simulations
Various research communities are involved in the PUNCH4NFDI consortium, using different codes and algorithms. Despite this diversity, there are common challenges in the development, improvement and optimisation of basic, time-consuming routines that need to be addressed in order to achieve acceptable sustained performance in large-scale simulations and data analysis campaigns performed in HPC environments. Hence, the aim of this work package is to provide tuned libraries of numerical tools to the user. The repository of libraries will contain the most commonly applied algorithms used on HPC machines in Germany. It will differ from existing code repositories in that its codes are optimised for the HPC platforms in Germany and are adapted whenever hardware changes occur. Most of the computing time used at the HPC centres by the PUNCH community can be attributed to about twenty different code bases. The performance-critical parts of these codes will be encapsulated in low-level libraries that address the needs of users in the PUNCH research community and are tuned for the heterogeneous compute environments.
TA3-WP3: Machine Learning
PUNCH datasets, e.g. observations by the astronomical observatories LOFAR, SKA and the Rubin Observatory, are large and highly complex; AutoML frameworks developed for them can be useful for a range of other scientific applications as well. PUNCH4NFDI will increase the diversity of datasets and study the robustness and performance of the developed AutoML frameworks by using the library of benchmark datasets curated by MaRDI.
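As a generic, minimal illustration of what an AutoML-style search automates (the frameworks developed in PUNCH4NFDI are considerably more sophisticated), the sketch below runs scikit-learn's randomized hyperparameter search on a synthetic dataset standing in for a real benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a benchmark dataset; real PUNCH benchmarks come from
# e.g. astronomical surveys and are curated together with MaRDI.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Randomized search over model hyperparameters is one of the simplest building
# blocks of AutoML: it automates part of the model-selection loop.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10, 20],
        "min_samples_leaf": [1, 2, 5],
    },
    n_iter=10, cv=3, random_state=0,
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated score:", round(search.best_score_, 3))
```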
TA3-WP4: Methods for analyses across datasets
A prerequisite for resolving the heterogeneous data format problem is to ensure that the datasets have appropriate metadata describing their format, such that appropriate dataset conversion or reading tools can be selected automatically. Specific examples of conversion/reading methods will be implemented for selected LQCD, astrophysics and other data formats. These tools will be deployable within heterogeneous computing environments. The developed framework will be transparent and easy to apply for the user when analysing multiple datasets automatically, including the required converters/readers in the workflow.
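A minimal sketch of the metadata-driven selection idea: a registry maps a format identifier taken from the dataset's metadata to the matching reader. The format tags and reader functions are illustrative placeholders, not the actual framework.

```python
import json

# Registry mapping a format identifier (from the dataset's metadata) to a reader.
READERS = {}

def register_reader(format_name):
    """Decorator that registers a reader for a given metadata format tag."""
    def wrap(func):
        READERS[format_name] = func
        return func
    return wrap

@register_reader("lime")  # placeholder tag for an LQCD container format
def read_lime(path):
    return f"reading LQCD configuration from {path}"

@register_reader("fits")  # placeholder tag for an astronomy image format
def read_fits(path):
    return f"reading astronomical image from {path}"

def open_dataset(metadata_path):
    """Pick the right reader automatically from the dataset's metadata record."""
    with open(metadata_path) as f:
        meta = json.load(f)
    reader = READERS.get(meta["format"])
    if reader is None:
        raise ValueError(f"no reader registered for format {meta['format']!r}")
    return reader(meta["data_path"])
```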
TA5: Data irreversibility
The rapid increase in both data rates and data complexity leads to several vital challenges that will soon be felt throughout society as we enter the “Internet of Things” era, in which large sets of “sensors” will transmit data upon which autonomous “actors” will react. However, the substantial increase in power consumption for storage solutions, e.g. cloud computing, requires the investigation of resource-optimised data sets with maximal relevance and minimal redundancy. Decisions will need to be made, often in real time and without human intervention, about which information to keep or how to compress it with calculable loss. Loss will be inevitable and mostly irreversible, while off-line analyses or emerging additional information will feed back and dictate modifications of the on-line processes (“dynamic filtering”). The rules and methods for extracting pertinent information from huge data streams in real time will need to be updated frequently and captured as important metadata. Hence, the impact of the information loss must be traced and gauged in order to allow adequate conclusions to be drawn from archives, which will no longer be static but dynamic entities.
TA5-WP1: Implications for discovery potential and reproducibility
A few approaches are being developed in this work package. One of them is based on Shannon's information theory and can potentially identify valuable data in large data streams and data sets. The method is expected to be powerful and fast, though it is quite general and demands careful adjustment to each type of data. It can be used to discover the most informative parts of unseen data and for dynamic filtering at the pre-processing stage. In addition, approaches based on Bayesian inference could be widely applicable in this context, since they allow information from several sources to be propagated and integrated, and the parameter distribution to be updated with newly observed data. Bayesian updating is particularly important in the analysis of data streams.
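Two toy calculations in the spirit of these approaches, with all thresholds and distributions chosen arbitrarily for illustration: the Shannon entropy of a sliding window can flag the most informative parts of a stream, and a conjugate Bayesian update folds new observations into a parameter distribution.

```python
import numpy as np

def window_entropy(samples, bins=32):
    """Shannon entropy (in bits) of the empirical histogram of a data window."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

# Dynamic filtering, toy version: keep only windows whose entropy exceeds a
# threshold, i.e. the parts of the stream carrying the most information.
rng = np.random.default_rng(0)
stream = np.concatenate([np.zeros(500), rng.normal(size=500)])  # dull, then busy
windows = stream.reshape(-1, 100)
kept = [w for w in windows if window_entropy(w) > 2.0]  # threshold is arbitrary
print(f"kept {len(kept)} of {len(windows)} windows")

# Bayesian updating, toy version: a Beta prior on a detection probability is
# updated batch by batch with new Bernoulli observations.
alpha, beta = 1.0, 1.0                    # flat prior
for batch in (rng.random(50) < 0.3, rng.random(50) < 0.3):
    alpha += batch.sum()                  # observed successes
    beta += len(batch) - batch.sum()      # observed failures
print("posterior mean:", alpha / (alpha + beta))
```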
TA5-WP4: Scaling workflows
In this work package we carry out performance tests of scientific software. For example, the MPI-enabled parts of the LOFAR reduction pipeline are tested on our HPC clusters, and ways of increasing performance are worked out based on the test results.
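A minimal sketch of the kind of measurement involved, using mpi4py to time a distributed operation at a given process count; the workload here is a placeholder, whereas the real tests time actual LOFAR pipeline stages.

```python
# Run with e.g.: mpiexec -n 4 python scaling_probe.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Fixed total work split across ranks (strong scaling): more ranks, less each.
local = np.random.default_rng(rank).random(10_000_000 // size)

comm.Barrier()                     # synchronise before timing
start = MPI.Wtime()
local_sum = local.sum()            # local computation
total = comm.allreduce(local_sum)  # communication step
elapsed = MPI.Wtime() - start

# Report the slowest rank's time: that is what limits the pipeline.
worst = comm.reduce(elapsed, op=MPI.MAX, root=0)
if rank == 0:
    print(f"{size} ranks: {worst:.4f} s (checksum {total:.1f})")
```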
TA6: Synergies and Services
This task area targets cross-cutting activities that foster an exchange of concepts and developments among the PUNCH4NFDI community as well as with other consortia and the NFDI in general. Synergies are often closely related to the common use of services being provided either to subsets of the community or to the entire NFDI. Special emphasis is placed on a set of core topics (i.e. open data and metadata, big data management, and authentication and authorisation infrastructure (AAI)).
TA6-WP2: PUNCH AAI
The PUNCH4NFDI user management and user authentication and authorisation infrastructure (AAI) is operated by JSC as part of Helmholtz ID. The service is a strongly connected AAI that already follows the recommendations and guidelines for participation in EOSC. Helmholtz ID supports multiple communities beyond the Helmholtz Association and is well accepted in operation with other community and research-infrastructure AAIs. The experience gained in operating Helmholtz ID, together with the usage, adoption and enhancement of the AAI in PUNCH4NFDI, is a valuable contribution to the Base4NFDI activity on Identity & Access Management (IAM4NFDI).
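For example, a PUNCH service can consume Helmholtz ID tokens via standard OpenID Connect: the sketch below fetches the issuer's discovery document and checks a token at the userinfo endpoint. The issuer path and the simplified token handling are assumptions for illustration, not the production setup; consult the Helmholtz ID documentation for the actual integration.

```python
import requests

# Assumed OIDC issuer for Helmholtz ID; verify against current documentation.
ISSUER = "https://login.helmholtz.de/oauth2"

# Standard OIDC discovery: compliant issuers publish their endpoints here.
config = requests.get(f"{ISSUER}/.well-known/openid-configuration", timeout=30).json()

def validate_token(access_token):
    """Accept a token by asking the issuer's userinfo endpoint about it."""
    reply = requests.get(config["userinfo_endpoint"],
                         headers={"Authorization": f"Bearer {access_token}"},
                         timeout=30)
    if reply.ok:
        return reply.json()  # identity claims for the authenticated user
    return None              # token invalid or expired
```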
TA7: Training Courses
JSC provides a range of training courses which we disseminate to the PUNCH communities:
- Introduction to ParaView for the visualisation of scientific data (training course, online)
- Introduction to parallel programming with MPI
- Parallel programming with OpenMP
- Introduction to Bayesian Statistical Learning (training course, online)
- GPU Programming Part 1: Foundations (training course, on-site)
- Introduction to Unreal Engine for Science (training course, online)
- Interactive High-Performance Computing with JupyterLab (training course, online)
- Data Analysis and Plotting in Python with Pandas (training course, online)
- Programming in C++ (training course, on-site)
- Introduction to Bayesian Statistical Learning 2 (training course, online)
- Virtual Worlds for Machine Learning (training course, online)
- High-performance computing with Python (training course, online)
- GPU programming Part 2: Advanced GPU Programming (training course, online)
- Introduction to parallel programming with MPI and OpenMP (training course, on-site) [beginner and intermediate level]
- In-Situ Visualization on High-Performance-Computers (training course, online)
- Directive-based GPU programming with OpenACC (training course, online)
- Parallel I/O and Portable Data Formats (training course, on-site)
- High-performance scientific computing in C++ (training course, online)
- Advanced Parallel Programming with MPI and OpenMP (training course, online)