ATML Application Optimisation and User Service Tools

The primary mission of ATML Application Optimisation and User Service Tools (ATML-AO) is HPC user support on the JSC supercomputing systems. Support is provided through three major channels: direct, on-demand user support through the ticket system, the development and maintenance of open-source HPC tools, and training in the form of courses on HPC topics. The ATML-AO’s strengths lie in its unique combination of development, training, and support activities within one team, which enables the adaptation of HPC tools to suit user needs, including powerful in-house tools such as JUBE and LLview, as well as the user service portals JuDoor and JARDS.

Within the Program-oriented Funding (PoF IV) of Helmholtz Information, this group contributes to Program 1 “Engineering Digital Futures”, Topic 1 “Enabling Computational- & Data-Intensive Science and Engineering”.

Support and Services

ATML-AO is active across all GCS support levels:

  • Support Level 1 - Help and service desk
  • Support Level 2 - Code analysis and optimisation
  • Support Level 3 - Code refactoring
  • Support Level 4 - Joint research projects

The ATML-AO manages the SC support team, which is responsible for Support Level 1, and provides most of its members. The HPC service desk and the Office for User Services are reachable by mail and phone, and all internal communication is handled through dedicated ticket systems.

ATML-AO places emphasis on complementing reactive support with proactive and pre-emptive user support. ATML-AO continuously monitors the usage of JSC systems in collaboration with the system administrators. Based on data provided through the in-house tools LLview and KontView, the support team reaches out to users and assists them in optimising their workflows for better performance, resource utilisation, and sustainability.

ATML-AO provides monthly interactive support sessions to JSC users. The JSC HPC Support Corner starts with a short topic presentation by the support team, followed by a general Q&A that is shaped by the questions and needs of JSC users.

As a special service, the ATML-AO organises the BigDays in collaboration with the system administrators. This special mode of operation of the HPC systems facilitates large-scale and full-system runs and serves as preparation for the exascale system JUPITER. BigDays are offered on Tuesdays, on demand and upon registration.

Furthermore, the ATML-AO is responsible for the JSC-wide assignment of project mentors to compute projects on the HPC systems and provides several mentors itself. ATML-AO also organises a weekly cross-sectional meeting that brings together system administrators, the support team, SDLs, and ATMLs to discuss and solve both system- and user-related issues.

The ATML-AO contributes to JSC’s Office for User Services, where it oversees the management of user accounts, computing time, data projects, database accounts, external access to JuNet (VPN) and WLAN, and DFN certificates. The team, comprising staff from both front and back offices, serves as the primary point of contact for users and project managers who may be experiencing difficulties with supercomputer accounts or external access.

Research and Software Development

ATML-AO provides HPC user support through the development and maintenance of open-source HPC tools. Notable tools include LLview for job monitoring, JUBE as a versatile and portable benchmarking and workflow environment, LinkTest for communication benchmarking, SIONlib for scalable, highly parallel I/O at exascale, and PinningTool for thread-core binding configuration. Tool development is driven by the goal of better understanding HPC resource usage patterns and optimising HPC resource utilisation. The ATML-AO has contributed a set of synthetic benchmarks to the JUPITER Benchmark Suite. In addition, the ATML-AO develops and provides user service tools such as JuDoor, JARDS (in collaboration with other groups), and KontView.

Research activities of ATML-AO are focused on user needs and demand. The ATML-AO's primary research avenue is the optimisation of parallel I/O. One of the main outcomes is the open-source library SIONlib, which implements efficient parallel I/O of task-local data from massively parallel applications.

  • LLview (incl. the supplementary tools JURI and calibrate) - Job monitoring and job reporting
  • JUBE - Benchmarking environment
  • SIONlib - Parallel I/O library of task-local data from massively parallel applications
  • JuDoor - Portal for managing accounts, projects and resources at JSC
  • KontView - Monitoring and analysis of compute projects' resource usage on the JSC systems
  • JARDS - Joint Application, Review, and Dispatch Service for handling resource allocation processes
  • Pinning tool - Visualisation and verification of process pinning
  • LinkTest - Scalable MPI point-to-point benchmark
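
To make the idea of parallel I/O of task-local data more concrete, the following is a minimal sketch under stated assumptions. It is not SIONlib's API or file format. It only illustrates one common way to store per-task data in a single shared file: each task owns a fixed-size chunk and writes at an offset derived from its rank. The function names, the 4096-byte block size, and the sequential simulation of tasks (instead of real MPI processes) are all assumptions made for illustration.

```python
import os
import tempfile
from typing import List

BLOCK_SIZE = 4096  # assumed file-system block size, illustration only


def aligned_chunk_size(max_bytes_per_task: int, block_size: int = BLOCK_SIZE) -> int:
    """Round the per-task chunk size up to a multiple of the block size."""
    blocks = -(-max_bytes_per_task // block_size)  # ceiling division
    return blocks * block_size


def chunk_offset(rank: int, chunk_size: int) -> int:
    """Byte offset of the chunk owned by task `rank` in the shared file."""
    return rank * chunk_size


def write_task_data(path: str, rank: int, chunk_size: int, data: bytes) -> None:
    """Write one task's data into its own chunk of the shared file."""
    if len(data) > chunk_size:
        raise ValueError("task data does not fit into its chunk")
    with open(path, "r+b") as f:
        f.seek(chunk_offset(rank, chunk_size))
        f.write(data)


def read_task_data(path: str, rank: int, chunk_size: int, nbytes: int) -> bytes:
    """Read back the first `nbytes` of the chunk owned by task `rank`."""
    with open(path, "rb") as f:
        f.seek(chunk_offset(rank, chunk_size))
        return f.read(nbytes)


def demo(num_tasks: int = 4) -> List[bytes]:
    """Simulate `num_tasks` tasks writing to, then reading from, one file."""
    payloads = [f"data from task {r}".encode() for r in range(num_tasks)]
    chunk_size = aligned_chunk_size(max(len(p) for p in payloads))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        # Pre-allocate the shared file so each task can open it for update.
        with open(path, "wb") as f:
            f.truncate(num_tasks * chunk_size)
        for rank, payload in enumerate(payloads):
            write_task_data(path, rank, chunk_size, payload)
        return [
            read_task_data(path, r, chunk_size, len(payloads[r]))
            for r in range(num_tasks)
        ]
```

Calling `demo()` writes four small payloads into one temporary file and reads them back by rank. In a real parallel application each rank would perform its own write concurrently, and a production library handles metadata, variable chunk sizes, and multiple physical files, none of which this sketch attempts.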

Training

As an integral component of proactive and pre-emptive support, ATML-AO provides HPC training courses in MPI, OpenMP, Parallel I/O and Portable Data Formats. Furthermore, ATML-AO organises the 'Introduction to Supercomputing at JSC - Theory & Practice' training course, which is held twice a year at the beginning of each allocation period for new compute time projects. The introductory course is the largest course at JSC. It aims to give new users a smooth start at JSC by teaching best practices, such as optimising HPC resource utilisation, so that many initial issues are avoided before they occur. Key strengths of the introductory course are the direct on-demand support from the support team during the hands-on sessions, direct access to domain experts to discuss specific user issues, and a range of specialisation lectures that allow participants to deepen their knowledge of various state-of-the-art HPC topics. ATML-AO also contributes to other training activities such as GPU Hackathons and the IHPCSS.

Contact

Dr. Wolfgang Frings

Division Head "Application Support"
PI in Helmholtz Information Program 1, Topics 1 and 2

  • Institute for Advanced Simulation (IAS)
  • Jülich Supercomputing Centre (JSC)
Building 16.3 / Room 316
+49 2461/61-2435
E-Mail
Last Modified: 19.03.2025