Modular Science

Summary

Modular Science is a software framework for deploying complex interactive workflows on supercomputers, serving as an orchestrator for scientific applications.

Background and Motivation

Launching applications on supercomputers requires interaction with scheduling systems and the setup of software and working environments. A successful job submission on a supercomputer (the execution of a defined set of instructions) depends on the accurate definition of paths to libraries, access to data input/output, availability of the required computational resources, a job definition that stays within the limits imposed by the scheduler, and the correct execution of the instructions enclosed in the job. Each of these dependencies can be a source of problems that prevent the job from executing. If the job is part of a complex workflow, the dependencies multiply and the potential for failure increases, especially when the workflow is interactive: each component must function for the system to run correctly. Early detection of configuration and simulation errors prevents waste of computing resources.
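The early-detection idea can be illustrated with a minimal pre-submission check. This is a hedged sketch, not part of Modular Science: the `validate_job` function, the dictionary keys, and the limit values are all hypothetical, chosen only to mirror the dependencies listed above (library paths, input data, resource requests within scheduler limits).

```python
import os
import shutil

def validate_job(job):
    """Collect configuration problems before a job reaches the scheduler.

    `job` is a plain dict describing the submission; all keys used here
    are illustrative, not a real scheduler interface.
    """
    problems = []
    # Paths to libraries must resolve before launch.
    for path in job.get("library_paths", []):
        if not os.path.isdir(path):
            problems.append(f"missing library path: {path}")
    # The executable must be findable.
    if shutil.which(job.get("executable", "")) is None:
        problems.append(f"executable not found: {job.get('executable')}")
    # Input data must be readable.
    for f in job.get("inputs", []):
        if not os.access(f, os.R_OK):
            problems.append(f"unreadable input file: {f}")
    # The resource request must stay within the scheduler's limits.
    limits = job.get("limits", {})
    if job.get("nodes", 0) > limits.get("max_nodes", 64):
        problems.append("requested nodes exceed the scheduler limit")
    if job.get("walltime_min", 0) > limits.get("max_walltime_min", 1440):
        problems.append("requested walltime exceeds the scheduler limit")
    return problems
```

Rejecting a misconfigured job at this stage costs milliseconds; letting it fail after queueing wastes both wall-clock time and allocated core hours.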

Our approach

The goal of Modular Science is to provide a software framework with a low entry threshold that eases the deployment of complex interactive workflows on HPC resources. In particular, we tackle two problems: the robust online deployment of multiple applications, and the orchestration of data flows in an interactive and reproducible fashion.

Modular Science integrates:

  • APIs for the transfer of data between different applications
  • API and interaction contracts for multiple user groups
  • Wrapper software to allow the deployment and monitoring of these applications on hybrid HPC resources
  • An interactive visualization front end which can be used to define workflows from a scientific model perspective and to explore the results of processing at different steps of the workflows at runtime

Fig 1. General diagram of the Modular Science design, in which an orchestrator coordinates the execution of workflows involving several applications (here marked as A and B) running independently using a common communication API. The diagram also shows the role of the visualization and steering modules, which allow the user to interact with the workflow during execution.
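The pattern in Fig. 1 can be sketched in a few lines: two independently running applications exchange data through a common communication API, and an orchestrator wires them together and supervises the run. This is a minimal in-process illustration under assumed names; the `Channel`, `app_a`, `app_b`, and `orchestrate` identifiers are hypothetical and do not reflect the actual Modular Science API.

```python
import queue
import threading

class Channel:
    """Stand-in for the common communication API: a one-directional data stream."""
    def __init__(self):
        self._q = queue.Queue()
    def send(self, item):
        self._q.put(item)
    def receive(self):
        return self._q.get()

def app_a(out: Channel, steps: int):
    # Application A: produces data, e.g. one result per simulation step.
    for step in range(steps):
        out.send({"step": step, "value": step * step})
    out.send(None)  # end-of-stream marker

def app_b(inp: Channel, results: list):
    # Application B: consumes A's output and post-processes it.
    while (item := inp.receive()) is not None:
        results.append(item["value"] + 1)

def orchestrate(steps: int = 4):
    # The orchestrator launches both applications independently and
    # connects them through the shared channel.
    chan, results = Channel(), []
    a = threading.Thread(target=app_a, args=(chan, steps))
    b = threading.Thread(target=app_b, args=(chan, results))
    a.start(); b.start()
    a.join(); b.join()
    return results
```

In the real setting the two "applications" are separate processes or jobs on hybrid HPC resources and the channel crosses process or node boundaries, but the separation of concerns is the same: applications only speak the communication API, and the orchestrator owns startup, wiring, and shutdown.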

Our contribution

We are involved in the following activities:

  • Design and organization of working groups
  • Use case specification
  • Benchmarking, testing and development of the Proof of Concept

Our collaboration partners

We are working together with Prof. Wolfram Schenck from the Center for Applied Data Science Gütersloh of Fachhochschule Bielefeld — University of Applied Sciences and Jun.-Prof. Dr.-Ing. Benjamin Weyers from the University of Trier.