Accelerating Massive Data Processing in Python with Heat
Claudia Comito
(Course no. 3052025 in the training programme 2025 of Forschungszentrum Jülich)
You can participate in this course either online or on-site at JSC.
Contents:
This hands-on tutorial introduces the Heat library, which is designed to scale Python-based array computing and data science workflows to distributed and GPU-accelerated environments. Heat offers a familiar NumPy-like API while distributing memory-intensive operations across nodes and GPUs, building on PyTorch for process-local computation and mpi4py for communication.
Topics covered include:
- Heat Fundamentals: Get started with distributed arrays (DNDarrays), distributed I/O, data decomposition schemes, and array operations.
- Key Functionalities: Explore Heat's multi-node linear algebra, statistics, signal processing, and machine learning capabilities.
- DIY Development: Learn how to use Heat's infrastructure to build your own multi-node, multi-GPU research applications.
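To give a flavour of the first block, the sketch below shows a minimal distributed Heat workflow. It is illustrative only: it assumes Heat and an MPI installation are available and that the script is launched with mpirun; the array size and operations are arbitrary.

    # Run with, e.g.: mpirun -n 4 python heat_demo.py
    import heat as ht

    # Create a distributed array (DNDarray), split along axis 0
    # across all participating MPI processes.
    x = ht.arange(1_000_000, dtype=ht.float32, split=0)

    # Familiar NumPy-like operations act on the distributed data;
    # each process works on its local chunk via PyTorch.
    y = ht.sqrt(x) + 1.0

    # Reductions communicate between processes where needed.
    print(y.mean())

    # On a GPU node, arrays can additionally be placed on the accelerator,
    # e.g. by passing device="gpu" (requires a CUDA-enabled PyTorch build).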
Prerequisites:
Participants should bring a laptop and have experience with Python and its scientific ecosystem (e.g., NumPy, SciPy). A basic understanding of MPI is helpful but not required.
Target audience:
Researchers and Research Software Engineers (RSEs) working with large datasets that exceed the memory of a single machine. HPC practitioners who support these researchers, or who are interested in contributing to the project, are also welcome.
Language:
The course will be held in English.
Duration:
One half-day
Dates:
24.11.2025, 10:00 - 12:00 and 13:30 - 15:00
Venue:
Hybrid: Online or on-site at Jülich Supercomputing Centre, Building 16.3, Room 211 (Training Room 2)
Number of Participants:
Maximum 20
Instructors:
Claudia Comito (JSC), Fabian Hoppe (DLR)
Contact:
Registration:
Please register for the course here: https://indico3-jsc.fz-juelich.de/event/261/registrations/195/