Accelerating Massive Data Processing in Python with Heat
Claudia Comito
(Course no. 3052025 in the 2025 training programme of Forschungszentrum Jülich)
Participation in this course is possible both online and on-site at JSC. The course language is English.
Contents:
This hands-on tutorial introduces the Heat library, which is designed to scale Python-based array computing and data science workflows to distributed and GPU-accelerated environments. Heat offers a familiar NumPy-like API while distributing memory-intensive operations using PyTorch and mpi4py.
Topics covered include:
- Heat Fundamentals: Get started with distributed arrays (DNDarrays), distributed I/O, data decomposition schemes, and array operations.
- Key Functionalities: Explore Heat's multi-node capabilities for linear algebra, statistics, signal processing, and machine learning.
- DIY Development: Learn how to use Heat's infrastructure to build your own multi-node, multi-GPU capable research applications.
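To give a flavour of the "data decomposition" topic, here is a minimal sketch in plain Python. It assumes a simple block distribution of rows across processes and does not use or reproduce Heat's actual implementation. The function names `block_counts_displs` and `local_sums` are hypothetical and chosen for illustration only.

```python
def block_counts_displs(n_rows, n_procs):
    """Split n_rows into n_procs contiguous blocks, as evenly as possible.

    Returns the number of rows per process and each block's start offset.
    """
    base, extra = divmod(n_rows, n_procs)
    # The first `extra` processes receive one additional row each.
    counts = [base + 1 if rank < extra else base for rank in range(n_procs)]
    displs = [sum(counts[:rank]) for rank in range(n_procs)]
    return counts, displs


def local_sums(data, n_procs):
    """Compute one partial sum per (simulated) process on its own block."""
    counts, displs = block_counts_displs(len(data), n_procs)
    return [sum(data[d:d + c]) for c, d in zip(counts, displs)]


data = list(range(10))
partial = local_sums(data, 4)   # each entry stands in for one process's local result
total = sum(partial)            # stands in for a global reduction across processes
print(partial, total)
```

In a real distributed run, each process would hold only its own block in memory and the final sum would be an MPI reduction. The sketch only simulates this within a single process.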
Prerequisites:
Participants should have a laptop and experience with Python and its scientific ecosystem (e.g., NumPy, SciPy). A basic understanding of MPI is helpful but not required.
Target audience:
Researchers and Research Software Engineers (RSEs) working with large datasets that exceed the memory of a single machine. HPC practitioners who support these scientists or may be interested in contributing to the project are also welcome.
Language:
The course will be held in English.
Duration:
1 half day
Dates:
24 November 2025, 10:00 - 12:00 and 13:30 - 15:00
Venue:
Hybrid: Online or on-site at Jülich Supercomputing Centre, Building 16.3, Room 211 (Training Room 2)
Number of Participants:
Maximum 20
Instructors:
Claudia Comito (JSC), Fabian Hoppe (DLR)
Contact:
Registration:
Please register here for the course: https://indico3-jsc.fz-juelich.de/event/261/registrations/195/