Introduction to Scalable Deep Learning (training course, online)

Start
8th May 2023 07:00 AM
End
12th May 2023 11:00 AM
Location
Online
Contact

Dr. Stefan Kesselheim

(Course no. 1622023 in the training programme 2023 of Forschungszentrum Jülich)

This course will take place as an online event. The link to the streaming platform will be provided to the registrants only.

Contents:

In this course, we will cover machine learning and deep learning and show how to scale them to high-performance computing systems. The course aims to cover all levels, from fundamental software design to specific compute environments and toolkits. We want to enable participants to unlock the resources of machines like the JUWELS Booster for their machine learning workflows. Unlike in previous years, we assume that participants have the background of a university-level introductory course in machine learning. Suggested options for self-study are given below.

We will start the course with a presentation of high-performance computing system architectures and the design paradigms for HPC software. In the tutorial, we familiarize the participants with the environment. Furthermore, we give a recap of important machine learning concepts and algorithms, and the participants train and test a reference model. Afterwards, we introduce how deep learning algorithms can be parallelized for supercomputers with Horovod. We also discuss best practices and pitfalls in adapting deep learning algorithms to supercomputers, and learn how to test their functionality and performance. Finally, we apply the gained expertise to large-scale unsupervised learning, with a particular focus on Generative Adversarial Networks (GANs).
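To give a flavour of the data-parallel training approach addressed in the course, the following is a minimal sketch of Horovod with PyTorch. The model, dataset, and hyperparameters are toy placeholders chosen for illustration and are not course material; the exact setup used on the JUWELS Booster may differ.

# Minimal sketch of data-parallel training with Horovod and PyTorch.
# Model, dataset, and hyperparameters are placeholders for illustration only.
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per GPU, launched e.g. via srun or mpirun
torch.cuda.set_device(hvd.local_rank())      # pin each process to its local GPU

model = torch.nn.Linear(32, 10).cuda()       # placeholder model
dataset = torch.utils.data.TensorDataset(
    torch.randn(1024, 32), torch.randint(0, 10, (1024,)))

# Shard the data so every worker sees a different part of the dataset.
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=sampler)

# Scale the learning rate with the number of workers and wrap the optimizer
# so that gradients are averaged across workers via allreduce.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Make sure all workers start from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(2):
    sampler.set_epoch(epoch)                 # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    if hvd.rank() == 0:                      # report from one worker only
        print(f"epoch {epoch} done, last loss {loss.item():.3f}")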

Contents level:

Beginner's contents:           4.5 h  (30 %)
Intermediate contents:        10.5 h  (70 %)
Advanced contents:               0 h   (0 %)
Community-targeted contents:     0 h   (0 %)

Prerequisites:

We assume that the participants are familiar with general concepts of machine learning and/or deep learning, such as widely used models, loss functions, regularization, and basic model training and testing. Many excellent self-training resources are available, such as:

Hands-on experience with an ML/DL framework is required; prior experience with HPC systems is helpful.

Target audience:

Scientists who want to unlock supercomputer power for ML/DL workflows.

Learning outcome:

After this course, participants will be able to parallelize TensorFlow and PyTorch ML workflows on HPC machines, taking the HPC system architecture into account and circumventing typical pitfalls and bottlenecks.

Language:

This course is given in English.

Duration:

5 half days

Date:

8-12 May 2023, 9:00 - 13:00

Venue:

Online

Number of Participants:

maximum 40

Instructors:

Dr. Stefan Kesselheim, Dr. Jenia Jitsev, Dr. Mehdi Cherti, Dr. Alexandre Strube, Jan Ebert, JSC

Contact:

Dr. Stefan Kesselheim

Head of SDL Applied Machine Learning & AI Consultant team

  • Institute for Advanced Simulation (IAS)
  • Jülich Supercomputing Centre (JSC)
Building 14.14 / Room 3023
+49 2461/61-85927

Registration:

Please register via the registration form.

Last Modified: 03.04.2023