PRACE training course "Parallel and Scalable Machine Learning"

Start
25.02.2019, 08:00
End
27.02.2019, 15:30
Venue
Jülich Supercomputing Centre, Rotunde, building 16.4, room 301

(Course no. 1222019 in the 2019 training programme of Forschungszentrum Jülich)

Target audience:

Staff who want to analyze data with machine learning

Contents

 

Prerequisites:

Job submission to large HPC machines using batch scripts; knowledge of the mathematical basics of linear algebra is helpful.


Please bring your own notebook (with an ssh client).

Language:

The course is held in English.

Duration

3 days

Time:

25-27 February 2019, 9:00-16:30

Venue:

Jülich Supercomputing Centre, Rotunde, building 16.4, room 301

Number of participants:

maximum 40

Instructors:

Prof. Morris Riedel, Dr. Gabriele Cavallaro, JSC

Contact:

Dr. Gabriele Cavallaro


Phone: +49 2461 61-3858


E-mail: g.cavallaro@fz-juelich.de

Registration:

Please register by 22 January 2019 using the registration form at PRACE.


The course covers the basics of analyzing data with machine learning and data mining algorithms in order to convey the foundations of learning from large quantities of data. It is aimed especially at beginners who have no previous knowledge of machine learning techniques. The course presents general methods for data analysis: clustering, classification, and regression. This includes a thorough discussion of the training, validation, and test datasets required to learn from data with high accuracy. Simple application examples reinforce the theoretical course elements and illustrate problems such as overfitting, followed by mechanisms such as validation and regularization that prevent them.
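The roles of the training, validation, and test datasets mentioned above can be sketched in a few lines. The following is a minimal illustration only, not taken from the course material: the toy data, the helper names (`make_data`, `knn_predict`, `accuracy`), the split sizes, and the choice of a k-nearest-neighbour classifier are all assumptions made for this example. A classifier with k = 1 memorizes the training set, so its training accuracy says little about how well it generalizes; the validation set is used to pick k, and the test set is touched only once at the end.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: label is 1 when x + y > 1, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x + y > 1.0)
        if rng.random() < noise:
            label = 1 - label  # label noise makes memorization harmful
        data.append(((x, y), label))
    return data

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point` (k is odd, so no ties)."""
    def squared_distance(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(train, key=squared_distance)[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def accuracy(train, samples, k):
    """Fraction of `samples` whose label is predicted correctly from `train`."""
    hits = sum(knn_predict(train, point, k) == label for point, label in samples)
    return hits / len(samples)

rng = random.Random(0)
data = make_data(300, rng)
train, validation, test = data[:180], data[180:240], data[240:]

# k = 1 memorizes the training set: every point is its own nearest neighbour.
train_acc_k1 = accuracy(train, train, 1)

# Model selection uses the validation set only.
val_acc = {k: accuracy(train, validation, k) for k in (1, 5, 15)}
best_k = max(val_acc, key=val_acc.get)

# The test set is evaluated once, with the selected k.
test_acc = accuracy(train, test, best_k)

print(f"training accuracy with k=1: {train_acc_k1:.2f}")
print(f"validation accuracy per k:  {val_acc}")
print(f"selected k = {best_k}, test accuracy: {test_acc:.2f}")
```

On noisy data like this, the validation accuracy for k = 1 is typically well below its perfect training accuracy, which is the overfitting symptom the validation set is meant to expose.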

The tutorial starts from a very simple application example in order to teach foundations such as the role of features in data, linear separability, and decision boundaries for machine learning models. In particular, the course points to key challenges in analyzing large quantities of data (aka ‘big data’) in order to motivate the parallel and scalable machine learning algorithms used in the course. It targets specific challenges in analyzing datasets that are too large to be analyzed with traditional serial methods provided by tools such as R, SAS, or MATLAB. These challenges arise in the machine learning algorithms themselves, in the distribution of data, and in the process of performing validation. The course introduces selected solutions to overcome these challenges using parallel and scalable computing techniques based on the Message Passing Interface (MPI) and OpenMP that run on massively parallel High Performance Computing (HPC) platforms. The course ends with a more recent machine learning method known as deep learning, which has emerged as a promising disruptive approach, allowing knowledge discovery from large datasets with unprecedented effectiveness and efficiency.

This course is a PRACE training course and is also held in connection with the DEEP-EST project.

Last modified: 11.04.2022