Institute for Advanced Simulation (IAS)

Jülich Supercomputing Centre (JSC)

Training course "From zero to hero, Part II: Understanding and fixing intra-node performance bottlenecks"

Start

5 November 2019, 08:00

End

6 November 2019, 15:30

Venue

Jülich Supercomputing Centre, training room 1, building 16.3, room 213a

(Course no. 1092019 in the 2019 training programme of Forschungszentrum Jülich)

Target audience:	Scientists and developers who want to understand performance-critical hardware features of modern CPUs, such as SIMD, ILP, caches or out-of-order execution, and use these features in their applications in a performance-portable way. (Advanced course)
Contents:
Prerequisites:	Participation in the Part I course or deep knowledge of the topics it covers; Linux (ssh) and command-line tools (grep, less); knowledge of Fortran, C or C++ and a threading framework (std::thread, pthreads, ...); experience with own code exhibiting performance or scaling bottlenecks. Optional: Git (the examples are provided in a Git repository); an editor such as vim or emacs for working on remote machines.
Language:	The course is held in English.
Duration:	2 days
Date:	5 - 6 November 2019, 9:00 - 16:30
Venue:	Jülich Supercomputing Centre, training room 1, building 16.3, room 213a
Number of participants:	minimum 5, maximum 15
Instructors:	Andreas Beckmann, Dr. Ivo Kabadshow, JSC
Contact:	Andreas Beckmann, phone: +49 2461 61-8713, e-mail: a.beckmann@fz-juelich.de
Registration:

Generic algorithms like FFTs or basic linear algebra can be accelerated by using 3rd-party libraries and tools especially tuned and optimized for a multitude of different hardware configurations. But what happens if your problem does not fall into this category and 3rd-party libraries are not available?

In Part I of this course we provided insights into today's CPU microarchitecture. As example applications we used a plain vector reduction and a simple Coulomb solver. We started from basic implementations and advanced to optimized versions that use hardware features such as vectorization, unrolling and cache tiling to increase on-core performance. Part II sheds some light on achieving portable intra-node performance.

Continuing with the example applications from Part I, we use threading with C++11 std::thread to exploit multi-core parallelism and SMT (Simultaneous Multi-Threading). In this context, we discuss the fork-join model, tasking approaches and typical synchronization mechanisms.
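
As an illustration of the fork-join model applied to the vector reduction, here is a minimal sketch using C++11 std::thread. It is not taken from the course material; the function name parallel_sum and the static chunking scheme are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the course): sums `data` with `num_threads` workers.
double parallel_sum(const std::vector<double>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    // One result slot per thread; each slot is written exactly once, so no lock is needed.
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    // Static partitioning: contiguous chunks of (almost) equal size.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    // Fork: every thread reduces its own chunk into a local value.
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[t] = std::accumulate(first, last, 0.0);
        });
    }

    // Join: wait for all workers before touching their results.
    for (auto& w : workers) {
        w.join();
    }

    // Final serial reduction over the per-thread partial sums.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

The join acts as the only synchronization point here; because each thread accumulates locally and stores its result once, the threads do not contend for shared data during the reduction.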

To understand the parallel performance of memory-bound algorithms, we take a closer look at the memory hierarchy and the parallel memory bandwidth. We consider data locality in the context of shared caches and NUMA (Non-Uniform Memory Access).

In this course we present several abstraction concepts to hide the hardware-specific optimizations. This improves readability and maintainability. We also discuss the overhead costs of the introduced abstractions and show compile-time SIMD configurations as well as corresponding performance results on different platforms.
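
One way such an abstraction with a compile-time SIMD configuration might look is sketched below. This is an assumption for illustration, not the abstraction used in the course: the type Pack, the function pack_sum and the template parameter Width are hypothetical names, and the sketch relies on the compiler auto-vectorizing the fixed-width inner loops rather than on explicit intrinsics.

```cpp
#include <cstddef>

// Hypothetical fixed-width pack; Width is chosen at compile time per platform.
template <typename T, std::size_t Width>
struct Pack {
    T lane[Width];
};

// Lane-wise addition; the trip count is a compile-time constant.
template <typename T, std::size_t Width>
Pack<T, Width> operator+(Pack<T, Width> a, const Pack<T, Width>& b) {
    for (std::size_t k = 0; k < Width; ++k) {
        a.lane[k] += b.lane[k];
    }
    return a;
}

// Reduction written once against the pack type, independent of the actual width.
template <std::size_t Width>
double pack_sum(const double* data, std::size_t n) {
    Pack<double, Width> acc{};  // all lanes start at zero
    std::size_t i = 0;
    // Main loop: process Width elements per iteration.
    for (; i + Width <= n; i += Width) {
        Pack<double, Width> v;
        for (std::size_t k = 0; k < Width; ++k) {
            v.lane[k] = data[i + k];
        }
        acc = acc + v;
    }
    // Horizontal sum over the lanes.
    double result = 0.0;
    for (std::size_t k = 0; k < Width; ++k) {
        result += acc.lane[k];
    }
    // Scalar remainder for the last n % Width elements.
    for (; i < n; ++i) {
        result += data[i];
    }
    return result;
}
```

The algorithm is written once, and a call such as pack_sum<4>(data, n) or pack_sum<8>(data, n) selects the width for a given platform. Whether a particular width actually maps to SIMD instructions depends on the compiler and its flags, so the overhead of the abstraction has to be measured on each target.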

Covered topics:

Memory Hierarchy: From register to RAM
Data structures: When to use SoA, AoS and AoSoA
Vectorization: SIMD on JURECA, JURECA Booster and JUWELS
Unrolling: Loop-unrolling for out-of-order execution and instruction-level parallelism
Separation of concerns: Decoupling hardware details from suitable algorithms
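
To make the data-structure topic concrete, the following sketch contrasts an array of structures (AoS) with a structure of arrays (SoA) for a set of charged particles. The particle layout and the function names are assumptions chosen for illustration and are not taken from the course examples.

```cpp
#include <cstddef>
#include <vector>

// AoS: all fields of one particle are adjacent in memory.
struct ParticleAoS {
    double x, y, z, q;
};

// SoA: each field is stored in its own contiguous array.
struct ParticlesSoA {
    std::vector<double> x, y, z, q;
};

// Sums the charges in the AoS layout.
double total_charge_aos(const std::vector<ParticleAoS>& particles) {
    double sum = 0.0;
    for (const auto& p : particles) {
        sum += p.q;  // strided access: only 8 of every 32 bytes fetched are used
    }
    return sum;
}

// Sums the charges in the SoA layout.
double total_charge_soa(const ParticlesSoA& particles) {
    double sum = 0.0;
    for (double q : particles.q) {
        sum += q;  // unit-stride access over one contiguous array
    }
    return sum;
}
```

When a loop touches only one field, the SoA layout typically uses cache lines more fully and is easier to vectorize; when all fields of a particle are needed together, AoS keeps them in the same cache line. AoSoA combines both by storing small fixed-size blocks of each field.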

This course is for you if you have asked yourself one of the following questions:

Why is my parallel performance so bad?
Why should I not be afraid of threads?
When should I use SMT (hyperthreading)?
What is NUMA and why does it hurt me?
Is my data structure optimal for this architecture?
Do I need to redo everything for the next machine?
Why is it so complicated? I thought science was the hard part.

The course consists of lectures and hands-on sessions. After each topic is presented, the participants can apply the knowledge right away in the hands-on training. The C++ code examples are generic and advance step by step.

Please send your registration to Andreas Beckmann by 25 October 2019.

If you are not an employee of Forschungszentrum Jülich, please provide the following details when registering:
First name, last name, date of birth, nationality, full home address, e-mail address


Last modified: 7 August 2025