Institute for Advanced Simulation (IAS)

PPP Croatia

Optimization of Materials Science algorithms on hybrid HPC platforms (HoMSa)

Taking linear algebra routines, such as eigenvalue solvers, to high-performance computing levels is not a simple task. The diversity of architectures is often regarded as an obstacle to weighing performance against programmability. The traditional approach is to introduce highly specialized optimizations at the lowest level (i.e. single lines of code), either through intensive manual labour or by autotuning the computational libraries. However, hardware-specific, optimized implementations often come at the price of portability. Experience has shown that effort invested in a promising programming approach or high-speed architecture may lead to a dead end. A second serious limitation of traditional routines is their black-box structure, which prevents the acquisition and exploitation of extra knowledge pertaining to specific tasks and their solutions. In some cases, this means that code developers are forced to optimize, tailor, or even rediscover algorithms for specific combinations of target architecture and application.

This project addresses both limitations by proposing a change of paradigm. The aim is to combine the generation of variants, their extended parallelism, and their performance signatures in a conceptually novel algorithmic implementation which, by harnessing the numerical properties of the given eigenproblem, adapts to the available computing platform. The main scientific goal of this project is to design, extend, and implement two complementary algorithms for the solution of eigenvalue problems emerging from Materials Science computations: the Chebyshev Accelerated Subspace iteration Eigensolver (ChASE), specifically tailored to sequences of eigenvalue problems, and the implicit Hari-Zimmermann method for computing Hamiltonian and Overlap matrices without solving the eigenproblem or even fully forming its defining matrices explicitly.
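To make the idea behind a Chebyshev-accelerated subspace iteration concrete, the following is a minimal NumPy sketch, not the ChASE implementation itself. It assumes a dense real symmetric matrix, a fixed filter degree, a fixed number of iterations instead of a convergence test, and a crude upper spectral bound taken from the infinity norm. All function names and parameters (`chebyshev_filter`, `rayleigh_ritz`, `filtered_subspace_iteration`, `extra`, `degree`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def chebyshev_filter(A, V, degree, a, b):
    """Apply the Chebyshev polynomial T_degree((A - c*I) / e) to the block V.

    Eigencomponents with eigenvalues inside [a, b] stay bounded by 1 in
    magnitude, while those below a are amplified.
    """
    c = 0.5 * (a + b)  # centre of the damped interval
    e = 0.5 * (b - a)  # half-width of the damped interval
    Y_prev = V
    Y = (A @ V - c * V) / e
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)
        Y_next = 2.0 * (A @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y


def rayleigh_ritz(A, Q):
    """Project A onto span(Q); return Ritz values (ascending) and Ritz vectors."""
    H = Q.T @ A @ Q
    theta, S = np.linalg.eigh(H)
    return theta, Q @ S


def filtered_subspace_iteration(A, nev, extra=6, degree=10, iterations=60, seed=0):
    """Approximate the nev lowest eigenpairs of a real symmetric matrix A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    # Random orthonormal start block, slightly larger than nev.
    Q, _ = np.linalg.qr(rng.standard_normal((n, nev + extra)))
    theta, V = rayleigh_ritz(A, Q)
    # Assumption: the infinity norm is used as a cheap upper spectral bound.
    b = np.linalg.norm(A, np.inf)
    for _ in range(iterations):
        a = theta[-1]  # largest Ritz value: lower edge of the damped interval
        Y = chebyshev_filter(A, V, degree, a, b)
        Q, _ = np.linalg.qr(Y)  # re-orthonormalize the filtered block
        theta, V = rayleigh_ritz(A, Q)
    return theta[:nev], V[:, :nev]
```

In a sequence of related eigenproblems, the eigenvectors of one problem could serve as the starting block for the next instead of random vectors; this sketch does not implement that reuse.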

As part of this goal, the implementation will yield a number of performance-portable parallel routines specifically tailored to emerging architectures, such as accelerator-based platforms consisting of multi-core CPUs equipped with multiple NVIDIA GPUs, and modular clusters comprising computing boosters based on Intel Xeon Phi coprocessors.
To this end, instead of relying on BLAS, we will generalize the current state-of-the-art techniques for writing highly parallel routines for eigenvalue decomposition: analytical models to determine optimal tiling (decomposition), selective packing (permutation) of data to maximize reuse and amortization, and highly tuned microkernels that attain performance close to the architecture's peak. Finally, to attain high scalability on hybrid architectures, we will address the problems of task decomposition, fine-grained dynamic scheduling of tasks, and overlapping of communication with computation to minimize overhead. To fully utilize the hybrid system, we will use dynamic scheduling to offload the compute-intensive parts of the code, or tasks, to the most appropriate computational unit (CPU, GPU, or Intel Xeon Phi Knights Landing (KNL)).
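The interplay of tiling, packing, and a microkernel can be illustrated with a small matrix-multiplication sketch. This is only a structural illustration in NumPy, not a tuned routine and not the project's code: the tile sizes `mc`, `kc`, `nc` are arbitrary placeholders (a real implementation would derive them from an analytical model of the cache hierarchy), and the names `micro_kernel` and `tiled_matmul` are hypothetical.

```python
import numpy as np


def micro_kernel(a_packed, b_packed, c_tile):
    """Update c_tile in place with a_packed @ b_packed via rank-1 updates.

    Stand-in for a hand-tuned kernel; it only mirrors the access pattern.
    """
    for p in range(a_packed.shape[1]):
        c_tile += np.outer(a_packed[:, p], b_packed[p, :])


def tiled_matmul(A, B, mc=64, kc=64, nc=64):
    """Compute A @ B tile by tile, packing each tile into contiguous memory."""
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions of A and B do not match")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for jc in range(0, n, nc):          # column panels of B and C
        for pc in range(0, k, kc):      # slices along the inner dimension
            # Pack the B tile once; it is reused for every row tile of A.
            b_packed = np.ascontiguousarray(B[pc:pc + kc, jc:jc + nc])
            for ic in range(0, m, mc):  # row tiles of A and C
                a_packed = np.ascontiguousarray(A[ic:ic + mc, pc:pc + kc])
                # The slice of C is a view, so the kernel updates C in place.
                micro_kernel(a_packed, b_packed, C[ic:ic + mc, jc:jc + nc])
    return C
```

The packed copy of each `B` tile is made once and then reused across all row tiles of `A`, which is how the packing cost is amortized in this loop order.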


Forschungszentrum Jülich GmbH, Germany
JSC's contact person is Dr. Edoardo Di Napoli

Rudjer Boskovic Institute, Zagreb, Croatia
Contact person is Dr. Davor Davidović

University of Zagreb, Zagreb, Croatia
Contact person is Prof. Sanja Singer

The partner programme is supported by the German Academic Exchange Service (DAAD) under the grant ID 57449075.

The grant period runs from January 2019 to December 2020.