# Guest Student Programme 2018

## Abstracts

### String axion simulations

**Aleksandr Boitsov, Faculty of Physics, Saint Petersburg State University, Russia**

**Adviser: Kalman Szabo**

This report addresses the simulation of axion strings. Although the axion plays an important role in QCD, it remains a puzzling particle. Moreover, topological defects (strings) appear, which make the investigation of axion evolution very challenging. In this report a method for modelling strings is presented. Simulations for different numbers of strings and under various conditions were then performed, and their results are discussed. The report closes with a brief outlook on the axion string modelling problem.

### Adaptive Dynamic Load Balancing with Voronoi Cells - Increasing efficiency for parallel particle simulations

**David Immel, Department of Physics, University of Duisburg-Essen, Germany**

**Adviser: Godehard Sutmann**

We discuss an adaptive dynamic load balancing algorithm for classical MD simulations, which uses a spatial domain decomposition with Voronoi cells. The algorithm minimizes a cost function, with the Voronoi points following its gradient towards a better balanced system. To minimize overhead from global communication, only locally available information is used during a load balancing step. We combine this load balancing algorithm with a linked-cell algorithm to obtain fast O(N) access to short-range interaction partners. The algorithm is tested on a static inhomogeneous system and a shear flow. These simulations show a strong improvement in efficiency through load balancing, independent of the number of processors.
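The linked-cell idea mentioned above can be sketched in a few lines. The following is a minimal 2D Python toy (not the project's code, which is 3D and parallel): particles are binned into cells no smaller than the cutoff, so interaction partners are found by scanning only the 3×3 neighbourhood of each cell, giving O(N) access for roughly homogeneous systems. All names and the minimum-image handling here are our own illustration.

```python
import math

def build_linked_cells(positions, box, cutoff):
    """Bin particles into square cells of edge >= cutoff (toy 2D version)."""
    n_cells = max(1, int(box // cutoff))
    cell_size = box / n_cells
    cells = {}
    for i, (x, y) in enumerate(positions):
        key = (int(x / cell_size) % n_cells, int(y / cell_size) % n_cells)
        cells.setdefault(key, []).append(i)
    return cells, n_cells

def neighbour_pairs(positions, box, cutoff):
    """Find all pairs closer than `cutoff` by scanning only adjacent cells."""
    cells, n = build_linked_cells(positions, box, cutoff)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get(((cx + dx) % n, (cy + dy) % n), []):
                    for i in members:
                        if i < j:
                            xi, yi = positions[i]
                            xj, yj = positions[j]
                            # minimum-image distance in the periodic box
                            ddx = (xi - xj + box / 2) % box - box / 2
                            ddy = (yi - yj + box / 2) % box - box / 2
                            if math.hypot(ddx, ddy) < cutoff:
                                pairs.add((i, j))
    return pairs
```

Because each cell is at least one cutoff wide, no pair within range can be missed by the 3×3 scan, and the cost stays linear in the particle number as long as the density is bounded.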

### Extrapolated Stabilized Explicit Runge-Kutta Methods

**Edilbert Christhuraj, Faculty of Mechanical Engineering, RWTH Aachen University, Germany**

**Adviser: Lukas Pieronek**

In this project, a C++ implementation of Extrapolated Stabilized Explicit Runge-Kutta methods, in particular ESERK4 and ESERK5, originally introduced by Martin-Vaquero and Kleefeld, is developed. In addition, a C++ template that serves as a unified framework for various ESERK schemes is created. Furthermore, manual optimizations are performed to improve performance. Lastly, an attempt is made to parallelize the C++ implementation using OpenMP.
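The extrapolation idea behind such schemes can be illustrated generically. The Python sketch below is not the ESERK4/ESERK5 tableaux from the report; it applies Richardson extrapolation to the classical explicit RK4 step (our stand-in for a stabilized base scheme) to show how combining a coarse step with two half steps cancels the leading error term.

```python
def rk4_step(f, t, y, h):
    """Classical explicit RK4 step (stand-in for a stabilized base scheme)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def extrapolated_step(f, t, y, h):
    """Richardson extrapolation: combine one step of size h with two of h/2.
    For a base method of order p this cancels the leading O(h^{p+1}) error."""
    p = 4
    coarse = rk4_step(f, t, y, h)
    half = rk4_step(f, t, y, h / 2)
    fine = rk4_step(f, t + h / 2, half, h / 2)
    return fine + (fine - coarse) / (2 ** p - 1)
```

For y' = -y the extrapolated step is noticeably closer to exp(-h) than the plain step, at roughly three times the cost.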

### Efficient Tree Solver for Hines Matrices on the GPU using fine grained parallelization and basic work balancing

**Felix Huber, Institute of Applied Analysis and Numerical Simulation, University of Stuttgart, Germany**

**Adviser: Alexander Peyser**

The human brain consists of a large number of interconnected neurons communicating via the exchange of electrical spikes. Simulations play an important role in better understanding electrical activity in the brain and offer a way to compare measured data to simulated data, so that experimental data can be interpreted better. A key component in such simulations is an efficient solver for the Hines matrices used in computing inter-neuron signal propagation. In this report we explain a new parallel GPU solver for these matrices which offers fine-grained parallelization and allows for work balancing during the simulation setup.
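A Hines matrix is an almost-tridiagonal matrix whose sparsity pattern follows the branching tree of a neuron's morphology, so it can be solved exactly in O(N) by a tree-aware variant of the Thomas algorithm. The following serial Python sketch (our own illustration, not the GPU solver of the report) assumes nodes are numbered so that every parent has a smaller index than its children:

```python
def hines_solve(parent, d, a, b, rhs):
    """Solve a tree-structured (Hines) linear system M x = rhs.
    Nodes ordered so parent[i] < i; a[i] = M[parent[i]][i],
    b[i] = M[i][parent[i]], d[i] = M[i][i].  d and rhs are modified."""
    n = len(d)
    # backward sweep: eliminate each child into its parent (leaves first)
    for i in range(n - 1, 0, -1):
        p = parent[i]
        f = a[i] / d[i]
        d[p] -= f * b[i]
        rhs[p] -= f * rhs[i]
    # forward sweep: substitute from the root down
    x = [0.0] * n
    x[0] = rhs[0] / d[0]
    for i in range(1, n):
        x[i] = (rhs[i] - b[i] * x[parent[i]]) / d[i]
    return x
```

The GPU solver described in the report parallelizes exactly this dependency structure: independent branches of the tree can be eliminated concurrently, which is where the fine-grained parallelism and work balancing come in.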

### Lattice Gauge Field Generation on KNL Architecture

**Justin Loye, Numerical Physics, Université de Franche-Comté, France**

**Adviser: Eric Gregory**

Quantum Chromodynamics (QCD) is part of the Standard Model, the physical theory that describes our current understanding of the universe. In our work, we present a discretized version of this theory, known as Lattice QCD, in order to run simulations of the strong interaction between quarks and gluons. Such a system is best simulated using the Hybrid Monte Carlo method. We use the Chroma QCD library to achieve high-performance calculations, with the goal of generating a physical system ready to be measured.
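At the heart of Hybrid Monte Carlo is a leapfrog integrator for fictitious molecular dynamics, whose time-reversibility and volume preservation make the Metropolis accept/reject step exact. The one-variable Python sketch below is a toy illustration of that integrator only (Chroma's real implementation evolves gauge fields with far more elaborate integrators); the function names are our own.

```python
def leapfrog(q, p, grad_U, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H = U(q) + p^2/2:
    the reversible, volume-preserving update used inside Hybrid Monte Carlo."""
    p = p - 0.5 * eps * grad_U(q)      # initial half kick for the momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full drift for the position
        p = p - eps * grad_U(q)        # full kick for the momentum
    q = q + eps * p                    # final drift
    p = p - 0.5 * eps * grad_U(q)      # final half kick
    return q, p
```

Running the trajectory forward, flipping the momentum, and running it again returns the starting point up to rounding error, which is the reversibility property HMC relies on.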

### Visualization of Hierarchical Performance Data Using Treemaps

**Johannes Wasmer, Faculty of Mechanical Engineering, RWTH Aachen, Germany**

**Adviser: Marc-André Hermanns**

Numerical simulation of physical phenomena and data analysis are used to find answers to ever more difficult questions in science and engineering. The machines and codes required in that endeavor continue to increase in complexity. Competition for computation time on large machines is high, so the codes must run fast. Performance optimization is the discipline that tries to help achieve that goal. Today's supercomputers consist of millions of processing units, and users must apply complex parallelization schemes. Consequently, the amount of analysis data produced becomes too large for manual processing. Analysis tools help with automated processing to reduce that amount. Visualization tools help to navigate the processed data and identify where code performance issues might originate. The requirement to get a quick overview of the behavior of a particular performance metric over the whole system has not been sufficiently addressed so far. An expedient solution for this is the treemapping method. This visualization technique has been implemented for the Cube performance report explorer, an established performance analysis and visualization tool. The feasibility of the approach is showcased and evaluated on a selection of analysis reports from codes run on IBM BlueGene machines at the Jülich Supercomputing Centre.
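The core of any treemapping method is turning a list of weights into nested rectangles whose areas are proportional to those weights. The Python sketch below implements slice-and-dice, the simplest such layout, purely as an illustration; the layout actually implemented in Cube may well differ, and the function name and rectangle convention (x, y, width, height) are ours.

```python
def treemap(weights, x, y, w, h, horizontal=True):
    """Slice-and-dice treemap layout: split the rectangle (x, y, w, h)
    into strips whose areas are proportional to the weights.  Recursing
    on children with the direction flipped yields the nested layout."""
    total = sum(weights)
    rects, offset = [], 0.0
    for wt in weights:
        frac = wt / total
        if horizontal:
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects
```

Mapping a performance metric (say, time in MPI) to the weights makes hot spots visible at a glance: the largest rectangle is the largest contributor, over the whole system at once.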

### Optimization of LQCD Kernels on x64 Architectures Utilizing Advanced Vector Extensions (AVX)

**Luis Altenkort, Department of Physics, Bielefeld University, Germany**

**Adviser: Stefan Krieg**

Intel's Advanced Vector eXtensions can be used to perform SIMD operations on 256-bit registers with C-style functions that directly correspond to assembly. Compilers often fail to vectorize complex problems automatically, thus not taking full advantage of the capabilities of the CPU. Making use of AVX intrinsics, one can speed up calculations of lattice quantum chromodynamics (LQCD) involving linear algebra significantly without having to use inline assembly. In this report it is shown how to do this for a simplified version of the main kernel of most LQCD calculations.

### Parallel-In-Time Integration with XBraid of an Allen-Cahn System - Coupling DUNE and XBraid

**Lucas Kersken, Mathematics and Computer Science, University of Wuppertal, Germany**

**Adviser: Ruth Schöbel**

The Allen-Cahn equation describes moving-interface problems in materials science or fluid dynamics through a phase-field approach. This kind of problem typically involves smooth and non-smooth parts which cannot be dealt with using classical integration schemes. The Finite Element Method is used to discretize in space, and the Truncated Non-Smooth Newton Multigrid method is applied as the solver. The Multigrid-Reduction-in-Time method is then used to parallelize in time.
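For orientation, the equation itself is u_t = ε²Δu − (u³ − u), whose stable states u = ±1 represent the two phases. The Python sketch below takes one explicit Euler finite-difference step in 1D with periodic boundaries; this is only a toy discretization for illustration, not the FEM/TNNMG treatment or the XBraid time parallelization used in the project.

```python
def allen_cahn_step(u, dt, dx, eps):
    """One explicit Euler step of the 1D Allen-Cahn equation
    u_t = eps^2 * u_xx - (u^3 - u), with periodic boundaries."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        # centred second difference; u[i-1] wraps via negative indexing
        lap = (u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx ** 2
        new[i] = u[i] + dt * (eps ** 2 * lap - (u[i] ** 3 - u[i]))
    return new
```

The cubic term − (u³ − u) drives values towards ±1 and is the non-smooth-friendly part that motivates the specialized solver in the project.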

### Task-Based Distributed Memory Parallelization of the FMM - Coupling of a Tasking Engine to MPI

**Michael Innerberger, Institute for Analysis and Scientific Computing, TU Wien, Austria**

**Adviser: David Haensel**

The Fast Multipole Method (FMM) is a fast summation technique for long-range interactions of particles. Based on a C++-based tasking engine for intra-node parallelization of the FMM, we present a communication scheme that extends task-based parallelism to a distributed memory domain via MPI. Furthermore, we give implementation details and experimental evidence of strong scaling.
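The summation idea the FMM builds on can be shown at its lowest order: a distant cluster of charges is well approximated by its total charge placed at the charge-weighted centre. The 1D Python sketch below compares that monopole approximation against the exact pairwise sum; the real FMM uses hierarchical trees and higher-order expansions, so this is only the core idea, with names of our own choosing.

```python
def direct_potential(target, sources):
    """Exact pairwise 1/r potential at `target` from (position, charge) pairs."""
    return sum(q / abs(target - x) for x, q in sources)

def monopole_potential(target, sources):
    """Lowest-order multipole (monopole) approximation: replace a distant
    cluster by its total charge at the charge-weighted centre."""
    total = sum(q for _, q in sources)
    centre = sum(q * x for x, q in sources) / total
    return total / abs(target - centre)
```

Evaluating one expansion per cluster instead of one term per particle is what reduces the cost from O(N²) towards O(N); the error shrinks rapidly as the target moves away from the cluster.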

### Lagrangian Transport Simulations for Climate Modeling - Coupling of ICON & MPTRAC

**Manuel Krage, Institute for Geophysics, University of Münster, Germany**

**Adviser: Lars Hoffmann**

To date there are a number of Lagrangian particle dispersion models available to analyze and calculate particle motions in the atmosphere. The one used in this project is MPTRAC, developed at the Jülich Supercomputing Centre. In contrast to former studies, where meteorological reanalyses were used to drive the simulation, the approach in this work is a direct coupling to the output of an atmospheric general circulation model. Here, the ICON model of the German Weather Service (DWD) has been coupled to MPTRAC using routines of the OASIS software. In general, the implementation of the coupling process was more complex than expected; in particular, linking the C code of MPTRAC with the Fortran routines of OASIS and all connected libraries of both tools took a large part of the project time. This is why the project's key focus is on the following two points.

First, a simple model comprising a Fortran code to send and a C code to receive different data sets was created to measure the time spent on the data transfer and on the transformation from a triangular icosahedral to a Cartesian grid. The resolution of the data sets was made comparable to different reanalyses to give an impression of whether the computational cost of the coupling is reasonable. This test showed that the amount of time needed for coupling, even for large data sets, is not as high as expected, so that coupling intervals even shorter than 3 hours, the shortest interval of the compared reanalyses, can be reasonable to improve the simulation results.

The second step was to implement the OASIS routines in ICON and MPTRAC so that a first test of the coupling system could be carried out using ICON in an idealized global "Aquaplanet" experiment with a surface consisting of water only. Further testing and optimization are still necessary, but this project is a first step towards using MPTRAC with a direct coupling to a weather and climate simulation model, making it independent of reanalyses and able to use arbitrary temporal and spatial resolutions of the meteorological data.

### Scalable Electrostatics - Using the Multigrid Method

**Morian Sonnet, Department of Physics, RWTH Aachen University, Germany**

**Adviser: Paul Baumeister**

Scalable electrostatics is an important topic due to its vast range of use cases, e.g. in materials science or quantum mechanical calculations, where the number of objects to consider can become very large. This Guest Student Programme project concentrates on a Poisson solver for many point multipoles, distributed almost homogeneously in a periodically repeated cubic unit cell. Linear scaling is achieved by solving the equation on a uniform Cartesian grid employing the multigrid method.
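The linear scaling of multigrid comes from combining cheap smoothing on the fine grid with a coarse-grid correction for the smooth error. The Python sketch below shows a 1D two-grid V-cycle for −u″ = f with zero Dirichlet boundaries, as a much-simplified stand-in for the project's 3D periodic solver; the grid transfer choices (injection restriction, linear interpolation) and all names are our own.

```python
def jacobi(u, f, h, sweeps, omega=2/3):
    """Weighted Jacobi smoothing for -u'' = f with zero Dirichlet ends."""
    n = len(u)
    for _ in range(sweeps):
        u = [0.0 if i in (0, n - 1) else
             (1 - omega) * u[i] + omega * 0.5 * (u[i-1] + u[i+1] + h*h*f[i])
             for i in range(n)]
    return u

def two_grid(u, f, h):
    """One two-grid V-cycle: pre-smooth, solve the restricted residual
    equation on the coarse grid (here simply by many sweeps), correct,
    post-smooth.  Assumes len(u) is odd so every second point coarsens."""
    n = len(u)
    u = jacobi(u, f, h, 3)
    # residual r = f + u'' of the current approximation
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] + (u[i-1] - 2*u[i] + u[i+1]) / (h*h)
    # restrict by injection, "solve" the coarse problem by smoothing
    rc = [r[i] for i in range(0, n, 2)]
    ec = jacobi([0.0] * len(rc), rc, 2*h, 60)
    # prolongate the correction by linear interpolation and apply it
    for i in range(n):
        e = ec[i//2] if i % 2 == 0 else 0.5 * (ec[i//2] + ec[i//2 + 1])
        u[i] += e
    return jacobi(u, f, h, 3)
```

Recursing instead of brute-force smoothing on the coarse level gives the full multigrid hierarchy; each level costs a fixed amount of work per unknown, which is what yields O(N) overall.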

### Analyzing I/O behaviour on HPC systems - JUQUEEN Analysis

**Petros Anastasiadis, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece**

**Adviser: Salem El Sayed**

Modern HPC systems are advancing in computational capability at a tremendous rate, but their supporting I/O systems are somewhat left behind.

In order to design more efficient I/O systems and ensure that applications in an HPC environment utilize them well, analyzing the I/O behavior of applications on such systems has become a necessity. In this work, we focus on extending the previous I/O analysis done for JUGENE to JUQUEEN, and on improving the overall analysis process. Finally, we present the results of analyzing the I/O behavior on JUQUEEN.

## Group Photo

Copyright: Forschungszentrum Jülich