2019 guest student programme
The 2019 guest student programme ran from 5 August to 11 October 2019 with 11 students.
BLIS Kernel Optimizations for the Scalable Vector Extension on ARM-based Architectures
Baran Cengiz, Middle East Technical University, Turkey
Adviser: Dirk Pleiter, JSC
The Scalable Vector Extension (SVE) is an extension to the ARMv8 instruction set architecture (ISA) that will be used in several upcoming processors, including the Fujitsu A64FX (the processor of Fugaku, the upcoming Japanese exascale computer) and EPI Gen1. SVE offers vector-length agnostic (VLA) programming, advanced load/store operations, and per-lane predication. This report explains the basics of SVE, describes the development of BLIS kernels for SVE, and presents tests of SVE using different kernels (daxpy, zaxpy, and Jacobi). In gem5 simulations, the SVE version of the Jacobi kernel performed 30% better. This improvement suggests that SVE could be important for future implementations.
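The vector-length agnostic style and per-lane predication mentioned above can be illustrated with a daxpy loop (y ← αx + y). The Python sketch below only emulates the idea: the loop never assumes a fixed vector length, and a per-lane predicate (modelled loosely on SVE's `whilelt` instruction) switches off lanes past the end of the arrays, so no scalar tail loop is needed. The function names and the `vl` parameter are hypothetical stand-ins; real SVE kernels would use ACLE intrinsics or assembly, and the hardware vector length is only known at run time.

```python
def whilelt(i, n, vl):
    """Emulate an SVE-style predicate: lane j is active while i + j < n."""
    return [i + j < n for j in range(vl)]


def daxpy_vla(alpha, x, y, vl):
    """Compute y <- alpha * x + y in vector-length-agnostic style.

    `vl` stands in for the hardware vector length (number of 64-bit lanes),
    which VLA code must not hard-code; any positive value gives the same result.
    """
    n = len(x)
    i = 0
    while i < n:
        pred = whilelt(i, n, vl)          # predicate for this strip of the arrays
        for lane, active in enumerate(pred):
            if active:                    # inactive lanes are left untouched
                y[i + lane] += alpha * x[i + lane]
        i += vl                           # advance by one full vector
    return y
```

For example, `daxpy_vla(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], vl=2)` gives `[3.0, 5.0, 7.0]`, and the result is the same for `vl=4` or `vl=8`: the last, partially filled vector is handled by the predicate rather than by separate code.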
Generalizing Bayes-by-Backprop Using Normalizing Flows
Chase van de Geijn, Universiteit van Amsterdam, Netherlands
Adviser: Kai Krajsek, JSC
Traditional deep learning learns a deterministic mapping function. However, some systems are probabilistic, so it is beneficial for the model to reflect the underlying probability distribution of the system. Bayes-by-Backprop is a scalable way to make the network parameters probabilistic, but it makes two assumptions about the latent variables: first, that the weights, or latent variables, can be represented as Gaussian random variables, and second, that these Gaussians are statistically independent. This work investigates these assumptions by applying normalizing flows to the Gaussian weights, which allows the weights to represent increasingly complex latent distributions and lets the network model correlations between the weights. Preliminary results show that the distributions retained a Gaussian shape after the flow was applied, but the covariance matrix showed an off-diagonal pattern, suggesting that the flow can capture correlations between network weights.
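Pushing Bayes-by-Backprop's factorized Gaussian weight samples through a normalizing flow can be sketched with a single planar flow, f(z) = z + u·tanh(wᵀz + b). The planar flow is one common choice and an assumption here; the report may use a different flow. The sketch draws reparameterized Gaussian samples, applies the flow, and returns the log-determinant of the Jacobian needed for the change-of-variables density, log q(f(z)) = log q(z) − log|det ∂f/∂z|. All parameter values are illustrative, and the planar flow is only guaranteed invertible when wᵀu ≥ −1.

```python
import math
import random


def sample_gaussian_weights(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1) per weight."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def planar_flow(z, u, w, b):
    """Apply f(z) = z + u * tanh(w.z + b) and return (f(z), log|det df/dz|)."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(a)
    fz = [zi + ui * h for zi, ui in zip(z, u)]
    # Matrix determinant lemma: det(I + u psi^T) = 1 + psi.u with psi = tanh'(a) * w
    dh = 1.0 - h * h
    log_det = math.log(abs(1.0 + dh * sum(wi * ui for wi, ui in zip(w, u))))
    return fz, log_det


def sample_covariance(samples, i, j):
    """Empirical covariance between components i and j of a list of vectors."""
    n = len(samples)
    mi = sum(s[i] for s in samples) / n
    mj = sum(s[j] for s in samples) / n
    return sum((s[i] - mi) * (s[j] - mj) for s in samples) / (n - 1)


rng = random.Random(0)
mu, sigma = [0.0, 0.0], [1.0, 1.0]       # two independent Gaussian "weights"
u, w, b = [1.0, 1.0], [1.0, 1.0], 0.0    # illustrative flow parameters (w.u = 2 >= -1)
flowed = [planar_flow(sample_gaussian_weights(mu, sigma, rng), u, w, b)[0]
          for _ in range(5000)]
print(sample_covariance(flowed, 0, 1))   # positive: the shared tanh term couples the weights
```

Before the flow the two weights are independent, so their covariance is zero in expectation; after it, the shared tanh term produces a positive covariance, the kind of off-diagonal structure described in the abstract.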
MNIST Number Recognition - Using Spiking Neural Networks, Structural Plasticity and L2L
Jacob Beyer, RWTH Aachen University, Germany
Adviser: Sandra Diaz, JSC
Mimicking biological spiking neural networks may further our understanding of the inner workings of the brain. To this end, a spiking neural network with structural plasticity was simulated; structural plasticity resembles the growth of synaptic connections in biological networks. As a test bench for the performance of this network, the MNIST database of handwritten digits was used, with the task of distinguishing the digit 1 from all other digits. The network's performance on this task can then serve as the objective of a hyperparameter optimization using the Learning-to-Learn (L2L) framework. While no definitive results could be obtained, some promising preliminary results were shown.
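The growth of synaptic connections under structural plasticity can be pictured as pairing free synaptic elements: each neuron exposes some number of vacant axonal (outgoing) and dendritic (incoming) elements, and new synapses form by randomly matching them. The sketch below is a toy version of that bookkeeping only. It does not model spikes, activity-dependent growth rules, or the network used in the report, and `create_synapses` and its arguments are hypothetical.

```python
import random


def create_synapses(free_axonal, free_dendritic, rng, allow_autapses=False):
    """Randomly pair vacant axonal and dendritic elements into new synapses.

    free_axonal[i] / free_dendritic[i] give the number of vacant outgoing /
    incoming elements of neuron i. Returns a list of (pre, post) pairs and
    updates both lists in place to reflect the elements that were used.
    """
    axons = [i for i, k in enumerate(free_axonal) for _ in range(k)]
    dendrites = [i for i, k in enumerate(free_dendritic) for _ in range(k)]
    rng.shuffle(axons)
    rng.shuffle(dendrites)
    synapses = []
    for pre in axons:
        # Take the first remaining dendritic element that is allowed for this axon.
        for idx, post in enumerate(dendrites):
            if allow_autapses or post != pre:
                synapses.append((pre, post))
                free_axonal[pre] -= 1
                free_dendritic[post] -= 1
                del dendrites[idx]
                break
    return synapses
```

Calling this repeatedly, with the numbers of free elements updated between calls by some growth rule, gives a network whose connectivity changes over time, which is the behaviour structural plasticity adds to an otherwise fixed spiking network.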
MPI Enabled Tasking - An Asynchronous Summation Scheme for the FMM
Janos Meny, University of Bonn, Germany
Adviser: Laura Morgenstern, JSC
For large-scale molecular dynamics simulations of hundreds of thousands of particles, directly summing all pairwise interactions is prohibitively expensive. The Fast Multipole Method (FMM) reduces the complexity from quadratic to linear, making the simulation of large particle systems feasible. In this work, we demonstrate how to implement an asynchronous communication scheme for the FMM using MPI. To this end, we modify a tasking framework to incorporate message-passing capabilities. Furthermore, we introduce a distributed octree data structure to reduce data redundancy.
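Distributed octrees for the FMM are often indexed by space-filling-curve keys. One common choice, assumed here and not necessarily the data structure used in the report, is the Morton (Z-order) key, which interleaves the bits of a box's integer coordinates. Parent and child boxes are then found with bit shifts, and sorting boxes by key keeps spatially nearby boxes close together, which makes it straightforward to split the tree across MPI ranks. All function names below are illustrative.

```python
def morton_key(x, y, z, level):
    """Interleave the lowest `level` bits of integer box coordinates (x, y, z).

    Bit k of x, y and z ends up at bits 3k, 3k+1 and 3k+2 of the key.
    """
    key = 0
    for k in range(level):
        key |= ((x >> k) & 1) << (3 * k)
        key |= ((y >> k) & 1) << (3 * k + 1)
        key |= ((z >> k) & 1) << (3 * k + 2)
    return key


def parent_key(key):
    """Key of the enclosing box one level up the octree."""
    return key >> 3


def child_keys(key):
    """Keys of the eight children one level down the octree."""
    return [(key << 3) | octant for octant in range(8)]


def box_of(point, level):
    """Morton key of the box containing a point of the unit cube at a given level."""
    n = 1 << level                                      # boxes per dimension
    ix, iy, iz = (min(int(c * n), n - 1) for c in point)  # clamp points on the upper face
    return morton_key(ix, iy, iz, level)
```

For instance, the box with coordinates (3, 2, 1) at level 2 has key 29, and its parent, key 3, is the box (1, 1, 0) at level 1.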
Playing Atari Games Using Policy Gradient Method and CuLE
Kyaw Lin Oo, Ruhr University Bochum, Germany
Adviser: Jenia Jitsev, JSC
Reinforcement learning algorithms combined with deep neural networks have been used successfully to play most Atari video games at near human level. Conventionally, the games are emulated on CPUs, while the neural network updates are performed on one or more GPUs. We use the CuLE library, a CUDA port of the Arcade Learning Environment (ALE). Since CuLE can generate between 40M and 190M frames per hour on a single GPU, it allows rapid prototyping of new neural network architectures and algorithms. In this project, we train the agent with an actor-critic method combined with V-trace on a single GPU and on multiple GPUs. We then analyze how the throughput-oriented nature of GPU-based emulators affects training performance.
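V-trace (introduced with the IMPALA architecture) corrects value targets for the lag between the behaviour policy μ that generated the frames and the current policy π, using truncated importance ratios ρ_t = min(ρ̄, π/μ) and c_t = min(c̄, π/μ). The sketch below computes the targets for one trajectory with the backward recursion v_s − V(x_s) = δ_s + γ c_s (v_{s+1} − V(x_{s+1})), where δ_s = ρ_s (r_s + γ V(x_{s+1}) − V(x_s)). It ignores episode boundaries inside the trajectory, and how the report combines these targets with its actor-critic losses is not specified in the abstract.

```python
def vtrace(rewards, values, bootstrap_value, ratios, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets v_s for one trajectory (no episode ends inside it).

    rewards[t], values[t] = V(x_t) and ratios[t] = pi(a_t|x_t) / mu(a_t|x_t) for
    t = 0..n-1; bootstrap_value = V(x_n) for the state after the last step.
    """
    n = len(rewards)
    targets = [0.0] * n
    correction = 0.0                       # v_{t+1} - V(x_{t+1}); zero past the end
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])      # truncated importance weight for the TD error
        c = min(c_bar, ratios[t])          # truncated trace coefficient
        next_value = values[t + 1] if t + 1 < n else bootstrap_value
        delta = rho * (rewards[t] + gamma * next_value - values[t])
        correction = delta + gamma * c * correction
        targets[t] = values[t] + correction
    return targets
```

With all ratios equal to 1 and ρ̄ = c̄ = 1 (fully on-policy), the targets reduce to ordinary n-step bootstrapped returns; for example, `vtrace([1.0, 1.0], [2.0, 3.0], 4.0, [1.0, 1.0], gamma=0.5)` returns `[2.5, 3.0]`.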
Numerical Methods to Solve Two Real Parameter Non-Linear Eigenvalue Problems
Kristina Simeonova, Sofia University "St. Kliment Ohridski", Bulgaria
Adviser: Andreas Kleefeld, JSC
In this work, we study the stability of time-invariant linear delay equations. This leads to a two-parameter nonlinear eigenvalue problem. In a specific case, this two-parameter nonlinear eigenvalue problem can be transformed into a quadratic eigenvalue problem whose solutions yield only the real-valued critical delays. In this case, the eigenvalues can be computed with different methods, based either on the Kronecker product or on the Jacobi-Davidson method.
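What a critical delay is can be seen in the simplest, scalar case x'(t) = a·x(t) + b·x(t − τ), whose characteristic equation is λ = a + b·e^(−λτ). A critical delay is a τ for which a root lies on the imaginary axis, λ = iω. Taking absolute values of e^(−iωτ) = (iω − a)/b gives ω² = b² − a² (so |b| > |a| is required), and the phase then gives τ. The sketch solves only this scalar case in closed form, as an illustration; the matrix case treated in the report is what requires the quadratic eigenvalue problem and Kronecker-product or Jacobi-Davidson solvers.

```python
import cmath
import math


def critical_delays(a, b, count=3):
    """Smallest positive critical delays of x'(t) = a x(t) + b x(t - tau).

    Returns (omega, [tau_0, tau_1, ...]) such that lambda = i*omega solves
    lambda = a + b * exp(-lambda * tau), or None if no omega > 0 exists.
    """
    if abs(b) <= abs(a):
        return None                      # |i omega - a| = |b| has no solution with omega > 0
    omega = math.sqrt(b * b - a * a)
    # exp(-i omega tau) = (i omega - a) / b  =>  cos(omega tau) = -a/b, sin(omega tau) = -omega/b
    phase = math.atan2(-omega / b, -a / b) % (2 * math.pi)
    taus = [(phase + 2 * math.pi * k) / omega for k in range(count)]
    return omega, taus


def residual(a, b, omega, tau):
    """|lambda - a - b exp(-lambda tau)| at lambda = i omega (close to 0 at a critical delay)."""
    lam = 1j * omega
    return abs(lam - a - b * cmath.exp(-lam * tau))
```

For the classic example x'(t) = −x(t − τ) (a = 0, b = −1), this gives ω = 1 and a first critical delay of τ = π/2, the point at which the zero solution loses stability.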
Chebyshev Accelerated Subspace Iteration Eigensolver
Lennart Klebl, RWTH Aachen University, Germany
Adviser: Edoardo Di Napoli, JSC
Sequences of correlated eigenvalue problems are common in many fields of scientific research, yet most solvers do not take advantage of this correlation. If only some eigenpairs are required, iterative methods such as subspace iteration can be used. The ChASE algorithm is based on subspace iteration and can be applied to sequences of correlated eigenvalue problems. It additionally applies Chebyshev filtering to the approximate eigenvectors to accelerate convergence. This report discusses an implementation of ChASE in the Julia language.
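The Chebyshev filtering step can be sketched with the three-term recurrence C_{k+1}(t) = 2t·C_k(t) − C_{k−1}(t) applied to a matrix. The interval [lower, upper] containing the unwanted part of the spectrum is mapped to [−1, 1], where Chebyshev polynomials stay bounded by 1, while components along eigenvalues below `lower` are strongly amplified. This is an unscaled textbook version in plain Python, not the report's Julia code; practical implementations usually use a scaled recurrence to avoid overflow, and `matvec` and the interval bounds are assumed to be supplied by the caller.

```python
def chebyshev_filter(matvec, x, degree, lower, upper):
    """Apply C_degree((A - c I) / e) to x, with c, e mapping [lower, upper] to [-1, 1].

    Components along eigenvectors whose eigenvalues lie in [lower, upper] stay
    bounded in size; those with eigenvalues below `lower` grow rapidly.
    """
    c = (upper + lower) / 2.0            # centre of the damped interval
    e = (upper - lower) / 2.0            # half-width of the damped interval

    def shifted(v):                      # (A - c I) v / e
        av = matvec(v)
        return [(avi - c * vi) / e for avi, vi in zip(av, v)]

    if degree == 0:
        return list(x)                   # C_0 = 1
    y_prev, y = list(x), shifted(x)      # C_0 x and C_1 x
    for _ in range(degree - 1):
        t = shifted(y)
        y_prev, y = y, [2.0 * ti - pi for ti, pi in zip(t, y_prev)]
    return y


def diag_matvec(v):
    """Matrix-vector product with the diagonal test matrix diag(0.1, 5, 9)."""
    return [d * vi for d, vi in zip([0.1, 5.0, 9.0], v)]


# Damp the unwanted interval [2, 10]; the eigenvalue 0.1 below it is amplified.
y = chebyshev_filter(diag_matvec, [1.0, 1.0, 1.0], degree=10, lower=2.0, upper=10.0)
```

After filtering, the component along the wanted eigenvalue 0.1 is several thousand times larger than the other two, which is why a few filtered subspace iterations converge much faster than plain subspace iteration.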
Pulsatile Inflow Boundary Conditions for Lattice Boltzmann Methods and their Biomechanical Effects
Maximilian Köhler, Ruhr-University Bochum, Germany
Adviser: Seong Ryong Koh, JSC
In recent years, Lattice Boltzmann Methods (LBM) have gained popularity within the computational fluid dynamics (CFD) community, as they are a very versatile approach to a range of problems. This paper gives an in-depth explanation of time-dependent boundary conditions in LBM. It also provides an introduction to LBM and to FFTW, a C library for computing fast Fourier transforms. The implementation is applied to a hemodynamics problem, and its results are presented.
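A pulsatile inflow profile is commonly prescribed as a truncated Fourier series of a measured, periodic velocity waveform, u(t) = X_0 + Σ_k 2·Re(X_k e^(2πikt/T)), whose coefficients X_k come from a discrete Fourier transform of the samples, presumably the role FFTW plays in the report. The sketch below uses a naive O(n²) DFT in place of FFTW and evaluates the inlet velocity at any time t, which a Lattice Boltzmann solver could then impose at its inflow nodes each time step. The waveform values and function names are illustrative assumptions.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, X_k = (1/n) sum_j x_j exp(-2 pi i k j / n)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * j / n) for j, x in enumerate(samples)) / n
            for k in range(n)]


def inflow_velocity(coeffs, t, period, n_harmonics):
    """Evaluate the truncated real Fourier series of a periodic waveform at time t."""
    u = coeffs[0].real                   # mean velocity over one period
    for k in range(1, n_harmonics + 1):
        u += 2.0 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / period)).real
    return u


# One cardiac cycle sampled at 5 equally spaced times (illustrative values).
waveform = [0.2, 0.9, 0.5, 0.3, 0.25]
period = 1.0
coeffs = dft(waveform)
# With n = 5 real samples, harmonics 1 and 2 reproduce the samples exactly.
u_between = inflow_velocity(coeffs, 0.1, period, n_harmonics=2)  # smooth value between samples
```

Truncating to fewer harmonics smooths the waveform, which can be useful when the measured signal is noisy; using all available harmonics interpolates the samples exactly.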
Simulating Carbon Nano Structures - Omelyan Integration with Mixed Precision Solver
Marcel Rodekamp, University of Bielefeld, Germany
Adviser: Eric Gregory, JSC
Simulating carbon nanostructures (CNS) is a highly nontrivial task, yet an important one, given their wide variety of possible applications in economics and science. This paper describes how CNS on a honeycomb lattice can be modelled using a Hubbard-Stratonovich formulation with Hasenbusch terms. Furthermore, it illustrates how Hybrid Monte Carlo methods with a second-order Omelyan integrator can be applied to simulate these CNS. Finally, some remarks are given on the implementation in C++ with the OpenACC extension.
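The second-order Omelyan (minimum-norm) integrator replaces the leapfrog step of Hybrid Monte Carlo by a symmetric five-stage update with a tunable parameter λ; λ ≈ 0.1932 is the value usually quoted as minimizing the error norm, but the report's choice is not stated. The sketch integrates a generic Hamiltonian H = p²/2 + S(q) for a list of coordinates, with the force −∂S/∂q supplied by the caller, and uses a harmonic oscillator as a stand-in for the CNS action.

```python
def omelyan_step(q, p, force, eps, lam=0.1931833275037836):
    """One second-order Omelyan step for H = p^2/2 + S(q); force(q) returns -dS/dq."""
    def kick(pos, mom, h):               # momentum update with the current force
        return [pi + h * fi for pi, fi in zip(mom, force(pos))]

    def drift(pos, mom, h):              # position update with the current momentum
        return [qi + h * pi for qi, pi in zip(pos, mom)]

    p = kick(q, p, lam * eps)
    q = drift(q, p, eps / 2.0)
    p = kick(q, p, (1.0 - 2.0 * lam) * eps)
    q = drift(q, p, eps / 2.0)
    p = kick(q, p, lam * eps)
    return q, p


def integrate(q, p, force, eps, n_steps):
    """Molecular-dynamics trajectory of n_steps Omelyan steps (the HMC proposal)."""
    for _ in range(n_steps):
        q, p = omelyan_step(q, p, force, eps)
    return q, p


def harmonic_force(q):
    """Force for the toy action S(q) = sum(q^2) / 2."""
    return [-qi for qi in q]


def energy(q, p):
    """Hamiltonian p^2/2 + S(q) for the toy action."""
    return sum(pi * pi for pi in p) / 2.0 + sum(qi * qi for qi in q) / 2.0
```

Because the update is symmetric, it is time-reversible and area-preserving, the two properties HMC needs for the Metropolis accept/reject step to be exact; the small energy violation along the trajectory sets the acceptance rate.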
Polar Stratospheric Clouds Classification with Deep Learning
Mahmoud Shoush, University of Tartu, Estonia
Adviser: Sabine Grießbach, JSC
Polar Stratospheric Clouds (PSC) have a direct impact on ozone depletion. PSC provide surfaces on which chemical reactions take place. These reactions produce free chlorine, which in turn destroys ozone. Studying PSC could therefore help improve models that forecast the evolution of the ozone hole. In this report, we present the potential of applying deep learning (DL) methods to classify PSC. There are different types of polar stratospheric clouds, and a classification algorithm for them exists. However, it has been shown that a characteristic feature varies depending on particle size and shape. The Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) is one of the three atmospheric chemistry sensors on board the Envisat satellite (March 2002 to April 2012). Its infrared limb emission measurements form a unique dataset of daytime and nighttime PSC observations covering both polar regions. We propose using DL methods to learn and detect particle size and shape in order to classify PSC. Using DL to retrieve information about the composition of PSC has high potential: in our experiments, we achieved accuracies of around 95% on both validation and test data. We are still working on testing the approach on real measured data in order to compare the results with physical knowledge.
Application of HPX to Blocked General Matrix Multiplication and QR Decomposition
Thomas Miethlinger, Johannes Kepler University, Austria
Adviser: Edoardo Di Napoli, JSC
In this work, we benchmark several implementations of blocked general matrix-matrix multiplication and blocked QR decomposition, using OpenMP and HPX for shared-memory programs and MPI and HPX for distributed programs. In addition to comparing the resulting runtimes, we point out the overall advantages and disadvantages of each framework for numerical linear algebra operations and give an outlook on their importance for future implementations of the ChASE algorithm.
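Blocked matrix multiplication splits C = A·B into tiles so that each update C_IJ += A_IK·B_KJ works on cache-sized blocks. In task-based runtimes such as HPX or OpenMP tasks, each (I, J) tile can become an independent task, with the K loop forming a chain of dependent updates on that tile. The sketch below is a plain sequential Python version of the tiling only (no HPX, OpenMP, or MPI), with the block size `bs` as a free parameter; it is not the benchmarked code.

```python
def blocked_matmul(a, b, bs):
    """C = A B for dense matrices given as lists of rows, using bs x bs tiles."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions of A and B must agree")
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, bs):                  # tile row of C
        for j0 in range(0, p, bs):              # tile (i0, j0): one independent task
            for k0 in range(0, m, bs):          # accumulate A[i0, k0] * B[k0, j0] into the tile
                for i in range(i0, min(i0 + bs, n)):
                    for k in range(k0, min(k0 + bs, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + bs, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The result does not depend on `bs`; only the memory access pattern changes, and the edge tiles are handled by the `min` bounds when the matrix size is not a multiple of the block size. Blocked QR decomposition follows the same idea, with panel factorizations and trailing-matrix updates as the tasks.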