2013 guest student programme
The 2013 guest student programme ran from 5 August to 11 October 2013 with 13 students.
Persons in the photo, from left to right:
Front: Martin Perdacher, Qian Zhang, Viorel Chihaia, Jannis Ehrlich, Alexander Aschikhin, Nicola Cadenelli, Benjamin Schott
Center: Henrik Larsson, Thomas Müller, Patrick Steinbrecher, Maciej Golik, Erik Järleberg, Ivo Kabadshow, Thomas Neuhaus
Back: Michael Knobloch, Stefan Krieg, Ulrich Kemloh, Dominik Gräser, Markus Werner, Marcus Richter
Not pictured: Hadeer El Habashy
Domain decomposition for simulating the route choice of pedestrians
Dominik Gräser, Wuppertal
Adviser: Dr. Ulrich Kemloh, JSC
This work describes the modelling of different route choice strategies for pedestrians on a navigation mesh. The mesh is obtained by applying a constrained Delaunay triangulation to the given geometry. The shortest path for each pedestrian is computed with the A* algorithm, which yields the sequence of mesh edges a pedestrian has to cross to reach a given destination. A visibility router is introduced to improve how pedestrians cross these mesh edges, and finally the funnel algorithm is implemented to refine the process further. In a case study, the different routing strategies are compared in terms of their suitability for simulating realistic pedestrian movement.
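A minimal sketch of the A* step follows. It assumes, for illustration only, that each triangle of the navigation mesh is a graph node, that triangles sharing an edge are neighbours, and that the cost of moving between two triangles is the distance between their centroids. The function and parameter names are hypothetical. Consecutive triangles in the returned path identify the mesh edges to cross.

```python
import heapq
import math


def a_star(centroids, neighbours, start, goal):
    """Return the list of triangle ids from start to goal, or None if unreachable.

    centroids:  dict triangle id -> (x, y) centroid (hypothetical representation)
    neighbours: dict triangle id -> iterable of triangles sharing an edge with it
    """
    def h(t):
        # Straight-line distance never overestimates the remaining centroid-path
        # cost, so A* returns a shortest path in the centroid graph.
        return math.dist(centroids[t], centroids[goal])

    open_heap = [(h(start), 0.0, start)]  # entries: (f = g + h, g, triangle)
    came_from = {start: None}
    g = {start: 0.0}
    while open_heap:
        _, cost, t = heapq.heappop(open_heap)
        if t == goal:
            path = []
            while t is not None:
                path.append(t)
                t = came_from[t]
            return path[::-1]
        if cost > g[t]:
            continue  # stale heap entry, a cheaper route was found later
        for n in neighbours[t]:
            new_cost = cost + math.dist(centroids[t], centroids[n])
            if new_cost < g.get(n, math.inf):
                g[n] = new_cost
                came_from[n] = t
                heapq.heappush(open_heap, (new_cost + h(n), new_cost, n))
    return None
```

Centroid-to-centroid costs give only an approximate route through the corridor of triangles; this is the kind of path that a visibility router or the funnel algorithm then straightens.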
Benchmarking performance and scalability of the package PRIMME for sequences of dense correlated eigenproblems
Martin Perdacher, Wien
Adviser: Dr. Edoardo Di Napoli, JSC
The full-potential linearized augmented plane-wave (FLAPW) method is an all-electron Density Functional Theory (DFT) method that enables the simulation of electronic properties of solid materials. Within DFT, the Kohn-Sham equations are used to compute the density of interacting electrons. Solving the Kohn-Sham equations leads to non-linear generalized eigenproblems, for which a self-consistent cycle is the state-of-the-art approach. It has recently been shown that the eigenproblems of adjacent cycles are correlated: the eigenvectors undergo an evolution process, and the higher the index of two adjacent cycles, the more collinear the corresponding eigenvectors become. To exploit this correlation, we test two Davidson-type methods, JDQMR and GD+k, both implemented in the PRIMME package. We show the benefit of using approximate starting vectors instead of random ones.
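To make the notion of collinearity concrete, one natural measure is the cosine of the angle between the i-th eigenvector of cycle ℓ−1 and that of cycle ℓ. This choice of measure (with the Euclidean inner product) is an assumption; the abstract does not say how collinearity was quantified.

```latex
A^{(\ell)} x_i^{(\ell)} = \lambda_i^{(\ell)} B^{(\ell)} x_i^{(\ell)}, \qquad
\cos\theta_i^{(\ell)} =
  \frac{\bigl|\langle x_i^{(\ell-1)}, x_i^{(\ell)} \rangle\bigr|}
       {\lVert x_i^{(\ell-1)} \rVert \, \lVert x_i^{(\ell)} \rVert}
```

As cos θ approaches 1 over the cycles, the eigenvector from cycle ℓ−1 is already close to the wanted eigenvector of cycle ℓ, which is why it is a better starting vector for an iterative solver such as JDQMR or GD+k than a random vector.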
Self-consistent atomic orbital computation and visualization
Qian Zhang, Aachen
Adviser: Prof. Dr. Erik Koch, GRS
The study of atomic orbitals plays an important role in understanding the intrinsic properties of atoms. In this report, we first discuss how to compute the atomic orbitals of a one-electron system numerically. We then generalize the problem to many-electron systems and obtain solutions in the self-consistent field approximation. Finally, we use Monte Carlo sampling to visualize the computed orbitals in three-dimensional space.
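The Monte Carlo visualization step can be sketched with rejection sampling: points drawn uniformly in a cube are accepted with probability |ψ|²/max|ψ|², so the accepted points are distributed according to the orbital density. As a stand-in for a numerically computed orbital, this sketch assumes the analytic hydrogen 1s orbital in atomic units, whose maximum lies at the nucleus; the function names and the box size are illustrative choices.

```python
import math
import random


def psi_1s(x, y, z):
    """Hydrogen 1s orbital in atomic units: exp(-r) / sqrt(pi)."""
    r = math.sqrt(x * x + y * y + z * z)
    return math.exp(-r) / math.sqrt(math.pi)


def sample_orbital(psi, n_points, box=6.0, psi_max=None, seed=0):
    """Rejection-sample n_points positions with density proportional to |psi|^2.

    Candidates are uniform in the cube [-box, box]^3 and are accepted with
    probability |psi|^2 / psi_max^2; density outside the cube is cut off.
    """
    rng = random.Random(seed)
    if psi_max is None:
        # Assumption: |psi| is largest at the nucleus (true for s orbitals).
        psi_max = abs(psi(0.0, 0.0, 0.0))
    points = []
    while len(points) < n_points:
        x, y, z = (rng.uniform(-box, box) for _ in range(3))
        if rng.random() < (abs(psi(x, y, z)) / psi_max) ** 2:
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    cloud = sample_orbital(psi_1s, 500)
    mean_r = sum(math.sqrt(x * x + y * y + z * z) for x, y, z in cloud) / len(cloud)
    print(f"mean distance from nucleus: {mean_r:.2f} bohr")
```

The resulting point cloud can be passed to any 3D scatter-plot tool. For the 1s orbital, the mean distance of the sampled points from the nucleus should be close to the exact expectation value of 1.5 bohr.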
MD simulations to study the irregular stiffness behavior of poly(N-isopropylacrylamide)
Hadeer El Habashy, Cairo
Adviser: Dr. Sissi de Beer, JSC
This study presents molecular dynamics simulations of a single poly(N-isopropylacrylamide) (PNIPAM) chain to investigate its stretching and stiffness behavior. Force-extension curves are correlated with the intramolecular and molecule-solvent interaction energy terms as a function of temperature and/or time in order to understand and explain the stiffness behavior. In addition, molecular dynamics simulations of the SPC/E water model are carried out to characterize the water model used in the main simulations with PNIPAM.
Communication-avoiding strategies for massively parallel N-body simulations
Erik Järleberg, Stockholm
Adviser: Dr. Ivo Kabadshow, JSC
In this work, we implemented and evaluated a number of parallelization strategies for the N-body problem. We improved upon the results of an existing communication-avoiding reference algorithm and evaluated the performance on JUQUEEN. Our algorithm achieves 85% parallel efficiency on 65536 cores, computing all forces and potentials of a system of 114537 particles in 23 ms, an improvement of 35 percentage points over the reference algorithm.
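For reference, strong-scaling parallel efficiency is commonly defined relative to a baseline run on p₀ cores. The abstract does not state which baseline was used, so p₀ below is an assumption. Because the gain is given in percentage points, it implies that the reference algorithm reached roughly 50% efficiency on the same 65536 cores.

```latex
E(p) = \frac{p_0 \, T(p_0)}{p \, T(p)}, \qquad
E_{\text{ref}}(65536) \approx 0.85 - 0.35 = 0.50
```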
Molecular orbital generation sequences
Henrik Larsson, Kiel
Adviser: Dr. Thomas Müller, JSC
A program was written to make it easy to use molecular orbitals from smaller quantum chemical calculations as the starting guess for larger ones. Supported operations are orbital projections between basis sets, similar structures, point groups, and spatial orientations of the molecular structure, as well as the assembly of molecular orbitals from fragments. The implemented methods work independently of the quantum chemical code used. These projections can accelerate quantum chemical computations substantially (by 10 to 30%) and can even make possible wave function optimisations that would otherwise not converge.
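One standard way to project orbitals from an old basis A onto a new basis B is a least-squares fit in the new basis, followed by re-orthonormalization, for example with Löwdin's symmetric scheme. It is shown here for illustration only; the abstract does not give the program's formulas. The sketch assumes real orbital coefficients C_A and the overlap matrices defined below.

```latex
\begin{aligned}
(S_{BB})_{\mu\nu} &= \langle \chi^{B}_{\mu} \mid \chi^{B}_{\nu} \rangle, &
(S_{BA})_{\mu\kappa} &= \langle \chi^{B}_{\mu} \mid \chi^{A}_{\kappa} \rangle, \\
\tilde C_B &= S_{BB}^{-1} S_{BA} \, C_A, &
C_B &= \tilde C_B \left( \tilde C_B^{\mathsf T} S_{BB} \, \tilde C_B \right)^{-1/2}
\end{aligned}
```

The second step restores orthonormality in the new basis, since C_Bᵀ S_BB C_B then equals the identity matrix.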
Enable basic MPI-3 support for the Score-P performance measurement system
Nicola Cadenelli, Brescia
Adviser: Dr. Michael Knobloch, JSC
The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of High Performance Computing (HPC) applications. Score-P offers users maximum convenience by supporting the analysis tools Scalasca, Periscope, Vampir, and TAU. In this report, we describe the steps needed to extend the Score-P performance measurement system to the new features of MPI-3, with special attention to neighborhood collectives. We also introduce the noteworthy new features of MPI-3 and the instrumentation of MPI functions.
Performance and energy efficiency characterization of an embedded GPU platform
Markus Werner, Bonn
Adviser: Prof. Dr. Dirk Pleiter, JSC
The following work analyses a low-power compute node architecture: a board comprising a low-power ARM processor and a GPU designed for mobile devices. Ways to measure its performance and electric power consumption are explored. A GPU-accelerated server node at FZJ is analysed as a reference system in order to draw conclusions about the energy efficiency of the new architecture. Both architectures are presented, and problems regarding the comparability of the two systems are discussed. Energy to solution is used as the metric for comparing the efficiency of the systems. In terms of performance, the ARM board is inferior to the reference system, but resource usage is comparable in some benchmarks. For one of the presented applications, the ARM board consumes 50% less energy than the server node but takes five times as long to solve the same problem.
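Since energy to solution is average power multiplied by runtime, the two reported ratios together imply that the ARM board draws about one tenth of the server node's average power while solving the problem:

```latex
E = \bar P \, t, \qquad
\frac{\bar P_{\text{ARM}}}{\bar P_{\text{server}}}
  = \frac{E_{\text{ARM}} / E_{\text{server}}}{t_{\text{ARM}} / t_{\text{server}}}
  = \frac{0.5}{5} = 0.1
```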
How to deal with very small matrices in spacetime
Patrick Steinbrecher, Bielefeld
Adviser: Dr. Stefan Krieg, JSC
Vectorization has had a big impact on the processing power of CPUs in recent years. This paper shows how to take advantage of Single Instruction Multiple Data (SIMD) operations by using architecture-specific low-level compiler “macros” called intrinsics, and examines their influence on the performance of Lattice Quantum Chromodynamics kernels.
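As a scalar reference for the kind of kernel that gets vectorized, the sketch below assumes the "very small matrices" are 3×3 complex matrices, such as the SU(3) link matrices applied to colour vectors in lattice QCD. Real and imaginary parts are stored in separate arrays, a layout commonly used for SIMD code because the same real multiply-adds can then be applied lane-wise, for instance across several lattice sites at once. This is plain Python, not intrinsics code, and the function name is hypothetical.

```python
def su3_matvec_split(u_re, u_im, v_re, v_im):
    """Multiply a 3x3 complex matrix by a complex 3-vector (split re/im layout).

    Each complex product (a + ib)(c + id) is expanded into real
    multiply-adds: real part ac - bd, imaginary part ad + bc.
    """
    out_re = [0.0] * 3
    out_im = [0.0] * 3
    for i in range(3):
        for j in range(3):
            a, b = u_re[i][j], u_im[i][j]
            c, d = v_re[j], v_im[j]
            out_re[i] += a * c - b * d
            out_im[i] += a * d + b * c
    return out_re, out_im
```

In an intrinsics version, the two inner update lines become vector multiply and add (or fused multiply-add) operations on registers that hold several real or imaginary parts at once.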
Hybridization and tuning of a PIC code on JUQUEEN
Alexander Aschikhin, Hamburg
Adviser: Dr. Dirk Brömmel, JSC
The new Blue Gene/Q system JUQUEEN offers significant computing capabilities for programs that are highly scalable and well tailored to its architecture. EPOCH, a Particle-in-Cell plasma simulation program, relies solely on the MPI library for parallelization. To make better use of the shared memory on each node, we implemented an OpenMP-based parallelization scheme and analyzed the resulting scaling behavior.
Physical annealing in the microcanonical ensemble
Benjamin Schott, Leipzig
Adviser: Dr. Thomas Neuhaus, JSC
In this study, a deterministic approach within the microcanonical ensemble was used to find the ground-state energy of the classical 2D Ising spin glass. The model is embedded in a classical Heisenberg spin system that is coupled to a cold bath consisting of a chain of classical Heisenberg spins. Results indicate that the computational effort of this method scales exponentially with system size. Scaling constants are given for a set of different coupling constants. An outlook on using this method to simulate quantum systems is given.
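For comparison, the exact ground-state energy of a very small 2D ±J Ising spin glass, with energy H = −Σ J_ij s_i s_j over nearest-neighbour pairs, can be found by enumerating all 2^N spin configurations. This brute-force baseline is not the microcanonical annealing method of the study; it only makes the energy function concrete and shows why exact search quickly becomes infeasible. Open boundaries and random ±1 couplings are assumptions of the sketch, and the function names are illustrative.

```python
import itertools
import random


def bonds_2d(L):
    """Nearest-neighbour bonds (i, j) of an L x L square lattice, open boundaries."""
    bonds = []
    for x in range(L):
        for y in range(L):
            i = x * L + y
            if x + 1 < L:
                bonds.append((i, (x + 1) * L + y))
            if y + 1 < L:
                bonds.append((i, i + 1))
    return bonds


def random_pm_j(L, seed=0):
    """Random +/-1 couplings on every bond (a +/-J spin glass)."""
    rng = random.Random(seed)
    return {b: rng.choice((-1, 1)) for b in bonds_2d(L)}


def energy(spins, couplings):
    """Ising energy H = -sum_<ij> J_ij s_i s_j."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def ground_state_energy(L, couplings):
    """Exact ground-state energy by enumerating all 2**(L*L) configurations."""
    return min(energy(s, couplings)
               for s in itertools.product((-1, 1), repeat=L * L))
```

Each additional spin doubles the number of configurations, so a 10 × 10 lattice already has 2^100 of them.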
Standalone client for the UFTP data transfer tool
Maciej Golik, Kraków
Adviser: Dr. Bernd Schuller, JSC
The UNICORE File Transfer Protocol (UFTP) was designed to overcome some limitations and security issues of the File Transfer Protocol (FTP) and to improve the experience of users of the UNICORE Grid middleware. The protocol works with Network Address Translation (NAT) on both the client and the server side. This was achieved by redesigning FTP to combine features of its passive and active modes and by applying additional security fixes, such as encrypting command data by default. In addition, UFTP requires opening only two ports on the server firewall and allowing outgoing traffic on client firewalls. By splitting the authentication and file transfer services across two different machines, UFTP also ensures that any security breach has minimal impact on the rest of the infrastructure. Together, these changes let users transfer files as fast as with FTP but as securely as with SCP. However, because UFTP was designed as part of the UNICORE middleware, using it requires installing the whole UNICORE software stack, which keeps some people from adopting it. My task in this project was to modify the UFTP client and to create, from scratch, an authentication server that is as lightweight as possible. This change will allow more people to take advantage of UFTP and enable new usage scenarios.
Completely and highly efficient parallel implementation of the Lowe-Andersen thermostat
Jannis Ehrlich, Bremen
Adviser: Dr. Viorel Chihaia, JSC
Two fully parallel algorithms for the Lowe-Andersen (LA) thermostat are presented. They were implemented in the IBIsCO code and provide good temperature control. Moreover, they scale significantly better than both the original implementation and a previously developed parallel scheme, and they speed up the computation significantly. We further found that the total runtime of all implementations does not depend on the collision frequency or the cutoff radius of the LA thermostat.
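The core update of the Lowe-Andersen thermostat can be sketched serially. Each particle pair within the cutoff radius is selected with probability Γ·Δt, where Γ is the collision frequency. The component of the pair's relative velocity along the pair axis is then redrawn from a Maxwell distribution with the pair's reduced mass, and the momentum change is split so that total momentum is conserved. This O(N²) sketch without periodic boundaries is not the IBIsCO implementation, and the names are illustrative. Pairs that share a particle must be processed one after another, which is one reason a fully parallel version is non-trivial.

```python
import math
import random


def lowe_andersen_step(pos, vel, mass, kT, gamma, dt, r_cut, rng):
    """Apply one Lowe-Andersen sweep in place (serial, illustrative sketch).

    pos, vel: lists of [x, y, z]; mass: list of masses; kT = k_B * T;
    gamma: collision frequency; rng: a random.Random instance.
    """
    n = len(pos)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(pairs)  # visit pairs in random order
    for i, j in pairs:
        d = [pos[i][k] - pos[j][k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        # Skip pairs outside the cutoff, and accept a pair with probability gamma*dt.
        if r == 0.0 or r >= r_cut or rng.random() >= gamma * dt:
            continue
        e = [c / r for c in d]  # unit vector along the pair axis
        mu = mass[i] * mass[j] / (mass[i] + mass[j])  # reduced mass
        v_rel = sum((vel[i][k] - vel[j][k]) * e[k] for k in range(3))
        v_new = rng.gauss(0.0, math.sqrt(kT / mu))  # Maxwell-distributed
        dp = mu * (v_new - v_rel)  # momentum exchanged along e
        for k in range(3):
            vel[i][k] += dp * e[k] / mass[i]
            vel[j][k] -= dp * e[k] / mass[j]
```

Only the velocity component along the pair axis changes, so the perpendicular components and the total momentum are left untouched by every collision.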
Poster announcing the 2013 Student Colloquium