2011 guest student programme
The 2011 guest student programme ran from 6 August to 12 October 2011 with 12 students.
Winkel, Mathias (Ed.) (2011):
Proceedings 2011, JSC Guest Student Programme on Scientific Computing,
Technical Report FZJ-JSC-IB-2011-06
People in the photo, left to right, front to back:
Row 1: Herwig Zilken, Natalie Schröder, Andreas Lücke, Momchil Ivanov, Sasha Alexander Alperovich, Janine George, Kaustubh Bhat
Row 2: Mathias Winkel, Martin Müser, Fabio Pozzati, Bernhard Steffen, Francesco Piccolo, Sandra Ahnen, Sebastian Banert
Row 3: Bernd Mohr, Hans Peschke, Petar Sirkovic
Not in the photo: Christian Heinrich
Dynamic load balancing in JuSPIC using MPI/SMPSs
Sandra Ahnen, Karlsruhe University
Adviser: Dr. Dirk Brömmel, JSC
Until now, load balancing in the plasma simulation code JuSPIC has been achieved on two levels: through the domain decomposition of the complete simulation volume, which is required for the MPI parallelisation, and through the distribution of SMPSs tasks onto threads using the new hybrid MPI/SMPSs programming model. In this project, so-called dynamic load balancing was added as a third level. In a first small example, this led to a considerable speed-up of the program.
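To illustrate the idea of dynamic load balancing, the following minimal sketch assumes that the simulation volume is cut into slabs along one axis and that a per-cell cost (for example the particle count) is measured during the run. The function names `rebalance` and `slab_loads` are hypothetical, and the scheme is a generic cost-based repartitioning, not necessarily the one implemented in JuSPIC:

```python
def rebalance(costs, nparts):
    """Return slab boundaries [b_0, ..., b_nparts] such that the slab
    costs[b_k:b_{k+1}] carries roughly total / nparts of the work.

    Hypothetical helper: boundaries are placed where the running sum of
    per-cell costs first reaches k * total / nparts (greedy prefix-sum split).
    """
    total = sum(costs)
    bounds = [0]
    acc = 0
    target = 1  # index of the next boundary to place
    for i, c in enumerate(costs):
        acc += c
        # Place every boundary whose cumulative-cost target has been reached.
        while target < nparts and acc >= total * target / nparts:
            bounds.append(i + 1)
            target += 1
    while len(bounds) < nparts:  # only reached for degenerate input
        bounds.append(len(costs))
    bounds.append(len(costs))
    return bounds


def slab_loads(costs, bounds):
    """Total cost assigned to each slab."""
    return [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]
```

With `costs = [6, 2, 2, 2, 2, 2]` and two ranks, an equal split into cells 0 to 2 and 3 to 5 gives loads of 10 and 6, whereas `rebalance(costs, 2)` returns `[0, 2, 6]`, giving loads of 8 and 8. In a running simulation the costs would presumably be re-measured periodically and particles migrated to their new owners.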
Support for performance measurements of MPI File I/O for the Scalasca toolset
Christian Heinrich, Cologne University
Advisers: Dr. Bernd Mohr, Dr. Brian Wylie, JSC
Scalasca is a portable and scalable performance measurement tool that provides sophisticated support for MPI. However, its support for performance measurements of MPI File I/O was limited; for example, it provided no way to obtain the number of bytes transferred (read or written) by a specific MPI File I/O call.
This report describes the steps taken to implement this support and gives an introduction to the Scalasca wrapper generator.
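The measurement itself can be pictured as a wrapper that sits between the application and the actual I/O routine, in the spirit of the MPI profiling interface (PMPI) on which wrapper-based MPI tools build. The Python sketch below is purely illustrative: `measure_bytes`, `fake_file_write` and the `TYPE_SIZES` table are hypothetical, and the byte count is taken as element count times datatype size:

```python
import functools

# Assumed element sizes in bytes (typical values; a real wrapper would
# query the MPI library for the datatype size instead of hard-coding it).
TYPE_SIZES = {"MPI_BYTE": 1, "MPI_INT": 4, "MPI_DOUBLE": 8}

# Per-routine statistics, as a measurement tool might accumulate them.
io_stats = {}


def measure_bytes(name):
    """Hypothetical wrapper factory: record calls and bytes for routine `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(count, datatype, *args, **kwargs):
            result = func(count, datatype, *args, **kwargs)  # the "real" call
            nbytes = count * TYPE_SIZES[datatype]
            entry = io_stats.setdefault(name, {"calls": 0, "bytes": 0})
            entry["calls"] += 1
            entry["bytes"] += nbytes
            return result
        return wrapper
    return decorator


@measure_bytes("MPI_File_write")
def fake_file_write(count, datatype):
    """Stand-in for the actual write routine; does nothing here."""
    return 0
```

Calling `fake_file_write(100, "MPI_DOUBLE")` and then `fake_file_write(10, "MPI_INT")` leaves `io_stats["MPI_File_write"]` at 2 calls and 840 bytes. For reads, a real tool would presumably take the element count from the returned status (for example via `MPI_Get_count`), because a read near the end of a file can transfer fewer elements than requested.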
Hierarchical Tree Construction in PEPC
Hans Peschke, Technische Universität Dresden, Mathematics
Advisers: Dr. Lukas Arnold, Mathias Winkel, JSC
The highly scalable parallel tree code PEPC is the first Barnes-Hut tree-code implementation that runs efficiently on all 288k cores of JUGENE. This is possible because almost all parts of the code scale perfectly up to this number of cores. The code segment that is currently problematic handles the global exchange of branch nodes, which will dominate the overall run time as the number of cores increases. Branch nodes are essential for the tree traversal, since they act as entry points to remote trees. The aim of this paper is to describe the scalability issues and to design and implement an algorithm for hierarchical tree construction in order to optimise the global exchange of data.
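The following sketch only illustrates the general pattern of combining per-rank data level by level in a binary tree, rather than in a single global exchange. It assumes each branch node is reduced to a monopole moment with positive charge (the charge-weighted centre is undefined for vanishing total charge), and the function names are hypothetical; a real tree code would also combine higher multipole moments and node keys:

```python
def merge(a, b):
    """Combine two monopole moments (charge, centre) into their parent node."""
    qa, ca = a
    qb, cb = b
    q = qa + qb
    centre = tuple((qa * xa + qb * xb) / q for xa, xb in zip(ca, cb))
    return (q, centre)


def hierarchical_combine(nodes):
    """Pairwise (binary-tree) reduction of branch nodes.

    In round r, position i absorbs position i + 2**r, so P inputs are
    combined in ceil(log2 P) rounds. Returns (root, number_of_rounds).
    """
    nodes = list(nodes)
    step, rounds = 1, 0
    while step < len(nodes):
        for i in range(0, len(nodes), 2 * step):
            if i + step < len(nodes):
                nodes[i] = merge(nodes[i], nodes[i + step])
        step *= 2
        rounds += 1
    return nodes[0], rounds
```

For five branch nodes the reduction needs three rounds; in a distributed setting each round would correspond to one step of point-to-point communication between pairs of ranks.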
How fast are local Metropolis updates for the Ising model on a graphics card?
Momchil Ivanov, Universität Leipzig, Physics
Adviser: Dr. Thomas Neuhaus, JSC
This report gives implementation details and results for a program that simulates the Ising model with local Metropolis updates on a present-day NVIDIA GPU architecture for scientific computing. The results are comparable with those of previously published implementations of a similar model on the same hardware architecture. The correctness of the code is illustrated with results for physical observables from simulations carried out with the program. Speed-ups relative to a CPU implementation of the algorithm are given for different system sizes, with ECC enabled and disabled on the GPU memory. The random number generators used on the GPU (Mersenne Twister and XORWOW) are briefly discussed and their performance is compared, since their time cost is of the same order as that of the Metropolis updates.
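As a CPU reference, local Metropolis updates are often organised in a checkerboard pattern: on a square lattice with even side length and periodic boundaries, sites of one colour interact only with sites of the other colour, so each half-sweep can be parallelised, for example with one GPU thread per site. The following Python sketch (coupling J = 1, no external field) shows that update scheme; it is not the author's GPU code, and the function names are hypothetical. Python's `random` module is itself a Mersenne Twister generator:

```python
import math
import random


def metropolis_sweep(spins, beta, rng):
    """One checkerboard sweep of local Metropolis updates (J = 1, h = 0).

    Sites of one sublattice have no neighbours on the same sublattice, so
    each half-sweep could be done in parallel. Requires an even lattice
    size L because of the periodic boundaries.
    """
    L = len(spins)
    for parity in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != parity:
                    continue
                nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                delta_e = 2 * spins[i][j] * nb  # energy change if flipped
                # Accept the flip with probability min(1, exp(-beta * dE)).
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def magnetisation(spins):
    """Magnetisation per spin."""
    return sum(map(sum, spins)) / len(spins) ** 2


rng = random.Random(12345)
L = 16
spins = [[1] * L for _ in range(L)]  # ordered start: all spins up
for _ in range(100):
    metropolis_sweep(spins, beta=1.0, rng=rng)
```

At beta = 1.0, well inside the ordered phase (the critical coupling of the 2D Ising model is about 0.4407), the magnetisation per spin stays close to 1.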
Volume Visualisation using the Tetrahedron Method
Kaustubh Bhat, German Research School for Simulation Sciences Aachen
Adviser: Prof. Erik Koch, GRS
In solid state physics applications, volumetric data frequently need to be visualised, and integrals over these volumes calculated, in order to interpret and understand the physical aspects of the model under consideration. With this in mind, we develop an algorithm for finding iso-surfaces from discrete scalar data on three-dimensional meshes. Once the iso-surfaces are formed, we calculate the integrals of the scalar function over the volume enclosed by each iso-surface. These volume integrals can be used to obtain quantities such as the electronic charge or spin within the volume. The technique is based on the marching tetrahedra scheme. The method is implemented in code so that the resulting iso-surfaces can be visualised interactively. The code can also iteratively determine the iso-value that yields a given integral over the volume enclosed by the iso-surface.
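The core step of marching tetrahedra, locating where the linearly interpolated field crosses the iso-value on the edges of a single tetrahedron, can be sketched as follows, together with a bisection search for the iso-value that reproduces a prescribed volume integral. This is a minimal sketch: the function names are hypothetical, and `integral_of` stands for whatever routine integrates the scalar field over the enclosed region, assumed to be continuous and to decrease monotonically as the iso-value rises:

```python
from itertools import combinations


def tet_iso_points(verts, vals, iso):
    """Points where the linear interpolant on one tetrahedron equals `iso`.

    verts: four (x, y, z) corners; vals: scalar value at each corner.
    Returns 3 points (a triangle), 4 points (a quad, which a full
    implementation orders and splits into two triangles), or [].
    """
    points = []
    for a, b in combinations(range(4), 2):  # the six edges
        fa, fb = vals[a] - iso, vals[b] - iso
        if (fa < 0) != (fb < 0):  # sign change: the surface crosses this edge
            t = fa / (fa - fb)    # linear interpolation parameter in [0, 1]
            points.append(tuple(pa + t * (pb - pa)
                                for pa, pb in zip(verts[a], verts[b])))
    return points


def find_iso_value(integral_of, target, lo, hi, tol=1e-10):
    """Bisection for the iso-value whose enclosed integral equals `target`.

    Assumes integral_of(lo) >= target >= integral_of(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral_of(mid) > target:
            lo = mid  # too much enclosed: raise the iso-value
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In a full implementation each cell of the mesh would be split into tetrahedra (commonly five or six per cube) and `tet_iso_points` applied to each of them.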
Quantum Chemical Calculations on the Potential Energy Surface of Ozone
Janine George, RWTH Aachen
Adviser: Dr. Thomas Müller, JSC
This work focuses on the quantum chemical calculation of the ground-state potential energy surface of ozone and, in particular, on the minimum energy path within the dissociation threshold for the reaction O₃(¹A₁) → O(³P) + O₂(³Σg⁻). To improve these quantum chemical calculations, mainly internally contracted Multi-reference Averaged Quadratic Coupled-Cluster (ic-MR-AQCC) and internally contracted Multi-reference Configuration Interaction with all Single and Double excitations with Davidson or Pople correction (ic-MR-CISD+QD/+QP) energies were compared with uncontracted MR-AQCC and MR-CISD+QD/+QP energies. The barrier along the minimum energy path cut disappears when the uncontracted methods are used instead of the internally contracted ones. This can be explained by the larger amount of electron correlation energy taken into account by the uncontracted methods. The uncontracted MR-AQCC and MR-CISD+QD/+QP methods overestimate the experimental dissociation energy of 1.143 eV by 0.030 eV, even with the finite cc-pV5Z basis set. While the basis set superposition error (BSSE) can be ruled out as the source of this discrepancy, the size-consistency corrections may be a possible source of error.
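The Davidson correction (+Q) mentioned above estimates the contribution of higher excitations missing from MR-CISD as (1 − c0²)(E_MR-CISD − E_ref), where c0² is the weight of the reference configurations in the normalised MR-CISD wave function. A minimal Python sketch of this textbook formula follows; the function name is hypothetical and the Pople variant (+QP) is not shown:

```python
def davidson_correction(e_ref, e_cisd, c0):
    """Davidson (+Q) estimate of the energy of missing higher excitations.

    e_ref:  energy of the reference wave function (hartree)
    e_cisd: MR-CISD energy (hartree)
    c0:     norm of the reference part of the normalised CI vector,
            so c0**2 is the reference weight
    The corrected energy is e_cisd + davidson_correction(...).
    """
    return (1.0 - c0 ** 2) * (e_cisd - e_ref)
```

With made-up values E_ref = −1.0 hartree, E_MR-CISD = −1.1 hartree and c0 = 0.9, the correction is about −0.019 hartree. Separately, the figures in the abstract imply a computed dissociation energy of about 1.143 + 0.030 = 1.173 eV.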
Evaluation of preconditioners for large sparse matrices
Alexander Alperovich, Tel Aviv University, Israel
Adviser: Dr. Bernhard Steffen, JSC
In this report I present the results of an examination and benchmark of several software packages for solving linear systems of equations with various preconditioners. The preconditioners examined are based on the algebraic multigrid method, incomplete LU decomposition and Frobenius norm approximation. The packages use Krylov-based iterative methods such as restarted Generalized Minimum Residual (GMRES) or Conjugate Gradient (CG) as solvers, with different approaches selectable as the preconditioner.
The benchmarks were performed on the JSC supercomputers JUGENE and JUROPA.
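To show where a preconditioner enters a Krylov solver, here is a minimal preconditioned Conjugate Gradient sketch in Python. It uses the simple Jacobi (diagonal) preconditioner as a stand-in: the AMG, ILU and Frobenius-norm-based preconditioners examined above would replace the multiplication by `inv_diag` with their own, more effective approximation of the inverse of A. Dense lists of lists and the helper names are illustrative choices only:

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate Gradient with a Jacobi preconditioner M = diag(A).

    A must be symmetric positive definite. Returns (x, iterations).
    """
    n = len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return [0.0] * n, 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # application of M^{-1}
    x = [0.0] * n
    r = list(b)                                   # r = b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

GMRES differs in that it does not require symmetry, but the preconditioner enters in the same way, as an approximate solve with M applied to residual-like vectors.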
A Coulomb Solver Based on a Parallel NFFT for the ScaFaCoS Library
Sebastian Banert, Chemnitz University of Technology
Advisers: René Halver, Dr. Godehard Sutmann, JSC
We describe an NFFT-accelerated Ewald summation for calculating electrostatic potentials and forces in a periodic system of charged particles. Furthermore, we briefly present our implementation of this method within the ScaFaCoS library and show results for some special systems. Finally, we give suggestions for further research and development.
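The splitting behind Ewald summation can be made concrete with a direct, non-accelerated implementation for a neutral system in a cubic box, in Gaussian units. Here the reciprocal-space sum is evaluated explicitly over all wave vectors up to a cut-off; this is the part that NFFT-based methods accelerate. The function and its parameters (`alpha`, `n_real`, `m_max`) are illustrative choices, not the ScaFaCoS interface, and the real-space part assumes that erfc(alpha * box) is negligible:

```python
import cmath
import itertools
import math


def ewald_energy(positions, charges, box, alpha, n_real=1, m_max=8):
    """Electrostatic energy of a neutral periodic system (Gaussian units).

    Direct Ewald sum: real-space erfc part over image shifts |n_i| <= n_real,
    reciprocal part over wave vectors 2*pi*m/box with |m_i| <= m_max,
    plus the self-energy correction.
    """
    n = len(charges)
    # Real-space part: short-ranged, screened by erfc.
    e_real = 0.0
    for shift in itertools.product(range(-n_real, n_real + 1), repeat=3):
        for i in range(n):
            for j in range(n):
                if i == j and shift == (0, 0, 0):
                    continue  # no self-interaction in the home cell
                r = math.sqrt(sum((positions[i][a] - positions[j][a] + shift[a] * box) ** 2
                                  for a in range(3)))
                e_real += 0.5 * charges[i] * charges[j] * math.erfc(alpha * r) / r
    # Reciprocal-space part: smooth and long-ranged.
    e_recip = 0.0
    for m in itertools.product(range(-m_max, m_max + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = [2.0 * math.pi * mi / box for mi in m]
        k2 = sum(ka * ka for ka in k)
        # Structure factor S(k) = sum_j q_j exp(i k . r_j)
        s = sum(q * cmath.exp(1j * sum(k[a] * p[a] for a in range(3)))
                for q, p in zip(charges, positions))
        e_recip += math.exp(-k2 / (4.0 * alpha ** 2)) / k2 * abs(s) ** 2
    e_recip *= 2.0 * math.pi / box ** 3
    # Self-energy of the Gaussian screening charges.
    e_self = -alpha / math.sqrt(math.pi) * sum(q * q for q in charges)
    return e_real + e_recip + e_self
```

Placing charges +1 at (0, 0, 0) and −1 at (5, 5, 5) in a box of side 10 gives a CsCl-type lattice. Calls with alpha = 0.4 and alpha = 0.5 should return nearly the same energy, since the total does not depend on the splitting parameter, and the value should be close to −M/r0 with the CsCl Madelung constant M ≈ 1.7627 and r0 = 5√3.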
Viscous flow through fractal contacts
Andreas Lücke, Universität Paderborn
Adviser: Prof. Martin Müser, NIC
In this report we investigate and simulate fluid leakage through a seal. The flow is obtained by numerically solving the Reynolds lubrication equation for fractal contacts. Fractal topographies with different Hurst roughness exponents are generated to create the simulation cells. The effect of an external force pressing a flat elastic body against the rigid substrates is calculated with Green's function molecular dynamics (GFMD) and with the overlap model. Finally, the flow currents for the two models are compared.
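For steady, incompressible flow with constant viscosity μ, the Reynolds lubrication equation for a gap field h(x, y) reads div(h³/(12μ) grad p) = 0. The following sketch solves it with a simple finite-volume Gauss-Seidel iteration; the discretisation, the harmonic averaging of face conductances and the function names are illustrative choices, not the author's solver. In the project the gap field would come from the contact mechanics (GFMD or the overlap model); here it is simply given:

```python
def harmonic_mean(a, b):
    """Face conductance between two cells; zero if either cell is closed."""
    return 2.0 * a * b / (a + b) if a > 0 and b > 0 else 0.0


def solve_reynolds(h, p_in=1.0, p_out=0.0, mu=1.0, tol=1e-12, max_iter=20000):
    """Steady Reynolds equation div(h^3/(12 mu) grad p) = 0 on a grid.

    h[i][j] is the local gap (0 means contact). Pressure is fixed at the
    inlet column i = 0 and the outlet column i = nx - 1; y is periodic;
    unit grid spacing. Returns (p, q), q being the total volume flux
    from column 0 to column 1.
    """
    nx, ny = len(h), len(h[0])
    k = [[hij ** 3 / (12.0 * mu) for hij in row] for row in h]
    # Start from a linear pressure drop between inlet and outlet.
    p = [[p_in + (p_out - p_in) * i / (nx - 1)] * ny for i in range(nx)]
    for _ in range(max_iter):
        change = 0.0
        for i in range(1, nx - 1):  # Gauss-Seidel sweep over interior cells
            for j in range(ny):
                jm, jp = (j - 1) % ny, (j + 1) % ny
                nbrs = [(harmonic_mean(k[i][j], k[i + 1][j]), p[i + 1][j]),
                        (harmonic_mean(k[i][j], k[i - 1][j]), p[i - 1][j]),
                        (harmonic_mean(k[i][j], k[i][jp]), p[i][jp]),
                        (harmonic_mean(k[i][j], k[i][jm]), p[i][jm])]
                s = sum(w for w, _ in nbrs)
                if s == 0.0:
                    continue  # cell sealed off by contact: no flow through it
                new = sum(w * pn for w, pn in nbrs) / s
                change = max(change, abs(new - p[i][j]))
                p[i][j] = new
        if change < tol:
            break
    q = sum(harmonic_mean(k[0][j], k[1][j]) * (p[0][j] - p[1][j]) for j in range(ny))
    return p, q
```

For a uniform gap the pressure drop is linear and the flux reduces to the parallel-plate result; a column of zero gap across the whole channel seals it, and the leakage vanishes.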
GPU based visualization of Adaptive Mesh Refinement data
Francesco Piccolo, Seconda Università di Napoli, Italy
Adviser: Dr. Herwig Zilken, JSC
The goal of this work is a CUDA implementation of a GPU-based raycasting algorithm and an octree traversal in order to speed up the visualization of AMR datasets. For this purpose, complex data structures are employed to map the entire dataset to graphics memory. The data are stored in GPU memory using an octree-texture-based method, and the data lookup is based on a reduced-stack traversal algorithm. The visualization algorithm exploits the inherently hierarchical data structure for efficient visualization. Volume raycasting and hierarchical data retrieval are both computationally demanding, massively parallel problems.
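The data lookup can be illustrated by the simplest related operation: descending an octree to the leaf that contains a single sample point, which a raycaster performs for the samples along each ray. This CPU sketch is not the reduced-stack ray traversal of the project; the class, the child-index convention and the unit-cube domain are assumptions made for illustration:

```python
class OctreeNode:
    """Node of a hypothetical AMR octree over an axis-aligned cube.

    Leaves store a scalar value; inner nodes store 8 children, indexed by
    child = 4*(x >= mid_x) + 2*(y >= mid_y) + (z >= mid_z).
    """

    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children  # None for a leaf, else a list of 8 nodes


def lookup(root, point, origin=(0.0, 0.0, 0.0), size=1.0):
    """Descend from the root to the leaf containing `point`.

    Only the current node and its cube are tracked, so a single point
    query needs no explicit stack. Returns (leaf value, depth).
    """
    node, depth = root, 0
    ox, oy, oz = origin
    while node.children is not None:
        half = size / 2.0
        ix = point[0] >= ox + half
        iy = point[1] >= oy + half
        iz = point[2] >= oz + half
        node = node.children[4 * ix + 2 * iy + iz]
        ox, oy, oz = ox + ix * half, oy + iy * half, oz + iz * half
        size = half
        depth += 1
    return node.value, depth
```

Refined regions of the AMR hierarchy simply correspond to deeper subtrees, so points in finer regions return leaves at a greater depth.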
Brain volume reconstruction - parallel implementation of unimodal registration
Petar Sirkovic, University of Zagreb, Croatia
Advisers: Timo Dickscheid, INM-1; Oliver Bücker, Dr. Bernhard Steffen, JSC
To construct a 3D brain volume, a large number of 2D brain slice images are combined. These images are usually significantly deformed during preparation and have to be mapped to their correct geometric positions. When no geometrically correct reference image is available, a sequence of mappings between neighbouring images is produced instead. This process is usually strictly sequential. Computing a single binary registration takes several hours, so producing a full brain volume takes several months. The purpose of this work is to test the parallelisation of this process, i.e. of the concatenation of binary registrations, and to investigate the potential problems that arise along the way.
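The structure of the problem can be sketched as two stages: the pairwise registrations between neighbouring slices are independent of each other and can run concurrently, and the mapping of each slice into the frame of the first slice is then a prefix product of the pairwise transforms. The sketch below assumes affine 2D transforms in homogeneous 3×3 form, whereas real registrations are usually non-linear, and `register` is a hypothetical toy stand-in that returns pure translations:

```python
from concurrent.futures import ThreadPoolExecutor


def matmul3(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    """Homogeneous 2D translation matrix."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def apply(m, x, y):
    """Apply a homogeneous 2D transform to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def register(fixed_offset, moving_offset):
    """Toy stand-in for an expensive pairwise registration (hypothetical).

    Each slice is described only by the offset of its frame; the result
    maps coordinates of the moving slice into the fixed slice's frame.
    """
    return translation(moving_offset[0] - fixed_offset[0],
                       moving_offset[1] - fixed_offset[1])


def concatenate(pairwise):
    """pairwise[k] maps slice k+1 into the frame of slice k.

    Returns cumulative[k], mapping slice k into the frame of slice 0.
    This is a prefix product, so it could also be parallelised as a scan.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    cumulative = [identity]
    for t in pairwise:
        cumulative.append(matmul3(cumulative[-1], t))
    return cumulative


offsets = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (6.0, 3.0)]
# The pairwise registrations are independent and can run concurrently.
with ThreadPoolExecutor() as pool:
    pairwise = list(pool.map(register, offsets[:-1], offsets[1:]))
cumulative = concatenate(pairwise)
```

One consequence of the chained approach is that errors of the individual registrations propagate into every later slice, which is one of the issues a parallel implementation still has to deal with.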
Porting and optimization of a Lattice Boltzmann D2Q37 code to Blue Gene/Q
Fabio Pozzati, University of Bologna, Italy, Computer Science
Advisers: Prof. Dirk Pleiter, Willi Homberg, JSC
We describe the implementation and optimization of a Lattice Boltzmann code for computational fluid dynamics on the massively parallel Blue Gene/Q architecture. We analyze its behaviour and performance on a prototype BG/Q system installed at the IBM research lab in Böblingen. By exploiting the large degree of parallelism of the underlying algorithm, it is possible to use all the parallel resources available in the architecture (multi-node, multi-core, SIMD).
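The source of that parallelism can be seen in the basic collide-and-stream update. For brevity, the sketch below swaps in the much smaller D2Q9 lattice with a BGK collision operator instead of D2Q37, which uses 37 velocities and a more elaborate equilibrium; the function names are hypothetical. The collision step is purely local to each site, and the streaming step only moves data between nearby sites, which is what allows a distribution over nodes, cores and SIMD lanes:

```python
# D2Q9 velocity set and weights (a stand-in for the 37-velocity D2Q37 model).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium populations for one lattice site."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide_and_stream(f, tau):
    """One BGK update on a periodic nx x ny grid; f[x][y] holds 9 populations."""
    nx, ny = len(f), len(f[0])
    # Collision: purely local, so all sites can be updated independently.
    post = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            fs = f[x][y]
            rho = sum(fs)
            ux = sum(fi * c[0] for fi, c in zip(fs, C)) / rho
            uy = sum(fi * c[1] for fi, c in zip(fs, C)) / rho
            feq = equilibrium(rho, ux, uy)
            post[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(fs, feq)]
    # Streaming: population i moves one link along c_i (periodic boundaries).
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                new[(x + cx) % nx][(y + cy) % ny][i] = post[x][y][i]
    return new
```

Both steps conserve total mass and momentum on a periodic grid, which makes a convenient sanity check when porting or vectorising such a kernel.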