# Guest Student Programme 2010

## Proceedings

Speck, Robert and Winkel, Mathias (Eds.) (2010)

Technical Report IB-2010-04, December 2010

## Abstracts

### Implementation and Evaluation of Integrators for the Fast Multipole Method

**Valentina Banciu, University of Bucharest**

**Adviser: Oliver Bücker, Ivo Kabadshow**

Among alternative linear-scaling and/or low-cost methods, the Fast Multipole Method (FMM) has become the method of choice in applications where accurate potentials are required. In this report, we revisit the method's operators and steps, and analyze different particle integrators with respect to accuracy, stability, efficiency, and memory footprint.

### Hardware and Software Routing on the QPACE parallel computer

**Konstantin Boyanov, DESY Zeuthen**

**Adviser: Willi Homberg**

The torus network of QPACE, currently the most energy-efficient parallel computer in the world, so far allows only nearest-neighbor communication. This communication pattern is sufficient for numerical simulations in the field of Quantum Chromodynamics, the initial target application of QPACE. Nevertheless, expanding the torus network functionality to allow any-to-any communication is very important for broadening the spectrum of applications that can take advantage of QPACE's high-performance, low-energy parallel architecture. Possible extensions to the custom-designed Torus Network (TNW) are considered, and a simple, low-overhead routing algorithm is proposed. Furthermore, the proposed algorithm is implemented and tested in the OMNeT++ event-based simulation environment. We show that the implemented simulation model is a good representation of the real hardware, and we also test and verify the algorithm implementation using communication patterns as they occur during matrix transposition.
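The report proposes its own routing algorithm for the TNW; as a generic illustration of the underlying idea only (not the algorithm from the report, and with hypothetical names), dimension-order routing on a torus walks one dimension at a time, taking the shorter of the two ring directions:

```python
def torus_route(src, dst, dims):
    """Dimension-order routing on a torus: for each dimension in turn,
    step along the shorter of the two ring directions until the
    coordinate matches the destination. Returns the full hop path."""
    path = [tuple(src)]
    pos = list(src)
    for d, size in enumerate(dims):
        while pos[d] != dst[d]:
            forward = (dst[d] - pos[d]) % size   # hops needed in +direction
            step = 1 if forward <= size - forward else -1
            pos[d] = (pos[d] + step) % size      # wraparound links of the torus
            path.append(tuple(pos))
    return path

# On a 4x4x4 torus, (0,0,0) -> (3,1,0) takes one -x wraparound hop
# and one +y hop:
print(torus_route((0, 0, 0), (3, 1, 0), (4, 4, 4)))
# → [(0, 0, 0), (3, 0, 0), (3, 1, 0)]
```

Dimension-order schemes like this are attractive for simple hardware because the next hop depends only on the current position and the destination, with no per-packet routing tables.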

### Development of a parallel, tree-based neighbour-search algorithm

**Andreas Breslau, Cologne University**

**Adviser: Paul Gibbon**

In astrophysics it is quite common to use a combination of an N-body code and an SPH code for the computation of self-gravitating matter. SPH is a Lagrangian method in which particles are used as the discrete elements within a fluid description. Thermodynamic properties are computed at the simulation points from averages over neighbouring particles. To do this it is necessary to know the nearest neighbours of each particle. In this article the implementation of a neighbour-search algorithm using the tree code PEPC is described. The algorithm is based on the existing routine for the force summation, adapted to return neighbour lists instead of multipoles. The correctness of the parallel neighbour search is verified using both visual and quantitative tests. The scaling of the algorithm with particle and process number is shown to be O(N log N) or better.
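The neighbour search described in the report is tree-based and parallel; as a much simpler serial illustration of what a neighbour list is, a fixed-radius search using a uniform cell grid (all names hypothetical, not PEPC code) could look like:

```python
from collections import defaultdict
from itertools import product

def neighbour_lists(points, radius):
    """Fixed-radius neighbour search via a uniform cell grid:
    bin particles into cells of edge length `radius`, then for each
    particle check only the surrounding 3^d cells."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(int(c // radius) for c in p)].append(i)
    r2 = radius * radius
    result = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        cell = tuple(int(c // radius) for c in p)
        for offset in product((-1, 0, 1), repeat=len(p)):
            key = tuple(c + o for c, o in zip(cell, offset))
            for j in cells.get(key, []):
                if j != i and sum((a - b) ** 2 for a, b in zip(p, points[j])) <= r2:
                    result[i].append(j)
    return result

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
print(neighbour_lists(pts, 1.0))  # → {0: [1], 1: [0], 2: []}
```

A tree-based search such as the one in the report achieves the same result for strongly clustered particle distributions, where a uniform grid would waste memory on empty cells.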

### Implementation of a Parallel I/O Module for the Particle-in-Cell Code PSC

**Axel Hübl, Dresden University of Technology**

**Adviser: Anupam Karmakar**

An efficient parallel I/O module for the particle-in-cell code PSC has been developed using the highly scalable library SIONlib, harnessing a one-file-for-all-tasks strategy. This module enables efficient production runs on large-scale HPC systems. Its performance has been extensively tested and compared with the existing one-file-per-task I/O module. The new implementation greatly reduces the resource requirements for data dumping as well as for post-processing for the code PSC.

### Pedestrian dynamics: Implementation and analysis of ODE-solvers

**Timo Hülsmann, Wuppertal University**

**Adviser: Ulrich Kemloh, Mohcine Chraibi**

In this report, two approaches for run-time optimization of the General Centrifugal Force Model (GCFM) are analysed. First, we give a short introduction to modeling pedestrian dynamics and introduce the GCFM. The first approach consists of ordering the pedestrians' data in local memory to preserve data locality; the purpose is to reduce cache misses, and this is done using space-filling curves. This is presented in the second part of the report, where a small decrease in computation time was achieved. In the third part, different ODE solvers are investigated, with fixed and with varying step size: the Velocity Verlet method and different orders of the Runge-Kutta-Fehlberg method. Here an increase in the average step size was achieved.
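The Velocity Verlet method mentioned above is a standard second-order, time-reversible integrator for force-driven models; a minimal self-contained sketch (illustrative only, not the report's implementation) on a harmonic oscillator:

```python
import math

def velocity_verlet(x, v, accel, dt, steps):
    """Velocity Verlet: update position with the current acceleration,
    then update velocity with the average of old and new accelerations.
    Second-order accurate and time-reversible."""
    a = accel(x)
    traj = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = accel(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj.append(x)
    return traj, v

# Harmonic oscillator a(x) = -x with x(0)=1, v(0)=0; exact x(t) = cos(t).
traj, v = velocity_verlet(1.0, 0.0, lambda x: -x, 0.01, 100)
print(abs(traj[-1] - math.cos(1.0)))  # small O(dt^2) discretization error
```

Unlike fixed-step Verlet, the Runge-Kutta-Fehlberg methods investigated in the report adapt the step size from an embedded error estimate, which is what allows the average step size to grow.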

### Integration of high order compact scheme into Multigrid

**Alina Georgiana Istrate, Wuppertal University**

**Adviser: Godehard Sutmann**

A 6th-order compact difference scheme was implemented in a particle-particle particle-mesh code for molecular simulation, where a multigrid method is used for solving the 3D Poisson equation.

### Statistical Modelling of Protein Folding

**Julie Krainau, Humboldt University Berlin**

**Adviser: Sandipan Mohanty, Jan H. Meinke**

The open-source protein folding and aggregation software ProFASi implements a physics-based approach for studying protein folding and thermodynamics using Monte Carlo simulations. In this report, the main ideas of this approach are outlined, along with a qualitative presentation of the various terms of ProFASi's force field and a newly developed method for simulations with constraints. Simulation results for a simple helical peptide, a 73-residue 3-helix bundle protein, and a 76-residue α/β protein are presented and compared.

### Domain Distribution for parallel Modeling of Root Water Uptake

**Martin Licht, Bonn University**

**Adviser: Natalie Schröder, Bernd Körfgen**

Towards a parallel simulation of water transport in coupled soil-root systems, we analyze several strategies for soil domain distributions aligned to root geometries. Our results tentatively indicate the potential for a well-scaling simulation when combined with adaptive mesh refinement and multithreaded root simulation. We identify technical and conceptual questions that may emerge along this direction. For our investigations we extended the MPI program parSWMS with a basic root model.

### Analysis Tools for the Results of Scalasca

**Markus Mayr, Vienna University of Technology**

**Adviser: Brian Wylie, Bernd Mohr**

Scalasca is a tool set for performance analysis of parallel applications. As is customary, the software is tested to detect errors. One of the main steps of the testing procedure is searching for errors in Scalasca's analysis reports. These errors can be of two kinds: the output can be ill-formed, or the measurement or analysis data can be wrong. Both kinds of errors can be detected automatically. We provide tools that analyze Scalasca's output and report errors. This report serves as an overview of this set of testing tools and the library these tools are built upon.

### Modeling of doubly-connected fields of CPV/T solar collectors

**Yosef Meller, Tel Aviv University**

**Adviser: Bernhard Steffen**

### Towards Optimized Parallel Tempering Monte Carlo

**Marco Müller, Leipzig University**

**Adviser: Thomas Neuhaus, Michael Bachmann (IFF)**

Parallel tempering Monte Carlo methods are an important tool for numerical studies of models with large complexity, since interesting questions, like the origin of phase transitions and structure formation, can only be tackled by means of statistical analysis. This work introduces the method and discusses ways of improving performance when using many-core architectures.
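The basic idea of parallel tempering (replica exchange) can be sketched in a few lines: each replica runs Metropolis sampling at its own inverse temperature, and neighbouring replicas occasionally attempt configuration swaps so that cold replicas inherit decorrelated states from hot ones. A minimal serial sketch for a 1D potential (illustrative only, with hypothetical names; the report targets many-core architectures and far more complex models):

```python
import math
import random

def parallel_tempering(energy, betas, steps, step_size=0.5, seed=0):
    """Replica-exchange Monte Carlo: one local Metropolis update per
    replica per sweep, plus a swap attempt between a random pair of
    neighbouring temperatures, accepted with probability
    min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    rng = random.Random(seed)
    xs = [0.0] * len(betas)
    for _ in range(steps):
        for k, beta in enumerate(betas):
            x_new = xs[k] + rng.uniform(-step_size, step_size)
            d_e = energy(x_new) - energy(xs[k])
            if rng.random() < math.exp(min(0.0, -beta * d_e)):
                xs[k] = x_new
        k = rng.randrange(len(betas) - 1)
        d = (betas[k] - betas[k + 1]) * (energy(xs[k]) - energy(xs[k + 1]))
        if rng.random() < math.exp(min(0.0, d)):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
    return xs

# Double-well potential with barrier at x = 0: the hot replicas
# (small beta) cross the barrier and feed states to the cold one.
xs = parallel_tempering(lambda x: (x * x - 1.0) ** 2, [4.0, 2.0, 1.0, 0.5], 2000)
print(xs)
```

Because the replicas are independent between swap attempts, the local update loop is what maps naturally onto many-core hardware, which is the performance angle the report pursues.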

### Scaling of Linear Algebra Library Routines on the IBM BlueGene/P System JUGENE

**Elin Solberg, University of Gothenburg**

**Adviser: Inge Gutheil**

Three different solvers for the dense real symmetric eigenproblem are available in the parallel linear algebra library ScaLAPACK; a fourth one, building on the MR³ algorithm, is planned for inclusion in a future, not yet announced version of the library. This report presents the results of benchmarking the three library solvers as well as a still experimental version of the MR³ solver. The benchmarking was performed on the IBM BlueGene/P system JUGENE, using up to 8192 cores to solve problems with a maximum matrix size of 122880 × 122880. Two main cases were investigated: (1) eigenvalues randomly spread in a given interval and (2) massively clustered eigenvalues. In the first case the scalability and accuracy of the new MR³ solver did not quite meet expectations, whereas in the second case the new solver proved to be a promising alternative to the existing routines.
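For a sense of what a dense symmetric eigensolver does, a toy serial analogue (illustrative only; the ScaLAPACK routines benchmarked in the report use far more sophisticated parallel algorithms) is the classical Jacobi rotation method, which repeatedly zeroes the largest off-diagonal entry:

```python
import math

def jacobi_eigenvalues(a, sweeps=20):
    """Classical Jacobi method for a small dense real symmetric matrix:
    apply two-sided rotations that annihilate the largest off-diagonal
    element until the matrix is numerically diagonal; the diagonal then
    holds the eigenvalues."""
    n = len(a)
    a = [row[:] for row in a]          # work on a copy
    for _ in range(sweeps * n * n):
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < 1e-12:
            break                      # numerically diagonal
        # rotation angle chosen so that the (p, q) entry becomes zero
        theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):             # rotate rows p and q
            apk, aqk = a[p][k], a[q][k]
            a[p][k] = c * apk - s * aqk
            a[q][k] = s * apk + c * aqk
        for k in range(n):             # rotate columns p and q
            akp, akq = a[k][p], a[k][q]
            a[k][p] = c * akp - s * akq
            a[k][q] = s * akp + c * akq
    return sorted(a[i][i] for i in range(n))

print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # eigenvalues 1 and 3
```

Methods like MR³ instead first reduce the matrix to tridiagonal form and then extract eigenpairs with O(n) work per eigenvector, which is what makes the 122880 × 122880 problems in the report feasible at all.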

## Group Photo

Persons in the photo, left to right, front to back:

*1. row:* Paul Gibbon, Elin Solberg, Julie Krainau, Yosef Meller, Marco Müller, Alina Istrate

*2. row:* Oliver Bücker, Konstantin Boyanov, Valentina Banciu, Natalie Schröder, Ulrich Kemloh, Robert Speck, Mathias Winkel

*3. row:* Axel Hübl, Bernhard Steffen, Andreas Breslau, Martin Licht

*4. row:* Markus Mayr, Brian Wylie, Thomas Neuhaus, Sandipan Mohanty, Timo Hülsmann