Navigation and service


Pretty Efficient Parallel Coulomb Solver

PEPC - The Pretty Efficient Parallel Coulomb Solver


The oct-tree method was originally introduced by Josh Barnes & Piet Hut in the mid 1980s to speed up astrophysical N-body simulations with long range interactions, see Nature 324, 446 (1986). Their idea was to use successively larger multipole-groupings of distant particles to reduce the computational effort in the force calculation from the usual O(N2) operations needed for brute-force summation to a more amenable O(N log N). Though mathematically less elegant than the Fast Multipole Method, the Barnes-Hut algorithm is well suited to dynamic, nonlinear problems and can be combined with multiple-timestep integrators.

The PEPC project (Pretty Efficient Parallel Coulomb Solver) is a public tree code that has been developed at Jülich Supercomputing Centre since the early 2000s. Our tree code is a non-recursive version of the Barnes-Hut algorithm, using a level-by-level approach to both tree construction and traversals. The parallel version is a hybrid MPI/PThreads implementation of the Warren-Salmon 'Hashed Oct-Tree' scheme, including several variations of the tree traversal routine - the most challenging component in terms of scalability.

PEPC ScreenshotSimulation of Interaction of Laser Radiation with neutral nanocluster with 3871 ions and electrons. Bottom left: Spacefilling Hilbert curve used in the code for domain decomposition; Bottom right: Oct-tree domain decomposition represented by the so-called branch nodes.

The code is structurally divided into three parts:

  1. kernel routines that handle all tree code specific data structures and communication as well as the actual tree traversal.
  2. interaction-specific modules, i.e. routines that apply to specific interaction kernels and multipole expansions. Currently, the following interaction kernels are available:

    • Coulomb-interaction/gravitation,
    • algebraic kernels for vortex methods,
    • nearest-neighbour interactions for smooth particle hydrodynamics (SPH).
  3. 'front-end' applications. For example

PEPC StructureStructure of the Treecode Framework. Currently, PEPC supports three interaction-specific modules and several respective frontends for different applications ranging from plasma physics through vortex-particle methods to smooth particle hydrodynamics. Due to well-defined interfaces, symbolized by the grey blocks, the implementation of further interaction models as well as additional frontends is straightforward.

Due to this structure, the adaption to further interaction kernels as well as additional applications and experimental variations to the tree code algorithm can be implemented conveniently.

Implementation details and scaling

PEPC itself is written in Fortran 2003 with some C wrappers for POSIX functions. In addition to the main PThreads-parallelised tree traversal, a version based on OpenMP is available and there has also been a version using SMPSs. Besides the hybrid parallelisation, a number of improvements to the original Warren-Sallmon ‘hashed oct-tree’ scheme have been included and allow for an excellent scaling of the code on different architectures with up to 2,048,000,000 particles across up to 294,912 processors (at then largest machine at JSC: JUGENE).

PEPC Scaling JUGENEStrong scaling of the tree traversal and force computation part of the algorithm for different total particle numbers N and compute cores C for the homogeneous (top) and inhomogeneous (bottom) particle distributions on the Blue Gene/P installation JuGene. The insets show the test particle setups used for the performance analysis.

In addition, it has been adapted to > 32 parallel threads per MPI rank to take full advantage of the capabilities of the BlueGene/Q installation JUQUEEN at JSC. As a result it shows great parallel scalability across the full machine.

PEPC Scaling JUQUEENWeak scaling and parallel efficiency of the parallel Barnes-Hut tree code PEPC across the full Blue Gene/Q installation JUQUEEN at JSC.

The code runs on IBM BlueGene/P (JUGENE) and /Q architectures (JUQUEEN), the Intel Cluster (JUROPA, JURECA, JUWELS), standard Linux clusters and workstations as well as OSX machines. In principle, it should be portable to any Unix-based parallel architecture. The README file provides an introduction to compiling and running the code.

A detailed description of the algorithm used by PEPC can be found in the technical report  PEPC: Pretty Efficient Parallel Coulomb-solver (PDF, 136 kB) ; variations of the parallel tree traversal routine in  NIC Series 33, 367 (2005) (PDF, 493 kB) . The physical model for the laser-plasma application is described in Phys. Plasmas 11, 2806 (2004). Latest improvements and scaling results are explained in Comp. Phys. Comm. 183, 880 (2012).

A tutorial introduction to tree codes can be found here.


If you would like to test the latest developments you can request svn access via