Performance Evaluation of the LBM Solver Musubi on Various HPC Architectures

Jiaxing Qi, Universität Siegen

This report presents the performance of Musubi, a Lattice Boltzmann solver for the incompressible Navier-Stokes equations. Its performance and scalability are shown on various HPC architectures. The Lattice Boltzmann discretization offers a simple numerical scheme with explicit time integration on Cartesian meshes. Its stencil involves only direct neighbors, which makes it well suited for parallel execution on large-scale systems. Musubi is maintained within the APES simulation framework, which uses a distributed octree mesh representation and includes a mesh generation tool and a post-processing tool to enable fully parallel simulation workflows. The mesh is held in an unstructured representation, so only fluid elements are stored and computed, at the cost of an indirection in the data access. Storing only fluid elements matters most for complex geometries, where the fluid elements fill only a fraction of the overall volume. Musubi uses double buffering for the state representation to eliminate spatial data dependencies. The efficiency of this approach is evaluated on three HPC systems with different micro-architectures: the Blue Gene/Q (IBM PowerPC) system JUQUEEN at the FZJ in Jülich, the x86 (Intel Haswell) system SuperMUC at the LRZ in Munich, and the NEC SX-ACE system Kabuki at HLRS in Stuttgart. The node-level performance and the scaling behaviour are presented and compared across these systems. Hybrid parallelism with OpenMP and MPI is examined, as is vectorization.
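To illustrate the two data-layout ideas named in the abstract (indirect neighbor access over a list of fluid elements, and double buffering of the state), here is a minimal sketch. It is not taken from Musubi, which is a separate code base. It assumes a one-dimensional D1Q3 lattice with a BGK collision on a periodic chain, and all names (`build_neighbors`, `step`, `run`, `OMEGA`) are hypothetical.

```python
# Illustrative sketch only: D1Q3 BGK on a periodic chain of fluid elements.
# Discrete velocities and weights of the standard D1Q3 lattice.
C = (0, 1, -1)
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)
OMEGA = 1.0  # hypothetical relaxation parameter


def build_neighbors(n_fluid):
    """neighbors[q][i] is the element that element i pulls direction q from.

    In a sparse mesh this table would skip solid cells; here the chain is
    simply periodic, so the table is a shifted index list.
    """
    return [[(i - c) % n_fluid for i in range(n_fluid)] for c in C]


def equilibrium(rho, u):
    """Second-order D1Q3 equilibrium with squared speed of sound 1/3."""
    return [w * rho * (1.0 + 3.0 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for c, w in zip(C, W)]


def step(f_old, f_new, neighbors):
    """One fused stream-collide update: read only f_old, write only f_new."""
    n_fluid = len(f_old[0])
    for i in range(n_fluid):
        # Streaming by indirection: gather incoming values via the table.
        pulled = [f_old[q][neighbors[q][i]] for q in range(3)]
        rho = sum(pulled)
        u = sum(c * f for c, f in zip(C, pulled)) / rho
        feq = equilibrium(rho, u)
        for q in range(3):
            f_new[q][i] = pulled[q] + OMEGA * (feq[q] - pulled[q])


def total_mass(f):
    return sum(sum(row) for row in f)


def run(n_fluid=16, n_steps=50):
    neighbors = build_neighbors(n_fluid)
    # Initial state: fluid at rest, with a small density bump on element 0.
    f_a = [[0.0] * n_fluid for _ in range(3)]
    for i in range(n_fluid):
        feq = equilibrium(1.1 if i == 0 else 1.0, 0.0)
        for q in range(3):
            f_a[q][i] = feq[q]
    f_b = [[0.0] * n_fluid for _ in range(3)]
    for _ in range(n_steps):
        step(f_a, f_b, neighbors)
        f_a, f_b = f_b, f_a  # swap buffers instead of copying
    return f_a
```

Because each update reads only the old buffer and writes only the new one, the loop over elements has no spatial data dependency, so its iterations could be distributed over threads or vector lanes in any order. The price is a second copy of the state and the indexed loads through the neighbor table.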

Talk as pdf file

Last Modified: 23.11.2022