I/O Performance for the NEST Neural Simulator

The NEST simulator for spiking neuronal networks is designed to be scalable: it runs on laptops, clusters, and current petascale supercomputers, and is intended to extend to emerging exascale architectures. At extreme scales, issues of memory utilization and network size have emerged as challenges to be solved [1]. Another challenge is efficiently writing and storing the results of simulations, since traditional input-output (I/O) libraries have not been designed for the volume of data produced by the vast number of parallel threads in such simulations.

NEST divides the set of neurons in a simulation over all MPI ranks (each rank is associated with a shared memory node) and then subdivides each rank's share among OpenMP threads. Output must therefore be coordinated between all these threads of execution. Traditional approaches have been for each rank to write out a single file that collects all ‘local’ neuron measures, or to ship all data to one rank that handles output to a single file. Writing to as many files as ranks leads to a metadata wall: simply opening the files creates a large number of filesystem lock collisions, which can delay petascale simulations by on the order of an hour. Writing from a single process to a single file, on the other hand, requires that all relevant measurements be available in that node’s shared memory, and it creates load balancing issues both in the HPC equipment’s computational domain and over the distributed file system.
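The two-level decomposition described above can be illustrated with a minimal sketch. It assumes a round-robin assignment of neuron ids to "virtual processes" (one per thread across all ranks); the function names `place_neuron` and `output_writers` are hypothetical and are not part of the NEST API, and the exact assignment rule in any given NEST version may differ.

```python
from collections import defaultdict


def place_neuron(gid, n_ranks, threads_per_rank):
    """Map a neuron id to (rank, thread), assuming round-robin placement.

    Assumption for illustration: neurons are dealt out over
    n_ranks * threads_per_rank virtual processes, and virtual
    processes are in turn dealt out over ranks.
    """
    n_vp = n_ranks * threads_per_rank
    vp = gid % n_vp            # virtual process hosting this neuron
    rank = vp % n_ranks        # MPI rank that owns the virtual process
    thread = vp // n_ranks     # thread index within that rank
    return rank, thread


def output_writers(n_neurons, n_ranks, threads_per_rank):
    """Group neuron ids by the (rank, thread) pair that would record them."""
    writers = defaultdict(list)
    for gid in range(n_neurons):
        writers[place_neuron(gid, n_ranks, threads_per_rank)].append(gid)
    return dict(writers)


if __name__ == "__main__":
    groups = output_writers(n_neurons=8, n_ranks=2, threads_per_rank=2)
    for (rank, thread), gids in sorted(groups.items()):
        print(f"rank {rank}, thread {thread}: neurons {gids}")
```

Every (rank, thread) pair in this sketch is a potential source of output, which is why the number of independent writers, and with it the pressure on the file system, grows with the size of the machine.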

Our Contribution

Our approach has been to implement a SIONlib [2] backend, which scatters output over the distributed file system in use. Output goes into a small number of files, organized to avoid block-level contention between different ranks [3, 4]. A proxy for benchmarking has been developed as well: the proxy is a ‘fake’ NEST simulator designed to test I/O without requiring a full simulation. Its parameters are tuned using performance data from real NEST simulations. Since the SIONlib/NEST output format is a binary format designed to optimize write performance, libraries are also being developed to translate the results into ‘write-once/read-many-times’ formats such as an HDF5 container like NIX or flat ASCII files.
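The idea of packing many ranks into a few files without block-level contention can be sketched as follows. This is an illustration of the general technique only, not SIONlib's API or its actual on-disk layout: the names `align_up` and `chunk_layout`, the round-robin assignment of ranks to files, and the 4096-byte block size are all assumptions made for the example.

```python
def align_up(n_bytes, block_size):
    """Round n_bytes up to the next multiple of block_size."""
    return -(-n_bytes // block_size) * block_size


def chunk_layout(bytes_per_rank, n_files, block_size):
    """Assign each rank a block-aligned chunk in one of n_files shared files.

    Returns {rank: (file_index, offset, chunk_size)}. Because every offset
    and chunk size is a multiple of block_size, no two ranks ever write
    into the same file system block.
    """
    layout = {}
    next_offset = [0] * n_files            # next free byte in each file
    for rank, n_bytes in enumerate(bytes_per_rank):
        file_index = rank % n_files        # hypothetical rank-to-file rule
        chunk = align_up(max(n_bytes, 1), block_size)
        layout[rank] = (file_index, next_offset[file_index], chunk)
        next_offset[file_index] += chunk
    return layout


if __name__ == "__main__":
    layout = chunk_layout([100, 5000, 4096, 1], n_files=2, block_size=4096)
    for rank, (f, offset, size) in layout.items():
        print(f"rank {rank}: file {f}, offset {offset}, chunk {size} bytes")
```

The trade-off in such a layout is some padding at the end of each chunk in exchange for writes that never share a block between ranks, while the number of files to open stays small and independent of the number of ranks.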

NEST I/O performance: Computational time on JUQUEEN for the ASCII backend and for two variations of the SIONlib backend, for both 50 ms and 200 ms simulation times [4]. Time shown is from the beginning to the end of one Simulate() call in NEST.


Our I/O extension to NEST allows us to extend the measurement of spike events with the measurement of any continuous neuron parameter (such as membrane potential) while reducing both the time spent in I/O and, even more importantly, the variance in that time due to the system-wide distribution of I/O events. Particularly for beyond-petascale simulations, this may be crucial for developing the next generation of neural network simulations.


[1] S. Kunkel, M. Schmidt, J. Eppler, H. Plesser, G. Masumoto, J. Igarashi, S. Ishii, T. Fukai, A. Morrison, M. Diesmann, and M. Helias. Spiking network simulation code for petascale computers. Frontiers in Neuroinformatics, 8(78), 2014.

[2] W. Frings, F. Wolf, and V. Petkov. Scalable massively parallel I/O to task-local files. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC ’09, pages 17:1–17:11, New York, NY, USA, 2009. ACM.

[3] T. Schumann, W. Frings, A. Peyser, W. Schenck, K. Thust, and J. Eppler. Modeling the I/O behavior of the NEST simulator using a proxy. In S. Elgeti and J.-W. Simon, editors, Conference Proceedings of the YIC GACM 2015. Publication Server of RWTH Aachen University, 20–23 July 2015.

[4] S. Billaudelle. NEST I/O: Strategies for a peta-scale neural network simulator. In Press.


This work is supported by the Helmholtz Portfolio Theme "Supercomputing and Modeling for the Human Brain".