Tutorial at the CNS Meeting in Paris, France
Managing complex workflows in neural simulation and data analysis
- Start: 13 Jul 2013, 09:00
- End: 13 Jul 2013, 16:50
Andrew P Davison, UNIC, CNRS, Gif-sur-Yvette, France
Sonja Grün, Research Center Jülich, Germany
Michael Denker, Research Center Jülich, Germany
In our attempts to uncover the mechanisms that govern brain processing at the level of interacting neurons, neuroscientists face the sheer complexity of neuronal networks. Neuronal simulations are nowadays performed with a high degree of detail and cover large, heterogeneous networks. Experimentally, electrophysiologists can record simultaneously from hundreds of neurons during complex behavioral paradigms. The data streams of simulation and experiment are thus highly complex, and their analysis becomes most interesting when it takes their intricate correlation structure into account.
The growth in data volume, parameter complexity, and analysis difficulty places a heavy burden on researchers in several respects. Experimenters, who traditionally need to cope with many sources of variability, require efficient ways to record the wealth of details of their experiments (“metadata”) concisely and in machine-readable form. Moreover, collaboration between simulation, experiment, and analysis requires common interfaces for data and software tool chains, as well as clearly defined terminologies. Most importantly, however, neuroscientists find it increasingly difficult to reliably repeat previous work, one of the cornerstones of the scientific method. At first sight this ought to be easy in simulation or data analysis, since computers are deterministic and do not suffer from biological variability. In practice, however, the complexity of the subject matter and the long time scales of typical projects demand a level of disciplined book-keeping and detailed organization that is difficult to sustain.
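To make the idea of concise, machine-readable metadata concrete, here is a minimal sketch in plain Python. It is not the odML format or its API: the section/property layout, the helper names (`prop`, `get_property`, `missing_units`), and all values are hypothetical. Each property stores its value together with its unit, so a script can look properties up by path and flag numeric entries whose unit was never recorded.

```python
import json


def prop(value, unit=None):
    """A property: a value plus its unit (None when no unit was recorded)."""
    return {"value": value, "unit": unit}


# Hypothetical description of one recording session, grouped into sections.
metadata = {
    "Subject": {
        "species": prop("Rattus norvegicus"),
        "age": prop(12, unit="weeks"),
    },
    "Recording": {
        "electrode_type": prop("tetrode"),
        "sampling_rate": prop(30000, unit="Hz"),
    },
    "Task": {
        "trial_duration": prop(2.5),  # unit deliberately missing for the check below
    },
}


def is_property(node):
    """Properties are dicts holding exactly a value and a unit; anything else is a section."""
    return isinstance(node, dict) and set(node) == {"value", "unit"}


def get_property(tree, path):
    """Look up a property by a slash-separated path such as 'Recording/sampling_rate'."""
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node


def missing_units(tree, prefix=""):
    """Return the paths of numeric properties that were stored without a unit."""
    missing = []
    for key, node in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if is_property(node):
            if isinstance(node["value"], (int, float)) and node["unit"] is None:
                missing.append(path)
        else:
            missing.extend(missing_units(node, path))
    return missing


# Machine-readable round trip: the same structure can be written to disk and read back.
text = json.dumps(metadata, indent=2)
restored = json.loads(text)
```

Keeping the unit next to each value, rather than in a separate free-text note, is what makes automated checks such as `missing_units` possible.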
The failure to routinely achieve replicability in computational neuroscience (and probably in computational science in general; see [1]) has important implications both for the credibility of the field and for its rate of progress, since reuse of existing code is fundamental to good software engineering. For individual researchers, as the example of ModelDB has shown, sharing reliable code enhances reputation and increases impact.
In this tutorial we will identify the reasons for the difficulties often encountered in organizing and handling data, sharing work within a collaboration, and performing complex yet manageable and reproducible computational experiments and data analyses. We will also discuss best practices for making our work more reliable and more easily reproducible, by ourselves and by others, without adding a large burden to either our day-to-day research or the publication process. We will cover a number of tools that facilitate a reproducible workflow and allow the provenance of results to be traced from a published article back through intermediate analysis stages to the original data, models, and/or simulations. The tools covered include Git, Mercurial, Sumatra, VisTrails, odML, and Neo. Furthermore, we will highlight strategies for validating the correctness, reliability, and limits of novel concepts and code when designing computational analysis approaches (e.g., [8,9,10]).
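The kind of book-keeping that provenance-tracking tools automate can be sketched in a few lines of Python. This is not how Sumatra or VisTrails work internally, and the function names (`run_with_provenance`, `toy_analysis`, `to_json`) are hypothetical. The sketch stores, alongside each result, the Git commit of the code (via `git rev-parse HEAD`, or `None` when Git or a repository is unavailable), the Python version, the parameters, the random seed, and SHA-256 checksums of the input files.

```python
import hashlib
import json
import platform
import random
import subprocess
from datetime import datetime, timezone


def code_version(repo_dir="."):
    """Return the current Git commit hash, or None if Git or the repository is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def file_digest(path):
    """SHA-256 checksum of an input file, so that later runs can detect changed data."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def toy_analysis(rng, n_trials, rate, duration=1.0):
    """Hypothetical analysis: mean spike count per trial of a simulated Poisson process."""
    total = 0
    for _ in range(n_trials):
        # Accumulate exponential inter-spike intervals until the trial ends.
        t = rng.expovariate(rate)
        while t < duration:
            total += 1
            t += rng.expovariate(rate)
    return total / n_trials


def run_with_provenance(analysis, parameters, input_files, seed):
    """Run `analysis` with a seeded RNG and return its result inside a provenance record."""
    rng = random.Random(seed)  # explicit seed, so the run can be repeated exactly
    result = analysis(rng, **parameters)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "parameters": parameters,
        "seed": seed,
        "inputs": {path: file_digest(path) for path in input_files},
        "result": result,
    }


def to_json(record):
    """Serialize a record so it can be stored next to the result it describes."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    record = run_with_provenance(toy_analysis, {"n_trials": 200, "rate": 5.0}, [], seed=42)
    print(to_json(record))
```

Because the seed is stored, rerunning with the recorded parameters and seed on the same code version and Python version should reproduce the result exactly; a differing commit hash or input checksum points to what else has changed.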
[1] Donoho et al. (2009), 15 Years of Reproducible Research in Computational Harmonic Analysis, Computing in Science and Engineering 11:8–18, doi:10.1109/MCSE.2009.15
[8] Pazienti & Grün (2006), Robustness of the significance of spike correlation with respect to sorting errors, Journal of Computational Neuroscience 21:329–342
[9] Louis et al. (2010), Generation and selection of surrogate methods for correlation analysis, In: Analysis of parallel spike trains, Grün & Rotter (eds.), Springer Series in Computational
[10] Louis et al. (2010), Surrogate spike train generation through