DataLad: Research Software for Research Data Management

Reproducible, reliable, transparent, and FAIR science – ultimately, this is what research software engineering (RSE) strives to contribute to. But it is not alone in its quest: its companion, research data management (RDM), is right by its side. And, let's face it, whatever your field of study, your projects likely benefit from both.
The landscape of research data management tools is vast, and among the many open source tools out there, one is developed right next door: the DataLad ecosystem (datalad.org). Although it is an international open source project, many of its developers and contributors work here at FZJ.
The intersection of RDM and RSE is a fun field to work in – not only because RDM and RSE are “a match made in heaven” [1], but also because RDM can learn from RSE principles [2]. DataLad was conceived to solve common data management challenges, coming from the perspective of software development. At its core, building upon Git and git-annex, it is a version control system for data, providing transport logistics and tooling for local or distributed, solitary or collaborative data management.
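To make the "version control system for data" idea a little more tangible, here is a toy sketch of the principle git-annex builds on: file content is stored under a key derived from its checksum, and only a lightweight reference to that key needs to be tracked in version control. This is not DataLad's or git-annex's actual implementation; the function names, the key format, and the store layout are all made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def toy_annex_add(store: Path, data_file: Path) -> str:
    """Move a file's content into a checksum-keyed store (hypothetical helper).

    The file itself is replaced by a tiny pointer holding the key, which is
    the part a version control system would track.
    """
    content = data_file.read_bytes()
    # Illustrative key format, not the one git-annex uses.
    key = "SHA256-" + hashlib.sha256(content).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    (store / key).write_bytes(content)
    data_file.write_text(key)
    return key


def toy_annex_get(store: Path, data_file: Path) -> bytes:
    """Resolve a pointer file back to the content it refers to."""
    key = data_file.read_text()
    return (store / key).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        scan = Path(tmp) / "scan.dat"
        scan.write_bytes(b"\x00" * 1_000_000)  # stand-in for a large data file

        key = toy_annex_add(store, scan)
        print("pointer size in bytes:", scan.stat().st_size)
        print("content size in bytes:", len(toy_annex_get(store, scan)))
```

Because the key depends only on the content, two identical files end up with the same key, and the small pointer can travel through version control while the content itself is fetched on demand from wherever it is stored.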

On top of that, it adds features for reproducible science, support for big [3] data, an extension mechanism that lets you customize the tool to your needs, and more. If you’re eager to learn all about what DataLad can do, it has a handbook [4]. And a YouTube channel [5]. And a theme park [6].
But what sparks joy for us developers in particular is solving real-life RDM challenges. In a weekly virtual office hour and a Matrix chatroom, the DataLad team is available to anyone who wants to chat about the tool, its use cases, concrete applications, or feature requests. We’re developers, system administrators, data stewards, scientists and DataLad users, and we’re looking forward to meeting you and your use case online.
Written by Adina Wagner.
[1] Check out Oliver Bertuch's talk from the HiRSE seminar on this: https://doi.org/10.5281/zenodo.6275987
[2] Still not tired of checking out talks? Have another one: https://doi.org/10.5281/zenodo.15845762
[3] Not your regular big, but big. Seriously big. You can listen to Julia Thoenissen's talk for an example: https://www.distribits.live/talks/2024/thoenissen-microscopy/
[4] https://handbook.datalad.org
[5] https://youtube.com/datalad
[6] I’m kidding. But there is cool stuff being built with it, also at FZJ. For example, a data portal with climate data (atris.fz-juelich.de), a metadata management system (dfg-cp-survey.trr379.de/ui), or a framework for reproducible processing (doi.org/10.1038/s41597-022-01163-2). And, hey, there is a free conference (distribits.live), too!