
Institute for Advanced Simulation (IAS)


Data Management

Data Life Cycle Management

Data life cycle management comprises the strategies, methods, applications, tools, and services required for all data processing steps. It supports data production, transfer, filtering, and analysis through to long-term preservation, curation, and publishing. Research and development in data life cycle management build on the activities of LSDMA's Data Service Integration Team (DSIT) and on data management activities for HPC systems.

Services and Tools

We develop services and tools for data organization, access, curation, preservation, repositories, and archives. The UNICORE middleware supports two major data management features: the ‘UNICORE file transfer protocol’ (UFTP), which transfers data quickly over parallel streams and through firewalls, and a metadata management interface, which allows users to create, update, delete, and search metadata. Further services and tools will be implemented in cooperation with national and international projects and initiatives such as EUDAT (European Data Infrastructure), RDA (Research Data Alliance), CLARIN (Common Language Resources and Technology Infrastructure), and DARIAH (Digital Research Infrastructure for the Arts and Humanities). Community-specific services developed by Data Life Cycle Labs expand the generic portfolio of services and tools for data life cycle management.
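
The four metadata operations named above (create, update, delete, search) can be illustrated with a minimal in-memory sketch. This is not the UNICORE metadata API: the class `MetadataStore`, its method names, and the example paths are hypothetical and serve only to show the shape of such an interface.

```python
from typing import Dict, List


class MetadataStore:
    """Hypothetical in-memory metadata store (not the UNICORE API)."""

    def __init__(self) -> None:
        # Maps a file path to its key/value metadata record.
        self._records: Dict[str, Dict[str, str]] = {}

    def create(self, path: str, metadata: Dict[str, str]) -> None:
        """Attach a new metadata record to a file path."""
        if path in self._records:
            raise KeyError(f"metadata already exists for {path}")
        self._records[path] = dict(metadata)

    def update(self, path: str, changes: Dict[str, str]) -> None:
        """Merge changed or additional fields into an existing record."""
        self._records[path].update(changes)

    def delete(self, path: str) -> None:
        """Remove the metadata record of a file path."""
        del self._records[path]

    def search(self, term: str) -> List[str]:
        """Return sorted paths whose metadata values contain the term."""
        needle = term.lower()
        return sorted(
            path
            for path, record in self._records.items()
            if any(needle in value.lower() for value in record.values())
        )


if __name__ == "__main__":
    store = MetadataStore()
    store.create("/data/a.tif", {"subject": "brain slice", "owner": "alice"})
    store.update("/data/a.tif", {"resolution": "ultrahigh"})
    print(store.search("brain"))
```

A production service would typically persist the records and index them for full-text search; the linear scan here is only meant to keep the sketch short.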

Data Access and Organization

Participation in national and international projects offers a great opportunity to gather and exchange experience in data life cycle management. The first step in data management is usually to provide seamless access to federated storage resources. We are directly involved in the implementation of the EUDAT B2Safe service and the DARIAH Storage Federation. Metadata functionality is important to eInfrastructures: for example, EUDAT provides it through the B2Share and B2Find services, while CLARIN offers Workspaces. All services will offer standardized interfaces, e.g. CDMI (Cloud Data Management Interface), which is currently being implemented in EUDAT for the B2Safe service.

Curation and Preservation Support

As data infrastructures proliferate, more sophisticated services are required. The requirement to store structured data is met by providing 'Database as a Service' in DARIAH. Another requirement is to transform data stored as data objects in the storage federation into new representations such as graphs, which presupposes that the transformations can run on computing resources close to the data. This can be achieved with paradigms such as MapReduce.
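
To make the MapReduce idea concrete, the following single-process sketch turns a set of data objects into a graph (an adjacency list). It is an illustration under stated assumptions, not a description of any DARIAH or EUDAT service: the object layout with `id` and `references` fields and all function names are hypothetical, and a real deployment would run the map and reduce phases on a framework close to the storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed object layout: {"id": str, "references": [str, ...]}
DataObject = Dict[str, object]


def map_phase(obj: DataObject) -> Iterator[Tuple[str, str]]:
    """Emit one (node, neighbour) pair per direction of each reference."""
    source = str(obj["id"])
    for target in obj.get("references", []):
        yield source, str(target)
        yield str(target), source  # treat references as undirected edges


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(node: str, neighbours: List[str]) -> Tuple[str, List[str]]:
    """Collapse all neighbours of a node into a sorted, duplicate-free list."""
    return node, sorted(set(neighbours))


def build_graph(objects: Iterable[DataObject]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce in sequence and return the adjacency list."""
    pairs = (pair for obj in objects for pair in map_phase(obj))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    objects = [
        {"id": "a", "references": ["b", "c"]},
        {"id": "b", "references": ["c"]},
    ]
    print(build_graph(objects))
```

Because each object is mapped independently and each node is reduced independently, both phases can be distributed across the machines that hold the data.
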
New knowledge is frequently derived by analysing older data with new methods or research questions. Original data is a valuable asset that in many cases cannot be reproduced. Moreover, scientific results are traceable only if the original data exists and is correctly referenced. Hence, data centres must provide long-term preservation of data, and data management policies are developed to achieve this.

Applications

Data Life Cycle Labs (DLCLs) aim at the optimization and implementation of the scientific data life cycle for large-scale scientific data through joint research & development activities of data management and analysis experts (a.k.a. data scientists) and domain scientists.

The focus of DLCL Neuroscience is on data management issues related to the creation of the ultrahigh-resolution Human Brain Model. This involves a continuous analysis of the data life cycles typically associated with the data acquisition of brain slices, data transfer between cooperating partners, long-term storage of large binary image data, federated and secure access to stored data, and the post-processing and analysis of image data. The DLCL will contribute to the development of tools for advanced metadata management that support the organisation of raw and processed data and allow intelligent queries for scientific purposes. Customized web-based user interfaces should enable scientists to access data on storage devices efficiently. Furthermore, the DLCL will analyse and deploy services for high-performance data transfers as well as authentication and authorization management, taking up services and results from the subtopic data life cycle management. The long-term objective is to offer a complete spectrum of the services needed to provide a virtual long-term database of the human brain atlas. The work of the DLCL Neuroscience will be performed in close collaboration with the SimLab Neuroscience and the associated partner programme “Decoding the Human Brain”.

