Services
de.NBI Cloud
As a federally organized academic cloud, the de.NBI Cloud offers free computing and storage resources to researchers in the life sciences. The resources are provided jointly by eight sites. Powerful hardware and special resources such as GPUs enable researchers to process and analyze data of almost any volume and complexity. A central cloud portal makes it easy to request and manage cloud resources, and access to the platform and its resources is controlled via a central authentication and authorization infrastructure (LifeScience Login).
SimpleVM
This platform, which grew out of a de.NBI Cloud project type, makes cloud computing accessible to everyone. Virtual machines and scalable clusters can be started and managed with just a few clicks, regardless of the user's cloud experience. Popular tools such as RStudio® and other research environments run in the cloud and are operated via the web browser, and additional packages can easily be installed via Anaconda®. Workshop mode simplifies running courses and training sessions in the cloud: virtual machines can be pre-configured as required and made available to participants.
Tools
Metagenomics Toolkit
The metagenomic analysis of complex ecosystems with thousands of datasets, such as those available in the NCBI Sequence Read Archive, requires significant computational resources to complete in an acceptable time, and the underlying infrastructure must be used efficiently. Each analysis must also be fully reproducible, and the workflow must be publicly accessible so that the logic behind the computed results can be traced.
The Metagenomics Toolkit also includes a machine-learning-optimized assembly step that adapts the peak RAM allocated to the metagenome assembler to its actual requirements, reducing the dependence on dedicated high-memory hardware. The toolkit can be run on individual workstations, but also offers various optimizations for efficient cloud-based execution on clusters.
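The idea of matching RAM to actual requirements can be illustrated with a minimal Python sketch, assuming a deliberately simple model: a least-squares line fitted to a hypothetical history of past runs (input size in gigabase pairs versus observed peak RAM) predicts the next run's peak RAM, and the smallest machine flavor that covers the prediction plus a safety margin is selected. The single feature, the training values, the flavor names and sizes, and the 20% margin are all illustrative assumptions; the toolkit's actual model and features are not reproduced here.

```python
# Hypothetical sketch: the training data, the single input-size feature,
# the VM flavors and the 20% safety margin are illustrative assumptions,
# not values or models used by the Metagenomics Toolkit.
from dataclasses import dataclass


@dataclass
class Flavor:
    name: str
    ram_gb: int


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict_peak_ram(model: tuple[float, float], input_gbp: float) -> float:
    """Predict peak assembler RAM (GB) from input size (gigabase pairs)."""
    slope, intercept = model
    return slope * input_gbp + intercept


def choose_flavor(predicted_gb: float, flavors: list[Flavor],
                  margin: float = 1.2) -> Flavor:
    """Return the smallest flavor covering the prediction plus a safety margin."""
    needed = predicted_gb * margin
    for flavor in sorted(flavors, key=lambda f: f.ram_gb):
        if flavor.ram_gb >= needed:
            return flavor
    raise ValueError(f"no flavor offers {needed:.1f} GB of RAM")


if __name__ == "__main__":
    # Made-up history of past runs: input size (Gbp) -> observed peak RAM (GB).
    model = fit_linear([1, 2, 4, 8], [10, 18, 34, 66])
    flavors = [Flavor("small", 64), Flavor("medium", 128), Flavor("large", 256)]
    predicted = predict_peak_ram(model, 10)
    print(predicted, choose_flavor(predicted, flavors).name)
```

Picking the smallest sufficient flavor is what avoids reserving high-memory machines by default; the margin guards against underprediction, which would otherwise end the assembly with an out-of-memory error.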
EMGB
The Exploratory MetaGenome Browser (EMGB) is a web-based platform for interactive visualization and analysis of metagenome datasets processed with the Metagenomics Toolkit. It enables real-time searches in large datasets containing millions of genes and annotations. Key features include an interactive taxonomic tree as well as Gene Ontology (GO) and KEGG metabolic maps, which let users explore genes, contigs, MAGs (metagenome-assembled genomes), metabolic pathways and biological processes.
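One common technique for answering annotation queries quickly over millions of genes is an inverted index, which maps each annotation term to the genes that carry it so that a query touches only the matching entries. The Python sketch below illustrates that general technique under stated assumptions: the gene IDs and annotation labels are made up, and it does not describe how EMGB's search backend is actually implemented.

```python
# Hypothetical sketch: gene IDs and annotation labels are made up, and
# this shows the general inverted-index technique, not EMGB's backend.
from collections import defaultdict


def build_index(annotations: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased annotation term to the genes that carry it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for gene_id, terms in annotations.items():
        for term in terms:
            index[term.lower()].add(gene_id)
    return dict(index)


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the genes annotated with every query term (AND semantics)."""
    if not terms:
        return set()
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits)


if __name__ == "__main__":
    genes = {
        "gene_1": ["glycolysis", "Firmicutes"],
        "gene_2": ["glycolysis", "Bacteroidota"],
        "gene_3": ["Firmicutes"],
    }
    index = build_index(genes)
    print(search(index, "Glycolysis", "firmicutes"))
```

Each query term costs one dictionary lookup, so query time depends on the size of the matching gene sets rather than on the total number of genes in the dataset.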
The platform supports multiple filtering options and allows datasets to be compared to assess the metabolic potential of microbial communities. An integrated Contig Viewer provides detailed insight into genetic context and regulatory patterns. Using BLASTp/BLASTx, researchers can also search all datasets with external nucleotide or protein sequences. In addition, the "Insights" module facilitates the identification of MAGs carrying key enzymes for anaerobic fermentation, supporting the reconstruction of microbial functions in fermentation processes. EMGB was developed with HTML5 and AngularJS and supports both desktop and mobile devices.
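The kind of screening the "Insights" module supports can be sketched as a set-containment test: a MAG qualifies if its annotations include every enzyme in a required set. In this minimal Python sketch, the MAG names, the enzyme labels, and the required set are hypothetical placeholders; the module's real enzyme list and selection logic are not reproduced here.

```python
# Hypothetical sketch: MAG names, enzyme labels and the required set are
# illustrative placeholders; the real "Insights" module's enzyme list and
# selection logic are not reproduced here.


def enzyme_coverage(mags: dict[str, set[str]],
                    required: set[str]) -> dict[str, float]:
    """Fraction of the required key enzymes annotated in each MAG."""
    return {mag: len(required & enzymes) / len(required)
            for mag, enzymes in mags.items()}


def mags_with_key_enzymes(mags: dict[str, set[str]],
                          required: set[str]) -> list[str]:
    """MAGs whose annotations contain every required key enzyme, sorted by name."""
    return sorted(mag for mag, enzymes in mags.items() if required <= enzymes)


if __name__ == "__main__":
    mags = {
        "MAG_01": {"enzyme_A", "enzyme_B", "enzyme_C"},
        "MAG_02": {"enzyme_A"},
        "MAG_03": {"enzyme_A", "enzyme_B"},
    }
    required = {"enzyme_A", "enzyme_B"}
    print(mags_with_key_enzymes(mags, required))
    print(enzyme_coverage(mags, required))
```

Reporting partial coverage alongside the strict filter can help flag MAGs that may only lack an enzyme because the genome bin is incomplete, rather than because the organism lacks the function.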
Reflexiv
Reflexiv is an open-source, parallel de novo genome assembler that scales out across a compute cluster or in the cloud. It addresses the high memory consumption of de novo genome assembly by distributing the work over multiple machines, and it improves runtime performance through a parallel assembly algorithm. Built on Apache Spark, it uses Spark RDDs (resilient distributed datasets) to distribute large numbers of k-mers across the cluster and assembles the genome recursively. At the algorithm level, we have introduced a k-mer reduction strategy that removes large numbers of redundant k-mers from the sequencing data, avoiding repeated assembly with different k-mer lengths and further improving runtime performance. As a result, Reflexiv can process terabytes of metagenomic sequencing data on an ordinary in-house cluster using less than 200 GB of memory.
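To make the idea of distributing k-mers concrete, here is a single-machine Python sketch. It is not Reflexiv's code, since Reflexiv runs on Spark. Each k-mer is routed to a partition by a stable hash (CRC-32), similar in spirit to how Spark hash-partitions keyed RDD data. The per-partition counts are merged, and a contig is grown by greedily following unambiguous successor k-mers. All function names are hypothetical. The `min_count` filter is a generic error filter, not Reflexiv's k-mer reduction strategy. Reverse complements and the recursive assembly are omitted.

```python
# Hypothetical single-machine sketch of partitioned k-mer counting and
# greedy contig extension. It is not Reflexiv's algorithm: reverse
# complements, recursion and Reflexiv's k-mer reduction are omitted.
import zlib
from collections import Counter


def kmers(read: str, k: int):
    """Yield every k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partitioned_counts(reads: list[str], k: int, n_parts: int) -> list[Counter]:
    """Count k-mers, routing each one to a partition by a stable hash."""
    parts = [Counter() for _ in range(n_parts)]
    for read in reads:
        for kmer in kmers(read, k):
            parts[zlib.crc32(kmer.encode()) % n_parts][kmer] += 1
    return parts


def merge(parts: list[Counter], min_count: int = 1) -> dict[str, int]:
    """Combine partitions; their keys are disjoint, so a union suffices."""
    return {km: c for part in parts for km, c in part.items() if c >= min_count}


def extend_right(seed: str, counts: dict[str, int]) -> str:
    """Greedily extend a contig while the next k-mer is unambiguous."""
    contig, current, seen = seed, seed, {seed}
    while True:
        successors = [current[1:] + b for b in "ACGT" if current[1:] + b in counts]
        if len(successors) != 1 or successors[0] in seen:
            return contig
        current = successors[0]
        seen.add(current)
        contig += current[-1]


if __name__ == "__main__":
    reads = ["ACGTTGCA", "GTTGCATG"]
    counts = merge(partitioned_counts(reads, k=4, n_parts=3))
    print(extend_right("ACGT", counts))
```

Because a given k-mer always hashes to the same partition, each partition can be counted independently and the results combined by a simple union. When the partitions live on different nodes, this is what lets the k-mer table grow beyond the memory of a single machine.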