Computational Metagenomics
Thanks to modern sequencing technologies, huge amounts of DNA data can now be generated in a single experiment, making it possible to reconstruct genes and even entire genomes from environmental samples.
High-throughput sequencing in metagenomics
High-throughput sequencing plays a central role in metagenomics as it enables the analysis of microorganisms in a community without the need to isolate or culture them. In the past, the focus was on analyzing marker genes (e.g. 16S rDNA) to determine the composition of microbiomes and to observe changes over time or at different locations.
Challenges in genome reconstruction
New sequencing technologies make it possible to capture the functional potential of a microbial community. However, this requires large amounts of data and extensive computing power to assemble longer genome segments (contigs) from billions of short sequence reads. Completely reconstructing the genomes of all members of a community down to species or subspecies level remains difficult. Often only fragments (contigs) can be reconstructed, and it is unclear which organism each of them belongs to. Current computational methods for grouping contigs by organism (binning) fail especially when genomes are very similar. Despite these hurdles, metagenome studies have succeeded in identifying important genes and genomes.
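Binning tools commonly group contigs by sequence composition, because genomes tend to have characteristic tetranucleotide frequencies. The sketch below is a simplified illustration under stated assumptions, not the group's method: it computes a 256-dimensional tetranucleotide profile per contig (ignoring reverse complements for brevity) and greedily assigns each contig to the first bin whose seed profile lies within a Euclidean distance `threshold`. The names `tetra_profile`, `greedy_bin`, and `threshold` are hypothetical; production binners typically also use contig coverage across samples. The sketch also shows why very similar genomes are hard to separate: contigs from two closely related strains have nearly identical profiles and end up in the same bin.

```python
import math
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_profile(seq: str) -> list[float]:
    """Relative tetranucleotide frequencies of a contig (windows with non-ACGT characters are skipped)."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts) or 1  # avoid division by zero for very short contigs
    return [c / total for c in counts]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two composition profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_bin(contigs: dict[str, str], threshold: float) -> list[list[str]]:
    """Hypothetical greedy binner: each bin is represented by the profile of its first contig."""
    bins: list[tuple[list[float], list[str]]] = []
    for name, seq in contigs.items():
        profile = tetra_profile(seq)
        for seed, members in bins:
            if distance(profile, seed) <= threshold:
                members.append(name)
                break
        else:
            # No bin is close enough: start a new bin seeded by this contig
            bins.append((profile, [name]))
    return [members for _, members in bins]
```

With toy sequences, `greedy_bin({"a1": "AT" * 50, "a2": "AT" * 40, "g1": "GC" * 50}, threshold=0.1)` returns `[["a1", "a2"], ["g1"]]`: the two AT-rich contigs share a bin and the GC-rich contig gets its own.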
Integration of omics technologies
By combining omics technologies, a deeper understanding of microbial communities can be gained. Metatranscriptome studies provide insights into the gene expression patterns of microbes. However, improved computational methods are still needed to compare the transcription rates of multiple organisms, as current algorithms are limited to single genomes.
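Part of the difficulty is that raw transcript counts from a metatranscriptome mix gene expression with organism abundance: an abundant organism contributes more reads for every one of its genes. One simple normalization, assumed here purely for illustration, is to compute TPM (transcripts per million) separately within each organism, so that every organism's expression profile sums to one million regardless of its share of the community. The function name `per_organism_tpm` and the input tuple layout are hypothetical, and this is not presented as the group's method.

```python
from collections import defaultdict


def per_organism_tpm(genes):
    """Compute TPM separately within each organism.

    genes: iterable of (organism, gene_id, read_count, gene_length_bp) tuples
    (a hypothetical input layout). Returns {(organism, gene_id): TPM}.
    """
    # Step 1: length-normalize read counts to reads per kilobase (RPK)
    rpk = {}
    for organism, gene_id, count, length_bp in genes:
        rpk[(organism, gene_id)] = count / (length_bp / 1000)

    # Step 2: sum RPK per organism; this acts as an organism-specific scaling factor
    totals = defaultdict(float)
    for (organism, _), value in rpk.items():
        totals[organism] += value

    # Step 3: scale so that each organism's values sum to one million
    tpm = {}
    for (organism, gene_id), value in rpk.items():
        total = totals[organism]
        tpm[(organism, gene_id)] = value / total * 1e6 if total > 0 else 0.0
    return tpm
```

For example, with hypothetical counts `[("org1", "g1", 100, 1000), ("org1", "g2", 100, 1000), ("org2", "g3", 10, 1000), ("org2", "g4", 30, 1000)]`, gene `g4` receives a TPM of 750,000 even though `org2` contributes far fewer reads overall, showing that `g4` dominates that organism's transcription.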
Cloud-based analysis and web integration
Our research group develops cloud-based analysis workflows built from modular, containerized applications. The results are integrated into web-based systems to make analysis easier for our biological project partners. NoSQL databases store large amounts of data efficiently, while the Spark framework supports comparative analysis of thousands of published metagenome datasets. Fast access to the results is provided through HTML5 web applications of the kind used by search engines.
New approaches to metagenome assembly
Despite the progress in assembling complex metagenomes, such as that of the bovine rumen, specialized assemblers for metagenome data are still needed. Existing short-read assemblers were developed for single, isolated genomes, whereas metagenome datasets pose new assembly challenges, such as highly uneven coverage across community members and closely related strains that share long stretches of sequence. Our group is therefore developing new tools and strategies to improve metagenome assembly.
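Many widely used short-read assemblers, including metagenome assemblers such as MEGAHIT and metaSPAdes, are based on de Bruijn graphs. The sketch below is a deliberately minimal, hypothetical illustration of that idea, not the group's assembler: it splits reads into k-mers, links overlapping (k-1)-mers, and merges unambiguous paths into contigs (unitigs). It ignores reverse complements, sequencing errors, and coverage, all of which real assemblers must handle. The function names `build_graph` and `unitigs` are illustrative assumptions. The second example in the usage note shows why related strains complicate metagenome assembly: a single nucleotide difference creates a branch that breaks one contig into several.

```python
from collections import defaultdict


def build_graph(reads, k):
    """De Bruijn graph: nodes are (k-1)-mers, each distinct k-mer adds one edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):  # sorted for deterministic output
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].append(suffix)
        in_degree[suffix] += 1
    return out_edges, in_degree


def unitigs(reads, k):
    """Merge maximal non-branching paths into contigs (isolated cycles are ignored)."""
    out_edges, in_degree = build_graph(reads, k)
    nodes = set(out_edges) | set(in_degree)

    def is_simple(node):
        # Exactly one way in and one way out: the node lies inside an unambiguous path
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # internal nodes are covered by walks that start at branch points
        for nxt in out_edges[node]:
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs
```

For reads `["ATGGCG", "GGCGTA", "CGTACG"]` and `k=4`, `unitigs` reconstructs the single contig `"ATGGCGTACG"`. Adding reads from a variant strain that differs at one position (`["ATGGCA", "GGCATA", "CATACG"]`) yields four shorter contigs, `["ATGGC", "GGCATAC", "GGCGTAC", "TACG"]`, because the graph now branches at the variant.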