Data Science and Bioinformatics for Mass Spectrometry of Small Molecules

Thanks to advances in mass spectrometry, researchers can now analyze complex mixtures of small molecules with unprecedented resolution, enabling the discovery of metabolites, lipids, and other biomolecules in biological samples.

High-throughput mass spectrometry in small molecule analysis

High-throughput mass spectrometry (MS) is a cornerstone of modern metabolomics and small molecule research, allowing the comprehensive profiling of chemical compounds in biological systems without prior knowledge of their identities. Traditional approaches focused on targeted analysis of known metabolites, but recent technological advances enable the untargeted discovery of thousands of molecular features. This shift has transformed the study of small molecules in fields ranging from drug discovery to environmental toxicology. Nonetheless, targeted analysis still has its place for the reliable quantification of molecule concentrations, e.g. in clinical chemistry.

Challenges in small molecule identification


Despite the power of high-throughput MS, identifying and annotating small molecules remains a major challenge. The complexity of spectral data, overlapping peaks, and the vast chemical diversity of metabolites and lipids make it difficult to assign accurate molecular formulas or structures. Additionally, the lack of comprehensive spectral libraries and of alternative identification methods, together with the need for precise alignment of chromatographic and mass spectral data, further complicates analysis. Current computational tools often struggle to distinguish between structurally similar compounds or to resolve isobaric interferences. Finally, reporting standards that state the level of structural resolution actually achieved by mass spectrometry for each identified molecule are not widely adopted, which hampers large-scale data integration.
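The limits of mass-based identification can be illustrated with a small sketch. It is not one of the group's tools: the function names, the candidate list, and the 5 ppm tolerance are assumptions chosen for illustration, and it compares neutral monoisotopic masses only, ignoring adducts and charge. It shows that a tight mass tolerance separates isobaric compounds with different formulas, while isomers that share a formula remain indistinguishable by mass alone.

```python
# Monoisotopic masses (in Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(composition):
    """Sum the element masses for a composition such as {"C": 6, "H": 12, "O": 6}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def ppm_error(observed, theoretical):
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(observed_mass, candidates, tol_ppm=5.0):
    """Return the names of all candidates whose mass lies within tol_ppm."""
    return [
        name
        for name, composition in candidates.items()
        if abs(ppm_error(observed_mass, monoisotopic_mass(composition))) <= tol_ppm
    ]


# Hypothetical candidate list: two isomers and one isobaric compound
# that shares the nominal mass of 180 Da but has a different formula.
CANDIDATES = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "acetylsalicylic acid": {"C": 9, "H": 8, "O": 4},
}

if __name__ == "__main__":
    observed = 180.0634  # assumed neutral mass derived from a measurement
    print(match_candidates(observed, CANDIDATES, tol_ppm=5.0))
    print(match_candidates(observed, CANDIDATES, tol_ppm=200.0))
```

At 5 ppm the isobaric candidate is excluded, but both isomers still match; separating them requires additional evidence such as retention time or fragmentation spectra.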

Integration with omics technologies


Combining mass spectrometry data with other omics layers—such as genomics, transcriptomics, and proteomics—provides a more holistic view of biological systems. For example, linking metabolite profiles to gene expression or enzyme activity can reveal functional pathways and regulatory networks. However, integrating these heterogeneous datasets requires advanced algorithms to correlate molecular features with biological contexts, a task that remains computationally intensive and technically demanding.

Cloud-based analysis and web integration

Our research group develops cloud-native workflows for processing and analyzing large-scale mass spectrometry datasets. By leveraging modular, containerized applications, we enable scalable and reproducible analysis across diverse platforms. The results are made available through web-based systems and stand-alone applications that give researchers intuitive means for data exploration and interpretation. We use NoSQL databases to manage the high volume and variability of spectral data, while distributed computing frameworks and workflow systems accelerate comparative analyses across thousands of datasets. Interactive dashboards, built with modern semantic web technologies, provide fast, browser-based access to complex results and cross-domain integration.

New approaches to data analysis


To address the limitations of existing methods, our group is developing new computational strategies for small molecule analysis. These include algorithms and machine learning models that annotate spectra and predict fragmentation patterns, as well as graph-based algorithms that group related compounds. We are also developing specialized tools for handling high-resolution MS data, covering peak detection, alignment, feature extraction, and quality control. These developments aim to improve the accuracy, speed, and interpretability of mass spectrometry-based studies, ultimately advancing applications in biology, precision medicine, and related fields.
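As an illustration of what graph-based grouping of related compounds can look like, the following sketch links fragmentation spectra whose cosine similarity exceeds a threshold and reports the connected components of the resulting graph. It is a generic textbook approach, not the group's actual algorithm; the function names, the unit-width binning, the 0.7 threshold, and the example spectra are all assumptions.

```python
from itertools import combinations
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into fixed-width bins, summing intensities."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Cosine similarity of two binned spectra stored as sparse dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_spectra(spectra, threshold=0.7, bin_width=1.0):
    """Connect spectra with similarity >= threshold; return connected components."""
    binned = {name: bin_spectrum(peaks, bin_width) for name, peaks in spectra.items()}
    neighbours = {name: set() for name in spectra}
    for x, y in combinations(spectra, 2):
        if cosine(binned[x], binned[y]) >= threshold:
            neighbours[x].add(y)
            neighbours[y].add(x)

    groups, seen = [], set()
    for start in spectra:
        if start in seen:
            continue
        # Depth-first traversal collects every spectrum reachable from start.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Hypothetical spectra: "a" and "b" share fragments, "c" does not.
EXAMPLE_SPECTRA = {
    "a": [(100.05, 1.0), (150.10, 0.5)],
    "b": [(100.06, 0.9), (150.11, 0.6)],
    "c": [(200.20, 1.0)],
}

if __name__ == "__main__":
    print(group_spectra(EXAMPLE_SPECTRA))
```

Fixed-width binning is the simplest way to make spectra comparable; tolerance-based peak matching would avoid splitting near-identical peaks across a bin boundary, at the cost of more code.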

Last Modified: 16.01.2026