Textbook: Materials Data Science


Materials Data Science is a comprehensive textbook designed to equip students, particularly in materials science and engineering, with a rigorous, implementation-oriented understanding of modern data-driven methods. The book emphasizes building algorithms “from scratch” in Python and NumPy, so that readers engage directly with the mathematical and computational foundations rather than relying on high-level libraries.

The first part establishes the statistical groundwork required for data-driven analysis. Core concepts such as

  • random variables,
  • probability distributions,
  • Bayesian inference,
  • correlation analysis,
  • sampling strategies,
  • and exploratory data analysis

are introduced and consistently placed in the context of materials science applications. This section serves both as an accessible entry point for students and as a structured refresher for more advanced readers.
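To illustrate the “from scratch” style applied to correlation analysis, here is a minimal sketch (not code from the book) that computes the Pearson correlation coefficient directly from its definition in NumPy and cross-checks the result against `np.corrcoef`. The function name `pearson_r` and the temperature/grain-size values are hypothetical toy data chosen only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    dx = x - x.mean()  # deviations from the sample mean of x
    dy = y - y.mean()  # deviations from the sample mean of y
    # Covariance term divided by the product of the spreads (the 1/n factors cancel)
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Hypothetical toy data: annealing temperature (K) vs. measured grain size (um)
temperature = np.array([600.0, 650.0, 700.0, 750.0, 800.0])
grain_size = np.array([1.1, 1.4, 2.0, 2.3, 2.9])

r = pearson_r(temperature, grain_size)
# Cross-check the hand-written version against NumPy's built-in routine
assert np.isclose(r, np.corrcoef(temperature, grain_size)[0, 1])
print(f"r = {r:.3f}")
```

Writing the formula out explicitly makes visible that Pearson's r only measures linear association, a point that is easy to miss when calling a library function.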

The second part develops the principles of statistical machine learning. It systematically introduces supervised learning methods for regression and classification, including both standard approaches and more advanced techniques such as kernel methods and support vector machines. Unsupervised learning is treated with a focus on dimensionality reduction (e.g., principal component analysis, t-SNE, UMAP) and clustering. The section also addresses essential practical components, including feature engineering, model validation, and performance assessment.
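As a sketch of how one of the unsupervised methods named here can be implemented from first principles, principal component analysis can be written in a few lines using the singular value decomposition of the centered data matrix. This is an illustrative assumption about one possible implementation, not the book's code; the function name `pca` and the synthetic “descriptor matrix” are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: PCA via the SVD of the centered data matrix.

    X: (n_samples, n_features) array.
    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                 # project the data onto the components
    variance = S**2 / (X.shape[0] - 1)         # eigenvalues of the sample covariance
    ratio = variance[:n_components] / variance.sum()
    return scores, components, ratio

rng = np.random.default_rng(0)
# Hypothetical descriptor matrix: 100 samples, 5 features driven by 2 latent factors
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape, ratio.round(3))
```

Using the SVD of the centered data avoids forming the covariance matrix explicitly, which is numerically more stable than an eigendecomposition of `np.cov(X)`; because the synthetic data have only two latent factors, the first two components should capture nearly all of the variance.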

The final part focuses on neural networks and deep learning, with the explicit goal of demystifying their internal mechanics. Beginning with simple fully connected architectures, the text incrementally builds toward more complex models, including generative adversarial networks (GANs). By implementing all models from first principles, readers gain a detailed, mechanistic understanding that supports both critical evaluation and independent experimentation.
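In the spirit of building networks from first principles, the following minimal sketch (not taken from the book) trains a fully connected network with one tanh hidden layer and a linear output by full-batch gradient descent on the mean squared error, with the backward pass written out by hand via the chain rule. The function names `train_mlp` and `predict`, the hidden-layer size, learning rate, epoch count, and the toy target sin(x) are all illustrative assumptions.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Hypothetical helper: one-hidden-layer MLP trained with hand-written backprop."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)               # hidden activations, shape (n, hidden)
        y_hat = h @ W2 + b2                    # network output, shape (n, 1)
        err = y_hat - y
        losses.append(np.mean(err**2))
        # Backward pass: chain rule from the MSE loss back to each parameter
        g_out = 2.0 * err / n                  # dL/dy_hat
        gW2 = h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_z1 = (g_out @ W2.T) * (1.0 - h**2)   # tanh'(z) = 1 - tanh(z)^2
        gW1 = X.T @ g_z1
        gb1 = g_z1.sum(axis=0)
        # Plain gradient-descent update
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Hypothetical helper: forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Toy regression target: y = sin(x) on [-pi, pi]
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)
params, losses = train_mlp(X, y)
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```

Keeping every gradient explicit makes it straightforward to verify the backward pass against finite differences, which is the kind of mechanistic check that high-level frameworks hide.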

Overall, the textbook combines theoretical rigor with hands-on implementation, making it particularly suitable for students aiming to develop a deep, operational understanding of data science methodologies in a materials science context.

The textbook is available on Springer Link and from retailers.

Last Modified: 24.03.2026