# Introduction to Data Mining and Machine Learning

**Content**

Data mining and data analysis play a growing role in extracting information from both simulations and experiments/microscopy. This requires knowledge of the algorithms and methods and of the underlying statistical/probabilistic concepts, as well as the ability to use such methods in practice, e.g. through programming. This module covers the following topics:

- basics of stochastics and statistics: events, probability, conditional probability, variance, mean, median, likelihood;
- key concepts of probability theory: probability distributions, expectation values, central limit theorem;
- fundamentals of regression and classification;
- concepts of linear approaches, Bayesian methods, support vector machines, decision trees, neural networks;
- training, validation, testing, overfitting and cross-validation;
- selection of appropriate algorithms;
- implementation using Python and open-source libraries.
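
To give a flavour of the validation topics listed above, here is a minimal sketch of k-fold cross-validation using only NumPy. It is not taken from the course material: the synthetic data, the polynomial model, and the helper names `k_fold_indices` and `cross_val_mse` are assumptions chosen for illustration.

```python
import numpy as np


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)


def cross_val_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial fit of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(x), k, rng)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        # Train on every fold except the one held out for validation.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))


# Synthetic data (hypothetical, for illustration only): a noisy quadratic.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Compare an underfitting, a well-matched, and a very flexible model.
scores = {degree: cross_val_mse(x, y, degree) for degree in (1, 2, 10)}
for degree, mse in scores.items():
    print(f"degree {degree}: cross-validated MSE = {mse:.4f}")
```

In this sketch the linear model underfits the quadratic data, so its validation error is clearly larger than that of the degree-2 model. The degree-10 model is included to show how an overly flexible model can be compared on held-out data, where overfitting would appear as a higher validation error.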

**Objective**

Students will acquire fundamental knowledge of stochastics, statistics and combinatorics and will be able to apply this knowledge to test problems using the programming language Python. They will gain an overview of data mining and machine learning approaches and the corresponding algorithms and will be able to choose the appropriate algorithm for a specific problem. Furthermore, they will be able to implement their own data analysis and machine learning algorithms in Python and to independently design solution approaches for problems relevant to materials science. Students will be able to analyze their results, judge their quality and validate them based on their domain knowledge.

**Recommended previous knowledge**

Basic Python programming knowledge; basic knowledge of statistical learning concepts, e.g., regression and classification.

**Recommended reading**

- Phuong Vo. T. H and Martin Czygan. Getting Started with Python Data Analysis. Packt Publishing, Birmingham, UK, 2015
- G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2013

**Lecture and exercise dates** SS23

**Exercise:** Thursdays, 17:30 - 19:00 in GRS001, Schinkelstr. 2a

**Lecture:** Fridays, 08:30 - 10:00 in GRS001, Schinkelstr. 2a

**Exam** SS23

The mode, date, and time of the exam will be determined at the beginning of the semester.