Scientific Big Data Analytics

Speaker: Morris Riedel (JSC)
Date: Friday, 4 December 2015, 08:30-10:00
Session: Big Data @ I/O II
Talk type: Short talk (15 min)

Abstract: This talk introduces participants to the concept of scientific big data analytics driven by HPC. It presents two concrete and widely used data analytics techniques that are suitable for analysing ‘big data’ in scientific and engineering applications. From the broad class of available clustering methods, we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which also enables the identification of outliers or interesting anomalies. A parallel and scalable DBSCAN implementation, based on MPI/OpenMP and the Hierarchical Data Format (HDF), will be discussed in the context of interesting scientific datasets. The second focus is the support vector machine (SVM) algorithm, including kernel methods, as one of the best out-of-the-box methods for classification. A parallel and scalable SVM implementation, based on MPI, will be described in detail using a couple of challenging scientific datasets and smart feature extraction methods.

Last Modified: 18.11.2022