IAS Seminar "Data Science: Scalable management and analysis of massive data"

24 May 2017, 10:30 to 11:30
PGI-Lecture Hall, building 4.8, 2nd floor, room 365
Ira Assent, Assoc. Prof., Head of Data-Intensive Systems Research Group, Department of Computer Science, Aarhus University
The term Data Science has been coined to describe methods and techniques for handling and analyzing massive data volumes. I will present some of our recent contributions to data science: (i) for efficient processing of complex queries and (ii) for scalable data mining algorithms.
As an example of query processing, we consider the skyline operator for multicriteria decision making, which identifies data records that are optimal with respect to any preference function. By their nature, skyline queries are computationally costly, especially for large and high-dimensional data. Our approach exploits the high parallelism of standard end-user graphics cards to obtain orders-of-magnitude runtime improvements. Another example, taken from data mining, studies density-based clustering, an automatic method for succinctly identifying predominant patterns in the data, even in the presence of noise. We design an algorithmic solution that provably obtains the same clustering accuracy, but at substantially reduced runtimes. I will discuss applications of our data science research in industry and other fields of science in past and ongoing projects, and suggest promising research directions.
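
For readers unfamiliar with the skyline operator, the following is a minimal sketch of its definition, not the GPU-parallel method presented in the talk. It uses the naive quadratic comparison and assumes that smaller values are better in every criterion. The names `dominates`, `skyline`, and the `hotels` sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every criterion and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical records: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (55, 9)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

Every record is compared against every other record, so the cost grows quadratically with the number of records, and each comparison grows with the number of criteria. This illustrates why skyline queries become expensive on large, high-dimensional data and why the independent pairwise comparisons are a natural candidate for parallel hardware.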
