IAS Seminar "Large-Scale Deep Learning: Advances, Troubles, Perspectives"
Speaker: Dr. Jenia Jitsev, INM-6
Date: Thursday, 9 June 2016, 15:00
Venue: Jülich Supercomputing Centre, Rotunda, building 16.4, room 301

Abstract:
Until quite recently, there was little hope that the vast gap between biological and artificial systems in their ability to learn from complex real-world data such as natural images or natural speech would narrow anytime soon. All the stronger was the impact of the success of the ensemble of methods now termed Deep Learning, rooted in both machine learning and computational neuroscience. These methods achieved dramatic advances on problems that had resisted the hardest attempts of the machine learning community for years, transforming the whole field and sparking new enthusiasm. Also new is the degree to which the research community openly and rapidly communicates its results and makes its findings available in the form of open source tools that can be used by everyone. So far, Deep Learning methods are responsible for numerous breakthroughs in natural image recognition, natural speech processing and natural language understanding, surpassing previous state-of-the-art approaches by margins that would usually require decades of research and spreading rapidly into technological applications. In the course of this progress, Deep Learning methods have demonstrated performance comparable to human level not only in natural image recognition tasks, but have also recently defeated the human world champion in the game of Go, an achievement that experts in the community had not expected to happen for at least another decade.
This success did not come all of a sudden out of nothing. Preceding the breakthroughs visible now was a long period of research in neurally inspired machine learning, starting in the late 80s and early 90s, that led to a deeper understanding of learning as a problem of probabilistic inference of the latent hidden causes that may underlie observable data (often cast in the framework of graphical models and energy-based models). This formulation also helped to devise fast approximate methods for probabilistic inference over very large amounts of data and provided insight into the regularization techniques necessary to make learning robust against notorious overfitting. At least as crucial for the major advance were very large-scale training datasets, which became available in this form only a few years ago, and the rise of clusters and GPUs as hardware accelerators for the massively parallel execution of very deep neural networks (currently up to 1000 convolutional layers).
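As a concrete anchor for this latent-cause formulation, the following is a generic textbook sketch of an energy-based model with latent variables, not a rendering of the specific models discussed in the talk. An energy function over observable data x and hidden causes h induces a probability distribution, and inference recovers the posterior over the hidden causes of an observation:

    % Generic energy-based latent-variable model: observable data x,
    % hidden causes h, parameters \theta, partition function Z_\theta.
    p_\theta(x, h) = \frac{e^{-E_\theta(x, h)}}{Z_\theta},
    \qquad
    p_\theta(x) = \sum_{h} p_\theta(x, h),
    \qquad
    Z_\theta = \sum_{x, h} e^{-E_\theta(x, h)}
    % Inference: posterior over the hidden causes of a given observation.
    p_\theta(h \mid x) = \frac{e^{-E_\theta(x, h)}}{\sum_{h'} e^{-E_\theta(x, h')}}

The sums defining Z_\theta and the posterior run over exponentially many configurations, which is precisely why the fast approximate inference methods mentioned above were needed to make learning tractable at scale.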
In the talk, I will provide an overview of the major achievements in the field so far and of the ingredients responsible for this development. We will focus on the ability of Deep Learning networks to learn hierarchies of useful features automatically from raw data (e.g. raw image pixels) by applying a set of generic, layerwise reapplicable operations that work across very different data sets (a minimal illustrative sketch follows below). This stands in strong contrast to traditional approaches in machine learning, which required carefully hand-crafted features useful for a specific data set only. We will sketch some of the troubles that accompany this progress, specifically the still heavily supervised mode of training, which requires very large amounts of prelabeled data, and the difficulty of gaining insight into the representations formed inside deep multi-layer networks, and we will point out attempts to address these issues. A hint at the resemblance between the processing done by deep networks and information processing in the brain is delivered by a series of studies that link deep network function to responses in primate visual cortex. I will also provide perspectives on upcoming research directions, such as the execution of deep architectures on specialized and neuromorphic hardware and the use of reinforcement signals for learning, and, based on own work on unsupervised learning in hierarchical recurrent neural networks, offer an outlook towards novel architectures that support learning from vast amounts of data without any labels.
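The sketch below illustrates, in plain numpy, the generic layerwise operations mentioned above: the same block (convolution, pointwise nonlinearity, pooling) is reapplied layer after layer to build a feature hierarchy from raw pixels. It is purely illustrative and not the speaker's code; the single channel, random untrained filters and toy 32x32 input are assumptions, whereas real networks learn the filter weights from data.

    # Generic layer block of a deep convolutional network: the same
    # convolve -> nonlinearity -> pool operations are stacked repeatedly,
    # turning raw pixels into increasingly abstract feature maps.
    import numpy as np

    def conv2d(image, kernel):
        """Valid 2-D convolution of a single-channel image."""
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
        return out

    def relu(x):
        """Pointwise nonlinearity."""
        return np.maximum(x, 0.0)

    def max_pool(x, size=2):
        """Non-overlapping max pooling: keep the strongest local responses."""
        h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
        x = x[:h, :w]
        return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((32, 32))  # stand-in for raw pixels

    # Reapply the identical generic block; only the (learned) kernels
    # would differ per layer in a trained network.
    for layer in range(3):
        kernel = rng.standard_normal((3, 3))  # untrained here, learned in practice
        feature_map = max_pool(relu(conv2d(feature_map, kernel)))
        print(f"layer {layer + 1}: feature map shape {feature_map.shape}")

Running the sketch shrinks the feature map from 32x32 to 15x15, 6x6 and 2x2 across the three layers; the point is that no part of the block is specific to a data set, which is what distinguishes these networks from pipelines built on hand-crafted features.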
Anyone interested is cordially invited to participate in this seminar.
Contact: Dr. Sabine Höfler-Thierfeldt, JSC