AlphaNumerics Zero
In the project AlphaNumerics Zero, we work on Reinforcement Learning methods that accelerate the convergence of numerical methods.
Recent successes in Reinforcement Learning, such as AlphaGo Zero reaching superhuman Go skills [1] or OpenAI Five playing Dota 2 at world-class level [2], have been very inspiring. Many researchers try to transfer these successes to other fields, hoping that computers can act as agents with skills superior to humans. In AlphaNumerics Zero, this idea is applied to numerical linear algebra.
Many numerical algorithms can be optimized by picking the right “magic” parameters. This can dramatically accelerate the convergence and hence reduce the computational effort. In the project AlphaNumerics Zero (αN0), Reinforcement Learning (RL) is used so that, for a specified simulation problem, the computer learns to determine the “optimal” numerical solution method and its parameters by itself. The project focuses on iterative time-stepping schemes, specifically spectral deferred correction (SDC) methods, which are particularly well suited for supercomputers. This class of methods serves as a prototype for many other areas: stationary iterative solvers, preconditioning, parallel multigrid techniques, time integrators, and resilient numerical methods. Progress made here can therefore be transferred to these fields, giving the project a very broad impact.
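To illustrate the role of such “magic” parameters, the toy sketch below (not the project's SDC solver; the matrix, problem size, and parameter values are chosen purely for illustration) runs weighted Jacobi iteration on a 1D Laplacian and shows how the residual after a fixed number of sweeps depends on the relaxation parameter ω:

```python
import numpy as np

def weighted_jacobi(A, b, omega, iters=50):
    """Weighted Jacobi iteration: x <- x + omega * D^{-1} (b - A x)."""
    d_inv = 1.0 / np.diag(A)
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + omega * d_inv * (b - A @ x)
    return x

# 1D Laplacian test problem
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# The residual after 50 sweeps varies noticeably with omega
residuals = {}
for omega in (0.5, 2 / 3, 1.0):
    x = weighted_jacobi(A, b, omega)
    residuals[omega] = np.linalg.norm(b - A @ x)
    print(f"omega = {omega:.3f}  ->  residual = {residuals[omega]:.3e}")
```

An RL agent in such a setting would observe the state of the iteration and pick parameters like ω adaptively, rather than relying on a fixed hand-tuned value.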
We contribute RL expertise and technical support to this project. In intense and fruitful discussions, we explore how to formulate the problem as an ML task, including directions such as reward shaping and how to discretize the action space. We provided a first implementation of the framework in JAX together with the PPG [3] algorithm, and we support the project in exploiting the possibilities of a fully differentiable formulation of the problem.
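As a sketch of what a fully differentiable formulation makes possible (again a toy weighted-Jacobi example, not the project's actual SDC framework; the matrix and learning-rate choices are illustrative assumptions), one can write the iteration as a pure JAX function and use `jax.grad` to tune the relaxation parameter by gradient descent on the final residual:

```python
import jax
import jax.numpy as jnp

def weighted_jacobi(omega, A, b, iters=10):
    # The whole iteration is a pure function of omega,
    # so JAX can differentiate straight through it.
    d_inv = 1.0 / jnp.diag(A)
    x = jnp.zeros_like(b)
    for _ in range(iters):
        x = x + omega * d_inv * (b - A @ x)
    return x

def residual_norm(omega, A, b):
    x = weighted_jacobi(omega, A, b)
    return jnp.linalg.norm(b - A @ x)

# Diagonally dominant test problem (stable for a wide range of omega)
n = 16
A = 2.0 * jnp.eye(n) - 0.5 * jnp.eye(n, k=1) - 0.5 * jnp.eye(n, k=-1)
b = jnp.ones(n)

# Gradient descent on the relaxation parameter itself
omega = 0.5
grad_fn = jax.grad(residual_norm)  # d(residual)/d(omega)
for _ in range(100):
    omega = omega - 0.05 * grad_fn(omega, A, b)

print(f"tuned omega = {float(omega):.3f}")
print(f"residual at omega = 0.5 : {float(residual_norm(0.5, A, b)):.2e}")
print(f"residual at tuned omega: {float(residual_norm(omega, A, b)):.2e}")
```

The same mechanism scales up: when the solver is differentiable end to end, gradients of a convergence metric can flow back into any of its parameters, which complements the RL approach of learning them from reward signals.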
- Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
- Berner, Christopher, et al. "Dota 2 with large scale deep reinforcement learning." arXiv preprint arXiv:1912.06680 (2019).
- Cobbe, Karl, et al. "Phasic Policy Gradient." arXiv preprint arXiv:2009.04416 (2020).