JULAIN Talk by Thijs Vogels
Communication-efficient distributed learning and PowerSGD
Machine Learning & Optimization Laboratory, École polytechnique fédérale de Lausanne (EPFL)
- When: 9 June 2022, 4pm
- Where: INM Seminar room building 15.9U, room 4001b
Invitation and moderation: Hanno Scharr, IAS-8
In data-parallel optimization of machine learning models, workers collaborate to speed up training. Averaging model updates across workers makes the updates more informative, resulting in faster convergence. For today’s deep learning models, however, updates can be gigabytes in size, and averaging them between all workers can become a scalability bottleneck in distributed learning. In this talk, we explore two approaches to alleviating communication bottlenecks: lossy communication compression and sparse (decentralized) communication. We focus on the PowerSGD communication compression algorithm, which approximates gradient updates by low-rank matrices. PowerSGD can yield communication savings of more than 100x and was used successfully to speed up the training of OpenAI’s DALL-E and Meta’s RoBERTa and XLM-R.
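As context for the abstract, the low-rank idea can be sketched as follows: instead of sending an m×n gradient matrix, each worker sends two thin factors obtained from a single power-iteration step. This is a simplified single-worker illustration under assumed names and shapes, not the authors' actual implementation (which additionally uses error feedback and all-reduce across workers):

```python
import numpy as np

def powersgd_compress(grad, q):
    """One power-iteration step: approximate `grad` (m x n) by p @ q_new.T.

    `q` (n x r) is the factor carried over from the previous step;
    reusing it is what makes a single iteration per SGD step work well.
    """
    p = grad @ q               # (m, r) projection onto current subspace
    p, _ = np.linalg.qr(p)     # orthonormalize columns for stability
    q_new = grad.T @ p         # (n, r) updated right factor
    return p, q_new            # workers would all-reduce p and q_new

def powersgd_decompress(p, q):
    """Reconstruct the rank-r approximation of the gradient."""
    return p @ q.T
```

With rank r, each worker communicates r·(m + n) numbers instead of m·n, which is where the large savings for big layers come from; the approximation error is fed back into the next step's gradient in the full algorithm.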
Thijs is a PhD student at EPFL’s Machine Learning & Optimization Laboratory, supervised by Martin Jaggi. He works on developing and understanding practical optimization algorithms for large-scale distributed training of deep learning models.