Title: Flexible and Scalable Reinforcement Learning Systems
Abstract: Machine learning (ML) systems translate data into value for decision making. Recent breakthroughs in large ML models (e.g., GPT-4, Llama 3, Gemini) and the remarkable outcomes of reinforcement learning (e.g., AlphaFold, FunSearch, AlphaGeometry) have shown that scalable and flexible ML training and inference at industrial scale (e.g., tens of thousands of GPUs/accelerators) are critical to obtaining state-of-the-art performance. This talk aims to answer the question: “How can we co-design multiple layers of the software/system stack to improve the flexibility, scalability, and performance of ML computation?” It addresses the challenges of designing and building efficient ML systems that integrate the scalable ML layer, the distributed data/state management layer, and the compilation-based optimization layer.
Bio: Bo Zhao is a tenure-track assistant professor in the Department of Computer Science at Aalto University, where he leads the Aalto Data-Intensive System group. He is also affiliated with the Finnish Center for Artificial Intelligence. Bo’s research focuses on efficient machine learning systems at the intersection of scalable reinforcement learning systems and distributed data management systems, as well as compilation-based optimization techniques. He has published in top venues in the field (e.g., SOSP, USENIX ATC, SIGMOD, VLDB) and has served on programme committees (e.g., EuroSys, SIGMOD, VLDB, ICDE) and as a reviewer for journals (e.g., JMLR, TPDS).