Federated Data Preparation, Learning, and Debugging in Apache SystemDS

Sebastian Baunsgaard
Matthias Boehm
Kevin Innerebner
Mito Kehayov
Florian Lackner
Olga Ovcharenko
Arnab Phani
Tobias Rieger
David Weissteiner
Sebastian Benjamin Wrede

October 17, 2022

Federated learning allows training machine learning (ML) models without central consolidation of the raw data. Variants of such federated learning systems enable privacy-preserving ML and address data ownership and/or sharing constraints. However, existing work mostly adopts data-parallel parameter-server architectures for mini-batch training, requires manual construction of federated runtime plans, and largely ignores the broad variety of data preparation, ML algorithms, and model debugging. Over the last few years, we extended Apache SystemDS with an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.