Federated Data Preparation, Learning, and Debugging in Apache SystemDS

Sebastian Baunsgaard

Matthias Boehm

Kevin Innerebner

Mito Kehayov

Florian Lackner

Olga Ovcharenko

Arnab Phani

Tobias Rieger

David Weissteiner

Sebastian Benjamin Wrede

October 17, 2022

Federated learning allows training machine learning (ML) models without central consolidation of the raw data. Variants of such federated learning systems enable privacy-preserving ML, and address data ownership and/or sharing constraints. However, existing work mostly adopts data-parallel parameter-server architectures for mini-batch training, requires manual construction of federated runtime plans, and largely ignores the broad variety of data preparation, ML algorithms, and model debugging. Over the last few years, we extended Apache SystemDS with an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.

https://dl.acm.org/doi/10.1145/3511808.3557162

BIFOLD AUTHORS

Prof. Dr. Matthias Böhm