DAPHNE Runtime: Harnessing Parallelism for Integrated Data Analysis Pipelines

Aristotelis Vontzalidis
Stratos Psomadakis
Constantinos Bitsakos
Mark Dokter
Kevin Innerebner
Patrick Damme
Matthias Boehm
Florina Ciorba
Ahmed Eleliemy
Vasileios Karakostas
Aleš Zamuda
Dimitrios Tsoumakos

March 22, 2024

Integrated data analysis pipelines combine rigorous data management and processing, high-performance computing and machine learning tasks. While these systems and operations share many compilation and runtime techniques, data analysts and scientists are currently dealing with multiple systems for each stage of their pipeline. DAPHNE is an open and extensible system infrastructure for such pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware accelerators and computational storage. In this demonstration, we focus on the DAPHNE runtime that provides the implementation of kernels for local, distributed and accelerator-enhanced operations, vectorized execution, integration with existing frameworks and libraries for productivity and interoperability, as well as efficient I/O and communication primitives.