Big Data Engineering


Prof. Dr. Matthias Böhm


Technische Universität Berlin
Ernst-Reuter-Platz 7, 10587 Berlin

Data Science Abstractions and Systems, Performance-Accuracy Tradeoffs in Data Science, Data Cleaning Pipelines and Optimization


The mission of the Big Data Engineering group, led by Prof. Dr. Matthias Böhm, is to simplify data science by providing high-level, data-science-centric abstractions and building systems and tools that execute data science tasks efficiently and at scale. The group's general research interests include the exploration of performance-accuracy tradeoffs, tooling (script generators, label generation, advisors, etc.), seamless data augmentation, cleaning, feature engineering, model debugging and deployment, cost-effective cloud deployments, advanced optimization techniques, adaptive data storage and indexing, and the exploitation of modern hardware.

Current research focuses on:
•    Data Cleaning Pipelines: Automatic enumeration of data cleaning pipelines for a target ML application, hyperparameter optimization of cleaning primitives.
•    Model Debugging: Finding the top-k data slices where a trained model underperforms, linear-algebra-based enumeration and pruning algorithms.
•    Fine-grained Lineage Tracing and Reuse: Fine-grained, multi-level lineage tracing for versioning and reuse, lineage deduplication, full and partial reuse of intermediates.
•    Federated Linear Algebra and Parameter Servers: ML model training on federated raw data without central data consolidation, plan generation under awareness of privacy constraints, federated linear algebra programs and parameter servers.
•    Workload-aware Data Reorganization: Compression under awareness of data and workload (linear algebra program) characteristics, asynchronous data reorganization in standing executors (e.g., at standing federated workers).
•    Code Generation for Heterogeneous HW: Extended operator fusion and code generation for GPUs and heterogeneous devices, including sparsity exploitation across operations.

January 11, 2024

Photo recap: BIFOLD New Year's reception

At its New Year's reception, BIFOLD welcomed a series of distinguished guests and friends from Berlin's AI community.

© BIFOLD/Michael Setzpfandt
October 11, 2023

Photo recap: All Hands Meeting 2023

On October 9 and 10, 2023, BIFOLD welcomed the other German AI centers (ScaDS.AI Dresden/Leipzig, Lamarr Institute, Tübingen AI Center, MCML, and the DFKI) to Berlin. The annual meeting featured guests, partners, visitors, and researchers from all over Germany.

© BIFOLD/Michael Setzpfandt
October 10, 2023

AI centers are the foundation of the German AI ecosystem

On October 9 and 10, 2023, the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at TU Berlin invited scientists from the university AI competence centers (BIFOLD, ScaDS.AI Dresden/Leipzig, Lamarr Institute, Tübingen AI Center, and MCML) and the DFKI to the EUREF campus in Berlin to present and discuss their latest research results.

Prof. Dr. Matthias Böhm

Research Group Lead

Sebastian Baunsgaard

Doctoral Researcher

Patrick Damme

Doctoral Researcher

Sarah Hashmi

Secretary ML Sec & DAMS

David Justen

Doctoral Researcher

Arnab Phani

Doctoral Researcher

Kindly take note that only researchers who have received funding from BIFOLD have their individual profiles displayed on