Big Data Engineering
Lead
Prof. Dr. Matthias Böhm
Technische Universität Berlin
Ernst-Reuter-Platz 7, 10587 Berlin
Data Science Abstractions and Systems, Performance-Accuracy Tradeoffs in Data Science, Data Cleaning Pipelines and Optimization
The mission of the Big Data Engineering group, led by Prof. Dr. Matthias Böhm, is to simplify data science by providing high-level, data-science-centric abstractions and by building systems and tools that execute data science tasks efficiently and scalably. The group's general research interests include the exploration of performance-accuracy tradeoffs, tooling (script generators, label generation, advisors, etc.), seamless data augmentation, cleaning, feature engineering, model debugging and deployment, cost-effective cloud deployments, advanced optimization techniques, adaptive data storage and indexing, and the exploitation of modern hardware.
Current research focuses on:
• Data Cleaning Pipelines: Automatic enumeration of data cleaning pipelines for a target ML application, hyperparameter optimization of cleaning primitives.
• Model Debugging: Finding the top-k data slices where a trained model underperforms, linear-algebra-based enumeration and pruning algorithms.
• Fine-grained Lineage Tracing and Reuse: Fine-grained, multi-level lineage tracing for versioning and reuse, lineage deduplication, full and partial reuse of intermediates.
• Federated Linear Algebra and Parameter Servers: ML model training on federated raw data without central data consolidation, plan generation under awareness of privacy constraints, federated linear algebra programs and parameter servers.
• Workload-aware Data Reorganization: Compression that takes data and workload (linear algebra program) characteristics into account, asynchronous data reorganization in standing executors (e.g., standing federated workers).
• Code Generation for Heterogeneous HW: Extended operator fusion and code generation for GPUs and heterogeneous devices, including sparsity exploitation across operations.
POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance
Cross-Language Differential Testing of JSON Parsers
PLUTUS: Understanding Data Distribution Tailoring for Machine Learning
![News](https://bifold.berlin/fileadmin/user_upload/SIGMOD_Awards_NewsTeaser.png)
BIFOLD Researchers receive three SIGMOD Awards
Each year SIGMOD conference awards are bestowed on researchers who have especially contributed to the field of data management. In 2024 BIFOLD researchers were honored to receive three awards.
![News](https://bifold.berlin/fileadmin/user_upload/2024_06_09_Sigmod_Teaser.png)
BIFOLD at the 2024 ACM SIGMOD/PODS Conference
BIFOLD researchers presented four research papers, two demos, and one workshop paper, and were part of a panel at the 2024 ACM SIGMOD/PODS Conference in Santiago, Chile.
![News](https://bifold.berlin/fileadmin/user_upload/2024_04_23_Polar_EmotionalWaitingSadComputer.png)
“POLAR” lowers the adoption barrier for adaptive query processing in database systems
A preprint by BIFOLD researchers titled "POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance" is set to be presented at the VLDB conference in 2024. The database engineering paper introduces an adaptive technique for reordering joins, with a focus on non-invasive integration and low overhead.
![](https://bifold.berlin/fileadmin/user_upload/People/MatthiasBoehm.jpg)
![Sebastian Baunsgaard Bifold researcher](https://bifold.berlin/fileadmin/user_upload/People/SebastianBaunsgaard-Low.jpg)
![Dr. Patrick Damme Researcher BIFOLD](https://bifold.berlin/fileadmin/user_upload/People/Damme_Patrick.jpg)
![](https://bifold.berlin/fileadmin/user_upload/People/HashmiSarah.png)
![](https://bifold.berlin/fileadmin/_processed_/9/0/csm_David_Justen_2_d7bf84bac3.jpg)
![Arnab Phani BIFOLD researcher](https://bifold.berlin/fileadmin/_processed_/d/c/csm_ArnabPhani_copyright_8b0e8bb4c7.jpg)