Prof. Dr. Matthias Böhm


Technische Universität Berlin
School IV EECS

Ernst-Reuter-Platz 7, 10587 Berlin

Research Group Lead | BIFOLD

Full Professor and Chair: Big Data Engineering

Before joining BIFOLD, Matthias Boehm was a BMK-endowed professor for data management at Graz University of Technology, Austria, and a research area manager for data management at the co-located Know-Center GmbH. His cross-organizational research group focuses on high-level, data science-centric abstractions as well as systems and tools to execute data science tasks in an efficient and scalable manner. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a major focus on compilation and runtime techniques for declarative, large-scale machine learning in Apache SystemML. Matthias received his Ph.D. from Dresden University of Technology, Germany, in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award, a 2016 SIGMOD Research Highlight Award, a 2016 IBM Pat Goldberg Memorial Best Paper Award, and the 2021 SIGMOD DS&E Best Paper Award.

Current Projects: Apache SystemDS (an open-source ML system for the end-to-end data science lifecycle), ExDRa (exploratory data science and federated ML over raw data, w/ Siemens, DFKI, and TU Berlin), DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines, w/ AVL, DLR, ETH Zurich, HPI Potsdam, ICCS, Infineon, Intel, ITU Copenhagen, KAI, TU Dresden, Uni Maribor, Uni Basel), and ReWaste F (recycling and recovery of waste for future, 4 scientific and 14 industrial partners).

2021 SIGMOD DS&E Best Paper Award
2016 IBM Pat Goldberg Memorial Best Paper Award
2016 SIGMOD Research Highlight Award
2016 VLDB Best Paper Award

  • System-oriented research for the end-to-end data science lifecycle, from data integration, preparation, and cleaning, through efficient ML training, to model debugging and deployment,
  • Large-scale, distributed machine learning and data management,
  • Query optimization (in ML systems, integration systems, database systems), and
  • In-memory indexing, query processing, and high-performance computing.