August 29, 2025

Dr. Mahdi Esmailoghli

Researcher Spotlight: Dr. Mahdi Esmailoghli

Data Discovery instead of Big Data Drowning

In November 2024, Mahdi Esmailoghli completed his PhD summa cum laude in Prof. Dr. Ziawasch Abedjan's research group, Data Integration and Data Preparation. In his dissertation, he developed a holistic system that helps researchers navigate vast “data lakes” more efficiently, a breakthrough that makes training machine learning models both faster and more effective.

Esmailoghli’s work focuses on data discovery: finding the right pieces of information at the right time, instead of drowning in oceans of raw and unstructured information. Now a postdoctoral researcher at Humboldt-Universität zu Berlin, he is taking this vision further, building intelligent systems that recommend entire data analysis pipelines and share expertise across scientific domains.

 

Please describe and explain your research focus.

Mahdi: I am a computer scientist with a focus on Big Data. My research helps to understand complex problems by analyzing vast amounts of raw data to extract relevant information.

Imagine you want to explore the causes of air pollution in Berlin on specific days of the year. Identifying the relevant data is a cumbersome task because the data is often scattered across so-called data lakes. These are massive digital repositories where governments, companies, or research institutions store raw data in its original form: tables, text files, sensor readings, and images. Unlike in traditional databases, data in data lakes is initially stored in an unfiltered and unstructured format.

To address this problem, I built a system during my Ph.D. called Blend. The idea behind Blend is simple: it brings together different methods for discovering patterns in data, such as finding connections between datasets (join discovery) or spotting statistical relationships (correlation discovery). By combining these components, Blend creates a unified environment that makes it easier to work with large data lakes. Users can describe their task and the kind of data they are looking for, and the system automatically identifies the datasets that are most relevant to their question.
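To make the two ideas named above more concrete, here is a minimal Python sketch of join discovery and correlation discovery on a toy data lake. It is not Blend's implementation: the tables, column names, the value-overlap (containment) score, and the threshold are all hypothetical choices made for illustration only.

```python
from math import sqrt

# Hypothetical toy "data lake": table name -> {column name -> list of values}.
LAKE = {
    "traffic": {"day": ["mon", "tue", "wed", "thu"], "cars": [120, 150, 90, 200]},
    "weather": {"date": ["mon", "tue", "wed", "fri"], "wind": [5, 3, 8, 2]},
    "library": {"isbn": ["a1", "b2", "c3"], "loans": [7, 1, 4]},
}

# Hypothetical query table: air pollution readings per day.
QUERY_DAYS = ["mon", "tue", "wed", "thu"]
QUERY_PM10 = [40, 55, 30, 70]


def containment(query_values, candidate_values):
    """Fraction of distinct query values that also occur in the candidate column."""
    query = set(query_values)
    if not query:
        return 0.0
    return len(query & set(candidate_values)) / len(query)


def discover_joins(query_values, lake, threshold=0.5):
    """Join discovery: list (table, column, score) for columns that overlap enough
    with the query key, best match first."""
    hits = []
    for table, columns in lake.items():
        for column, values in columns.items():
            score = containment(query_values, values)
            if score >= threshold:
                hits.append((table, column, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_after_join(query_keys, query_target, table, key_column, value_column):
    """Correlation discovery: join on matching keys, then correlate the joined
    column with the query's target column."""
    lookup = dict(zip(table[key_column], table[value_column]))
    pairs = [(lookup[k], t) for k, t in zip(query_keys, query_target) if k in lookup]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


if __name__ == "__main__":
    for table, column, score in discover_joins(QUERY_DAYS, LAKE):
        print(f"joinable: {table}.{column} (containment {score:.2f})")
    print(correlation_after_join(QUERY_DAYS, QUERY_PM10, LAKE["traffic"], "day", "cars"))
    print(correlation_after_join(QUERY_DAYS, QUERY_PM10, LAKE["weather"], "date", "wind"))
```

In this toy setup, the traffic and weather tables surface as join candidates for the pollution data while the unrelated library table does not. A real system would need indexes and approximations to do this at data-lake scale, which the sketch deliberately leaves out.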

What are your recent projects?

Mahdi: After my Ph.D., I moved from BIFOLD to Humboldt University in Berlin. Here, I am expanding my research from data discovery to data analysis pipelines. These pipelines are sequences of analysis steps—for example, cleaning the data, combining different datasets, and then running statistical models. My goal is to build systems that can suggest the most relevant pipelines by learning from a repository of past workflows.

Which major innovations do you expect in your research field in the next ten years?

Mahdi: I expect large language models to have a significant impact on the field of data discovery. They will revolutionize how users find and use data by letting users express their intent more clearly, so that data discovery aligns more closely with how humans actually think and work.

AI is considered a disruptive technology - in which areas of life do you expect the greatest upheaval in the next ten years?

Mahdi: Recent developments in computer science are already transforming nearly every aspect of our lives. Large language models can now detect illnesses that doctors might miss, teach more effectively than many school teachers, and transform numerous industries. I would not be surprised to see an AI judge soon, capable of delivering sentiment-free judgments. However, the most significant aspect is often missing: an evaluation of how well these models actually perform. To be used safely in critical situations, they need to demonstrate accuracy and trustworthiness.

What personally motivated you to enter research?

Mahdi: What excites me about research is tackling problems no one has solved before, moving beyond routine into open-ended challenges. At the same time, teaching matters deeply to me: in a transactional world, it’s one of the few spaces of unconditional giving. The combination of discovery and teaching made academia the natural path for me.

Which living or historical scientist has fascinated you?

Mahdi: Alan Turing has always fascinated me. His genius not only laid the foundation for modern computing but also helped break the Nazi Enigma code—an achievement that shortened the war and saved countless lives. Few scientists have changed the world so profoundly.

Where would one find you if you are not sitting in front of the computer?

Mahdi: I love photography, hiking, camping, and dancing. If I’m not coding or writing a paper in my office, you might find me in a photo studio, at a campsite by a river, or in a dance studio, probably spinning to a lively 'Uno, dos, tres!'.