Researcher Spotlight: Dr. Haralampos Gavriilidis

Bridging the gaps between the world's databases could be the key to turning raw data into real insight

The world runs on scattered data. Every day, humanity generates 402.74 million terabytes of it, flowing through corporate systems, government databases, cloud platforms, and the devices we carry in our pockets. It is never in one place, rarely in the same format, and almost never designed to work together. For anyone trying to turn that raw abundance into meaningful insight, especially in a world increasingly powered by AI, that fragmentation is a fundamental problem. Haralampos Gavriilidis, a researcher at BIFOLD, has made it his mission to solve it: to make the exchange of information between different data systems faster, cheaper, and simpler.

From a broken computer to database research

Haralampos' interest in computers started with a gift gone wrong. When he was young, his aunt bought him a computer, but it arrived with technical problems. Unable to convince the vendor to exchange it, he fixed it himself. That hands-on encounter sparked a curiosity about how computers work that never left him.

He began a career as a web developer, writing SQL queries and tuning databases. But web development work quickly gave way to a deeper fascination with the infrastructure underneath: the systems that store, move, and manage data at scale.

A master's degree in data management followed, along with a growing realization: what he loved most was not building systems, but questioning them. The collaborative nature of research, the freedom to explore problems without a predetermined answer, and the guidance of mentors who challenged and encouraged him convinced him to go further.

He completed his doctorate under the supervision of BIFOLD co-Director Prof. Dr. Volker Markl. His thesis, "Query Processing and Interoperability Mechanisms for Federated Data Systems," tackled one of the central pain points of modern data infrastructure: how to run efficient analyses across distributed, heterogeneous data sources that were never designed to cooperate. The work produced three distinct contributions. XDB rethinks federated query execution by removing the central mediator entirely, letting database systems collaborate directly. XDBC is a modular, adaptive framework for data transfer between heterogeneous environments. And SheetReader addresses something more mundane but no less real: the stubborn persistence of spreadsheets in enterprise data pipelines, offering a high-performance parser that brings them into the fold.

Dr. Haralampos "Harry" Gavriilidis is a self-declared database guy and motorcycle enthusiast who counts Edgar F. Codd among his greatest inspirations. What started with fixing a broken computer as a child became a career in data systems research. He completed his PhD in 2025 at BIFOLD, TU Berlin, under the supervision of Prof. Dr. Volker Markl, and is a Software Campus graduate. His mission: making federated query processing more efficient.

The problem with the middleman

At the heart of Haralampos' research is a deceptively simple question: when you have data scattered across multiple databases, what is the smartest way to query them all?

“We don’t need a new query engine for everything. We have good engines out there. We just need to find the right ways to leverage them, and sometimes decentralizing things can be beneficial.”

For decades, the dominant answer has been the mediator architecture, a central query engine that pulls data from different sources and processes it in one place. This approach is slow and expensive: it requires additional hardware and resources to maintain the infrastructure, and it moves more data than necessary. Joining two tables from different sources should not require sending both to a central system when only the smaller one needs to be moved.
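The data-movement argument can be made concrete with a back-of-the-envelope sketch. The table names and sizes below are hypothetical, and the model is deliberately crude: it counts only the input bytes that cross the network and ignores result sizes, filters, and compression.

```python
def mediator_bytes(left_bytes: int, right_bytes: int) -> int:
    """Central mediator: both join inputs are shipped to a third system."""
    return left_bytes + right_bytes


def direct_bytes(left_bytes: int, right_bytes: int) -> int:
    """Direct exchange: only the smaller input moves to the other system."""
    return min(left_bytes, right_bytes)


# Hypothetical table sizes, chosen only for illustration.
orders_bytes = 50 * 10**9     # 50 GB stored in system A
customers_bytes = 2 * 10**9   # 2 GB stored in system B

moved_via_mediator = mediator_bytes(orders_bytes, customers_bytes)
moved_directly = direct_bytes(orders_bytes, customers_bytes)

print(moved_via_mediator // 10**9, "GB moved via a central mediator")
print(moved_directly // 10**9, "GB moved when the smaller table is shipped")
```

Under these assumed sizes, the mediator moves 52 GB while the direct exchange moves 2 GB; the gap grows with the size difference between the two tables.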

His answer is XDB, a cross-database query-processing paradigm that replaces the central mediator with a collaborative model: existing database systems communicate directly with one another and compute results together. The approach combines the direct peer-to-peer communication of distributed systems with the clean user abstraction of federated databases, all while leveraging existing infrastructure. Whether systems communicate through a central mediator or directly, efficient data transfer remains crucial. His XDBC project focuses on how data can move adaptively between systems and how machine learning can help tune that process. Together with students, he even transformed XDBC into an interactive conference demo game with spaceships, giving a playful way to experience the challenges of data transfer firsthand.
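As a rough illustration of the collaborative idea, and not of how XDB itself is implemented, the toy sketch below uses two in-memory SQLite databases as stand-ins for independent systems. All table names and rows are invented for the example. The script only carries the smaller table across; the join runs inside the system that already holds the larger one.

```python
import sqlite3

# Two independent in-memory databases stand in for separate data systems.
system_a = sqlite3.connect(":memory:")
system_b = sqlite3.connect(":memory:")

# System A holds the larger table (hypothetical data).
system_a.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
system_a.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 3, 10.0 * i) for i in range(1, 7)],
)

# System B holds the smaller table (hypothetical data).
system_b.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
system_b.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(0, "Ada"), (1, "Edgar"), (2, "Grace")],
)

# Ship only the smaller table to the system that holds the larger one.
shipped_rows = system_b.execute("SELECT id, name FROM customers").fetchall()
system_a.execute("CREATE TEMP TABLE customers_shipped (id INTEGER, name TEXT)")
system_a.executemany("INSERT INTO customers_shipped VALUES (?, ?)", shipped_rows)

# The join and aggregation run inside system A, next to the larger table.
totals = system_a.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers_shipped AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(len(shipped_rows), "rows shipped")
print(totals)
```

In this sketch the Python script still acts as a courier between the two connections; in a real deployment the systems would exchange the data themselves, which is where efficient transfer becomes the deciding factor.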

PolyDB: From theory to practice

In 2023, Haralampos joined the Software Campus program, a BMFTR-funded initiative that pairs outstanding doctoral researchers with industry partners and a €100,000 research budget. He led a team of five student research assistants while working closely with his industry partner to translate this vision into PolyDB.

“I want my research to be practical. People should benefit from it soon. The Software Campus Grant put me in the right spot, with great collaborators and an inspiring team, leadership development opportunities, and generous funding for research, travel, and equipment.”

Haralampos’ Software Campus project targets the real-world messiness of data integration: companies synchronizing databases across global operations, organizations analyzing sensitive data across locations without ever centralizing it, and yes, the countless Excel spreadsheets that remain deeply embedded in everyday business processes. PolyDB brings these challenges together, helping different data systems work with one another more seamlessly.

Yet the work is far from finished. Like every corner of computer science, the database community is grappling with the rise of machine learning, and Haralampos sees both the anxiety and the opportunity it brings. The key, in his view, is to treat AI as a tool rather than a threat: leveraging machine learning to make database systems smarter, while ensuring that decades of foundational research are not simply swept aside. The field is still early in figuring out what that balance looks like. What is already clear, however, is that AI systems are only as powerful as the data they can access, and scattered, fragmented data remains one of the biggest bottlenecks standing between raw potential and real-world impact.

Federated query processing refers to the execution of a single query across multiple independent databases or data systems. Rather than requiring prior consolidation of all data into one repository, it enables analysis across distributed sources. Federated query processing systems may follow different architectural models, including centralized approaches, in which much of the computation takes place in a central mediator, and decentralized approaches, in which systems collaborate more directly to compute query results. The purpose of federated query processing is to reduce data silos and facilitate efficient analysis across organizations, platforms, and environments.

Research ideas rarely arrive at a desk

Haralampos follows discussions on social media, pays attention to what practitioners complain about, and talks to anyone willing to engage, from leading experts to bachelor's students asking the questions no one else thinks to ask. Sometimes, the most basic challenge to an assumption is what moves thinking forward. And when all of that fails, he goes for a walk.

As his PhD concludes, Haralampos finds himself at a crossroads most researchers would envy. He is preparing for a postdoc in Berkeley while also exploring how some of his ideas might take shape beyond academia, and will soon become a father. That leaves him with plenty to think about, whether on his 1990s Honda Africa Twin, on the Brazilian Jiu-Jitsu mat, or occasionally with a saxophone.


Three key works by Dr. Haralampos Gavriilidis

Fast and Scalable Data Transfer across Data Systems (SIGMOD, 2025). Introduces XDBC, a modular framework for fast and scalable data transfer across heterogeneous systems and environments, with automatic tuning that adapts transfer configurations to workload and infrastructure characteristics.

In-Situ Cross-Database Query Processing (ICDE, 2023). Introduces XDB, a decentralized approach to cross-database query processing that removes the central mediator and lets existing database systems collaborate directly, reducing both runtime and unnecessary data movement.

SheetReader: Efficient Specialized Spreadsheet Parsing (Information Systems, 2023). Introduces a specialized parser for spreadsheet files that uses structure-aware optimizations and parallelism to make loading data much faster and more memory-efficient, making it easier to integrate spreadsheets into broader data analytics pipelines.

Dr. Haralampos Gavriilidis featured in the media

Software Campus Interview

Disseminate Podcast exploring the “Fast and Scalable Data Transfer across Data Systems” SIGMOD 2025 paper

DuckDB in Research Podcast about the “SheetReader: Efficient Specialized Spreadsheet Parsing” paper