Lunch Talk: Query Optimization in the Era of Edge-Cloud Continuum

June 05, 2025 12:00 - 13:00

TU Berlin, Einsteinufer 17, 10587 Berlin, EN 148

Contact person
Dr. Laura Wollenweber
laura.wollenweber@tu-berlin.de

Ankit Chaudhary

The BIFOLD Lunch Talk series gives BIFOLD members and external partners the opportunity to engage in dialogue about their research in Machine Learning and Big Data. Each Lunch Talk offers BIFOLD members, fellows and colleagues from other research institutes the chance to present their research and to network with each other.

The Lunch Talk takes place at the TU Berlin. For further information on the Lunch Talks and registration, contact Dr. Laura Wollenweber via email.

Abstract: Sensor and edge nodes with data processing capabilities produce vast amounts of data outside the cloud. A new class of data management systems has been proposed to handle this large amount of data by unifying sensors, edge, and cloud infrastructures for efficient data processing. These systems reduce latency and improve resource efficiency by processing data closer to its source, i.e., on sensor and edge nodes, before transmitting only the necessary information to the cloud for further analysis. However, a major challenge for these systems is maintaining high resource efficiency while processing thousands of stream queries in an ever-evolving sensor-edge-cloud infrastructure.

We propose an end-to-end framework for computing resource-efficient plans, performing adaptive placements, and deploying query plans on a dynamic infrastructure.

First, our framework uses incremental stream query merging (ISQM) to identify and manage sharing opportunities among thousands of queries under continuous operations. In particular, ISQM captures the semantic information of stream queries to enable merging even in the presence of syntactic differences. Our evaluation shows that ISQM exploits up to 65× more sharing opportunities than the naive baseline using hash-based signatures, scales linearly for thousands of queries, and saves a significant amount of resources compared to state-of-the-art approaches.

Second, our framework uses incremental stream query placement (ISQP) to keep the operator placements valid under continuous query and infrastructure changes. ISQP identifies invalid operator placements at a fine granularity and makes concurrent, incremental placement decisions, which reduces the optimization overhead needed to keep the placements valid. Our evaluation shows that ISQP reduces the optimization overhead by one order of magnitude compared to the baseline.

Lastly, our framework uses incremental stream query deployment (ISQD) to keep query deployments valid on a dynamic infrastructure. ISQD uses a greedy strategy to concurrently select and redeploy only the operators affected by topology changes. Additionally, ISQD removes the need for an external state management component by using ad-hoc queries to perform state migration in a hierarchical infrastructure. Our evaluation shows that ISQD incurs up to 7.5× less deployment latency and up to 39× less event-time latency compared to the strongest baseline while keeping up with high-frequency topology changes.

In conclusion, this talk introduces a framework designed to generate resource-efficient plans and ensure the continuous execution of queries in a dynamic sensor-edge-cloud infrastructure. Our framework serves as a foundational component of NebulaStream, a next-generation IoT data management platform, enabling the efficient management of massive workloads in an ever-evolving environment.

Bio: Ankit Chaudhary is a computer science researcher completing his Ph.D. at TU Berlin (Aug 2025), focusing on stream query optimization for IoT systems. He has published in CIDR, EDBT, SIGMOD, BTW, VLDB, and ICDE and has contributed to NebulaStream. His work has earned awards at the ICDE and VLDB conferences. In addition to his research, Ankit has 10+ years of industry experience in data engineering.