Operator placement and parallelization for geo-distributed stream processing in large and heterogeneous topologies

Xenofon Chatziliadis

October 13, 2025

The rapid growth of Internet of Things (IoT) devices has produced large volumes of sensor data across geo-distributed, resource-constrained environments. Although Stream Processing Engines (SPEs) traditionally handle this data in centralized cloud infrastructures, such solutions incur high latency and communication overhead. To overcome these challenges, the osmotic computing paradigm has emerged, enabling dynamic distribution of computations across cloud, fog, and edge resources to reduce latency and communication costs. However, shifting workloads from the cloud to fog and edge nodes requires determining the parallelization and placement of operators, which are the core computational units in SPE workloads. While operator parallelization and placement are well studied in centralized cloud settings, they remain largely unresolved in osmotic environments, which are characterized by geographical distribution, heterogeneity, and volatility. To address this, we begin by investigating the limitations of performance monitoring workloads in centralized stream processing. Our findings reveal that funneling computations to centralized nodes leads to node overload and excessive latency, emphasizing the need for effective distribution of SPE tasks across the edge–cloud continuum. Building on these insights, we tackle two unresolved challenges in osmotic computing–based stream processing. First, we propose NEMO, a resource-aware approach that efficiently parallelizes and places Decomposable Aggregation Functions (DAFs) across fog and edge resources. Second, we introduce NOVA, a novel approach for optimizing parallelization and placement of streaming join operators, which are essential for combining multiple data streams. Unlike traditional methods for the NP-hard operator placement problem that operate over a discrete set of nodes, NEMO and NOVA map the topology into a Euclidean space.
Through iterative approximation, this transformation enables them to identify near-optimal placements in linear time and to re-optimize in near-constant time. Experimental evaluations show that NEMO lowers latency by up to 6× and reduces communication costs by up to 15× through dynamic, hierarchical aggregation trees. Its replication and placement strategy balances workloads, preventing node overload while maximizing resource efficiency. Meanwhile, NOVA achieves up to 39× lower latency and 4.5× higher throughput than existing methods while also preventing node overload. It does so by decomposing large joins into smaller sub-joins that are strategically distributed across the edge–cloud continuum. By leveraging data distribution patterns (e.g., region-based keys), NOVA further minimizes data transfers and resource overhead. Collectively, these contributions advance osmotic computing by (1) providing an in-depth analysis of performance monitoring requirements for SPEs in IoT-scale deployments, (2) providing robust mechanisms for efficient aggregation of DAFs, and (3) delivering a comprehensive solution for distributed streaming joins. Together, they establish a foundation for scalable, efficient, and resource-aware stream processing in large-scale, heterogeneous, and geo-distributed environments.
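To make the notion of a Decomposable Aggregation Function concrete, the following is a minimal sketch of a hierarchical average computed over an edge, fog, and cloud tree. It is not NEMO's algorithm and does not address placement or replication; it only illustrates why a DAF can be evaluated as mergeable partial states instead of shipping raw readings to one node. The node names, the topology, and the helpers `PartialAvg`, `lift`, and `combine` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialAvg:
    """Partial state of an average; states can be merged in any order."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "PartialAvg") -> "PartialAvg":
        # Merging is associative and commutative, which is what makes
        # the average decomposable across a tree of nodes.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        if self.count == 0:
            raise ValueError("cannot average zero readings")
        return self.total / self.count


def lift(readings: Iterable[float]) -> PartialAvg:
    """Turn raw readings into a partial state (runs on an edge node)."""
    values = list(readings)
    return PartialAvg(sum(values), len(values))


def combine(partials: Iterable[PartialAvg]) -> PartialAvg:
    """Merge partial states (runs on a fog node or in the cloud)."""
    state = PartialAvg()
    for partial in partials:
        state = state.merge(partial)
    return state


# Hypothetical topology: three edge nodes feeding two fog nodes.
edge_readings = {
    "edge-1": [20.0, 22.0],
    "edge-2": [18.0],
    "edge-3": [25.0, 27.0, 29.0],
}
fog_children = {"fog-a": ["edge-1", "edge-2"], "fog-b": ["edge-3"]}

# Each edge node sends one small partial state instead of all readings.
edge_partials = {node: lift(r) for node, r in edge_readings.items()}
fog_partials = {
    fog: combine(edge_partials[edge] for edge in edges)
    for fog, edges in fog_children.items()
}
cloud_state = combine(fog_partials.values())

hierarchical_avg = cloud_state.result()
# Reference: the same average computed centrally over all raw readings.
centralized_avg = lift(v for r in edge_readings.values() for v in r).result()
```

In this sketch every link in the tree carries a single `(total, count)` pair regardless of how many readings a node has seen, which is the property that hierarchical aggregation trees rely on to reduce communication.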

https://doi.org/10.14279/depositonce-24818

BIFOLD AUTHORS

Xenofon Chatziliadis