Banner Banner

Network-aware worker placement for wide-area streaming analytics

Habib Mostafaei
Shafi Afridi
Jemal Abawajy

June 18, 2022

Many organizations leverage Distributed Stream processing systems (DPSs) to get insights from the data generated by different users/devices, e.g., the Internet of Things (IoT) devices or user clicks on a website, on geographically distributed datacenters. The worker nodes in such environments are connected through Wide Area Network (WAN) links with various delays and bandwidth. Therefore, minimizing the execution latency of a task on the worker nodes while using the links with enough bandwidth and lower cost to steer the traffic of the applications is a challenging task. In this paper, we formulate the worker node placement for a geo-distributed DSPs network as a multi-criteria decision-making problem. Then, we propose an additive weighting-based approach to solve it. The users can prioritize the worker node placement according to the network-relevant parameters. We also propose a framework that can be integrated with the current DPSs to execute the tasks. We test our placement approach on three widely used stream processing systems, i.e., Apache Spark, Apache Storm, and Apache Flink, on three custom graphs adopted from the real cloud providers. We run the streaming query of the Yahoo! streaming benchmark on these three DPSs. The experimental results show that our approach improves the performance of Spark up to 2.2x–7.2x, Storm up to 1.2x–3.4x, and Flink up to 1.4x–3.3x compared with other placement approaches, which makes our framework useful for use in practical environments.