The paper “Parallelizing Intra-Window Join on Multicores: An Experimental Study” by Shuhao Zhang, Yancan Mao, Jiong He, Philipp Grulich, Steffen Zeuch, Bingsheng He, Richard Ma and Volker Markl was accepted for presentation at the ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD/PODS 2021), which will take place from June 20 – 25, 2021 in Xi’an, China. This work is the result of a collaboration between researchers from the Database Systems and Information Management (DIMA) group at TU Berlin, the Intelligent Analytics for Massive Data (IAM) group at DFKI, the Department of Computer Science at the National University of Singapore and ByteDance.
The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences in all aspects of data management. To learn more about SIGMOD/PODS, please visit https://2021.sigmod.org/.
The intra-window join (IaWJ), i.e., joining two input streams over a single window, is a core operation in modern stream processing applications. This paper presents the first comprehensive study on parallelizing the IaWJ on modern multicore architectures. In particular, we classify IaWJ algorithms into lazy and eager execution approaches. Each approach involves further design choices, including the join method and the partitioning scheme, which leads to a large design space. Our results show that no single algorithm performs best in all cases; the most performant choice depends on (i) workload characteristics, (ii) application requirements, and (iii) hardware architecture. Based on the evaluation results, we propose a decision tree that can guide the selection of an appropriate algorithm.
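To make the lazy/eager distinction concrete, here is a minimal single-threaded sketch. It is not code from the paper. It assumes an equi-join on a key, hash-based join methods, and tuples that all fall within one window; the function names `lazy_join` and `eager_join` and the tuple layout are hypothetical. The lazy variant waits until the whole window is buffered and then runs a build-and-probe hash join. The eager variant processes each tuple on arrival, in the style of a symmetric hash join.

```python
from collections import defaultdict


def lazy_join(r_tuples, s_tuples):
    """Lazy sketch: the window is fully buffered before any join work starts.

    r_tuples and s_tuples are lists of (key, payload) pairs.
    """
    # Build phase: hash all R tuples of the window by key.
    table = defaultdict(list)
    for key, payload in r_tuples:
        table[key].append(payload)
    # Probe phase: look up every S tuple in the R table.
    return [
        (key, r_payload, s_payload)
        for key, s_payload in s_tuples
        for r_payload in table.get(key, [])
    ]


def eager_join(arrivals):
    """Eager sketch: join each tuple as soon as it arrives.

    arrivals is an iterable of (side, key, payload) with side in {"R", "S"},
    given in arrival order. Matches are yielded incrementally.
    """
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    for side, key, payload in arrivals:
        other = "S" if side == "R" else "R"
        # Probe the opposite side's table with the new tuple ...
        for match in tables[other].get(key, []):
            if side == "R":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
        # ... then insert it into its own side's table for later arrivals.
        tables[side][key].append(payload)


if __name__ == "__main__":
    arrivals = [
        ("R", 1, "r1"), ("S", 1, "s1"), ("S", 2, "s2"),
        ("R", 2, "r2"), ("R", 1, "r3"),
    ]
    r = [(k, p) for side, k, p in arrivals if side == "R"]
    s = [(k, p) for side, k, p in arrivals if side == "S"]
    print(sorted(lazy_join(r, s)))
    print(sorted(eager_join(arrivals)))
```

Both functions produce the same set of matches for a window. They differ in when the work happens: the eager variant can emit its first result before the window closes, while the lazy variant does all its work afterwards in one batch. The sketch leaves out parallelization and partitioning, which are the subject of the study.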
A preprint version of the paper is available here.