Every summer, the Association for Computing Machinery (ACM) organizes the conference of its Special Interest Group on Management of Data (SIGMOD). SIGMOD is a leading international forum for data management researchers, practitioners, developers, and users. Before the conference, an awards committee selects a system to receive the SIGMOD Systems Award. The award recognizes "an individual or set of individuals who developed a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems." Award-winning systems usually have large-scale real-world applications and have influenced the design of later data processing systems. This year, Apache Flink won the award. The SIGMOD Systems Award is one of the most prestigious awards in the field of data management. The only other time it has gone to a system that originated in Europe was in 2016, when MonetDB received it.
Apache Flink is an open-source platform for big data stream analytics. Its origins can be traced back to 2008, when BIFOLD Director Prof. Dr. Volker Markl founded the Database Systems and Information Management (DIMA) Research Group at the Technische Universität (TU) Berlin. Soon after arriving in Berlin, he laid out the vision for a massively parallel batch data processing system based on post-relational user-defined functions. The system would combine concepts from databases and distributed systems, with the goal of enabling modern data analysis and machine learning on big data.
In 2009, Volker Markl, together with researchers from TU Berlin, Humboldt University (HU) of Berlin, and the Hasso Plattner Institute (HPI) in Potsdam, co-wrote a DFG (German Research Foundation) research unit proposal entitled "Stratosphere – Information Management on the Cloud," which was funded in 2010. The initial DFG grant (2010–2012) extended the original vision toward a novel, database-inspired approach for analyzing, aggregating, and querying very large collections of textual or (semi-)structured data on a virtualized, massively parallel cluster architecture.
Donation to the Apache Software Foundation
In 2014, the team at TU Berlin, which was driving the research and development of the core data processing infrastructure, decided to donate the code base to the Apache Software Foundation under the name “Flink.”
At the same time, several members of the DIMA Research Group at TU Berlin founded a startup, dataArtisans GmbH, to promote and commercialize Flink as an open-source system. Projects funded by the European Institute of Innovation and Technology and by the Horizon 2020 framework programme, with partners at TU Berlin, DFKI, KTH in Stockholm, and ELTE/SZTAKI in Budapest, laid the foundations for the global Flink open-source community.
Today, Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams, developed under the Apache Software Foundation. The project is driven by an international open community of more than 1,000 contributors. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. It powers business-critical applications at companies around the globe. Flink is also an active platform for research and innovation at many universities and companies worldwide.