Lunch Talk: Haneen Abdulrashid Mohammed “Fast lineage capture on analytical workloads and what to do with them”

July 03, 2024, 12:00 - 13:00

Technische Universität Berlin, Einsteinufer 17, 10587 Berlin, 1st floor Room EN148

Join us for an enlightening Lunch Talk featuring Haneen Abdulrashid Mohammed, a distinguished guest researcher from Columbia University at BIFOLD. Haneen will explore the intricacies of "Fast Lineage Capture on Analytical Workloads and What to Do with Them," a crucial topic for anyone involved in data analytics and processing.

Abstract: In this talk, Haneen Mohammed will discuss the current state of fine-grained lineage on OLAP workloads, then introduce an approach to achieve low-overhead lineage capture in vectorized, interpreted engines. This approach is based on her insight that lineage and data movement are equivalent in columnar data processing logic, and that existing variables already store lineage. Haneen and her group leverage standard data-flow analysis techniques to identify which variables and program state are necessary to reconstruct end-to-end lineage for any SQL query. They apply these instrumentation techniques to create SmokedDuck, a version of DuckDB that reduces lineage capture overhead on the TPC-H workload from 0.08-1462x to as low as 0-0.2x, and which supports fast lineage access with a 10x improvement over existing approaches.
Haneen Mohammed will then present a use case that takes advantage of captured lineage to accelerate what-if analysis, a building block for many explanation and analytics applications that explore how a query's output changes in response to input data changes, achieving a throughput of 1 million what-if analyses per second.

The BIFOLD Lunch Talk series gives BIFOLD members and external partners the opportunity to engage in dialogue about their research in Machine Learning and Big Data. Each Lunch Talk offers BIFOLD members, fellows and colleagues from other research institutes the chance to present their research and to network with each other.
The Lunch Talk takes place at TU Berlin. For further information on the Lunch Talks and registration, contact Dr. Laura Wollenweber via email.