Researcher Spotlight: Dr. Arnab Phani

Optimizing the internal mechanics of machine learning systems

Dr. Arnab Phani earned his PhD under Prof. Matthias Böhm in the DAMS Lab at TU Berlin and BIFOLD before recently joining the DEEM Lab, chaired by Prof. Sebastian Schelter, as a postdoctoral researcher. His doctoral research focused on optimizing the internal mechanics of machine learning systems, particularly by minimizing redundant computations across diverse execution environments. Before entering academia, Arnab spent several years as a senior software engineer at Teradata Labs in India, where he helped develop core features for the Teradata database engine.

Could you describe your research focus?

Arnab: My core research lies at the intersection of machine learning and data management, where I address the fundamental challenges of handling data efficiently throughout the entire AI lifecycle. A prime example is my work on the intelligent reuse of intermediate computations. In complex machine learning pipelines, a vast amount of data is repeatedly reprocessed or recalculated, be it pre-processed datasets, feature transformations, or intermediate steps during model training. My research develops systems that identify, capture, and efficiently retrieve these reusable results across diverse hardware, from CPUs to graphics cards (GPUs). This not only boosts runtime efficiency and reduces computational waste, but also lowers the energy consumption of modern AI systems. Ultimately, these innovations are vital for building scalable, high-performance, and sustainable AI infrastructure, particularly for large-scale applications such as large language models.

How do you translate these ideas into actual tools or frameworks?

Arnab: During my PhD in the DAMS Lab, we developed an open-source system, Apache SystemDS, designed for end-to-end AI workflows. My research on reusing computation resulted in two frameworks, LIMA and MEMPHIS, which I integrated into SystemDS and made available to all users. Our reuse framework tracks where every intermediate result comes from, which lets it identify and reuse that result when the same computation repeats. At the heart of our system is a special "intermediate cache" that acts as a central manager. It handles finding and reusing data, freeing up memory, moving data between different computing resources, and deciding what data to keep in or remove from memory. Beyond these core reuse mechanisms, my work also includes frameworks like UPLIFT, which enhances parallel execution by decomposing complex tasks. In line with our commitment to open science, all our experimental setups and results are publicly available alongside SystemDS. Our dedication to tackling these complex challenges and to open-source development was recognized when our work on MEMPHIS received the Best Research Paper Award at the 28th International Conference on Extending Database Technology (EDBT 2025).
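To make the idea of reuse through tracked origins concrete, here is a minimal Python sketch. It is not the SystemDS, LIMA, or MEMPHIS implementation or API; the names `LineageCache`, `lineage_key`, `leaf`, and `apply_op` are hypothetical, and the eviction policy (least recently used, with a fixed entry count) is an assumption chosen for brevity. Each intermediate carries a key derived from the operation name and the keys of its inputs, so a repeated computation maps to the same key and can be served from the cache.

```python
import hashlib
from collections import OrderedDict


def lineage_key(op, *input_keys):
    """Hash an operation name together with the keys of its inputs."""
    text = op + "(" + ",".join(input_keys) + ")"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LineageCache:
    """Toy cache keyed by lineage; evicts the least recently used entry."""

    def __init__(self, capacity=8):
        self.capacity = capacity      # assumed limit: number of entries
        self.entries = OrderedDict()  # key -> cached intermediate
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute()
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        return value


def leaf(name, value):
    """Wrap an input dataset as a (key, value) pair."""
    return (lineage_key("leaf", name), value)


def apply_op(cache, op_name, fn, *inputs):
    """Run fn on the inputs unless the same lineage was already computed."""
    key = lineage_key(op_name, *(k for k, _ in inputs))
    value = cache.get_or_compute(key, lambda: fn(*(v for _, v in inputs)))
    return (key, value)


# Usage: the second call repeats the same computation and is served from cache.
cache = LineageCache(capacity=8)
x = leaf("X", [1.0, 2.0, 3.0])
first = apply_op(cache, "scale2", lambda v: [2 * a for a in v], x)
second = apply_op(cache, "scale2", lambda v: [2 * a for a in v], x)
```

In this sketch the key depends only on the operation name and the input keys, so two calls with the same name but different functions would wrongly share a result; a real system would need a more careful notion of operation identity, and a memory-aware eviction policy rather than an entry count.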

What are your recent projects?

Arnab: I recently joined the DEEM Lab, led by Prof. Sebastian Schelter, at BIFOLD as a postdoctoral researcher. In this new role, my research continues to focus on critical aspects of data management for AI. Specifically, I am concentrating on enhancing runtime efficiency, fostering responsible data management practices, and effectively lowering the technical barriers for users engaging with complex AI systems. A key part of my role as a postdoc involves co-supervising PhD students, guiding them through their research journeys, and supporting them in publishing their findings at top-tier conferences.

What personally motivated you to enter research?

Arnab: My motivation for entering research stems from curiosity and a constant desire to learn and explore important problems. After a fulfilling career as a senior software engineer at Teradata Labs, I found myself increasingly drawn to the freedom of open-source research. This shift allows me to delve into fundamental challenges and contribute to collective knowledge, which is incredibly rewarding.

Which major innovations do you expect in your research field in the next ten years?

Arnab: I anticipate major innovations in my research field across three critical areas over the next decade. Firstly, we will see significant breakthroughs in enabling large language models (LLMs) to run efficiently on commodity hardware. This will democratize access to advanced AI capabilities. Secondly, a paramount focus will be on drastically cutting the energy cost associated with training and deploying AI. As AI models grow in complexity and usage, developing sustainable algorithms and systems will be crucial for environmental responsibility. Finally, I foresee substantial advancements in building fairness and privacy directly into AI systems from the ground up. This will be essential to ensure AI is not only powerful but also trustworthy and respectful of individual data rights.

Which living or historical scientist has fascinated you?

Arnab: Jim Gray fascinates me because of his monumental contributions to transaction processing and his pioneering work in database systems. His research laid the foundation for ensuring data consistency and reliability, which are critical in today's complex data-driven world. He received the Turing Award in 1998 for these achievements, which further underscores the immense impact of his work on computer science research and industry.

Where would one find you, if you are not sitting in front of the computer?

Arnab: When I am not engaged in research, you would most likely find me on the badminton court, enjoying a lively game. More recently, I have also developed an interest in photography, so you might spot me outdoors, capturing moments and exploring the streets of Berlin.