Towards Identifying Intent of Data Errors

Mohamed Abdelmaksoud

Konrad Rieck

Ziawasch Abedjan

September 01, 2025

Modern machine learning (ML) systems deployed in high-stakes domains such as hiring, lending, and healthcare heavily rely on structured, often user-provided input data. Errors in this data can arise from natural causes, such as noise, missing values, and typos, or from strategic user manipulation intended to alter decision outcomes. Existing ML pipelines typically treat all input errors uniformly, lacking mechanisms to distinguish between accidental errors and intentional manipulations. The goal of this research is to develop a diagnostic framework that identifies erroneous input features, estimates the likelihood that each error was intentional, and quantifies its influence on the model’s output. In this paper, we outline the foundational challenges of our research agenda. We discuss risks and potentials in trying to separate intentional from non-intentional errors.

https://www.vldb.org/2025/Workshops/VLDB-Workshops-2025/GuideAI/GuideAI25_3.pdf

BIFOLD AUTHORS

Mohamed Abdelmaksoud

Prof. Dr. Konrad Rieck

Prof. Dr. Ziawasch Abedjan