Research Assistant - salary grade E13 TV-L Berliner Hochschulen
The Berlin Institute for the Foundations of Learning and Data (BIFOLD) is one of six national AI centres in Germany and is funded by the State of Berlin and the Federal Ministry of Education and Research. BIFOLD currently consists of 12 research groups with over 150 employees, a graduate school and the BIFOLD office. Fellows from the major Berlin universities, Charité - Universitätsmedizin Berlin and various other national and international universities and non-university research institutions are also involved.
Tasks
The DEEM Lab (https://deem.berlin) is looking for a research associate in responsible data engineering. The research will be conducted in close collaboration with Prof. Julia Stoyanovich from New York University (https://airesponsibly.net/people/julia/).
Responsible data engineering is emerging as a new discipline at the intersection of data engineering and AI that treats ethics, legal compliance, and inclusivity as central design considerations. This holistic approach is based on the observation that the decisions we make during data collection and preparation profoundly impact the AI systems we build and deploy.
The goal of this position is to create a new system that helps data engineers design data preparation pipelines that optimize model performance along a rich set of responsibility objectives, including accuracy, robustness, fairness, and legal compliance. To that end, the system will proactively guide data engineers through the selection and evaluation of a large set of data preprocessing, data augmentation, and feature selection operations. A reliable, efficient, and easy-to-use open-source implementation of this system will be created as part of the research project.
This endeavor is technically challenging in multiple ways. First, data preparation and model selection need to be optimized for multiple objectives, in contrast to existing approaches, which focus on a single objective, such as overall prediction accuracy. Second, the system will have to create, rewrite, and concurrently execute large numbers of different pipeline variants, which requires an efficient runtime and novel query optimization techniques. Third, the research needs to account for the current dramatic changes in the development practices of AI applications, e.g., AI-assisted programming, tabular foundation models, and AI-based data science agents.
Salary grade: TV-L 13, Berliner Hochschulen
Starting date: Earliest possible
Closing date: June 20, 2025
Full job posting: IV 177/25