Banner Banner

DORIAN in action: Assisted Design of Data Science Pipelines

Sergey Redyuk
Zoi Kaoudi
Sebastian Schelter
Volker Markl

September 05, 2022

Existing automated machine learning solutions and intelligent discovery assistants are popular tools that facilitate the end-user with the design of data science (DS) pipelines. However, they yield limited applicability for a wide range of real-world use cases and application domains due to(a) the limited support of DS tasks; (b) a small, static set of available operators; and (c) restriction to evaluation processes with quantifiable loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks, operators, and arbitrary user-defined evaluation processes. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end-user can choose from, alter, execute and evaluate. It stores executed pipelines in an experiment database and utilizes similarity-based search to identify relevant previously-run pipelines from the experiment database. DORIAN also takes user interaction into account to improve suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks without the need for writing any code.