RUDOLFV: A FOUNDATION MODEL BY PATHOLOGISTS FOR PATHOLOGISTS

Jonas Dippel

Barbara Feulner

Tobias Winterhoff

Simon Schallenberg

Gabriel Dernbach

Andreas Kunft

Stephan Tietz

Philipp Jurmeister

David Horst

Lukas Ruff

Klaus-Robert Müller

Frederick Klauschen

Maximilian Alber

January 23, 2024

Histopathology plays a central role in clinical medicine and biomedical research. While artificial intelligence shows promising results on many pathological tasks, generalization and dealing with rare diseases, where training data is scarce, remain challenging. Distilling knowledge from unlabeled data into a foundation model before learning from potentially limited labeled data offers a viable path to addressing these challenges. In this work, we extend the state of the art of foundation models for digital pathology whole slide images through semi-automated data curation and the incorporation of pathologist domain knowledge. Specifically, we combine computational and pathologist domain knowledge (1) to curate a diverse dataset of 103k slides corresponding to 750 million image patches, covering different fixation, staining, and scanning protocols as well as different indications and labs across the EU and US, (2) to group semantically similar slides and tissue patches, and (3) to augment the input images during training. We evaluate the resulting model on a set of public and internal benchmarks and show that, although our foundation model is trained on an order of magnitude fewer slides, it performs on par with or better than competing models. We expect that scaling our approach to more data and larger models will further increase its performance and its capacity to deal with increasingly complex real-world tasks in diagnostics and biomedical research.

https://doi.org/10.48550/arXiv.2401.04079

BIFOLD AUTHORS

Jonas Dippel

Gabriel Dernbach

Prof. Dr. Klaus-Robert Müller

Prof. Dr. Frederick Klauschen