Cancer Detection in Breast MRI Screening via Explainable AI Anomaly Detection

Felipe Oviedo
Anum S. Kazerouni
Philipp Liznerski
Yixi Xu
Michael Hirano
Robert A. Vandermeulen
Marius Kloft
Elyse Blum
Adam M. Alessio
Christopher I. Li
William B. Weeks
Rahul Dodhia
Juan M. Lavista Ferres
Habib Rahbar
Savannah C. Partridge

July 15, 2025

Background

Artificial intelligence (AI) models hold potential to increase the accuracy and efficiency of breast MRI screening; however, existing models have not been rigorously evaluated in populations with low cancer prevalence and lack interpretability, both of which are essential for clinical adoption.

Purpose

To develop an explainable AI model for cancer detection at breast MRI that is effective in both high- and low-cancer-prevalence settings.

Materials and Methods

This retrospective study included 9738 breast MRI examinations from a single institution (2005–2022), with external testing in a publicly available multicenter dataset (221 examinations). In total, 9567 consecutive examinations were used to develop an explainable fully convolutional data description (FCDD) anomaly detection model to detect malignancies on contrast-enhanced MRI scans. Performance was evaluated in three cohorts: grouped cross-validation (for both balanced [20.0% malignant] and imbalanced [1.85% malignant] detection tasks), an internal independent test set (171 examinations), and an external dataset. Explainability was assessed through pixelwise comparisons with reference-standard malignancy annotations. Statistical significance was assessed using the Wilcoxon signed rank test.
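The grouped cross-validation described above keeps all examinations from one patient within a single fold, so the same patient never appears in both training and validation data. A minimal sketch of such a split, assuming hypothetical patient identifiers (this is not the authors' code):

```python
from collections import defaultdict

def grouped_folds(exam_patient_ids, n_folds=5):
    """Assign exam indices to folds, keeping each patient's exams together."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(exam_patient_ids):
        by_patient[pid].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place each patient's exams into the currently smallest fold.
    for exams in sorted(by_patient.values(), key=len, reverse=True):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(exams)
    return folds
```

Because whole patients are assigned as units, performance estimates are not inflated by same-patient examinations leaking across folds.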

Results

FCDD outperformed the benchmark binary cross-entropy (BCE) model in cross-validation for both balanced (mean area under the receiver operating characteristic curve [AUC] = 0.84 ± 0.01 [SD] vs 0.81 ± 0.01; P < .001) and imbalanced (mean AUC = 0.72 ± 0.03 vs 0.69 ± 0.03; P < .001) detection tasks. At a fixed 97% sensitivity in the imbalanced setting, mean specificity across folds was 13% for FCDD and 9% for BCE (P = .02). In the internal test set, FCDD outperformed BCE for balanced (mean AUC = 0.81 ± 0.02 vs 0.72 ± 0.02; P < .001) and imbalanced (mean AUC = 0.78 ± 0.05 vs 0.76 ± 0.01; P < .02) detection tasks. For model explainability, FCDD demonstrated better spatial agreement with reference-standard annotations than BCE (internal test set: mean pixelwise AUC = 0.92 ± 0.10 vs 0.81 ± 0.13; P < .001). External testing confirmed that FCDD performed well, and better than BCE, in the balanced detection task (AUC = 0.86 ± 0.01 vs 0.79 ± 0.01; P < .001).
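The fixed 97% sensitivity operating point above corresponds to choosing a score threshold that detects at least 97% of malignant examinations and then reading off the resulting specificity. A minimal sketch with invented scores and labels (not the paper's data or code):

```python
import math

def specificity_at_sensitivity(scores, labels, target_sens):
    """Pick the highest threshold reaching target sensitivity; return (threshold, specificity).

    labels: 1 = malignant, 0 = benign. Scores >= threshold are called malignant.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    k = math.ceil(target_sens * len(pos))   # positives that must exceed the threshold
    threshold = pos[k - 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tn = sum(s < threshold for s in neg)    # negatives correctly ruled out
    return threshold, tn / len(neg)
```

In a low-prevalence screening population, even a modest specificity gain at fixed high sensitivity (13% vs 9% here) translates into meaningfully fewer false-positive callbacks.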

Conclusion

The developed explainable AI model for cancer detection at breast MRI accurately depicted tumor location and outperformed commonly used models in both high- and low-cancer-prevalence scenarios.
