Recent progress in computer vision has led to deep neural networks that match or even surpass human-level performance across a wide range of tasks. However, we identify a significant difference between the representations learned by these models and human conceptual understanding. We introduce a novel approximate Bayesian method for learning object concept representations from human behavior in a triplet odd-one-out task. The method uses variational inference to produce sparse, non-negative representations with uncertainty estimates, improving the reproducibility of the learned object dimensions and the consistency with which human behavior is predicted. These mental representations can subsequently be used to uncover the differences between human and neural network representations of object concepts.

Our findings reveal that contemporary computer vision models, despite achieving human-level performance on many tasks, fail to align with human mental representations. Factors such as the training data and the objective function appear to play a crucial role in alignment, whereas model architecture and the number of parameters have minimal impact. We pinpoint a reason for this misalignment: human conceptual knowledge is hierarchically organized, whereas standard training methods embed visually similar images close together and thereby often ignore the global organization of object concepts. To address this mismatch, we develop a method that improves the global structure of model representations by linearly aligning them with human similarity judgments, leading to improved performance in few-shot learning and anomaly detection tasks.

However, linear alignment only improves the representations of large image/text models that have seen billions of images during pretraining; it fails to achieve the same for smaller models trained on less data. By training a teacher model to imitate human judgments and subsequently distilling this human-like structure into pretrained student models, we can improve the representations of any vision foundation model, irrespective of its pretraining task or the number of images seen during pretraining. The human-aligned models better approximate human conceptual knowledge and achieve substantially better generalization performance and out-of-distribution robustness than the non-aligned base models. Our findings underscore the importance of incorporating the hierarchical nature of human conceptual knowledge into the representations of neural networks, paving the way for more robust, interpretable, and human-aligned artificial intelligence systems.
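For concreteness, a minimal sketch of the kind of probabilistic choice model commonly assumed for triplet odd-one-out data is given below; the dot-product similarity and the notation $\mathbf{x}_i$ for the sparse, non-negative embedding of object $i$ are illustrative assumptions rather than the exact formulation used here. Choosing the odd one out of $\{i, j, k\}$ is equivalent to selecting the remaining pair as most similar:
\[
p\big((i,j) \text{ chosen as most similar} \mid \{i,j,k\}\big)
= \frac{\exp\!\big(\mathbf{x}_i^{\top}\mathbf{x}_j\big)}
       {\exp\!\big(\mathbf{x}_i^{\top}\mathbf{x}_j\big) + \exp\!\big(\mathbf{x}_i^{\top}\mathbf{x}_k\big) + \exp\!\big(\mathbf{x}_j^{\top}\mathbf{x}_k\big)},
\qquad \mathbf{x}_i \ge 0 .
\]
In a variational treatment of such a model, an approximate posterior over the embedding weights replaces point estimates, which is what yields the uncertainty estimates and encourages the sparse, interpretable dimensions described above.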