
Developing model-aligned and human-readable explanations for artificial intelligence

Thomas Schnake

November 11, 2024

Machine learning (ML) models, particularly deep neural networks, play an important role in science and industry and have even entered our everyday lives. However, these models are commonly regarded as 'black boxes' because their decision-making processes are not easily interpretable. We explore whether flexible and human-readable explanatory features (outputs from an explanation method that help interpret a model's prediction) can be obtained by closely aligning them with the model-specific prediction process. Specifically, we are interested in the relevance of logical relationships between input features for the prediction.

We developed a method that decomposes a model's prediction into components measuring the influence of all possible higher-order interactions between input features. This approach captures the complex combinations of variables that drive the model's output. By understanding these feature dependencies, we offer a human-readable abstraction of the prediction strategy, specifically by attributing relevance to logical formulas that express the relationships between features. We show how these methodologies can be applied to various models, including graph neural networks and Transformer models, and address the associated complexity challenges.

We demonstrate the advantages of our methodologies on different datasets. In natural language processing (NLP), which studies how to process human language with computers, our methods reveal how machine learning models handle grammatical relationships such as negation. In quantum chemistry (QC), which examines the quantum mechanical behavior of atoms and molecules, we detect how specific atom interactions, comparable to the potential energy between atoms, contribute to a model's energy predictions. Our methods offer a solid foundation for an advanced interface between the machine learning model and the user, enhancing the explanation and interpretation of the model's predictions.
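To illustrate the general idea of decomposing a prediction into higher-order interaction components, the following is a minimal sketch of one standard construction, the Möbius (Harsanyi) decomposition. It is an illustration only, not the specific method developed in this work: we assume a toy value function `v(S)` that returns the model output when only the features in subset `S` are present, and exhaustively enumerate all subsets, which is only feasible for a handful of features.

```python
from itertools import combinations

def subsets(features):
    """Yield every subset of `features` as a tuple, including the empty set."""
    for r in range(len(features) + 1):
        yield from combinations(features, r)

def moebius_decomposition(v, features):
    """Compute the interaction component m(S) for every subset S via
    Möbius inversion: m(S) = sum over T subset of S of (-1)^{|S|-|T|} v(T).
    The components sum back to the full prediction v(features)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(features)
    }

# Hypothetical toy value function with additive effects for features 0 and 1
# plus a pairwise interaction between them; feature 2 is irrelevant.
def v(S):
    x = set(S)
    return (0 in x) * 1.0 + (1 in x) * 2.0 + (0 in x and 1 in x) * 0.5

m = moebius_decomposition(v, (0, 1, 2))
# m[(0,)] and m[(1,)] recover the additive effects, m[(0, 1)] the
# interaction term 0.5, and all components sum to v((0, 1, 2)).
```

Because the number of subsets grows exponentially with the number of features, practical methods for deep models must exploit model structure or restrict the interactions considered, which is part of the complexity challenge addressed in this work.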
