Princeton Visual AI Lab

HIVE: Evaluating the Human Interpretability of Visual Explanations

Sunnie S. Y. Kim   Nicole Meister   Vikram V. Ramaswamy   Ruth Fong   Olga Russakovsky
Princeton University
{sunniesuhyoung, nmeister, vr23, ruthfong, olgarus}

(Top left) Heatmap explanations by GradCAM and BagNet highlight decision-relevant image regions. (Bottom left) Prototype-based explanations by ProtoPNet and ProtoTree match image regions to canonical prototypes. This schematic is a much simplified version of the actual explanation. (Right) Example ProtoPNet explanation taken from the original paper. The full explanation has up to 10 rows. Existing evaluation metrics typically apply to one explanation form (e.g., only heatmaps). We design HIVE to evaluate diverse visual interpretability methods and enable cross-method comparison.





07/2022: HIVE has been accepted to ECCV 2022. Camera-ready coming soon!

06/2022: We presented HIVE at the CVPR 2022 Explainable AI for Computer Vision Workshop (spotlight talk & poster) and the CVPR 2022 Women in Computer Vision Workshop (poster).

05/2022: We presented HIVE at the CHI 2022 Human-Centered Explainable AI Workshop (poster).


As machine learning is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we propose HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework for visual interpretability methods that allows for falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation. To the best of our knowledge, this is the first work of its kind. Using HIVE, we conduct IRB-approved human studies with nearly 1000 participants and evaluate four methods that represent the diversity of computer vision interpretability works: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations engender human trust, even for incorrect predictions, yet are not distinct enough for users to distinguish between correct and incorrect predictions. We open-source HIVE to enable future studies and to encourage more human-centered approaches to interpretability research.


        author = {Sunnie S. Y. Kim and Nicole Meister and Vikram V. Ramaswamy and Ruth Fong and Olga Russakovsky},
        title = {{HIVE}: Evaluating the Human Interpretability of Visual Explanations},
        journal = {CoRR},
        volume = {abs/2112.03184},
        year = {2021}

HIVE (Human Interpretability of Visual Explanations)

We propose HIVE, a novel human evaluation framework for visual interpretability methods. Through careful design, HIVE allows for falsifiable hypothesis testing regarding the utility of explanations for identifying model errors, cross-method comparison between different interpretability techniques, and human-centered evaluation for understanding the practical effectiveness of interpretability.

In particular, we focus on AI-assisted decision making scenarios where humans use an AI (image classification) model and an interpretability method to make decisions about whether the model prediction is correct or more generally about whether to use the model and/or interpretability method. We evaluate how useful a given interpretability method is in these scenarios through the following tasks.

First, we evaluate interpretability methods on a simple agreement task, where we present users with a single model prediction-explanation pair for a given image and ask how confident they are in the prediction. This task simulates a common decision making setting and is close to existing evaluation schemes that consider a model’s top-1 prediction and an explanation for it. See above for the ProtoPNet evaluation UI.
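As a rough illustration of how agreement-task responses might be summarized, the sketch below computes how often participants believed a prediction, split by whether the prediction was actually correct. The 1-4 rating scale, the threshold of 3, the `AgreementTrial` record, and the toy data are all assumptions made for this example; they are not taken from the HIVE codebase or study results.

```python
from dataclasses import dataclass


@dataclass
class AgreementTrial:
    prediction_is_correct: bool  # ground truth, hidden from the participant
    rating: int  # hypothetical 1-4 confidence scale; 3 or 4 means "prediction is correct"


def believed_correct_rate(trials, threshold=3):
    """Fraction of trials in which the participant believed the prediction."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.rating >= threshold for t in trials) / len(trials)


def summarize(trials):
    """Split trials by ground truth and report the belief rate for each group."""
    correct = [t for t in trials if t.prediction_is_correct]
    incorrect = [t for t in trials if not t.prediction_is_correct]
    return {
        "believed_when_correct": believed_correct_rate(correct),
        "believed_when_incorrect": believed_correct_rate(incorrect),
    }


# Toy data, invented for illustration only.
trials = [
    AgreementTrial(True, 4), AgreementTrial(True, 3),
    AgreementTrial(True, 2), AgreementTrial(True, 4),
    AgreementTrial(False, 3), AgreementTrial(False, 4),
    AgreementTrial(False, 1), AgreementTrial(False, 3),
]
summary = summarize(trials)
print(summary)
```

A high belief rate on incorrect predictions would be the signature of the confirmation bias that motivates the distinction task.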

However, it has been previously observed that users tend to believe in model predictions when given explanations for them. Hence, we evaluate methods on a distinction task to mitigate the effect of such confirmation bias in interpretability evaluation. Here we simultaneously show four prediction-explanation pairs and ask users to identify the correct prediction based on the provided explanations. This task measures how well explanations can help users distinguish between correct and incorrect predictions. See above for the GradCAM evaluation UI.
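Because the distinction task offers four options, random guessing yields 25% accuracy, which gives a natural null hypothesis to test against. The sketch below scores a set of responses and runs a one-sided exact binomial test against that chance level. The function names, the choice of test, and the toy responses are assumptions for illustration, not the analysis used in the paper.

```python
from math import comb


def distinction_accuracy(choices, correct_indices):
    """Return (hits, accuracy) for a list of selected options vs. correct options."""
    if not choices or len(choices) != len(correct_indices):
        raise ValueError("choices and correct_indices must be non-empty and equal length")
    hits = sum(c == a for c, a in zip(choices, correct_indices))
    return hits, hits / len(choices)


def binomial_p_value_at_least(successes, trials, chance):
    """One-sided exact binomial test: P(X >= successes) when accuracy equals chance."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Toy data, invented for illustration: index (0-3) of the option each
# participant picked, and the index of the truly correct prediction.
choices = [0, 2, 1, 3, 0, 1, 2, 2]
correct_indices = [0, 2, 3, 3, 1, 1, 2, 0]

hits, accuracy = distinction_accuracy(choices, correct_indices)
p_value = binomial_p_value_at_least(hits, len(choices), chance=0.25)
print(f"accuracy={accuracy:.3f}, p-value vs. 25% chance={p_value:.4f}")
```

A small p-value would indicate that participants identify the correct prediction more often than chance; accuracy near 25% would suggest the explanations do not help users tell correct from incorrect predictions.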

In summary, HIVE consists of the following steps: We first introduce the study and the interpretability method to be evaluated. Next, we show a preview of the evaluation task and provide example explanations for one correct and one incorrect model prediction to give participants appropriate references. Afterwards, participants complete the evaluation task. Throughout the study, we also ask subjective evaluation and user preference questions to make the most out of the human studies. Our study design was approved by our university’s Institutional Review Board (IRB).

Key Findings

1. When provided explanations, participants tend to believe that the model predictions are correct, revealing an issue of confirmation bias.

2. When given multiple model predictions and explanations, participants struggle to distinguish between correct and incorrect predictions based on the explanations. This suggests that interpretability works need to improve and evaluate their ability to identify and explain model errors.

3. We quantify prior work’s anecdotal observation that there is a gap between prototype-based models’ similarity scores and human judgments of similarity, which can hurt the quality of their interpretability.

4. Participants prefer a model with an explanation over a baseline model that doesn't come with an explanation. To switch their preference to the baseline, they require it to have higher accuracy, and by a greater margin in higher-risk settings.

Please see the full paper for details.

Related Work

Below are some papers related to our work. We discuss them in more detail in the related work section of our paper.
[1] Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. IJCV 2019.
[2] Approximating CNNs with Bag-of-local-Features Models Works Surprisingly Well on ImageNet. Wieland Brendel, Matthias Bethge. ICLR 2019.
[3] This Looks Like That: Deep Learning for Interpretable Image Recognition. Chaofan Chen, Oscar Li, Chaofan Tao, Alina Jade Barnett, Jonathan Su, Cynthia Rudin. NeurIPS 2019.
[4] This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. Adrian Hoffmann, Claudio Fanconi, Rahul Rade, Jonas Kohler. ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI.
[5] Neural Prototype Trees for Interpretable Fine-grained Image Recognition. Meike Nauta, Ron van Bree, Christin Seifert. CVPR 2021.
[6] Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR 2016.


This work is supported by the National Science Foundation Grant No. 1763642 to OR, the Princeton SEAS Howard B. Wentz, Jr. Junior Faculty Award to OR, the Princeton SEAS Project X Fund to RF and OR, and the Princeton SEAS and ECE Senior Thesis Funding to NM. We thank the authors of [1, 2, 3, 4, 5] for open-sourcing their code and the authors of [2, 4, 5, 6] for sharing their trained models. We also thank the AMT workers who participated in our studies, as well as the Princeton Visual AI Lab members (Dora Zhao, Kaiyu Yang, Angelina Wang, and others) who tested our user interface and provided helpful feedback.


Sunnie S. Y. Kim (