HIVE: Evaluating the Human Interpretability of Visual Explanations
(Top left) Heatmap explanations by GradCAM and BagNet highlight decision-relevant image regions.
(Bottom left) Prototype-based explanations by ProtoPNet and ProtoTree match image regions to canonical prototypes.
This schematic is a much simplified version of the actual explanation.
(Right) Example ProtoPNet explanation taken from the original paper. The full explanation has up to 10 rows.
Existing evaluation metrics typically apply to one explanation form (e.g., only heatmaps).
We design HIVE to evaluate diverse visual interpretability methods and enable cross-method comparison.
Abstract
As machine learning is increasingly applied to high-impact, high-risk domains,
there have been a number of new methods aimed at making AI models more human-interpretable.
Despite the recent growth of interpretability work, there is a lack of systematic
evaluation of proposed techniques. In this work, we propose HIVE (Human Interpretability
of Visual Explanations), a novel human evaluation framework for visual interpretability
methods that allows for falsifiable hypothesis testing, cross-method comparison, and
human-centered evaluation. To the best of our knowledge, this is the first work of its kind.
Using HIVE, we conduct IRB-approved human studies with nearly 1000 participants and evaluate
four methods that represent the diversity of interpretability work in computer vision:
GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations engender
human trust, even for incorrect predictions, yet are not distinct enough for users to
distinguish between correct and incorrect predictions. We open-source HIVE to enable
future studies and to encourage more human-centered approaches to interpretability research.
Citation
@article{kim2021hive,
  author  = {Sunnie S. Y. Kim and Nicole Meister and Vikram V. Ramaswamy and Ruth Fong and Olga Russakovsky},
  title   = {{HIVE}: Evaluating the Human Interpretability of Visual Explanations},
  journal = {CoRR},
  volume  = {abs/2112.03184},
  year    = {2021}
}
HIVE (Human Interpretability of Visual Explanations)
We propose HIVE, a novel human evaluation framework for visual interpretability methods.
Through careful design, HIVE allows for (1) falsifiable hypothesis testing regarding the utility of explanations for identifying model errors, (2) cross-method comparison between different interpretability techniques, and (3) human-centered evaluation for understanding the practical effectiveness of interpretability.
In particular, we focus on AI-assisted decision-making scenarios in which humans use an AI (image classification) model and an interpretability method to decide whether a model prediction is correct, or more generally, whether to use the model and/or the interpretability method. We evaluate how useful a given interpretability method is in these scenarios through the following tasks.
First, we evaluate interpretability methods on a simple
agreement task, where we present users with a single
model prediction-explanation pair for a given image and ask how confident they are in the prediction.
This task simulates a common decision-making setting and is close to existing evaluation schemes that consider
a model’s top-1 prediction and an explanation for it. See above for the ProtoPNet evaluation UI.
However, it has previously been observed that users tend to believe model predictions are correct when given explanations for them.
Hence, we evaluate methods on a
distinction task to mitigate the effect of such confirmation bias in interpretability evaluation.
Here we simultaneously show four prediction-explanation pairs and ask users to identify the correct prediction based on the
provided explanations. This task measures how well explanations can help users distinguish between correct and incorrect predictions.
See above for the GradCAM evaluation UI.
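To make the two task metrics concrete, below is a minimal scoring sketch in Python. It is not the released study code; the record field names (confidence, prediction_correct, chosen_is_correct) are hypothetical placeholders for illustration.

# Illustrative scoring sketch for the two HIVE tasks (not the released study code).
from statistics import mean

# Agreement task: each response records the participant's confidence rating
# and whether the shown model prediction was actually correct.
# Field names are hypothetical.
agreement_responses = [
    {"confidence": 4, "prediction_correct": True},
    {"confidence": 3, "prediction_correct": False},
]
for is_correct in (True, False):
    ratings = [r["confidence"] for r in agreement_responses if r["prediction_correct"] == is_correct]
    label = "correct" if is_correct else "incorrect"
    if ratings:
        print(f"mean confidence on {label} predictions: {mean(ratings):.2f}")

# Distinction task: each response records whether the participant picked the
# correct one of four prediction-explanation pairs; random chance is 25%.
distinction_responses = [
    {"chosen_is_correct": True},
    {"chosen_is_correct": False},
]
accuracy = mean(1.0 if r["chosen_is_correct"] else 0.0 for r in distinction_responses)
print(f"distinction accuracy: {accuracy:.1%} (chance = 25%)")

Similar confidence on correct and incorrect predictions in the agreement task, or distinction accuracy near chance, would point to the confirmation-bias and error-identification issues discussed in the Key Findings below.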
In summary, HIVE consists of the following steps:
We first introduce the study and the interpretability method to be evaluated.
Next, we show a preview of the evaluation task and provide example explanations
for one correct and one incorrect model prediction to give participants appropriate references.
Afterwards, participants complete the evaluation task.
Throughout the study, we also ask subjective evaluation and user preference questions
to make the most of the human studies.
Our study design was approved by our university’s Institutional Review Board (IRB).
Key Findings
1. When provided explanations, participants tend to believe that the model predictions are correct, revealing an issue of
confirmation bias.
2. When given multiple model predictions and explanations, participants struggle to distinguish between
correct and incorrect predictions based on the explanations. This suggests that interpretability research needs to
improve and evaluate methods' ability to identify and explain model errors.
3. We quantify prior work’s anecdotal observation that there is a gap between prototype-based models’
similarity scores and human judgments of similarity, which can hurt these models’ interpretability (one simple way to quantify such a gap is sketched after this list).
4. Participants prefer a model that comes with explanations over a baseline model without them.
They only switch this preference when the baseline model has higher accuracy, and they require a larger accuracy margin in higher-risk settings.
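Regarding finding 3, one simple way to quantify a gap between model similarity scores and human similarity ratings is a rank correlation over matched (prototype, image patch) pairs. The sketch below is only illustrative: the values are hypothetical placeholders, SciPy is assumed to be available, and this is not necessarily the paper's exact analysis.

# Illustrative sketch (not the paper's exact analysis): quantify the gap between
# model similarity scores and human similarity ratings with a rank correlation.
# Values below are hypothetical placeholders.
from scipy.stats import spearmanr

model_similarity = [0.95, 0.90, 0.85, 0.80, 0.75]   # model's score per (prototype, patch) pair
human_similarity = [0.40, 0.70, 0.30, 0.65, 0.20]   # mean human rating for the same pairs

rho, p_value = spearmanr(model_similarity, human_similarity)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# A weak or negative correlation indicates that human judgments of similarity
# do not track the model's similarity scores, i.e., the gap described in finding 3.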
Please see the
full paper for details.
Related Work
Below are some papers related to our work. We discuss them in more detail in the related work section of our paper.
Acknowledgements
This work is supported by the National Science Foundation Grant No. 1763642 to OR,
the Princeton SEAS Howard B. Wentz, Jr. Junior Faculty Award to OR,
the Princeton SEAS Project X Fund to RF and OR,
and the Princeton SEAS and ECE Senior Thesis Funding to NM.
We thank the authors of [1, 2, 3, 4, 5] for open-sourcing their code and the authors of [2, 4, 5, 6] for sharing their trained models.
We also thank the AMT workers who participated in our studies, as well as the Princeton Visual AI Lab members
(Dora Zhao, Kaiyu Yang, Angelina Wang, and others) who tested our user interface and provided helpful feedback.