This repository contains the source code for a new method of evaluating how well evaluation metrics for attribution-based methods align with human perception.
The first step of evaluating an evaluation metric is to fine-tune a model on the metric.
You can define a backprop-enabled metric by extending the metric.LearnableMetric class and supplying the appropriate trainer from the trainer module, or write your own trainer by extending the trainer.LearnableMetricTrainer class, as in the sketch below.
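A minimal sketch of what this might look like is shown below; the example metric, the method names, and the constructor arguments are assumptions for illustration, not the repository's actual interface.

```python
from metric import LearnableMetric
from trainer import LearnableMetricTrainer

class SparsityMetric(LearnableMetric):
    # Hypothetical metric that rewards sparse attribution maps.
    # The forward signature is an assumption; check LearnableMetric
    # for the actual interface.
    def forward(self, attributions, inputs, targets):
        # Return a differentiable scalar so the score can be
        # backpropagated through during fine-tuning.
        return attributions.abs().mean()

# model: the torch.nn.Module to fine-tune; train_loader: a DataLoader
# over the fine-tuning images. Constructor and fit arguments are
# illustrative.
trainer = LearnableMetricTrainer(model=model, metric=SparsityMetric())
fine_tuned_model = trainer.fit(train_loader)
```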
After fine-tuning, supply your model to the create_harmony_dataset method in the harmony_dataset module to automatically create a dataset that can be used with the web application supplied in this repository.
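For instance (the keyword arguments and variable names are assumptions; only create_harmony_dataset and the harmony_dataset module come from this repository):

```python
from harmony_dataset import create_harmony_dataset

# Hypothetical call: the argument names are illustrative; consult the
# module for the actual signature.
create_harmony_dataset(
    original_model=model,
    finetuned_model=fine_tuned_model,
    data_loader=eval_loader,
    output_path="fixtures/harmony_dataset.json",
)
```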
Once the dataset is created, load it into the web application using Django fixtures. Then run the application, add the annotators to the user database, and label the previously created examples. The labels are stored in the database and can subsequently be analyzed to report how well an evaluation metric aligns with human perception.
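Assuming the generated dataset is exported as a standard Django fixture (the file name below is illustrative), this step follows the usual Django management commands:

```bash
# Load the generated examples into the application's database.
python manage.py loaddata fixtures/harmony_dataset.json

# Create an admin account, then add annotator users via the admin site.
python manage.py createsuperuser

# Start the annotation web application.
python manage.py runserver
```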
Figure 3. An example from our user study. The original image and the model's prediction are presented in the first row. In the following row, we visualize the original model's attribution map for the target class, the fine-tuned model's map, and the option "Neither".