Transformer Interpretability Beyond Attention Visualization, CVPR 2021
Quantifying Attention Flow in Transformers, ACL 2020
Explaining Explanations: Axiomatic Feature Interactions for Deep Networks, arXiv preprint 2020
How does this interaction affect me? Interpretable attribution for feature interactions, NeurIPS 2020
Automated Dependence Plots, UAI 2020
ALE: Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models, Journal of the Royal Statistical Society Series B (Statistical Methodology) 2020
Causal Interpretations of Black-Box Models, Journal of Business & Economic Statistics 2019
Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models, CHI 2016
ICE: Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation, Journal of Computational and Graphical Statistics 2015
PDP: Greedy function approximation: A gradient boosting machine, Annals of Statistics 2001
Alibi: https://github.com/SeldonIO/alibi
PDPbox: https://github.com/SauceCat/PDPbox
Scikit-learn: https://github.com/scikit-learn/scikit-learn
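The PDP and ICE entries above rest on one simple computation: fix a feature to each value on a grid for every row of the data, query the model, and either keep one curve per row (ICE) or average the curves (PDP). The following is a minimal NumPy sketch, assuming the model is exposed as a `predict` callable on a 2-D array. The names `ice_curves`, `pdp_curve`, and `toy_model` are hypothetical illustrations, not the API of PDPbox, Alibi, or scikit-learn.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Individual Conditional Expectation: one curve per row of X.

    For each grid value v, overwrite column `feature` of every row with v
    and record the model's predictions. Returns shape (n_rows, len(grid)).
    """
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        X_mod = X.copy()          # keep the original data untouched
        X_mod[:, feature] = v     # intervene on the chosen feature only
        curves[:, j] = predict(X_mod)
    return curves

def pdp_curve(predict, X, feature, grid):
    """Partial dependence: the average of the ICE curves over the rows of X."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

# Hypothetical toy model with an interaction: f(x) = 2*x0 + x0*x1
def toy_model(X):
    return 2 * X[:, 0] + X[:, 0] * X[:, 1]

X = np.array([[0.0, 1.0], [1.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
ice = ice_curves(toy_model, X, feature=0, grid=grid)   # [[0, 3, 6], [0, 5, 10]]
pdp = pdp_curve(toy_model, X, feature=0, grid=grid)    # [0, 4, 8]
```

The two ICE curves have different slopes (3 and 5) because of the `x0*x1` term, while the averaged PDP shows a single slope of 4; this loss of heterogeneity under averaging is the motivation for plotting ICE curves alongside the PDP.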