-
Notifications
You must be signed in to change notification settings - Fork 34
/
README.Rmd
187 lines (138 loc) · 7.36 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
---
output: github_document
bibliography: ./inst/REFERENCES.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
tidy = "styler"
)
```
# shapr <img src="man/figures/NR-logo_utvidet_r32g60b136_small.png" align="right" height="50px"/>
<!-- badges: start -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-last-release/shapr)](https://cran.r-project.org/package=shapr)
[![CRAN_Downloads_Badge](https://cranlogs.r-pkg.org/badges/grand-total/shapr)](https://cran.r-project.org/package=shapr)
[![R build status](https://github.com/NorskRegnesentral/shapr/workflows/R-CMD-check/badge.svg)](https://github.com/NorskRegnesentral/shapr/actions?query=workflow%3AR-CMD-check)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit/)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02027/status.svg)](https://doi.org/10.21105/joss.02027)
<!-- badges: end -->
## Brief NEWS
This is `shapr` version 1.0.0 (Released on GitHub Nov 2024), which provides a full restructuring of the code based, and
provides a full suit of new functionality, including:
* A long list of approaches for estimating the contribution/value function $v(S)$, including Variational Autoencoders,
and regression-based methods
* Iterative Shapley value estimation with convergence detection
* Parallelized computations with progress updates
* Reweighted Kernel SHAP for faster convergence
* New function `explain_forecast()` for explaining forecasts
* Several other methodological, computational and user-experience improvements
* Python wrapper making the core functionality of `shapr` available in Python
Below we provide a brief overview of the breaking changes.
See the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md) for the full list of details.
### Breaking changes
The new syntax for explaining models essentially amounts to using a single function (`explain()`) instead of two functions (`shapr()` and `explain()`).
In addition, custom models are now explained by passing the prediction function directly to `explain()`,
some input arguments got new names, and a few functions for edge cases was removed to simplify the code base.
Note that the CRAN version of `shapr` (v0.2.2) still uses the old syntax.
The examples below uses the new syntax.
[Here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md) is a version of this README with the syntax of the CRAN version (v0.2.2).
### Python wrapper
We now also provide a Python wrapper (`shaprpy`) which allows explaining python models with the methodology implemented in `shapr`, directly from Python.
The wrapper is available [here](https://github.com/NorskRegnesentral/shapr/tree/master/python).
## The package
The `shapr` R package implements an enhanced version of the Kernel SHAP method, for approximating Shapley values,
with a strong focus on conditional Shapley values.
The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution
functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies,
and distributions.
The package also includes evaluation metrics to compare various approaches.
With features like parallelized computations, convergence detection, progress updates, and extensive plotting options,
shapr is as a highly efficient and user-friendly tool, delivering precise estimates of conditional Shapley values,
which are critical for understanding how features truly contribute to predictions.
A basic example is provided below.
Otherwise we refer to the [pkgdown website](https://norskregnesentral.github.io/shapr/) and the vignettes there
for details and further examples.
## Installation
We highly recommend to install the development version of shapr (with the new explanation syntax and all functionality),
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr")
```
To also install all dependencies, use
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```
**The CRAN version of `shapr` (NOT RECOMMENDED) can be installed with**
```{r, eval = FALSE}
install.packages("shapr")
```
## Example
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and
produces a numeric outcome.
The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr`
explains the individual predictions.
We first enable parallel computation and progress updates with the following code chunk.
These are optional, but recommended for improved performance and user friendliness,
particularly for problems with many features.
```{r init_no_eval,eval = FALSE}
# Enable parallel computation
# Requires the future and future_lapply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features
# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```
Here comes the actual example
```{r basic_example, warning = FALSE}
library(xgboost)
library(shapr)
data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"
ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]
# Looking at the dependence between the features
cor(x_train)
# Fitting a basic xgboost model to the training data
model <- xgboost(
data = as.matrix(x_train),
label = y_train,
nround = 20,
verbose = FALSE
)
# Specifying the phi_0, i.e. the expected prediction without any features
p0 <- mean(y_train)
# Computing the actual Shapley values with kernelSHAP accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
model = model,
x_explain = x_explain,
x_train = x_train,
approach = "empirical",
phi0 = p0
)
# Printing the Shapley values for the test data.
# For more information about the interpretation of the values in the table, see ?shapr::explain.
print(explanation$shapley_values_est)
# Finally we plot the resulting explanations
plot(explanation)
```
See the [vignette](https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html) for further basic usage
examples.
## Contribution
All feedback and suggestions are very welcome. Details on how to contribute can be found
[here](https://norskregnesentral.github.io/shapr/CONTRIBUTING.html). If you have any questions or comments, feel
free to open an issue [here](https://github.com/NorskRegnesentral/shapr/issues).
Please note that the 'shapr' project is released with a
[Contributor Code of Conduct](https://norskregnesentral.github.io/shapr/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
## References