DALEX works fine with {tidymodels} when models are fit using the formula interface. However, when the model uses a recipe that drops some columns, those columns still show up in the plots and tables instead of being left out of the explanation.
Here's a reproducible example:
library(DALEX)
#> Welcome to DALEX (version: 1.2.1).
#> Find examples and detailed introduction at: https://pbiecek.github.io/ema/
#> Additional features will be available after installation of: ggpubr.
#> Use 'install_dependencies()' to get all suggested dependencies
library(tidymodels)
#> ── Attaching packages ───────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom     0.5.6           ✓ recipes   0.1.12
#> ✓ dials     0.0.7           ✓ rsample   0.0.7
#> ✓ dplyr     1.0.0           ✓ tibble    3.0.1
#> ✓ ggplot2   3.3.2           ✓ tune      0.1.0
#> ✓ infer     0.5.2           ✓ workflows 0.1.1
#> ✓ parsnip   0.1.1.9000      ✓ yardstick 0.0.6
#> ✓ purrr     0.3.4
#> ── Conflicts ──────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::explain() masks DALEX::explain()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
xgb_model<- boost_tree(mode="regression") %>%
set_engine(engine="xgboost")
xgb_recipe<- recipe(mpg~., mtcars) %>%
step_rm(am, gear)
xgb_workflow<-workflows::workflow() %>%
workflows::add_model(., xgb_model) %>%
workflows::add_recipe(., xgb_recipe)
fitted<-xgb_workflow %>%
fit(data=mtcars)
#> [22:26:43] WARNING: amalgamation/../src/objective/regression_obj.cu:170: reg:linear is now deprecated in favor of reg:squarederror.
expl<-DALEX::explain(
fitted,
data=mtcars %>% select(-mpg),
y=mtcars$mpg,
predict_function=function(x, y){predict(x, new_data=y) %>% pull(.pred)})
#> Preparation of a new explainer is initiated
#>   -> model label       :  workflow  ( default )
#>   -> data              :  32  rows  10  cols
#>   -> target variable   :  32  values
#>   -> model_info        :  package Model of class: workflow package unrecognized , ver. Unknown , task regression ( default )
#>   -> predict function  :  function(x, y) { predict(x, new_data = y) %>% pull(.pred) }
#>   -> predicted values  :  numerical, min =  10.33316 , mean =  19.83897 , max =  31.99849
#>   -> residual function :  difference between y and yhat ( default )
#>   -> residuals         :  numerical, min =  -0.3375872 , mean =  0.2516516 , max =  1.901513
#>   A new explainer has been created!
variable_importance(expl, type="ratio")
#>        variable mean_dropout_loss    label
#> 1  _full_model_          1.000000 workflow
#> 2            vs          1.000000 workflow
#> 3            am          1.000000 workflow   <<<<<- shouldn't be here
#> 4          gear          1.000000 workflow   <<<<<- shouldn't be here
#> 5          carb          2.576819 workflow
#> 6          drat          3.091860 workflow
#> 7          qsec          6.172457 workflow
#> 8            hp          9.013178 workflow
#> 9            wt         32.111813 workflow
#> 10         disp         41.519480 workflow
#> 11          cyl         43.896437 workflow
#> 12   _baseline_        317.729359 workflow
plot(variable_attribution(expl, new_observation=mtcars[1,], type="break_down"))
I understand DALEX might not be targeting compatibility with the full {tidymodels} workflow, but maybe there's a way to make this work? I haven't looked at the source code yet, but I assume explain() takes all the columns from the data supplied, which are not the same columns after the recipe is applied during the prediction phase. Simply removing the columns from the data won't work, because the recipe expects them to be there.
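The column mismatch described here can be seen without DALEX. The following is a minimal sketch, not a fix: it preps the same recipe on its own, outside the workflow, and reads the remaining predictor names from `summary()`. The names `rec`, `prepped`, `info` and `used_predictors` are illustrative only, and pulling the prepped recipe out of a fitted workflow is a separate step that is not shown.

```r
library(recipes)

# Same recipe as in the reprex: mpg is the outcome, am and gear are removed.
rec <- recipe(mpg ~ ., data = mtcars)
rec <- step_rm(rec, am, gear)

# prep() estimates the recipe on the training data; after that, summary()
# describes the columns that are left once all steps have been applied.
prepped <- prep(rec, training = mtcars)
info <- summary(prepped)

# Keep only the columns whose role is "predictor" (this drops the outcome, mpg).
used_predictors <- info$variable[info$role == "predictor"]
print(used_predictors)
```

The data passed to explain() still has to contain am and gear, because the recipe expects them at prediction time; a vector like `used_predictors` only says which columns the model ends up seeing.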
Thanks. DALEX treats models as black boxes, so it does not check which variables a model actually uses. When calculating importance or attribution, it goes through every column of the data passed to explain().
If you know how to check which variables a model created with tidymodels uses, I can add support for such models.
For now, though, the tidymodels API is changing very quickly, and I haven't found any information on how to extract the data used for training from a tidymodels model.
So the easiest (and universal) solution is to filter out the variables with 'zero' contributions.
That way they will not appear in the plots. The easiest way to do this is with the filter function (the explanations are plain data frames).
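A minimal sketch of that filtering, assuming dplyr is available. To stay self-contained, it rebuilds the table from the variable_importance(expl, type="ratio") output in the reprex as a plain data frame (`vi` and `vi_used` are illustrative names) rather than calling DALEX. With type="ratio", a variable that contributes nothing has a ratio of 1, so the filter drops rows equal to 1 while keeping the two reference rows.

```r
library(dplyr)

# Values copied from the variable_importance() output in the reprex.
vi <- data.frame(
  variable = c("_full_model_", "vs", "am", "gear", "carb", "drat",
               "qsec", "hp", "wt", "disp", "cyl", "_baseline_"),
  mean_dropout_loss = c(1.000000, 1.000000, 1.000000, 1.000000, 2.576819,
                        3.091860, 6.172457, 9.013178, 32.111813, 41.519480,
                        43.896437, 317.729359),
  label = "workflow"
)

# Keep the reference rows, and every variable whose ratio differs from 1
# (a ratio of 1 means permuting the variable did not change the loss).
vi_used <- vi %>%
  filter(variable %in% c("_full_model_", "_baseline_") |
           abs(mean_dropout_loss - 1) > 1e-8)

print(vi_used)
```

Note that this also removes vs, which was not dropped by the recipe but has a ratio of 1 here: the filter cannot tell a removed column from a kept column with no measurable contribution.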
Created on 2020-06-30 by the reprex package (v0.3.0)