Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would it make sense to work directly with Tidymodel objects of class model_fit #236

Open
davidkane9 opened this issue Aug 22, 2024 · 12 comments

Comments

@davidkane9
Copy link
Contributor

Consider:

library(tidymodels)
library(equatiomatic)
#> 
#> Attaching package: 'equatiomatic'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins

mod <- linear_reg() |> 
  fit(mpg ~ hp + drat, data = mtcars)

extract_eq(mod$fit) # works

$$ \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{hp}) + \beta_{2}(\operatorname{drat}) + \epsilon $$

extract_eq(mod) # fails
#> Error in UseMethod("extract_lhs", model): no applicable method for 'extract_lhs' applied to an object of class "c('_lm', 'model_fit')"

Created on 2024-08-22 with reprex v2.1.0

There is no bug here. But things would be easier for my students if extract_eq() were to accept mod, or any object resulting from fitting a tidymodel. This seems like it should be an easy fix: just check the class at the start, and pull out the fitted object if it is class model_fit.

I can submit a PR if this seems sensible.

@phgrosjean
Copy link
Collaborator

Yes, it make sense and you can propose a pull request... but you must take care to avoid increasing the list of (indirect) package dependencies. We try to keep this list as low as possible for obvious reasons. I am not in favor of even a suggests to {tidymodels} in DESCRIPTION, due to the very large number of indirect hard dependencies (95 packages for {tidymodels}), even if some are already dependencies of {broom} obviously.

nrow(pak::pkg_deps("tidymodels"))
## [1] 95                                                                      
nrow(pak::pkg_deps("broom"))
## [1] 23

So you should basically provide a method for the generic extract_eq() for class model_fit, which is easy to do, but without referencing {tidymodels}, even in the examples ! Not easy...

@davidkane9
Copy link
Contributor Author

Thanks for the feedback! My testing suggests that all that is needed is to add these lines at the top of extract_eq.default.

  if (inherits(model, c("model_fit", "workflow"))) {
    if ("fit" %in% names(model)) {
      model <- model$fit
    }
  }

At the least, this solves my problem, keeps all the current tests passing, and does not require any new packages. Would something like this be OK for the PR, or do we really need a new method?

@phgrosjean
Copy link
Collaborator

No, not really. You have to write a proper method model_fit for extract_eq().

@davidkane9
Copy link
Contributor Author

#237 submitted. Thanks for the guidance!

@datalorax
Copy link
Owner

Yes, it make sense and you can propose a pull request... but you must take care to avoid increasing the list of (indirect) package dependencies. We try to keep this list as low as possible for obvious reasons. I am not in favor of even a suggests to {tidymodels} in DESCRIPTION, due to the very large number of indirect hard dependencies (95 packages for {tidymodels}), even if some are already dependencies of {broom} obviously.

nrow(pak::pkg_deps("tidymodels"))
## [1] 95                                                                      
nrow(pak::pkg_deps("broom"))
## [1] 23

So you should basically provide a method for the generic extract_eq() for class model_fit, which is easy to do, but without referencing {tidymodels}, even in the examples ! Not easy...

Related.... #152. Not sure if you want to take that on at any point @phgrosjean. I haven't had too many problems with the dependencies from broom or broom.mixed, but that could help.

@phgrosjean
Copy link
Collaborator

@datalorax Yes, but #152 needs a lot more work and testing that I can do for now. This one is easier.

@datalorax
Copy link
Owner

Yes, 100% agree.

@phgrosjean
Copy link
Collaborator

I have made a workflow method too. I did a couple of tests. When {broom.mixed} is required, there is an error if it is not loaded:

> library(parsnip)
> library(equatiomatic)
> mod <- linear_reg() |>
+   set_engine("stan") |>
+   fit(mpg ~ hp + drat, data = mtcars)
> extract_eq(mod) # Error
Error in warn_on_stanreg(x) : 
  The supplied model object seems to be outputted from the rstanarm package. Tidiers for mixed model output now live in the broom.mixed package.
> library(broom.mixed) # Required for the model used
> extract_eq(mod) # Works
$$
E( \operatorname{mpg} ) = \alpha + \beta_{1}(\operatorname{hp}) + \beta_{2}(\operatorname{drat})
$$

The example on the {workflows} main page does not work with the stan engine, but it works with base lm() model when there is a preprocessing step (here using step_ns()):

> library(parsnip)
> library(workflows)
> library(broom.mixed) # Required for the model used, or extract_eq() will choke!
> library(equatiomatic)
> # This is the example in {workflows} main page:
> spline_cars <- recipe(mpg ~ ., data = mtcars) |>
+   step_ns(disp, deg_free = 10)
> bayes_lm <- linear_reg() |>
+   set_engine("stan")
> # To use these, you would generally run:
> spline_cars_prepped <- prep(spline_cars, mtcars)
> bayes_lm_fit <- fit(bayes_lm, mpg ~ ., data = juice(spline_cars_prepped))
> # This one does not works with {equatiomatic}:
> extract_eq(bayes_lm_fit)
Error in terms.formula(formula(model)) : 
  '.' in formula and no 'data' argument

Now, if I use lm() instead, it works:

> linear_lm <- linear_reg()
> spline_cars_prepped <- prep(spline_cars, mtcars)
> linear_lm_fit <- fit(linear_lm, mpg ~ ., data = juice(spline_cars_prepped))
> # This one works with {equatiomatic}:
> extract_eq(linear_lm_fit)
$$
\operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) + \beta_{4}(\operatorname{wt}) + \beta_{5}(\operatorname{qsec}) + \beta_{6}(\operatorname{vs}) + \beta_{7}(\operatorname{am}) + \beta_{8}(\operatorname{gear}) + \beta_{9}(\operatorname{carb}) + \beta_{10}(\operatorname{disp\_ns\_01}) + \beta_{11}(\operatorname{disp\_ns\_02}) + \beta_{12}(\operatorname{disp\_ns\_03}) + \beta_{13}(\operatorname{disp\_ns\_04}) + \beta_{14}(\operatorname{disp\_ns\_05}) + \beta_{15}(\operatorname{disp\_ns\_06}) + \beta_{16}(\operatorname{disp\_ns\_07}) + \beta_{17}(\operatorname{disp\_ns\_08}) + \beta_{18}(\operatorname{disp\_ns\_09}) + \beta_{19}(\operatorname{disp\_ns\_10}) + \epsilon
$$

Now, the workflow version also works:

> car_wflow <- workflow() |>
+     add_recipe(spline_cars) |>
+     add_model(linear_lm)
> # Now you can prepare the recipe and estimate the model via a single call to fit():
> car_wflow_fit <- fit(car_wflow, data = mtcars)
> # and extract_eq() works too:
> extract_eq(car_wflow_fit)
$$
\operatorname{..y} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) + \beta_{4}(\operatorname{wt}) + \beta_{5}(\operatorname{qsec}) + \beta_{6}(\operatorname{vs}) + \beta_{7}(\operatorname{am}) + \beta_{8}(\operatorname{gear}) + \beta_{9}(\operatorname{carb}) + \beta_{10}(\operatorname{disp\_ns\_01}) + \beta_{11}(\operatorname{disp\_ns\_02}) + \beta_{12}(\operatorname{disp\_ns\_03}) + \beta_{13}(\operatorname{disp\_ns\_04}) + \beta_{14}(\operatorname{disp\_ns\_05}) + \beta_{15}(\operatorname{disp\_ns\_06}) + \beta_{16}(\operatorname{disp\_ns\_07}) + \beta_{17}(\operatorname{disp\_ns\_08}) + \beta_{18}(\operatorname{disp\_ns\_09}) + \beta_{19}(\operatorname{disp\_ns\_10}) + \epsilon
$$

I am pretty sure there are many other {tidymodels} model_fit, workflow objects, or other that do not work properly with {equatiomatic}... you mileage will vary!

@phgrosjean
Copy link
Collaborator

@datalorax @davidkane9 I am waiting for your possible comments or suggestions before I submit the new version of {equationmatic} to CRAN.

@datalorax
Copy link
Owner

Thanks @phgrosjean, I am confused why explicit calls to broom.mixed is needed when it's already an import. Is there a way to avoid that?

It does not surprise me that many of the models won't work. We could go about trying to make most of them work but that also feels like a significant undertaking. So basically I'm fine with whatever you want to do.

@phgrosjean
Copy link
Collaborator

Yes, it is not clear why library(broom.mixed) solved the error in the example. I don't know either.

@davidkane9
Copy link
Contributor Author

Apologies for not responding! This solves my problem. Please release a new version to CRAN, whenever convenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants