Add crossval function that allows for k-fold cross validation of a machine learning model. #22

Open
pchest opened this issue Apr 30, 2020 · 9 comments

Comments

@pchest
Collaborator

pchest commented Apr 30, 2020

The idea would be to create a function crossval(x, ...) that takes a machine learning model as an input and allows users to evaluate the model's performance across k splits of an evaluation data set.
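
As a rough illustration of what such a function might do, here is a minimal base-R sketch of k-fold cross-validation. The helper name `kfold_accuracy()` and its arguments (`fit_fun`, `predict_fun`, `outcome`) are hypothetical and are not part of quanteda.textmodels or quanteda.classifiers; a logistic regression on the built-in `mtcars` data stands in for a text model.

```r
# Hypothetical helper, for illustration only: not an existing quanteda function.
# Splits `data` into k folds, fits on k - 1 folds, and scores on the held-out fold.
kfold_accuracy <- function(data, outcome, k, fit_fun, predict_fun, seed = 1) {
  set.seed(seed)
  n <- nrow(data)
  # Randomly assign each row to one of k folds of (nearly) equal size.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  vapply(seq_len(k), function(i) {
    train <- data[fold_id != i, , drop = FALSE]
    test  <- data[fold_id == i, , drop = FALSE]
    model <- fit_fun(train)
    pred  <- predict_fun(model, test)
    # Share of held-out rows whose predicted class matches the observed one.
    mean(pred == test[[outcome]])
  }, numeric(1))
}

# Stand-in model: logistic regression predicting transmission type from weight.
fit_logit <- function(train) glm(am ~ wt, data = train, family = binomial())
predict_logit <- function(model, test) {
  as.integer(predict(model, newdata = test, type = "response") > 0.5)
}

acc <- kfold_accuracy(mtcars, "am", k = 5, fit_logit, predict_logit)
mean(acc)  # average held-out accuracy across the 5 folds
```

Passing the fit and predict steps as functions keeps the fold logic independent of the model, which is one possible way a generic `crossval(x, ...)` could be organised.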

@stefan-mueller
Contributor

The quanteda.classifiers package contains the functions crossval() and performance() which allow for straightforward k-fold cross-validation of textmodel_nb() and textmodel_svm(). While these models are included in quanteda.textmodels, functions for cross-validation are missing in the package.

Would it make sense to add these functions to quanteda.textmodels? This would allow users to validate their models without having to install the development package quanteda.classifiers (which also imports keras).

@pchest
Collaborator Author

pchest commented Jul 13, 2022

This is a good suggestion. So long as Ken is on board, I'll be happy to port them over.

@kbenoit
Contributor

kbenoit commented Jul 14, 2022

I wonder if this is the best approach, or whether an integration into the new(er) tidymodels framework would be the better way to proceed.

@pchest
Collaborator Author

pchest commented Jul 14, 2022

Certainly, integrating into the tidymodels framework could help us reach a larger audience. Just to clarify, the idea would be to make quanteda.textmodels functions compatible with the tidymodels cross-validation workflow (like this example) rather than duplicating our efforts, correct?

@kbenoit
Contributor

kbenoit commented Jul 15, 2022

yes exactly - but in a way that extends quanteda.textmodels rather than requiring any new package.

@pchest
Collaborator Author

pchest commented Jul 17, 2022

Got it. I've cloned tidymodels to better understand how their functions work relative to quanteda. The functions that seem to be good starting points for improving compatibility are fit and fit_resamples, as they are crucial to the tidymodels k-fold cross-validation workflow.
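
For reference, the tidymodels k-fold workflow built around `fit_resamples` looks roughly like the following for a model that parsnip already supports. This is a minimal sketch, assuming the tidymodels package is installed; a logistic regression on the built-in `mtcars` data stands in for a quanteda text model, and the formula and fold count are illustrative choices only.

```r
library(tidymodels)

set.seed(1)

# Stand-in data: classification outcomes must be factors in tidymodels.
dat <- mtcars
dat$am <- factor(dat$am)

# Split the data into 5 folds.
folds <- vfold_cv(dat, v = 5)

# Bundle the formula and model specification into a workflow.
wf <- workflow() %>%
  add_formula(am ~ wt) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Fit on each analysis set and evaluate on the matching assessment set.
res <- fit_resamples(wf, resamples = folds, metrics = metric_set(accuracy))

# Average the per-fold accuracy.
cv_metrics <- collect_metrics(res)
cv_metrics
```

Here `fit_resamples` handles the per-fold fitting and prediction, so a quanteda model would need to plug into the model specification step for the same workflow to apply.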

@EmilHvitfeldt

Hey 👋 just wanted to chime in to say that I'm here to help/answer questions related to any tidymodels effort :)

@pchest
Collaborator Author

pchest commented Jul 18, 2022

@EmilHvitfeldt Hey! I'm glad to hear you are interested in our project. The objective would be to make quanteda.textmodels functions compatible with the tidymodels model validation functions. For instance, we'd like something like the following to work:


folds <- vfold_cv(data, v = 10)

nb_mod <- textmodel_nb()
nb_val <- nb_mod %>% 
    fit_resamples(folds)

collect_metrics(nb_val)

I'm examining both packages to see what changes would be needed to make this happen. Any suggestions would be welcome!

@jblumenau

Hi all,

Just wondering whether anything came of this idea? I am teaching with quanteda.textmodels this term and had been wondering whether there was a native quanteda cross-validation function for, e.g., textmodel_nb. Cheers!
