Dealing with highly imbalanced data in Autosklearn #1164
Comments
Hey @ShirinNajdi, thanks a lot for your interest. Unfortunately, the underlying issues in scikit-learn are still there: scikit-learn/scikit-learn#3855 and scikit-learn/scikit-learn#9630. We'll re-evaluate whether we can use the imbalanced-learn extension to provide SMOTE in Auto-sklearn.
Thank you for the quick reply. A similar situation can be tackled with imblearn.pipeline when using Bayesian hyper-parameter optimisation with SMOTE. The idea is to transform the training data with SMOTE while keeping the test data untouched. I would be happy to hear your thoughts.
The fact that transform needs to behave differently at fit and at predict time is one issue. The other issue is that scikit-learn does not support changing the targets, so one cannot add new training samples in a pipeline. However, one would like to use SMOTE in the middle of a pipeline, after scaling and one-hot encoding, but before the classifier. One could indeed use imblearn.pipeline, but we have not had the time to look into whether we can make use of that library.
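To make the two constraints above concrete, here is a minimal pure-Python sketch of a pipeline whose sampling step changes both the features and the targets at fit time and is skipped at predict time. It is only an illustration of the idea: ToyScaler, ToySmote, ToyCentroidClassifier and SamplingPipeline are hypothetical names, not the scikit-learn, imbalanced-learn or Auto-sklearn API, and ToySmote is a simplification that interpolates towards the single nearest same-class neighbour instead of a random one among k neighbours.

```python
import random
from collections import Counter


class ToyScaler:
    """Hypothetical transformer: rescales each feature to [0, 1]."""

    def fit_transform(self, X):
        cols = list(zip(*X))
        self.lo = [min(c) for c in cols]
        self.span = [(max(c) - min(c)) or 1.0 for c in cols]
        return self.transform(X)

    def transform(self, X):
        return [[(v - lo) / s for v, lo, s in zip(row, self.lo, self.span)]
                for row in X]


class ToySmote:
    """Hypothetical SMOTE-like sampler (simplified, nearest neighbour only)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def fit_resample(self, X, y):
        counts = Counter(y)
        target = max(counts.values())
        X_out, y_out = list(X), list(y)
        for label, n in counts.items():
            rows = [x for x, lab in zip(X, y) if lab == label]
            for _ in range(target - n):
                a = self.rng.choice(rows)
                others = [r for r in rows if r is not a] or [a]
                b = min(others,
                        key=lambda r: sum((p - q) ** 2 for p, q in zip(a, r)))
                t = self.rng.random()
                # Synthetic point on the segment between a and its neighbour b.
                X_out.append([p + t * (q - p) for p, q in zip(a, b)])
                y_out.append(label)
        return X_out, y_out


class ToyCentroidClassifier:
    """Hypothetical classifier: predicts the label of the closest centroid."""

    def fit(self, X, y):
        self.n_seen = len(X)  # number of training rows after resampling
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))
                for x in X]


class SamplingPipeline:
    """Hypothetical pipeline: samplers run in fit() only, never in predict()."""

    def __init__(self, steps):
        self.steps, self.model = list(steps[:-1]), steps[-1]

    def fit(self, X, y):
        for step in self.steps:
            if hasattr(step, "fit_resample"):
                X, y = step.fit_resample(X, y)  # changes X *and* y
            else:
                X = step.fit_transform(X)
        self.model.fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps:
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)  # test data is never resampled
        return self.model.predict(X)


# Six majority samples and two minority samples (made-up data).
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
           [0.2, 0.2], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9]]
y_train = [0] * 6 + [1] * 2
X_test = [[0.1, 0.1], [5.1, 5.0], [4.9, 5.2]]

pipe = SamplingPipeline([ToyScaler(), ToySmote(seed=0), ToyCentroidClassifier()])
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
```

The sampler sits after scaling and before the classifier, and the classifier is trained on more rows than were passed to fit, which is exactly what a plain scikit-learn pipeline does not allow.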
@mfeurer Would you be open to a pull request that adds SMOTE to Auto-sklearn? Cheers.
Do you mean by integrating imblearn?
Closing this for now as it is an issue in scikit-learn. We can reassess this once scikit-learn allows changing the number of data points in a pipeline.
@mfeurer Sorry for the very late response. When I made my comment, I had not thoroughly reviewed every comment and the linked issues. It is understandable why this could not be done at the time. Meanwhile, @ShirinNajdi, check out https://github.com/prabhant/gama/tree/imblearn. Otherwise, check out AMLTK and create your own search space: https://github.com/automl/amltk. This can also be done using GAMA. A PR should be available by the end of the summer to facilitate all of this, so I believe AMLTK is the better way to go for now, but to expand your search, look into https://github.com/openml-labs/gama.
Hi,
In view of issue #113, I would like to know whether there is an update on including SMOTE in the Auto-sklearn package.