Improve sklearn compatibility #2
Thanks a lot for your work! I'll definitely look further into it once the summer starts and I'm done grading exams.
I am not entirely sure that we get interpretable probabilities when we minimise the MSE loss instead of the cross-entropy (CE) loss. I implemented group lasso as a regression algorithm, not a classification algorithm, so I think the right thing to do here is to move everything but the gradient computation into a base class and have one subclass for each loss. It has been a long-term goal to implement a solver for the CE loss, but I wanted a working least-squares algorithm first because that is easier to implement. If we want to implement a group-lasso regularised CE solver, then we need to find an upper bound on the Lipschitz coefficient for the gradient of the cross-entropy loss. I'll do some maths on the train back from work tomorrow and see how difficult it will be to implement. Alternatively, there are line-search methods that should work, but these are more complex than simply computing the Lipschitz coefficient.
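For the two-class case, a standard bound should do: the Hessian of the mean logistic loss is `X.T @ D @ X / n` with `D = diag(p * (1 - p))`, and since `p * (1 - p) <= 1/4`, the gradient is Lipschitz with constant at most `||X||_2^2 / (4n)`. A minimal sketch of that computation:

```python
import numpy as np

def logistic_lipschitz_bound(X):
    """Upper bound on the Lipschitz constant of the gradient of the mean
    two-class cross-entropy (logistic) loss with respect to the weights.

    The Hessian is X.T @ D @ X / n with D = diag(p * (1 - p)), and
    p * (1 - p) <= 1/4, hence L <= ||X||_2^2 / (4 * n).
    """
    n_samples = X.shape[0]
    spectral_norm = np.linalg.norm(X, ord=2)  # largest singular value of X
    return spectral_norm**2 / (4 * n_samples)
```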
Thank you for this! It was on my todo list, but I hadn't gotten to it yet.
This is also purely a classification feature.
Your solution is much better! I don't even remember why I chose the format that I did.
I agree, this should not be difficult to implement.
Multi-class group lasso is not super trivial, as it is unclear whether we want sparsity across the different labels (i.e. do we accept that a different set of groups is used to check whether the output is a cat than whether the output is a dog). There should probably be a flag to control this behaviour.
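To make the two behaviours concrete, here is a rough sketch of the two penalty variants such a flag could switch between (purely illustrative, not code from this package):

```python
import numpy as np

def group_penalty(W, groups, shared_across_labels=True):
    """Group lasso penalty for a coefficient matrix W of shape
    (n_features, n_classes).

    shared_across_labels=True: one norm per group of features, taken over
    all classes, so a group is either used for every label or for none.
    shared_across_labels=False: one norm per (group, class) pair, so each
    label may select its own set of groups.
    """
    penalty = 0.0
    for group in groups:  # each group is a sequence of feature indices
        block = W[np.asarray(group), :]
        if shared_across_labels:
            penalty += np.linalg.norm(block)                # Frobenius norm of the block
        else:
            penalty += np.linalg.norm(block, axis=0).sum()  # one norm per class
    return penalty
```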
Yes! This will, however, be a lot of work for me, as I haven't worked much with sparse matrices in Python.
There should be; we might want to override `__init__` as well to set the groups. Again, I didn't do this since I used a regression model as a base.
Whoops, I was so focused on classification (I have always encountered group lasso in this context) that I misinterpreted your estimator as a (work-in-progress) classifier. Sorry about this! Some of my comments certainly don't make sense immediately, then. I should've made the effort to actually understand what optimization problem is solved in the package.
Splitting the functionality into two classes is probably simpler/better because of the different interfaces for classifiers and regressors. Regarding the extra features: one thing after the other :) Me too, I don't have a lot of experience with sparse matrices. For the classifier with multi-class support, one could get inspiration from sklearn's `LogisticRegression`.
You can actually use linear regression models for classification problems. This is done frequently in chemometrics, where you often have more features than samples; PLSR (which lies somewhere between linear regression and PCA) is therefore used to predict dummy-encoded variables. However, when people use linear functions for classification problems, we cannot directly interpret their outputs as probabilities, but rather as a scoring. I'd recommend encoding your variables as +1/-1 pairs and using group lasso to predict on these. Once you've selected your variables, you can use them to fit a logistic model as well if you want the probabilities. Otherwise, just set a hard threshold at 0.

This rambling made me realise that we might want to use group lasso as a transformer, returning the dataset after removing unused variables, so that it fits nicely into a pipeline. After a quick google search for logistic group lasso, I found this paper. However, their algorithm is block coordinate descent, which is famously slow (orders of magnitude slower than FISTA).
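A rough sketch of that recipe, assuming an estimator called `GroupLasso` with a sklearn-style `fit` and a `coef_` attribute (the import path and parameter names are guesses, so adapt to the actual API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

from group_lasso import GroupLasso  # import path assumed

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
groups = np.repeat([1, 2, 3, 4, 5], 2)  # per-feature group labels (format assumed)

# Step 1: encode the two classes as +1/-1 and fit group lasso as a regressor.
y_pm = 2.0 * y - 1.0
gl = GroupLasso(groups=groups, group_reg=0.05)  # parameter names assumed
gl.fit(X, y_pm)

# Step 2a: refit a logistic model on the selected variables to get probabilities...
selected = np.flatnonzero(np.abs(np.ravel(gl.coef_)) > 1e-10)
clf = LogisticRegression().fit(X[:, selected], y)
probabilities = clf.predict_proba(X[:, selected])

# Step 2b: ...or simply set a hard threshold at 0 on the linear score.
y_pred = (X @ np.ravel(gl.coef_) > 0).astype(int)
```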
I am also interested in group lasso for logistic regression. Do I read this correctly that you recommend the following pipeline:

1. Encode the class labels as +1/-1.
2. Run group lasso regression on these labels to select variables.
3. Fit a logistic model on the selected variables (or threshold the score at 0).

If that's right, why do we need step 1? Why not just run the regression with the 0/1-encoded labels directly?
Hi, I am sorry for the late response. My suggestions above were for an old version of the codebase; I have now implemented logistic (2-class) and multinomial (multi-class) group lasso. As to the +/-1 bit, it is irrelevant now that I have implemented intercepts. Also, I aim to improve the documentation later this year (maybe next year), but that will have to wait until my teaching workload has reduced and my tendonitis is better. I hope this was at least somewhat helpful.
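In rough strokes, usage now looks like this (a minimal sketch; the class and parameter names here are taken from the description above and may differ from the released API):

```python
import numpy as np

from group_lasso import LogisticGroupLasso  # class name assumed from the description above

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] > 0).astype(int)            # plain 0/1 labels
groups = np.repeat([1, 2, 3], 2)         # per-feature group labels

clf = LogisticGroupLasso(groups=groups, group_reg=0.05)  # parameter names may differ
clf.fit(X, y)                            # intercepts are fitted, no +/-1 trick needed
predictions = clf.predict(X)
```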
All issues posted here have been moved to separate issues. The only issue left is to inherit from the corresponding class in scikit-learn (issue #4).
Great work! Thank you Yngve!!! Will have a look at it in the next couple of days. I'm reawakening a project that might benefit from your tool.
Thank you! Just one note: if you updated within the past week, you should update once more, because I introduced a bug in the logistic regression class for two-class problems. It is fixed in v1.3.1.
Because I need better sklearn support, I started to look at your implementation and improved a couple of things here and there.

You may want to have a look at my fork here: https://github.com/normanius/group-lasso/tree/normanius/sklearn-compliance. I'm not asking for a pull request yet because it is work in progress. Happy to contribute a bit more if I find the time.
I made use of `check_estimator` and the mixins (`BaseEstimator`, `ClassifierMixin`, `LinearClassifierMixin`) to check how compliant the estimator is with sklearn (see also the sklearn contribution guidelines). Currently, `GroupLasso` is fully compliant neither with `ClassifierMixin` nor with `LinearClassifierMixin`, because it lacks support for multi-class classification and sparse data. The mixins also come with their own tests.

I assembled a list of improvements that I've partially realized already; feel free to use the contribution. Because I am not very familiar with the algorithmic aspects of group lasso, someone should definitely have a closer look at the changes.
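For reference, the compliance check boils down to something like this (a minimal sketch; `check_estimator` is sklearn's public helper, and the `GroupLasso` import path is assumed):

```python
from sklearn.utils.estimator_checks import check_estimator

from group_lasso import GroupLasso  # import path assumed

# Runs sklearn's battery of API-compliance tests against the estimator;
# raises an AssertionError on the first check that fails. The estimator
# may need to be constructible with default arguments for this to work.
check_estimator(GroupLasso())
```

Now, the list of improvements: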
- `predict_proba` and `decision_function` are missing. `predict` does not return class predictions but scores. (My fix is temporary: the functions `predict` and `decision_function` are copied from `LinearClassifierMixin`; they should be removed once `GroupLasso` inherits from `LinearClassifierMixin`.)
- Results are not reproducible across calls to `fit()`. This is because `GroupLasso` uses random coefficient initialization. There are sklearn guidelines for the handling of random states.
- The class labels `y` are not necessarily numeric. In particular, `grad()` makes use of the assumption that `y` takes numerical values. In my fix, I employ `LabelEncoder` to convert `y` into a numeric vector, but I'm not entirely sure if this is the canonical way to do it. Furthermore, there is no multi-class support at the moment; have a look at `LogisticRegression` to see how multi-class is handled there.
- sklearn users would probably expect a flat list of per-feature group labels, e.g. `groups = [3,1,1,2,3]`. If the features must be organized specially (e.g. ordered such that the groups are `[1,1,2,3,3]`), it can be done under the hood. I added `_check_and_format_groups()` to convert the group assignment into the original list-of-tuples format (a sketch follows after this list), though I haven't checked if this can be optimized/simplified any further.
- There is `self.intercept_ = 0`, but it is not computed at the moment.
- Why not inherit from `LogisticRegression` and only override `fit()`?

There is certainly more to do, but this is what I assembled so far.
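For illustration, the conversion could look roughly like this (the body is a sketch; only the name `_check_and_format_groups` is taken from the fork):

```python
import numpy as np

def _check_and_format_groups(groups):
    """Convert per-feature group labels, e.g. [3, 1, 1, 2, 3],
    into a list of index tuples, e.g. [(1, 2), (3,), (0, 4)]."""
    groups = np.asarray(groups)
    return [tuple(np.flatnonzero(groups == label)) for label in np.unique(groups)]
```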