
Using the loss() method for the Logistic Regression example raises an IndexError #14

Open
brjhill opened this issue Apr 17, 2020 · 3 comments

Comments

@brjhill

brjhill commented Apr 17, 2020

Thank you so much for making this module! We're using it for work on a genomics project. I was interested in calculating loss(X, c) (using the same variable names as in the Logistic Regression example) but am getting this error:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 719 but corresponding boolean dimension is 720
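This class of error can be reproduced with a plain NumPy sketch, using the sizes from the traceback (the variable names here are illustrative, not from the library):

```python
import numpy as np

# An indexed array with 719 entries along dimension 0...
values = np.arange(719)

# ...combined with a boolean mask that is one element too long (720),
# e.g. because one of the two arrays was padded with an extra
# intercept entry somewhere along the way.
mask = np.ones(720, dtype=bool)

try:
    values[mask]
except IndexError as err:
    print(err)  # prints a message like the one in the traceback above
```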

While on the subject of losses, what is the difference between the .loss() method and the .losses_ attribute (I assume this is loss as a function of number of FISTA iterations until convergence)?

Thanks!

@yngvem
Owner

yngvem commented Apr 17, 2020

Oh, to easily deal with the intercept, I simply padded the data matrix with a column of ones. As for the losses_ attribute, that is computed during training if the LogisticGroupLasso.LOG_LOSSES flag is set to True.
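A sketch of the workaround that padding suggests, assuming the column of ones is appended last (the actual padding position inside the library may differ, and `gl`/`c` are placeholder names for a fitted estimator and the labels):

```python
import numpy as np

def pad_with_intercept(X):
    """Append a column of ones so X matches the internally padded matrix."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.random.default_rng(0).standard_normal((10, 3))
X_padded = pad_with_intercept(X)
print(X_padded.shape)  # (10, 4)

# The padded matrix could then be passed to the fitted estimator,
# e.g. gl.loss(pad_with_intercept(X), c)  -- hypothetical call
```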

I will write some more about this later when I have the time. For best performance, though, I recommend using the group lasso estimator in a pipeline, as described in this example.

@brjhill
Author

brjhill commented Apr 17, 2020

That works too; we're exploring both options (using it as an estimator and using it as a transformer/variable selection tool). In that case, can I still use the loss() method on the training data? What would a working example of its use look like?

Also, I am assuming the unregularized loss function is the same as here (i.e. cross entropy loss) when LogisticGroupLasso.LOG_LOSSES is set to True?

@yngvem
Owner

yngvem commented Feb 4, 2021

LOG_LOSSES specifies that the loss per iteration should be stored in a list, which is very useful for debugging.
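As an illustration of what such a per-iteration loss log looks like, here is a generic gradient-descent loop (not the group-lasso solver itself) that records the loss each iteration the way a losses_ list would:

```python
import numpy as np

# Minimal gradient descent on f(w) = ||Xw - y||^2, logging the loss at
# every iteration -- analogous to what losses_ holds when
# LogisticGroupLasso.LOG_LOSSES is True (the actual solver differs).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(3)
losses = []
for _ in range(100):
    residual = X @ w - y
    losses.append(float(residual @ residual))  # current loss value
    w -= 0.005 * 2 * X.T @ residual            # gradient step

print(losses[0], losses[-1])  # the logged loss shrinks toward zero
```

Plotting such a list against the iteration index is a quick convergence check, which is what makes the flag useful for debugging.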

For LogisticGroupLasso, I always use the overparametrised softmax formulation, which should be equivalent to the sigmoidal cross entropy loss in the binary classification problem.
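That equivalence is easy to check numerically: a two-class softmax with logits (0, z) yields the same class-1 probability as the sigmoid of z (in the overparametrised form with logits (z0, z1), the same holds with z = z1 - z0):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The class-1 softmax probability with logits (0, z) equals sigmoid(z),
# so the softmax cross-entropy loss coincides with the sigmoidal one
# for binary classification.
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    p_softmax = softmax(np.array([0.0, z]))[1]
    assert np.isclose(p_softmax, sigmoid(z))
```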
