-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added ShrunkCovariance #309
Conversation
Co-authored-by: Mateusz Sluszniak <[email protected]>
Co-authored-by: José Valim <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nitpicks.
{x - location, location} | ||
end | ||
|
||
defn empirical_covariance(x) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use Nx.covariance/2
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, look at the PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot to ask: @norm4nn did you try setting ddof: 0
when calling Nx.covariance/2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, ddof: 0
is default. It's strange that the tests are failing, because Nx.covariance/2
does exactly what is implemented here. Could it be the case that the data in your test is not centered and you are using Nx.covariance/2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right! This is the case, I will fix this on Thursday.
Co-authored-by: Paulo Valente <[email protected]>
Co-authored-by: Krsto Proroković <[email protected]>
@@ -182,9 +149,9 @@ defmodule Scholar.Covariance.LedoitWolf do | |||
|
|||
defnp ledoit_wolf_shrinkage_complex(x) do | |||
{num_samples, num_features} = Nx.shape(x) | |||
emp_cov = empirical_covariance(x) | |||
emp_cov = Nx.covariance(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is x
centered here? If yes, maybe reverting it to empirical_covariance
might be better. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think empirical_covariance/1
was centering x,
if that's what you're asking about. However, I'm still a bit confused as to why, when I used empirical_covariance/1
instead of Nx.covariance/2,
the Covariance.*.fit/2
functions produced the same output as the scikit-learn versions when I passed a non-centered x
and set assume_centered?
to true.
While I feel this case is invalid and can't think of any practical use case for passing such arguments, it still feels odd that the output in this edge case differs when using Nx.covariance/2
compared to the scikit-learn version. It seems empirical_covariance
might not operate exactly the same way as Nx.covariance/2
, so I also like the idea of reverting it to empirical_covariance
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lemme have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need
emp_cov = Nx.covariance(x) | |
emp_cov = Nx.covariance(x, Nx.reshape(0, {1})) |
for the code to be totally equivalent.
empirical_covariance was blindly assuming the mean is 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empirical_covariance was blindly assuming the mean is 0
Yes, because the data should be centered before. That is why using empirical_covariance
might be better idea. Like this the data won't be centered twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you want me to change it back to empirical_covariance
version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed response! I am still checking why empirical_covariance
and Nx.covariance
are giving completely different results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@norm4nn let's revert to empirical_covariance
for now so we can merge this and get unstuck. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
4297069
to
f72d08f
Compare
💚 💙 💜 💛 ❤️ |
What has been added:
Following up on #304: I added the
?
sign to theassume_centered
option, but I'm unsure about replacingempirical_covariance/1
withNx.covariance/3
because tests don’t pass when I make that change. I believe the tests should stay as they are, since I’ve already verified the results against scikit-learn.