-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spearman cor test #304
base: master
Are you sure you want to change the base?
Spearman cor test #304
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #304 +/- ##
==========================================
+ Coverage 93.75% 93.81% +0.06%
==========================================
Files 28 28
Lines 1729 1746 +17
==========================================
+ Hits 1621 1638 +17
Misses 108 108
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks mostly good!
I'm not sure which existing implementations could be used to check results. Have you looked at those mentioned by Wikipedia, as there are several which return p-values?
src/correlation.jl
Outdated
""" | ||
SpearmanCorrelationTest(x, y) | ||
|
||
Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the rank-based Spearman correlation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break all lines at 92 chars like in the CorrelationTest
docstring. Also I would say "Spearman rank correlation" rather than "rank-based".
src/correlation.jl
Outdated
end | ||
end | ||
|
||
testname(p::SpearmanCorrelationTest) = "Spearman correlation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testname(p::SpearmanCorrelationTest) = "Spearman correlation" | |
testname(p::SpearmanCorrelationTest) = "Spearman correlation" |
test/correlation.jl
Outdated
let out = sprint(show, w) | ||
@test occursin("reject h_0", out) && !occursin("fail to", out) | ||
end | ||
# let ci = confint(w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out?
test/correlation.jl
Outdated
# @test first(ci) ≈ -0.1105478 atol=1e-6 | ||
# @test last(ci) ≈ 0.0336730 atol=1e-6 | ||
# end | ||
@test pvalue(x) ≈ 0.09275 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a more precise value to test against? This way of writing the test is misleading as even 0.09
would pass.
In the worst case, we should test with a lower tolerance against the value we return, and just note in a comment the value returned by R.
src/correlation.jl
Outdated
Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of: | ||
|
||
* small sample sizes n < 25 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of: | |
* small sample sizes n < 25 | |
Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation, which performs insufficiently in the case of: | |
* sample sizes below 25 |
src/correlation.jl
Outdated
[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183. | ||
|
||
[2] A. J. Bishara and J. B. Hittner, “Confidence intervals for correlations when data are not normal,” Behav Res, vol. 49, no. 1, pp. 294–309, Feb. 2017, doi: 10.3758/s13428-016-0702-8. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/correlation.jl
Outdated
Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of: | ||
|
||
* small sample sizes n < 25 | ||
* a high true population Spearman correlation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the StackExchange thread this is more precisely:
* a high true population Spearman correlation | |
* a true population Spearman correlation above 0.95 |
It also mentions ordinal data. Did you omit it on purpose? I admit it's not super explicit.
src/correlation.jl
Outdated
In these cases a bootstrap confidence interval can perform better [2]. | ||
|
||
# External resources | ||
[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183. | |
[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating Pearson, Kendall and Spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183. |
Thanks for your detailed review! I tried to incorporate it as suggested. I used the Also mention now the ordinal data as it is the main message of the Ruscio paper (Now added to docstring). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. Looks almost ready. I've just made a few more comments.
Regarding comparison of CIs against R, I hadn't realized spearmanCI doesn't implement the same CI method. In that case I don't think it makes sense to test against these values, as there are mathematically legitimate reasons to get different results. Maybe you could check against code that isn't included in a package such as this one instead? Then if that matches we can just test against the exact values we return to prevent regressions.
dof(test) > 1 || return (-one(T), one(T)) # Otherwise we can get NaNs | ||
q = quantile(Normal(), 1 - (1-level) / 2) | ||
fisher = atanh(test.r) | ||
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000) | |
# Estimates variance as in Bonett and Wright (2000) | |
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q |
|
||
function population_param_of_interest(p::CorrelationTest) | ||
param = p.k != 0 ? "Partial correlation" : "Correlation" | ||
param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that nobody says "partial Pearson correlation" even if that would sound more explicit.
param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation" | |
param = p.k != 0 ? "Partial correlation" : "Pearson correlation" |
end | ||
|
||
function StatsAPI.confint(test::SpearmanCorrelationTest{T}, level::Float64=0.95) where T | ||
dof(test) > 1 || return (-one(T), one(T)) # Otherwise we can get NaNs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this case? Maybe also for other corner cases like having NaN
s or Inf
in the input.
Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the Spearman rank | ||
correlation ρₛ of vectors `x` and `y` is zero. | ||
|
||
Implements `pvalue` for the t-test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implements `pvalue` for the t-test. | |
Implements `pvalue` for the t-test using the Fisher transformation. |
@test first(ci) ≈ -0.1333692 atol=1e-6 | ||
@test last(ci) ≈ 0.01065576 atol=1e-6 | ||
end | ||
@test pvalue(x) ≈ 0.09274721 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT R will use AS 89 if you pass exact=TRUE
, right? It would be good to test against an implementation which uses AS 89, even if we need to use another software.
Hello, how is this different from #53 ? |
@mapi1 thanks for the PR. Do you foresee you'll have the opportunity to move it forward? cheers |
This PR adds the
SpearmanCorrelationTest
as suggested in #236.For the confidence interval I took inspiration from this StackExchange thread and used the suggested variance estimator to counter the non-normal distribution of the ranks.
Unfortunately, I could not really add meaningful tests for it as R's
cor.test
does not give the intervals for Spearman correlation and uses another algorithm to calculate the p-value as well. Maybe someone has an idea here or knows a tool that can calculate this already correctly.