add signrank distribution #173

Draft: wants to merge 2 commits into base: master

ArnoStrouwen

Implementation of the signrank distribution.
The cdf is heavily optimized, since it is what the popular hypothesis test associated with this distribution (the Wilcoxon signed-rank test) needs.

The testing in this package seems quite involved; I could use some pointers on where to add the tests.

# Each pair below calls the new StatsFuns function first and the Rmath reference second.
using StatsFuns
using Rmath

signrankpdf.(-2:12,4)
dsignrank.(-2:12,4,false)

signranklogpdf.(-2:12,4)
dsignrank.(-2:12,4,true)

signrankcdf.(-2:12,4)
psignrank.(-2:12,4,true,false)

signranklogcdf.(-2:12,4)
psignrank.(-2:12,4,true,true)

signrankccdf.(-2:12,4)
psignrank.(-2:12,4,false,false)

signranklogccdf.(-2:12,4)
psignrank.(-2:12,4,false,true)

signrankinvcdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,true,false)
signrankinvcdf.(0.0624,4)
signrankinvcdf.(0.0625,4)
signrankinvcdf.(0.0626,4)
qsignrank.(0.0624,4,true,false)
qsignrank.(0.0625,4,true,false)
qsignrank.(0.0626,4,true,false)

signrankinvlogcdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,true,true) # 0.0 does not match, is the R definition correct?
qsignrank.(log(0.0),4,true,true)
qsignrank.(0.0,4,true,false) 
signrankinvlogcdf.(log(0.0624),4)
signrankinvlogcdf.(log(0.0625),4)
signrankinvlogcdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,true,true)
qsignrank.(log(0.0625),4,true,true)
qsignrank.(log(0.0626),4,true,true)


signrankinvccdf.(-0.01:0.01:1.01,4)
qsignrank.(-0.01:0.01:1.01,4,false,false) 
signrankinvccdf.(0.0624,4)
signrankinvccdf.(0.0625,4)
signrankinvccdf.(0.0626,4)
qsignrank.(0.0624,4,false,false)
qsignrank.(0.0625,4,false,false)
qsignrank.(0.0626,4,false,false)

signrankinvlogccdf.(log.(0.0:0.01:1.01),4)
qsignrank.(log.(0.0:0.01:1.01),4,false,true) # 0.0 does not match
signrankinvlogccdf.(log(0.0624),4)
signrankinvlogccdf.(log(0.0625),4)
signrankinvlogccdf.(log(0.0626),4)
qsignrank.(log(0.0624),4,false,true)
qsignrank.(log(0.0625),4,false,true)
qsignrank.(log(0.0626),4,false,true)
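
For cross-checking the small cases above independently of both implementations, the pmf and cdf can be tabulated by counting subsets of the ranks 1..n by their sum. This is a minimal brute-force sketch, not the optimized code in this PR; `signrank_counts`, `reference_pdf` and `reference_cdf` are hypothetical names, and the `Int` counts are assumed to be adequate only for moderate n (they sum to 2^n).

```julia
# counts[w + 1] = number of subsets of {1, ..., n} whose elements sum to w
function signrank_counts(n::Int)
    m = n * (n + 1) ÷ 2              # largest possible rank sum
    counts = zeros(Int, m + 1)
    counts[1] = 1                    # the empty subset has sum 0
    for k in 1:n
        # go downwards so that rank k is used at most once per subset
        for w in m:-1:k
            counts[w + 1] += counts[w - k + 1]
        end
    end
    return counts
end

# P(W == x); zero outside the support 0:m
function reference_pdf(n::Int, x::Integer)
    counts = signrank_counts(n)
    return 0 <= x < length(counts) ? counts[x + 1] / 2.0^n : 0.0
end

# P(W <= x)
function reference_cdf(n::Int, x::Real)
    counts = signrank_counts(n)
    m = length(counts) - 1
    x < 0 && return 0.0
    x >= m && return 1.0
    upper = floor(Int, x)
    return sum(counts[1:(upper + 1)]) / 2.0^n
end
```

For n = 4 the counts are `[1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]`, so the cdf takes the values 1/16 at x = 0 and 2/16 at x = 1.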

@codecov-commenter

codecov-commenter commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 0% with 55 lines in your changes missing coverage. Please review.

Project coverage is 57.97%. Comparing base (8f50565) to head (e63bf83).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/distrs/signrank.jl | 0.00% | 55 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (8f50565) and HEAD (e63bf83).

HEAD has 2 uploads less than BASE.

| Flag | BASE (8f50565) | HEAD (e63bf83) |
|---|---|---|
| | 12 | 10 |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #173      +/-   ##
==========================================
- Coverage   62.99%   57.97%   -5.03%     
==========================================
  Files          14       15       +1     
  Lines         635      690      +55     
==========================================
  Hits          400      400              
- Misses        235      290      +55     


@andreasnoack
Member

I'd try to add it to `test/rmath.jl`, similarly to

StatsFuns.jl/test/rmath.jl

Lines 233 to 241 in 8f50565

rmathcomp_tests("chisq", [
    ((1,), 0.0:0.1:8.0),
    ((4,), 0.0:0.1:8.0),
    ((9,), 0.0:0.1:8.0),
    ((9,), 0:8),
    ((1,), 0f0:0.1f0:8f0),
    ((1,), Float16(0):Float16(0.1):Float16(8)),
    ((9,), 0//1:8//1),
])
to compare with the Rmath results.

@ArnoStrouwen
Author

Some differences with R remain.
The first is how minus infinity is handled in invlogcdf: in my opinion it is correct to return zero here and not NaN.

julia> signrankinvcdf(4, 0.0)
0.0
julia> signrankinvlogcdf(4, log(0))
0.0
julia> qsignrank.(0.0,4,true,false) 
0.0
julia> qsignrank.(log(0.0),4,true,true)
NaN
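
The argument for returning zero can be sketched with the usual generalized-inverse definition of a discrete quantile, the smallest support point x with cdf(x) >= p. Under that definition p = 0 maps to the lower end of the support, and `log(0.0)` names the same probability on the log scale. This is only an illustration for n = 4; `CDF4` and `quantile_sketch` are hypothetical names, not functions from this PR or from Rmath.

```julia
# cdf of the n = 4 signed-rank distribution on its support 0:10 (subset counts / 16)
const CDF4 = cumsum([1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]) ./ 16

# Generalized inverse: smallest support point x with cdf(x) >= p; NaN outside [0, 1]
function quantile_sketch(cdf::AbstractVector{<:Real}, p::Real)
    0 <= p <= 1 || return NaN
    idx = findfirst(>=(p), cdf)      # always found, because cdf[end] == 1
    return float(idx - 1)            # index 1 corresponds to x = 0
end

quantile_sketch(CDF4, 0.0)           # lower end of the support
quantile_sketch(CDF4, exp(-Inf))     # same probability, reached from the log scale
```

Both calls pass p = 0.0, so a log-scale quantile that returns NaN for -Inf disagrees with its own probability-scale counterpart.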

Rounding in the quantiles is also an issue:

julia> qsignrank.(-2.1,4,true,true)
1.0
julia> qsignrank.(-2.0794415416798357,4,true,true)
1.0
julia> qsignrank.(-2.0,4,true,true)
2.0

julia> signrankinvlogcdf(4,-2.1)
1.0
julia> signrankinvlogcdf(4,-2.0794415416798357)
2.0
julia> signrankinvlogcdf(4,-2.0)
2.0

For n = 4 the true cdf equals exactly 0.125 at x = 1, but that value does not round-trip through exp/log, so comparing the cdf against exp(logp) pushes the quantile up to 2.

julia> exp(-2.0794415416798357)
0.12500000000000003
julia> log(0.125)
-2.0794415416798357
julia> exp(log(0.125))
0.12500000000000003
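
One way to sidestep the round-trip is to keep the comparison on the log scale, so that `exp` is never applied to the input. The sketch below assumes a tabulated cdf for n = 4 and a plain linear search; `CDF4_TABLE` and `invlogcdf_sketch` are hypothetical names and this is not the search used in the PR.

```julia
# cdf of the n = 4 signed-rank distribution on its support 0:10 (subset counts / 16)
const CDF4_TABLE = cumsum([1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]) ./ 16

# Smallest support point x with log(cdf(x)) >= logp; NaN for logp > 0 or NaN
function invlogcdf_sketch(cdf::AbstractVector{<:Real}, logp::Real)
    logp <= 0 || return NaN
    idx = findfirst(c -> log(c) >= logp, cdf)   # log(1.0) == 0 >= logp, so always found
    return float(idx - 1)                       # index 1 corresponds to x = 0
end

invlogcdf_sketch(CDF4_TABLE, -2.1)
invlogcdf_sketch(CDF4_TABLE, log(0.125))   # log(cdf(1)) and logp come from the same log call
invlogcdf_sketch(CDF4_TABLE, -2.0)
```

Because the cdf at x = 1 is exactly 0.125, `log(0.125) >= log(0.125)` holds and the search stops at 1, which is the value qsignrank reports above for this input.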

@ArnoStrouwen
Author

ArnoStrouwen commented Feb 23, 2025

I improved the rounding by getting rid of all exp calls in my code.
Still, some rounding differences with R remain for invlogccdf, and I don't know how to resolve them.
https://github.com/JuliaStats/Rmath-julia/blob/6f2d37ff112914d65559bc3e0035b325c11cf361/src/dpq.h#L52-L53

end

function signrankinvlogccdf(n::Int, logp::Float64)
    signrankinvlogcdf(n, log1mexp(logp))
Member

I think you might be able to match R for this one if, instead of relying on log1mexp, you do a search similar to the one in signrankinvlogcdf.
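
One possible reading of this suggestion, as a rough sketch for n = 4: search the upper tail directly for the smallest x whose log ccdf is at most logp, with the ccdf built from integer counts so that no floating-point `1 - cdf` or `log1mexp` step is involved. `COUNTS4`, `CCDF4` and `invlogccdf_sketch` are hypothetical names, and whether this reproduces R's rounding in every case is not verified here.

```julia
# Number of subsets of {1, 2, 3, 4} with each rank sum 0:10
const COUNTS4 = [1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]

# ccdf(x) = P(W > x); the subtraction is done on integers, then divided by 2^4
const CCDF4 = (sum(COUNTS4) .- cumsum(COUNTS4)) ./ 16

# Smallest support point x with log(ccdf(x)) <= logp; NaN for logp > 0 or NaN
function invlogccdf_sketch(ccdf::AbstractVector{<:Real}, logp::Real)
    logp <= 0 || return NaN
    idx = findfirst(c -> log(c) <= logp, ccdf)  # ccdf[end] == 0, log(0.0) == -Inf, always found
    return float(idx - 1)                       # index 1 corresponds to x = 0
end

invlogccdf_sketch(CCDF4, log(0.0625))   # upper-tail probability 1/16
invlogccdf_sketch(CCDF4, 0.0)           # upper-tail probability 1
```

The condition ccdf(x) <= p is the upper-tail counterpart of cdf(x) >= 1 - p, so on exact arithmetic the two searches pick the same x; the difference is only in where rounding can enter.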
