Make view(::AbstractWeights, ...) return an AbstractWeights #723
base: master
This is necessary to preserve the information regarding the type of weights.
LGTM and makes sense.
I totally forgot that issue.
I assume the math for this kind of slicing was already checked when getindex for this kind of slicing was permitted?
Looking at https://github.com/JuliaStats/StatsBase.jl/blob/2faa6e80b7966b915086d8cd5a4a4d89a2126db5/src/moments.jl I am not sure that this kind of slicing should be allowed for ProbabilityWeights.
Is it still a valid ProbabilityWeights if you slice it in this way?
But I am not at all an expert on this math, whereas I assume you are.
So if you've thought it through then this should be all good.
Yes, it's fine for frequency weights and analytic weights. For probability weights, it's a complex matter, but it's better to allow people to use weights for a subsample than to completely disallow it (otherwise you wouldn't be able to use them in the presence of missing values at all).
What functions we provide are If I understand things correctly
Essentially
Well, that's right if you take a random subsample, but if you select a nonrandom subsample (which is the most common case) the weights will not be exactly correct IIUC. But in practice it's better to use somewhat incorrect weights than no weights at all (often the difference won't be that large). Also, if you take a subsample based on a variable which was used as a stratum to construct the weights, then the result will be OK. The correct way to handle this in general is to use software designed to take into account complex survey designs (like design in R or svy in Stata). See for example https://www.restore.ac.uk/PEAS/subgroups.php.
We also need to take care of the fact that
as this expression drops a dimension, so
Relatedly
Maybe
As discussed on Slack, it turns out that this PR in its current state has the problem that mutating the view will corrupt the parent, as its sum won't be updated. In addition to being dangerous, this is technically breaking, even though it's not very likely that people rely on it. The only solution I see to avoid breaking something is to have We could consider making weight vectors immutable (again) in the next breaking release to simplify this.
I've pushed a commit to make I also added another commit making
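The cached-sum problem described above can be reproduced with a small toy type. This is only a sketch: `ToyWeights` and its `total` field are hypothetical names made up for illustration and are not the StatsBase implementation. It assumes a weight vector that caches its sum and a view wrapper that shares the parent's memory.

```julia
# Hypothetical toy type for illustration only; NOT the StatsBase implementation.
mutable struct ToyWeights{T<:Real, V<:AbstractVector{T}} <: AbstractVector{T}
    values::V
    total::T   # cached sum of `values`
end

# Convenience constructor that computes the cached sum once.
ToyWeights(values::AbstractVector{T}) where {T<:Real} =
    ToyWeights{T, typeof(values)}(values, sum(values))

Base.size(w::ToyWeights) = size(w.values)
Base.IndexStyle(::Type{<:ToyWeights}) = IndexLinear()
Base.getindex(w::ToyWeights, i::Int) = w.values[i]

# Writing through the wrapper keeps *this* wrapper's cached sum in sync.
function Base.setindex!(w::ToyWeights, x, i::Int)
    w.total += x - w.values[i]
    w.values[i] = x
    return w
end

# Return the cached value instead of recomputing.
Base.sum(w::ToyWeights) = w.total

w = ToyWeights([1.0, 2.0, 3.0])

# A "view" built by wrapping a SubArray of the parent's data shares memory with `w`.
vw = ToyWeights(view(w.values, 1:2))

# Mutating through the view updates the view's cache and the shared data,
# but the parent's cached sum is left untouched.
vw[1] = 10.0

println(sum(vw))        # sum cached in the view wrapper
println(sum(w))         # stale sum cached in the parent
println(sum(w.values))  # actual sum of the parent's data
```

After the write, the parent's cached sum no longer matches its data, which is the corruption the comment refers to.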
I have left some comments.
I've adapted testsets so that they loop over views of weights for all tests that cover weights. This makes the code more complex but that's probably worth it if we want to fully support views of weights.
Ah,
Do you want me to review now, or should I wait until you work on this?
As you prefer. The change to support
{S <: Real, W <: AbstractWeights{S}}
@boundscheck checkbounds(wv, inds...)
@inbounds v = invoke(view, Tuple{AbstractArray, Vararg{Any}}, wv, inds...)
weightstype(W){S, eltype(wv), typeof(v)}(v, missing)
Suggested change:
- weightstype(W){S, eltype(wv), typeof(v)}(v, missing)
+ return weightstype(W){S, eltype(wv), typeof(v)}(v, missing)
@inbounds invoke(view, Tuple{AbstractArray, Vararg{Any}}, wv, inds...)
end

# Always recompute the sum for views of AbstractWeights, as we cannot know whether
Maybe move the definitions of sum and copy into one place to make sure the reader can see both definitions side by side?
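For readers following along, here is a minimal sketch of the "always recompute the sum for views" idea from the diff comment above. It assumes, based only on the `missing` argument visible in the diff, that a view stores `missing` in place of a cached sum. The names `LazyToyWeights`, `toyview`, and `toysum` are hypothetical and do not come from StatsBase.

```julia
# Hypothetical toy type for illustration only; NOT the StatsBase implementation.
struct LazyToyWeights{T<:Real, V<:AbstractVector{T}}
    values::V
    total::Union{T, Missing}   # `missing` means "recompute on demand"
end

# Owning constructor: the data is not shared, so the sum is cached once.
LazyToyWeights(values::AbstractVector{T}) where {T<:Real} =
    LazyToyWeights{T, typeof(values)}(values, sum(values))

# Views store `missing` so that a stale cached sum can never be returned.
function toyview(w::LazyToyWeights{T}, inds) where {T<:Real}
    v = view(w.values, inds)
    return LazyToyWeights{T, typeof(v)}(v, missing)
end

# Use the cache when there is one, otherwise recompute from the data.
toysum(w::LazyToyWeights) = ismissing(w.total) ? sum(w.values) : w.total

w = LazyToyWeights([1.0, 2.0, 3.0])
vw = toyview(w, 2:3)
before = toysum(vw)   # computed from the current data

w.values[3] = 7.0     # mutate the underlying data directly
after = toysum(vw)    # recomputed, so it reflects the change

println((before, after))
```

The trade-off in this sketch is that every `toysum` call on a view costs a full pass over the data, in exchange for never returning an outdated value.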
OK. I have commented. I think it is a good idea to use a separate
Maybe the only comment is that it would be even safer and a bit faster (but it would complicate code so I am not sure it is worth it) to instead of having
Actually, I wonder whether it wouldn't be better to keep returning
As you prefer. Still we would need to make sure the view is one dimensional, as currently one can do:
Yes, we would only change signatures to accept
Yes, then
Fixes #719 and #561.
Cc: @bkamins, @oxinabox