-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subset
and merge
for VarInfo
(clean version)
#544
Conversation
subset
and merge
for VarInfo
subset
and merge
for VarInfo
(clean version)
src/varinfo.jl
Outdated
""" | ||
merge(varinfo_left::VarInfo, varinfo_right::VarInfo) | ||
|
||
Merge two `VarInfo` instances into one, giving precedence to `varinfo_right` when reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add some more docs for this a bit later today
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #544 +/- ##
==========================================
+ Coverage 81.51% 82.76% +1.24%
==========================================
Files 25 25
Lines 3003 3180 +177
==========================================
+ Hits 2448 2632 +184
+ Misses 555 548 -7
☔ View full report in Codecov by Sentry. |
Do any of you want to take a look at this @devmotion @yebai ? |
I'll take a look later today. @sunxd3 can you also help review this PR? |
end | ||
|
||
function subset(metadata::Metadata, vns::AbstractVector{<:VarName}) | ||
# TODO: Should we error if `vns` contains a variable that is not in `metadata`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least a warning?
src/varinfo.jl
Outdated
metadata = merge_metadata(varinfo_left.metadata, varinfo_right.metadata) | ||
lp = getlogp(varinfo_left) + getlogp(varinfo_right) | ||
# TODO: Is this really the way we want to combine `num_produce`? | ||
num_produce = varinfo_left.num_produce[] + varinfo_right.num_produce[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me the use of num_produce
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, taking the sum of two num_produce
produces no meaning. I think we should throw an error if both num_produce
are non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_produce
is used by the particle samplers to indicate how many observations we've seen / what's the current "observation index".
Semantically, taking the sum of two num_produce produces no meaning. I think we should throw an error if both num_produce are non-zero.
I don't think that's quite right. It makes sense if you go "execute model A and then separately execute model B, and now we want to merge the two"; to ensure that the observation index is correctly, we need to add them, no?
Is this a scenario we want to consider? Probably not. But to me it wasn't obvious what else to do.
No matter, I don't think erroring if num_produce
is non-zero is the right way to go. We should allow something like
varinfo_with_num_produce = last(evaluate!!(model, varinfo, context))
merge(varinfo_with_num_produce, varinfo)
If there's no meaning in adding them, then I suggest the alternative is to just give precedence to varinfo_right
, as is the semantics of merge
(if a field/property/key is present in both left
and right
, then the value in right
takes precedence).
EDIT: It's fair to ask if we should also do that for logp
. I went with sum because I was imagining scenarios where we want to do something like execute two (sub-)models in parallel and then merge the resulting varinfos. In such a scenario, we want to add the logp
fields together. But we could maybe make this a separate function, e.g. merge_with_add_logp
or something (preferably named in a better way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what should we do here? Give precedence to varinfo_right
or leave as it is?
Looks good to me, but I am not very familiar with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a lot of additional code to support VarInfo
. Is there a plan to drop VarInfo
in favour of SimpleVarInfo
at some point?
We have nearly everything we need in SimpleVarInfo
now. So, the marginal benefit of keeping both VarInfo
and SimpleVarInfo
is diminishing.
src/varinfo.jl
Outdated
metadata = merge_metadata(varinfo_left.metadata, varinfo_right.metadata) | ||
lp = getlogp(varinfo_left) + getlogp(varinfo_right) | ||
# TODO: Is this really the way we want to combine `num_produce`? | ||
num_produce = varinfo_left.num_produce[] + varinfo_right.num_produce[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, taking the sum of two num_produce
produces no meaning. I think we should throw an error if both num_produce
are non-zero.
@@ -1331,6 +1669,15 @@ function setorder!(vi::VarInfo, vn::VarName, index::Int) | |||
return vi | |||
end | |||
|
|||
""" | |||
getorder(vi::VarInfo, vn::VarName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this API -- we can consider depreciating num_produce
in favour of getorder
in the longer run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Isn't num_produce
and order
different things? At least they are two different fields in VarInfo
. I I just added a getorder
method because I've been trying to make the interface of VarInfo
simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_produce
is used to track the current observation index (for all vn
s), and its current value inserted to VarInfo.metadata
when a new vn
is created.
We don't though 😕 Until that has been done, we need to continue supporting |
Yes, I liked the |
Agreed, but that will take time; it's non-trivial to get right IMO. |
# the `eltype` to `VarName`? This might be useful when someone does `[@varname(x[1]), @varname(m)]` which | ||
# might result in a `Vector{Any}`. | ||
""" | ||
subset(varinfo::AbstractVarInfo, vns::AbstractVector{<:VarName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would not already have so many getindex
methods, I would have thought that getindex
would be a natural name for this function. But maybe it's still an option?
Then we could have getindex(::AbstractVarInfo, ::AbstractVector{<:VarName}) -> AbstractVarInfo
and getindex(::T, ::VarName) -> typeof_varname_variate
, similar to [1,2,3][[1,3]] = [1, 3]
and [1,2,3][2] = 2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like this yes, but I also really don't want to touch getindex
in this codebase 😅
Happy to make this a long-term goal or something though!
src/abstract_varinfo.jl
Outdated
See docstring of [`subset(varinfo, vns)`](@ref) for examples. | ||
""" | ||
function Base.merge(varinfo::AbstractVarInfo, varinfo_others::AbstractVarInfo...) | ||
return merge(Base.merge(varinfo, first(varinfo_others)), Base.tail(varinfo_others)...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that length(varinfo_others) == 0
, and then the function would error it seems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true; I'll address this.
I meant to add tests for merge
with more than two inputs too, but didn't get around to it. Will do that too.
Okay, so I think this is now properly ready for a potential approval (after comments have been resolved) |
Pull Request Test Coverage Report for Build 6518813946
💛 - Coveralls |
Okay, so I added quite a bit of changes to ensure that |
Test coverage should now be improved 👍 |
This is ready for a look-over now btw. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @torfjelde -- it looks good to me!
@devmotion had some open comments here, so I really would have preferred to have his 👍 on this PR too |
I was thinking that #542 would be a quick merge, and so I just based #543 on #542 so I could easily test the new
Gibbs
in Turing.jl, buuut it seems that wasn't as quick as I thought and so this PR replaces #543 without any dependence on #542.Hence: see #543 for examples and some comments