Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Move Dataset and InferenceData into module (#213)
* Create InferenceObjects module
* Reorganize files into submodule
* Import module
* Don't use arviz_version()
* Remove reference to doc_str
* Collect all Python interop in xarray.jl
* Move base conversion functions to InferenceObjects
* Move utils to InferenceObjects
* Remove unnecessary method
* Add missing imports
* Explicitly list schema group names
* Update imports
* Add back missing import
* Split dataset tests
* Split InferenceData tests
* Load module directly
* Separate out dataset conversion tests
* Separate InferenceData conversion tests
* Remove reference to ArviZ
* Test package_version
* Source helpers for now
* Move default_var_name
* Add back removed conversion methods
* Move PyCall conversion tests
* Remove requirement to have arviz_version attribute
* Don't test for arviz_version
* Test an actual package
* Refer to correct function
* Don't load non-existent file
* Run formatter
* Increment version number
* Run formatter
* Move merge to InferenceObjects
* Reorganize docs
* Write example that doesn't require example data
* Collect dimension-related code
* Rename function
* Rearrange tests
* Reorganize code
* Set default coordinates to axis of underlying array
* Use broadcasting and add type annotation
* Add dimension tests
* Add missing variable declarations
* Don't import internal functions
* Fix tests
* Add missing tests
* Point to LookupArrays
* Add OffsetArrays as a test dependency
* Copy test helper functions
* Move from_namedtuple to InferenceObjects
* Move rekey to InferenceObjects
* Move flatten and rest of rekey to InferenceObjects
* Copy helper functions
* Run formatter
* Revert "Move flatten and rest of rekey to InferenceObjects" This reverts commit 9ca8d5e.
* Revert "Move rekey to InferenceObjects" This reverts commit 6e736e4.
* Move utilities to InferenceObjects
* Document, improve, and test utilities
* Evaluate conditional statically
Showing 32 changed files with 1,272 additions and 876 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
name = "ArviZ" | ||
uuid = "131c737c-5715-5e2e-ad31-c244f01c1dc7" | ||
authors = ["Seth Axen <[email protected]>"] | ||
- version = "0.6.0" | ||
+ version = "0.6.1" | ||
|
||
[deps] | ||
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
module InferenceObjects | ||
|
||
using Dates: Dates | ||
using DimensionalData: DimensionalData, Dimensions, LookupArrays | ||
using OrderedCollections: OrderedDict | ||
|
||
# groups that are officially listed in the schema | ||
const SCHEMA_GROUPS = ( | ||
:posterior, | ||
:posterior_predictive, | ||
:predictions, | ||
:log_likelihood, | ||
:sample_stats, | ||
:prior, | ||
:prior_predictive, | ||
:sample_stats_prior, | ||
:observed_data, | ||
:constant_data, | ||
:predictions_constant_data, | ||
:warmup_posterior, | ||
:warmup_posterior_predictive, | ||
:warmup_predictions, | ||
:warmup_sample_stats, | ||
:warmup_log_likelihood, | ||
) | ||
const SCHEMA_GROUPS_DICT = Dict(n => i for (i, n) in enumerate(SCHEMA_GROUPS)) | ||
const DEFAULT_SAMPLE_DIMS = Dimensions.key2dim((:chain, :draw)) | ||
|
||
export Dataset, InferenceData | ||
export convert_to_dataset, convert_to_inference_data, from_namedtuple, namedtuple_to_dataset | ||
|
||
include("utils.jl") | ||
include("dimensions.jl") | ||
include("dataset.jl") | ||
include("inference_data.jl") | ||
include("convert_dataset.jl") | ||
include("convert_inference_data.jl") | ||
include("from_namedtuple.jl") | ||
|
||
end # module |
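The `SCHEMA_GROUPS_DICT` constant above maps each schema group name to its position in `SCHEMA_GROUPS`. The portion of the diff shown here does not include the code that consumes it, so the following is only a sketch, in plain Julia, of one way such an index could be used: ordering arbitrary group names so that schema groups come first, in schema order. The helper `sort_groups` and the shortened group tuple are hypothetical and not part of the commit.

```julia
# Shortened copy of the schema group list, for illustration only.
SCHEMA_GROUPS = (:posterior, :posterior_predictive, :sample_stats, :prior, :observed_data)

# Same construction as in the module: group name => position in the schema.
SCHEMA_GROUPS_DICT = Dict(n => i for (i, n) in enumerate(SCHEMA_GROUPS))

# Hypothetical helper: schema groups are ordered by their schema position;
# names that are not in the schema get the largest possible key and go last.
function sort_groups(names)
    return sort(collect(names); by=n -> get(SCHEMA_GROUPS_DICT, n, typemax(Int)))
end

sorted = sort_groups((:observed_data, :my_custom_group, :prior, :posterior))
println(sorted)
```

A dictionary lookup keeps the sort key cheap to compute, which may be why the index is precomputed once as a module-level constant rather than searched for in the tuple each time.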
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Base.convert(::Type{Dataset}, obj) = convert_to_dataset(obj) | ||
Base.convert(::Type{Dataset}, obj::Dataset) = obj | ||
|
||
""" | ||
convert_to_dataset(obj; group = :posterior, kwargs...) -> Dataset | ||
Convert a supported object to a `Dataset`. | ||
In most cases, this function calls [`convert_to_inference_data`](@ref) and returns the | ||
corresponding `group`. | ||
""" | ||
function convert_to_dataset end | ||
|
||
function convert_to_dataset(obj; group::Symbol=:posterior, kwargs...) | ||
idata = convert_to_inference_data(obj; group, kwargs...) | ||
dataset = getproperty(idata, group) | ||
return dataset | ||
end | ||
convert_to_dataset(data::Dataset; kwargs...) = data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
""" | ||
convert(::Type{InferenceData}, obj) | ||
Convert `obj` to an `InferenceData`. | ||
`obj` can be any type for which [`convert_to_inference_data`](@ref) is defined. | ||
""" | ||
Base.convert(::Type{InferenceData}, obj) = convert_to_inference_data(obj) | ||
Base.convert(::Type{InferenceData}, obj::InferenceData) = obj | ||
Base.convert(::Type{NamedTuple}, data::InferenceData) = NamedTuple(data) | ||
NamedTuple(data::InferenceData) = parent(data) | ||
|
||
""" | ||
convert_to_inference_data(obj; group, kwargs...) -> InferenceData | ||
Convert a supported object to an [`InferenceData`](@ref) object. | ||
If `obj` converts to a single dataset, `group` specifies the name under which that dataset | ||
is stored in the resulting `InferenceData`. | ||
See [`convert_to_dataset`](@ref). | ||
# Arguments | ||
- `obj` can be many objects. Basic supported types are: | ||
+ [`InferenceData`](@ref): return unchanged | ||
+ [`Dataset`](@ref)/`DimensionalData.AbstractDimStack`: add to `InferenceData` as the only | ||
group | ||
+ `NamedTuple`/`AbstractDict`: create a `Dataset` as the only group | ||
+ `AbstractArray{<:Real}`: create a `Dataset` as the only group, given an arbitrary | ||
name, if the name is not set | ||
More specific types may be documented separately. | ||
# Keywords | ||
- `group::Symbol = :posterior`: If `obj` converts to a single dataset, assign the resulting | ||
dataset to this group. | ||
- `dims`: a collection mapping variable names to collections of objects containing | ||
dimension names. Acceptable such objects are: | ||
+ `Symbol`: dimension name | ||
+ `Type{<:DimensionalData.Dimension}`: dimension type | ||
+ `DimensionalData.Dimension`: dimension, potentially with indices | ||
+ `Nothing`: no dimension name provided, dimension name is automatically generated | ||
- `coords`: a collection indexable by dimension name specifying the indices of the given | ||
dimension. If indices for a dimension in `dims` are provided, they are used even if | ||
the dimension contains its own indices. If a dimension is missing, its indices are | ||
automatically generated. | ||
- `kwargs`: remaining keywords forwarded to converter functions | ||
""" | ||
function convert_to_inference_data end | ||
|
||
convert_to_inference_data(data::InferenceData; kwargs...) = data | ||
function convert_to_inference_data(stack::DimensionalData.AbstractDimStack; kwargs...) | ||
return convert_to_inference_data(Dataset(stack); kwargs...) | ||
end | ||
function convert_to_inference_data(data::Dataset; group=:posterior, kwargs...) | ||
return convert_to_inference_data(InferenceData(; group => data); kwargs...) | ||
end | ||
function convert_to_inference_data(data::AbstractDict{Symbol}; kwargs...) | ||
return convert_to_inference_data(NamedTuple(data); kwargs...) | ||
end | ||
function convert_to_inference_data(var_data::AbstractArray{<:Real}; kwargs...) | ||
data = (; default_var_name(var_data) => var_data) | ||
return convert_to_inference_data(data; kwargs...) | ||
end | ||
function convert_to_inference_data( | ||
data::NamedTuple{<:Any,<:Tuple{Vararg{AbstractArray{<:Real}}}}; | ||
group=:posterior, | ||
kwargs..., | ||
) | ||
ds = namedtuple_to_dataset(data; kwargs...) | ||
return convert_to_inference_data(ds; group) | ||
end | ||
|
||
""" | ||
default_var_name(data) -> Symbol | ||
Return the default name for the variable whose values are stored in `data`. | ||
""" | ||
default_var_name(data) = :x | ||
function default_var_name(data::DimensionalData.AbstractDimArray) | ||
name = DimensionalData.name(data) | ||
name isa Symbol && return name | ||
name isa AbstractString && !isempty(name) && return Symbol(name) | ||
return default_var_name(parent(data)) | ||
end |
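The `convert_to_inference_data` methods above form a chain: each method converts its input one step closer to an `InferenceData` and forwards the call. The sketch below reproduces only that dispatch pattern, using toy stand-in types so that it runs without DimensionalData. The names `ToyDataset`, `ToyInferenceData`, and `to_idata` are hypothetical and are not part of InferenceObjects.

```julia
# Toy stand-ins; the real types wrap DimensionalData objects.
struct ToyDataset
    vars::NamedTuple
end
struct ToyInferenceData
    groups::NamedTuple
end

# Already the target type: return unchanged.
to_idata(data::ToyInferenceData; kwargs...) = data
# A dataset becomes the only group, stored under the name given by `group`.
function to_idata(data::ToyDataset; group::Symbol=:posterior, kwargs...)
    return ToyInferenceData(NamedTuple{(group,)}((data,)))
end
# A NamedTuple of arrays is first wrapped in a dataset, then forwarded.
to_idata(data::NamedTuple; kwargs...) = to_idata(ToyDataset(data); kwargs...)
# A dictionary with Symbol keys is converted to a NamedTuple, then forwarded.
to_idata(data::AbstractDict{Symbol}; kwargs...) = to_idata((; data...); kwargs...)
# A bare array gets the default variable name `:x`, then is forwarded.
to_idata(data::AbstractArray{<:Real}; kwargs...) = to_idata((; x=data); kwargs...)

idata = to_idata(randn(2, 10); group=:prior)
from_dict = to_idata(Dict(:mu => [1.0, 2.0]))
```

Because every method funnels into the dataset method, the `group` keyword only needs to be interpreted in one place.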
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
""" | ||
Dataset{L} <: DimensionalData.AbstractDimStack{L} | ||
Container of dimensional arrays sharing some dimensions. | ||
This type is a | ||
[`DimensionalData.AbstractDimStack`](https://rafaqz.github.io/DimensionalData.jl/stable/api/#DimensionalData.AbstractDimStack) | ||
that implements the same interface as `DimensionalData.DimStack` and has identical usage. | ||
When a `Dataset` is passed to Python, it is converted to an `xarray.Dataset` without copying | ||
the data. That is, the Python object shares the same memory as the Julia object. However, | ||
if an `xarray.Dataset` is passed to Julia, its data must be copied. | ||
# Constructors | ||
Dataset(data::DimensionalData.AbstractDimArray...) | ||
Dataset(data::Tuple{Vararg{<:DimensionalData.AbstractDimArray}}) | ||
Dataset(data::NamedTuple{Keys,Vararg{<:DimensionalData.AbstractDimArray}}) | ||
Dataset( | ||
data::NamedTuple, | ||
dims::Tuple{Vararg{DimensionalData.Dimension}}; | ||
metadata=DimensionalData.NoMetadata(), | ||
) | ||
In most cases, use [`convert_to_dataset`](@ref) to create a `Dataset` instead of directly | ||
using a constructor. | ||
""" | ||
struct Dataset{L,D<:DimensionalData.AbstractDimStack{L}} <: | ||
DimensionalData.AbstractDimStack{L} | ||
data::D | ||
end | ||
|
||
Dataset(args...; kwargs...) = Dataset(DimensionalData.DimStack(args...; kwargs...)) | ||
Dataset(data::Dataset) = data | ||
|
||
Base.parent(data::Dataset) = getfield(data, :data) | ||
|
||
Base.propertynames(data::Dataset) = keys(data) | ||
|
||
Base.getproperty(data::Dataset, k::Symbol) = getindex(data, k) | ||
|
||
function setattribute!(data::Dataset, k::Symbol, value) | ||
setindex!(DimensionalData.metadata(data), value, k) | ||
return value | ||
end | ||
@deprecate setattribute!(data::Dataset, k::AbstractString, value) setattribute!( | ||
data, Symbol(k), value | ||
) false | ||
|
||
""" | ||
namedtuple_to_dataset(data; kwargs...) -> Dataset | ||
Convert a `NamedTuple` that maps variable names to arrays into a [`Dataset`](@ref). | ||
# Keywords | ||
- `attrs`: a Symbol-indexable collection of metadata to attach to the dataset, in addition | ||
to defaults. Values should be JSON serializable. | ||
- `library::Union{String,Module}`: library used for performing inference. Will be attached | ||
to the `attrs` metadata. | ||
- `dims`: a collection mapping variable names to collections of objects containing dimension | ||
names. Acceptable such objects are: | ||
+ `Symbol`: dimension name | ||
+ `Type{<:DimensionalData.Dimension}`: dimension type | ||
+ `DimensionalData.Dimension`: dimension, potentially with indices | ||
+ `Nothing`: no dimension name provided, dimension name is automatically generated | ||
- `coords`: a collection indexable by dimension name specifying the indices of the given | ||
dimension. If indices for a dimension in `dims` are provided, they are used even if | ||
the dimension contains its own indices. If a dimension is missing, its indices are | ||
automatically generated. | ||
""" | ||
function namedtuple_to_dataset end | ||
function namedtuple_to_dataset( | ||
data; attrs=(;), library=nothing, dims=(;), coords=(;), default_dims=DEFAULT_SAMPLE_DIMS | ||
) | ||
dim_arrays = map(keys(data)) do var_name | ||
var_data = data[var_name] | ||
var_dims = get(dims, var_name, ()) | ||
return array_to_dimarray(var_data, var_name; dims=var_dims, coords, default_dims) | ||
end | ||
attributes = merge(default_attributes(library), attrs) | ||
metadata = OrderedDict{Symbol,Any}(pairs(attributes)) | ||
return Dataset(dim_arrays...; metadata) | ||
end | ||
|
||
""" | ||
default_attributes(library=nothing) -> NamedTuple | ||
Generate default attributes metadata for a dataset generated by inference library `library`. | ||
`library` may be a `String` or a `Module`. | ||
""" | ||
function default_attributes(library=nothing) | ||
return ( | ||
created_at=Dates.format(Dates.now(), Dates.ISODateTimeFormat), | ||
library_attributes(library)..., | ||
) | ||
end | ||
|
||
library_attributes(library) = (; inference_library=string(library)) | ||
library_attributes(::Nothing) = (;) | ||
function library_attributes(library::Module) | ||
return ( | ||
inference_library=string(library), | ||
inference_library_version=string(package_version(library)), | ||
) | ||
end | ||
|
||
# DimensionalData interop | ||
|
||
for f in [:data, :dims, :refdims, :metadata, :layerdims, :layermetadata] | ||
@eval begin | ||
DimensionalData.$(f)(ds::Dataset) = DimensionalData.$(f)(parent(ds)) | ||
end | ||
end | ||
|
||
# Warning: this is not an API function and probably should be implemented abstractly upstream | ||
DimensionalData.show_after(io, mime, ::Dataset) = nothing | ||
|
||
attributes(data::DimensionalData.AbstractDimStack) = DimensionalData.metadata(data) | ||
|
||
Base.convert(T::Type{<:DimensionalData.DimStack}, data::Dataset) = convert(T, parent(data)) | ||
|
||
function DimensionalData.rebuild(data::Dataset; kwargs...) | ||
return Dataset(DimensionalData.rebuild(parent(data); kwargs...)) | ||
end |
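In `namedtuple_to_dataset` above, the default attributes are merged with the user-supplied `attrs`, with `attrs` passed last. The sketch below illustrates that merge order in plain Julia. It is a simplified stand-in, not the code from the commit: `lib_attrs` and `default_attrs` are hypothetical names, and the `Module` method is omitted because the `package_version` helper it relies on is not shown in this part of the diff.

```julia
using Dates

# Stand-ins for `library_attributes`: no library means no extra attributes.
lib_attrs(::Nothing) = (;)
lib_attrs(library) = (; inference_library=string(library))

# Stand-in for `default_attributes`: a timestamp plus the library attributes.
function default_attrs(library=nothing)
    return (;
        created_at=Dates.format(Dates.now(), Dates.ISODateTimeFormat),
        lib_attrs(library)...,
    )
end

# User-supplied attributes are merged last, so they override the defaults.
attrs = (; inference_library="MyOtherSampler", note="demo")
merged = merge(default_attrs("MySampler"), attrs)
println(keys(merged))
```

With `merge` on `NamedTuple`s, fields from the right-hand argument take precedence, so a caller can override `inference_library` or `created_at` by passing them in `attrs`.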
f0e4020
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/66182
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via: