Move Dataset and InferenceData into module (#213)
* Create InferenceObjects module

* Reorganize files into submodule

* Import module

* Don't use arviz_version()

* Remove reference to doc_str

* Collect all Python interop in xarray.jl

* Move base conversion functions to InferenceObjects

* Move utils to InferenceObjects

* Remove unnecessary method

* Add missing imports

* Explicitly list schema group names

* Update imports

* Add back missing import

* Split dataset tests

* Split InferenceData tests

* Load module directly

* Separate out dataset conversion tests

* Separate InferenceData conversion tests

* Remove reference to ArviZ

* Test package_version

* Source helpers for now

* Move default_var_name

* Add back removed conversion methods

* Move PyCall conversion tests

* Remove requirement to have arviz_version attribute

* Don't test for arviz_version

* Test an actual package

* Refer to correct function

* Don't load non-existent file

* Run formatter

* Increment version number

* Run formatter

* Move merge to InferenceObjects

* Reorganize docs

* Write example that doesn't require example data

* Collect dimension-related code

* Rename function

* Rearrange tests

* Reorganize code

* Set default coordinates to axis of underlying array

* Use broadcasting and add type annotation

* Add dimension tests

* Add missing variable declarations

* Don't import internal functions

* Fix tests

* Add missing tests

* Point to LookupArrays

* Add OffsetArrays as a test dependency

* Copy test helper functions

* Move from_namedtuple to InferenceObjects

* Move rekey to InferenceObjects

* Move flatten and rest of rekey to InferenceObjects

* Copy helper functions

* Run formatter

* Revert "Move flatten and rest of rekey to InferenceObjects"

This reverts commit 9ca8d5e.

* Revert "Move rekey to InferenceObjects"

This reverts commit 6e736e4.

* Move utilities to InferenceObjects

* Document, improve, and test utilities

* Evaluate conditional statically
sethaxen authored Aug 13, 2022
1 parent 4070390 commit f0e4020
Showing 32 changed files with 1,272 additions and 876 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "ArviZ"
uuid = "131c737c-5715-5e2e-ad31-c244f01c1dc7"
authors = ["Seth Axen <[email protected]>"]
-version = "0.6.0"
+version = "0.6.1"

[deps]
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
6 changes: 4 additions & 2 deletions docs/make.jl
@@ -47,8 +47,10 @@ makedocs(;
"Stats" => "api/stats.md",
"Diagnostics" => "api/diagnostics.md",
"Data" => "api/data.md",
-"InferenceData" => "api/inference_data.md",
-"Dataset" => "api/dataset.md",
+"InferenceObjects" => [
+    "InferenceData" => "api/inference_data.md",
+    "Dataset" => "api/dataset.md",
+],
],
],
checkdocs=:exports,
2 changes: 0 additions & 2 deletions docs/src/api/data.md
@@ -15,7 +15,6 @@ from_samplechains
## IO / General conversion

```@docs
-convert_to_inference_data
from_dict
from_json
from_namedtuple
@@ -28,7 +27,6 @@ to_netcdf
```@docs
concat
extract_dataset
-merge
```

## Example data
12 changes: 12 additions & 0 deletions docs/src/api/inference_data.md
@@ -28,3 +28,15 @@ Base.setindex

`InferenceData` also implements the same iteration interface as its underlying `NamedTuple`.
That is, iterating over an `InferenceData` iterates over its groups.

+## General conversion
+
+```@docs
+convert_to_inference_data
+```
+
+## General functions
+
+```@docs
+merge
+```
13 changes: 9 additions & 4 deletions src/ArviZ.jl
@@ -2,7 +2,6 @@ __precompile__()
module ArviZ

using Base: @__doc__
-using Dates
using Requires
using REPL
using DataFrames
@@ -35,6 +34,15 @@ import StatsBase: summarystats
import Markdown: @doc_str
import PyCall: PyObject

+include("InferenceObjects/InferenceObjects.jl")
+
+using .InferenceObjects
+import .InferenceObjects: convert_to_inference_data, namedtuple_of_arrays
+# internal functions temporarily used/extended here
+using .InferenceObjects:
+    attributes, flatten, groupnames, groups, hasgroup, rekey, setattribute!
+import .InferenceObjects: namedtuple_of_arrays

# Exports

## Plots
@@ -129,14 +137,11 @@ end
include("utils.jl")
include("rcparams.jl")
include("xarray.jl")
-include("dataset.jl")
-include("inference_data.jl")
include("data.jl")
include("diagnostics.jl")
include("plots.jl")
include("bokeh.jl")
include("stats.jl")
include("stats_utils.jl")
-include("namedtuple.jl")

end # module
40 changes: 40 additions & 0 deletions src/InferenceObjects/InferenceObjects.jl
@@ -0,0 +1,40 @@
module InferenceObjects

using Dates: Dates
using DimensionalData: DimensionalData, Dimensions, LookupArrays
using OrderedCollections: OrderedDict

# groups that are officially listed in the schema
const SCHEMA_GROUPS = (
    :posterior,
    :posterior_predictive,
    :predictions,
    :log_likelihood,
    :sample_stats,
    :prior,
    :prior_predictive,
    :sample_stats_prior,
    :observed_data,
    :constant_data,
    :predictions_constant_data,
    :warmup_posterior,
    :warmup_posterior_predictive,
    :warmup_predictions,
    :warmup_sample_stats,
    :warmup_log_likelihood,
)
const SCHEMA_GROUPS_DICT = Dict(n => i for (i, n) in enumerate(SCHEMA_GROUPS))
const DEFAULT_SAMPLE_DIMS = Dimensions.key2dim((:chain, :draw))

export Dataset, InferenceData
export convert_to_dataset, convert_to_inference_data, from_namedtuple, namedtuple_to_dataset

include("utils.jl")
include("dimensions.jl")
include("dataset.jl")
include("inference_data.jl")
include("convert_dataset.jl")
include("convert_inference_data.jl")
include("from_namedtuple.jl")

end # module
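
Reviewer note: `SCHEMA_GROUPS_DICT` maps each schema group name to its position in `SCHEMA_GROUPS`. The code that consumes it is not part of the diff shown here, so the following is only a sketch of one plausible use, ordering group names by schema position. The names `schema_lookup` and `schema_position` are hypothetical, and the tuple is abbreviated.

```julia
# Abbreviated copy of the schema tuple; the full list is in the file above.
schema_groups = (:posterior, :posterior_predictive, :prior, :observed_data)
schema_lookup = Dict(n => i for (i, n) in enumerate(schema_groups))

# Hypothetical helper: groups missing from the schema sort after all schema groups.
schema_position(name::Symbol, lookup) = get(lookup, name, length(lookup) + 1)

groups = [:my_custom_group, :prior, :posterior]
ordered = sort(groups; by=name -> schema_position(name, schema_lookup))
@assert ordered == [:posterior, :prior, :my_custom_group]
```

Building the lookup once makes each position query a dictionary access instead of a scan over the tuple.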
19 changes: 19 additions & 0 deletions src/InferenceObjects/convert_dataset.jl
@@ -0,0 +1,19 @@
Base.convert(::Type{Dataset}, obj) = convert_to_dataset(obj)
Base.convert(::Type{Dataset}, obj::Dataset) = obj

"""
    convert_to_dataset(obj; group = :posterior, kwargs...) -> Dataset

Convert a supported object to a `Dataset`.

In most cases, this function calls [`convert_to_inference_data`](@ref) and returns the
corresponding `group`.
"""
function convert_to_dataset end

function convert_to_dataset(obj; group::Symbol=:posterior, kwargs...)
    idata = convert_to_inference_data(obj; group, kwargs...)
    dataset = getproperty(idata, group)
    return dataset
end
convert_to_dataset(data::Dataset; kwargs...) = data
90 changes: 90 additions & 0 deletions src/InferenceObjects/convert_inference_data.jl
@@ -0,0 +1,90 @@
"""
    convert(::Type{InferenceData}, obj)

Convert `obj` to an `InferenceData`.

`obj` can be any type for which [`convert_to_inference_data`](@ref) is defined.
"""
Base.convert(::Type{InferenceData}, obj) = convert_to_inference_data(obj)
Base.convert(::Type{InferenceData}, obj::InferenceData) = obj
Base.convert(::Type{NamedTuple}, data::InferenceData) = NamedTuple(data)
NamedTuple(data::InferenceData) = parent(data)

"""
    convert_to_inference_data(obj; group, kwargs...) -> InferenceData

Convert a supported object to an [`InferenceData`](@ref) object.

If `obj` converts to a single dataset, `group` specifies which dataset in the resulting
`InferenceData` that is.

See [`convert_to_dataset`](@ref).

# Arguments

  - `obj` can be many objects. Basic supported types are:

      + [`InferenceData`](@ref): return unchanged
      + [`Dataset`](@ref)/`DimensionalData.AbstractDimStack`: add to `InferenceData` as the only
        group
      + `NamedTuple`/`AbstractDict`: create a `Dataset` as the only group
      + `AbstractArray{<:Real}`: create a `Dataset` as the only group, given an arbitrary
        name, if the name is not set

More specific types may be documented separately.

# Keywords

  - `group::Symbol = :posterior`: If `obj` converts to a single dataset, assign the resulting
    dataset to this group.
  - `dims`: a collection mapping variable names to collections of objects containing
    dimension names. Acceptable such objects are:

      + `Symbol`: dimension name
      + `Type{<:DimensionalData.Dimension}`: dimension type
      + `DimensionalData.Dimension`: dimension, potentially with indices
      + `Nothing`: no dimension name provided, dimension name is automatically generated
  - `coords`: a collection indexable by dimension name specifying the indices of the given
    dimension. If indices for a dimension in `dims` are provided, they are used even if
    the dimension contains its own indices. If a dimension is missing, its indices are
    automatically generated.
  - `kwargs`: remaining keywords forwarded to converter functions
"""
function convert_to_inference_data end

convert_to_inference_data(data::InferenceData; kwargs...) = data
function convert_to_inference_data(stack::DimensionalData.AbstractDimStack; kwargs...)
    return convert_to_inference_data(Dataset(stack); kwargs...)
end
function convert_to_inference_data(data::Dataset; group=:posterior, kwargs...)
    return convert_to_inference_data(InferenceData(; group => data); kwargs...)
end
function convert_to_inference_data(data::AbstractDict{Symbol}; kwargs...)
    return convert_to_inference_data(NamedTuple(data); kwargs...)
end
function convert_to_inference_data(var_data::AbstractArray{<:Real}; kwargs...)
    data = (; default_var_name(var_data) => var_data)
    return convert_to_inference_data(data; kwargs...)
end
function convert_to_inference_data(
    data::NamedTuple{<:Any,<:Tuple{Vararg{AbstractArray{<:Real}}}};
    group=:posterior,
    kwargs...,
)
    ds = namedtuple_to_dataset(data; kwargs...)
    return convert_to_inference_data(ds; group)
end

"""
    default_var_name(data) -> Symbol

Return the default name for the variable whose values are stored in `data`.
"""
default_var_name(data) = :x
function default_var_name(data::DimensionalData.AbstractDimArray)
    name = DimensionalData.name(data)
    name isa Symbol && return name
    name isa AbstractString && !isempty(name) && return Symbol(name)
    return default_var_name(parent(data))
end
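
Reviewer note: the `convert_to_inference_data` methods above form a dispatch chain in which each input type is rewritten into the next, more canonical form (array to `NamedTuple`, `AbstractDict` to `NamedTuple`, `NamedTuple` to `Dataset`, `Dataset` to `InferenceData`). The sketch below reproduces only the first steps of that pattern using Base types. `normalize` is a hypothetical stand-in, not a function from this commit, and the fixed name `:x` mirrors the fallback of `default_var_name`.

```julia
# Toy stand-in for the converter chain; `normalize` is a hypothetical name.
normalize(data::NamedTuple) = data
# A Symbol-keyed dictionary is splatted into a NamedTuple, then re-dispatched.
normalize(data::AbstractDict{Symbol}) = normalize((; data...))
# A bare numeric array gets the default variable name `:x`, then re-dispatched.
normalize(data::AbstractArray{<:Real}) = normalize((; :x => data))

@assert normalize(Dict(:mu => [0.1, 0.2])) == (mu=[0.1, 0.2],)
@assert normalize([1, 2, 3]) == (x=[1, 2, 3],)
```

Funnelling every input through one canonical method keeps keyword handling (`group`, `dims`, `coords`) in a single place.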
128 changes: 128 additions & 0 deletions src/InferenceObjects/dataset.jl
@@ -0,0 +1,128 @@
"""
    Dataset{L} <: DimensionalData.AbstractDimStack{L}

Container of dimensional arrays sharing some dimensions.

This type is a
[`DimensionalData.AbstractDimStack`](https://rafaqz.github.io/DimensionalData.jl/stable/api/#DimensionalData.AbstractDimStack)
that implements the same interface as `DimensionalData.DimStack` and has identical usage.

When a `Dataset` is passed to Python, it is converted to an `xarray.Dataset` without copying
the data. That is, the Python object shares the same memory as the Julia object. However,
if an `xarray.Dataset` is passed to Julia, its data must be copied.

# Constructors

    Dataset(data::DimensionalData.AbstractDimArray...)
    Dataset(data::Tuple{Vararg{<:DimensionalData.AbstractDimArray}})
    Dataset(data::NamedTuple{Keys,Vararg{<:DimensionalData.AbstractDimArray}})
    Dataset(
        data::NamedTuple,
        dims::Tuple{Vararg{DimensionalData.Dimension}};
        metadata=DimensionalData.NoMetadata(),
    )

In most cases, use [`convert_to_dataset`](@ref) to create a `Dataset` instead of directly
using a constructor.
"""
struct Dataset{L,D<:DimensionalData.AbstractDimStack{L}} <:
       DimensionalData.AbstractDimStack{L}
    data::D
end

Dataset(args...; kwargs...) = Dataset(DimensionalData.DimStack(args...; kwargs...))
Dataset(data::Dataset) = data

Base.parent(data::Dataset) = getfield(data, :data)

Base.propertynames(data::Dataset) = keys(data)

Base.getproperty(data::Dataset, k::Symbol) = getindex(data, k)

function setattribute!(data::Dataset, k::Symbol, value)
    setindex!(DimensionalData.metadata(data), value, k)
    return value
end
@deprecate setattribute!(data::Dataset, k::AbstractString, value) setattribute!(
    data, Symbol(k), value
) false

"""
    namedtuple_to_dataset(data; kwargs...) -> Dataset

Convert a `NamedTuple` mapping variable names to arrays to a [`Dataset`](@ref).

# Keywords

  - `attrs`: a Symbol-indexable collection of metadata to attach to the dataset, in addition
    to defaults. Values should be JSON serializable.
  - `library::Union{String,Module}`: library used for performing inference. Will be attached
    to the `attrs` metadata.
  - `dims`: a collection mapping variable names to collections of objects containing dimension
    names. Acceptable such objects are:

      + `Symbol`: dimension name
      + `Type{<:DimensionalData.Dimension}`: dimension type
      + `DimensionalData.Dimension`: dimension, potentially with indices
      + `Nothing`: no dimension name provided, dimension name is automatically generated
  - `coords`: a collection indexable by dimension name specifying the indices of the given
    dimension. If indices for a dimension in `dims` are provided, they are used even if
    the dimension contains its own indices. If a dimension is missing, its indices are
    automatically generated.
"""
function namedtuple_to_dataset end
function namedtuple_to_dataset(
    data; attrs=(;), library=nothing, dims=(;), coords=(;), default_dims=DEFAULT_SAMPLE_DIMS
)
    dim_arrays = map(keys(data)) do var_name
        var_data = data[var_name]
        var_dims = get(dims, var_name, ())
        return array_to_dimarray(var_data, var_name; dims=var_dims, coords, default_dims)
    end
    attributes = merge(default_attributes(library), attrs)
    metadata = OrderedDict{Symbol,Any}(pairs(attributes))
    return Dataset(dim_arrays...; metadata)
end

"""
    default_attributes(library=nothing) -> NamedTuple

Generate default attributes metadata for a dataset generated by inference library `library`.

`library` may be a `String` or a `Module`.
"""
function default_attributes(library=nothing)
    return (
        created_at=Dates.format(Dates.now(), Dates.ISODateTimeFormat),
        library_attributes(library)...,
    )
end

library_attributes(library) = (; inference_library=string(library))
library_attributes(::Nothing) = (;)
function library_attributes(library::Module)
    return (
        inference_library=string(library),
        inference_library_version=string(package_version(library)),
    )
end

# DimensionalData interop

for f in [:data, :dims, :refdims, :metadata, :layerdims, :layermetadata]
    @eval begin
        DimensionalData.$(f)(ds::Dataset) = DimensionalData.$(f)(parent(ds))
    end
end

# Warning: this is not an API function and probably should be implemented abstractly upstream
DimensionalData.show_after(io, mime, ::Dataset) = nothing

attributes(data::DimensionalData.AbstractDimStack) = DimensionalData.metadata(data)

Base.convert(T::Type{<:DimensionalData.DimStack}, data::Dataset) = convert(T, parent(data))

function DimensionalData.rebuild(data::Dataset; kwargs...)
    return Dataset(DimensionalData.rebuild(parent(data); kwargs...))
end
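
Reviewer note: in `namedtuple_to_dataset`, the call `merge(default_attributes(library), attrs)` puts the user-supplied `attrs` last, so a user value replaces a default with the same key while the key order of the defaults is kept. The following is a simplified sketch of that behavior using only Base and the `Dates` standard library. The `Module` method is left out because `package_version` is not shown in this diff, and `"MyPPL"` is a made-up library name.

```julia
using Dates

# Simplified copies of the helpers above, without the `Module` method.
library_attributes(::Nothing) = (;)
library_attributes(library) = (; inference_library=string(library))

function default_attributes(library=nothing)
    created_at = Dates.format(Dates.now(), Dates.ISODateTimeFormat)
    return (; created_at, library_attributes(library)...)
end

# User attributes are merged last, so they take precedence over defaults.
attrs = (; inference_library="OverriddenName", note="demo")
merged = merge(default_attributes("MyPPL"), attrs)

@assert merged.inference_library == "OverriddenName"
@assert keys(merged) == (:created_at, :inference_library, :note)
```

With no library given, only the `created_at` timestamp is generated.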

2 comments on commit f0e4020

@sethaxen

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/66182

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.6.1 -m "<description of version>" f0e4020d7bf2184b8ee4ac48f39a9a2d4f988706
git push origin v0.6.1
