merge
Alexander-Barth committed May 3, 2024
2 parents 004e596 + 526e549 commit c15ba20
Showing 9 changed files with 81 additions and 23 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/Documenter.yml
@@ -10,7 +10,7 @@ jobs:
name: Documentation
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- run: |
sudo apt-get install python3-matplotlib
- uses: julia-actions/julia-buildpkg@latest
7 changes: 4 additions & 3 deletions .github/workflows/ci.yml
@@ -23,8 +23,8 @@ jobs:
arch:
- x64
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
- uses: actions/checkout@v4
- uses: julia-actions/setup-julia@v2
with:
version: ${{ matrix.version }}
arch: ${{ matrix.arch }}
@@ -41,6 +41,7 @@ jobs:
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
- uses: codecov/codecov-action@v4
with:
file: lcov.info
token: ${{ secrets.CODECOV_TOKEN }}
6 changes: 2 additions & 4 deletions README.md
@@ -24,7 +24,7 @@ function copernicus_marine_catalog(product_id,dataset_id,
stac_url = "https://stac.marine.copernicus.eu/metadata/catalog.stac.json",
asset = "timeChunked")

cat = STAC.Catalog(stac_url);
cat = STAC.Catalog(stac_url);
item_canditates = filter(startswith(dataset_id),keys(cat[product_id].items))
# use last version per default
dataset_version_id = sort(item_canditates)[end]
@@ -36,9 +36,7 @@ product_id = "MEDSEA_MULTIYEAR_PHY_006_004"
dataset_id = "med-cmcc-ssh-rean-d"

url = copernicus_marine_catalog(product_id,dataset_id)
# surprisingly requesting missing data chunks results in the HTTP error
# code 403 (permission denied) rather than 404 (not found) for the CMEMS server.
ds = ZarrDataset(url,_omitcode=[404,403]);
ds = ZarrDataset(url);

# longitude, latitude and time are the coordinate variables defined in the
# zarr dataset
47 changes: 45 additions & 2 deletions docs/src/index.md
@@ -1,14 +1,57 @@

## ZarrDatasets


See the [documentation of JuliaGeo/CommonDataModel.jl](https://juliageo.org/CommonDataModel.jl/stable/) for the full documentation of the API. As a quick reference, here is an example of how to create and read a Zarr file store.

### Create a Zarr file store

The following example creates a Zarr file store in the directory `"/tmp/test-zarr"`:

```julia
using ZarrDatasets

# sample data
data = [i+j for i = 1:3, j = 1:5]

directoryname = "/tmp/test-zarr"
mkdir(directoryname)

ds = ZarrDataset(directoryname,"c")
defDim(ds,"lon",size(data,1))
defDim(ds,"lat",size(data,2))
zv = defVar(ds,"varname",Int64,("lon","lat"))
zv[:,:] = data
zv.attrib["units"] = "m"
close(ds)
```

### Loading a Zarr file store

The data and units can be loaded by indexing the data set structure `ds`.

```julia
using ZarrDatasets
directoryname = "/tmp/test-zarr"
ds = ZarrDataset(directoryname)
data = ds["varname"][:,:]
data_units = ds["varname"].attrib["units"]
```
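
As an illustrative sketch that is not part of this commit, the create and load steps above can be combined with the do-block form of `ZarrDataset` described in its docstring, which closes the dataset implicitly. The temporary directory from `tempname()` is an assumption made so the example does not depend on `/tmp/test-zarr` already existing.

```julia
using ZarrDatasets

# Assumption: use a fresh temporary directory instead of "/tmp/test-zarr"
directoryname = tempname()
mkdir(directoryname)

# Create the store as in the example above
ds = ZarrDataset(directoryname,"c")
defDim(ds,"lon",3)
defDim(ds,"lat",5)
zv = defVar(ds,"varname",Int64,("lon","lat"))
zv[:,:] = [i+j for i = 1:3, j = 1:5]
zv.attrib["units"] = "m"
close(ds)

# Read it back; the value of the block is returned and `ds` is closed afterwards
data, data_units = ZarrDataset(directoryname) do ds
    v = ds["varname"]
    # load everything that is needed before the block ends
    (v[:,:], v.attrib["units"])
end
```

Loading the values inside the block matters because the dataset is no longer open once the block returns.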



```@autodocs
Modules = [ZarrDatasets]
```





### Differences between Zarr and NetCDF files

* All metadata (in particular attributes) is stored in JSON files for the Zarr format with the following implications:
* JSON does not distinguish between integers and real numbers. They are all considered as generic numbers. Whole numbers are loaded as `Int64` and decimal numbers `Float64`. It is not possible to store the number `1.0` as a real number.
* JSON does not distinguish between integers and real numbers. They are all considered as generic numbers. Whole numbers are loaded as `Int64` and real numbers `Float64`. It is not possible to store the number `1.0` as a real number.
* The order of keys in a JSON document is undefined. It is therefore not possible to have a consistent ordering of the attributes or variables.
* The JSON standard does not allow NaN, +Inf, -Inf (https://github.com/capnproto/capnproto/issues/261).
* The JSON standard does not allow the values NaN, +Inf, -Inf which is problematic for attributes ([zarr-python #412](https://github.com/zarr-developers/zarr-python/issues/412), [zarr-specs #81](https://github.com/zarr-developers/zarr-specs/issues/81)). However, there is a special case for the fill-value to handle NaN, +Inf and -Inf.
* All dimensions must be associated to Zarr variables.
25 changes: 17 additions & 8 deletions src/dataset.jl
@@ -61,16 +61,21 @@ CDM.maskingvalue(ds::ZarrDataset) = ds.maskingvalue

"""
ds = ZarrDataset(url::AbstractString,mode = "r";
_omitcode = 404,
_omitcode = [404,403],
maskingvalue = missing)
ZarrDataset(f::Function,url::AbstractString,mode = "r";
maskingvalue = missing)
Open the zarr dataset at the url or path `url`. Only the read-mode is
currently supported. `ds` supports the API of the
Open the zarr dataset at the url or path `url`. The mode can only be `"r"` (read-only)
or `"c"` (create). `ds` supports the API of the
[JuliaGeo/CommonDataModel.jl](https://github.com/JuliaGeo/CommonDataModel.jl).
The experimental `_omitcode` allows to work-around servers that return
HTTP error different than 404 for missing chunks.
The experimental `_omitcode` defines which HTTP error codes are interpreted
as missing chunks. For compatibility with Python's Zarr, the HTTP error 403
(permission denied) is also treated as a missing chunk, in addition to 404 (not
found).
The parameter `maskingvalue` allows to define which special value should be used
as replacement for fill values. The default is `missing`.
Example:
@@ -101,11 +106,10 @@ zos1 = ZarrDataset(url) do ds
ds["zos"][:,:,end,1]
end # implicit call to close(ds)
```
"""
function ZarrDataset(url::AbstractString,mode = "r";
parentdataset = nothing,
_omitcode = 404,
_omitcode = [404,403],
maskingvalue = missing,
attrib = Dict(),
)
@@ -134,7 +138,7 @@ function ZarrDataset(url::AbstractString,mode = "r";
end
elseif mode == "c"
store = Zarr.DirectoryStore(url)
zg = zgroup(store, "",attrs = Dict(attrib))
zg = zgroup(store, "",attrs = Dict{String,Any}(attrib))
iswritable = true
end
ZarrDataset(parentdataset,zg,dimensions,iswritable,maskingvalue)
@@ -153,3 +157,8 @@ function ZarrDataset(f::Function,args...; kwargs...)
close(ds)
end
end

export ZarrDataset
export defDim
export defVar
#export defGroup
3 changes: 2 additions & 1 deletion src/variable.jl
@@ -52,7 +52,7 @@ function CDM.defVar(ds::ZarrDataset,name::SymbolOrString,vtype::DataType,dimensi
fillvalue = get(attrib,"_FillValue",nothing)
end

_attrib = Dict(attrib)
_attrib = Dict{String,Any}(attrib)
_attrib["_ARRAY_DIMENSIONS"] = reverse(dimensionnames)

_size = ntuple(length(dimensionnames)) do i
@@ -62,6 +62,7 @@ function CDM.defVar(ds::ZarrDataset,name::SymbolOrString,vtype::DataType,dimensi
if isnothing(chunksizes)
chunksizes = _size
end

zarray = zcreate(
vtype, ds.zgroup, name, _size...;
chunks = chunksizes,
1 change: 1 addition & 0 deletions test/Project.toml
@@ -8,5 +8,6 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[compat]
Aqua = "0.8"
CommonDataModel = "0.3.6"
NCDatasets = "0.14"
julia = "1"
3 changes: 3 additions & 0 deletions test/test_aqua.jl
@@ -1,4 +1,7 @@
using Aqua
using ZarrDatasets


Aqua.test_ambiguities(ZarrDatasets)
# some internal ambiguities in DiskArray 0.3 probably fixed in 0.4
Aqua.test_all(ZarrDatasets, ambiguities = false)
10 changes: 6 additions & 4 deletions test/test_write.jl
@@ -1,3 +1,4 @@
using Test
using ZarrDatasets
using ZarrDatasets:
defDim,
@@ -7,13 +8,14 @@ data = rand(Int32,3,5)

fname = tempname()
mkdir(fname)
gattrib = Dict{String,Any}("title" => "this is the title")
gattrib = Dict("title" => "this is the title")
ds = ZarrDataset(fname,"c",attrib = gattrib)

ds.attrib["number"] = 1
defDim(ds,"lon",3)
defDim(ds,"lat",5)

attrib = Dict{String,Any}(
attrib = Dict(
"units" => "m/s",
"long_name" => "test",
)
@@ -25,7 +27,7 @@ vtype = Int32

zv = defVar(ds,varname,vtype,dimensionnames, attrib = attrib)
zv[:,:] = data
zv.attrib["lala"] = 12
zv.attrib["number"] = 12
zv.attrib["standard_name"] = "test"
ds.attrib["history"] = "test"
close(ds)
@@ -34,7 +36,7 @@ ds = ZarrDataset(fname)

zv = ds[varname]

@test zv.attrib["lala"] == 12
@test zv.attrib["number"] == 12
@test zv.attrib["standard_name"] == "test"
@test ds.attrib["history"] == "test"
