-
-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fill_value serialization of NaN; add property-based tests #2802
base: main
Are you sure you want to change the base?
Fix fill_value serialization of NaN; add property-based tests #2802
Conversation
The bad news is, this issue is going to be slightly more than I've said here. The good news is that the property-based tests caught some edge cases. |
Nice, I've been meaning to add this to Zarr: @st.composite
def v3_array_metadata(draw: st.DrawFn) -> bytes:
from zarr.codecs.bytes import BytesCodec
from zarr.core.chunk_grids import RegularChunkGrid
from zarr.core.chunk_key_encodings import DefaultChunkKeyEncoding
from zarr.core.metadata.v3 import ArrayV3Metadata
# separator = draw(st.sampled_from(['/', '\\']))
shape = draw(array_shapes)
ndim = len(shape)
chunk_shape = draw(npst.array_shapes(min_dims=ndim, max_dims=ndim))
dtype = draw(zrst.v3_dtypes())
fill_value = draw(npst.from_dtype(dtype))
dimension_names = draw(
st.none() | st.lists(st.none() | simple_text, min_size=ndim, max_size=ndim)
)
metadata = ArrayV3Metadata(
shape=shape,
data_type=dtype,
chunk_grid=RegularChunkGrid(chunk_shape=chunk_shape),
fill_value=fill_value,
attributes=draw(simple_attrs),
dimension_names=dimension_names,
chunk_key_encoding=DefaultChunkKeyEncoding(separator="/"), # FIXME
codecs=[BytesCodec()],
storage_transformers=(),
)
return metadata.to_buffer_dict(prototype=default_buffer_prototype())["zarr.json"] What do you think of a |
I love the idea. I was thinking the other day that the obvious path out of the bugs that are currently popping up would be property-based testing, so I was pretty pleased to see that there's already some work in that direction Out of curiosity, do we have something like json schema that we could apply against the outputs to at least verify structure? We'd still need to define all the rules that exist in terms of value/type dependencies etc. but that's an easy win if it exists somewhere |
I'm not aware of a JSON schema definition for the array metadata. If one existed, it would necessarily only support partial validation, because JSON schema can't express certain invariants in the metadata document, like the requirement that dimensional attributes (shape, chunk_shape, etc) be consistent. |
A fairly easy alternative way to handle this would be to simply write a test that takes the I still think a generic metadata strategy is probably useful. |
Yeah, you'd definitely need json-schema and custom validation rules that encode relationships among different fields |
The current serialization of
fill_value
inArrayV2Metadata
does not fully conform to the spec, particularly for:Changes
allow_nan
toFalse
onjson.dumps
Resolves: #2741
TODO:
docs/user-guide/*.rst
changes/