change compression to compressor in netCDF3.translate zarr.create_dataset calls #535

Open · wants to merge 1 commit into main
Conversation

wrongkindofdoctor

Replaces `compression` with the correct `compressor` argument in netCDF3.translate zarr.create_dataset calls.
Fixes #534
Tested with Python 3.12 and kerchunk v0.2.7 on RHEL 8.
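
For reference, a minimal sketch of the keyword change inside translate() (the surrounding argument names are illustrative, based on the traceback below rather than copied from the kerchunk source):

# kerchunk/netCDF3.py, inside NetCDF3ToZarr.translate() -- sketch only
arr = z.create_dataset(
    name=name,
    shape=shape,
    dtype=dtype,
    fill_value=fill_value,
    chunks=chunks,
    compressor=None,  # was: compression=None, which zarr >= 3.0.0 rejects
)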

@wrongkindofdoctor wrongkindofdoctor marked this pull request as ready for review January 13, 2025 16:51
@martindurant
Member

This is right, but I don't understand why no one hit it before! Is it possible to add a test in https://github.com/fsspec/kerchunk/blob/main/tests/test_netcdf.py which would have failed, but with this change passes?
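
A possible shape for such a test, as a sketch only (the test name and assertion are assumptions; it reuses the existing `unlimited_dataset` fixture and the `netCDF3` import already present in test_netcdf.py):

def test_translate_zarr_v3_compressor(unlimited_dataset):
    # With the old `compression=` keyword, zarr >= 3.0.0 fails inside translate() with:
    #   TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
    h = netCDF3.NetCDF3ToZarr(unlimited_dataset)
    out = h.translate()
    assert isinstance(out, dict)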

@wrongkindofdoctor
Author

@martindurant I'll see if I can update the netcdf unit tests to capture this behavior.

@wrongkindofdoctor
Author

@martindurant It looks like test_netcdf.test_unlimited should catch the error if the zarr version is 3.0.0 or later. If I run it independently with Zarr v3 (I added a print statement to show the zarr version), I get the following output:

/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/bin/python3.12 /net/jml/pycharm-2024.3/plugins/python-ce/helpers/pycharm/_jb_pytest_runner.py --target test_netcdf.py::test_unlimited 
Testing started at 10:38 AM ...
Launching pytest with arguments test_netcdf.py::test_unlimited --no-header --no-summary -q in /net/jml/kerchunk/tests

============================= test session starts ==============================
collecting ... collected 1 item

test_netcdf.py::test_unlimited 

======================== 1 failed, 2 warnings in 1.12s =========================
FAILED                                    [100%]
Running with Zarr 3.0.0
tests/test_netcdf.py:79 (test_unlimited)
unlimited_dataset = '/tmp/pytest-of-Jessica.Liptak/pytest-3/test_unlimited0/test.nc'

    def test_unlimited(unlimited_dataset):
        fn = unlimited_dataset
        expected = xr.open_dataset(fn, engine="scipy")
        h = netCDF3.NetCDF3ToZarr(fn)
>       out = h.translate()

test_netcdf.py:84: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../kerchunk/netCDF3.py:194: in translate
    arr = z.create_dataset(
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/typing_extensions.py:2853: in wrapper
    return arg(*args, **kwargs)
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/group.py:2395: in create_dataset
    return Array(self._sync(self._async_group.create_dataset(name, **kwargs)))
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:187: in _sync
    return sync(
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:142: in sync
    raise return_result
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:98: in _runner
    return await coro
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <AsyncGroup memory://22386488263744>, name = 'lat', shape = (10,)
kwargs = {'chunks': (10,), 'compression': None, 'dtype': dtype('>f4'), 'fill_value': None}
data = None

    @deprecated("Use AsyncGroup.create_array instead.")
    async def create_dataset(
        self, name: str, *, shape: ShapeLike, **kwargs: Any
    ) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
        """Create an array.
    
        .. deprecated:: 3.0.0
            The h5py compatibility methods will be removed in 3.1.0. Use `AsyncGroup.create_array` instead.
    
        Arrays are known as "datasets" in HDF5 terminology. For compatibility
        with h5py, Zarr groups also implement the :func:`zarr.AsyncGroup.require_dataset` method.
    
        Parameters
        ----------
        name : str
            Array name.
        **kwargs : dict
            Additional arguments passed to :func:`zarr.AsyncGroup.create_array`.
    
        Returns
        -------
        a : AsyncArray
        """
        data = kwargs.pop("data", None)
        # create_dataset in zarr 2.x requires shape but not dtype if data is
        # provided. Allow this configuration by inferring dtype from data if
        # necessary and passing it to create_array
        if "dtype" not in kwargs and data is not None:
            kwargs["dtype"] = data.dtype
>       array = await self.create_array(name, shape=shape, **kwargs)
E       TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/group.py:1169: TypeError

Process finished with exit code 1

You'll see that I am using the kerchunk conda package. I have not specified a Zarr version in this test environment, so Zarr 3.0.0 is installed by default.
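
For anyone reproducing this outside the test suite, a minimal sketch (the file name "test.nc" is a placeholder for any classic-format NetCDF file):

import zarr
from kerchunk import netCDF3

print(f"Running with zarr {zarr.__version__}")  # 3.0.0 in the environment above

h = netCDF3.NetCDF3ToZarr("test.nc")
# Before this change, with zarr >= 3.0.0 installed, the next call raises:
#   TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
out = h.translate()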

Development

Successfully merging this pull request may close these issues.

AsyncGroup.create_array() got an unexpected keyword argument 'compression' in netcdf3.py module
2 participants