
[Bug]: Non-DCI H5DataIO does not passthrough maxshape property which crashes pynwb Timeseries #1148

Closed
cboulay opened this issue Jul 9, 2024 · 1 comment · Fixed by #1149
Labels: category: bug (errors in the code or code behavior) · priority: medium (non-critical problem and/or affecting only a small set of users)

cboulay commented Jul 9, 2024

What happened?

The pynwb documentation provides an example of passing an H5DataIO object as the `data` argument to the TimeSeries constructor (docs here). However, this only works when `timestamps=None`: if `timestamps` is not None, the TimeSeries initializer checks the data object for a `maxshape` attribute, and this fails because H5DataIO has no `.maxshape` attribute; the value is only stored in `.io_settings["maxshape"]`.

I raised this bug in pynwb here: NeurodataWithoutBorders/pynwb#1929

Are there any unintended side effects of adding a `maxshape` getter to H5DataIO?
Or should this be fixed in pynwb instead? Perhaps its `TimeSeries._check_time_series_dimension` could special-case data objects that are H5DataIO instances.

For now I will monkey-patch H5DataIO to add a maxshape property.
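For reference, the monkey-patch amounts to attaching a read-only property that reads through to `io_settings`. A minimal sketch is below, shown against a stand-in class with the same `io_settings` layout so it runs without hdmf installed; the real patch would target `hdmf.backends.hdf5.h5_utils.H5DataIO`:

```python
class FakeH5DataIO:
    """Stand-in with the same io_settings layout as hdmf's H5DataIO."""

    def __init__(self, **io_settings):
        self.io_settings = io_settings


def _maxshape(self):
    # Read through to io_settings; fall back to the "shape" setting
    # when no explicit maxshape was configured.
    return self.io_settings.get("maxshape", self.io_settings.get("shape"))


# Monkey-patch: attach maxshape as a read-only property on the class.
FakeH5DataIO.maxshape = property(_maxshape)

dio = FakeH5DataIO(shape=(0, 16, 2), maxshape=(None, 16, 2))
print(dio.maxshape)  # (None, 16, 2)
```

The same two lines (`_maxshape` plus the `property(...)` assignment, applied to the real class) are what I am using as a workaround until this is fixed upstream.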

Steps to Reproduce

import datetime
import tempfile
from pathlib import Path

import numpy as np
from hdmf.backends.hdf5.h5_utils import H5DataIO
from pynwb import NWBHDF5IO, NWBFile, TimeSeries


def test_nwb_stream_timestamps():
    srate = 10.0
    data_shape = 20, 16, 2
    test_data = np.arange(np.prod(data_shape)).reshape(data_shape)

    dataio = H5DataIO(
        shape=(0,) + data_shape[1:],
        dtype=test_data.dtype,
        maxshape=(None,) + data_shape[1:],
        fillvalue=np.nan,
    )
    # Next line fails
    data_series = TimeSeries(
        name="some name",
        data=dataio,
        timestamps=np.arange(test_data.shape[0]),
        # rate=srate,  # Comment previous line and uncomment this to bypass error
        unit="n/a",
    )

    nwbfile = NWBFile(
        session_description="demonstrate streaming timestamps",
        identifier="abcd1234",
        session_start_time=datetime.datetime.now(datetime.timezone.utc),
    )
    nwbfile.add_acquisition(data_series)

    # Write the data to file
    file_path = Path(tempfile.gettempdir()) / "test_nwb_stream_timestamps.nwb"
    io = NWBHDF5IO(file_path, "w")
    io.write(nwbfile)

    # Append the test data to the resizable dataset after the initial write.
    # (My full test also appends to a second H5DataIO backing the timestamps,
    # which is why the traceback below shows timestamps=timestampio.)
    dataio.dataset.resize(len(dataio.dataset) + len(test_data), axis=0)
    dataio.dataset[-len(test_data):] = test_data

    io.close()

Traceback

>       data_series = TimeSeries(
            name="TODO: name from params",
            data=dataio,
            timestamps=timestampio,
            # rate=srate,
            unit="n/a",
        )

test_nwb.py:57: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../.venv/lib/python3.9/site-packages/hdmf/utils.py:667: in func_call
    pargs = _check_args(args, kwargs)
../../../.venv/lib/python3.9/site-packages/hdmf/utils.py:639: in _check_args
    parsed = __parse_args(
../../../.venv/lib/python3.9/site-packages/hdmf/utils.py:351: in __parse_args
    valshape = get_data_shape(argval)
../../../.venv/lib/python3.9/site-packages/hdmf/utils.py:916: in get_data_shape
    if hasattr(data, 'maxshape'):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <hdmf.backends.hdf5.h5_utils.H5DataIO object at 0x14755d190>
attr = 'maxshape'

    def __getattr__(self, attr):
        """Delegate attribute lookup to data object"""
        if attr == '__array_struct__' and not self.valid:
            # np.array() checks __array__ or __array_struct__ attribute dep. on numpy version
            raise InvalidDataIOError("Cannot convert data to array. Data is not valid.")
        if not self.valid:
>           raise InvalidDataIOError("Cannot get attribute '%s' of data. Data is not valid." % attr)
E           hdmf.data_utils.InvalidDataIOError: Cannot get attribute 'maxshape' of data. Data is not valid.

../../../.venv/lib/python3.9/site-packages/hdmf/data_utils.py:1104: InvalidDataIOError

Operating System

macOS

Python Executable

Python

Python Version

3.9

Package Versions

h5py==3.11.0
hdmf==3.14.1
hdmf_zarr==0.8.0
pynwb==2.8.0

@cboulay cboulay changed the title [Bug]: H5DataIO does not expose maxshape property which crashes pynwb Timeseries in some configs. [Bug]: Non-DCI H5DataIO does not passthrough maxshape property which crashes pynwb Timeseries Jul 10, 2024
@mavaylon1
I will take a closer look this week. Thanks for the PR!

@mavaylon1 mavaylon1 self-assigned this Jul 22, 2024
@mavaylon1 mavaylon1 added category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Jul 22, 2024
@mavaylon1 mavaylon1 added this to the 3.14.2 milestone Jul 22, 2024
@mavaylon1 mavaylon1 assigned stephprince and unassigned mavaylon1 Jul 24, 2024
stephprince added a commit that referenced this issue Aug 28, 2024
… pynwb TimeSeries. (#1149)

* Add passthrough on non-DCI H5DataIO to support its use in pynwb TimeSeries. Fixes #1148.

* CHANGELOG update for #1149

* Add another maxshape fallback (self.shape)

* Incorporated @stephprince suggestions on #1149.

---------

Co-authored-by: Steph Prince <[email protected]>
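Based on the commit notes above (a passthrough on non-DCI H5DataIO, plus a fallback to `self.shape`), the merged fix can be sketched roughly as follows. This is an illustrative stand-in, not the actual hdmf code:

```python
class H5DataIOSketch:
    """Illustrative stand-in for hdmf's H5DataIO (not the real class)."""

    def __init__(self, shape=None, maxshape=None):
        self.shape = shape
        self.io_settings = {}
        if maxshape is not None:
            self.io_settings["maxshape"] = maxshape

    @property
    def maxshape(self):
        # Pass through the configured io_settings value so callers like
        # pynwb's get_data_shape() can use hasattr(data, 'maxshape');
        # fall back to self.shape when no explicit maxshape was given.
        if "maxshape" in self.io_settings:
            return self.io_settings["maxshape"]
        return self.shape
```

With a property like this, `hasattr(data, 'maxshape')` in `get_data_shape` succeeds instead of tripping the `InvalidDataIOError` shown in the traceback.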