New backend : Exdir #629
I don’t think they’re set in stone. Changing them to attributes wouldn’t affect PyNWB much, and I don’t think it would change the user-facing API. I think, in general, we would be open to changing these if it would make an alternative backend easier to implement. @oruebel, @bendichter what are your thoughts?
@ajtritt @lepmik The dataset vs. attribute discussion needs to take some intricacies of HDF5 into account. You cannot compress attributes, you can only read them as a whole, and you cannot change them an infinite number of times. See also NeurodataWithoutBorders/nwb-schema#45 (comment) for an earlier discussion on that topic.
@lepmik does exdir support links, object references, and region references? I played around with implementing a ZARR-based backend at some point and while dropping in ZARR for h5py worked fine, dealing properly with links and datasets of object and region references requires effort since they are not natively supported by ZARR.
From a backend perspective this is determined by the schema, i.e., this is a schema issue rather than a backend issue. See also NeurodataWithoutBorders/nwb-schema#48, which discusses the proposed changes (and exceptions to the list you mentioned).
@oruebel exdir does not support links; this is mainly to avoid cross-platform issues, e.g. symlinks on Windows vs. Linux. However, object and region references should be trivial to implement as we are using Are links absolutely necessary? Sounds like we can edit the schema then to fit exdir more properly in terms of attributes vs. datasets.
Changing the schema means changing the NWB format, and as such it affects everyone, not just exdir.
Ultimately a complete backend must implement all primitives of the specification language in some form, i.e., groups, attributes, datasets, links, all the various data types (including region/object references). Omitting a primitive means you cannot properly map the whole format.
I'm not sure this is trivial, but ultimately an object reference is essentially a link stored as an element of a dataset. I.e., if you can do object references you should be able to implement links.
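To illustrate the point above, here is a minimal in-memory sketch of why a store that can resolve object references can also offer links. Everything in it (`ObjectRef`, `Group`, `resolve`, the paths) is a hypothetical stand-in invented for illustration; it is not the h5py, exdir, or PyNWB API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical reference stored as data: the absolute path of its target."""
    path: str


@dataclass
class Group:
    """Hypothetical group: a mapping from child name to child object."""
    children: dict = field(default_factory=dict)


def resolve(root, path):
    """Walk an absolute path such as '/a/b' down from the root group."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]
        # A link is a reference sitting in a group slot, so follow it.
        if isinstance(node, ObjectRef):
            node = resolve(root, node.path)
    return node


root = Group()
root.children["acquisition"] = Group({"ts": [1.0, 2.0, 3.0]})
# Object reference: stored as an element of a dataset (here a plain list).
root.children["refs"] = [ObjectRef("/acquisition/ts")]
# Link: the same kind of reference, stored where a child object would be.
root.children["ts_link"] = ObjectRef("/acquisition/ts")

via_ref = resolve(root, root.children["refs"][0].path)
via_link = resolve(root, "/ts_link")
```

Both lookups end at the same target object; the only difference is whether the reference lives inside a dataset or in a group slot.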
Something like links is absolutely necessary. It is used to keep data normalized without implicit relationships. You could treat references and links the same, given that they are similar in spirit. As @oruebel said, references are treated as data, whereas a link is treated as an object. References have the additional benefit of being "unbreakable". I can't think of instances where we rely on that distinction, but it might be something to keep in mind when adding this feature to exdir. We use references where it's more convenient to store the relationship as data, such as a column of a table or as an attribute. One thing to keep in mind if you do go the route of treating links as references is how this will impact reading exdir data into the intermediate data translation layer. I suggest you read the overview of the NWB architecture for all the specifics. Briefly, any backend must read data into the Builder subtype that corresponds to the Spec subtype that the data instantiates. For example, if something is specified to be a link (i.e. is a LinkSpec), then to properly translate that into the user API (i.e. Container objects), it must be read in by the backend as a LinkBuilder.
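The Spec-to-Builder rule described above can be sketched as a simple dispatch table. The classes below only borrow the names from the comment and are heavily simplified stand-ins defined here for illustration; the real PyNWB classes have different constructors. The `read_as_builder` helper and the example path are assumptions as well.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Simplified stand-ins for the specification classes (assumption).
@dataclass
class GroupSpec:
    name: str


@dataclass
class DatasetSpec:
    name: str


@dataclass
class LinkSpec:
    name: str


# Simplified stand-ins for the builder classes (assumption).
@dataclass
class GroupBuilder:
    name: str


@dataclass
class DatasetBuilder:
    name: str
    data: Any = None


@dataclass
class LinkBuilder:
    name: str
    target: Optional[str] = None


# The backend picks the builder type from the spec type, not from how
# the object happens to be stored on disk.
BUILDER_FOR_SPEC = {
    GroupSpec: GroupBuilder,
    DatasetSpec: DatasetBuilder,
    LinkSpec: LinkBuilder,
}


def read_as_builder(spec, **payload):
    """Hypothetical helper: wrap what was read in the matching builder."""
    builder_cls = BUILDER_FOR_SPEC[type(spec)]
    return builder_cls(spec.name, **payload)


# Even if the backend stores the link as a reference internally, a
# LinkSpec must still come back as a LinkBuilder.
link = read_as_builder(LinkSpec("source_link"), target="/data/source")
dataset = read_as_builder(DatasetSpec("values"), data=[1, 2, 3])
```

The design point is that the choice of builder is driven by the specification, so a backend that emulates links with references still has to hand back link builders when reading.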
@lepmik Is there a document which describes the mapping between the HDF5 types and the exdir types? Or alternatively a list of what is not supported in exdir?
@t-b It's pretty much one to one, however there might be some slight differences in |
From the paper:
So it seems the bridges to build are the ones already mentioned: links, object references, and region references. Another difference that may come up:
HDF5 can do parallel operations on an individual dataset but not across datasets, so we've gone out of our way to design data structures accordingly, with data that could belong to multiple datasets concatenated into a single dataset (e.g. |
@bendichter Also, to clarify when we say "parallel"--HDF5 parallelism is only enabled at the process level, not at the thread level. HDF5 can be made thread safe, but you lose concurrency. As @bendichter said, I'm not sure these things are relevant in the context of storage primitives, but they should be considered if trying to develop a comparable backend replacement.
Had noted some discussion about using different backends here and noted Zarr came up, which we use and work on. As well as Exdir, which seems similar in some ways to Zarr. At the risk of bringing up a tangential discussion, am interested to discuss with Exdir about opportunities for our two communities to collaborate. Have raised issue ( https://github.com/zarr-developers/zarr/issues/334 ) for this purpose so as not to hijack this thread. 😄 |
I just want to update the exdir team on some developments that came out just before the NWB 2.0 release. Previously, we noted that there were 3 outstanding types of data structures that we use and would need to be somehow implemented by an alternative backend: links, object references, and region references. I want to let you know that we recently moved away from using region references. Previously, they were used in 2 places: The remaining data relationships that need to be developed are links and object references (datasets that are an array of links). It seems like links could be implemented with filesystem soft-links as noted here #300, so the big remaining challenge is implementing object references.
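As a rough sketch of the two remaining pieces on a directory-based store: a link as a relative filesystem soft link, and a dataset of object references as a list of root-relative paths stored as data. The layout here (one directory per object, a `data.json` file inside it) and the `load` helper are assumptions made up for this example and are not exdir's actual on-disk format or API. Creating soft links may also need extra privileges on Windows, which is the cross-platform concern raised earlier in the thread.

```python
import json
import os
import tempfile
from pathlib import Path

# Throwaway root directory standing in for a directory-based file.
root = Path(tempfile.mkdtemp())

# Target "dataset": a directory holding its data file (assumed layout).
target = root / "acquisition" / "ts"
target.mkdir(parents=True)
(target / "data.json").write_text(json.dumps([1.0, 2.0, 3.0]))

# Link: a relative soft link, so the whole tree can be moved or copied.
link = root / "ts_link"
os.symlink(os.path.relpath(target, link.parent), link, target_is_directory=True)

# Dataset of object references: root-relative paths stored as plain data.
refs = root / "refs"
refs.mkdir()
(refs / "data.json").write_text(json.dumps(["acquisition/ts"]))


def load(obj_dir):
    """Hypothetical reader: return the data stored in an object directory."""
    return json.loads((Path(obj_dir) / "data.json").read_text())


via_link = load(link)                           # resolved by the filesystem
via_ref = [load(root / p) for p in load(refs)]  # resolved by the backend
```

The link is resolved by the operating system, while the references have to be resolved by the backend itself, which is why object references are the larger piece of work.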
Just to avoid possible confusion, it is correct that the NWB:N 2.0 core schema does not use region references any more. However, region references are still supported, i.e., users can still use them in extensions. In short, to support NWB:N 2.0, region references are not critical (only links and object references), but for full support it will still be useful to support region references as well. So at least for a first go at integrating exdir, focusing on supporting links and datasets of object references will be an excellent start. Hopefully this will help to simplify the problem for integrating exdir.
Yes, thanks for the clarification. Region references have been removed from the core schema but have not been officially removed as a supported type, so they could be used by extensions (though I am not aware of any extensions that use them).
Hi!
I'm considering adding an additional backend, exdir, which should be one-to-one compatible with HDF5, so hopefully not too much work. I have spoken briefly about it with the NWB presenter at SfN 2017.
One issue, or question, that comes up is how you determine attributes vs. datasets. For example, session_description, session_start_time and file_create_date are stored as datasets; however, (file) source is an attribute. In my opinion all of these are attributes.
So, when adding a new backend, is it possible to choose what should be stored as datasets and attributes, or is this "set in stone"?