[Feature] Allowable file formats #36

Open
bendichter opened this issue Sep 21, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@bendichter

The docs are not specific about which file formats are allowed within this standard. The formats in the example (SpikeGLX and mp4) look good, but how far could this be extended to other formats? Some data formats require proprietary software or specific operating systems to read, some are poorly documented, and some lack sufficient metadata to be readable on their own without additional information.

I think it would be helpful to maintain a list of allowable formats for different data types, and require users to convert any non-compliant data formats to an open standard e.g. NWB or NIX.

For instance, for raw electrophysiology data, you might allow:

  • SpikeGLX
  • OpenEphys
  • Intan
  • NeuroScope
  • NWB
  • EDF
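
A list like this could also be kept in machine-readable form, so that a dataset can be checked against it. The following is only a minimal sketch under stated assumptions: the `ALLOWED_EPHYS_SUFFIXES` mapping, the file suffixes chosen for each format, and the `format_for` and `find_unlisted_files` helpers are all hypothetical and not part of the specification. Matching on suffix alone is also a simplification, since several formats share generic suffixes such as `.dat`.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping from format name to typical file suffixes.
# The suffixes are illustrative assumptions, not an official list.
ALLOWED_EPHYS_SUFFIXES = {
    "SpikeGLX": {".bin", ".meta"},
    "OpenEphys": {".continuous", ".oebin"},
    "Intan": {".rhd", ".rhs"},
    "NeuroScope": {".dat", ".xml"},
    "NWB": {".nwb"},
    "EDF": {".edf"},
}


def format_for(filename: str) -> Optional[str]:
    """Return the first listed format whose suffixes match, else None."""
    suffix = Path(filename).suffix.lower()
    for name, suffixes in ALLOWED_EPHYS_SUFFIXES.items():
        if suffix in suffixes:
            return name
    return None


def find_unlisted_files(filenames: List[str]) -> List[str]:
    """Return the filenames whose suffix matches no listed format."""
    return [f for f in filenames if format_for(f) is None]
```

Calling `find_unlisted_files` on the file names in a raw ephys folder would return the files a user might be asked to convert to an open standard.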
@bendichter bendichter added the enhancement New feature or request label Sep 21, 2023
@JoeZiminski
Member

Thanks @bendichter. My thinking is that the current specification aims to be somewhat lightweight, to cast as broad a net as possible; it is essentially a relatively small subset of the more formal BIDS specifications / proposals. The hope is that as people become more familiar with BIDS-like standardisation and see its benefits, they will move closer to full BIDS compliance. My worry about restricting proprietary software formats is that researchers stuck with those formats will think this specification is not for them, and will then not follow any of it (even the parts they could, like folder organisation). However, I would be interested to hear what @adamltyson and @niksirbi think.

I agree that the docs do not do a good enough job of highlighting this: it is not clear that people interested in better standardisation and the benefits of open file formats etc. should check out BIDS / existing BEPs. So I think at the very least the docs can be changed to indicate this and to better explain the standardisation ecosystem (and this specification's place within it).

@adamltyson
Member

I think swc/neuro-blueprint should stay a directory and file naming standard to ease adoption as much as possible. It can then be combined with (optional) metadata and data type standards. The main aim has always been ease of understanding and adoption by researchers for whom this type of thing isn't standard practice.

Agree with everything you say about proprietary file formats though @bendichter. Maybe we should curate a list of recommended formats?

@niksirbi
Member

Even if we don't mandate a restricted list of file formats, there is nothing wrong with having recommended ones. It might also help educate people about which formats to prefer (on the grounds of longevity, openness, adoption etc.). So if the acquisition system can export to one of the recommended formats, users should prefer that over the others.

@niksirbi
Member

> I agree that the docs do not do a good enough job of highlighting this: it is not clear that people interested in better standardisation and the benefits of open file formats etc. should check out BIDS / existing BEPs. So I think at the very least the docs can be changed to indicate this and to better explain the standardisation ecosystem (and this specification's place within it).

I completely agree with this.

@bendichter
Author

Yes, I think that's fair. I understand you are trying to achieve a middle ground where experimentalists have some guidance to create a uniform file organization while maintaining the convenience of keeping their raw data format. I agree it's a good idea to mention this explicitly in the docs and perhaps nudge users to think about the FAIRness of their data format in terms of:

  • proprietary software
  • dependence on particular hardware
  • openness of standard and of APIs
  • self-describing nature
  • disk efficiency
  • I/O efficiency

In that context, it might be useful to point users to particularly good standards, more as guidance than as a constraint.
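
To illustrate the "guidance, not constraint" idea, the criteria listed above could be recorded as a simple yes/no checklist per format. This is only a sketch under assumptions: the `FormatChecklist` class, its field names, and the `unmet_criteria` and `advise` helpers are hypothetical, and the answers for any real format would have to be supplied by someone who knows it.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class FormatChecklist:
    """Yes/no answers to the criteria listed above (hypothetical structure)."""

    readable_without_proprietary_software: bool
    independent_of_particular_hardware: bool
    open_standard_and_apis: bool
    self_describing: bool
    disk_efficient: bool
    io_efficient: bool


def unmet_criteria(checklist: FormatChecklist) -> List[str]:
    """Return the names of criteria answered 'no', in declaration order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def advise(format_name: str, checklist: FormatChecklist) -> str:
    """Build a non-blocking message: it informs the user and rejects nothing."""
    unmet = unmet_criteria(checklist)
    if not unmet:
        return f"{format_name}: meets all listed criteria."
    return f"{format_name}: consider an open alternative; unmet: {', '.join(unmet)}."
```

Because `advise` only returns a message, a tool built on it would nudge users towards better formats without stopping them from keeping their raw data as it is.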
