
New neo IO class for swan #33

Open
shashwatsridhar opened this issue Apr 1, 2021 · 1 comment

shashwatsridhar commented Apr 1, 2021

Given the recent interest in Swan, I was trying to get Swan to work with a neo IO class that is compatible with newer versions of neo (currently we only support blackrockio_v4). My idea was to create a pipeline in which the user converts her data to a common intermediary format that is easy to write (e.g. npy), and we then provide a conversion script to turn the intermediary files into a neo-compatible file format.

The problem is, I couldn't find any format in neo that 1) supports writing blocks, AND 2) supports lazy loading / channel-by-channel loading. For example, NixIO and PickleIO can write blocks just fine, but the resulting files cannot be loaded lazily or channel-by-channel. For users with many sessions to analyze, this quickly becomes intractable due to memory limitations.

One solution is to have the user split their data channel-by-channel and convert each channel to its own .pkl file. While this would work, it is not a very elegant solution, requiring two conversion steps to get data into a Swan-compatible format.

An alternative solution would be to create a SwanNumpyIO class based on neo.BaseFromRaw and neo.BaseRawIO. It would read folders corresponding to individual sessions, each containing a set of required numpy files, and use numpy's memmap functionality to read data channel-by-channel. This has three advantages that I can see (see the sketch after this list):

  1. users only need to convert their data to numpy (with the structure I propose below),

  2. Swan becomes relatively independent of neo release cycles, allowing for quicker bug fixes and improvements in data IO, and

  3. data can be loaded quickly, channel-by-channel.
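
To make point 3 concrete, here is a rough sketch of the kind of channel-by-channel access numpy's memmap would give us (this is not a BaseRawIO implementation; `load_channel` is a hypothetical helper, and the file names refer to the layout proposed below):

```python
import json
from pathlib import Path

import numpy as np


def load_channel(session_dir, channel_id):
    """Load one channel's spikes and waveforms from a session folder
    in the proposed layout, without reading the full arrays."""
    session_dir = Path(session_dir)

    # mmap_mode="r" memory-maps the files instead of loading them, so
    # only the columns selected below are ever read from disk.
    spikes = np.load(session_dir / "spikes.npy", mmap_mode="r")        # 3 x N
    waveforms = np.load(session_dir / "waveforms.npy", mmap_mode="r")  # M x N

    mask = spikes[2] == channel_id  # row 2 holds the channel ids

    with open(session_dir / "metadata.json") as f:
        metadata = json.load(f)

    # Advanced indexing on a memmap materialises only the selected spikes.
    return {
        "spiketimes": np.asarray(spikes[0, mask]),
        "labels": np.asarray(spikes[1, mask]),
        "waveforms": np.asarray(waveforms[:, mask]),
        "metadata": metadata,
    }
```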

The numpy format I propose is as follows:

Each session is stored in a folder whose name corresponds to the dataset name. The folder contains four files:

  • spikes.npy - a 3xN array, one column per spike, with the following structure:

    • row 0: spiketimes
    • row 1: labels
    • row 2: channels

    • contains all information about the spikes and units in that dataset

    • easy to read and convert to neo Groups corresponding to units/clusters

  • waveforms.npy - an MxN array containing the waveforms corresponding to the spike times in the spikes.npy file, where M is the number of samples per waveform

  • events.npy - a 2xN array recording the timestamps and names of all experimental events in the segment, with the following structure:

    • row 0: timestamps
    • row 1: names
  • metadata.json - any additional metadata for the dataset, stored in the form of nested dicts
    • top-level keys correspond to block metadata
    • an optional "clusters" key contains nested dicts with metadata for each cluster
    • these entries are attached to the loaded dataset as neo annotations

(I'm still not sure of the precise structure of the metadata.json file)
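
As a rough illustration, the single conversion step on the user side could then be as simple as this sketch (`write_session` and its argument names are placeholders, not an agreed API):

```python
import json
from pathlib import Path

import numpy as np


def write_session(folder, spiketimes, labels, channels, waveforms,
                  event_times, event_names, metadata):
    """Write one session in the proposed folder layout."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # spikes.npy, 3 x N: one column per spike (time, unit label, channel id)
    np.save(folder / "spikes.npy", np.vstack([spiketimes, labels, channels]))

    # waveforms.npy, M x N: one waveform of M samples per spike
    np.save(folder / "waveforms.npy", waveforms)

    # events.npy, 2 x N: vstack coerces both rows to strings here, so
    # timestamps would need to be cast back to float when loading
    np.save(folder / "events.npy", np.vstack([event_times, event_names]))

    with open(folder / "metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)
```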

I have never implemented a neo IO class, so I might be misjudging the complexity of the task itself. I was hoping @JuliaSprenger and @mdenker could share their thoughts and insights here. Do you think it's worth it?

@JuliaSprenger

Hi @shashwatsridhar.
On a first read, this sounds like you are falling into the let-me-introduce-yet-another-standard trap.
I think NixIO_fr might help you, as it allows reading Nix files in lazy mode, provided the neo structure is raw-compatible (i.e. has the same number of channels across segments). Since you are also planning to switch to the latest version of the BlackrockIO, this would be the case anyway for the data you are going to load in the future.
Alternatively, if you would still like to separate the different types of data into different files as you describe above, it would make sense to use an existing format (e.g. openephys, exdir) and extend neo's capabilities for it. The first format can already be read by neo but not yet written, and the latter is on the list to be included in neo at some point in the future.
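
For what it's worth, a minimal sketch of that Nix-based round trip, assuming a neo version that ships both NixIO and the rawio-backed NixIOFr (`block` stands for an existing, raw-compatible neo Block):

```python
import neo

# Writing: NixIO stores a full Block (spike trains, annotations, ...)
# in a single .nix file.
writer = neo.NixIO("session.nix", mode="ow")
writer.write_block(block)
writer.close()

# Reading: the rawio-backed NixIOFr supports lazy mode; data stay on
# disk until load() is called on the returned proxy objects.
reader = neo.NixIOFr("session.nix")
lazy_block = reader.read_block(lazy=True)
spiketrain = lazy_block.segments[0].spiketrains[0].load()
```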

@mdenker mdenker added this to the Release 0.2.0 milestone May 19, 2021