yasa mice branch #72
This is great @matiasandina!! I'm sure this will be super useful to others. Can I ask: did you re-train a new classifier using these updated features? If so, can you describe the training set as well? And more importantly, would you be willing to share the training set and/or the trained classifier (lightgbm tree paths)?
The dataset was collected from OSF. It contains mouse EEG/EMG recordings (sampling rate: 512 Hz) and sleep stage labels (epoch length: 2.5 sec).
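For clarity, here is a minimal sketch (with placeholder arrays, not the actual OSF files) of how the sampling rate and epoch length relate: at 512 Hz, each 2.5-s epoch spans 1280 samples, and the hypnogram should have one label per epoch.

```python
import numpy as np

sf = 512          # sampling rate (Hz)
epoch_sec = 2.5   # scoring epoch length (s)
samples_per_epoch = int(sf * epoch_sec)   # 512 * 2.5 = 1280 samples

# Placeholders standing in for a 4-h EEG trace and its stage labels
eeg = np.zeros(int(4 * 3600 * sf))
stages = np.zeros(int(4 * 3600 / epoch_sec), dtype=int)

n_epochs = eeg.size // samples_per_epoch
assert stages.size == n_epochs, "hypnogram and EEG do not line up"
```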
Thanks @matiasandina! I think that if we are to create a new branch (like "yasa_mice"), the minimum that we need is:
And then, we can create a separate repo (see https://github.com/raphaelvallat/yasa_classifier as an example) with the code to reproduce the trained classifier, i.e. model training, data partitioning, performance evaluation, etc. A few questions:
This is what I can do for now; please let me know if you can reproduce it.
Performance is quite high: above 90% accuracy and a bit lower for Cohen's kappa. I think the predictions could be smoothed to gain even more performance, but I'm OK so far. You can find performance values in this folder, which includes accuracy, Cohen's kappa, and confusion matrices for each of the 50 4-h recordings that I used for testing.

The dataset contains one EEG and one EMG channel. I believe the choice of electrode will definitely affect the algorithm. Again, mice data is not as standard. Even labs that have detailed information see that the brain itself enters stages at slightly different moments. Below is a result from LFP electrodes from Soltani 2019. I am collecting with a "high throughput" setup (9 EEG + 2 EMG), so I will have more to say about this in the future.

The most critical thing is that classifier accuracy will be extremely dependent on the nature of the data itself. Mice data is not standard (in the way human data is somewhat standardized). I expect all labs will need to re-train with data they generated themselves before getting good results with any classifier. I have not done so because I haven't collected enough data yet, but I don't see this being an issue in the future (provided I and a few others in the lab can label our own data!).

An alternative would be to find ways of normalizing datasets so that the feature extractor extracts the same values for the same features across all of them. This is easier said than done, but it could be great in light of people sharing open datasets like the one I used for training.
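For reference, a rough sketch of this kind of per-recording evaluation (the `recordings` mapping below is a placeholder, not the actual folder layout used here):

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

def evaluate(recordings):
    """recordings: {recording_id: (y_true, y_pred)}, per-epoch label arrays for each test recording."""
    results = {}
    for rec_id, (y_true, y_pred) in recordings.items():
        results[rec_id] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "kappa": cohen_kappa_score(y_true, y_pred),
            "confusion": confusion_matrix(y_true, y_pred),
        }
    return results
```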
This is really great work @matiasandina, thank you!
Could you say more? What if we only include features that are normalized to zero mean and unit variance within the recording, i.e. we normalize data amplitude across recordings?
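A minimal sketch of that idea, assuming the per-epoch features live in a pandas DataFrame (one row per epoch, one column per feature) for a single recording:

```python
import pandas as pd

def normalize_within_recording(features: pd.DataFrame) -> pd.DataFrame:
    # Z-score each feature with this recording's own mean and SD, so that
    # absolute amplitude differences between recordings drop out.
    return (features - features.mean()) / features.std(ddof=0)
```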
I am still undecided on what is the best way to go, i.e. a separate branch versus keeping everything in the main package.
Do you think that the length of the input data will change the output? Can you just pass 24-hour recordings or even multi-day recordings to the sleep staging function? If there are mice researchers following this thread, please feel free to chime in with your ideas and preferences :)
Mice data is not standardized in the same way human data might be. A few quick examples:

In humans, electrode placement follows standardized coordinates; this is not true for mice. The coordinates are usually modified according to what the experimenter wants (one valid reason is that you might want to implant something into the brain in addition to the EEG electrodes, and you don't have space if you follow what others have done).

This is not to say the data is corrupted or of low quality; it's just less industrial (?), less plug-and-play than I imagine human data to be.
The training was done with 24-h recordings. I don't think the length of the input data would change the output. I don't have multi-day recordings at hand right now, but it would be nice to try.

Regarding the branch question: maybe the real question is whether the branches would diverge so much that maintaining handlers in the code becomes more of a burden than splitting them. I think it might be worth keeping it all together. I'm not sure how they implemented this in code, but the people from Deep Lab Cut have taken the route of
@matiasandina thanks for the detailed explanation! Another naive question: is mice sleep similar to rat sleep? Do you think your classifier would work well on rat data? Maybe another option is to have some sort of configuration file that determines the features, e.g. which features to compute, the length of the epoch, the length of the smoothing window, etc. That way, you would just need the config file (*.json or *.yaml) and the updated classifier to run YASA on another species.
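One possible shape for such a file (all keys below are hypothetical, just to illustrate driving the feature extraction from a config rather than hardcoded defaults):

```python
import json

config = {
    "species": "mouse",
    "epoch_sec": 2.5,              # scoring epoch length
    "smoothing_window_sec": 15,    # length of the smoothing/rolling window
    "features": ["absolute_power", "relative_power", "permutation_entropy"],
    "band_ratios": ["delta/theta"],
    "classifier_path": "clf_mouse_lgb.joblib",
}

with open("yasa_mouse_config.json", "w") as f:
    json.dump(config, f, indent=2)
```

Swapping species would then just mean swapping the config file and the trained classifier.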
Sorry for the late reply. Provided re-training, I think this is flexible enough to work across species. I like the idea of a config file that determines the features! It's been a bit difficult to find time to work on this since coming back from vacation and trying to get my PhD in motion again, but it's on the list!
This issue contains brief details of what I changed to adapt `staging.py` to work with the recordings I had from mice.

The most significant change is the use of `epoch_sec` in `get_features()`, `fit()`, and `sliding_window()`.

I don't remember why I kept this `min()` call. My `epoch_sec` was 2.5 seconds, so I didn't test what happens when `epoch_sec` is different.

I removed the temporal axis because mice don't sleep in one lump. I think this might also help with classifying human napping data. I just commented it out, but it wouldn't be difficult to put a conditional statement there or find a better solution.
I changed the units, which I think has been superseded in #59.
Another minor thing is the naming of features, which hardcodes "min" into the variable names. I would consider using "epoch" instead of "min".
In the future, I also plan to change this, because I expect to be able to run yasa in real time.
I think these lines create problems for people used to mice data, because they usually don't use all these ratios. For my classifier I used them, and I think they contain value, but it would be nice to check whether the relevant bands are present before calculating the ratios.
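A sketch of that kind of guard, with illustrative band/column names rather than the exact ones used in `staging.py`:

```python
def add_band_ratios(features, ratios=(("sdelta", "theta"), ("fdelta", "theta"))):
    # Only compute a ratio when both bands were actually extracted
    for num, den in ratios:
        if num in features.columns and den in features.columns:
            features[f"{num}/{den}"] = features[num] / features[den]
    return features
```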
Below everything, you can find the full file.