How to match BIDS IDs and CanProCo IDs #88

Open
jcohenadad opened this issue Apr 16, 2024 · 18 comments

@jcohenadad
Member

jcohenadad commented Apr 16, 2024

Context

The Montreal team would like to share spinal cord lesion segmentation obtained by @plbenveniste to the Toronto team (@leelisae) for the purpose of linking spinal cord lesion load with additional qMRI measures done in Toronto.

Problem

The Montreal team uses BIDS structure sent by the UBC team (internal git-annex SHA: a04d89739c769dc03f23fcda183df62c62f586a9), while the Toronto team uses the CanProCo original file name. How can we match files between the two teams?

Solutions

The file that has matched IDs for both datasets should be accessible by both teams. Is this file centralized at a single point on the UBC server? If so, could it be made accessible to all CanProCo researchers without having to ask for it? Sending the file by email means the copy will drift out of sync with the original, maintained file, which is error-prone (e.g., if either the BIDS or the original dataset is updated).

Related issue #86

@zachvav

zachvav commented Apr 18, 2024

We are working on a more thorough solution, but in the meantime the participants.tsv file in Montreal's BIDS dataset can be used to link scans to the original CanProCo IDs. For example, 'sub-tor092' corresponds to 'CAN-01-CON-092' and 'ses-M0' corresponds to M0. Sequence names in Montreal's BIDS dataset can be remapped back to the original names (that @leelisae is using) like so:

T2w -> 3D-T2W-S
T2star -> Axial_Multi_Echo-S
acq-MT_MTS -> MTsat_ON_OFF-S
acq-T1w_MTS -> MTsat_T1-S
PSIR -> PSIR-S
acq-MToff_MTS -> MTsat_OFF-S
acq-MTon_MTS -> MTsat_ON-S
acq-MT_MTS -> MTsat-S
STIR -> STIR-S
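
The list above can be written as a lookup table. This is only a minimal sketch in Python: `BIDS_TO_ORIGINAL` and `original_candidates` are hypothetical names that do not come from any existing CanProCo script, and the table contains nothing beyond the nine lines above. Because `acq-MT_MTS` appears twice in the list, each BIDS name maps to a list of candidates rather than a single name.

```python
# Hypothetical lookup table, transcribed from the list above.
# Keys are BIDS names, values are the possible original sequence names.
BIDS_TO_ORIGINAL = {
    "T2w": ["3D-T2W-S"],
    "T2star": ["Axial_Multi_Echo-S"],
    "acq-MT_MTS": ["MTsat_ON_OFF-S", "MTsat-S"],  # two candidates
    "acq-T1w_MTS": ["MTsat_T1-S"],
    "PSIR": ["PSIR-S"],
    "acq-MToff_MTS": ["MTsat_OFF-S"],
    "acq-MTon_MTS": ["MTsat_ON-S"],
    "STIR": ["STIR-S"],
}


def original_candidates(bids_name):
    """Return every original sequence name a BIDS name may correspond to."""
    return list(BIDS_TO_ORIGINAL.get(bids_name, []))
```

A name with more than one candidate cannot be mapped back from the BIDS name alone.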

To shed some light on this situation, @leelisae actually has direct access to UBC's internal server and has been pulling new scans regularly, rather than using the packaged BIDS-structured datasets (like what Montreal has received). This is because she was previously a UBC student and still has a UBC login and VPN access. @leelisae's data is structured differently because she is using the source data before it has been passed through our script that restructures the dataset to BIDS.

Hope this helps!

@jcohenadad
Member Author

Thank you for your input @zachvav. What I still don't understand, though, is that during a conversation with @leelisae this week, she mentioned that patients are also organized by phenotype, and that the ID number could be the same (and what would distinguish them would be the phenotype), e.g., CAN-01-CON-02 and CAN-01-RRMS-02. In the BIDS structure, however, the phenotype is not encoded, and hence two identical IDs could not be used across phenotypes. Or maybe I misunderstood something from our conversation @leelisae? It would be worth clarifying.

@zachvav

zachvav commented Apr 19, 2024

I believe @leelisae might have been talking about the ID numbers being shared between sites, e.g., CAN-01-CON-02 and CAN-02-CON-02. Within a site, however, ID numbers are not reused between phenotypes: CAN-01-CON-02 and CAN-01-RRMS-02 would not both be valid.

@jcohenadad
Member Author

@leelisae would you be able to confirm? Thank you

@leelisae
Collaborator

@zachvav is correct. I apologize, @jcohenadad - I misunderstood & miscommunicated. I meant to say that the ID numbers, at rare times, could be shared between sites (e.g., CAN-02-PPM-201 & CAN-03-RRM-201). It seems like once we receive the document from UBC matching filenames, we'd be good to go!

@zachvav

zachvav commented Apr 19, 2024

To clarify, the IDs in the BIDS dataset and the original are actually the same IDs, just coded differently (without the phenotype and with a three-letter site code, e.g., 'tor', rather than a site ID number). We do not have a document that matches IDs, as they are already implicitly linked.

If it would be helpful to you @leelisae however, I can generate a one-time CSV that has the BIDS IDs of Montreal's data and their corresponding CanProCo IDs.

@jcohenadad
Member Author

jcohenadad commented Apr 20, 2024

To clarify, the IDs in the BIDS dataset and the original are actually the same IDs, just coded differently (without the phenotype and with a three-letter site code, e.g., 'tor', rather than a site ID number). We do not have a document that matches IDs, as they are already implicitly linked.
If it would be helpful to you @leelisae however, I can generate a one-time CSV that has the BIDS IDs of Montreal's data and their corresponding CanProCo IDs.

I would instead suggest writing a script that does the conversion based on this logic (simple regex) and uploading it to this repo (e.g., under a utils/ or scripts/ folder), rather than creating a one-time CSV. From experience, anything that is supposed to be "one-time" ends up being reused two years later by someone who does not have the context and who will end up making mistakes.
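
A minimal sketch of what such a regex-based conversion could look like, in Python. It assumes the ID pattern `CAN-<site>-<phenotype>-<number>` seen in the examples in this thread. The names `SITE_CODES`, `CANPROCO_ID` and `canproco_to_bids` are hypothetical, and the only site pairing confirmed here is `01` with `tor` (from 'sub-tor092' and 'CAN-01-CON-092'); the other sites would have to be filled in by someone who knows them. Only the CanProCo-to-BIDS direction is shown, since the BIDS ID does not carry the phenotype.

```python
import re

# Assumption: only "01" <-> "tor" is confirmed in this thread.
# The remaining site codes must be added by someone who knows them.
SITE_CODES = {"01": "tor"}

# Assumed pattern: CAN-<site number>-<phenotype>-<participant number>
CANPROCO_ID = re.compile(
    r"^CAN-(?P<site>\d+)-(?P<phenotype>[A-Za-z]+)-(?P<number>\d+)$"
)


def canproco_to_bids(canproco_id, site_codes=SITE_CODES):
    """Convert an original CanProCo ID into a BIDS subject ID."""
    match = CANPROCO_ID.match(canproco_id)
    if match is None:
        raise ValueError(f"Unrecognized CanProCo ID: {canproco_id!r}")
    site = match.group("site")
    if site not in site_codes:
        raise KeyError(f"No site code known for site {site!r}")
    # The phenotype is dropped: it is not encoded in the BIDS ID.
    return f"sub-{site_codes[site]}{match.group('number')}"
```

With the single confirmed pairing, `canproco_to_bids("CAN-01-CON-092")` returns `"sub-tor092"`.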

@zachvav

zachvav commented Apr 22, 2024

Thanks for the suggestion. I have created a script to backwards match the BIDS NII files to files in our original structure on the UBC server and have sent Lisa the output.

To match files instead of just IDs, the conversion is actually slightly more complicated than a simple regex. As an example, the BIDS structure recoded MT sequences like so: MTsat-S -> acq-MT_MTS and MTsat_ON_OFF-S -> acq-MT_MTS. Going backwards, we can't tell whether acq-MT_MTS was MTsat-S or MTsat_ON_OFF-S without looking at the original file structure to see which MTsat sequence was present.

Because of this, the script itself requires access to both the BIDS and the original directory trees, so it must be run from within UBC's network and would be unusable by external sites. Instead of including the script itself in the repo, I suggest we both include the script output as a new TSV file and add the original CanProCo ID to the participants.tsv file going forward.
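
To illustrate why the original tree is needed, here is a minimal Python sketch of the disambiguation step. It is not the script that was run at UBC: `resolve_original_name` is a hypothetical helper, and `present` stands for the set of sequence names found for one session in the original structure, whose exact layout is not described in this thread.

```python
def resolve_original_name(candidates, present):
    """Pick the single candidate that exists in the original session.

    candidates: original sequence names that a BIDS name could map to
    present:    sequence names actually found in the original structure
    """
    matches = [name for name in candidates if name in present]
    if len(matches) != 1:
        # Zero or several matches: the mapping cannot be decided safely.
        raise LookupError(
            f"Expected exactly one of {candidates} to be present, "
            f"found {matches}"
        )
    return matches[0]
```

For example, `resolve_original_name(["MTsat-S", "MTsat_ON_OFF-S"], {"PSIR-S", "MTsat_ON_OFF-S"})` returns `"MTsat_ON_OFF-S"`, while a session containing neither or both candidates raises an error instead of guessing.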

@leelisae
Collaborator

Thank you both for your help!

@jcohenadad and/or @plbenveniste - Since it seems like we can now match the BIDS IDs to original CanProCo IDs, would I be able to receive your baseline (M0) SC lesion masks that you've already created? Or, do you advise that I still run Pierre-Louis' pipeline myself to re-generate SC lesion masks?

Also, as we previously discussed:

@jcohenadad - Would you be able to write an example code to linearly register SC lesion masks in subject PSIR to subject MT space?

@plbenveniste - Would you be able to write a few sentences about the new SC lesion segmentation tool and send me any citations, so I could add this to the Methods section of the manuscript? Of course, I will include you as a co-author too.

@jcohenadad
Member Author

@jcohenadad and/or @plbenveniste - Since it seems like we can now match the BIDS IDs to original CanProCo IDs, would I be able to receive your baseline (M0) SC lesion masks that you've already created? Or, do you advise that I still run Pierre-Louis' pipeline myself to re-generate SC lesion masks?

Yes, @plbenveniste is on it. In fact, ideally, the segmentations should be pushed to the main repo by @zachvav, and a new version of the dataset re-sent to @leelisae.

@jcohenadad - Would you be able to write an example code to linearly register SC lesion masks in subject PSIR to subject MT space?

Because your file structure (non-BIDS) is different than my file structure (BIDS), if I design a script based on my file structure it won't work on your file structure. So, my suggestion is that you send me an example subject, with your current analysis script, which I will modify to add the code for PSIR registration.

Moving forward, we should all be working with the same file structure.

@plbenveniste
Collaborator

@jcohenadad and/or @plbenveniste - Since it seems like we can now match the BIDS IDs to original CanProCo IDs, would I be able to receive your baseline (M0) SC lesion masks that you've already created? Or, do you advise that I still run Pierre-Louis' pipeline myself to re-generate SC lesion masks?

@leelisae You can find the manual segmentations for all the M0/baseline participants that were manually segmented in the following zip file: canproco_M0_lesion_segmentations.zip. However, some participants were not segmented because the image quality was not good enough (the excluded subjects should all be listed in the exclude.yml file).
To copy the lesion segmentations, I used the following command in the canproco repo:

find ./ -type f -name '*ses-M0*lesion-manual.*' -exec cp {} ~/Desktop/canproco_M0_lesion_segmentations \;

@plbenveniste - Would you be able to write a few sentences about the new SC lesion segmentation tool and send me any citations, so I could add this to the Methods section of the manuscript? Of course, I will include you as a co-author too.

Here is a brief description of the model created for automatic spinal cord lesion segmentation:

A deep learning model for cervical spinal cord MS lesion segmentation was developed using the self-configuring nnUNet v2 framework (https://pubmed.ncbi.nlm.nih.gov/33288961/). It is a region-based model, outputting a single segmentation image containing 2 classes representing the spinal cord and MS lesions. Training data was based on the CanProCo dataset (M0) and consisted of sagittal PSIR 0.7×0.7×3 mm³ (4 sites, 333 participants) and sagittal STIR 0.7×0.7×3 mm³ (1 site, 92 participants). The ground truth spinal cord labels were generated by the contrast-agnostic model (https://arxiv.org/abs/2310.15402) with manual corrections when required (~5% of the images), and the ground truth MS lesion labels were generated manually from scratch by a trained radiologist.

As for the citation:
Benveniste PL, Valošek J, Chen M, Molinier N, Eunyoung Lee L, Prat A, Vavasour Z, Tam R, Traboulsee A, Kolind S, Oh J, Cohen-Adad J. Automatic Segmentation of Spinal Cord MS Lesions Across Multiple Sites, Contrasts and Vendors. 9th annual Americas Committee for Treatment and Research in Multiple Sclerosis (ACTRIMS) Forum 2024, West Palm Beach, Florida 2024.

@zachvav

zachvav commented Apr 23, 2024

I agree it would be much easier if we all used the BIDS structure going forward! I have now included the masks provided by @plbenveniste in the BIDS repository on our end. Any new datasets that we send will include the M0 lesion masks.

While doing this I noticed that the following NII files do not have a corresponding .JSON file:

sub-mon006_ses-M0_PSIR_lesion-manual.nii.gz
sub-edm065_ses-M0_PSIR_lesion-manual.nii.gz
sub-mon052_ses-M0_PSIR_lesion-manual.nii.gz
sub-tor051_ses-M0_PSIR_lesion-manual.nii.gz

We don't actually need the JSON files for anything, so this isn't a problem itself; however, I wanted to make sure this is expected behavior and there aren't any files that have been accidentally missed.
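
For anyone who wants to repeat this check, here is a minimal Python sketch. It assumes that each mask's JSON file sits next to it and shares its base name (e.g. `..._lesion-manual.json` beside `..._lesion-manual.nii.gz`); `masks_without_json` is a hypothetical helper, not an existing script in the repository.

```python
from pathlib import Path


def masks_without_json(root, pattern="*lesion-manual.nii.gz"):
    """List mask files under `root` that have no JSON file beside them."""
    suffix = ".nii.gz"
    missing = []
    for nii in sorted(Path(root).rglob(pattern)):
        # Assumed naming: same base name, with .json instead of .nii.gz
        sidecar = nii.with_name(nii.name[: -len(suffix)] + ".json")
        if not sidecar.exists():
            missing.append(nii)
    return missing
```

Calling `masks_without_json(".")` from the dataset root returns the paths of the masks that lack a JSON file, in sorted order.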

@plbenveniste
Collaborator

plbenveniste commented Apr 23, 2024

Thanks for highlighting this issue @zachvav.

This is an issue for us. We use the JSON files to trace where the segmentations come from. I will investigate this.

For now, I can just say that the 4 segmentation masks are empty, and I couldn't identify any lesions in the images. Therefore, I must:

  • Identify how the masks were created
  • Create the missing json files.
  • Add the missing json files to the dataset
  • Update the sent zip file

@plbenveniste
Collaborator

The problem comes from a previous manual labelling after receiving the new M0 batch from Erin (more details in issue #39).

To inspect each file's history, I ran:

git log --follow -p ./derivatives/labels/sub-mon006/ses-M0/anat/sub-mon006_ses-M0_PSIR_lesion-manual.nii.gz

What was done:

  • For sub-mon006_ses-M0_PSIR_lesion-manual.nii.gz, sub-edm065_ses-M0_PSIR_lesion-manual.nii.gz and sub-tor051_ses-M0_PSIR_lesion-manual.nii.gz: JSON files were created and added to the dataset.
  • For sub-mon052_ses-M0_PSIR_lesion-manual.nii.gz the image was too blurry to get a precise lesion segmentation mask: therefore the mask was deleted. The image was added to the exclude.yml file.

Changes were pushed to branch plb/correct_4_files on our git-annex dataset.

Here is the updated zip file with the M0 lesion segmentation:
canproco_M0_lesion_segmentations.zip

Thanks again for your feedback @zachvav

@leelisae
Collaborator

@jcohenadad - Yes, I will send you an example subject and analysis script likely via dropbox later today.

Thank you all for your generous help! @jcohenadad @plbenveniste @zachvav

@jcohenadad
Member Author

Yes, I will send you an example subject and analysis script likely via dropbox later today.

I suggest putting the analysis script in this repository, e.g. under lisa/. Code review and versioning will be much easier that way.

@leelisae
Collaborator

leelisae commented May 6, 2024

@jcohenadad - This is a friendly follow-up re: example code to linearly register SC lesion masks in subject PSIR to subject MT space, then, calculating MTR for ROIs excluding SC lesions. As a reminder, I sent you the example subject data & analysis script via email on Apr 25. Thank you!

@jcohenadad
Member Author

@jcohenadad - This is a friendly follow-up re: example code to linearly register SC lesion masks in subject PSIR to subject MT space, then, calculating MTR for ROIs excluding SC lesions. As a reminder, I sent you the example subject data & analysis script via email on Apr 25. Thank you!

I've created a dedicated issue for this: #91 (the current issue is about something else).
