Merge pull request #70 from neuropoly/nm/add_sacrum_pipeline

Add sacrum generation to main

Showing 14 changed files with 1,603 additions and 24 deletions.
# Guide to generate sacrum masks

This file describes the steps that were carried out to generate sacrum masks for the MRI totalspineseg project.

More information about the context and problems can be found in this [issue](https://github.com/neuropoly/totalspineseg/issues/18).

The main idea was to use open-source datasets with "easy to make" sacrum masks on MRI (T1w and T2w) scans to train a [nnUNetV2](https://github.com/MIC-DKFZ/nnUNet) model able to segment sacrums on the whole-spine dataset and the spider dataset.

# Training

If you want to retrain the model, you can follow these steps.

## I - Download the training datasets

To generate the sacrum masks, three open-source datasets were used:

| [GoldAtlas](https://zenodo.org/records/583096) | [SynthRAD2023](https://aapm.onlinelibrary.wiley.com/doi/full/10.1002/mp.16529) | [MRSpineSeg](https://paperswithcode.com/dataset/mrspineseg-challenge) |
| :---: | :---: | :---: |
| <img width="780" alt="Screenshot 2024-01-02 at 5 10 39 AM" src="https://github.com/neuropoly/totalsegmentator-mri/assets/68945192/a324ba05-1118-4eb3-bd4f-f9aabd077477"> | <img width="628" alt="Screenshot 2024-01-02 at 5 10 53 AM" src="https://github.com/neuropoly/totalsegmentator-mri/assets/68945192/10ddd780-ec42-4540-a091-19d2b2dc3e53"> | <img width="671" alt="Screenshot 2024-01-02 at 5 11 19 AM" src="https://github.com/neuropoly/totalsegmentator-mri/assets/68945192/a0069483-ad59-48bd-9c3e-a436888a39d7"> |

> These datasets were chosen because they either had sacrum masks available or had co-registered MRI and CT images that allowed us to rely on the [CT total segmentator network](https://github.com/wasserth/TotalSegmentator) to generate these labels.

These datasets were BIDSified and stored on our internal servers:
- SynthRAD2023 (internal access: `[email protected]:datasets/synthrad-challenge-2023.git`)
- MRSpineSeg (internal access: `[email protected]:datasets/mrspineseg-challenge-2021.git`)
- GoldAtlas (internal access: `[email protected]:datasets/goldatlas.git`)
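As a rough sketch of the BIDS layout assumed in the rest of this guide (subject and file names are illustrative, not taken from the actual datasets), images sit under `sub-*/anat` and sacrum masks under `derivatives/labels`:

```
goldatlas/
├── sub-01/
│   └── anat/
│       └── sub-01_T2w.nii.gz
└── derivatives/
    └── labels/
        └── sub-01/
            └── anat/
                └── sub-01_T2w_seg.nii.gz
```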
## II - Register CT labels to MRI

> For this step, the [Spinal Cord Toolbox (SCT)](https://github.com/spinalcordtoolbox/spinalcordtoolbox) was used.

As specified before, some sacrum masks were generated using the [CT total segmentator network](https://github.com/wasserth/TotalSegmentator), but due to slightly different image shapes between the MRI (T1w and T2w) and CT scans (see this [issue](https://github.com/neuropoly/totalspineseg/issues/18)), the CT segmentations were registered to the MRI space. To do that, the script `totalspineseg/utils/register_CT_to_MR.py` was used on the three datasets.

> Registration was also performed on the dataset `MRSpineSeg` due to slightly different q-form and s-form between segmentations and images.

```bash
python "$TOTALSPINESEG/totalspineseg/utils/register_CT_to_MR.py" --path-img <PATH-TO-BIDS-FOLDER>
```
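For example, assuming the three BIDSified datasets sit side by side in one parent folder and keep the folder names of their git repositories (an assumption; adapt the names to your setup), registration can be run on each dataset in turn:

```bash
# Dataset folder names below are illustrative, based on the repository names listed above.
for dataset in goldatlas synthrad-challenge-2023 mrspineseg-challenge-2021; do
    python "$TOTALSPINESEG/totalspineseg/utils/register_CT_to_MR.py" --path-img "$dataset"
done
```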
## III - Generate a config file to select the data for training

To select the data used for training, a [config file](https://github.com/spinalcordtoolbox/disc-labeling-hourglass/issues/25#issuecomment-1695818382) was used.

First, fetch the paths to all the sacrum masks that will be used for TRAINING/VALIDATION/TESTING. The datasets should be stored inside the same parent folder.

> Run the following command in the parent folder of the datasets.

```bash
find ~+ -type f -name "*_seg.nii.gz" | grep -v CT | sort > train_sacrum.txt
```
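Each line of `train_sacrum.txt` is then an absolute path to one sacrum mask (`~+` expands to the absolute path of the current directory); the subject and file names below are hypothetical:

```
/home/user/data/goldatlas/derivatives/labels/sub-01/anat/sub-01_T2w_seg.nii.gz
/home/user/data/mrspineseg-challenge-2021/derivatives/labels/sub-02/anat/sub-02_T2w_seg.nii.gz
```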
Then run this command to generate the JSON config file:

```bash
python "$TOTALSPINESEG/totalspineseg/data_management/init_data_config.py" --txt train_sacrum.txt --type LABEL --split-validation SPLIT_VAL --split-test SPLIT_TEST
```

where `SPLIT_VAL` is the fraction of the data used for validation and `SPLIT_TEST` is the fraction of the data used for testing.
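For example, to hold out 10% of the data for validation and 10% for testing (the decimal values are illustrative, assuming the fractions are passed as decimals):

```bash
python "$TOTALSPINESEG/totalspineseg/data_management/init_data_config.py" --txt train_sacrum.txt --type LABEL --split-validation 0.1 --split-test 0.1
```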
Finally, to organize your data according to the nnUNetV2 format, run this last command:

```bash
export nnUNet_raw="$TOTALSPINESEG_DATA"/nnUNet/raw
python "$TOTALSPINESEG/totalspineseg/data_management/convert_config_to_nnunet.py" --config train_sacrum.json --path-out "$nnUNet_raw" -dnum 300
```
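The dataset number `300` matches the `Dataset300_SacrumDataset` identifier used by the preprocessing, training, and inference steps below. After conversion, `$nnUNet_raw` should contain the usual nnUNetV2 raw layout for that dataset (a sketch, assuming nnUNet's standard conventions):

```
$nnUNet_raw/Dataset300_SacrumDataset/
├── dataset.json
├── imagesTr/
└── labelsTr/
```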
## IV - Train with nnUNetV2

> Regarding nnUNetV2 installation and general usage, please check https://github.com/ivadomed/utilities/blob/main/quick_start_guides/nnU-Net_quick_start_guide.md

Now that your data is ready, you can run nnUNetV2 preprocessing:

```bash
export nnUNet_preprocessed="$TOTALSPINESEG_DATA"/nnUNet/preprocessed
export nnUNet_results="$TOTALSPINESEG_DATA"/nnUNet/results/sacrum
nnUNetv2_plan_and_preprocess -d 300 --verify_dataset_integrity -c 3d_fullres
```
Then train using this command:

```bash
CUDA_VISIBLE_DEVICES=<GPU_ID> nnUNetv2_train 300 3d_fullres 0
```
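The trailing `0` selects fold 0 of nnUNetV2's 5-fold cross-validation; this is also the fold the inference script below loads by default. To train all five folds instead, a minimal sketch:

```bash
for fold in 0 1 2 3 4; do
    CUDA_VISIBLE_DEVICES=<GPU_ID> nnUNetv2_train 300 3d_fullres "$fold"
done
```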
# Inference on the whole-spine and spider datasets

To run nnUNetV2's inference, keep only the largest connected component, and store the data according to the BIDS standard, you must run:

> Before running, you must download the datasets `whole-spine` and `spider-challenge-2023` and update the variable `DATASETS_PATH` inside the config file `totalspineseg/resources/configs/test_sacrum.json`.
> This path corresponds to the parent folder of the two datasets `whole-spine` and `spider-challenge-2023`.

```bash
bash "$TOTALSPINESEG/scripts/generate_sacrum_masks/generate_sacrum.sh"
```
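The script reads two keys from `test_sacrum.json` with `jq`: `DATASETS_PATH` (the parent folder mentioned above) and a `TESTING` list of image paths relative to it. A minimal sketch of the config, with hypothetical subject paths:

```json
{
    "DATASETS_PATH": "/path/to/parent/folder",
    "TESTING": [
        "whole-spine/sub-001/anat/sub-001_T2w.nii.gz",
        "spider-challenge-2023/sub-001/anat/sub-001_T2w.nii.gz"
    ]
}
```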
#!/bin/bash

# This script calls nnUNetV2's inference to generate sacrum masks using a JSON config file (see totalspineseg/resources/configs) and saves labels following BIDS' convention.

# The following variables and paths MUST be updated before running the script:
# - PATH_CONFIG: to the config file `test_sacrum.json`
# - DERIVATIVE_FOLDER: name of the derivative folder (default=labels)
# - PATH_REPO: to the repository
# - PATH_NNUNET_MODEL: to the nnunet model Dataset300_SacrumDataset
# - AUTHOR: the author

# The totalspineseg environment must be activated before running the script.

# Uncomment for full verbose
# set -x

# Immediately exit if error
set -e -o pipefail

# Exit if user presses CTRL+C (Linux) or CMD+C (OSX)
trap "echo Caught Keyboard Interrupt within script. Exiting now.; exit" INT
# GET PARAMS
# ======================================================================================================================
# SET DEFAULT VALUES FOR PARAMETERS.
# ----------------------------------------------------------------------------------------------------------------------
PATH_CONFIG="$TOTALSPINESEG/totalspineseg/resources/configs/test_sacrum.json"

LABEL_SUFFIX="_label-sacrum_seg"
PATH_REPO="$TOTALSPINESEG"
NNUNET_RESULTS="$TOTALSPINESEG_DATA/nnUNet/results/sacrum"
NNUNET_EXPORTS="$TOTALSPINESEG_DATA/nnUNet/exports"
NNUNET_MODEL="Dataset300_SacrumDataset"
PATH_NNUNET_MODEL="$NNUNET_RESULTS/$NNUNET_MODEL/nnUNetTrainer__nnUNetPlans__3d_fullres/"
ZIP_URL="https://github.com/neuropoly/totalspineseg/releases/download/sacrum-seg/Dataset300_SacrumDataset.zip"
PROCESS="nnUNet3D"
DERIVATIVE_FOLDER="labels"
FOLD="0"

# Print variables to allow easier debug
echo "See variables:"
echo "PATH_CONFIG: ${PATH_CONFIG}"
echo "DERIVATIVE_FOLDER: ${DERIVATIVE_FOLDER}"
echo "LABEL_SUFFIX: ${LABEL_SUFFIX}"
echo
echo "PATH_REPO: ${PATH_REPO}"
echo "NNUNET_RESULTS: ${NNUNET_RESULTS}"
echo "NNUNET_EXPORTS: ${NNUNET_EXPORTS}"
echo "NNUNET_MODEL: ${NNUNET_MODEL}"
echo "FOLD: ${FOLD}"
echo
# FUNCTIONS
# ======================================================================================================================
# Segment sacrum using our nnUNet model
segment_sacrum_nnUNet(){
    local file_in="$1"
    local file_out="$2"
    local nnunet_model="$3"
    local fold="$4"

    # Call python function
    python3 "${PATH_REPO}"/totalspineseg/utils/run_nnunet_inference_single_subject.py -i "${file_in}" -o "${file_out}" -path-model "${nnunet_model}" -fold "${fold}" -use-gpu -use-best-checkpoint
}

# Generate a json sidecar file
generate_json(){
    local path_json="$1"
    local process="$2"

    # Call python function
    python3 "${PATH_REPO}"/totalspineseg/utils/create_json_sidecar.py -path-json "${path_json}" -process "${process}"
}

# Keep largest component only
keep_largest_component(){
    local seg_in="$1"
    local seg_out="$2"

    # Call python function
    python3 "${PATH_REPO}"/totalspineseg/utils/largest_component_filewise.py --seg-in "${seg_in}" --seg-out "${seg_out}"
}

# Download and install the nnUNet model weights
download_weights(){
    local dataset="$1"
    local url="$2"
    local results_path="$3"
    local exports_path="$4"

    # Call installer entry point
    totalspineseg_install_weights --nnunet-dataset "${dataset}" --zip-url "${url}" --results-folder "${results_path}" --exports-folder "${exports_path}"
}
# ======================================================================================================================
# SCRIPT STARTS HERE
# ======================================================================================================================
# Fetch datasets path
DATASETS_PATH=$(jq -r '.DATASETS_PATH' "${PATH_CONFIG}")

# Go to folder where data will be copied and processed
cd "$DATASETS_PATH"

# Fetch TESTING files
FILES=$(jq -r '.TESTING[]' "${PATH_CONFIG}")

# Download and install nnUNet weights
download_weights "$NNUNET_MODEL" "$ZIP_URL" "$NNUNET_RESULTS" "$NNUNET_EXPORTS"

# Loop across the files (word splitting on $FILES is intended: one path per line)
for FILE_PATH in $FILES; do
    # Derive the output path from the input path:
    # <BIDS_FOLDER>/<SUB_PATH>/<IN_FILE_NAME> -> <BIDS_FOLDER>/derivatives/<DERIVATIVE_FOLDER>/<SUB_PATH>/<OUT_FILE_NAME>
    BIDS_FOLDER=$(echo "$FILE_PATH" | cut -d / -f 1)
    IN_FILE_NAME=$(echo "$FILE_PATH" | awk -F / '{print $NF}')
    OUT_FILE_NAME=${IN_FILE_NAME/".nii.gz"/"${LABEL_SUFFIX}.nii.gz"}
    IMG_PATH=${FILE_PATH/"${BIDS_FOLDER}/"/}
    SUB_PATH=${IMG_PATH/"/${IN_FILE_NAME}"/}
    BIDS_DERIVATIVES="${BIDS_FOLDER}/derivatives/${DERIVATIVE_FOLDER}"
    OUT_FOLDER="${BIDS_DERIVATIVES}/${SUB_PATH}"
    OUT_PATH="${OUT_FOLDER}/${OUT_FILE_NAME}"

    # Create DERIVATIVES_FOLDER if missing
    if [[ ! -d ${OUT_FOLDER} ]]; then
        echo "Creating folders $OUT_FOLDER"
        mkdir -p "${OUT_FOLDER}"
    fi

    # Generate output segmentation
    echo "Generate segmentation ${FILE_PATH} ${OUT_PATH}"
    segment_sacrum_nnUNet "$FILE_PATH" "$OUT_PATH" "$PATH_NNUNET_MODEL" "$FOLD"
    keep_largest_component "$OUT_PATH" "$OUT_PATH"

    # Generate JSON sidecar
    JSON_PATH=${OUT_PATH/".nii.gz"/".json"}
    echo "Generate JSON sidecar ${JSON_PATH}"
    generate_json "$JSON_PATH" "$PROCESS"

done