Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MHub / GC - Add GC nnUNet Pancreas model #39

Merged
merged 25 commits into from
Feb 28, 2024

Conversation

silvandeleemput
Copy link
Contributor

@silvandeleemput silvandeleemput commented Aug 1, 2023

This PR contains the required code for the GC nnUNet Pancreas model.
GitHub Repo: https://github.com/DIAGNijmegen/CE-CT_PDAC_AutomaticDetection_nnUnet
GC page: https://grand-challenge.org/algorithms/pdac-detection/

Algorithm I/O

  • Input should be a CT Abdomen image (MHA) (This MHub implementation expects a Dicom which is converted to MHA using MhaPanImgConverter)
  • Outputs are:
    • MHA and DICOMSEG Segmentation with: 1-veins, 2-arteries, 3-pancreas, 4-pancreatic duct, 5-bile duct, 6-cysts, 7-renal vein
    • A heatmap of the pancreatic tumor likelihood
    • Case-level pancreatic tumor likelihood value, in range [0, 1]
  • Test images for this algorithm can be downloaded from here: https://zenodo.org/record/3431873
  • IDC test image:
Dataset: CTpred-Sunitinib-panNET

Case: PAN_14

SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112

s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/5475d74e-33db-497e-ace0-00faba1a6a9a/*' .

Caveats

  • The PR target is main, but should be something like m-gc-nnunet-pancreas
  • In the Dockerfile the MHub model repo integration is currently marked TODO since it requires the creation of the appropriate branch for this code first.
  • This implementation still uses a run.py script because the MHAConverter doesn't have the panimg backend yet.
  • I added a temporary HEATMAP = Meta(mod="heatmap") which might be moved to MHubio/core/templates.py
  • The segmentation regions for this algorithm are not yet part of dseg repo, hence I have added a dseg.json file

…, mhaimporter and specific gcnnunetpancreasrunner

* Fixed Dockerfile with nocuda for now using the new base image
* Added two utils for this runner (MHAImporter, custom GCnnunetPancreasRunner)
* Marked bunch of things as TODO
@LennyN95
Copy link
Member

LennyN95 commented Aug 2, 2023

Hi Sil,

Great work!

Some comments:

  • We changed our strategy and ditch the model branches. We'll still use those for creation/patching of models but in the Docker container you can fetch the model from the main branch.
  • You can use the run module instead of the scripts for custom modules (like converters) as long as they're placed in the utils folder and the script name matches the class name

We'll inspect the segmentations and update them to SegDB

@silvandeleemput
Copy link
Contributor Author

  • We changed our strategy and ditch the model branches. We'll still use those for creation/patching of models but in the Docker container you can fetch the model from the main branch.

Great, I'll target the main branch instead.

  • You can use the run module instead of the scripts for custom modules (like converters) as long as they're placed in the utils folder and the script name matches the class name

Ah, I found that while the Runner modules get automatically picked up, this doesn't happen with converters (I added it as a * import to the __init__.py. Maybe you can check this? It might have something to do with Module class versus Converter class.

@LennyN95
Copy link
Member

Converters will be picked up automatically only if the file and class names are identical (which implies a one-module per class policy).

@LennyN95 LennyN95 self-assigned this Aug 13, 2023
@silvandeleemput
Copy link
Contributor Author

Converters will be picked up automatically only if the file and class names are identical (which implies a one-module per class policy).

Okay, that makes sense. On a related note, what is the current status of using panimg as a backend for conversion using the MhaConverter?

@LennyN95
Copy link
Member

LennyN95 commented Aug 29, 2023

I just updated our MHA conversion module. You now can specify engine: plastimatch|panimg in the config file to use panimg as conversion backend ;)

  MhaConverter:
    engine: plastimatch

@silvandeleemput
Copy link
Contributor Author

silvandeleemput commented Sep 14, 2023

I have updated this PR:

  • Uses the panimg backend now for the MhaConverter
  • Repository version has been pinned to a commit hash
  • Added more comments for clarity

Some open issues:

  • In the Dockerfile the MHub model repo integration is currently marked TODO since this still has to be tested after integration
  • A HEATMAP = Meta(mod="heatmap") is still marked as TODO in the runner file which should probably be moved to MHubio/core/templates.py
  • The segmentation regions for this algorithm are not yet part of dseg repo, hence I have added a dseg.json file, will these be added?

@LennyN95 LennyN95 added the Requires SegDB Update For segmentation models that delineate additional structures currently not covered by our SegDB. label Nov 22, 2023
@LennyN95
Copy link
Member

LennyN95 commented Nov 22, 2023

The algorithm delineates the following ROI.

NOTE: We will add them to our SegDB shortly. Some need clarification (what is delineated, in what context, complete/incomplete, etc.). A visual demonstration would be ideal to ensure correct usage of SCT codes.

  • Veins (needs clarification)
  • Artery, right and left (needs clarification)
  • Pancreas (SegDB ID PANCREAS)
  • Pancreatic duct (SCT 69930009)
  • Bile duct (SCT 28273000)
  • Cysts (SegDB ID CYST; usage in context, e.g. pancreatic cyst: PANCREAS+CYST)
  • Renal vein (SCT 56400007)

@silvandeleemput
Copy link
Contributor Author

Some of the algorithm segmentation output viewed on GC:
nnunet_pancreas_seg_labels

I think that for both the veins and for the arteries there is no distinction between left and right, just one broad class for each.
Veins and arteries seem incomplete in the sagittal direction (top and bottom).

The area of focus is the pancreas enclosed in a bounding box with some margins for the other tissues.

@LennyN95
Copy link
Member

Not sure if there is (great) value in adding the incomplete structures, even on a general SCT.
I'd focus on the complete ones and regard the others (they will still be accessible through the original model output files).

@fedorov what do you think?

@LennyN95
Copy link
Member

LennyN95 commented Dec 1, 2023

Added the following IDs to SegDB: PANCREATIC_DUCT, BILE_DUCT, RENAL_VEIN.

@LennyN95 LennyN95 added +Model: ACTION REQUIRED and removed Requires SegDB Update For segmentation models that delineate additional structures currently not covered by our SegDB. labels Dec 1, 2023
@LennyN95
Copy link
Member

LennyN95 commented Dec 7, 2023

I guess you are using 3D Slicer for viewing?

Yes, that's the 3D render of the DicomSEG in Slicer.

Do we want to include Arteries and Veins?

EDIT: I just added them to SegDB, you can use ARTERY and VEIN as unspecific general IDs.

models/gc_nnunet_pancreas/config/default.yml Show resolved Hide resolved
models/gc_nnunet_pancreas/config/default.yml Outdated Show resolved Hide resolved
models/gc_nnunet_pancreas/config/default.yml Show resolved Hide resolved
models/gc_nnunet_pancreas/config/default.yml Show resolved Hide resolved
models/gc_nnunet_pancreas/dockerfiles/Dockerfile Outdated Show resolved Hide resolved
models/gc_nnunet_pancreas/dockerfiles/Dockerfile Outdated Show resolved Hide resolved
models/gc_nnunet_pancreas/meta.json Outdated Show resolved Hide resolved
models/gc_nnunet_pancreas/utils/cli.py Show resolved Hide resolved
@silvandeleemput
Copy link
Contributor Author

Hi, we spoke to Natália today and she says that we should exclude the Bile Duct, Cysts, and Renal vein outputs, as these are not properly validated segmentations. The evaluation of Natália's algorithm was focused on the cancer probability maps and specifically the maximum value of those for the overall case cancer probability. We will add the overall case cancer probability statistic as an output, remove the bile duct cysts and renal veins from the output DICOM segmentation, and add some comments about the limitations of the original output segmentation.

@silvandeleemput
Copy link
Contributor Author

/review

This model has been updated significantly and needs to be reviewed again. A new test file will be provided that contains all the output segmentations for the segmentation output map and also the new case-level likelihood value. I'll send it via Slack.

@github-actions github-actions bot added the REQUEST REVIEW Attach this label to your PR when your submission is "in progress" and is ready to be reviewed by us label Jan 18, 2024
@LennyN95 LennyN95 removed the REQUEST REVIEW Attach this label to your PR when your submission is "in progress" and is ready to be reviewed by us label Jan 25, 2024
@silvandeleemput
Copy link
Contributor Author

output.zip

@silvandeleemput
Copy link
Contributor Author

silvandeleemput commented Feb 2, 2024

/test

sample:
  idc_version: 17
  data:
    - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112
      aws: 5475d74e-33db-497e-ace0-00faba1a6a9a
      path: dicom

reference:
  url: https://github.com/MHubAI/models/files/14125409/output.zip

Test Results (24.02.08_14.58.03_52I9eOKFVS)
checked_files:
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas.seg.mha
  checker: ImageFileCheck
  findings:
  - label: Dice Score
    description: Dice score between reference and test image
    info: 0.999966
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas.seg.mha
  checker: SizeCheck
  findings:
  - label: Size Difference
    description: file size is smaller than reference
    info:
      src_size: 83269
      ref_size: 83270
      diff_size: -1
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas.seg.dcm
  checker: DicomsegContentCheck
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas_case_level_likelihood.json
  checker: DataFileCheck
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas_case_level_likelihood.json
  checker: SizeCheck
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas_heatmap.mha
  checker: ImageFileCheck
  findings:
  - label: Exception
    description: An exception occurred during check
    info: 'could not convert string to float: '''''
- path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.98806944896875131873924657854330253112/nnunet_pancreas_heatmap.mha
  checker: SizeCheck
  findings:
  - label: Size Difference
    description: file size is smaller than reference
    info:
      src_size: 9389151
      ref_size: 9389181
      diff_size: -30
summary:
  files_missing: 0
  files_extra: 0
  checks:
    ImageFileCheck:
      files: 2
      findings:
        Dice Score: 1
        Exception: 1
    SizeCheck:
      files: 3
      findings:
        Size Difference: 2
    DicomsegContentCheck:
      files: 1
    DataFileCheck:
      files: 1
conclusion: false

@github-actions github-actions bot added the REQUEST TEST Attach this label to your PR when your submission is ready to be tested by us. label Feb 2, 2024
@silvandeleemput
Copy link
Contributor Author

output.zip

@silvandeleemput
Copy link
Contributor Author

silvandeleemput commented Feb 8, 2024

/test

sample:
  idc_version: 17.0
  data:
    - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.237127321134041099456397194653519856815
      aws_url: s3://idc-open-data/be1ba92c-7476-4853-a305-47c8e3e32660/*
      path: dicom

reference:
  url: https://github.com/MHubAI/models/files/14211021/output.zip

Test Results (24.02.08_19.39.01_bQJenUOogX)
checked_files:
- file: nnunet_pancreas.seg.mha
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.237127321134041099456397194653519856815/nnunet_pancreas.seg.mha
  checks:
  - checker: ImageFileCheck
    notes:
    - label: Data Type
      description: Data type of the reference image
      info: uint8
    - label: Dice Score
      description: Dice score between reference and test image
      info: 0.999981
- file: nnunet_pancreas.seg.dcm
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.237127321134041099456397194653519856815/nnunet_pancreas.seg.dcm
  checks:
  - checker: DicomsegContentCheck
    notes:
    - label: Segment Count
      description: The number of segments identified in the inspected dicomseg file.
      info: 5
- file: nnunet_pancreas_case_level_likelihood.json
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.237127321134041099456397194653519856815/nnunet_pancreas_case_level_likelihood.json
  checks:
  - checker: DataFileCheck
  - checker: SizeCheck
- file: nnunet_pancreas_heatmap.mha
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.237127321134041099456397194653519856815/nnunet_pancreas_heatmap.mha
  checks:
  - checker: ImageFileCheck
    notes:
    - label: Data Type
      description: Data type of the reference image
      info: float32
    - label: Image Diff Stat
      description: Statistics of the diff image between src and ref.
      info:
        MIN: -0.000488
        AVE: 0.0
        MAX: 0.000488
        MAE: 0.0
        MSE: 0.0
        DIF: 1082396.0
        NUM: 77856768.0
summary:
  files_missing: 0
  files_extra: 0
  checks:
    ImageFileCheck:
      files: 2
    DicomsegContentCheck:
      files: 1
    DataFileCheck:
      files: 1
    SizeCheck:
      files: 1
conclusion: true

@fedorov
Copy link
Member

fedorov commented Feb 8, 2024

@LennyN95 I missed your comment below on slack, and it struck me just now that you changed the format of how AWS path is communicated.

image

If you see my response, I say you need the URL. The UUID is not sufficient to locate the folder in AWS.

image

@fedorov
Copy link
Member

fedorov commented Feb 8, 2024

If you want to use just the UUID, you will need: 1) change the label from "aws" to "crdc_series_uuid" - since that is what it is; 2) add a step in your workflow to resolve crdc_series_uuid to AWS URL via idc-index. IDC images are spread across several AWS buckets, that is why you cannot use the fixed S3 bucket prefix.

@LennyN95 LennyN95 removed the REQUEST TEST Attach this label to your PR when your submission is ready to be tested by us. label Feb 27, 2024
@LennyN95 LennyN95 merged commit 3119ba8 into MHubAI:main Feb 28, 2024
1 check passed
@silvandeleemput silvandeleemput deleted the m-gc-nnunet-pancreas branch March 5, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants