Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MHub / GC - Add the Node21 baseline model #41

Merged
merged 18 commits into from
Apr 25, 2024

Conversation

silvandeleemput
Copy link
Contributor

@silvandeleemput silvandeleemput commented Aug 2, 2023

This PR adds the GC STOIC baseline model (from the STOIC2021 challenge) to MHub.
GitHub Repo: https://github.com/node21challenge/node21_detection_baseline
GC page: https://node21.grand-challenge.org/

Algorithm I/O

  • Input should be a Chest X-Ray image (MHA) (This MHub implementation expects a Dicom which is converted to MHA using panimg MhaConverter)
  • Outputs are:
    • A JSON file with the bounding boxes and probabilities for the nodules
  • Test DICOM images for this algorithm can be downloaded from IDC:
# To download the files in this manifest, first install s5cmd (https://github.com/peak/s5cmd),
# then run the following command:
# s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com run file_manifest_gcs.s5cmd
cp s3://public-datasets-idc/8fb5dfd4-0bd1-4dc2-a8a7-7a867c40450e/* .
cp s3://public-datasets-idc/3c0bd6df-1772-4bb2-abe5-570665311253/* .

Caveats

  • I added the CR = Meta(mode="CR") # CR Computed Radiography label, which should move to mhubio/core/templates.py, not really sure if this is the best label for Chest X-Rays (CXR)
  • The PR target is main, but should be something like m-gc-node21-baseline
  • In the Dockerfile the MHub model repo integration is currently marked TODO and should be uncommented/edited
  • Added a TODO to the MHAConverter since it doesn't have the panimg backend yet (without this backend it fails for my test image), and it uses a run.py script to use the panimg mhaconverter.

@LennyN95 LennyN95 self-assigned this Aug 13, 2023
silvandeleemput and others added 2 commits September 14, 2023 15:29
* Dockerfile
  * Use mhubio.run as entrypoint
  * Fixed commit for node21 repo
  * Updated TODO on MHub model code integration
* default.yml
  * Changed mhaconverter backend to panimg
* Node21BaselineRunner.py
  * Added comments
* Removed
  * run.py
  * PanImgConverters.py
@silvandeleemput
Copy link
Contributor Author

Updated this PR:

  • Dockerfile
    • Use mhubio.run as entrypoint
    • Fixed commit for node21 repo
    • Updated TODO on MHub model code integration
  • default.yml
    • Changed mhaconverter backend to panimg
  • Node21BaselineRunner.py
    • Added comments
  • Removed
    • run.py
    • PanImgConverters.py

@LennyN95
Copy link
Member

NOTE: This PR proposes a model currently using a dynamic length output. We are working on a method to export dynamic length outputs and will likely introduce a @IO.Out.Data.Many decorator (similar to our @IO.Output -> @IO.Outputs decorators).

@LennyN95 LennyN95 added the Requires MHub-IO Update Only for PR with model suggestions which require extended MHub-IO functionality. label Nov 22, 2023
@LennyN95 LennyN95 added +Model: ACTION REQUIRED and removed Requires MHub-IO Update Only for PR with model suggestions which require extended MHub-IO functionality. labels Nov 23, 2023
@silvandeleemput
Copy link
Contributor Author

@silvandeleemput
Copy link
Contributor Author

silvandeleemput commented Apr 18, 2024

/test

sample:
  idc_version: 17.0
  data:
    - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.6279.6001.210102868760281756294235082201
      aws_url: s3://idc-open-data/b1c859c3-7471-43e7-9882-0bb5420ec18a/*
      path: dicom

reference:
  url: https://github.com/MHubAI/models/files/15024198/gc_node21_baseline_output.zip

Test Results (24.04.18_22.31.12_E3krCTzQUg)
id: e8377a1b-338f-4fab-8647-fe4b72a250be
date: '2024-04-18 22:36:27'
checked_files:
- file: nodules.json
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.6279.6001.210102868760281756294235082201/nodules.json
  checks:
  - checker: DataFileCheck
    notes:
    - label: Value Match
      description: These keys have identical values
      info: type,boxes.[0].corners.[0].[0],boxes.[0].corners.[0].[1],boxes.[0].corners.[0].[2],boxes.[0].corners.[1].[0],boxes.[0].corners.[1].[1],boxes.[0].corners.[1].[2],boxes.[0].corners.[2].[0],boxes.[0].corners.[2].[1],boxes.[0].corners.[2].[2],boxes.[0].corners.[3].[0],boxes.[0].corners.[3].[1],boxes.[0].corners.[3].[2],boxes.[0].probability,boxes.[1].corners.[0].[0],boxes.[1].corners.[0].[1],boxes.[1].corners.[0].[2],boxes.[1].corners.[1].[0],boxes.[1].corners.[1].[1],boxes.[1].corners.[1].[2]
        (+ 31 more)
    findings:
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[0].[0]' is similar in source file
      subpath: boxes.[2].corners.[0].[0]
      info:
        scale: 10
        precision: -1
        src: 56.17
        ref: 52.17
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[0].[1]' is similar in source file
      subpath: boxes.[2].corners.[0].[1]
      info:
        scale: 100
        precision: -1
        src: 210.43
        ref: 259.47
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[1].[0]' is similar in source file
      subpath: boxes.[2].corners.[1].[0]
      info:
        scale: 1
        precision: -1
        src: 48.52
        ref: 31.0
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[1].[1]' is similar in source file
      subpath: boxes.[2].corners.[1].[1]
      info:
        scale: 100
        precision: -1
        src: 210.43
        ref: 259.47
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[2].[0]' is similar in source file
      subpath: boxes.[2].corners.[2].[0]
      info:
        scale: 1
        precision: -1
        src: 48.52
        ref: 31.0
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[2].[1]' is similar in source file
      subpath: boxes.[2].corners.[2].[1]
      info:
        scale: 100
        precision: -1
        src: 201.47
        ref: 237.66
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[3].[0]' is similar in source file
      subpath: boxes.[2].corners.[3].[0]
      info:
        scale: 10
        precision: -1
        src: 56.17
        ref: 52.17
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[3].[1]' is similar in source file
      subpath: boxes.[2].corners.[3].[1]
      info:
        scale: 100
        precision: -1
        src: 201.47
        ref: 237.66
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[0].[0]' is similar in source file
      subpath: boxes.[3].corners.[0].[0]
      info:
        scale: 10
        precision: -1
        src: 52.16
        ref: 56.17
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[0].[1]' is similar in source file
      subpath: boxes.[3].corners.[0].[1]
      info:
        scale: 100
        precision: -1
        src: 259.48
        ref: 210.43
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[1].[0]' is similar in source file
      subpath: boxes.[3].corners.[1].[0]
      info:
        scale: 1
        precision: -1
        src: 31.0
        ref: 48.52
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[1].[1]' is similar in source file
      subpath: boxes.[3].corners.[1].[1]
      info:
        scale: 100
        precision: -1
        src: 259.48
        ref: 210.43
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[2].[0]' is similar in source file
      subpath: boxes.[3].corners.[2].[0]
      info:
        scale: 1
        precision: -1
        src: 31.0
        ref: 48.52
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[2].[1]' is similar in source file
      subpath: boxes.[3].corners.[2].[1]
      info:
        scale: 100
        precision: -1
        src: 237.66
        ref: 201.47
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[3].[0]' is similar in source file
      subpath: boxes.[3].corners.[3].[0]
      info:
        scale: 10
        precision: -1
        src: 52.16
        ref: 56.17
    - label: Value similar
      description: Value of key 'boxes.[3].corners.[3].[1]' is similar in source file
      subpath: boxes.[3].corners.[3].[1]
      info:
        scale: 100
        precision: -1
        src: 237.66
        ref: 201.47
    - label: Value similar
      description: Value of key 'boxes.[4].probability' is similar in source file
      subpath: boxes.[4].probability
      info:
        scale: 1
        precision: 1
        src: 0.06
        ref: 0.05
  - checker: SizeCheck
summary:
  files_missing: 0
  files_extra: 0
  checks:
    DataFileCheck:
      files: 1
      findings:
        Value similar: 17
    SizeCheck:
      files: 1
conclusion: false

Test Results (24.04.25_08.24.29_0ZZ5RYzDlk)
id: 12aafd5a-a277-4f27-a5c7-708d5670cf91
date: '2024-04-25 08:30:39'
checked_files:
- file: nodules.json
  path: /app/test/src/1.3.6.1.4.1.14519.5.2.1.6279.6001.210102868760281756294235082201/nodules.json
  checks:
  - checker: DataFileCheck
    notes:
    - label: Value Match
      description: These keys have identical values
      info: type,boxes.[0].corners.[0].[0],boxes.[0].corners.[0].[1],boxes.[0].corners.[0].[2],boxes.[0].corners.[1].[0],boxes.[0].corners.[1].[1],boxes.[0].corners.[1].[2],boxes.[0].corners.[2].[0],boxes.[0].corners.[2].[1],boxes.[0].corners.[2].[2],boxes.[0].corners.[3].[0],boxes.[0].corners.[3].[1],boxes.[0].corners.[3].[2],boxes.[0].probability,boxes.[1].corners.[0].[0],boxes.[1].corners.[0].[1],boxes.[1].corners.[0].[2],boxes.[1].corners.[1].[0],boxes.[1].corners.[1].[1],boxes.[1].corners.[1].[2]
        (+ 43 more)
    findings:
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[0].[0]' is similar in source file
      subpath: boxes.[2].corners.[0].[0]
      info:
        scale: 1
        precision: 1
        src: 52.16
        ref: 52.17
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[0].[1]' is similar in source file
      subpath: boxes.[2].corners.[0].[1]
      info:
        scale: 1
        precision: 1
        src: 259.48
        ref: 259.47
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[1].[1]' is similar in source file
      subpath: boxes.[2].corners.[1].[1]
      info:
        scale: 1
        precision: 1
        src: 259.48
        ref: 259.47
    - label: Value similar
      description: Value of key 'boxes.[2].corners.[3].[0]' is similar in source file
      subpath: boxes.[2].corners.[3].[0]
      info:
        scale: 1
        precision: 1
        src: 52.16
        ref: 52.17
    - label: Value similar
      description: Value of key 'boxes.[4].probability' is similar in source file
      subpath: boxes.[4].probability
      info:
        scale: 1
        precision: 1
        src: 0.06
        ref: 0.05
  - checker: SizeCheck
summary:
  files_missing: 0
  files_extra: 0
  checks:
    DataFileCheck:
      files: 1
      findings:
        Value similar: 5
    SizeCheck:
      files: 1
conclusion: false

@silvandeleemput
Copy link
Contributor Author

/review

We updated the meta.json file, we implemented the suggestion for the OutputDatas (probabilities, and bounding boxes as JSON), and we have wrapped the algorithm/model in a CLI to avoid stdout capture issues. It should be as good as done, please have a look.

@github-actions github-actions bot added the REQUEST REVIEW Attach this label to your PR when your submission is "in progress" and is ready to be reviewed by us label Apr 18, 2024
Copy link
Member

@LennyN95 LennyN95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the structure clean, the utils folder ideally only contains any MHub-IO related utilities (e.g., custom Modules such as the Node21BaselineRunner) but no model implementation code. All model implementation should go into an src folder, all additional resources into an res folder.

Making such a change is purely organizational (although for the utils folder, it is somewhat meaningful as we parse this folder to detect available Modules).

However, in case you have to make any code adjustments and we have to redo the test, I'd ask for this change.

Copy link
Member

@LennyN95 LennyN95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failed and showed (high) variation in the bounding boxes. This needs to be investigated before we can proceed.

@silvandeleemput
Copy link
Contributor Author

I investigated this issue and found the root cause for the large bounding box differences.

Rerunning on my machine consistently gave me the same output as I have uploaded before.
I have 5 detected nodules with each 4 corners (3 values x,y,z=0) and a probability.

Comparing these with your test results I found:

  • The number of detected bounding boxes matches the number I found on my side (5). This seems ok.
  • All found probabilities match pretty closely. (some small deviations are expected because of GPU hardware/driver differences and rounding errors). These seem fine.
  • Bounding boxes 0, 1, and 4 seem fine and the same as what I found.
  • Bounding boxes 2 & 3 are swapped, having the same probability! Giving rise to the big found differences. It seems to be a sorting issue.

I can fix this by applying a better sorting algorithm after the output is generated to produce a more consistent output JSON file. This can be added to the cli.py and go together with a small refactor of the cli.py file to a /src directory. This should be under /models/gc_node21_baseline/src/ then ?

@silvandeleemput
Copy link
Contributor Author

@LennyN95 alright, I have added the improved sorting, moved the cli.py to a src directory, and updated the model version to 1.1, both in the meta.json and the Dockerfile. The previous test can be redone and should produce better results because the sorting issue is fixed. This model should be as good as done.

@silvandeleemput
Copy link
Contributor Author

I looked at the new result generated after adding the sorting fix, and the sorting works now: the same bounding boxes are found and sorted in the correct order. The found differences this time around are only small rounding errors for both the coordinates of the bounding boxes and for the probabilities, which are expected. This result is acceptable.

@LennyN95 LennyN95 removed REQUEST REVIEW Attach this label to your PR when your submission is "in progress" and is ready to be reviewed by us TEST REQUESTED labels Apr 25, 2024
Copy link
Member

@LennyN95 LennyN95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test passed (rounding errors expected, see comment from the model contributor).

@LennyN95 LennyN95 merged commit 2df5376 into MHubAI:main Apr 25, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants