MHub / GC - Add grt123 Model for lung cancer prediction based on lung nodules #27

silvandeleemput · 2023-07-04T13:18:28Z

Hi, this PR contains the code required to add our modified version of the publicly available grt123 model, winner of the Data Science Bowl 2017 Kaggle challenge, to MHub.

Caveats

The PR target is main, but should be something like m-gc-grt123-lung-cancer
In the Dockerfile the MHub model integration is currently marked TODO since it requires the creation of the appropriate branch for this code first.
This implementation still uses a run.py script because the MHAConverter doesn't have the panimg backend yet.

Algorithm I/O

Input should be a CT lung image (MHA). This MHub implementation expects DICOM input, which is converted to MHA using MhaPanImgConverter.
Output is a JSON file containing the predicted nodule findings (locations) and predicted cancer probability scores.
Test images for this algorithm can be found here

silvandeleemput · 2023-08-01T11:50:48Z

This PR has been updated for the new base image.

LennyN95 · 2023-08-13T13:16:31Z

Some suggestions

For the final prediction score, add a ValueOutput to the Runner module
Remove the tmp_path config
Use requestTempDir for temporary folders (example)
Remove custom gpu checks if not vital for the model runner (we may implement such a feature globally in mhubio, let's discuss this in our next meeting!)

silvandeleemput · 2023-09-13T10:46:27Z

I just updated and cleaned up this PR:

default.yml
- renamed from config.yml
- added version and description
- updated pipeline with panimg mhaconverter
Dockerfile
- added fixed commit hash for grt123 repo git clone
- updated entrypoint
LungCancerClassifierRunner.py
- removed tmp_path config option
- added requestTempDir for tmp_path
- added more comments
Removed script files and custom PanImgConverter

This satisfies all but two of your suggestions:

Remove custom gpu checks if not vital for the model runner (we may implement such a feature globally in mhubio, let's discuss this in our next meeting!)

This check is required for the model to know how many GPUs it has available, so I left it in.
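For illustration, a GPU count of this kind can be obtained roughly as follows. This is a minimal sketch, assuming a PyTorch-based model; the function name count_available_gpus is hypothetical and is not taken from the runner in this PR.

```python
def count_available_gpus() -> int:
    """Return the number of CUDA devices visible to PyTorch, or 0 if none."""
    try:
        # Assumption: the model is PyTorch-based; torch is treated as optional here.
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


n_gpus = count_available_gpus()
print(f"GPUs available: {n_gpus}")
```

Falling back to 0 keeps the check usable on CPU-only machines, where the caller can then decide whether to run on CPU or abort.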

For the final prediction score, add a ValueOutput to the Runner module

This is tricky, since it outputs quite a detailed report with multiple scores per nodule per scan (see below). Maybe it is better to leave it as is? Or maybe we could add only a single cancer probability ValueOutput for the whole scan (the last probability value in the report, under cancerinfo).

{
    "lungcad": {
        "revision": "9a4ca0415c7fc1d3023a16650bf1cdce86f8bb59",
        "name": "grt123",
        "datetimeofexecution": "09/13/2023 12:28:46",
        "coordinatesystem": "World",
        "computationtimeinseconds": 48.483961
    },
    "imageinfo": {
        "dimensions": [
            512,
            512,
            140
        ],
        "voxelsize": [
            0.820312,
            0.820312,
            2.5
        ],
        "origin": [
            -228.800003,
            -210.0,
            -379.0
        ],
        "orientation": [
            1.0,
            0.0,
            0.0,
            0.0,
            1.0,
            0.0,
            0.0,
            0.0,
            1.0
        ],
        "seriesuid": "dicom"
    },
    "findings": [
        {
            "id": 0,
            "x": -46.800003000000004,
            "y": -31.0,
            "z": -168.0,
            "probability": 0.9999904632568359,
            "cancerprobability": 0.75371253490448
        },
        {
            "id": 1,
            "x": 73.199997,
            "y": 79.0,
            "z": -191.0,
            "probability": 0.999943733215332,
            "cancerprobability": 0.6662029027938843
        },
        {
            "id": 2,
            "x": 24.199996999999982,
            "y": -47.0,
            "z": -171.0,
            "probability": 0.9999247789382935,
            "cancerprobability": 0.47204485535621643
        },
        {
            "id": 3,
            "x": -82.800003,
            "y": 108.0,
            "z": -159.0,
            "probability": 0.8182298541069031,
            "cancerprobability": 0.004258527886122465
        },
        {
            "id": 4,
            "x": 49.199996999999996,
            "y": -67.0,
            "z": -275.0,
            "probability": 0.7932767271995544,
            "cancerprobability": 0.014665956608951092
        },
        {
            "id": 5,
            "x": 69.19999700000001,
            "y": -63.0,
            "z": -287.0,
            "probability": 0.6662660241127014,
            "cancerprobability": 0.0054708984680473804
        },
        {
            "id": 6,
            "x": 126.199997,
            "y": 28.0,
            "z": -334.0,
            "probability": 0.6302552819252014,
            "cancerprobability": 0.008180161006748676
        },
        {
            "id": 7,
            "x": -62.800003000000004,
            "y": -4.0,
            "z": -102.99999999999999,
            "probability": 0.48750755190849304,
            "cancerprobability": 0.00867788027971983
        },
        {
            "id": 8,
            "x": 73.199997,
            "y": 80.0,
            "z": -215.0,
            "probability": 0.36484286189079285,
            "cancerprobability": 0.03693072870373726
        }
    ],
    "cancerinfo": {
        "casecancerprobability": 0.9574154615402222,
        "referencenoduleids": [
            0,
            1,
            2,
            3,
            4
        ]
    }
}
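To make the two candidate outputs concrete, here is a minimal sketch of how the case-level score and the per-nodule scores could be read from a report with this structure. It uses only the standard json module; the embedded report is an abridged copy of the one above (two of the nine findings), and the helper name summarize_report is hypothetical.

```python
import json

# Abridged copy of the report above: only findings 0 and 3 are included.
REPORT_JSON = """
{
    "findings": [
        {"id": 0, "x": -46.800003000000004, "y": -31.0, "z": -168.0,
         "probability": 0.9999904632568359, "cancerprobability": 0.75371253490448},
        {"id": 3, "x": -82.800003, "y": 108.0, "z": -159.0,
         "probability": 0.8182298541069031, "cancerprobability": 0.004258527886122465}
    ],
    "cancerinfo": {
        "casecancerprobability": 0.9574154615402222,
        "referencenoduleids": [0, 1, 2, 3, 4]
    }
}
"""


def summarize_report(report: dict) -> tuple:
    """Return (case-level score, {nodule id: nodule cancer probability})."""
    case_score = report["cancerinfo"]["casecancerprobability"]
    nodule_scores = {f["id"]: f["cancerprobability"] for f in report["findings"]}
    return case_score, nodule_scores


case_score, nodule_scores = summarize_report(json.loads(REPORT_JSON))
print(case_score)
print(nodule_scores)
```

The case-level score is a single value per scan, while the number of per-nodule scores varies from scan to scan.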

models/gc_grt123_lung_cancer/dockerfiles/Dockerfile

LennyN95 · 2023-11-22T12:00:17Z

NOTE: This PR proposes a model currently using a dynamic length output. We are working on a method to export dynamic length outputs and will likely introduce a @IO.Out.Data.Many decorator (similar to our @IO.Output -> @IO.Outputs decorators).

LennyN95 · 2023-11-23T13:44:54Z

This is tricky, since it outputs quite a detailed report with multiple scores per nodule per scan (see below). Maybe it is better to leave it as is? Or maybe we could add only a single cancer probability ValueOutput for the whole scan (the last probability value in the report, under cancerinfo).

Let's do both, or actually all three :)

Export the original JSON file
Use a Value Output for the overall score
Use a second, dynamic Value Output for all findings.

Dynamic Value Outputs are supported by the newest MHub-IO release now :)
The documentation can be found here. Dynamic Value Outputs are fully supported in our ReportExporter Module by using an aggregate function (similar to files), with some additional value operations.

An example implementation would look like this:

from typing import List

from mhubio.core import IO, Instance, InstanceData, Module, ValueOutput

@ValueOutput.Name('lnrisk')
@ValueOutput.Label('Lung Nodule Risk-Score.')
@ValueOutput.Type(int)
@ValueOutput.Description('The predicted risk score for a single lung nodule detected by the algorithm.')
class LNRisk(ValueOutput):
    pass

def getLungNodulesRiskScores(dicom_dir) -> List[int]:
    # ... find lung nodules, and report back an array of risk scores
    lst_scores: List[int] = []  # placeholder: one risk score per detected nodule
    return lst_scores

class MyModule(Module):

    @IO.Instance()
    @IO.Input('in_data', 'dicom:mod=ct', the='chest CT image')
    @IO.OutputDatas('lnrisks', LNRisk)
    def task(self, instance: Instance, in_data: InstanceData, lnrisks: LNRisk):

        scores = getLungNodulesRiskScores(in_data.abspath)

        for nodule_i, score in enumerate(scores):

            # create value output instance and set the value (we can also modify the description)
            lnrisk = LNRisk()
            lnrisk.description += f" (for nodule {nodule_i})"
            lnrisk.value = score

            # add to collection
            lnrisks.add(lnrisk)

silvandeleemput · 2023-11-27T09:32:19Z

@LennyN95 I have added the case level score and the dynamic scores per finding.
I also found that it might be useful for the dynamic scores to set the metadata for associated values (like position and id) per output value, like so:

models/models/gc_grt123_lung_cancer/utils/LungCancerClassifierRunner.py

Lines 114 to 119 in 2d4365a

    
        for finding in results_dict["findings"]:
            nodule_cancer_prob = LNCancerProb()
            nodule_cancer_prob.meta = Meta(id=finding['id'], x=finding['x'], y=finding['y'], z=finding['z'])
            nodule_cancer_prob.description += f" (for nodule {finding['id']} at location ({finding['x']}, {finding['y']}, {finding['z']}))"
            nodule_cancer_prob.value = finding["cancerprobability"]
            lncancerprobs.add(nodule_cancer_prob)

Setting the metadata in this way helps with debugging as well, with debug output like:

├── lncancerprob [Lung Nodule cancer probability score.]
│   The predicted cancer probability score for a single lung nodule detected by the algorithm (for nodule 8 at location (73.199997, 80.0, -215.0))
│   └── Lung Nodule cancer probability score. (0.03693072870373726)
│   ├── id: 8
│   ├── x: 73.199997
│   ├── y: 80.0
│   └── z: -215.0

Do you think this would be of added value?
Furthermore, could you give some feedback on the general implementation of the output values, i.e. is it satisfactory?

LennyN95 · 2023-11-27T10:29:20Z

Adding the ID (and coordinates) to the metadata is excellent (because then, when exporting the report, you could technically filter by these values).

However, metadata is used to query files/data and is not available in the report exporter. So if a value is to be exportable, it needs its own value output. We could of course also implement a directive to export data metadata to the report. However, I find that this could create confusion about the ReportExporter, as it is less clear when information is stored in metadata and when as value output.
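The distinction can be pictured with a small model in plain Python. This is a sketch only: the Value class and the query and export_report functions are hypothetical stand-ins and are not the mhubio API. Metadata is available for selecting values, but only name and value reach the exported report.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Value:
    """Hypothetical stand-in for a value output; not the mhubio class."""
    name: str
    value: float
    meta: Dict[str, Any] = field(default_factory=dict)


def query(values: List[Value], **criteria: Any) -> List[Value]:
    """Select the values whose metadata matches all given key/value pairs."""
    return [v for v in values if all(v.meta.get(k) == c for k, c in criteria.items())]


def export_report(values: List[Value]) -> List[Dict[str, Any]]:
    """Export name and value only; metadata is deliberately left out."""
    return [{"name": v.name, "value": v.value} for v in values]


values = [
    Value("lncancerprob", 0.75371253490448, {"id": 0}),
    Value("lncancerprob", 0.6662029027938843, {"id": 1}),
]
selected = query(values, id=1)
report = export_report(selected)
print(report)
```

In this model the nodule id can be used to pick a score, but it does not appear in the exported row unless it is also stored as a value of its own.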

models/gc_grt123_lung_cancer/meta.json

LennyN95 · 2024-01-10T10:44:16Z

Note: test passed (09.01.2024).

DICOM case from NLST on IDC: 
https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=nlst

Case ID: 100002
StudyInstanceUID: 1.2.840.113654.2.55.68425808326883186792123057288612355322
SeriesInstanceUID: 1.2.840.113654.2.55.229650531101716203536241646069123704792

s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/b22ed5a6-ad69-4b00-ba26-ae75e96345f8/*' .

Expected output:
---------------
grt123_lung_cancer_findings.json

├── lncancerprob [Lung Nodule cancer probability score.]
│   The predicted cancer probability score for a single lung nodule detected by the algorithm (for nodule 0 at location (112.39999399999999, -92.0, -260.545013))
│   └── Lung Nodule cancer probability score. (0.01483230385929346)
│   ├── id: 0
│   ├── x: 112.39999399999999
│   ├── y: -92.0
│   └── z: -260.545013
├── lncancerprob [Lung Nodule cancer probability score.]
│   The predicted cancer probability score for a single lung nodule detected by the algorithm (for nodule 1 at location (-47.60000600000001, 75.0, -245.54501299999998))
│   └── Lung Nodule cancer probability score. (0.01104212086647749)
│   ├── id: 1
│   ├── x: -47.60000600000001
│   ├── y: 75.0
│   └── z: -245.54501299999998
├── clcancerprob [Case level cancer probability score.]
│   Case level probability score
│   └── Case level cancer probability score. (0.025710642337799072)
│   ├── min: 0.0
│   ├── max: 1.0
│   └── type: probability

silvandeleemput · 2024-01-11T14:53:34Z

We have updated the meta.json, please have a look if you agree. There were no changes to the code, so the test should still pass. We should be able to move on with this.

LennyN95 · 2024-01-11T16:55:55Z

There's one last problem showing up here:

When running the model, the following print out is not captured and pollutes the console: (diag-)image_loader not found, loading dicom will not be possible, caused by this line.

The problem is that the print statement is executed at import time, which is before we can (technically) start capturing prints.

You may be able to resolve the issue by moving the import inside the task() method of the runner Module.

For the upcoming models, please always check that the print-out is clean and that the output is consistent when running in normal mode (no --print or --debug); any uncaptured output like this one will break the appearance.
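The effect of deferring the import can be reproduced outside MHub with the standard library. The sketch below is an assumption-laden illustration: it writes a hypothetical module (noisy_model_demo) that prints at import time, then imports it inside a function while stdout is redirected, so the message ends up in the capture buffer instead of on the console. It uses contextlib.redirect_stdout and does not reflect how mhubio captures output internally.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a dependency that prints when it is imported.
NOISY_SOURCE = (
    'print("image_loader not found")\n'
    'def predict():\n'
    '    return 0.5\n'
)


def run_task(module_dir: str):
    """Import the noisy module lazily, while stdout is being captured."""
    buffer = io.StringIO()
    sys.path.insert(0, module_dir)
    try:
        with contextlib.redirect_stdout(buffer):
            importlib.invalidate_caches()
            # The import happens here, inside the capture context,
            # rather than at the top of the file.
            noisy = importlib.import_module("noisy_model_demo")
            result = noisy.predict()
    finally:
        sys.path.remove(module_dir)
        sys.modules.pop("noisy_model_demo", None)
    return result, buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "noisy_model_demo.py").write_text(NOISY_SOURCE)
    result, captured = run_task(tmp)

print(f"result={result}, captured={captured!r}")
```

Had the module been imported at the top of the file, its print would have run before any redirection was in place.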

silvandeleemput · 2024-01-12T22:59:29Z

@LennyN95 Good catch. The print statement issue has been addressed by doing as you suggested. I missed the minor glitch when running it in normal mode the first time I checked it. I'll be more aware of the issue with the upcoming models.

LennyN95

Requested changes addressed, tests passed.

LennyN95

Two more changes required to enhance the integration into the MHub model repository and UX when searching / browsing models on mhub.ai/models.

models/gc_grt123_lung_cancer/meta.json

silvandeleemput added 6 commits June 22, 2023 17:32

add initial implementation grt123 model

395140a

made Dockerfiles install publicly available v2.0.0 of grt123

e1b4fd0

cleanup output JSON report

a09b573

Move git HEAD file in Dockerfiles to retain proper hash content

ec5f146

change MHub/DIAG -> MHub/GC in comments

20729a9

Updated for new base image (single Dockerfile), updated config.yml, r…

2942e37

…un.py, and others...

LennyN95 self-assigned this Aug 13, 2023

silvandeleemput and others added 2 commits September 13, 2023 12:00

Merge branch 'MHubAI:main' into m-grt123

ebd93ac

silvandeleemput added 2 commits October 3, 2023 12:48

Merge branch 'MHubAI:main' into m-grt123

7ce8a0a

Create meta.json

364831a

LennyN95 added the "Requires MHub-IO Update" label (only for PRs with model suggestions which require extended MHub-IO functionality) Nov 22, 2023

LennyN95 reviewed Nov 22, 2023

models/gc_grt123_lung_cancer/dockerfiles/Dockerfile

LennyN95 added the "+Model: ACTION REQUIRED" label and removed the "Requires MHub-IO Update" label Nov 23, 2023

silvandeleemput and others added 4 commits November 24, 2023 22:33

Merge branch 'MHubAI:main' into m-grt123

5f4d04c

added mhub model definition and removed first comment line Dockerfile

ecdd166

cleanup runner imports, add new style logging

5ab7f20

added value output for overall score and added dynamic value output f…

2d4365a

…or all findings

LennyN95 requested changes Nov 27, 2023

models/gc_grt123_lung_cancer/meta.json

PR comments on mata.json

49cdd7a

LennyN95 reviewed Jan 3, 2024

models/gc_grt123_lung_cancer/meta.json

PR comments on mata.json

0b4ac0a

miriam-groeneveld added 2 commits January 3, 2024 13:40

PR comments on mata.json

c0d9076

DSB and evaluation dataset

f253229

LennyN95 removed the +Model: ACTION REQUIRED label Jan 8, 2024

LennyN95 requested changes Jan 10, 2024

models/gc_grt123_lung_cancer/meta.json

meta.json - update links and data details MHubAI#27

fb5f2c9

move main import inside the task method of the runner to squelch impo…

15123ca

…rt print statement MHubAI#27

LennyN95 approved these changes Jan 16, 2024

LennyN95 requested changes Jan 16, 2024

models/gc_grt123_lung_cancer/meta.json

LennyN95 requested changes Jan 16, 2024

models/gc_grt123_lung_cancer/meta.json

LennyN95 closed this Jan 16, 2024

LennyN95 reopened this Jan 16, 2024

meta.json - matched model name, updated output label and description

62871f2

LennyN95 approved these changes Jan 18, 2024

meta.json - add version 2.0.0 to details

2f6a999

LennyN95 merged commit ada037d into MHubAI:main Feb 28, 2024
1 check passed

silvandeleemput deleted the m-grt123 branch March 5, 2024 13:24
