-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MHub / GC - Add grt123 Model for lung cancer prediction based on lung nodules #27
Conversation
…un.py, and others...
This PR has been updated for the new base image. |
Some suggestions
|
* default.yml * renamed from config.yml * added version and description * updated pipeline with panimg mhaconverter * Dockerfile * added fixed commit hash for grt123 repo git clone * updated entrypoint * LungCancerClassifierRunner.py * removed tmp_path config option * added requestTempDir for tmp_path * added more comments * Removed script files and custom PanImgConverter
I just updated and cleaned up this PR:
This satisfies all but two of your suggestions:
Is required for the model to know how the number of GPUs it has available, so I left it in.
This is tricky, since it outputs quite a detailed report with multiple scores per nodule per scan (see below). Maybe it is better to leave it as is? Or maybe we could only add a single cancer probability ValueOutput for the whole scan (last probability value in the report after cancerinfo)). {
"lungcad": {
"revision": "9a4ca0415c7fc1d3023a16650bf1cdce86f8bb59",
"name": "grt123",
"datetimeofexecution": "09/13/2023 12:28:46",
"coordinatesystem": "World",
"computationtimeinseconds": 48.483961
},
"imageinfo": {
"dimensions": [
512,
512,
140
],
"voxelsize": [
0.820312,
0.820312,
2.5
],
"origin": [
-228.800003,
-210.0,
-379.0
],
"orientation": [
1.0,
0.0,
0.0,
0.0,
1.0,
0.0,
0.0,
0.0,
1.0
],
"seriesuid": "dicom"
},
"findings": [
{
"id": 0,
"x": -46.800003000000004,
"y": -31.0,
"z": -168.0,
"probability": 0.9999904632568359,
"cancerprobability": 0.75371253490448
},
{
"id": 1,
"x": 73.199997,
"y": 79.0,
"z": -191.0,
"probability": 0.999943733215332,
"cancerprobability": 0.6662029027938843
},
{
"id": 2,
"x": 24.199996999999982,
"y": -47.0,
"z": -171.0,
"probability": 0.9999247789382935,
"cancerprobability": 0.47204485535621643
},
{
"id": 3,
"x": -82.800003,
"y": 108.0,
"z": -159.0,
"probability": 0.8182298541069031,
"cancerprobability": 0.004258527886122465
},
{
"id": 4,
"x": 49.199996999999996,
"y": -67.0,
"z": -275.0,
"probability": 0.7932767271995544,
"cancerprobability": 0.014665956608951092
},
{
"id": 5,
"x": 69.19999700000001,
"y": -63.0,
"z": -287.0,
"probability": 0.6662660241127014,
"cancerprobability": 0.0054708984680473804
},
{
"id": 6,
"x": 126.199997,
"y": 28.0,
"z": -334.0,
"probability": 0.6302552819252014,
"cancerprobability": 0.008180161006748676
},
{
"id": 7,
"x": -62.800003000000004,
"y": -4.0,
"z": -102.99999999999999,
"probability": 0.48750755190849304,
"cancerprobability": 0.00867788027971983
},
{
"id": 8,
"x": 73.199997,
"y": 80.0,
"z": -215.0,
"probability": 0.36484286189079285,
"cancerprobability": 0.03693072870373726
}
],
"cancerinfo": {
"casecancerprobability": 0.9574154615402222,
"referencenoduleids": [
0,
1,
2,
3,
4
]
}
} |
NOTE: This PR proposes a model currently using a dynamic length output. We are working on a method to export dynamic length outputs and will likely introduce a |
Let's do both, or actually all three :)
Dynamic Value Outputs are supported by the newest MHub-IO release now :) An example implementation would look like this: @ValueOutput.Name('lnrisk')
@ValueOutput.Label('Lung Nodule Risk-Score.')
@ValueOutput.Type(int)
@ValueOutput.Description('The predicted risk score for a single lung nodule detected by the alggorithm.')
class LNRisk(ValueOutput):
pass
def getLungNodulesRiskScores(dicom_dir) -> List[int]:
# ... find lung nodules, and report back an array of risk scores
return lst_scores
class MyModule(Module):
@IO.Instance
@IO.Input('in_data', 'dicom:mod=ct', the='chest CT image')
@IO.OutputDatas('lnrisks', LNRisk)
def task(self, instance: Instance, in_data: InstanceData, lnrisks: LNRisk):
scores = getLungNodulesRiskScores(in_data.abspath)
for nodule_i, score in enumerate(scores):
# create value output instance and set the value (we can also modify the description)
lnrisk = LNRisk()
lnrisk.description += f" (for nodule {nodule_i})"
lnrisk.value = score
# add to collection
lnrisks.add(lnrisk) |
@LennyN95 I have added the case level score and the dynamic scores per finding. models/models/gc_grt123_lung_cancer/utils/LungCancerClassifierRunner.py Lines 114 to 119 in 2d4365a
Setting the metadata in this way helps with debugging as well, with debug output like:
Do you think this would be of added value? |
Adding the ID (and coordinates) to the metadata is excellent (because then, when exporting the report, you could technically filter by these values). However, metadata is used to query files/data and is not available in the report exporter. So if a value is to be exportable, it needs its own value output. We could of course also implement a directive to export data metadata to the report. However, I find that this could create confusion about the ReportExporter, as it is less clear when information is stored in metadata and when as value output. |
Note: test passed (09.01.2024).
|
We have updated the meta.json, please have a look if you agree. There were no changes to the code, so the test should still pass. We should be able to move on with this. |
There's one last problem showing up here: When running the model, the following print out is not captured and pollutes the console: The problem is, that the print statement is executed at import time, which is before we (technically can) start capturing prints. You may be able to resolve the issue by moving the import inside the For the upcoming models, please always check that the print-out is clean and when running in normal mode (no |
…rt print statement MHubAI#27
@LennyN95 Good catch. The print statement issue has been addressed by doing as you suggested. I missed the minor glitch when running it in normal mode the first time I checked it. I'll be more aware of the issue with the upcoming models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requested changes addressed, tests passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more changes required to enhance the integration into the MHub model repository and UX when searching / browsing models on mhub.ai/models.
Hi, this PR contains the required code for getting our modified version of the publicly available grt123 model, winner of the Data Science Bowl 2017 Kaggle challenge.
Caveats
main
, but should be something likem-gc-grt123-lung-cancer
TODO
since it requires the creation of the appropriate branch for this code first.Algorithm I/O