This repository provides the ODYN inference code for the models used within the paper Development and Validation of an Artificial Intelligence-based Pipeline for Predicting Oral Epithelial Dysplasia Malignant Transformation.
The first step in this pipeline is to use HoVer-Net+ (see the original paper here) to segment the epithelium and nuclei. We have used the TIAToolbox (see paper here) implementation of HoVer-Net+ in the below scripts. Next, we have used a Transformer-based model to segment the dysplastic regions of the WSIs (see paper here).
We determine a slide as being normal or oral epithelial dysplasia (OED) by calculating the proportion of the epithelium that is predicted to be dysplastic. If this is above a certain threshold, we classify the case as OED.
Following this, for OED slides, we generate patch-level morphological and spatial features to use in our ODYN pipeline. We generate an ODYN-score for each slide by passing these patch-level features through a pre-trained multi-layer perceptron (MLP).
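As a rough illustration of the prognosis step, below is a minimal numpy sketch of passing patch-level features through a two-layer MLP and pooling the patch scores into a single slide-level ODYN-score. The layer sizes, weights, and mean pooling here are placeholders for illustration only, not the architecture or aggregation of the released checkpoints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights: in practice these come from the pre-trained checkpoint.
W1, b1 = rng.normal(size=(168, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def odyn_score(patch_features):
    """Score patch-level feature vectors with a two-layer MLP, then pool
    them into one slide-level score (mean pooling is an assumption)."""
    h = np.maximum(patch_features @ W1 + b1, 0.0)   # ReLU hidden layer
    logits = h @ W2 + b2
    patch_scores = 1.0 / (1.0 + np.exp(-logits))    # sigmoid per patch
    return float(patch_scores.mean())

features = rng.normal(size=(10, 168))  # 10 patches, 168 features each (sizes are illustrative)
score = odyn_score(features)
```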
Note, this repository is for use with oral tissue H&E-stained WSIs/ROIs alone. We recommend running inference of the ODYN model on the GPU. Nuclear instance segmentation, in particular, will be very slow when run on CPU.
We use Python 3.11 with the tiatoolbox package installed. By default this uses PyTorch 2.2.
conda create -n odyn python=3.11 cudatoolkit=11.8
conda activate odyn
pip install tiatoolbox
pip uninstall torch
conda install pytorch
pip install h5py
pip install docopt
pip install ml-collections
Below are the main directories in the repository:
dataloader/
: the data loader and augmentation pipeline
doc/
: image files used for rendering the README
utils/
: scripts for metrics and patch generation
models/
: model definitions
Below are the main executable scripts in the repository:
run_odyn.py
: main inference script for ODYN; runs the below scripts consecutively (except heatmaps)
dysplasia_segmentation.py
: Transformer inference script
epithelium_segmentation.py
: HoVer-Net+ inference script
oed_diagnosis.py
: script to diagnose a slide as OED vs normal (using output from the above scripts)
feature_generation.py
: script to generate features for the final MLP model (using output from the above scripts)
oed_prognosis.py
: main inference script for generating the ODYN-score for predicting malignant transformation
heatmap_generation.py
: script to generate heatmaps
visualize_output.py
: script to convert output annotations into a single annotation store for easy viewing with TIAViz
visualize_output.sh
: bash script to load TIAViz for visualising all ODYN output at WSI-level
Input:
- WSIs supported by OpenSlide, including svs, tif, ndpi and mrxs.
Output:
- HoVer-Net+ nuclei and epithelium segmentations as dat and png files, respectively. These segmentations are saved at 0.5 mpp resolution. Nuclei dat files have a key as the ID for each nucleus, which then contains a dictionary with the keys:
  - 'box': bounding box coordinates for each nucleus
  - 'centroid': centroid coordinates for each nucleus
  - 'contour': contour coordinates for each nucleus
  - 'prob': per-class probabilities for each nucleus
  - 'type': predicted category for each nucleus
- Transformer dysplasia segmentations as png files. These segmentations are saved at 1 mpp resolution.
- ODYN diagnosis and prognosis CSV. This CSV will have a row for each input WSI, with the columns slide_name, status and ODYN-score. The status is whether ODYN has classified the slide as being normal or OED. The ODYN-score is ODYN's prediction of whether the lesion this slide is from will progress to malignancy.
- [Optional] ODYN heatmaps as png files. These heatmaps are saved at 2 mpp resolution.
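The nuclei dat structure described above can be explored as in the sketch below. The nuclei dictionary here is a hand-made example mimicking the documented keys; in practice it would be deserialised from the dat file produced by the epithelium segmentation step, and the box coordinate order shown is an assumption.

```python
# Hand-made example mimicking the nuclei dictionary described above:
# each key is a nucleus ID mapping to a dict with 'box', 'centroid',
# 'contour', 'prob' and 'type'.
nuclei = {
    "1": {
        "box": [10, 12, 34, 40],    # assumed order: x0, y0, x1, y1
        "centroid": [22.0, 26.0],
        "contour": [[10, 12], [34, 12], [34, 40], [10, 40]],
        "prob": [0.05, 0.90, 0.05], # per-class probabilities
        "type": 1,                  # predicted category
    },
}

# Count nuclei per predicted category.
counts = {}
for nucleus_id, props in nuclei.items():
    counts[props["type"]] = counts.get(props["type"], 0) + 1
print(counts)  # {1: 1}
```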
We use the following weights in this work. If any of the models or checkpoints are used, please cite the corresponding paper.
- The Transformer model weights (for dysplasia segmentation) obtained from training on the Sheffield OED dataset: OED Transformer checkpoint.
- The HoVer-Net+ model weights (for epithelium segmentation) obtained from training on the Sheffield OED dataset: OED HoVer-Net+ checkpoint. Note, these weights are updated compared to TIAToolbox's and are those obtained in this paper.
- The MLP model weights obtained from training on each fold of the Sheffield OED dataset: OED MLP checkpoints.
A user can run the ODYN pipeline on all their slides using the below command. This can be quite slow, as nuclear segmentation (with HoVer-Net+) is run at 0.5 mpp.
Usage:
python run_odyn.py --input_data_file="/path/to/input/data/file/" --input_dir="/path/to/input/slides/or/images/dir/" --output_dir="/path/to/output/dir/" --transformer_weights="/path/to/transformer/checkpoint/" --hovernetplus_weights="/path/to/hovernetplus/checkpoint/" --mlp_weights="/path/to/mlp/checkpoint/" --mlp_norm_params="/path/to/mlp/norm/params/" --mlp_cutoff_file="/path/to/mlp/cutoffs/"
Alternatively, to have more control, a user can run each of the stages used by the ODYN model one at a time. These are shown below. We recommend users use this method.
The first stage is to run the Transformer-based model on the WSIs to generate dysplasia segmentations. This is relatively fast and is run at 1.0 mpp. Note, the model_checkpoint is the path to the Transformer segmentation weights available to download from above.
Usage:
python dysplasia_segmentation.py --input_dir="/path/to/input/slides/or/images/dir/" --output_dir="/path/to/transformer/output/dir/" --model_checkpoint="/path/to/transformer/checkpoint/"
The second stage is to run HoVer-Net+ on the WSIs to generate epithelial and nuclei segmentations. This can be quite slow, as it is run at 0.5 mpp. Note, the model_checkpoint is the path to the HoVer-Net+ segmentation weights available to download from above. However, if none are provided, then the default version of HoVer-Net+ from TIAToolbox will be used.
Usage:
python epithelium_segmentation.py --input_dir="/path/to/input/slides/or/images/dir/" --output_dir="/path/to/epithelium/output/dir/" --model_checkpoint="/path/to/hovernetplus/checkpoint/"
The third stage is to classify a slide as being OED vs normal.
Usage:
python oed_diagnosis.py --input_epith="/path/to/hovernetplus/mask/output/" --input_dysplasia="/path/to/transformer/output/" --output_dir="/path/to/output/dir/"
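The diagnosis rule itself can be sketched as below; a minimal illustration assuming the epithelium and dysplasia segmentations are aligned binary numpy arrays, and with a placeholder threshold rather than the value used in the paper.

```python
import numpy as np

def diagnose(epith_mask, dysplasia_mask, threshold=0.05):
    """Classify a slide as OED vs normal from binary masks.

    The threshold here is a placeholder for illustration, not the
    cutoff used in the paper.
    """
    epith = epith_mask > 0
    if epith.sum() == 0:
        return "normal"  # no epithelium segmented
    # Proportion of epithelial pixels that are also predicted dysplastic.
    proportion = (dysplasia_mask > 0)[epith].mean()
    return "OED" if proportion > threshold else "normal"

epith = np.array([[1, 1], [1, 1]])
dysp = np.array([[1, 0], [0, 0]])
print(diagnose(epith, dysp))  # "OED" (proportion 0.25 > 0.05)
```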
The fourth stage is to tessellate the image into smaller patches and generate corresponding patch-level morphological and spatial features using the nuclei/layer segmentations. Note, the mask_dir is the epithelial mask output directory and the nuclei_dir is the nuclei output directory from the HoVer-Net+ step.
Usage:
python feature_generation.py --input_dir="/path/to/input/slides/or/images/dir/" --mask_dir="/path/to/hovernetplus/mask/output/" --nuclei_dir="/path/to/hovernetplus/nuclei/output/" --output_dir="/path/to/output/feature/dir/"
The final stage is to infer using the MLP on the tiles (and their features) generated in the previous steps. Here, the input_ftrs_dir is the directory containing the features created in the previous steps. The model_checkpoint path is to the weights provided above, and the input_data_file is the path to the data file describing the slides to process. An example file is provided in data_file_template.csv.
Usage:
python oed_prognosis.py --input_data_file="/path/to/input/data/file/" --input_ftrs_dir="/path/to/input/tile/ftrs/" --model_checkpoint="/path/to/mlp/checkpoint/" --output_dir="/path/to/output/dir/"
We can also generate heatmaps for these images. Decrease the stride within the file (default 128) to create smoother images; however, decreasing the stride by 2X will increase the processing time by 2X. Note, this uses the combined mask produced by the oed_diagnosis.py script.
Usage:
python heatmap_generation.py --input_dir="/path/to/input/slides/or/images/dir/" --mask_dir="/path/to/combined/mask/" --nuclei_dir="/path/to/hovernetplus/nuclei/output/" --checkpoint_path="/path/to/mlp/checkpoint/" --output_dir="/path/to/heatmap/output/dir/"
Below we use TIAToolbox's TIAViz tool to visualise the model output from ODYN. Simply amend the slide_dir and overlay_dir to the corresponding folders. Note, TIAViz will look two directory levels deep for overlays, prefixed with the name of the slide. You may need to change the permissions of the script to make it executable. If you use TIAViz, then please cite the paper TIAViz: A Browser-based Visualization Tool for Computational Pathology Models.
$ chmod u+wrx ./visualize_output.sh
$ ./visualize_output.sh
We have made an interactive demo to help visualise the output of our model. Note, this is not optimised for mobile phones and tablets. The demo was built using the TIAToolbox tile server.
Check out the demo here.
In the demo, we provide multiple examples of WSI-level results. These include:
- Dysplasia segmentations (using the Transformer model). Here, dysplasia is in red.
- Intra-epithelial layer segmentations (using HoVer-Net+). Here, orange is stroma, red is the basal layer, green is the (core) epithelial layer, and blue is keratin.
- Nuclei segmentations (using HoVer-Net+). Here, orange is "other" nuclei (i.e. connective/inflammatory), whilst the epithelial nuclei are coloured according to their intra-epithelial layer (see above).
- ODYN heatmaps where red spots show areas of high importance for predicting malignant transformation.
Each histological object can be toggled on/off by clicking the appropriate button on the right-hand side. The colours and the opacity can also be altered.
Code is under a GPL-3.0 license. See the LICENSE file for further details.
Model weights are licensed under Attribution-NonCommercial-ShareAlike 4.0 International. Please consider the implications of using the weights under this license.
If you find ODYN useful or use it in your research, please cite our paper:
@article {Shephard2024,
author = {Shephard, Adam J and Mahmood, Hanya and Raza, Shan E Ahmed and Araujo, Anna Luiza Damaceno and Santos-Silva, Alan Roger and Lopes, Marcio Ajudarte and Vargas, Pablo Agustin and McCombe, Kristopher D and Craig, Stephanie G and James, Jacqueline and Brooks, Jill M and Nankivell, Paul and Mehanna, Hisham and Khurram, Syed A and Rajpoot, Nasir},
title = {Development and Validation of an Artificial Intelligence-based Pipeline for Predicting Oral Epithelial Dysplasia Malignant Transformation},
elocation-id = {2024.11.13.24317264},
year = {2024},
doi = {10.1101/2024.11.13.24317264},
publisher = {Cold Spring Harbor Laboratory Press},
URL = {https://www.medrxiv.org/content/early/2024/11/13/2024.11.13.24317264},
eprint = {https://www.medrxiv.org/content/early/2024/11/13/2024.11.13.24317264.full.pdf},
journal = {medRxiv}
}