
1. Preprocessing the Data

Joshua Levy edited this page Aug 14, 2019 · 8 revisions

Here, we discuss how to preprocess the data for the following image formats: .npy, .svs, .tiff, .tif, .vms, .vmu, .ndpi, .scn, .mrxs, .svslide, .bif, .jpeg, .png.

All whole slide images (WSIs) must be placed in the same directory. PathFlowAI searches this directory for these image types and runs its preprocessing pipeline individually on each image.

Suppose we set up a directory inputs/ and place image A01.npy in it. For a classification task, the image must be accompanied by an .xml file with the same basename, A01, in the same directory. The XML file must be an annotation file exported from an annotation suite such as ASAP or QuPath. For a segmentation task, replace the XML file with a NumPy file named [basename]_mask.npy that has the same size as the image (make sure the sizes agree!) and contains a segmentation mask (for now, background must be labeled 0, and the remaining components must be numbered in order: 0, 1, ...). In the near future, an accompanying annotation file will not be required to run the pipeline.
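Before running a segmentation task, it can help to confirm that the mask really matches the image. The following is a minimal sketch, not part of PathFlowAI: check_mask is a hypothetical helper, and it assumes the image is stored as (rows, columns, channels), the mask as (rows, columns), and that the labeling rule means consecutive integers starting at 0.

```python
import numpy as np

def check_mask(image, mask):
    """Return a list of problems found; an empty list means the mask looks usable."""
    problems = []
    # Compare only the spatial dimensions; the image may carry a channel axis.
    if image.shape[:2] != mask.shape[:2]:
        problems.append(f"size mismatch: image {image.shape[:2]} vs mask {mask.shape[:2]}")
    if not np.issubdtype(mask.dtype, np.integer):
        problems.append("mask is not an integer array")
        return problems
    labels = np.unique(mask)
    # Assumed reading of the rule: labels are 0 (background), 1, 2, ... with no gaps.
    expected = np.arange(int(labels.max()) + 1)
    if labels.min() < 0 or not np.array_equal(labels, expected):
        problems.append(f"labels are not consecutive from 0: {labels.tolist()}")
    return problems

# Toy data standing in for np.load("inputs/A01.npy") and np.load("inputs/A01_mask.npy").
image = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:200, 100:200] = 1  # one annotated region on a background of 0
print(check_mask(image, mask))
```

For real slides, replace the toy arrays with the loaded files; a non-empty list points at what to fix before preprocessing.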

So, depending on the task, the inputs/ directory would contain:
A01.npy
A01_mask.npy if segmentation
A01.xml if classification
Repeat this for the other images.
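With many images, it is easy to forget an accompanying file. The sketch below is a hypothetical helper (find_tasks is not a PathFlowAI function) that lists each image basename in a directory together with the annotation files found next to it. It uses the extensions listed at the top of this page and a throwaway temporary directory in place of inputs/.

```python
import tempfile
from pathlib import Path

# Image extensions taken from the list at the top of this page.
IMAGE_EXTENSIONS = {".npy", ".svs", ".tiff", ".tif", ".vms", ".vmu", ".ndpi",
                    ".scn", ".mrxs", ".svslide", ".bif", ".jpeg", ".png"}

def find_tasks(input_dir):
    """Map each image basename to the task(s) its accompanying files allow."""
    input_dir = Path(input_dir)
    tasks = {}
    for path in sorted(input_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        if path.name.endswith("_mask.npy"):
            continue  # masks are .npy files too, but they are not images
        basename = path.stem
        found = []
        if (input_dir / f"{basename}.xml").exists():
            found.append("classification")
        if (input_dir / f"{basename}_mask.npy").exists():
            found.append("segmentation")
        tasks[basename] = found
    return tasks

# Demonstration on empty placeholder files; A02 and A03 are made-up names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("A01.npy", "A01_mask.npy", "A02.npy", "A02.xml", "A03.npy"):
        (Path(tmp) / name).touch()
    result = find_tasks(tmp)
print(result)
# {'A01': ['segmentation'], 'A02': ['classification'], 'A03': []}
```

An empty list, as for A03 here, flags an image that still needs an .xml or _mask.npy file.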

Once this is done, the preprocessing pipeline can be run.

pathflowai-preprocess preprocess_pipeline -odb patch_information.db --preprocess --patches --basename A01 --input_dir inputs/ --patch_size 256 --intensity_threshold 45. -tc 7 -t 0.05

This searches the inputs/ directory for files beginning with A01, autodetects the image file, and then decides whether this is a classification or segmentation task based on whether an .xml file or a _mask.npy file is present.
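Since the command above names a single basename, one way to handle several images is to issue it once per basename. The sketch below only builds and prints the command lines; it copies the flags verbatim from the example and does not run anything. preprocess_command is a hypothetical helper, A02 is a made-up second image, and reusing the same patch_information.db across runs is an assumption.

```python
import shlex

def preprocess_command(basename, input_dir="inputs/", patch_size=256):
    """Build the argument list for one image, using the flags from the example above."""
    return [
        "pathflowai-preprocess", "preprocess_pipeline",
        "-odb", "patch_information.db",  # assumed to be shared across images
        "--preprocess", "--patches",
        "--basename", basename,
        "--input_dir", input_dir,
        "--patch_size", str(patch_size),
        "--intensity_threshold", "45.",
        "-tc", "7",
        "-t", "0.05",
    ]

# Print one command line per image.
for basename in ["A01", "A02"]:
    print(shlex.join(preprocess_command(basename)))
```

To execute a command rather than print it, the returned list can be passed to subprocess.run(cmd, check=True).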

First, let's suppose this is a segmentation task, where we want the patches to be 256x256.
