diff --git a/README.md b/README.md index a463f90..f47e86d 100644 --- a/README.md +++ b/README.md @@ -1,263 +1,5 @@ +`pylidc` is a python library intended to improve workflow associated with the [LIDC dataset](https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI). -# pylidc +Install via pip: `pip install pylidc`. -`pylidc` is a python library intended to enhance workflow associated with the [LIDC dataset](https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI), including utilities for both querying by attributes (e.g., collecting all annotations where malignancy is labeled as greater than 3 and spiculation is labeled a value equal to 1), and for common functional routines that act on the associated data (e.g., estimating the diameter or volume of a nodule from one of its annotations). - -Routines for visualizing the annotations, both atop the CT data and as a surface in 3D, are implemented. These functionalities are implemented via an object relational mapping (ORM), using `sqlalchemy` to an sqlite database containing the annotation information from the XML files provided by the LIDC dataset. - -![](https://raw.githubusercontent.com/pylidc/pylidc/master/img/viz-in-scan-example.png) -![](https://raw.githubusercontent.com/pylidc/pylidc/master/img/viz-in-3d-example.png) - -Table of Contents -================= - - * [Installation and setup](#installation-and-setup) - * [Dicom file directory configuration](#dicom-file-directory-configuration) - * [Citing](#citing) - * [Example usage](#example-usage) - * [Basic examples](#basic-examples) - * [The Scan model](#the-scan-model) - * [The Annotation model](#the-annotation-model) - * [Advanced queries](#advanced-queries) - * [Get a random result](#get-a-random-result) - * [Query multiple model parameters with a join](#query-multiple-model-parameters-with-a-join) - * [Resampling the volumes](#resampling-the-volumes) - * [Clustering annotations](#clustering-annotations-to-identify-those-which-refer-to-the-same-physical-nodule) - - -## Installation and setup - -`pylidc` has been used on Linux, Mac, and Windows, and on Python 2 and 3. - -The package can be installed via `pip`: - - pip install pylidc - -### Dicom file directory configuration - -While `pylidc` has many functions for analyzing and querying only annotation data (which do not require DICOM image data access), `pylidc` also has many functions that do require access to the DICOM files associated with the LIDC dataset. `pylidc` looks for a special configuration file that tells it where DICOM data is located on your system. You can use `pylidc` without creating this configuration file, but of course, any functions that depend on CT image data will not be usable. - -`pylidc` looks in your home folder for a configuration file called, `.pylidcrc` on Mac and Linux, or `pylidc.conf` on Windows. You must create this file. On Linux and Mac, the file should be located at `/home/[user]/.pylidcrc`. On Windows, the file should be located at `C:\Users\[User]\pylidc.conf`. - -The configuration file should be formatted as follows: - - [dicom] - path = /path/to/big_external_drive/datasets/LIDC-IDRI - warn = True - -If you want to use `pylidc` without utilizing the DICOM data (for say, querying annotation attributes, etc.), you can remove `path` and set `warn` to `False`, i.e., - - [dicom] - warn = False - -and the module won't bother you about it each time you import the module. - -The expected folder hierarchy in the specified `path` is: `PatientID` > `StudyInstanceUID` > `SeriesInstanceUID` > `*.dcm`. If you downloaded the data from the TCIA site, the folder hierarchy will (probably!) already be formatted in this way. - -## Citing - -If you find `pylidc` helpful to your research, you could cite by noting that software was developed for, and first mentioned in, the following publication: - -> Matthew C. Hancock, Jerry F. Magnan. **Lung nodule malignancy classification using only radiologist quantified image features as inputs to statistical learning algorithms: probing the Lung Image Database Consortium dataset with two statistical learning methods**. *SPIE Journal of Medical Imaging*. Dec. 2016. [http://dx.doi.org/10.1117/1.JMI.3.4.044504](http://dx.doi.org/10.1117/1.JMI.3.4.044504) - -## Example usage - -### Basic examples - -There are three data models: `Scan`, `Annotation`, and `Contour`. The relationships are "one to many" for each model going left to right, i.e., `Scan`'s have many `Annotation`'s, and `Annotation`'s have many `Contour`'s. The main models to query are the `Scan` and `Annotation` models. - -The main workhorse for querying is the `pylidc.query` function. This function just wraps the `sqlalchemy.query` function. - -#### The `Scan` model - -Here's some example usage for querying scan objects. - - import pylidc as pl - - qu = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1) - print(qu.count()) - # => 97 - - scan = qu.first() - print(scan.patient_id, scan.pixel_spacing, scan.slice_thickness) - # => LIDC-IDRI-0066, 0.63671875, 0.6 - - print(len(scan.annotations)) - # => 11 - - print(scan.get_path_to_dicom_files()) - # '/path/to/big_external_drive/datasets/LIDC-IDRI/LIDC-IDRI-0066/1.3.6.1.4.1.14519.5.2.1.6279.6001.143774983852765282237869625332/1.3.6.1.4.1.14519.5.2.1.6279.6001.430109407146633213496148200410' - -You can engage an interactive slice view by calling: - - scan.visualize() - -By default, note that calling `visualize` on a scan object doesn't include its annotation information. However, if `scan.visualize` is supplied with a list of `Annotation` objects (say, grouped by the `scan.cluster_annotations()` function), then this annotation information *will* be displayed. For example - - nodules = scan.cluster_annotations() - print(len(nodules)) - # => 3 - - # Visualize the slices with the provided annotations indicated by arrows. - scan.visualize(annotations) - -`scan.cluster_annotations` returns a list. Each element of the list is a list of `Annotation` objects where each `Annotation` refers to the same nodule in the scan (probably). See [Clustering annotations](#clustering-annotations-to-identify-those-which-refer-to-the-same-physical-nodule) for more details on this function. - - -#### The `Annotation` model - -Let's grab the first annotation from the `Scan` object above: - - ann = scan.annotations[0] - print(ann.scan.patient_id) - # => LIDC-IDRI-0066 - - print(ann.spiculation, ann.Spiculation) - # => 3, Medium Spiculation - - print("%.2f, %.2f, %.2f" % (ann.diameter, ann.surface_area, ann.volume)) - # => 15.49, 1041.37, 888.05 - - ann.print_formatted_feature_table() - # => Feature Meaning # - # => - - - - # => Subtlety | Obvious | 5 - # => Internalstructure | Soft Tissue | 1 - # => Calcification | Absent | 6 - # => Sphericity | Ovoid | 3 - # => Margin | Poorly Defined | 1 - # => Lobulation | Near Marked Lobulation | 4 - # => Spiculation | Medium Spiculation | 3 - # => Texture | Solid | 5 - # => Malignancy | Moderately Suspicious | 4 - - fvals, fstrings = ann.feature_vals(return_str=True) - - for fname,fval,fstr in zip(pl.annotation_feature_names, fvals, fstrings): - print(fname.title(), fval, fstr) - # => 'Subtlety', 5, 'Obvious' - # => 'Internalstructure', 1, 'Soft Tissue' - # => 'Calcification', 6, 'Absent' - # => 'Sphericity', 3, 'Ovoid' - # => 'Margin', 1, 'Poorly Defined' - # => 'Lobulation', 4, 'Near Marked Lobulation' - # => 'Spiculation', 3, 'Medium Spiculation' - # => 'Texture', 5, 'Solid' - # => 'Malignancy', 4, 'Moderately Suspicious' - -Let's try a different query on the annotations directly: - - qu = pl.query(pl.Annotation).filter(pl.Annotation.lobulation > 3, - pl.Annotation.malignancy == 5) - print(qu.count()) - # => 183 - - ann = qu.first() - print(ann.lobulation, ann.Lobulation, ann.malignancy, ann.Malignancy) - # => 4, Near Marked Lobulation, 5, Highly Suspicious - - print(len(ann.contours)) - # => 8 - - print(ann.contours_to_matrix().shape) - # => (671, 3) - - print(ann.contours_to_matrix().mean(axis=0) - ann.centroid) - # => [ 0. 0. 0.] - -You can engage an interactive slice viewer that displays annotation values and the radiologist-drawn contours: - - ann.visualize_in_scan() - -You can also view the nodule contours in 3d by calling: - - ann.visualize_in_3d() - -### Advanced queries - -#### Get a random result - -One common objective for data exploration is to grab a random instance from some query. You can accomplish this by import `func` from `sqlalchemy`, and using `random`. - - from sqlalchemy import func - scan = pl.query(pl.Scan).filter(pl.Scan.contrast_used == True).order_by(func.random()).first() - ann = pl.query(pl.Annotation).filter(pl.Annotation.malignancy == 5).order_by(func.random()).first() - -The first query grabs a random `Scan` instance where contrast is used. The second query grabs a random `Annotation` instance where malignancy is equal to 5. - -#### Query multiple model parameters with a join - -Another common objective is to query for an `Annotation` object which is constrained by its corresponding `Scan` in some way. For example: - - anns = pl.query(pl.Annotation).join(pl.Scan).filter(pl.Scan.slice_thickness < 1, pl.Annotation.malignancy != 3) - -### Resampling the volumes - -The `Annotation` member function, `uniform_cubic_resample`, takes a cubic region of interest with the centroid at the center of the volume. The corresponding CT value volume is resampled to have voxel spacing of 1 millimeter and a side length as given by the functions `side_length` parameter. Along with the uniformly resampled, cubic CT image volume, a corresponding boolean-valued volume is also returned that is 1 where the nodule exists in the resampled CT volume and 0 otherwise. - - import pylidc as pl - import matplotlib.pyplot as plt - from skimage.measure import find_contours - - ann = pl.query(pl.Annotation).first() - vol, seg = ann.uniform_cubic_resample(side_length = 100) - print(vol.shape, seg.shape) - # => (101, 101, 101) (101, 101, 101) - - # View middle slice of interpolated volume (pixel spacing now = 1mm) - plt.imshow(vol[:,:,50], cmap=plt.cm.gray) - - # View middle slice of interpolated segmentation volume as contours - # atop the interpolated image. - contours = find_contours(seg[:,:,50], 0.5) - for contour in contours: - plt.plot(contour[:,1], contour[:,0], '-r') - - plt.show() - -![](https://raw.githubusercontent.com/pylidc/pylidc/master/img/resample-example.png) - -### Clustering annotations to identify those which refer to the same physical nodule. - -The LIDC dataset doesn't assign unique global identifiers to the physical nodules. For a given physical nodule, there may exist up to 4 annotations that refer to it. The annotations are anonymous, so even if it is known that 4 annotations refer to the same nodule, it is impossible to tell which annotator provided each annotation across multiple nodules consistently. - -However, we can estimate when annotations refer to the same physical nodule in a scan by examining the properties of the annotations and clustering them based on the properties. `pylidc` provides [a number of distance metrics between annotations](https://github.com/pylidc/pylidc/pylidc/annotation_distance_metrics.py) based on the annotation contour coordinates. The `Scan` model provides a [`cluster_annotations`](https://github.com/pylidc/pylidc/pylidc/Scan.py) function which then clusters annotations by determining the connected components of the adjacency graph associated with a chosen distance-between-annotations metric and a chosen distance tolerance. - -Here's an example: - -``` -import pylidc as pl - -scan = pl.query(pl.Scan).first() -nods = scan.cluster_annotations() - -print("Scan is estimated to have", len(nods), "nodules.") - -for i,nod in enumerate(nods): - print("Nodule", i+1, "has", len(nod), "annotations.") - for j,ann in enumerate(nod): - print("-- Annotation", j+1, "centroid:", ann.centroid) -``` - -Output: -``` -Scan is estimated to have 4 nodules. -Nodule 1 has 4 annotations. --- Annotation 1 centroid: [ 331.90680101 312.30982368 1480.44962217] --- Annotation 2 centroid: [ 328.60546875 309.91796875 1479.73046875] --- Annotation 3 centroid: [ 327.91666667 309.88293651 1479.01785714] --- Annotation 4 centroid: [ 332.55660377 313.88050314 1479.94339623] -Nodule 2 has 4 annotations. --- Annotation 1 centroid: [ 360.81122449 169.19642857 1542.10459184] --- Annotation 2 centroid: [ 360.82233503 169.21319797 1542.14720812] --- Annotation 3 centroid: [ 361.05243446 168.86142322 1542.34269663] --- Annotation 4 centroid: [ 361.25501433 171. 1542.80659026] -Nodule 3 has 1 annotations. --- Annotation 1 centroid: [ 336.41666667 348.83333333 1545.75 ] -Nodule 4 has 4 annotations. --- Annotation 1 centroid: [ 340.54020979 245.07692308 1606.14160839] --- Annotation 2 centroid: [ 341.29061103 244.65275708 1605.90834575] --- Annotation 3 centroid: [ 341.75417299 244.03490137 1606.95827011] --- Annotation 4 centroid: [ 341.53110048 245.58532695 1606.5 ] -``` -You can supply annotation clusters (variable, `nods`, above) to the `scan.visualize` function, and arrows will annotate where the nodules are present in the scan. +[See the full documentation and tutorials here.](https://pylidc.github.io) diff --git a/pylidc/Annotation.py b/pylidc/Annotation.py index 9d5217b..adddf6e 100644 --- a/pylidc/Annotation.py +++ b/pylidc/Annotation.py @@ -49,41 +49,101 @@ class Annotation(Base): has many contours, each of which refers to the contour drawn for nodule in each scan slice. - Parameters + Attributes ---------- - subtlety: int - blah - - internalStructure: int - blah - - calcification: int - blah - - sphericity: int - blah - - margin: int - blah - - lobulation: int - blah - - spiculation: int - blah - - texture: int - blah - - malignancy: int - blah + subtlety: int, range = {1,2,3,4,5} + Difficulty of detection. Higher values indicate easier detection. + + 1. 'Extremely Subtle' + 2. 'Moderately Subtle' + 3. 'Fairly Subtle' + 4. 'Moderately Obvious' + 5. 'Obvious' + + internalStructure: int, range = {1,2,3,4} + Internal composition of the nodule. + + 1. 'Soft Tissue' + 2. 'Fluid' + 3. 'Fat' + 4. 'Air' + + calcification: int, range = {1,2,3,4,6} + Pattern of calcification, if present. + + 1. 'Popcorn' + 2. 'Laminated' + 3. 'Solid' + 4. 'Non-central' + 5. 'Central' + 6. 'Absent' + + sphericity: int, range = {1,2,3,4,5} + The three-dimensional shape of the nodule in terms of its roundness. + + 1. 'Linear' + 2. 'Ovoid/Linear' + 3. 'Ovoid' + 4. 'Ovoid/Round' + 5. 'Round' + + margin: int, range = {1,2,3,4,5} + Description of how well-defined the nodule margin is. + + 1. 'Poorly Defined' + 2. 'Near Poorly Defined' + 3. 'Medium Margin' + 4. 'Near Sharp' + 5. 'Sharp' + + lobulation: int, range = {1,2,3,4,5} + The degree of lobulation ranging from none to marked + + 1. 'No Lobulation' + 2. 'Nearly No Lobulation' + 3. 'Medium Lobulation' + 4. 'Near Marked Lobulation' + 5. 'Marked Lobulation' + + spiculation: int, range = {1,2,3,4,5} + The extent of spiculation present. + + 1. 'No Spiculation' + 2. 'Nearly No Spiculation' + 3. 'Medium Spiculation' + 4. 'Near Marked Spiculation' + 5. 'Marked Spiculation' + + texture: int, range = {1,2,3,4,5} + Radiographic solidity: internal texture (solid, ground glass, + or mixed). + + 1. 'Non-Solid/GGO' + 2. 'Non-Solid/Mixed' + 3. 'Part Solid/Mixed' + 4. 'Solid/Mixed' + 5. 'Solid' + + malignancy: int, range = {1,2,3,4,5} + Subjective assessment of the likelihood of + malignancy, assuming the scan originated from a 60-year-old male + smoker. + + 1. 'Highly Unlikely' + 2. 'Moderately Unlikely' + 3. 'Indeterminate' + 4. 'Moderately Suspicious' + 5. 'Highly Suspicious' Example - --------- - :: + ------- + A short usage example for the Annotation class:: + import pylidc as pl + # Get the first annotation with spiculation value greater than 3. - ann = pl.query(pl.Annotation).filter(pl.Annotation.spiculation > 3).first() + ann = pl.query(pl.Annotation)\\ + .filter(pl.Annotation.spiculation > 3).first() print(ann.spiculation) # => 4 @@ -93,14 +153,10 @@ class Annotation(Base): print(ann.Spiculation) # => Medium-High Spiculation - q = pl.query(pl.Annotation).join(pl.Scan) - q = q.filter(pl.Scan.slice_thickness <= 1, - pl.Annotation.malignancy == 5) - print(q.count()) - # => 58 - - ann = q.first() - print("%.2f, %.2f, %.2f" % (ann.diameter, ann.surface_area, ann.volume)) + ann = anns.first() + print("%.2f, %.2f, %.2f" % (ann.diameter, + ann.surface_area, + ann.volume)) # => 17.98, 1221.40, 1033.70 """ __tablename__ = 'annotations' @@ -133,9 +189,10 @@ def __setattr__(self, name, value): #################################### # { Begin semantic attribute functions + @property def Subtlety(self): - """return subtlety value as string""" + """Semantic interpretation of `subtlety` value as string.""" s = self.subtlety assert s in range(1,6), "Subtlety score out of bounds." if s == 1: return 'Extremely Subtle' @@ -146,7 +203,7 @@ def Subtlety(self): @property def InternalStructure(self): - """return internalStructure value as string""" + """Semantic interpretation of `internalStructure` value as string.""" s = self.internalStructure assert s in range(1,5), "Internal structure score out of bounds." if s == 1: return 'Soft Tissue' @@ -156,7 +213,7 @@ def InternalStructure(self): @property def Calcification(self): - """return calcification value as string""" + """Semantic interpretation of `calcification` value as string.""" s = self.calcification assert s in range(1,7), "Calcification score out of bounds." if s == 1: return 'Popcorn' @@ -168,7 +225,7 @@ def Calcification(self): @property def Sphericity(self): - """return sphericity value as string""" + """Semantic interpretation of `sphericity` value as string.""" s = self.sphericity assert s in range(1,6), "Sphericity score out of bounds." if s == 1: return 'Linear' @@ -179,7 +236,7 @@ def Sphericity(self): @property def Margin(self): - """return margin value as string""" + """Semantic interpretation of `margin` value as string.""" s = self.margin assert s in range(1,6), "Margin score out of bounds." if s == 1: return 'Poorly Defined' @@ -190,7 +247,7 @@ def Margin(self): @property def Lobulation(self): - """return lobulation value as string""" + """Semantic interpretation of `lobulation` value as string.""" s = self.lobulation assert s in range(1,6), "Lobulation score out of bounds." if s == 1: return 'No Lobulation' @@ -201,7 +258,7 @@ def Lobulation(self): @property def Spiculation(self): - """return spiculation value as string""" + """Semantic interpretation of `spiculation` value as string.""" s = self.spiculation assert s in range(1,6), "Spiculation score out of bounds." if s == 1: return 'No Spiculation' @@ -212,7 +269,7 @@ def Spiculation(self): @property def Texture(self): - """return texture value as string""" + """Semantic interpretation of `texture` value as string.""" s = self.texture assert s in range(1,6), "Texture score out of bounds." if s == 1: return 'Non-Solid/GGO' @@ -223,7 +280,7 @@ def Texture(self): @property def Malignancy(self): - """return malignancy value as string""" + """Semantic interpretation of `malignancy` value as string.""" s = self.malignancy assert s in range(1,6), "Malignancy score out of bounds." if s == 1: return 'Highly Unlikely' @@ -232,7 +289,6 @@ def Malignancy(self): elif s == 4: return 'Moderately Suspicious' elif s == 5: return 'Highly Suspicious' - # } End attribute functions #################################### @@ -241,9 +297,19 @@ def feature_vals(self, return_str=False): Return all feature values as a numpy array in the order presented in `feature_names`. - return_str: bool, default False + Parameters + ---------- + return_str: bool, default=False If True, a list of strings is also returned, corresponding to the meaning of each numerical feature value. + + Return + ------ + fvals[, fstrs]: array[, list of strings] + `fvals` is an array of numerical values corresponding to the + numerical feature values for the annotation. `fstrs` is a + list of semantic string interpretations of the numerical + values given in `fvals`. """ fvals = np.array([getattr(self,f) for f in feature_names]) if return_str: @@ -256,7 +322,7 @@ def feature_vals(self, return_str=False): def print_formatted_feature_table(self): """ - Return all feature values as a string table. + Print all feature values as a string table. """ fnames = feature_names fvals, fstrings = self.feature_vals(True) @@ -276,13 +342,14 @@ def bbox(self, pad=None): Parameters ---------- - pad: int, list of ints, or float, default=None * If None (default), then no padding is used. - * If an integer is provided, then the slices are padded + * If an integer is provided, then the bounding box is padded uniformly by this integer amount. - * If a list of integers is provided, then it is of the form - `[(i1,i2), (j1,j2), (k1,k2)]` + * If a list of integers is provided, then it is of the form:: + + [(i1,i2), (j1,j2), (k1,k2)] + and indicates the pad amounts along each coordinate axis. * If a float is provided, then the slices are padded such that the bounding box occupies at least `pad` physical units @@ -298,8 +365,16 @@ def bbox(self, pad=None): to the maximum (or minimum, depending on the direction) possible index. - Examples - -------- + Return + ------ + bb: 3-tuple of Python `slice` objects. + `bb` is the corresponding bounding box (with desired padding) + in the CT image volume. `bb[i]` is a slice corresponding + to the the extent of the bounding box along the + coordinate axis `i`. + + Example + ------- The example below illustrates the various `pad` argument types:: @@ -403,14 +478,37 @@ def bbox(self, pad=None): def bbox_dims(self, pad=None): """ - Return the physical dimensions of the nodule bounding box in mm. + Return the physical dimensions of the nodule bounding box in + millimeters along each coordinate axis. + Parameters + ---------- pad: int, list, or float, default=None See :meth:`pylidc.Annotation.bbox` for a description of this argument. + + Return + ------ + dims: ndarray, shape=(3,) + `dims[i]` is the length in millimeters of the bounding box along + the coordinate axis `i`. + + Example + ------- + An example where we compare the bounding box volume vs the nodule + volume:: + + import pylidc as pl + + ann = pl.query(pl.Annotation).first() + + print("%.2f mm^3, %.2f mm^3" % (ann.volume, + np.prod(ann.bbox_dims()))) + # => 2439.30 mm^3, 5437.58 mm^3 """ res = [self.scan.pixel_spacing,]*2 + [self.scan.slice_spacing] - return [(b.stop-1-b.start)*r for r,b in zip(res, self.bbox(pad=pad))] + return np.array([(b.stop-1-b.start)*r + for r,b in zip(res, self.bbox(pad=pad))]) def bbox_matrix(self, pad=None): @@ -420,46 +518,71 @@ def bbox_matrix(self, pad=None): a 3x2 matrix where each row is the (start, stop) indices of the i, j, and k axes. + Parameters + ---------- pad: int, list, or float - See `Annotation.bbox` for description of this argument. - - NOTE: The indices return by `bbox_array` are *inclusive*, whereas - the indices of the slice objects in the tuple return by `bbox` - are offset by +1 in the "stop" index. - - Example: - >>> import pylidc as pl - >>> ann = pl.query(pl.Annotation).first() - >>> - >>> bb = ann.bbox() - >>> bm = ann.bbox_matrix() - >>> - >>> print(all([bm[i,0] == bb[i].start for i in range(3)])) - >>> # => True - >>> - >>> print(all([bm[i,1]+1 == bb[i].stop for i in range(3)])) - >>> # => True + See :meth:`pylidc.Annotation.bbox` for a + description of this argument. + + Note + ---- + The indices return by `bbox_array` are *inclusive*, whereas + the indices of the slice objects in the tuple return by `bbox` + are offset by +1 in the "stop" index. + + Return + ------ + bb_mat: ndarray, shape=(3,2) + `bb_mat[i]` is the stop and start indices (inclusive) of the + bounding box along coordinate axis `i`. + + Example + ------- + An example of the difference between `bbox` and `bbox_matrix`:: + + import pylidc as pl + ann = pl.query(pl.Annotation).first() + + bb = ann.bbox() + bm = ann.bbox_matrix() + + print(all([bm[i,0] == bb[i].start for i in range(3)])) + # => True + + print(all([bm[i,1]+1 == bb[i].stop for i in range(3)])) + # => True """ return np.array([[sl.start, sl.stop-1] for sl in self.bbox(pad=pad)]) + @property def centroid(self): """ - Return the center of mass of the nodule as determined by its + The center of mass of the nodule as determined by its radiologist-drawn contours. - Example: - >>> import pylidc as pl - >>> import matplotlib.pyplot as plt - >>> - >>> ann = pl.query(pl.Annotation).first() - >>> i,j,k = ann.centroid.round().astype(np.int) - >>> vol = ann.scan.to_volume() - >>> - >>> plt.imshow(vol[:,:,k], cmap=plt.cm.gray) - >>> plt.plot(j, i, '.r', label="Nodule centroid") - >>> plt.legend() - >>> plt.show() + Example + ------- + An example of plotting the centroid on a CT image slice:: + + import pylidc as pl + import matplotlib.pyplot as plt + + ann = pl.query(pl.Annotation).first() + i,j,k = ann.centroid + + vol = ann.scan.to_volume() + + plt.imshow(vol[:,:,int(k)], cmap=plt.cm.gray) + plt.plot(j, i, '.r', label="Nodule centroid") + plt.legend() + plt.show() + + Return + ------ + centr: ndarray, shape=(3,) + `centr[i]` is the average index value of all contour index values + for coordinate axis `i`. """ return self.contours_matrix.mean(axis=0) @@ -471,8 +594,10 @@ def diameter(self): where the diamter passes outside the boundary of the nodule, or through cavities within the nodule. - returns: float (or float,Contour) - Returns the diameter as float, accounting for the axial-plane + Return + ------ + diam: float + The maximal diameter as float, accounting for the axial-plane resolution of the scan. The units are mm. """ greatest_diameter = -np.inf @@ -502,6 +627,11 @@ def surface_area(self): """ Estimate the surface area by summing the areas of a trianglation of the nodules surface in 3d. Returned units are mm^2. + + Return + ------ + sa: float + The estimated surface area in squared millimeters. """ mask = self.boolean_mask() mask = np.pad(mask, [(1,1), (1,1), (1,1)], 'constant') # Cap the ends. @@ -532,8 +662,11 @@ def volume(self): bottom, respectively. If the annotation only has one contour, we use the `slice_thickness` attribute of the scan. - returns: float - The estimated 3D volume of the annotated nodule. Units are mm^3. + Return + ------ + vol: float + The estimated 3D volume of the annotated nodule. Units are cubic + millimeters. """ volume = 0. zvals = np.unique([c.image_z_position for c in self.contours]) @@ -565,10 +698,12 @@ def volume(self): return volume def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', - step=1, backend='matplotlib'): + step=1, figsize=(5,5), backend='matplotlib'): """ Visualize in 3d a triangulation of the nodule's surface. + Parameters + ---------- edgecolor: string color or rgb 3-tuple Sets edgecolors of triangulation. Ignored if backend != matplotlib. @@ -582,15 +717,20 @@ def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', The `step_size` parameter for the skimage marching_cubes function. Bigger values are quicker, but yield coarser surfaces. + figsize: tuple, default=(5,5) + Figure size for the displayed volume. + backend: string The backend for visualization. Default is matplotlib. Execute `from pylidc.Annotation import viz3dbackends` to see available backends. - Example: - >>> ann = pl.query(pl.Annotation).first() - >>> ann.visualize_in_3d(edgecolor='green', cmap='autumn') - >>> ann.visualize_in_3d(backend='mayavi') # If mayavi available. + Example + ------- + A short example:: + + ann = pl.query(pl.Annotation).first() + ann.visualize_in_3d(edgecolor='green', cmap='autumn') """ if backend not in viz3dbackends: raise ValueError("backend should be in %s." % viz3dbackends) @@ -599,19 +739,17 @@ def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', if cmap not in plt.cm.cmap_d.keys(): raise ValueError("Invalid `cmap`. See `plt.cm.cmap_d.keys()`.") - mask = self.boolean_mask() - mask = np.pad(mask, [(1,1), (1,1), (1,1)], 'constant') # Cap the ends. + # Pad to cap the ends for masks that hit the edge. + mask = self.boolean_mask(pad=[(1,1), (1,1), (1,1)]) - rxy = self.scan.pixel_spacing - rz = self.scan.slice_thickness + rij = self.scan.pixel_spacing + rk = self.scan.slice_thickness if backend == 'matplotlib': verts, faces, _, _= marching_cubes(mask.astype(np.float), 0.5, - spacing=(rxy, rxy, rz), + spacing=(rij, rij, rk), step_size=step) - maxes = np.ceil(verts.max(axis=0)) - - fig = plt.figure() + fig = plt.figure(figsize=figsize) ax = fig.add_subplot(111, projection='3d') t = np.linspace(0, 1, faces.shape[0]) @@ -620,13 +758,16 @@ def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', facecolors=plt.cm.cmap_d[cmap](t)) ax.add_collection3d(mesh) - ax.set_xlim(0, maxes[0]) + ceil = max(self.bbox_dims(pad=[(1,1), (1,1), (1,1)])) + ceil = int(np.round(ceil)) + + ax.set_xlim(0, ceil) ax.set_xlabel('length (mm)') - ax.set_ylim(0, maxes[1]) + ax.set_ylim(0, ceil) ax.set_ylabel('length (mm)') - ax.set_zlim(0, maxes[2]) + ax.set_zlim(0, ceil) ax.set_zlabel('length (mm)') plt.tight_layout() @@ -635,7 +776,7 @@ def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', try: from mayavi import mlab sf = mlab.pipeline.scalar_field(mask.astype(np.float)) - sf.spacing = [rxy, rxy, rz] + sf.spacing = [rij, rij, rk] mlab.pipeline.iso_surface(sf, contours=[0.5]) mlab.show() except ImportError: @@ -644,11 +785,19 @@ def visualize_in_3d(self, edgecolor='0.2', cmap='viridis', def visualize_in_scan(self, verbose=True): """ - Interactive visualization of the slices of the scan along with scan - and annotation information. The visualization begins - (but is not limited to) the first slice where the nodule occurs - (according to the annotation). Contours are plotted atop the images - for visualization and can be toggled on and off. + Engage an interactive visualization of the slices of the scan + along with scan and annotation information. + + The visualization begins (but is not limited to) the first slice + where the nodule occurs (according to the annotation). Annotation + contours are plotted on top of the images + for visualization and can be toggled on and off, using an interactive + check mark utility. + + Parameters + ---------- + verbose: bool, default=True + Turn the image loading statement on/off. """ images = self.scan.load_all_dicom_images(verbose) @@ -776,37 +925,37 @@ def update_contours(_): @property def contour_slice_zvals(self): - """Returns an array of unique z-coordinates for the contours.""" - return np.unique([c.image_z_position for c in self.contours]) + """An array of unique z-coordinates for the contours.""" + return np.sort([c.image_z_position for c in self.contours]) @property def contour_slice_indices(self): """ - Returns a list of indices into the scan where each contour - belongs. An example should clarify: - - >>> import pylidc as pl - >>> - >>> ann = pl.query(pl.Annotation) - >>> - >>> zvals = ann.contour_slice_zvals - >>> kvals = ann.contour_slice_indices - >>> scan_zvals = ann.scan.slice_zvals - >>> - >>> for k,z in zip(kvals, zvals): - >>> # the two z values should the same (up to machine precision) - >>> print(k, z, scan_zvals[k]) + Returns an array of indices into the scan where each contour + belongs. An example should clarify:: + + import pylidc as pl + + ann = pl.query(pl.Annotation) + + zvals = ann.contour_slice_zvals + kvals = ann.contour_slice_indices + scan_zvals = ann.scan.slice_zvals + + for k,z in zip(kvals, zvals): + # the two z values should the same (up to machine precision) + print(k, z, scan_zvals[k]) """ - slc = self.scan.slice_zvals - return [np.abs(slc-z).argmin() for z in self.contour_slice_zvals] + return np.sort([c.image_k_position for c in self.contours]) @property def contours_matrix(self): """ - Return all the contours in a 3D numpy array. + All the contour index values a 3D numpy array. """ return np.vstack([c.to_matrix(include_k=True) - for c in sorted(self.contours, key=lambda c: c.image_z_position)]) + for c in sorted(self.contours, + key=lambda c: c.image_z_position)]) def boolean_mask(self, pad=None, bbox=None): """ @@ -816,8 +965,11 @@ def boolean_mask(self, pad=None, bbox=None): volume would be placed in the full image volume according to the `bbox` attribute. + Parameters + ---------- pad: int, list, or float, default=None - See `Annotation.bbox` for description of this argument. + See :meth:`pylidc.Annotation.bbox` for a + description of this argument. bbox: 3x2 NumPy array, default=None If `bbox` is provided, then `pad` is ignored. This argument allows @@ -825,20 +977,24 @@ def boolean_mask(self, pad=None, bbox=None): or for pre-computation of bbox when working with multiple Annotation object. - Example: - >>> import pylidc as pl - >>> import matplotlib.pyplot as plt - >>> - >>> ann = pl.query(pl.Annotation).first() - >>> vol = ann.scan.to_volume() - >>> - >>> mask = ann.boolean_mask() - >>> bbox = ann.bbox() - >>> - >>> print("Avg HU inside nodule: %.1f" % vol[bbox][mask].mean()) - >>> # => Avg HU inside nodule: -280.0 - >>> print("Avg HU outside nodule: %.1f" % vol[bbox][~mask].mean()) - >>> # => Avg HU outside nodule: -732.2 + Example + ------- + An example:: + + import pylidc as pl + import matplotlib.pyplot as plt + + ann = pl.query(pl.Annotation).first() + vol = ann.scan.to_volume() + + mask = ann.boolean_mask() + bbox = ann.bbox() + + print("Avg HU inside nodule: %.1f" % vol[bbox][mask].mean()) + # => Avg HU inside nodule: -280.0 + + print("Avg HU outside nodule: %.1f" % vol[bbox][~mask].mean()) + # => Avg HU outside nodule: -732.2 """ bb = self.bbox_matrix(pad=pad) if bbox is None else bbox @@ -954,8 +1110,15 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, specified `side_length`. Thus, the returned volumes have dimensions, `(side_length+1,)*3` (since `side_length` is the spacing). + TODO + ---- + It would be nice if this function performed fully general + interpolation, i.e., not necessarily uniform spacing and allowing + different resample-resolutions along different coordinate axes. - side_length: integer, default None + Parameters + ---------- + side_length: integer, default=None The physical length of each side of the new cubic volume in millimeters. The default, `None`, takes the max of the nodule's bounding box dimensions. @@ -966,7 +1129,7 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, out-of-bounds image index, then the image is padded with the minimum CT image value. - resample_vol: boolean, default True + resample_vol: boolean, default=True If False, only the segmentation volume is resampled. irp_pts: 3-tuple from meshgrid @@ -974,42 +1137,67 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, points, rather than the automatically calculated points. This allows for sampling segmentation volumes over a common coordinate-system. - return_irp_pts: boolean, default False + return_irp_pts: boolean, default=False If True, the interpolation points (ix,iy,iz) at which the volume(s) were resampled are returned. These can potentially be provided as an argument to `irp_pts` for separate selfotations that refer to the same nodule, allowing the segmentation volumes to be resampled in a common coordinate-system. - verbose: boolean, default True + verbose: boolean, default=True Turn the loading statement on / off. - returns: [ct_volume,] mask [, irp_pts] + Return + ------ + [ct_volume,] mask [, irp_pts]: ndarray, ndarray, list of ndarrays `ct_volume` and `mask` are the resampled CT and boolean volumes, respectively. `ct_volume` and `irp_points` are optionally returned, depending on which flags are set (see above). - Example: - >>> ann = pl.query(pl.Annotation).first() - >>> ct_volume, mask = ann.uniform_cubic_resample(side_length=70) - >>> print(ct_volume.shape, mask.shape) - >>> # => (71, 71, 71), (71, 71, 71) - >>> # (Nodule is centered at (35,35,35).) - >>> - >>> import matplotlib.pyplot as plt - >>> plt.imshow( ct_volume[:,:,35] * (0.2 + 0.8*mask[:,:,35]) ) - >>> plt.show() + Example + ------- + An example:: + + import numpy as np + import matplotlib.pyplot as plt + import pylidc as pl + + ann = pl.query(pl.Annotation).first() + + # resampled volumes will have uniform side length of 70mm and + # uniform voxel spacing of 1mm. + n = 70 + vol,mask = ann.uniform_cubic_resample(n) + + + # Setup the plot. + img = plt.imshow(np.zeros((n+1, n+1)), + vmin=vol.min(), vmax=vol.max(), + cmap=plt.cm.gray) + + + # View all the resampled image volume slices. + for i in range(n+1): + img.set_data(vol[:,:,i] * (mask[:,:,i]*0.6+0.2)) + + plt.title("%02d / %02d" % (i+1, n)) + plt.pause(0.1) + """ - bbox = self.bbox - bboxd = self.bbox_dimensions - rxy = self.scan.pixel_spacing + bbox = self.bbox_matrix() + bboxd = self.bbox_dims() + rij = self.scan.pixel_spacing + rk = self.scan.slice_spacing + + imin,imax = bbox[0] + jmin,jmax = bbox[1] + kmin,kmax = bbox[2] - imin,imax = bbox[0].astype(int) - jmin,jmax = bbox[1].astype(int) + xmin,xmax = imin*rij, imax*rij + ymin,ymax = jmin*rij, jmax*rij - xmin,xmax = imin*rxy, imax*rxy - ymin,ymax = jmin*rxy, jmax*rxy - zmin,zmax = bbox[2] + zmin = self.scan.slice_zvals[kmin] + zmax = self.scan.slice_zvals[kmax] # { Begin input checks. if side_length is None: @@ -1033,41 +1221,12 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, # Get the indices where the nodule stops and starts # with respect to the scan z values. - kmin = np.where(zmin == img_zs)[0][0] - kmax = np.where(zmax == img_zs)[0][0] + #kmin = np.where(zmin == img_zs)[0][0] + #kmax = np.where(zmax == img_zs)[0][0] # Initialize the boolean mask. mask = self.boolean_mask() - ######################################################## - # { Begin mask corrections. - - # This block handles the case where - # the contour selfotations "skip a slice". - if mask.shape[2] != (kmax-kmin+1): - old_mask = mask.copy() - - # Create the new mask with appropriate z-length. - mask = np.zeros((old_mask.shape[0], - old_mask.shape[1], - kmax-kmin+1), dtype=np.bool) - - # Map z's to an integer. - z_to_index = dict(zip( - img_zs[kmin:kmax+1], - range(img_zs[kmin:kmax+1].shape[0]) - )) - - # Map each slice to its correct location. - for k in range(old_mask.shape[2]): - mask[:, :, z_to_index[contour_zs[k]] ] = old_mask[:,:,k] - - # Get rid of the old one. - del old_mask - - # } End mask corrections. - ######################################################## - ######################################################## # { Begin interpolation grid creation. # (The points at which the volumes will be resampled.) @@ -1113,7 +1272,7 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, # then `ax` is the minimum possible index, 0. A similar # diagram helps with the `bx` index. - T = np.arange(0, 512)*rxy + T = np.arange(0, 512)*rij if xhat[0] <= 0: ax = 0 @@ -1163,7 +1322,7 @@ def uniform_cubic_resample(self, side_length=None, resample_vol=True, # padding `mask`. padvals = [(imin-ax, bx-1-imax), # The `b` terms have a `+1` offset (jmin-ay, by-1-jmax), # from being an index that is - (kmin-az, bz-1-kmax)] # correct with the `-1` here. + (kmin-az, bz-1-kmax)] # corrected with the `-1` here. mask = np.pad(mask, pad_width=padvals, mode='constant', constant_values=False) diff --git a/pylidc/Contour.py b/pylidc/Contour.py index 4711b57..e16d853 100644 --- a/pylidc/Contour.py +++ b/pylidc/Contour.py @@ -10,9 +10,11 @@ class Contour(Base): """ - The Contour class holds the coordinate data for the vertices of - a :class:`pylidc.Annotation` object for a single slice in the CT volume. + The Contour class holds the nodule boundary coordinate data of a + :class:`pylidc.Annotation` object for a single slice in the CT volume. + Attributes + ---------- inclusion: bool If True, the area inside the contour is included as part of the nodule. If False, the area inside the contour is excluded @@ -20,11 +22,11 @@ class Contour(Base): image_z_position: float This is the `imageZposition` defined via the xml annnotations - for this contour. It is the z-coordinate of dicom - attribute, `(0020,0032)`. + for this contour. It is the z-coordinate of DICOM + attribute (0020,0032). dicom_file_name: string - This is the name of the corresponding dicom file for the scan + This is the name of the corresponding DICOM file for the scan to which this contour belongs, having the same `image_z_position`. coords: string @@ -33,18 +35,24 @@ class Contour(Base): `to_matrix` method, which returns a numpy array rather than a string. - Example: - >>> import pylidc as pl - >>> import matplotlib.pyplot as plt - >>> - >>> ann = pl.query(pl.Annotation).first() - >>> vol = ann.scan.to_volume() - >>> ii,jj = ann.contours[0].to_matrix(include_k=False).T - >>> - >>> plt.imshow(vol[:,:,46], cmap=plt.cm.gray) - >>> plt.plot(jj, ii, '-r', lw=1, label="Nodule Boundary") - >>> plt.legend() - >>> plt.show() + Example + ------- + Plotting a contour on top of the image volume:: + + import pylidc as pl + import matplotlib.pyplot as plt + + ann = pl.query(pl.Annotation).first() + vol = ann.scan.to_volume() + con = ann.contours[3] + + k = con.image_k_position + ii,jj = ann.contours[3].to_matrix(include_k=False).T + + plt.imshow(vol[:,:,46], cmap=plt.cm.gray) + plt.plot(jj, ii, '-r', lw=1, label="Nodule Boundary") + plt.legend() + plt.show() """ __tablename__ = 'contours' id = sq.Column('id', sq.Integer, primary_key=True) @@ -66,18 +74,30 @@ def __setattr__(self, name, value): else: super(Contour,self).__setattr__(name,value) + @property + def image_k_position(self): + """ + Similar to `Contour.image_z_position`, but returns the index + instead of the z coordinate value. + + Note + ---- + This index may not be unique if the `slice_zvals` of the respective + scan are not unique. + """ + zs = self.annotation.scan.slice_zvals + k = np.abs(zs-self.image_z_position).argmin() + return k + def to_matrix(self, include_k=True): """ Return the contour-annotation coordinates as a matrix where each row contains an (i,j,k) *index* coordinate into the image volume. - + + Parameters + ---------- include_k: bool, default=True - Set `include_k=False` to omit the `k` axis coordinate. In this - case, each row of the returned matrix is an (i,j) *index* - coordinate into the `k`th image. The index `k` can be - recovered via: - z = contour.image_z_position - k = contour.annotation.contour_slice_indexes[z] + Set `include_k=False` to omit the `k` axis coordinate. """ # The reversal [::-1] is because the coordinates from the LIDC XML # are stored as (x,y), not (i,j). @@ -86,10 +106,8 @@ def to_matrix(self, include_k=True): if not include_k: return ij else: - k = np.ones(ij.shape[0]) + k = np.ones(ij.shape[0])*self.image_k_position zs = self.annotation.contour_slice_zvals - kk = np.abs(zs-self.image_z_position).argmin() - k *= self.annotation.contour_slice_indices[kk] return np.c_[ij, k].astype(np.int) Annotation.contours = relationship('Contour', diff --git a/pylidc/Scan.py b/pylidc/Scan.py index e5dd57d..f0085b9 100644 --- a/pylidc/Scan.py +++ b/pylidc/Scan.py @@ -68,13 +68,17 @@ class Scan(Base): ========== study_instance_uid: string + DICOM attribute (0020,000D). + series_instance_uid: string + DICOM attribute (0020,000E). patient_id: string - Identifier of the from `LIDC-IDRI-####` + Identifier of the form "LIDC-IDRI-dddd" where dddd is a string of + integers. slice_thickness: float - DICOM attribute, `(0018,0050)`. Note that this may not be + DICOM attribute (0018,0050). Note that this may not be equal to the `slice_spacing` attribute (see below). slice_zvals: ndarray @@ -83,17 +87,20 @@ class Scan(Base): as a NumPy array sorted in increasing order. slice_spacing: float - This computes the median of the difference + This computed property is the median of the difference between the slice coordinates, i.e., `scan.slice_zvals`. - NOTE: that this attribute is typically (but not always!) the + + Note + ---- + This attribute is typically (but not always!) the same as the `slice_thickness` attribute. Furthermore, the `slice_spacing` does NOT necessarily imply that all the slices are spaced with spacing (although they often are). pixel_spacing: float - Dicom attribute, `(0028,0030)`. This is normally two + Dicom attribute (0028,0030). This is normally two values. All scans in the LIDC have equal resolutions - in the transverse plane, so only one value is taken here. + in the transverse plane, so only one value is used here. contrast_used: bool If the DICOM file for the scan had any Contrast tag, @@ -104,24 +111,27 @@ class Scan(Base): part of the initial 399 release. sorted_dicom_file_names: string - This attribute is no longer used, but left for historical - reasons. This is a string containing a comma-separated list - like `[number].dcm`. It is a list of the dicom file - names for this scan in order of increasing z-coordinate - of dicom attribute, `(0020,0032)`. In rare cases where - a scan includes multiple files with the same z-coordinate, - the one with the lesser `InstanceNumber` is used. - - Example: - >>> import pylidc as pl - >>> qu = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1) - >>> print(qu.count()) - >>> # => 97 - >>> scan = qu.first() - >>> print(scan.patient_id, scan.pixel_spacing, scan.slice_thickness) - >>> # => LIDC-IDRI-0066, 0.63671875, 0.6 - >>> print(len(scan.annotations)) - >>> # => 11 + This attribute is no longer used, and can be ignored. + + Example + ------- + A short example of `Scan` class usage:: + + import pylidc as pl + + scans = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1) + print(scans.count()) + # => 97 + + scan = scans.first() + print(scan.patient_id, + scan.pixel_spacing, + scan.slice_thickness, + scan.slice_spacing) + # => LIDC-IDRI-0066, 0.63671875, 0.6, 0.5 + + print(len(scan.annotations)) + # => 11 """ __tablename__ = 'scans' id = sq.Column('id', sq.Integer, primary_key=True) @@ -148,36 +158,31 @@ def __setattr__(self, name, value): def get_path_to_dicom_files(self): """ Get the path to where the DICOM files are stored for this scan, - relative to the root path set in the pylidc configuration file (e.g., + relative to the root path set in the pylidc configuration file (i.e., `~/.pylidc` in MAC and Linux). - *This path is tricky and would benefit from coordination with TCIA.* + 1. In older downloads, the data DICOM data would download as:: - 1.) In older downloads, the data DICOM data would download as: + [...]/LIDC-IDRI/LIDC-IDRI-dddd/uid1/uid2/dicom_file.dcm - [...]/LIDC-IDRI/LIDC-IDRI-####/scan_instance_uid/series_instance_uid/*.dcm + where [...] is the base path set in the pylidc configuration + filee; uid1 is `Scan.study_instance_uid`; and, uid2 + is `Scan.series_instance_uid` . - For example, - - /data/storage/path/LIDC-IDRI/LIDC-IDRI-0078/1.3.6.1.4.1.14519.5.2.1.6279.6001.339170810277323131167631068432/1.3.6.1.4.1.14519.5.2.1.6279.6001.303494235102183795724852353824 + 2. However, in more recent downloads, the data is downloaded like:: - 2.) However, in more recent downloads, the data is downloaded like: + [...]/LIDC-IDRI/LIDC-IDRI-dddd/??? - [...]/LIDC-IDRI/LIDC-IDRI-####/?????? + where "???" is some unknown folder hierarchy convention used + by TCIA. - We first check 1.) and hope that it works. Otherwise, we check if the - `LIDC-IDRI-#### folder exists in the root path. If so, then we - recursively search the `LIDC-IDRI-####` directory until we find + We first check option 1. Otherwise, we check if the + "LIDC-IDRI-dddd" folder exists in the root path. If so, then we + recursively search the "LIDC-IDRI-dddd" directory until we find the correct subfolder that contains a DICOM file with the correct `study_instance_uid` and `series_instance_uid`. - Note that 2.) is much less efficient that 1.); however, 2.) is very - robust. - - Example: - >>> scan = pl.query(pl.Scan).first() - >>> print(scan.get_path_to_dicom_files()) - >>> # => /data/storage/path/LIDC-IDRI/LIDC-IDRI-0078/1.3.6.1.4.1.14519.5.2.1.6279.6001.339170810277323131167631068432/1.3.6.1.4.1.14519.5.2.1.6279.6001.303494235102183795724852353824 + Option 2 is less efficient than 1; however, option 2 is robust. """ if dicompath is None: raise EnvironmentError(dpath_msg) @@ -216,77 +221,157 @@ def get_path_to_dicom_files(self): if seid == self.series_instance_uid and \ stid == self.study_instance_uid: - path = dpath - found = True - break + path = dpath + found = True + break if not found: raise IOError("Couldn't find DICOM files for %s."%self) return path + def load_all_dicom_images(self, verbose=True): + """ + Load all the DICOM images assocated with this scan and return as list. + + Parameters + ---------- + verbose: bool + Turn the loading method on/off. + + Example + ------- + An example:: + + import pylidc as pl + import matplotlib.pyplot as plt + + scan = pl.query(pl.Scan).first() + + images = scan.load_all_dicom_images() + zs = [float(img.ImagePositionPatient[2]) for img in images] + print(zs[1] - zs[0], img.SliceThickness, scan.slice_thickness) + + plt.imshow( images[0].pixel_array, cmap=plt.cm.gray ) + plt.show() + + """ + if verbose: print("Loading dicom files ... This may take a moment.") + + path = self.get_path_to_dicom_files() + fnames = [fname for fname in os.listdir(path) + if fname.endswith('.dcm')] + images = [] + for fname in fnames: + image = dicom.dcmread(os.path.join(path,fname)) + + seid = str(image.SeriesInstanceUID).strip() + stid = str(image.StudyInstanceUID).strip() + + if seid == self.series_instance_uid and\ + stid == self.study_instance_uid: + images.append(image) + + # ############################################## + # Clean multiple z scans. + # + # Some scans contain multiple slices with the same `z` coordinate + # from the `ImagePositionPatient` tag. + # The arbitrary choice to take the slice with lesser + # `InstanceNumber` tag is made. + # This takes some work to accomplish... + zs = [float(img.ImagePositionPatient[-1]) for img in images] + inums = [float(img.InstanceNumber) for img in images] + inds = list(range(len(zs))) + while np.unique(zs).shape[0] != len(inds): + for i in inds: + for j in inds: + if i!=j and zs[i] == zs[j]: + k = i if inums[i] > inums[j] else j + inds.pop(inds.index(k)) + + # Prune the duplicates found in the loops above. + zs = [zs[i] for i in range(len(zs)) if i in inds] + images = [images[i] for i in range(len(images)) if i in inds] + + # Sort everything by (now unique) ImagePositionPatient z coordinate. + sort_inds = np.argsort(zs) + images = [images[s] for s in sort_inds] + # End multiple z clean. + # ############################################## + + return images + + def cluster_annotations(self, metric='min', tol=None, factor=0.9, - return_distance_matrix=False, verbose=True): + min_tol=1e-1, return_distance_matrix=False, + verbose=True): """ Estimate which annotations refer to the same physical - nodule. This method clusters all nodule Annotations for a - Scan by computing a distance measure between the annotations. - + nodule in the CT scan. This method clusters all nodule Annotations for + a Scan by computing a distance measure between the annotations. + + Parameters + ------ metric: string or callable, default 'min' - If string, see - `from pylidc.annotation_distance_metrics import metrics` - `metrics.keys()` + If string, see:: + + from pylidc.annotation_distance_metrics import + print(metrics metrics.keys()) + for available metrics. If callable, the provided function, should take two Annotation objects and return a float, i.e., - metric(ann1, ann2). + `isinstance( metric(ann1, ann2), float )`. - tol: float, default None + tol: float, default=None A distance in millimeters. Annotations are grouped when the minimum distance between their boundary contour points - is less than `tol`. The default, None, sets - `tol = scan.pixel_spacing`. More detail on this is found below. + is less than `tol`. If `tol = None` (the default), then + `tol = scan.pixel_spacing` is used. factor: float, default=0.9 - If `tol` resulted in any group of nodules with more than + If `tol` resulted in any group of annotations with more than 4 Annotations, then `tol` is multiplied by `factor` and the grouping is performed again. + min_tol: float, default=0.1 + If `tol` is reduced below `min_tol` (see the `factor` parameter), + then the routine exits because we conclude that the annotation + groups cannot be automatically reduced to have groups + with each group having `Annotations<=4` (as expected + with LIDC data). + return_distance_matrix: bool, default False Optionally return the distance matrix that was used to produce the clusters. - verbose: bool, default True - If `tol` is reduced below 1e-1, then we conclude that - the nodule groups cannot be automatically reduced to have - groups with number of Annotations <= 4, and a warning message - is printed. If verbose=False, this message is not printed. - - returns: clusters, a list of lists. - `clusters[j]` is a list of Annotation objects deemed - to refer to the same physical nodule in the Scan. - - `len(clusters)` attempts to estimate (via the specified `tol`) - the number of unique physical nodules present in this Scan as - determined by this overlap method and the tolerance used. + verbose: bool, default=True + If True, a warning message is printed when `tol < min_tol` occurs. - More on the `tol` parameter and distance measures: + Return + ------ + clusters: list of lists. + `clusters[i]` is a list of :class:`pylidc.Annotation` objects + that refer to the same physical nodule in the Scan. `len(clusters)` + estimates the number of unique physical nodules in the Scan. + Note + ---- The "distance" matrix, `d[i,j]`, between all Annotations for the Scan is first computed using the provided `metric` parameter. + Annotations are said to be adjacent when `d[i,j]<=tol`. + Annotation groups are determined by finding the connected components + of the graph associated with this adjacency matrix. - Annotations are "grouped" or estimated to refer to the same physical - nodule when `d <= tol`. The groupings are formed by determining - the adjacency matrix for the Annotations. Annotations are said to be - adjacent when `d[i,j] <= tol`. Groups are determined by finding - the connected components of the graph associated with - this adjacency matrix. - - Example:: + Example + ------- + An example:: import pylidc as pl scan = pl.query(pl.Scan).first() nodules = scan.cluster_annotations() + print("This can has %d nodules." % len(nodules)) # => This can has 4 nodules. @@ -296,6 +381,7 @@ def cluster_annotations(self, metric='min', tol=None, factor=0.9, # => Nodule 2 has 4 annotations. # => Nodule 3 has 1 annotations. # => Nodule 4 has 4 annotations. + """ assert 0 < factor < 1, "`factor` must be in the interval (0,1)." @@ -335,7 +421,7 @@ def cluster_annotations(self, metric='min', tol=None, factor=0.9, # no nodules with more than 4 annotations. while any([c > 4 for c in counts]): tol *= factor - if tol < 1e-1: + if tol < min_tol: msg = "Failed to reduce all groups to <= 4 Annotations.\n" msg+= "Some nodules may be close and must be grouped manually." if verbose: print(msg) @@ -360,73 +446,19 @@ def cluster_annotations(self, metric='min', tol=None, factor=0.9, else: return clusters - def load_all_dicom_images(self, verbose=True): - """ - Load all the DICOM images assocated with this scan and return as list. - - Example:: - - import pylidc as pl - import matplotlib.pyplot as plt - - scan = pl.query(pl.Scan).first() - - images = scan.load_all_dicom_images() - zs = [float(img.ImagePositionPatient[2]) for img in images] - print(zs[1] - zs[0], img.SliceThickness, scan.slice_thickness) - - plt.imshow( images[0].pixel_array, cmap=plt.cm.gray ) - plt.show() - """ - if verbose: print("Loading dicom files ... This may take a moment.") - - path = self.get_path_to_dicom_files() - fnames = [fname for fname in os.listdir(path) - if fname.endswith('.dcm')] - images = [] - for fname in fnames: - image = dicom.dcmread(os.path.join(path,fname)) - images.append(image) - - # ############################################## - # Clean multiple z scans. - # - # Some scans contain multiple slices with the same `z` coordinate - # from the `ImagePositionPatient` tag. - # The arbitrary choice to take the slice with lesser - # `InstanceNumber` tag is made. - # This takes some work to accomplish... - zs = [float(img.ImagePositionPatient[-1]) for img in images] - inums = [float(img.InstanceNumber) for img in images] - inds = list(range(len(zs))) - while np.unique(zs).shape[0] != len(inds): - for i in inds: - for j in inds: - if i!=j and zs[i] == zs[j]: - k = i if inums[i] > inums[j] else j - inds.pop(inds.index(k)) - - # Prune the duplicates found in the loops above. - zs = [zs[i] for i in range(len(zs)) if i in inds] - images = [images[i] for i in range(len(images)) if i in inds] - - # Sort everything by (now unique) ImagePositionPatient z coordinate. - sort_inds = np.argsort(zs) - images = [images[s] for s in sort_inds] - # End multiple z clean. - # ############################################## - - return images - def visualize(self, annotation_groups=None): """ Visualize the scan. - annotation_groups: list of lists of Annotation objects + Parameters + ---------- + annotation_groups: list of lists of Annotation objects, default=None This argument should be supplied by the returned object from the `cluster_annotations` method. - Example:: + Example + ------- + An example:: import pylidc as pl @@ -434,6 +466,7 @@ def visualize(self, annotation_groups=None): nodules = scan.cluster_annotations() scan.visualize(annotation_groups=nodules) + """ images = self.load_all_dicom_images() @@ -552,7 +585,9 @@ def slice_spacing(self): This computes the median of the difference between the slice coordinates, i.e., `scan.slice_zvals`. - NOTE: that this attribute is typically (but not always!) the + Note + ---- + This attribute is typically (but not always!) the same as the `slice_thickness` attribute. Furthermore, the `slice_spacing` does NOT necessarily imply that all the slices are spaced with spacing (although they often are). diff --git a/pylidc/__init__.py b/pylidc/__init__.py index 064f3f2..8327792 100644 --- a/pylidc/__init__.py +++ b/pylidc/__init__.py @@ -1,7 +1,4 @@ """ -pylidc --------------------------------------------------------- - Author: Matt Hancock, not.matt.hancock@gmail.com This python module implements an (ORM) object relational mapping to an sqlite database containing the annotation information from @@ -23,7 +20,7 @@ """ from __future__ import print_function as _pf -__version__ = '0.1.9' +__version__ = '0.2.0' # Hidden stuff. import os as _os @@ -51,21 +48,21 @@ def query(*args): import pylidc as pl - qu = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1.) - print qu.count() + scans = pl.query(pl.Scan).filter(pl.Scan.slice_thickness <= 1.) + print scans.count() # => 97 - scan = qu.first() + scan = scans.first() print(len(scan.annotations)) # => 11 - qu = pl.query(pl.Annotation).filter(pl.Annotation.malignancy > 3, - pl.Annotation.spiculation < 3) - print(qu.count()) + anns = pl.query(pl.Annotation).filter(pl.Annotation.malignancy > 3, + pl.Annotation.spiculation < 3) + print(anns.count()) # => 1083 - annotation = qu.first() - print(annotation.volume) + ann = anns.first() + print(ann.volume) # => 5230.33874999 """ return _session.query(*args) diff --git a/pylidc/utils.py b/pylidc/utils.py index b6a6aa1..34ba5fc 100644 --- a/pylidc/utils.py +++ b/pylidc/utils.py @@ -2,11 +2,12 @@ from .Annotation import Annotation def consensus(anns, clevel=0.5, pad=None, ret_masks=True, verbose=True): - """ - Return the boolean-valued consensus volume amongst the + """Return the boolean-valued consensus volume amongst the provided annotations (`anns`) at a particular consensus level (`clevel`). + Parameters + ---------- anns: list of `pylidc.Annotation` objects This list should be probably be one of the lists returned by the `pylidc.Scan.cluster_annotations` @@ -30,7 +31,9 @@ def consensus(anns, clevel=0.5, pad=None, ret_masks=True, verbose=True): verbose: bool, default=True Turns the DICOM image loading message on/off. - returns: consensus_mask, consensus_bbox[, masks] + Returns + ------- + consensus_mask, consensus_bbox[, masks]: (ndarray, tuple[, list]) `consensus_mask` is the boolean-valued volume of the annotation masks at `clevel` consensus. `consensus_bbox` is a 3-tuple of slices that can be used to index into the image volume at the diff --git a/setup.cfg b/setup.cfg index 2a9acf1..ed8a958 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,2 +1,5 @@ [bdist_wheel] universal = 1 + +[metadata] +license_file = LICENSE