CellStitch-Cuda: CUDA-accelerated CellStitch 3D labeling.

About this repo

An overhaul of the CellStitch algorithm, developed by Yining Liu and Yinuo Jin (original repository), publication can be found here.

Some major adjustments:

Replaced NumPy with CuPy for GPU-accelerated calculations.
Replaced nested for-loops with vectorized calculations for dramatic speedups (~100x).
Included novel segmentation method InstanSeg, which enables multichannel inputs (repo and publication).
An all-in-one method that takes an ZCYX-formatted .tif file, performs the correct transposes, and writes stitched labels.
Included a histogram-based bleach correction to adjust for signal degradation over the Z-axis (originally developed for ImageJ in (Miura 2020) and released for Python by marx-alex in napari-bleach-correct).
Completely rewrote the interpolation pipeline to obtain equal results with less RAM usage and at a much higher speed.

Some comparisons

Stitching

The calculations were run on the same machine (GPU: NVIDIA Quadro RTX 6000 24 GB; CPU: Intel Xeon Gold 6252 (48/96 cores); RAM: 1024 GB), the core count of which gave it a clear parallel-processing advantage. This particularly affects the fill_holes_and_remove_small_masks function, which has been rewritten to utilize parallel processing.

In image A, GPU VRAM load was ~200 MB at its peak.

In image B, GPU VRAM load was ~2442 MB at its peak

Interpolation

The revised interpolation method leverages a more efficient alternative to SciPy's binary_fill_holes, speeding up the process tremendously (>100x)

Image: 10x1024x1024 px containing 4117 stitched masks

Metric	Original	Revised
Time (s)	3356.18	26.31
RAM (GB)	~60	~16

Installation

Notes

This setup has so far only been verified on Windows-based, CUDA-accelerated machines. Testing has only been performed on CUDA 12.x. There are no reasons why 11.x should not work (check instructions), but your mileage may vary.

Conda setup

conda create -n cellstitch-cuda python=3.11
conda activate cellstitch-cuda

Install using PyPi

pip install cellstitch-cuda
pip uninstall torch
conda install pytorch pytorch-cuda=12.4 -c conda-forge -c pytorch -c nvidia

You may replace the version number for pytorch-cuda with whatever is applicable for you.

Additional steps for CUDA 11.x

pip uninstall cupy-cuda12x
pip install cupy-cuda11x

Instructions

Example code

For more detail, see examples/.

From an image

This assumes a multichannel grayscale image in the order ZCYX. Single-channel images are currently not supported, but will be in the future.

from cellstitch_cuda.pipeline import cellstitch_cuda

img = "path/to/image.tif"  # ZCYX
# or feed img as a numpy ndarray

volumetric_masks = cellstitch_cuda(img)

From pre-existing orthogonal labels

These are label images over the Z-, X, and Y-axis. They are assumed to be in the order ZYX. If you set output_masks=True in the cellstitch_cuda()-function, these masks will be written to disk (either in the input folder, or in the folder set in output_path).

from cellstitch_cuda.pipeline import full_stitch
import tifffile

# Define xy_masks, yz_masks, xz_masks
yx_masks = tifffile.imread("path/to/yx_masks.tif")  # ZYX
yz_masks = tifffile.imread("path/to/yz_masks.tif")  # ZYX
xz_masks = tifffile.imread("path/to/xz_masks.tif")  # ZYX

volumetric_masks = full_stitch(yx_masks, yz_masks, xz_masks)

Arguments

cellstitch_cuda.pipeline.cellstitch_cuda()

cellstitch_cuda() takes the following arguments:

img: Either a path pointing to an existing image, or a numpy.ndarray. Must be 4D (ZCYX).
output_masks: True to write all masks to the output path, or False to only return the final stitched mask. Default False
output_path: Set to None to write to the input file location (if provided). Ignored of output_masks is False. N.B.: If output_masks is True, while no path has been provided (e.g., by loading a numpy.ndarray directly), the output masks will be written to the folder where the script is run from. Default None
seg_mode: Instanseg segmentation mode: "nuclei" to only return nuclear masks, "cells" to return all the cell masks (including those without nuclei), or "nuclei_cells", which returns only cells with detected nuclei. Default "nuclei_cells"
pixel_size: XY pixel size in microns per pixel. When set to None, will be read from img metadata if possible. Default None
z_step: Z pixel size (z step) in microns per step. When set to None, will be read from img metadata if possible. Default None
bleach_correct: Whether histogram-based signal degradation correction should be applied to img. Default True
filtering: Whether the optimized fill_holes_and_remove_small_masks function should be executed. Default True
interpolation: If set to True, the function returns a tuple of the array of stitched masks and an array with interpolated volumetric masks. CellStitch provides an interpolation method to turn anisotropic masks into pseudo-isotropic masks. The algorithm, adapted from the original codebase, has been completely rewritten for efficient parallel processing. Outputs a separate mask in the output folder if output_masks = True. Default False
n_jobs: Set the number of threads to be used in parallel processing tasks. Use 1 for debugging. Generally, best left at the default value. Default -1
verbose: Verbosity. Default False

cellstitch_cuda.pipeline.full_stitch()

full_stitch() takes the following arguments:

xy_masks_prior: numpy.ndarray with XY masks, order ZYX
yz_masks: numpy.ndarray with YZ masks, order ZYX
xz_masks: numpy.ndarray with XZ masks, order ZYX
nuclei: numpy.ndarray with XY masks of nuclei, order ZYX. If provided, it will run the function filter_nuclei_cells() to filter volumetric masks by the presence of a 2D nucleus mask. Default None
filter: Use CellPose-based fill_holes_and_remove_small_masks() function. Default True
n_jobs: Number of threads used. Set n_jobs to 1 for debugging parallel processing tasks. Default -1
verbose: Verbosity. Default False

cellstitch_cuda.interpolate.full_interpolate()

full_interpolate() takes the following arguments:

masks: numpy.ndarray with stitched XY masks
anisotropy: The ratio (or mismatch) between the Z and XY sampling rate, calculated as anisotropy = z_step/pixel_size. Default 2
dist: The distance metric used to calculate the Optimal Transport between two masks. Default "sqeuclidean"
n_jobs: Number of threads used. Set n_jobs to 1 for debugging parallel processing tasks. Default -1
verbose: Verbosity. Default False

References

Goldsborough, T., O’Callaghan, A., Inglis, F., Leplat, L., Filbey, A., Bilen, H., & Bankhead, P. (2024) A novel channel invariant architecture for the segmentation of cells and nuclei in multiplexed images using InstanSeg. bioRxiv, 2024.09.04.611150. doi: 10.1101/2024.09.04.611150

Liu, Y., Jin, Y., Azizi, E., & Blumberg, E. (2023) Cellstitch: 3D cellular anisotropic image segmentation via optimal transport. BMC Bioinformatics, 24(480). doi: 10.1186/s12859-023-05608-2

Miura, K. (2020) Bleach correction ImageJ plugin for compensating the photobleaching of time-lapse sequences. F1000Res, 9:1494. doi: 10.12688/f1000research.27171.1

Stringer, C., Wang, T., Michaelos, M., & Pachitariu, M. (2021) Cellpose: a generalist algorithm for cellular segmentation. Nature Methods, 18(1), 100-106. doi: 10.1038/s41592-020-01018-x

Name		Name	Last commit message	Last commit date
Latest commit History 319 Commits
.idea		.idea
examples		examples
figures		figures
src/cellstitch_cuda		src/cellstitch_cuda
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CellStitch-Cuda: CUDA-accelerated CellStitch 3D labeling.

About this repo

Some comparisons

Stitching

Interpolation

Installation

Notes

Conda setup

Install using PyPi

Additional steps for CUDA 11.x

Instructions

Example code

From an image

From pre-existing orthogonal labels

Arguments

cellstitch_cuda.pipeline.cellstitch_cuda()

cellstitch_cuda.pipeline.full_stitch()

cellstitch_cuda.interpolate.full_interpolate()

References

About

Languages

License

Tomvl117/cellstitch-cuda

Folders and files

Latest commit

History

Repository files navigation

CellStitch-Cuda: CUDA-accelerated CellStitch 3D labeling.

About this repo

Some comparisons

Stitching

Interpolation

Installation

Notes

Conda setup

Install using PyPi

Additional steps for CUDA 11.x

Instructions

Example code

From an image

From pre-existing orthogonal labels

Arguments

cellstitch_cuda.pipeline.cellstitch_cuda()

cellstitch_cuda.pipeline.full_stitch()

cellstitch_cuda.interpolate.full_interpolate()

References

About

Resources

License

Stars

Watchers

Forks

Languages