Pycytominer: Data processing functions for profiling perturbations


Pycytominer is a suite of common functions for processing high-dimensional readouts from high-throughput cell experiments. The tool is most often used to process data through the following pipeline:

[Figure: pycytominer processing pipeline]

Image data flow from the microscope to segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler). From here, additional single cell processing tools curate the single cell readouts into a form manageable for pycytominer input. For CellProfiler, we use cytominer-database or cytominer-transport. For DeepProfiler, we include single cell processing tools in pycytominer.cyto_utils.

From the single cell output, we perform five steps using a simple API (described below), before passing along our data to cytominer-eval for quality and perturbation strength evaluation.

API

The API is consistent for the five major processing functions:

  1. Aggregate
  2. Annotate
  3. Normalize
  4. Feature select
  5. Consensus

Each processing function has its own arguments; see our documentation for details.
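As a rough illustration of what the first step (Aggregate) does conceptually, the sketch below collapses toy single-cell rows into one median profile per well using plain pandas. It is not pycytominer's implementation: the helper `aggregate_by_median`, the toy column names, and the choice of the median are assumptions made for this example only.

```python
import pandas as pd

# Toy single-cell table. Column names are illustrative assumptions,
# loosely following the "Metadata_" prefix seen elsewhere in this README.
single_cells = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A01", "A01", "A02", "A02"],
        "Cells_AreaShape_Area": [100.0, 120.0, 110.0, 200.0, 220.0],
        "Nuclei_Intensity_Mean": [0.5, 0.7, 0.6, 0.25, 0.75],
    }
)


def aggregate_by_median(cells, strata):
    """Hypothetical helper: one median profile per group of `strata` columns."""
    return cells.groupby(strata, as_index=False).median(numeric_only=True)


# One row per well, each feature summarized by its median across cells.
well_profiles = aggregate_by_median(single_cells, ["Metadata_Well"])
print(well_profiles)
```

The later steps (annotate, normalize, feature select, consensus) then operate on per-well profiles like these rather than on individual cells.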

Installation

Pycytominer is still in beta, and can only be installed from GitHub:

pip install git+https://github.com/cytomining/pycytominer

Because the project is under active development, with new features added regularly, we recommend pinning the installation to a specific commit hash:

# Example:
pip install git+https://github.com/cytomining/pycytominer@2aa8638d7e505ab510f1d5282098dd59bb2cb470

Usage

The example below loads a public Cell Painting plate profile and standardizes it, using the DMSO samples as the reference population.

# Real world example
import pandas as pd
import pycytominer

commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz"

df = pd.read_csv(url)

normalized_df = pycytominer.normalize(
    profiles=df,
    method="standardize",
    samples="Metadata_broad_sample == 'DMSO'"
)
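To make the role of the `samples` query more concrete, here is a minimal pandas sketch of control-based standardization. It assumes that standardizing means a z-score whose center and scale are estimated from the selected samples only and then applied to every row. The helper `standardize_to_controls` and the toy data are hypothetical, and pycytominer's actual behavior should be checked against its documentation.

```python
import pandas as pd

# Toy profiles: three DMSO control wells and one treated well (made-up values).
profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["DMSO", "DMSO", "DMSO", "compound_x"],
        "Cells_AreaShape_Area": [90.0, 100.0, 110.0, 130.0],
    }
)


def standardize_to_controls(df, feature_cols, control_query):
    """Hypothetical helper: z-score features using statistics from control rows."""
    controls = df.query(control_query)
    center = controls[feature_cols].mean()
    # Population standard deviation (ddof=0); this choice is an assumption.
    scale = controls[feature_cols].std(ddof=0)
    out = df.copy()
    out[feature_cols] = (df[feature_cols] - center) / scale
    return out


toy_normalized = standardize_to_controls(
    profiles,
    feature_cols=["Cells_AreaShape_Area"],
    control_query="Metadata_broad_sample == 'DMSO'",
)
print(toy_normalized)
```

After this transformation the control wells are centered on zero, so a treated well's value reads as the number of control standard deviations it sits away from the controls.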

Customized usage

Pycytominer was written to process any high-throughput profiling data. Its initial use case, however, was image-based profiling, and more specifically readouts from CellProfiler measurements of Cell Painting data.

Therefore, we have included some custom tools in pycytominer/cyto_utils.

Citation

Please support computational biology by citing software. If you have used pycytominer in your project, please cite us as:

@software{pycytominer,
  author = {Way, G.P. and Chandrasekaran, S.N. and Bornholdt, M. and Fleming, S.J. and Tsang, H. and Adeboye, A. and Cimini, B. and Weisbart, E. and Ryder, P. and Stirling, D. and Jamali, N. and Carpenter, A.E. and Singh, S.},
  title = {Pycytominer: Data processing functions for profiling perturbations},
  url = {https://github.com/cytomining/pycytominer},
  version = {0.1},
  date = {2021},
}