cellmarkeraccordion

R package for automatically annotating and interpreting single-cell populations.

Installation

To install the cellmarkeraccordion package directly from GitHub the devtools package is required. If not already installed on your system, run

install.packages("devtools")

Otherwise, load devtools and install the cellmarkeraccordion by

library(devtools)
install_github("TebaldiLab/cellmarkeraccordion", dependencies = TRUE)

Loading

To load the cellmarkeraccordion run

library(cellmarkeraccordion)
library(Seurat)
library(data.table)

Input data

All the functions of the cellmarkeraccordion accept as input either a Seurat object or a raw or normalized count matrix. As an example we used a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. Load the raw counts and create a Seurat object

load(system.file("extdata", "counts.rda", package = "cellmarkeraccordion")) #counts data
# Create Seurat Object
data <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)

Process and cluster the data

data <- NormalizeData(data)
data <- FindVariableFeatures(data, selection.method = "vst", nfeatures = 2000)
data <- ScaleData(data)
data <- RunPCA(data, features = VariableFeatures(object = data))
data <- FindNeighbors(data, dims = 1:10)
data <- FindClusters(data, resolution = 0.8)
data <- RunUMAP(data, dims = 1:10)

Annotate and interprete single-cell populations with the built-in Cell Marker Accordion database

cellmarkeraccordion allows to automatically identifies cell populations in multiple tissue in single-cell dataset by running function accordion. It requires in input only a Seurat object or a raw or normalized count matrix with genes on rows and cells on columns. The cell types annotation is performed by exploiting the built-in Cell Marker Accordion database of marker genes. In addition, this function provides an easy interpretation of the results by reporting the for each group of cells the top marker genes which mostly impacted the annotation, together with the top cell types and their relationship based on the cell ontology tree (thanks to the include_detailed_annotation_info and plot parameters).

Run list_tissues() function to explore which tissues are available in the Cell Marker Accordion.

available_tissue<-list_tissues(species = "Human")
available_tissue[1:20]

To perform cell types identification by cluster and obtain detailed annotation information simply run:

# Input: Seurat object
# Output: Seurat object with annotation results 
data <- accordion(data, assay ="RNA", species ="Human", tissue="blood", annotation_resolution = "cluster", max_n_marker = 30, include_detailed_annotation_info = TRUE, plot = TRUE)

DimPlot(data, group.by = "accordion_per_cluster")

Or you can use raw counts matrix and specify cluster's id for each cell.

# Input: raw counts and clusters id  
raw_counts <- GetAssayData(data, assay="RNA", slot='counts')
clusters<- data.table(cell = rownames([email protected]), cluster = [email protected]$seurat_clusters)
# Output: list with annotation results 
output <- accordion(counts, assay ="RNA", species ="Human", tissue="blood", cluster_info = clusters, annotation_resolution= "cluster", max_n_marker = 30, include_detailed_annotation_info = TRUE, plot = TRUE)

Improve the biological interpretation of the results

The Cell Marker Accordion has been developed to improve the biological interpretation of the results, by returning a dot plot listing the top N cell types for each cluster. The dot size is proportional to the impact score, and the winning annotation is highlighted.

data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_celltypes_plot"]][["global"]]

Further insight is provided by looking at the percentage of cells assigned to the top scoring cell types, and their similarity based on the Cell Ontology hierarchy

data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_celltypes_plot"]][["2_B cell"]]

Finally the top N marker genes contributing to the annotation of each specific cell type can be explored

data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_markers_per_celltype_cluster_plot"]][["global"]]

Cell type or pathways identification with custom genes sets

cellmarkeraccordion performs automatic identification of cell populations based on a custom input set of marker genes by running function accordion_custom . It requires in input only a Seurat object or a raw or normalized count matrix with genes on rows and cells on columns and a table of marker genes associated to cell types or to pathways. The marker table should contains at least two columns, the category_column, which specifies cell types or categories, and the marker_column, which specifies the corresponding markers on each row. Columns indicating the marker type (either positive or negative), and the marker weight can be optionally included. We used a published human retinal dataset (Lu et al., Dev Cell. 2020) and we included a table of well-known markers associated to retinal cell types.

load(system.file("extdata", "retina_markers.rda", package = "cellmarkeraccordion"))
head(retina_markers)

cell_type	marker
Cones	ARR3
Retinal ganglion cells	ATOH7
Retinal ganglion cells	POU4F1
Rods	C11orf96
Bipolar cells	CA10
Bipolar cells	CADPS

load(system.file("extdata", "retinal_data.rda", package = "cellmarkeraccordion"))

To perform the annotation with the custom marker genes run:

retinal_data <-accordion_custom(retinal_data, annotation_resolution = "cluster", marker_table  = retina_markers, category_column = "cell_type", marker_column = "marker", min_n_marker = 2, plot=T, annotation_name = "cell_type_retina")

DimPlot(retinal_data, group.by = "cell_type_retina_per_cluster", reduction = "umap.integrated", label=T) + NoLegend()

You can also exploit the accordion_custom function to explore the expression of group of genes associated to a specific pathway. As an example:

load(system.file("extdata", "marker_table_pathway.rda", package = "cellmarkeraccordion"))

pathway	genes
apoptosis	AKT3
apoptosis	IRAK3
apoptosis	CHP1
apoptosis	CHUK
apoptosis	CSF2RB
apoptosis	DFFA
apoptosis	DFFB
apoptosis	ENDOG
apoptosis	AKT1
apoptosis	AKT2

And simply run:

retinal_data<-accordion_custom(retinal_data, marker_table_pathway, category_column= "pathway", marker_column ="genes", annotation_resolution = "cell",annotation_name = "apoptosis_signature")

FeaturePlot(retinal_data, features = "apoptosis_signature_per_cell_score",  max.cutoff = "q90")

Automatically identify and interpreting cell cycle state of single-cell populations

cellmarkeraccordion provides the accordion_cellcycle function to automatically assing cell cycle state to cell populations. This function exploits the built-in collection of marker genes associated to each cell cycle phase (G0, G1, G2M, S). It takes in input either a Seurat object or a raw or normalized count matrix. To perform cell cycle identification run:

data<-accordion_cellcycle(data)

Annotate and interprete aberrant single-cell populations with the built-in Cell Marker Accordion disease database

cellmarkeraccordion includes the accordion_disease function which allows the identification of aberrant populations exploiting the built-in Accordion gene marker disease database. This function requires in input either a Seurat object or a raw or normalized count matrix. It is possible to specific both disease and critical cells to identify, thanks to disease and cell_types parameters. We analyzed a published scRNA-seq dataset of CD34+ bone marrow cells from 5 healthy controls and 14 acute myeloid leukemia patients.

load(system.file("extdata", "bone_marrow_data.rda", package = "cellmarkeraccordion"))

To identify for example "Leukemic Hematopoietic Stem Cell" in "acute myeloid leukemia" patients from "bone marrow" samples run:

bone_marrow_data = accordion_disease(bone_marrow_data, assay = "RNA", species="Human",tissue="bone marrow", disease= "acute myeloid leukemia", NCIT_celltypes = "Leukemic Hematopoietic Stem Cell",annotation_resolution = "cell", max_n_marker = 30, log2FC_threshold = 1, plot=F, annotation_name = "LHSC")

FeaturePlot(bone_marrow_data, features = "LHSC_per_cell_score", min.cutoff = "q15",max.cutoff = "q95", split.by="condition", cols = c("gray","red"), order = T)

To identify instead "Neoplastic Monocyte" run:

bone_marrow_data = accordion_disease(bone_marrow_data, assay = "RNA", species="Human",tissue="bone marrow", disease= "acute myeloid leukemia", NCIT_celltypes = "Neoplastic Monocyte",annotation_resolution = "cell",max_n_marker = 30, log2FC_threshold = 1, plot=F, annotation_name = "Neoplastic_monocyte")

FeaturePlot(bone_marrow_data, features = "Neoplastic_monocyte_per_cell_score", min.cutoff = "q10",max.cutoff = "q90", split.by="condition", cols = c("gray","blue"), order = T)

Name		Name	Last commit message	Last commit date
Latest commit History 378 Commits
.github/workflows		.github/workflows
R		R
data		data
inst/extdata		inst/extdata
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md
cellmarkeraccordion.Rproj		cellmarkeraccordion.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cellmarkeraccordion

R package for automatically annotating and interpreting single-cell populations.

Installation

Loading

Input data

Annotate and interprete single-cell populations with the built-in Cell Marker Accordion database

Improve the biological interpretation of the results

Cell type or pathways identification with custom genes sets

Automatically identify and interpreting cell cycle state of single-cell populations

Annotate and interprete aberrant single-cell populations with the built-in Cell Marker Accordion disease database

About

Releases 1

Packages

Contributors 2

Languages

License

TebaldiLab/cellmarkeraccordion

Folders and files

Latest commit

History

Repository files navigation

cellmarkeraccordion

R package for automatically annotating and interpreting single-cell populations.

Installation

Loading

Input data

Annotate and interprete single-cell populations with the built-in Cell Marker Accordion database

Improve the biological interpretation of the results

Cell type or pathways identification with custom genes sets

Automatically identify and interpreting cell cycle state of single-cell populations

Annotate and interprete aberrant single-cell populations with the built-in Cell Marker Accordion disease database

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages