To install the cellmarkeraccordion package directly from GitHub the devtools package is required. If not already installed on your system, run
install.packages("devtools")
Otherwise, load devtools and install the cellmarkeraccordion by
library(devtools)
install_github("TebaldiLab/cellmarkeraccordion", dependencies = TRUE)
To load the cellmarkeraccordion run
library(cellmarkeraccordion)
library(Seurat)
library(data.table)
All the functions of the cellmarkeraccordion accept as input either a Seurat object or a raw or normalized count matrix. As an example we used a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. Load the raw counts and create a Seurat object
load(system.file("extdata", "counts.rda", package = "cellmarkeraccordion")) #counts data
# Create Seurat Object
data <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)
Process and cluster the data
data <- NormalizeData(data)
data <- FindVariableFeatures(data, selection.method = "vst", nfeatures = 2000)
data <- ScaleData(data)
data <- RunPCA(data, features = VariableFeatures(object = data))
data <- FindNeighbors(data, dims = 1:10)
data <- FindClusters(data, resolution = 0.8)
data <- RunUMAP(data, dims = 1:10)
cellmarkeraccordion allows to automatically identifies cell populations in multiple tissue in single-cell dataset by running function accordion
.
It requires in input only a Seurat object or a raw or normalized count matrix with genes on rows and cells on columns. The cell types annotation is performed by exploiting the built-in Cell Marker Accordion database of marker genes. In addition, this function provides an easy interpretation of the results by reporting the for each group of cells the top marker genes which mostly impacted the annotation, together with the top cell types and their relationship based on the cell ontology tree (thanks to the include_detailed_annotation_info and plot parameters).
Run list_tissues() function to explore which tissues are available in the Cell Marker Accordion.
available_tissue<-list_tissues(species = "Human")
available_tissue[1:20]
To perform cell types identification by cluster and obtain detailed annotation information simply run:
# Input: Seurat object
# Output: Seurat object with annotation results
data <- accordion(data, assay ="RNA", species ="Human", tissue="blood", annotation_resolution = "cluster", max_n_marker = 30, include_detailed_annotation_info = TRUE, plot = TRUE)
DimPlot(data, group.by = "accordion_per_cluster")
Or you can use raw counts matrix and specify cluster's id for each cell.
# Input: raw counts and clusters id
raw_counts <- GetAssayData(data, assay="RNA", slot='counts')
clusters<- data.table(cell = rownames([email protected]), cluster = [email protected]$seurat_clusters)
# Output: list with annotation results
output <- accordion(counts, assay ="RNA", species ="Human", tissue="blood", cluster_info = clusters, annotation_resolution= "cluster", max_n_marker = 30, include_detailed_annotation_info = TRUE, plot = TRUE)
The Cell Marker Accordion has been developed to improve the biological interpretation of the results, by returning a dot plot listing the top N cell types for each cluster. The dot size is proportional to the impact score, and the winning annotation is highlighted.
data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_celltypes_plot"]][["global"]]
Further insight is provided by looking at the percentage of cells assigned to the top scoring cell types, and their similarity based on the Cell Ontology hierarchy
data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_celltypes_plot"]][["2_B cell"]]
Finally the top N marker genes contributing to the annotation of each specific cell type can be explored
data@misc[["accordion"]][["cluster_resolution"]][["detailed_annotation_info"]][["top_markers_per_celltype_cluster_plot"]][["global"]]
cellmarkeraccordion performs automatic identification of cell populations based on a custom input set of marker genes by running function accordion_custom
. It requires in input only a Seurat object or a raw or normalized count matrix with genes on rows and cells on columns and a table of marker genes associated to cell types or to pathways. The marker table should contains at least two columns, the category_column, which specifies cell types or categories, and the marker_column, which specifies the corresponding markers on each row. Columns indicating the marker type (either positive or negative), and the marker weight can be optionally included. We used a published human retinal dataset (Lu et al., Dev Cell. 2020) and we included a table of well-known markers associated to retinal cell types.
load(system.file("extdata", "retina_markers.rda", package = "cellmarkeraccordion"))
head(retina_markers)
cell_type | marker |
---|---|
Cones | ARR3 |
Retinal ganglion cells | ATOH7 |
Retinal ganglion cells | POU4F1 |
Rods | C11orf96 |
Bipolar cells | CA10 |
Bipolar cells | CADPS |
load(system.file("extdata", "retinal_data.rda", package = "cellmarkeraccordion"))
To perform the annotation with the custom marker genes run:
retinal_data <-accordion_custom(retinal_data, annotation_resolution = "cluster", marker_table = retina_markers, category_column = "cell_type", marker_column = "marker", min_n_marker = 2, plot=T, annotation_name = "cell_type_retina")
DimPlot(retinal_data, group.by = "cell_type_retina_per_cluster", reduction = "umap.integrated", label=T) + NoLegend()
You can also exploit the accordion_custom
function to explore the expression of group of genes associated to a specific pathway. As an example:
load(system.file("extdata", "marker_table_pathway.rda", package = "cellmarkeraccordion"))
pathway | genes |
---|---|
apoptosis | AKT3 |
apoptosis | IRAK3 |
apoptosis | CHP1 |
apoptosis | CHUK |
apoptosis | CSF2RB |
apoptosis | DFFA |
apoptosis | DFFB |
apoptosis | ENDOG |
apoptosis | AKT1 |
apoptosis | AKT2 |
And simply run:
retinal_data<-accordion_custom(retinal_data, marker_table_pathway, category_column= "pathway", marker_column ="genes", annotation_resolution = "cell",annotation_name = "apoptosis_signature")
FeaturePlot(retinal_data, features = "apoptosis_signature_per_cell_score", max.cutoff = "q90")
cellmarkeraccordion provides the accordion_cellcycle
function to automatically assing cell cycle state to cell populations. This function exploits the built-in collection of
marker genes associated to each cell cycle phase (G0, G1, G2M, S). It takes in input either a Seurat object or a raw or normalized count matrix.
To perform cell cycle identification run:
data<-accordion_cellcycle(data)
Annotate and interprete aberrant single-cell populations with the built-in Cell Marker Accordion disease database
cellmarkeraccordion includes the accordion_disease
function which allows the identification of aberrant populations exploiting the built-in Accordion gene marker disease database.
This function requires in input either a Seurat object or a raw or normalized count matrix. It is possible to specific both disease and critical cells to identify, thanks to disease and cell_types parameters. We analyzed a published scRNA-seq dataset of CD34+ bone marrow cells from 5 healthy controls and 14 acute myeloid leukemia patients.
load(system.file("extdata", "bone_marrow_data.rda", package = "cellmarkeraccordion"))
To identify for example "Leukemic Hematopoietic Stem Cell" in "acute myeloid leukemia" patients from "bone marrow" samples run:
bone_marrow_data = accordion_disease(bone_marrow_data, assay = "RNA", species="Human",tissue="bone marrow", disease= "acute myeloid leukemia", NCIT_celltypes = "Leukemic Hematopoietic Stem Cell",annotation_resolution = "cell", max_n_marker = 30, log2FC_threshold = 1, plot=F, annotation_name = "LHSC")
FeaturePlot(bone_marrow_data, features = "LHSC_per_cell_score", min.cutoff = "q15",max.cutoff = "q95", split.by="condition", cols = c("gray","red"), order = T)
To identify instead "Neoplastic Monocyte" run:
bone_marrow_data = accordion_disease(bone_marrow_data, assay = "RNA", species="Human",tissue="bone marrow", disease= "acute myeloid leukemia", NCIT_celltypes = "Neoplastic Monocyte",annotation_resolution = "cell",max_n_marker = 30, log2FC_threshold = 1, plot=F, annotation_name = "Neoplastic_monocyte")
FeaturePlot(bone_marrow_data, features = "Neoplastic_monocyte_per_cell_score", min.cutoff = "q10",max.cutoff = "q90", split.by="condition", cols = c("gray","blue"), order = T)