
conos

Clustering on Network of Samples

  • What is Conos? It's a package to wire together large collections of single-cell RNA-seq datasets. It focuses on uniform mapping of homologous cell types across heterogeneous sample collections. For instance, a collection might include dozens of peripheral blood samples from cancer patients combined with dozens of controls, and perhaps also samples of a related tissue, such as lymph nodes.

  • How does it work? Conos applies one of many error-prone methods to align each pair of samples in a collection, establishing weighted inter-sample cell-to-cell links and thereby creating a global joint graph. Cells of the same type will tend to map to each other across many such pairwise comparisons, forming cliques that can be recognized as clusters (graph communities).

  • What does it produce? In essence, Conos takes a large, potentially heterogeneous panel of samples and produces a clustering that groups similar cell subpopulations together in a way that is robust to inter-sample variation.

  • What are the advantages over existing alignment methods? Conos is robust to heterogeneity of samples within a collection, as well as to noise. Its ability to resolve finer subpopulation structure improves as the size of the panel increases.

  • What do I need to run it? Conos is an R package. Currently, it supports pre-processing (filtering, normalization, etc.) of the individual datasets using pagoda2 or Seurat.
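The pairwise-alignment idea described above can be illustrated with a small, self-contained base-R toy. This is only a sketch under stated assumptions: two simulated samples in a 2-D space, plain Euclidean mutual nearest neighbours (mNN) as the pairwise alignment method, unweighted edges, and connected components in place of the graph-community detection Conos actually performs. All function names (`simulate_sample`, `mnn_edges`, `build_joint_graph`, etc.) are hypothetical and are not part of the Conos API.

```r
# Toy sketch (hypothetical helpers, not the Conos API): pairwise mNN links
# plus within-sample kNN links form a joint graph; connected components
# stand in for the community detection Conos actually performs.
set.seed(1)

# Two cell types per sample in a 2-D space; sample B has a small batch shift.
simulate_sample <- function(prefix, shift) {
  type1 <- matrix(rnorm(40, mean = 0 + shift, sd = 0.3), ncol = 2)
  type2 <- matrix(rnorm(40, mean = 5 + shift, sd = 0.3), ncol = 2)
  m <- rbind(type1, type2)
  rownames(m) <- paste0(prefix, "_cell", seq_len(nrow(m)))
  m
}

# Squared Euclidean distances between rows of x and rows of y.
sq_cross_dist <- function(x, y) {
  outer(rowSums(x^2), rowSums(y^2), "+") - 2 * x %*% t(y)
}

# Within-sample k-nearest-neighbour edges (pairs of row indices).
knn_edges <- function(x, k) {
  d <- sq_cross_dist(x, x)
  diag(d) <- Inf
  do.call(rbind, lapply(seq_len(nrow(x)), function(i) cbind(i, order(d[i, ])[seq_len(k)])))
}

# Inter-sample mutual nearest neighbours: cell i in x and cell j in y are linked
# only if each is among the other's k nearest cells in the other sample.
mnn_edges <- function(x, y, k) {
  d <- sq_cross_dist(x, y)
  nn_xy <- lapply(seq_len(nrow(x)), function(i) order(d[i, ])[seq_len(k)])
  nn_yx <- lapply(seq_len(nrow(y)), function(j) order(d[, j])[seq_len(k)])
  pairs <- matrix(integer(0), ncol = 2)
  for (i in seq_len(nrow(x))) {
    for (j in nn_xy[[i]]) {
      if (i %in% nn_yx[[j]]) pairs <- rbind(pairs, c(i, j))
    }
  }
  pairs
}

# Symmetric 0/1 adjacency matrix of the joint graph over the cells of both samples.
build_joint_graph <- function(a, b, k = 5) {
  n_a <- nrow(a)
  cells <- c(rownames(a), rownames(b))
  adj <- matrix(0, length(cells), length(cells), dimnames = list(cells, cells))
  add <- function(adj, edges, off_i, off_j) {
    for (r in seq_len(nrow(edges))) {
      i <- edges[r, 1] + off_i
      j <- edges[r, 2] + off_j
      adj[i, j] <- 1
      adj[j, i] <- 1
    }
    adj
  }
  adj <- add(adj, knn_edges(a, k), 0, 0)
  adj <- add(adj, knn_edges(b, k), n_a, n_a)
  add(adj, mnn_edges(a, b, k), 0, n_a)
}

# Label each cell with the id of its connected component (breadth-first search).
connected_components <- function(adj) {
  comp <- integer(nrow(adj))
  current <- 0L
  for (start in seq_len(nrow(adj))) {
    if (comp[start] != 0L) next
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0) {
      nbrs <- which(adj[queue[1], ] > 0 & comp == 0L)
      comp[nbrs] <- current
      queue <- c(queue[-1], nbrs)
    }
  }
  comp
}

a <- simulate_sample("A", shift = 0)
b <- simulate_sample("B", shift = 0.5)
adj <- build_joint_graph(a, b, k = 5)
clusters <- connected_components(adj)
cell_type <- rep(rep(c("type1", "type2"), each = 20), times = 2)
print(table(cluster = clusters, type = cell_type))
```

In the printed table, each component should contain a single simulated cell type, typically drawn from both samples where mNN links bridge them, despite the batch shift applied to sample B. Conos itself uses weighted edges, a choice of alignment methods and proper community detection, so treat this only as an intuition aid.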

Installation

Native installations have been tested on Linux. A normal installation should take less than 10 minutes.

Native installation

Please make sure the devtools package is installed (use install.packages("devtools") to install it if needed). Then install pagoda2 (or Seurat), and then install Conos:

devtools::install_github("hms-dbmi/conos")

If you have problems with sccore package, run devtools::install_github("hms-dbmi/sccore") before installing Conos.
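Before running the installation commands above, it can help to check which prerequisites are already present. The following is a minimal base-R sketch; `missing_packages` is a hypothetical helper, not part of Conos or devtools, and it only reports what is missing rather than installing anything.

```r
# Hypothetical helper (not part of Conos): return the packages in `pkgs`
# that are not installed, without loading them.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "pagoda2", "sccore", "conos"))
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
}

# The actual installation steps from this README (require internet access):
# install.packages("devtools")
# devtools::install_github("hms-dbmi/sccore")
# devtools::install_github("hms-dbmi/conos")
```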

System dependencies

The dependencies are inherited from pagoda2:

Ubuntu Dependencies

Install system dependencies (example provided for Ubuntu):

sudo apt-get update
sudo apt-get -y install libcurl4-openssl-dev libssl-dev

Red-Hat-based distributions Dependencies

yum install openssl-devel libcurl-devel

OS X

It is possible to install pagoda2 and Conos on OS X; however, some users have reported issues with OpenMP configuration. For instructions, see the pagoda2 README.

Installing Conos as Docker Container

If your system configuration is making it difficult to install Conos natively, an alternative way to get Conos running is through a docker container.

Note: on OS X, Docker Machine has memory and CPU limits. To control them, please check the instructions for either the CLI or Docker Desktop.

Ready-to-run docker image

The docker distribution has the latest version and also includes the pagoda2 package. To start a docker container, first install docker on your platform and then start the Conos container with the following command in the shell:

docker run -p 8787:8787 -e PASSWORD=pass docker.io/vpetukhov/conos:latest

The first time you run the command, it will download several images, so make sure that you have a fast internet connection. You can then point your browser to http://localhost:8787/ to get an RStudio environment with pagoda2 and Conos installed (log in using the credentials rstudio / pass). See the docker --mount option to give the container access to your local files.

Note: if you have already downloaded the docker image and want to update it, please run

docker pull vpetukhov/conos:latest

Building docker image on the fly

If you want to build the image yourself, download the Dockerfile (available in this repo under /dockers) and run the following command to build it:

docker build -t conos .

This will create a "conos" docker image on your system (be patient, as the build takes roughly 30-50 minutes). You can then run it using the following command:

docker run -d -p 8787:8787 -e PASSWORD=pass --name conos -it conos

Usage example

To see the class documentation, run ?Conos.

Alignment of datasets

Please see the Conos tutorial for detailed usage. The overall runtime of the tutorial should be about 5 minutes.

Additional examples: forcing better alignment, integrating RNA-seq and ATAC-seq.

Given a list of individual processed samples (pl), Conos processing can be as simple as this:

# construct a Conos object, where pl is a list of processed samples (e.g. pagoda2 or Seurat objects)
con <- Conos$new(pl)

# build graph
con$buildGraph()

# find communities
con$findCommunities()

# plot joint graph
con$plotGraph()

# plot panel with joint clustering results
con$plotPanel()
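After joint clustering, a quick sanity check is to see how each sample's cells are distributed across the joint clusters: clusters populated by many samples reflect shared cell types, while a cluster confined to one sample may be sample-specific or a sign of weak alignment. The sketch below assumes you have already extracted two per-cell vectors from your Conos object, the cluster labels and the sample of origin (how you obtain them is left open here; see ?Conos). `cluster_composition` is a hypothetical helper using only base R, and the toy labels stand in for real output.

```r
# Hypothetical helper (not part of Conos): for each sample, the fraction of its
# cells assigned to each joint cluster. Rows are samples and sum to 1.
cluster_composition <- function(clusters, samples) {
  stopifnot(length(clusters) == length(samples))
  prop.table(table(sample = samples, cluster = clusters), margin = 1)
}

# Toy per-cell labels standing in for real Conos output.
toy_clusters <- c("1", "1", "2", "1", "2", "2")
toy_samples  <- c("s1", "s1", "s1", "s2", "s2", "s2")
comp <- cluster_composition(toy_clusters, toy_samples)
print(round(comp, 2))
```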

Integration with ScanPy

For integration with ScanPy, you need to save Conos files to disk from an R session, then load these files in Python. See the following tutorials:

Running RNA velocity on a conos object

First of all, to obtain an RNA velocity plot from a conos object, you have to use the dropEst pipeline to align and annotate your single-cell RNA-seq measurements. See this tutorial and this shell script for how it can be done. In this example, we specifically assume that when running dropEst you have used the -V option to get estimates of unspliced/spliced counts directly from dropEst. Second, you need the velocyto.R package for the actual velocity estimation and visualisation.

After running dropEst you should have 2 files for each of the samples:

  • sample.rds (matrix of counts)
  • sample.matrices.rds (3 matrices of exons, introns and spanning reads)

The .matrices.rds files are the velocity files. Load them into R as a list, in the same order as the samples you give to Conos. Load, preprocess and integrate the count matrices (.rds) with Conos as you normally would. Before estimating velocity, you must have at least created an embedding and run Leiden clustering. Finally, you can estimate the velocity:
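Loading the velocity files in a consistent order is easy to get wrong, so here is a minimal base-R sketch. It assumes, hypothetically, that all <sample>.matrices.rds files sit in one directory and are named after the samples. `load_velocity_matrices` is an illustrative helper, and the placeholder list elements (`exon`, `intron`) are made up for the demo; they are not the actual structure of dropEst output. Pass the sample names in the same order you used for Conos; if your samples were given to Conos as a named list, comparing against `names(con$samples)` is one way to check this, assuming that field is available in your version.

```r
# Hypothetical helper (not part of Conos or velocyto.R): read the dropEst
# "<sample>.matrices.rds" files into a list named by sample, keeping the
# order of `sample_names`, which should match the order of samples given to Conos.
load_velocity_matrices <- function(sample_names, dir) {
  paths <- file.path(dir, paste0(sample_names, ".matrices.rds"))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing velocity files: ", paste(absent, collapse = ", "))
  }
  stats::setNames(lapply(paths, readRDS), sample_names)
}

# Toy demonstration with placeholder matrices written to a temporary directory.
tmp <- tempfile("velo_")
dir.create(tmp)
for (s in c("sampleB", "sampleA")) {
  placeholder <- list(exon = matrix(1:4, 2), intron = matrix(0:3, 2))
  saveRDS(placeholder, file.path(tmp, paste0(s, ".matrices.rds")))
}
cms.list <- load_velocity_matrices(c("sampleB", "sampleA"), tmp)
str(cms.list, max.level = 1)
```

Failing early on a missing file is deliberate: a silently shorter list would misalign the velocity matrices with the Conos samples.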

### Assuming con is your conos object and cms.list is the list of your velocity files ###

library(velocyto.R)
library(magrittr)  # provides the %$% exposition pipe used below

# Preprocess the velocity files to match the conos object
vi <- velocityInfoConos(cms.list = cms.list, con = con, 
                        n.odgenes = 2e3, verbose = TRUE)

# Estimate RNA velocity
vel.info <- vi %$%
  gene.relative.velocity.estimates(emat, nmat, cell.dist = cell.dist, 
                                   deltaT = 1, kCells = 25, fit.quantile = 0.05, n.cores = 4)

# Visualise the velocity on your conos embedding 
# Takes a very long time! 
# Assign to a variable to speed up subsequent recalculations
cc.velo <- show.velocity.on.embedding.cor(vi$emb, vel.info, n = 200, scale = 'sqrt', 
                                          cell.colors = ac(vi$cell.colors, alpha = 0.5), 
                                          cex = 0.8, grid.n = 50, cell.border.alpha = 0,
                                          arrow.scale = 3, arrow.lwd = 0.6, n.cores = 4, 
                                          xlab = "UMAP1", ylab = "UMAP2")

# Use cc=cc.velo$cc when running again (skips the most time consuming delta projections step)
show.velocity.on.embedding.cor(vi$emb, vel.info, cc = cc.velo$cc, n = 200, scale = 'sqrt', 
                               cell.colors = ac(vi$cell.colors, alpha = 0.5), 
                               cex = 0.8, arrow.scale = 15, show.grid.flow = TRUE, 
                               min.grid.cell.mass = 0.5, grid.n = 40, arrow.lwd = 2,
                               do.par = FALSE, cell.border.alpha = 0.1, n.cores = 4,
                               xlab = "UMAP1", ylab = "UMAP2")

Reference

If you find this pipeline useful for your research, please consider citing the paper:

Barkas N., Petukhov V., Nikolaeva D., Lozinsky Y., Demharter S., Khodosevich K. & Kharchenko P.V. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods, (2019). doi:10.1038/s41592-019-0466-z
