RNA velocity analysis infers the direction of cell-state change (a trajectory) from the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) reads.
It is widely used (La Manno et al., 2018: https://www.nature.com/articles/s41586-018-0414-6),
but the original pipeline, velocyto.R, is not well maintained; see its open issues:
https://github.com/velocyto-team/velocyto.R/issues
The kallisto team provides a newer pipeline: https://bustools.github.io/BUS_notebooks_R/velocity.html
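The spliced/unspliced ratio enters through the per-gene kinetic model of La Manno et al. (2018), summarized here as a sketch: $u$ and $s$ are unspliced and spliced counts, $\alpha$ is the transcription rate, $\beta$ the splicing rate and $\gamma$ the degradation rate.

$$
\frac{du}{dt} = \alpha - \beta u, \qquad
\frac{ds}{dt} = \beta u - \gamma s
$$

$$
v = \frac{ds}{dt} = \beta u - \gamma s, \qquad
v = 0 \iff u = \frac{\gamma}{\beta}\, s
$$

A gene with more unspliced RNA than the steady-state line predicts ($u > \frac{\gamma}{\beta} s$) has $v > 0$ and is being induced; one with less has $v < 0$ and is being repressed.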
- module load gcc/6.2.0
- install R-devel alongside the regular R installation (https://www.r-bloggers.com/r-devel-in-parallel-to-regular-r-installation/), because one of the packages requires R 4.0:
  - configure R with `./configure --enable-R-shlib` (needed by RStudio)
  - remove conda from PATH so that R does not pick up conda's libcurl
- module load boost/1.62.0
- module load hdf5/1.10.1
- installing velocyto.R (see velocyto-team/velocyto.R#86):
bash:
sudo dnf update R
sudo dnf install boost boost-devel hdf5 hdf5-devel
git clone https://github.com/velocyto-team/velocyto.R
then, in RStudio or R:
BiocManager::install("pcaMethods")
setwd("/where/you/cloned")  # the directory that contains the velocyto.R clone
devtools::install_local("velocyto.R")
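The PATH cleanup from the R-devel step (removing conda so that R does not pick up conda's libcurl) can be done with a small bash function. This is a minimal sketch: the name `strip_conda_from_path` is hypothetical, and it assumes every conda directory has `conda` somewhere in its path, which holds for default `miniconda3`/`anaconda3` installs but not necessarily for custom prefixes.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the given PATH string with every entry that
# contains "conda" removed, keeping the remaining entries in order.
strip_conda_from_path() {
  local IFS=':'          # split the argument on ':' (local to this function)
  local entry out=""
  for entry in $1; do
    case "$entry" in
      *conda*) ;;                       # drop conda directories
      *) out="${out:+$out:}$entry" ;;   # keep everything else, ':'-joined
    esac
  done
  printf '%s\n' "$out"
}
```

Usage, in the shell where R-devel will be built: `export PATH="$(strip_conda_from_path "$PATH")"`, then run `./configure --enable-R-shlib`. The change only affects that shell session.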
Run `Rscriptdev 01_get_velocity_files.R`. It writes:
- cDNA_introns.fa
- cDNA_tx_to_capture.txt
- introns_tx_to_capture.txt
- tr2g.tsv
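Because building the index is expensive, it can be worth confirming that the four outputs of `01_get_velocity_files.R` exist and are non-empty before submitting the job. A minimal bash sketch; `check_velocity_files` is a hypothetical helper, not part of the pipeline:

```shell
#!/usr/bin/env bash

# Hypothetical sanity check: every expected output of 01_get_velocity_files.R
# must exist in the given directory (default: current) and be non-empty.
check_velocity_files() {
  local dir="${1:-.}" f missing=0
  for f in cDNA_introns.fa cDNA_tx_to_capture.txt introns_tx_to_capture.txt tr2g.tsv; do
    if [ ! -s "$dir/$f" ]; then          # -s: file exists and has size > 0
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_velocity_files /path/to/outputs && sbatch 02_kallisto_index.sh`.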
Building the kallisto index takes ~1-2 h and about 100 GB of RAM, so submit it as a batch job:
sbatch 02_kallisto_index.sh
- inDrops3 support: BUStools/bustools#4
- merge reads from multiple flowcells first
- demultiplex the samples with barcode-splitter (https://pypi.org/project/barcode-splitter/):
barcode_splitter --bcfile samples.tsv Undetermined_S0_L001_R1.fastq Undetermined_S0_L001_R2.fastq Undetermined_S0_L001_R3.fastq Undetermined_S0_L001_R4.fastq --idxread 3 --suffix .fq
`kallisto bus` counts on a per-sample basis, so the samples must be split into separate FASTQ files and each sample's reads merged across lanes.
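Merging lanes is plain concatenation: FASTQ records are independent, and gzip files can be concatenated without decompressing them first. A minimal bash sketch, assuming Illumina-style lane names such as `SAMPLE_S1_L001_R1.fastq.gz`; the function `merge_lanes` and the naming pattern are hypothetical. The R1, R2, ... files of a sample must be merged in the same lane order so that read records stay paired, which the sorted glob below provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: concatenate one sample's per-lane files for one read.
# Arguments: sample prefix, read label (R1, R2, ...), output file,
# optional extension (default fastq.gz; plain FASTQ concatenates the same way).
merge_lanes() {
  local prefix="$1" read="$2" out="$3" ext="${4:-fastq.gz}"
  local files=( "${prefix}"_L00?_"${read}"*."${ext}" )
  # With no match, bash leaves the pattern unexpanded, so test the first element.
  if [ ! -e "${files[0]}" ]; then
    echo "no lane files for ${prefix} ${read}" >&2
    return 1
  fi
  # Glob results are sorted, so lanes are written in L001, L002, ... order
  # for every read, keeping R1/R2/... records in matching order.
  cat "${files[@]}" > "$out"
}
```

Usage, with hypothetical names: `merge_lanes Sample1_S1 R1 Sample1_R1.fastq.gz`, then the same call with `R2` (and `R3`, `R4` where present).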
- `kallisto_count`, which writes:
  - spliced.barcodes.txt
  - spliced.genes.txt
  - spliced.mtx
  - unspliced.barcodes.txt
  - unspliced.genes.txt
  - unspliced.mtx
- `create_seurat_sample.Rmd` creates the Seurat object for a sample and also removes empty droplets
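Two hypothetical bash checks for the counting and filtering steps, as a sketch. `check_count_output` compares the dimensions in a `.mtx` header with the number of lines in the matching `.barcodes.txt` and `.genes.txt`, assuming the bustools layout of barcodes as rows and genes as columns. `barcodes_above` applies a plain total-count cutoff per barcode; that is a cruder technique than the knee-plot or statistical empty-droplet tests usually run in R, and not necessarily what `create_seurat_sample.Rmd` does. Both function names and any cutoff value are assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical check: do the .mtx dimensions match the barcode and gene lists?
# Assumes the bustools layout (rows = barcodes, columns = genes).
check_count_output() {
  local prefix="$1"   # e.g. "spliced" or "unspliced"
  local dims rows cols nb ng
  # The first line not starting with '%' is the MatrixMarket size line "rows cols nnz".
  dims=$(awk '!/^%/ { print $1, $2; exit }' "${prefix}.mtx")
  rows=${dims% *}
  cols=${dims#* }
  # $(( )) strips any padding that some wc implementations add.
  nb=$(( $(wc -l < "${prefix}.barcodes.txt") ))
  ng=$(( $(wc -l < "${prefix}.genes.txt") ))
  if [ "$rows" -ne "$nb" ] || [ "$cols" -ne "$ng" ]; then
    echo "${prefix}: matrix is ${rows}x${cols}, expected ${nb}x${ng}" >&2
    return 1
  fi
  echo "${prefix}: ${rows} barcodes x ${cols} genes"
}

# Hypothetical filter: print barcodes whose total count is >= min_count.
# A plain cutoff, not a statistical empty-droplet test.
barcodes_above() {
  local prefix="$1" min_count="$2"
  awk -v min="$min_count" '
    FNR == NR {                     # first file: the .mtx matrix
      if ($0 ~ /^%/) next           # skip the banner and comment lines
      if (!size_seen) { size_seen = 1; next }   # skip the size line
      total[$1] += $3               # $1 = barcode (row) index, $3 = count
      next
    }
    total[FNR] + 0 >= min + 0 { print }   # barcode on line FNR has row index FNR
  ' "${prefix}.mtx" "${prefix}.barcodes.txt"
}
```

Usage, run in the `kallisto_count` output directory: `check_count_output spliced && check_count_output unspliced`, then for example `barcodes_above spliced 500 > kept_barcodes.txt`, where 500 is an arbitrary illustrative cutoff.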