Preprocessing

Kane edited this page Jul 22, 2022 · 4 revisions

Preprocessing turns the raw BAM or FASTQ files produced by a sequencing platform into the cell x gene matrices used as the starting point for single-cell analysis. Specifically, the sequenced reads are aligned to a reference transcriptome by sequence alignment software, and the aligned reads are then counted per cell and per gene.
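To make the end product concrete, here is a minimal sketch of how aligned reads could be collapsed into a cell x gene matrix. It is a toy illustration, not how cellranger or any other pipeline is actually implemented: the function name `build_count_matrix`, the barcodes, UMIs and gene names are all made up, and it assumes each read has already been assigned a cell barcode, a UMI and a gene.

```python
def build_count_matrix(records):
    """Build a cell x gene count matrix from (cell_barcode, umi, gene) tuples.

    Hypothetical helper for illustration only: one tuple per aligned read.
    """
    # Reads sharing the same barcode, UMI and gene are treated as copies
    # of a single molecule, so they are counted once.
    unique_molecules = set(records)

    cells = sorted({cb for cb, _, _ in unique_molecules})
    genes = sorted({gene for _, _, gene in unique_molecules})
    cell_idx = {cb: i for i, cb in enumerate(cells)}
    gene_idx = {gene: j for j, gene in enumerate(genes)}

    # Rows are cells, columns are genes.
    matrix = [[0] * len(genes) for _ in cells]
    for cb, _, gene in unique_molecules:
        matrix[cell_idx[cb]][gene_idx[gene]] += 1
    return cells, genes, matrix


# Made-up reads: the second one duplicates the first.
toy_reads = [
    ("AAAC", "UMI1", "CD3E"),
    ("AAAC", "UMI1", "CD3E"),
    ("AAAC", "UMI2", "CD3E"),
    ("AAAC", "UMI3", "MS4A1"),
    ("TTTG", "UMI1", "MS4A1"),
]
cells, genes, matrix = build_count_matrix(toy_reads)
for cb, row in zip(cells, matrix):
    print(cb, dict(zip(genes, row)))
```

Real matrices are far too large and sparse for nested lists, so in practice they are stored in sparse formats; the nested list here only keeps the example dependency-free.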

For 10x scRNA-seq, the standard 10x cellranger pipeline is sufficient for most analyses. The 10x website includes comprehensive FAQs explaining how to install and run cellranger. A summary is provided by the Babraham Institute. An introduction to understanding the 10x BAM format itself is found here.
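As a rough aid to reading a 10x BAM, the sketch below pulls the optional tags out of a single alignment line in SAM text form (for example, as printed by `samtools view`). The tag meanings are to the best of my understanding of the 10x documentation (`CB` corrected cell barcode, `UB` corrected UMI, `GX` gene ID, `GN` gene name) and should be checked there. The function `parse_sam_tags` and the example record are invented for illustration.

```python
def parse_sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM line as a dict.

    Hypothetical helper: the SAM format has 11 mandatory tab-separated
    columns, and optional tags follow from the 12th column onward.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split on the first two colons only, since values may contain colons.
        tag, _value_type, value = field.split(":", 2)
        tags[tag] = value
    return tags


# A made-up alignment record; real records carry many more tags.
example = "\t".join([
    "read1", "0", "chr1", "1000", "255", "4M", "*", "0", "0",
    "ACGT", "FFFF",
    "CB:Z:AAACCTGAGAAACCAT-1",
    "UB:Z:ACGTACGTAC",
    "GX:Z:ENSG00000000001",
    "GN:Z:GENE1",
])
tags = parse_sam_tags(example)
print(tags["CB"], tags["UB"], tags["GN"])
```

For real files a BAM library such as pysam is the more practical choice; plain string parsing is used here only to keep the example free of dependencies.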

An alternative for sequence alignment is the kallistobus pipeline, published here. Its biggest advantages over cellranger are speed and the ability to run on a desktop computer. The authors also report a slight improvement over the 10x pipeline with regard to read mapping.

The choice of reference transcriptome to align reads to also matters. One recent (May 2022) article claims to provide a transcriptomic reference optimised for scRNA-seq. The issues it claims to address are: (1) reads mapping immediately 3’ of known gene boundaries due to poor 3’ UTR annotation; (2) intronic reads stemming from unannotated exons or pre-mRNA; and (3) reads discarded due to gene overlaps.
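The three categories of lost reads listed above can be pictured with a toy classifier. This is only a sketch under strong simplifying assumptions: a single chromosome, forward-strand genes, reads reduced to one position, and an arbitrary 500 bp window for "immediately 3’". The `Gene` type, the `classify_read` function and all coordinates are hypothetical and are not taken from the article or from any aligner.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    exons: List[Tuple[int, int]]  # inclusive (start, end) pairs


def classify_read(pos: int, genes: List[Gene], window: int = 500) -> str:
    """Classify a read position against a toy forward-strand annotation."""
    hits = [g for g in genes if g.start <= pos <= g.end]
    if len(hits) > 1:
        # Issue (3): the read falls where annotated genes overlap.
        return "ambiguous_overlap"
    if len(hits) == 1:
        gene = hits[0]
        if any(s <= pos <= e for s, e in gene.exons):
            return "exonic"
        # Issue (2): inside a gene but outside its annotated exons.
        return "intronic"
    if any(g.end < pos <= g.end + window for g in genes):
        # Issue (1): just past the annotated 3' end of a gene.
        return "downstream_3prime"
    return "intergenic"


# Made-up annotation: gene_b overlaps the last exon of gene_a.
genes = [
    Gene("gene_a", 100, 1000, [(100, 300), (800, 1000)]),
    Gene("gene_b", 900, 1500, [(900, 1500)]),
]
for pos in (200, 500, 950, 1600, 5000):
    print(pos, classify_read(pos, genes))
```

In this toy setup only the "exonic" reads would be counted against a conventional exon-based reference, which is why the annotation used for alignment can change how many reads end up in the final matrix.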

Kane Foster (22-07-2022)