2020update #169

Open
wants to merge 962 commits into
base: master
962 commits
a58c5a1
try rstudio folder
wikiselev Oct 26, 2017
0b43c0b
update docker and its information
wikiselev Oct 26, 2017
cef9d2e
fix sildist bug in SC3
wikiselev Oct 27, 2017
1a6a92e
fix cluster installation
wikiselev Oct 27, 2017
ea063bb
add projection; fix M3Drop bug
wikiselev Oct 27, 2017
a2f178a
update index; switch off half of the chapters
wikiselev Oct 27, 2017
5066b38
move snn-cliq and MAGIC to utils
wikiselev Oct 27, 2017
5a14f2d
fix last chapters
wikiselev Oct 27, 2017
e8d404c
activate first chapters
wikiselev Oct 27, 2017
57a1e85
update README
wikiselev Oct 27, 2017
8b1ecd4
fix imputation, update chapter numbers
wikiselev Oct 27, 2017
b56a1c5
add scfind
wikiselev Oct 27, 2017
7865364
update the course website
wikiselev Oct 27, 2017
13323af
update the course website
wikiselev Oct 27, 2017
757b0d5
polishing up to chapter 11 inclusive
wikiselev Oct 27, 2017
9b52c89
Update 21-imputation.Rmd
Oct 27, 2017
541994d
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 27, 2017
7d6d6c6
Merge pull request #128 from hemberg-lab/mhemberg-patch-1
Oct 27, 2017
acc010a
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 27, 2017
6d8065c
update imputation
wikiselev Oct 27, 2017
c002009
update imputation
wikiselev Oct 27, 2017
b87c443
update seurat
wikiselev Oct 27, 2017
eecc91e
add limma
wikiselev Oct 28, 2017
8e591e7
fix chapter 8
wikiselev Oct 28, 2017
d4bba17
update the course website
wikiselev Oct 28, 2017
9131dec
update the course website
wikiselev Oct 28, 2017
e15cd89
update expression chapter; some minor updates
wikiselev Oct 28, 2017
5762bc4
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 28, 2017
70e1608
add ARI; minor updates
wikiselev Oct 28, 2017
3a6fcdc
update the course website
wikiselev Oct 29, 2017
02b60a8
activate pdf building again
wikiselev Oct 29, 2017
d77e503
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 29, 2017
94e13e8
update the course website
wikiselev Oct 29, 2017
a957fca
update the course website
wikiselev Oct 29, 2017
307b19f
update bib file with betterbib tool
wikiselev Oct 29, 2017
dbdb2de
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 29, 2017
64794b8
update the course website
wikiselev Oct 29, 2017
02b882b
Update 27-ideal-scrnaseq-pipeline.Rmd
tallulandrews Oct 30, 2017
6bb64c5
Merge branch 'tallulandrews-patch-1'
wikiselev Oct 30, 2017
15ea384
minor text updates
wikiselev Oct 30, 2017
2d2fab1
Fixed typos
Oct 30, 2017
b97c8f5
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
wikiselev Oct 30, 2017
eea0ad5
Fixed typos
Oct 30, 2017
f95ba27
Fixed text
Oct 30, 2017
d080cfc
Fixed typos
Oct 30, 2017
a52f967
Minor fixes
Oct 30, 2017
b280ebd
Fixed typo
Oct 30, 2017
f3c215f
Minor changes
Oct 30, 2017
ea0682b
Merge branch 'master' into mhemberg-patch-1
wikiselev Oct 30, 2017
8163e55
Merge branch 'mhemberg-patch-1'
wikiselev Oct 30, 2017
505d97c
Merge branch 'master' into mhemberg-patch-2
wikiselev Oct 30, 2017
8db61c2
Merge branch 'mhemberg-patch-2'
wikiselev Oct 30, 2017
58f48b7
Merge branch 'master' into mhemberg-patch-3
wikiselev Oct 30, 2017
17f7b22
Merge branch 'mhemberg-patch-3'
wikiselev Oct 30, 2017
23eddcc
Merge branch 'master' into mhemberg-patch-4
wikiselev Oct 30, 2017
e9498ae
Merge branch 'mhemberg-patch-4'
wikiselev Oct 30, 2017
8695f55
Merge branch 'master' into mhemberg-patch-5
wikiselev Oct 30, 2017
dc64dd0
Merge branch 'mhemberg-patch-5'
wikiselev Oct 30, 2017
56b54b4
final updates
wikiselev Oct 30, 2017
9f48740
Merge branch 'mhemberg-patch-6'
wikiselev Oct 30, 2017
f77d10e
update the course website
wikiselev Oct 30, 2017
32830fa
switch off pdf building
wikiselev Oct 30, 2017
4850f26
update the course website
wikiselev Oct 31, 2017
5b75630
update the course website
wikiselev Nov 1, 2017
43ebe8c
Add exercise solutions
tallulandrews Nov 1, 2017
2171955
update index page
wikiselev Nov 2, 2017
25fc479
move to Bioconductor release branch
wikiselev Nov 2, 2017
b61ff94
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
wikiselev Nov 2, 2017
cd6891e
add RUVSeq dependency
wikiselev Nov 6, 2017
27ff0af
trying to fix RMySQL
wikiselev Nov 6, 2017
6f4278c
Fix dropout chapter
wikiselev Nov 7, 2017
e9c4f6f
update the course website
wikiselev Nov 7, 2017
6e71f2b
update the course website
wikiselev Nov 7, 2017
adb168e
update the course website
wikiselev Nov 8, 2017
0139493
update the course website
wikiselev Nov 9, 2017
224e41e
update the course website
wikiselev Nov 10, 2017
51bd31b
update the course website
wikiselev Nov 11, 2017
8aa5667
update the course website
wikiselev Nov 12, 2017
0971962
update the course website
wikiselev Nov 13, 2017
386ce57
update the course website
wikiselev Nov 14, 2017
c571626
update the course website
wikiselev Nov 15, 2017
3f4864c
update the course website
wikiselev Nov 16, 2017
bc8c8ed
update the course website
wikiselev Nov 18, 2017
83ef4b0
update the course website
wikiselev Nov 19, 2017
7df2e57
update the course website
wikiselev Nov 20, 2017
60e629a
update the course website
wikiselev Nov 22, 2017
57ec956
update the course website
wikiselev Nov 23, 2017
b66117e
fix pseudotime chapter
wikiselev Nov 24, 2017
dbb71a7
Fix mnn_correct issues; update Youtube link on the main page
wikiselev Dec 11, 2017
e75c4f6
fix mnnCorrect problem for reads
wikiselev Dec 13, 2017
fc79472
switch off normalisatio by gene length (it's wrong and biomaRt call f…
wikiselev Dec 18, 2017
d192598
forgot to add eval=FALSE
Dec 19, 2017
7b2e01c
Add Lab5 - download Tabula Muris
tallulandrews Jan 11, 2018
6f2862d
First half of introduction to R lab
tallulandrews Jan 12, 2018
a69e8e4
First half of introduction to R lab
tallulandrews Jan 12, 2018
a2cf8de
Delete L3-intro-to-R.Rmd
tallulandrews Jan 12, 2018
869f64b
Add colour explanation
tallulandrews Jan 12, 2018
6818d6d
Update L3-intro-to-R.Rmd
tallulandrews Jan 12, 2018
50573fc
feature_symbol is requried in the newest version of SC3
Jan 14, 2018
189a256
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 14, 2018
559f661
First half of processing raw data lab
tallulandrews Jan 15, 2018
6125940
add mnnCorrect diagram
tallulandrews Jan 16, 2018
c21b2ca
fix imputation and scmap
Jan 16, 2018
6efd273
Update 24-projection.Rmd
tallulandrews Jan 16, 2018
fe8289a
CCA (Seurat) plots
tallulandrews Jan 16, 2018
433a5ee
New Cross-dataset diagrams
tallulandrews Jan 16, 2018
dfb9985
Update 24-projection.Rmd
tallulandrews Jan 16, 2018
b9c181b
update the course website
wikiselev Jan 17, 2018
8eb19ee
update the course website
wikiselev Jan 17, 2018
eb652eb
JW first draft of reads QC lab 1
Jan 18, 2018
354d9e6
update the course website
wikiselev Jan 18, 2018
2ef8920
JW added wget commands to download Kolod et al. data
Jan 18, 2018
40fdb8b
JW first draft of alignment lab including images used in Rmd
Jan 18, 2018
a2290b7
update the course website
wikiselev Jan 19, 2018
5ae1aff
update the course website
wikiselev Jan 20, 2018
dd905af
update the course website
wikiselev Jan 21, 2018
05de376
update Dockerfile, add processing software
Jan 21, 2018
3ee0e25
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 21, 2018
6f1eab3
add curl
Jan 21, 2018
cedc4e0
fix R packages installation
Jan 21, 2018
2e277d2
add fastqc, kallisto, bedtools2 and cutadapt
Jan 22, 2018
f5c7db8
update the course website
wikiselev Jan 22, 2018
8948940
fix bedtools; remove unnessesary software
Jan 23, 2018
98607d6
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 23, 2018
bd6f9ba
fix bedtools again
Jan 23, 2018
3937a34
update the course website
wikiselev Jan 23, 2018
ed8e28d
fix kallisto; update README
Jan 24, 2018
7ed3d37
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 24, 2018
5c58349
update the course website
wikiselev Jan 24, 2018
9c06e14
add tidyverse and ggfortify
Jan 25, 2018
1fa4199
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 25, 2018
2a6f1c4
Upload 2000 transcript transcriptome -JW
Jan 25, 2018
be9d67a
update the course website
wikiselev Jan 25, 2018
aca6dc0
All files checked against docker image except L4 - JW
Jan 26, 2018
cc74252
update the course website
wikiselev Jan 26, 2018
cc4ffb7
update the course website
wikiselev Jan 27, 2018
ef214e0
Add demultiplexing lab
tallulandrews Jan 28, 2018
6655525
Update L3-intro-to-R.Rmd
tallulandrews Jan 28, 2018
2fa7a9c
demultiplexing example data
tallulandrews Jan 29, 2018
0f7ecfa
ID true cell barcodes example data
tallulandrews Jan 29, 2018
1c51597
merge new chapters from devel branch
Jan 30, 2018
e7eae87
update the course website
wikiselev Jan 30, 2018
ba952f7
remove empty continuation lines
Jan 31, 2018
fd999b2
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 31, 2018
0980850
update the course website
wikiselev Jan 31, 2018
600db99
order chapters
Jan 31, 2018
2231be2
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Jan 31, 2018
094f940
Update 05-L1-process-raw.Rmd
tallulandrews Feb 1, 2018
a73df46
Update 30-de-real.Rmd
tallulandrews Feb 1, 2018
86d1c11
install trim_galore; minor edits
Feb 1, 2018
4d37ad2
Update 05-L1-process-raw.Rmd
tallulandrews Feb 1, 2018
b591c37
update trim_galore link
Feb 1, 2018
a9cbb63
Add demultiplexing perl scripts
tallulandrews Feb 1, 2018
28f04d6
Merge branch 'devel' of github.com:hemberg-lab/scRNA.seq.course into …
Feb 1, 2018
d157852
Merge branch 'devel' of github.com:hemberg-lab/scRNA.seq.course into …
Feb 1, 2018
97aa461
Update 05-L1-process-raw.Rmd
tallulandrews Feb 1, 2018
f78ca04
Update 06-L1-process-raw-align.Rmd
tallulandrews Feb 1, 2018
c339052
Update 09-L3-intro-to-R.Rmd
tallulandrews Feb 1, 2018
d76f481
Update 13-L5-Intro-TabulaMuris.Rmd
tallulandrews Feb 1, 2018
b69800c
split dockerfile in two
Feb 1, 2018
3c7c977
Merge branch 'devel' of github.com:hemberg-lab/scRNA.seq.course into …
Feb 1, 2018
bb10cea
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Feb 1, 2018
b2ed408
update Dockerfile
Feb 1, 2018
d4ceec9
remove less
Feb 2, 2018
1b092a4
Add metaneighbour
tallulandrews Feb 2, 2018
69fffd1
Update 31-projection.Rmd
tallulandrews Feb 2, 2018
c3a6330
update the course website
wikiselev Feb 2, 2018
8d0bd5b
Update 05-L1-process-raw.Rmd
tallulandrews Feb 2, 2018
9df0c1e
Update 13-L5-Intro-TabulaMuris.Rmd
tallulandrews Feb 2, 2018
8fd3d68
Update 14-exprs-qc.Rmd
tallulandrews Feb 2, 2018
256d329
JW fixed references and tested code on AWS
Feb 2, 2018
e0f8c88
JW fixed references and tested code on AWS
Feb 2, 2018
cc722fa
Merge branch 'devel' of https://github.com/hemberg-lab/scRNA.seq.course
Feb 2, 2018
857be2d
Merge branch 'devel'
Feb 2, 2018
44c070d
Merge branch 'devel'
Feb 2, 2018
55dd29f
update the course website
wikiselev Feb 2, 2018
8a5482d
update the headers
Feb 2, 2018
28e9d3b
update main page, clean the docs folder
Feb 2, 2018
23f334e
update README; add empty-file to docs
Feb 2, 2018
fec0b01
remove year
Feb 2, 2018
ba5af60
update the course website
wikiselev Feb 2, 2018
6cedbcc
fix some outputs
Feb 2, 2018
176da10
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Feb 2, 2018
2092837
update the course website
wikiselev Feb 2, 2018
ccee30f
update the course website
wikiselev Feb 3, 2018
dbb2b2f
hide projection chapter
Feb 3, 2018
96cfae2
Merge branch 'master' of github.com:hemberg-lab/scRNA.seq.course
Feb 3, 2018
9d04274
update the course website
wikiselev Feb 3, 2018
90ca80c
update the course website
wikiselev Feb 4, 2018
bb0405c
Updating pseudotime chapter.
davismcc Feb 5, 2018
b6f231f
update the course website
wikiselev Feb 5, 2018
6a33c86
Merge branch 'devel' of git://github.com/davismcc/scRNA.seq.course in…
Feb 5, 2018
ecdd535
add .Renviron to increase a max number of dlls to 250
Feb 5, 2018
67700bf
Merge branch 'davismcc-devel' into devel
Feb 5, 2018
dff9b75
Merge branch 'devel'
Feb 5, 2018
bb223e3
update the course website
wikiselev Feb 6, 2018
fc297bb
try to build an updated pdf
Feb 7, 2018
0732b2a
update the course website
wikiselev Feb 7, 2018
1686056
Add files via upload
tallulandrews Feb 8, 2018
43a2249
Add files via upload
tallulandrews Feb 8, 2018
e455cc2
Add files via upload
tallulandrews Feb 8, 2018
83c41a8
add Jenkinsfile
Feb 8, 2018
4f0f036
move deploy.sh to Jenkinsfile
Feb 8, 2018
e502bea
Add files via upload
tallulandrews Feb 9, 2018
f943448
fix latex errors
Feb 9, 2018
5a43a18
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
Feb 9, 2018
0a54c68
add triggers to the pipeline
Feb 9, 2018
862631d
Update 20-exprs-norm.Rmd
tallulandrews Feb 11, 2018
b090c24
add pushing step
Feb 12, 2018
ee85712
Merge branch 'master' of https://github.com/hemberg-lab/scRNA.seq.course
Feb 12, 2018
278a19a
Added Jenkinsfile
Feb 12, 2018
ab6e57b
Added Jenkinsfile
Feb 13, 2018
0333756
add base Dockerfile
Feb 13, 2018
2b3e2d0
delete docker-base
Feb 14, 2018
77319b7
fix the workspace error
Feb 27, 2018
a3500fe
update the course website
Feb 27, 2018
20d791c
update the course website
Mar 6, 2018
e400eab
update the course website
Mar 13, 2018
030a0df
remove pdf compilation from build
Mar 13, 2018
0ad0cdf
update the course website
Mar 13, 2018
b06062b
try to change docker cp command
Mar 17, 2018
d42f56b
update the course website
Mar 17, 2018
b2a1998
update the course website
Mar 20, 2018
8a3b775
rm unnecessary files
Mar 20, 2018
1ee9ac6
update the course website
Mar 20, 2018
854cce1
update the course website
Mar 27, 2018
934127b
update the course website
Apr 3, 2018
4537e9c
update the course website
Apr 10, 2018
4b4bc1c
update the course website
Apr 17, 2018
1b67fa6
update the course website
Apr 24, 2018
5922b46
add a statement about Share folder
Apr 25, 2018
198081c
update the course website
Apr 25, 2018
a6eb8a6
update the course website
Apr 26, 2018
6b9a453
add a note about Share folder
Apr 26, 2018
b3862fb
update the course website
Apr 26, 2018
570c954
add note
Apr 26, 2018
ddc9968
edit text
Apr 26, 2018
4cd1013
update the course website
Apr 26, 2018
93a2c30
update the course website
May 1, 2018
1260438
update the course website
May 8, 2018
d81fa44
update the course website
May 15, 2018
fcd70c8
update the course website
May 22, 2018
7a36651
update the course website
May 29, 2018
b64c48e
Proper citation for destiny
flying-sheep Jun 3, 2018
7e82e5d
Turn off the progress bar for scaling procedure
ChuliangXiao Oct 31, 2018
d33bd9d
fixed a few typos; updated code for scater plots
stephaniehicks Nov 20, 2018
fd17135
one more typo
stephaniehicks Nov 20, 2018
52a2359
Merge pull request #149 from flying-sheep/patch-1
Nov 21, 2018
ccadb22
Merge pull request #150 from ChuliangXiao/patch-1
Nov 21, 2018
aa3222f
Merge pull request #152 from stephaniehicks/master
Nov 21, 2018
1 change: 1 addition & 0 deletions .Renviron
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
R_MAX_NUM_DLLS = 250
2 changes: 2 additions & 0 deletions .dockerignore
@@ -0,0 +1,2 @@
# do not copy git directory
.git
12 changes: 12 additions & 0 deletions .gitignore
@@ -2,3 +2,15 @@
.Rhistory
.RData
.*.Rnb.cached
.DS_*
*/.DS_*
*.Rproj
.Rbuildignore
pars*.rds
deng.csv
scimpute_count.txt
MAGIC_count.csv
totalCounts_by_cell.rds
clust.rds
tung/reads.rds
tung/umi.rds
22 changes: 0 additions & 22 deletions 01-intro.Rmd

This file was deleted.

69 changes: 69 additions & 0 deletions 02-intro.Rmd
@@ -0,0 +1,69 @@
---
output: html_document
---

# Introduction to single-cell RNA-seq

```{r, echo=FALSE}
library(knitr)
opts_chunk$set(fig.align = "center", echo=FALSE)
```

## Bulk RNA-seq

* A major breakthrough (replaced microarrays) in the late 00's and has been widely used since
* Measures the __average expression level__ for each gene across a large population of input cells
* Useful for comparative transcriptomics, e.g. samples of the same tissue from different species
* Useful for quantifying expression signatures from ensembles, e.g. in disease studies
* __Insufficient__ for studying heterogeneous systems, e.g. early development studies, complex tissues (brain)
* Does __not__ provide insights into the stochastic nature of gene expression

## scRNA-seq

* A __new__ technology, first publication by [@Tang2009-bu]
* Did not gain widespread popularity until [~2014](https://www.ohio.edu/bioinformatics/upload/Single-Cell-RNA-seq-Method-of-the-Year-2013.pdf) when new protocols and lower sequencing costs made it more accessible
* Measures the __distribution of expression levels__ for each gene across a population of cells
* Makes it possible to study new biological questions in which __cell-specific changes in the transcriptome are important__, e.g. cell type identification, heterogeneity of cell responses, stochasticity of gene expression, inference of gene regulatory networks across the cells.
* Datasets range __from $10^2$ to $10^6$ cells__ and increase in size every year
* Currently there are several different protocols in use, e.g. SMART-seq2 [@Picelli2013-sb], CEL-seq [@Hashimshony2012-kd] and Drop-seq [@Macosko2015-ix]
* There are also commercial platforms available, including the [Fluidigm C1](https://www.fluidigm.com/products/c1-system), [Wafergen ICELL8](https://www.wafergen.com/products/icell8-single-cell-system) and the [10X Genomics Chromium](https://www.10xgenomics.com/single-cell/)
* Several computational analysis methods from bulk RNA-seq __can__ be used
* __In most cases__ computational analysis requires adaptation of the existing methods or development of new ones

## Workflow

```{r intro-rna-seq-workflow, out.width = '90%', fig.cap="Single cell sequencing (taken from Wikipedia)"}
knitr::include_graphics("figures/RNA-Seq_workflow-5.pdf.jpg")
```

Overall, experimental scRNA-seq protocols are similar to the methods used for bulk RNA-seq. We will be discussing some of the most common approaches in the next chapter.

## Computational Analysis

This course is concerned with the computational analysis of the data
obtained from scRNA-seq experiments. The first steps (yellow) are general for any high-throughput sequencing data. Later steps (orange) require a mix of existing RNA-seq analysis methods and novel methods to address the technical differences of scRNA-seq. Finally, the biological interpretation (blue) __should__ be analyzed with methods specifically developed for scRNA-seq.

```{r intro-flowchart, out.width = '65%', fig.cap="Flowchart of the scRNA-seq analysis"}
knitr::include_graphics("figures/flowchart.png")
```

Several reviews of scRNA-seq analysis are available, including [@Stegle2015-uv].

Today, there are also several different platforms available for carrying out one or more steps in the flowchart above. These include:

* [Falco](https://github.com/VCCRI/Falco/) is a single-cell RNA-seq processing framework on the cloud.
* [SCONE](https://github.com/YosefLab/scone) (Single-Cell Overview of Normalized Expression), a package for single-cell RNA-seq data quality control and normalization.
* [Seurat](http://satijalab.org/seurat/) is an R package designed for QC, analysis, and exploration of single cell RNA-seq data.
* [ASAP](https://asap.epfl.ch/) (Automated Single-cell Analysis Pipeline) is an interactive web-based platform for single-cell analysis.
* [Bioconductor](https://master.bioconductor.org/packages/release/workflows/html/simpleSingleCell.html) is an open-source, open-development software project for the analysis of high-throughput genomics data, including packages for the analysis of single-cell data.


## Challenges

The main difference between bulk and single cell RNA-seq is that each sequencing library represents a single cell, instead of a population of cells. Therefore, significant attention has to be paid to comparison of the results from different cells (sequencing libraries). The main sources of discrepancy between the libraries are:

* __Amplification__ (up to 1 million fold)
* __Gene 'dropouts'__ in which a gene is observed at a moderate expression level in one cell but is not detected in another cell [@Kharchenko2014-ts].

In both cases the discrepancies are introduced due to low starting amounts of transcripts since the RNA comes from one cell only. Improving the transcript capture efficiency and reducing the amplification bias are currently active areas of research. However, as we shall see in this course, it is possible to alleviate some of these issues through proper normalization and corrections.

7 changes: 0 additions & 7 deletions 02-literature.Rmd

This file was deleted.

72 changes: 72 additions & 0 deletions 03-exp-methods.Rmd
@@ -0,0 +1,72 @@
---
output: html_document
---

```{r, echo=FALSE}
library(knitr)
opts_chunk$set(fig.align = "center", echo=FALSE, out.width = '70%')
```

## Experimental methods

```{r, fig.cap="Moore's law in single cell transcriptomics (image taken from [Svensson et al](https://arxiv.org/abs/1704.01379))", out.width = '100%'}
knitr::include_graphics("figures/moores-law.png")
```

Development of new methods and protocols for scRNA-seq is currently a very active area of research, and several protocols have been published over the last few years. A non-comprehensive list includes:

* CEL-seq [@Hashimshony2012-kd]
* CEL-seq2 [@Hashimshony2016-lx]
* Drop-seq [@Macosko2015-ix]
* InDrop-seq [@Klein2015-kz]
* MARS-seq [@Jaitin2014-ko]
* SCRB-seq [@Soumillon2014-eu]
* Seq-well [@Gierahn2017-es]
* Smart-seq [@Picelli2014-ic]
* Smart-seq2 [@Picelli2014-ic]
* [SMARTer](http://www.clontech.com/US/Products/cDNA_Synthesis_and_Library_Construction/Next_Gen_Sequencing_Kits/Total_RNA-Seq/Universal_RNA_Seq_Random_Primed)
* STRT-seq [@Islam2014-cn]

The methods can be categorized in different ways, but the two most important aspects are __quantification__ and __capture__.

For quantification, there are two types, __full-length__ and __tag-based__. The former tries to achieve a uniform read coverage of each transcript. By contrast, tag-based protocols only capture either the 5'- or 3'-end of each RNA. The choice of quantification method has important implications for what types of analyses the data can be used for. In theory, full-length protocols should provide an even coverage of transcripts, but as we shall see, there are often biases in the coverage. The main advantage of tag-based protocols is that they can be combined with unique molecular identifiers (UMIs), which can help improve the quantification (see chapter \@ref(umichapter)). On the other hand, being restricted to one end of the transcript may reduce the mappability and it also makes it harder to distinguish different isoforms [@Archer2016-zq].
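
To make the benefit of UMIs concrete, the sketch below contrasts plain read counting with UMI counting for a single cell. It is a toy illustration under stated assumptions, not the method used later in the course: the input layout (one line per mapped read, `<gene><TAB><UMI>`), the gene names and the UMI sequences are all made up, and sequencing errors in the UMIs are ignored.

```bash
# Assumption: each line is "<gene><TAB><UMI>" for one mapped read of a single cell.
tmpdir=$(mktemp -d)
printf 'GeneA\tACGT\nGeneA\tACGT\nGeneA\tACGT\nGeneA\tTTAG\nGeneB\tCCGA\n' > "$tmpdir/reads.tsv"

# Read counts: every read is counted, including amplification duplicates.
read_counts=$(cut -f 1 "$tmpdir/reads.tsv" | sort | uniq -c | awk '{ print $2, $1 }')

# UMI counts: identical gene/UMI pairs are collapsed first, so duplicates count once.
umi_counts=$(sort -u "$tmpdir/reads.tsv" | cut -f 1 | uniq -c | awk '{ print $2, $1 }')

echo "Read counts:"; echo "$read_counts"
echo "UMI counts:";  echo "$umi_counts"
```

In this toy input, three of the four `GeneA` reads share one UMI, so the UMI count for `GeneA` is lower than its read count, while `GeneB` is unaffected.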

The strategy used for capture determines the throughput, how the cells can be selected, and what kind of additional information besides the sequencing can be obtained. The three most widely used options are __microwell-__, __microfluidic-__ and __droplet-__ based.

```{r, fig.cap="Image of microwell plates (image taken from Wikipedia)"}
knitr::include_graphics("figures/300px-Microplates.jpg")
```

For well-based platforms, cells are isolated using, for example, pipette or laser capture and placed in microfluidic wells. One advantage of well-based methods is that they can be combined with fluorescence-activated cell sorting (FACS), making it possible to select cells based on surface markers. This strategy is thus very useful for situations when one wants to isolate a specific subset of cells for sequencing. Another advantage is that one can take pictures of the cells. The image provides an additional modality and a particularly useful application is to identify wells containing damaged cells or doublets. The main drawback of these methods is that they are often low-throughput and the amount of work required per cell may be considerable.

```{r, fig.cap="Image of a 96-well Fluidigm C1 chip (image taken from Fluidigm)"}
knitr::include_graphics("figures/fluidigmC1.jpg")
```

Microfluidic platforms, such as [Fluidigm's C1](https://www.fluidigm.com/products/c1-system#workflow), provide a more integrated system for capturing cells and for carrying out the reactions necessary for the library preparations. Thus, they provide a higher throughput than microwell based platforms. Typically, only around 10% of cells are captured in a microfluidic platform and thus they are not appropriate if one is dealing with rare cell-types or very small amounts of input. Moreover, the chip is relatively expensive, but since reactions can be carried out in a smaller volume money can be saved on reagents.

```{r, out.width = '60%', fig.cap="Schematic overview of the drop-seq method (Image taken from Macosko et al)"}
knitr::include_graphics("figures/drop-seq.png")
```

The idea behind droplet based methods is to encapsulate each individual cell inside a nanoliter droplet together with a bead. The bead is loaded with the enzymes required to construct the library. In particular, each bead contains a unique barcode which is attached to all of the reads originating from that cell. Thus, all of the droplets can be pooled, sequenced together and the reads can subsequently be assigned to the cell of origin based on the barcodes. Droplet platforms typically have the highest throughput since the library preparation costs are on the order of $0.05$ USD/cell. Instead, sequencing costs often become the limiting factor, and in a typical experiment the coverage is low with only a few thousand different transcripts detected [@Ziegenhain2017-cu].
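
The barcode-based assignment described above can be sketched with a toy example. This is only an illustration under simplifying assumptions: the pooled reads are given as `<cell barcode><TAB><read sequence>` lines with made-up barcodes and sequences, whereas real protocols encode the barcode inside the sequenced read and need a dedicated demultiplexing tool.

```bash
# Toy illustration of assigning pooled reads to cells by barcode.
# Assumption: each line is "<cell barcode><TAB><read sequence>" (made-up data).
tmpdir=$(mktemp -d)
printf 'AAAC\tGGTCA\nTTGC\tACCGT\nAAAC\tTTGAC\nTTGC\tGGGTA\nAAAC\tCATGA\n' > "$tmpdir/pooled.tsv"

# Count how many reads carry each barcode, i.e. how many reads each cell contributed.
counts=$(awk -F '\t' '{ n[$1]++ } END { for (b in n) print b, n[b] }' "$tmpdir/pooled.tsv" | sort)
echo "$counts"
```

Each distinct barcode in the output stands for one cell, and the number next to it is the number of pooled reads assigned to that cell.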

## What platform to use for my experiment?

The most suitable platform depends on the biological question at hand. For example, if one is interested in characterizing the composition of a tissue, then a droplet-based method which will allow a very large number of cells to be captured is likely to be the most appropriate. On the other hand, if one is interested in characterizing a rare cell-population for which there is a known surface marker, then it is probably best to enrich using FACS and then sequence a smaller number of cells.

Clearly, full-length transcript quantification will be more appropriate if one is interested in studying different isoforms since tagged protocols are much more limited. By contrast, UMIs can only be used with tagged protocols and they can facilitate gene-level quantification.

Two recent studies from the Enard group [@Ziegenhain2017-cu] and the Teichmann group [@Svensson2017-op] have compared several different protocols. In their study, Ziegenhain et al compared five different protocols on the same sample of mouse embryonic stem cells (mESCs). By controlling for the number of cells as well as the sequencing depth, the authors were able to directly compare the sensitivity, noise-levels and costs of the different protocols. One example of their conclusions is illustrated in the figure below, which shows the number of genes detected (for a given detection threshold) for the different methods. As you can see, there is almost a two-fold difference between drop-seq and Smart-seq2, suggesting that the choice of protocol can have a major impact on the study.

```{r, out.width = '60%', fig.cap="Enard group study"}
knitr::include_graphics("figures/ziegenhainEnardFig1.png")
```

Svensson et al take a different approach by using synthetic transcripts (spike-ins, more about these later) with known concentrations to measure the accuracy and sensitivity of different protocols. Comparing a wide range of studies, they also reported substantial differences between the protocols.

```{r, out.width = '100%', fig.cap="Teichmann group study"}
knitr::include_graphics("figures/svenssonTeichmannFig2.png")
```

As protocols are developed and computational methods for quantifying the technical noise are improved, it is likely that future studies will help us gain further insights regarding the strengths of the different methods. These comparative studies are helpful not only for helping researchers decide which protocol to use, but also for developing new methods as the benchmarking makes it possible to determine what strategies are the most useful ones.
7 changes: 0 additions & 7 deletions 03-method.Rmd

This file was deleted.

92 changes: 92 additions & 0 deletions 04-L1-process-raw-QC.Rmd
@@ -0,0 +1,92 @@
---
output: html_document
code_folding: hide
---

```{r include=FALSE}
library('bookdown')
```

# Processing Raw scRNA-seq Data

## FastQC

Once you've obtained your single-cell RNA-seq data, the first thing you need to do with it is check the quality of the reads you have sequenced. For this task, today we will be using a tool called FastQC. FastQC is a quality control tool for sequencing data, which can be used for both bulk and single-cell RNA-seq data. FastQC takes sequencing data as input and returns a report on read quality. Copy and paste this link into your browser to visit the FastQC website:

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

This website contains links to download and install FastQC and documentation on the reports produced. Fortunately we have already installed FastQC for you today, so instead we will take a look at the documentation. Scroll down the webpage to 'Example Reports' and click 'Good Illumina Data'. This gives an example of what an ideal report should look like for high quality Illumina reads data.

Now let's make a FastQC report ourselves.

Today we will be performing our analysis using a single cell from an mESC dataset produced by [@Kolodziejczyk2015-xy]. The cells were sequenced using the SMART-seq2 library preparation protocol and the reads are paired end. The files are located in `Share`.

__Note__ The current text of the course is written for an AWS server for people who attend our course in person. You will have to download the files (both `ERR522959_1.fastq` and `ERR522959_2.fastq`) and create the `Share` directory yourself to run the commands. You can find the files here: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2600/samples/

Now let's look at the files:
```{bash, eval=FALSE}
less Share/ERR522959_1.fastq
less Share/ERR522959_2.fastq
```
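
Besides paging through the files, a quick sanity check is to count the reads in each file. A FASTQ record normally spans four lines, so the read count is the line count divided by four, and the two files of a paired-end run should contain the same number of reads. The sketch below assumes uncompressed FASTQ with exactly four lines per record; `count_reads` is a hypothetical helper name, and the toy files contain made-up reads so that the example runs without the course data.

```bash
# Hypothetical helper: number of reads in an uncompressed, 4-line-per-record FASTQ file.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# Toy paired-end files (made-up reads) so the sketch runs without the course data.
tmpdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > "$tmpdir/toy_1.fastq"
printf '@r1/2\nACGA\n+\nIIII\n@r2/2\nCTGA\n+\nIIII\n' > "$tmpdir/toy_2.fastq"

n1=$(count_reads "$tmpdir/toy_1.fastq")
n2=$(count_reads "$tmpdir/toy_2.fastq")
echo "forward: $n1 reads, reverse: $n2 reads"

# Paired-end files should have matching read counts.
if [ "$n1" -eq "$n2" ]; then
  echo "read counts match"
else
  echo "read counts differ" >&2
fi
```

To apply the same check to the course data, call `count_reads` on `Share/ERR522959_1.fastq` and `Share/ERR522959_2.fastq`.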

Task 1: Try to work out what command you should use to produce the FastQC report. Hint: Try executing

```{bash, eval=FALSE, collapse=TRUE}
fastqc -h
```

This command will tell you what options are available to pass to FastQC. Feel free to ask for help if you get stuck! If you are successful, you should generate a .zip and a .html file for both the forwards and the reverse reads files. Once you have been successful, feel free to have a go at the next section.


### Solution and Downloading the Report

If you haven't done so already, generate the FastQC report using the commands below:

```{bash, eval=FALSE, echo = TRUE}
mkdir fastqc_results
fastqc -o fastqc_results Share/ERR522959_1.fastq Share/ERR522959_2.fastq
```

Once the command has finished executing, you should have a total of four files - one zip file for each of the paired end reads, and one html file for each of the paired end reads. The report is in the html file. To view it, we will need to get it off AWS and onto your computer using either FileZilla or scp. Ask an instructor if you are having difficulties.
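
If you would like a quick look at the results before transferring anything, each FastQC zip archive also contains a plain-text `summary.txt` listing a PASS, WARN or FAIL status per module. The sketch below is one possible way to print it on the server; it assumes that `unzip` is installed and that the reports were written to `fastqc_results` as in the command above, and it does nothing if no reports are found.

```bash
# Print the per-module PASS/WARN/FAIL summary stored inside each FastQC zip archive.
# Assumptions: unzip is available and the reports live in fastqc_results/.
found=0
for zip in fastqc_results/*_fastqc.zip; do
  [ -e "$zip" ] || continue            # skip if no reports have been generated yet
  found=$((found + 1))
  name=$(basename "$zip" .zip)         # e.g. ERR522959_1_fastqc
  echo "== $name =="
  unzip -p "$zip" "$name/summary.txt"  # write the summary to the terminal only
done
echo "summarised $found report(s)"
```

The text summary is no substitute for the plots in the html report, but it shows at a glance which modules deserve a closer look.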

Once the file is on your computer, click on it. Your FastQC report should open. Have a look through the file. Remember to look at both the forwards and the reverse end read reports! How good is the quality of the reads? Is there anything we should be concerned about? How might we address those concerns?

Feel free to chat to one of the instructors about your ideas.

## Trimming Reads

Fortunately there is software available for read trimming. Today we will be using Trim Galore!. Trim Galore! is a wrapper for the reads trimming software cutadapt.

Read trimming software can be used to trim sequencing adapters and/or low quality bases from the ends of reads. Given that we noticed some adapter contamination in our FastQC report, it is a good idea to trim adapters from our data.

Task 2: What type of adapters were used in our data? Hint: Look at the FastQC report 'Adapter Content' plot.

Now let's try to use Trim Galore! to remove those problematic adapters. It's a good idea to check read quality again after trimming, so after you have trimmed your reads you should use FastQC to produce another report.

Task 3: Work out the command you should use to trim the adapters from our data. Hint 1: You can use

```{bash, eval=FALSE}
trim_galore -h
```

to find out what options you can pass to Trim Galore!.
Hint 2: Read through the output of the above command carefully. The adaptor used in this experiment is quite common. Do you need to know the actual sequence of the adaptor to remove it?

Task 4: Produce a FastQC report for your trimmed reads files. Is the adapter contamination gone?

Once you think you have successfully trimmed your reads and have confirmed this by checking the FastQC report, feel free to check your results using the next section.

### Solution

You can use the command(s) below to trim the Nextera sequencing adapters:

```{bash, eval=FALSE}
mkdir fastqc_trimmed_results
trim_galore --nextera -o fastqc_trimmed_results Share/ERR522959_1.fastq Share/ERR522959_2.fastq
```

Remember to generate new FastQC reports for your trimmed reads files! FastQC should now show that your reads pass the 'Adapter Content' check. Feel free to ask one of the instructors if you have any questions.
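
One possible way to produce the new reports is sketched below. It assumes that Trim Galore! wrote its trimmed FASTQ files (with names ending in `.fq`) to `fastqc_trimmed_results`, as requested with `-o` in the command above; the guard simply skips the step when FastQC or the trimmed files are not available.

```bash
# Re-run FastQC on the trimmed reads.
# Assumption: the trimmed FASTQ files (*.fq) are in fastqc_trimmed_results/.
status="skipped"
if command -v fastqc >/dev/null 2>&1 && ls fastqc_trimmed_results/*.fq >/dev/null 2>&1; then
  fastqc -o fastqc_trimmed_results fastqc_trimmed_results/*.fq
  status="done"
fi
echo "FastQC on trimmed reads: $status"
```

The new html reports end up next to the trimmed reads and can be downloaded and inspected in the same way as before.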

Congratulations! You have now generated reads quality reports and performed adaptor trimming. In the next lab, we will use STAR and Kallisto to align our trimmed and quality-checked reads to a reference transcriptome.


11 changes: 0 additions & 11 deletions 04-application.Rmd

This file was deleted.
