RNAseq analysis methods Part 2
Open R (I use R studio, you can also use the command line)
Install edgeR.
source("http://bioconductor.org/biocLite.R")
biocLite("edgeR")
- Load the edgeR package. We will use edgeR to identify significantly differentially expressed genes between our two groups based on counts of gene expression for each gene.
library(edgeR)
Set working directory
setwd("/Users/jlkelley/Dropbox/ConGen2018")
Read in gene count matrix
counts = read.csv("gene_count_matrix.csv", header = T, row = 1)
Look at the beginning of the file (note, there are lots of zeros)
head(counts)
Make sure the number of genes and samples is expected
dim(counts)
Create map of samples to treatments
samples = colnames(counts)
group = c(rep("A", 10),rep("B",10))
sample.info = data.frame(cbind(samples,group))
head(sample.info)
Remove genes with zero counts
length(which(rowSums(counts) == 0))
gene.counts = counts[rowSums(counts) != 0,]
Determine the number of genes left after removing zeros
dim(gene.counts)
Create a DGEList of the data
y <- DGEList(counts=gene.counts,group=sample.info$group)
Calculate the normalization factors (from the edgeR manual: The calcNormFactors function normalizes for RNA composition by finding a set of scaling factors for the library sizes that minimize the log-fold changes between the samples for most genes. The default method for computing these scale factors uses a trimmed mean of M-values (TMM) between each pair of samples [30]. We call the product of the original library size and the scaling factor the effective lib)
y <- calcNormFactors(y)
Look at y
y
Visualize the data in y, for example using an MDS plot
plotMDS(y)
Design matrix
design <- model.matrix(~group)
design
Estimate dispersion
y <- estimateDisp(y,design)
perform quasi-likelihood F-tests:
fit <- glmQLFit(y,design)
qlf <- glmQLFTest(fit,coef=2)
topTags(qlf)
summarize the results
summary(de <- decideTestsDGE(qlf, p.value = 0.05))
plot the results
detags <- rownames(y)[as.logical(de)]
plotSmear(qlf, de.tags=detags)
abline(h = c(-1, 1), col = "blue")