Skip to content

Latest commit

 

History

History
111 lines (85 loc) · 2.23 KB

part2.md

File metadata and controls

111 lines (85 loc) · 2.23 KB

RNAseq analysis methods Part 2

Open R (I use R studio, you can also use the command line)

Install edgeR.

source("http://bioconductor.org/biocLite.R")
biocLite("edgeR")
  1. Load the edgeR package. We will use edgeR to identify significantly differentially expressed genes between our two groups based on counts of gene expression for each gene.
library(edgeR)

Set working directory

setwd("/Users/jlkelley/Dropbox/ConGen2018")

Read in gene count matrix

counts = read.csv("gene_count_matrix.csv", header = T, row = 1)

Look at the beginning of the file (note, there are lots of zeros)

head(counts)

Make sure the number of genes and samples is expected

dim(counts)

Create map of samples to treatments

samples = colnames(counts)
group = c(rep("A", 10),rep("B",10))
sample.info = data.frame(cbind(samples,group))
head(sample.info)

Remove genes with zero counts

length(which(rowSums(counts) == 0))
gene.counts = counts[rowSums(counts) != 0,]

Determine the number of genes left after removing zeros

dim(gene.counts)

Create a DGEList of the data

y <- DGEList(counts=gene.counts,group=sample.info$group)

Calculate the normalization factors (from the edgeR manual: The calcNormFactors function normalizes for RNA composition by finding a set of scaling factors for the library sizes that minimize the log-fold changes between the samples for most genes. The default method for computing these scale factors uses a trimmed mean of M-values (TMM) between each pair of samples [30]. We call the product of the original library size and the scaling factor the effective lib)

y <- calcNormFactors(y)

Look at y

y

Visualize the data in y, for example using an MDS plot

plotMDS(y)

Design matrix

design <- model.matrix(~group)
design

Estimate dispersion

y <- estimateDisp(y,design)

perform quasi-likelihood F-tests:

fit <- glmQLFit(y,design)
qlf <- glmQLFTest(fit,coef=2)
topTags(qlf)

summarize the results

summary(de <- decideTestsDGE(qlf, p.value = 0.05))

plot the results

detags <- rownames(y)[as.logical(de)]
plotSmear(qlf, de.tags=detags)
abline(h = c(-1, 1), col = "blue")