Time | Topic | Instructor |
---|---|---|
10:00 - 10:30 | Workshop Introduction | Meeta |
10:30 - 11:00 | RNA-seq pre-reading discussion | All |
11:00 - 11:45 | Intro to DGE / setting up DGE analysis | Noor |
11:45 - 12:00 | Overview of self-learning materials and homework submission | Meeta |
- Please study the contents and work through all the code within the following lessons:
-
Starting with the count matrix, we want to explore some characteristics of the RNA-seq data and evaluate the appropriate model to use.
This lesson will cover:
- Describing characteristics of the RNA-seq count data
- Understanding different statistical methods to model the count data
- Explaining the benefits of biological replicates
-
Count normalization is an important data pre-processing step before the differential expression analysis.
This lesson will cover:
- Describing "uninteresting factors" to consider during normalization
- Understanding different normalization methods and their corresponding use cases
- Generating a matrix of normalized counts using DESeq2's median of ratios method
- Sample-level QC (PCA and hierarchical clustering)
Next, we want to check the quality of the count data, to make sure that the samples are of good quality and group as expected.
This lesson will cover:
- Understanding the importance of similarity analysis between samples
- Describing Principal Component Analysis (PCA) and interpreting PCA plots from RNA-seq data
- Performing hierarchical clustering and plotting correlation metrics
- Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Form the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
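
As a companion to the count normalization lesson listed above, here is a minimal sketch of the arithmetic behind a median of ratios normalization. It is written in plain Python purely to illustrate the idea and is not DESeq2's own code; the function names `size_factors` and `normalize` and the toy count matrix are assumptions made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with a median of ratios approach.

    counts: list of genes, where each gene is a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have no defined log geometric mean,
    # so they are skipped when estimating size factors.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to the pseudo-reference, on the log scale.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        # The median ratio is the size factor for this sample.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Hypothetical toy data: 4 genes x 2 samples, the second sample sequenced twice as deeply.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
print(size_factors(toy_counts))
print(normalize(toy_counts))
```

Because the median is taken across genes, a handful of strongly differentially expressed genes has little influence on the size factors, which is the main motivation for this style of normalization.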
Time | Topic | Instructor |
---|---|---|
10:00 - 11:00 | Self-learning lessons discussion | All |
11:00 - 11:30 | Design formulas | Noor |
11:30 - 12:00 | Hypothesis testing and multiple test correction | Meeta |
I. Please study the contents and work through all the code within the following lessons:
- Description of steps for DESeq2
The R code required to perform differential gene expression analysis is actually quite simple. Running the `DESeq()` function will carry out the various steps involved. It is important that you have some knowledge of what is happening under the hood, to be able to fully understand and interpret the results.
In this lesson you will:
- Examine size factors and learn about sources that cause observed variation in values
- Explore the gene-wise dispersion estimates as they relate back to the mean-variance relationship
- Critically evaluate a dispersion plot
-
We have run the analysis, and now it's time to explore the results!
In this lesson you will:
- Learn how to extract results for specific group comparisons
- Explore the information presented in the results table (different statistics and their importance)
- Understand the different levels of filtering that are applied in DESeq2 by default (and why they are important)
- Summarizing results and extracting significant gene lists
Once you have your results, it is useful to summarize the information. Here, we get a snapshot of the number of differentially expressed genes that are identified from the different comparisons.
-
A picture is worth a thousand words. In our case, a figure is worth a thousand (or 30 thousand) data points. When working with large-scale data, it can be helpful to visualize results and get a big-picture perspective of your findings.
In this lesson you will:
- Explore different plots for data visualization
- Create a volcano plot to evaluate the relationship between different statistics from the results table
- Create a heatmap for visualization of differentially expressed genes
II. Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Form the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
Time | Topic | Instructor |
---|---|---|
10:00 - 11:15 | Self-learning lessons discussion | All |
11:15 - 12:00 | Likelihood Ratio Test results | Meeta |
- Please study the contents and work through all the code within the following lessons:
- Time course analysis
Sometimes we are interested in how a gene changes over time. The Likelihood Ratio Test (LRT) is particularly well-suited for this task.
This lesson will cover:
- Designing an LRT for a time-course analysis in DESeq2
- Identifying patterns in our list of differentially expressed genes
- Gene annotation
Next-generation sequencing analyses rely on annotations to describe genes, transcripts and/or proteins. These annotations are often stored in publicly available databases.
This lesson will cover:
- Describing the various annotation databases
- Accessing annotations from one of these databases using R
- Functional analysis - over-representation analysis
Oftentimes after completing an RNA-seq experiment, you will be left with a list of differentially expressed transcripts. You may be interested in knowing if these transcripts are enriched in certain biologically-relevant contexts.
This lesson will cover:
- Describing how functional enrichment tools yield statistically enriched functional categories or interactions
- Identifying enriched Gene Ontology terms using the R package, clusterProfiler
- Functional analysis - functional class scoring / GSEA
While some functional analyses focus on large changes in a select few genes, functional class scoring (FCS) focuses on weaker but coordinated changes in sets of functionally related genes (i.e., pathways) that can also have significant effects.
This lesson will cover:
- Designing a GSEA analysis using GO and/or KEGG gene sets
- Evaluating the results of a GSEA analysis
- Discussing other tools and resources for identifying genes of novel pathways or networks
- There is no assignment submission, but please use this Google form to ask us questions!
- If you get stuck due to an error while running code in the lesson, email us.
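
As a rough illustration of the over-representation analysis lesson listed above, here is a sketch of the one-sided hypergeometric test that such tools commonly rely on. It is plain Python rather than clusterProfiler code, and the function name `ora_p_value` and all the gene counts are hypothetical values chosen only for this example.

```python
from math import comb


def ora_p_value(n_background, n_in_set, n_significant, n_overlap):
    """Probability of seeing at least n_overlap gene-set members among the
    significant genes if those genes were drawn at random from the background.

    n_background:  number of genes tested (the universe)
    n_in_set:      number of background genes annotated to the gene set
    n_significant: number of significant (differentially expressed) genes
    n_overlap:     number of significant genes annotated to the gene set
    """
    total = comb(n_background, n_significant)
    upper = min(n_in_set, n_significant)
    # Sum the hypergeometric probabilities for every overlap >= n_overlap.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 100 of them in the gene set,
# 500 significant genes, 10 of which fall in the gene set.
print(ora_p_value(20000, 100, 500, 10))
```

In practice one p-value is computed per gene set, so the resulting p-values still need a multiple test correction before any category is called enriched.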
Time | Topic | Instructor |
---|---|---|
10:00 - 11:00 | Questions about self-learning lessons | All |
11:00 - 11:15 | Summarizing workflow | Noor |
11:15 - 11:45 | Discussion, Q & A | All |
11:45 - 12:00 | Wrap Up | Meeta |
We have covered the inner workings of DESeq2 in enough detail that, when using this package, you should have a good understanding of what is going on under the hood. For more information on the topics covered, we encourage you to take a look at the following resources:
- DESeq2 vignette
- GitHub book on RNA-seq gene level analysis
- Bioconductor support site (posts tagged with `deseq2`)
- Enrichment analysis book