Workshop Schedule

Pre-reading

  1. Workflow (raw data to counts)
  2. Experimental design considerations

Day 1

| Time | Topic | Instructor |
|---|---|---|
| 10:00 - 10:30 | Workshop Introduction | Meeta |
| 10:30 - 11:00 | RNA-seq pre-reading discussion | All |
| 11:00 - 11:45 | Intro to DGE / setting up DGE analysis | Noor |
| 11:45 - 12:00 | Overview of self-learning materials and homework submission | Meeta |

Before the next class:

  1. Please study the contents and work through all the code within the following lessons:
  • RNA-seq counts distribution

    Starting with the count matrix, we want to explore some characteristics of the RNA-seq data and evaluate the appropriate model to use.

    This lesson will cover:
    - Describing characteristics of the RNA-seq count data
    - Understanding different statistical methods to model the count data
    - Explaining the benefits of biological replicates

  • Count normalization

    Count normalization is an important data pre-processing step before differential expression analysis.

    This lesson will cover:
    - Describing "uninteresting factors" to consider during normalization
    - Understanding different normalization methods and their corresponding use cases
    - Generating a matrix of normalized counts using DESeq2's median of ratios method

  • Sample-level QC (PCA and hierarchical clustering)

    Next, we want to check the quality of the count data by assessing how similar the samples are to each other and identifying any potential outliers.

    This lesson will cover:
    - Understanding the importance of similarity analysis between samples
    - Describing Principal Component Analysis (PCA) and interpreting PCA plots from RNA-seq data
    - Performing hierarchical clustering and plotting correlation metrics

  2. Complete the exercises:
    • Each lesson above contains exercises; please go through each of them.
    • Copy your solutions into the Google Form the day before the next class.
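The median of ratios method from the Count normalization lesson boils down to three steps: build a pseudo-reference sample from each gene's geometric mean across samples, divide every count by that reference, and take the median ratio in each sample as that sample's size factor. The sketch below illustrates those steps in Python rather than R; `size_factors` and `normalize` are hypothetical helper names and the counts are made up. In the lessons themselves, DESeq2's `estimateSizeFactors()` and `counts(dds, normalized = TRUE)` do this work.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 code).

    counts: list of genes (rows), each a list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample have a geometric mean of zero,
    # so they are excluded from size factor estimation.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    # Step 1: per-gene geometric mean across samples (the pseudo-reference
    # sample), kept on the log scale.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Step 2: ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # Step 3: the size factor is the median of these ratios
        # (taken on the log scale, then exponentiated).
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Toy matrix (made-up numbers): sample 2 was sequenced twice as deeply.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(counts)
normalized = normalize(counts, factors)
```

With these toy counts the two size factors differ by a factor of two, so the first three genes end up with identical normalized values in both samples, while the gene with a zero count is still normalized but did not contribute to the size factors.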

Questions?

  • If you get stuck due to an error while running code in the lesson, email us.

Day 2

| Time | Topic | Instructor |
|---|---|---|
| 10:00 - 11:00 | Self-learning lessons discussion | All |
| 11:00 - 11:30 | Design formulas | Noor |
| 11:30 - 12:00 | Hypothesis testing and multiple test correction | Meeta |
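For the multiple test correction session, here is a sketch of the Benjamini-Hochberg (BH) procedure, which DESeq2 applies by default to produce the `padj` column of its results table. It is written in Python for illustration and follows the same logic as R's `p.adjust(p, method = "BH")`; `bh_adjust` is a hypothetical name and the p-values in the example are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so the adjusted values can be
    # kept monotone with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
padj = bh_adjust(raw)
```

Here the adjusted values come out as 0.02, 0.04, 0.04 and 0.02 (up to floating-point rounding): each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values in the same order as the raw ones.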

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Description of steps for DESeq2

    The R code required to perform differential gene expression analysis is actually quite simple: running the `DESeq()` function carries out all of the steps involved. However, to fully understand and interpret the results, it is important to have some knowledge of what is happening under the hood.

    In this lesson you will:
    - Examine size factors and learn about the sources of variation in the observed count values
    - Explore the gene-wise dispersion estimates as they relate back to the mean-variance relationship
    - Critically evaluate a dispersion plot

  2. Wald test results

    We have run the analysis, and now it's time to explore the results!

    In this lesson you will:
    - Learn how to extract results for specific group comparisons
    - Explore the information presented in the results table (different statistics and their importance)
    - Understand the different levels of filtering that are applied in DESeq2 by default (and why they are important)

  3. Summarizing results and extracting significant gene lists

    Once you have your results, it is useful to summarize the information. Here, we get a snapshot of the number of differentially expressed genes that are identified from the different comparisons.

  4. Visualization

    A picture is worth a thousand words. In our case, a figure is worth a thousand (or 30 thousand) data points. When working with large-scale data, it can be helpful to visualize results to get a big-picture perspective of your findings.

    In this lesson you will:
    - Explore different plots for data visualization
    - Create a volcano plot to evaluate the relationship between different statistics from the results table
    - Create a heatmap for visualization of differentially expressed genes

II. Complete the exercises:

  • Each lesson above contains exercises; please go through each of them.
  • Copy your solutions into the Google Form the day before the next class.

Questions?

  • If you get stuck due to an error while running code in the lesson, email us.

Day 3

| Time | Topic | Instructor |
|---|---|---|
| 10:00 - 11:15 | Self-learning lessons discussion | All |
| 11:15 - 12:00 | Likelihood Ratio Test results | Meeta |
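The Likelihood Ratio Test compares a full model with a reduced model that drops the terms being tested. The statistic is twice the difference in log-likelihoods, and it is compared against a chi-squared distribution whose degrees of freedom equal the number of parameters removed. The Python sketch below shows only that final step; the log-likelihood values are made up, and `chi2_sf` and `lrt` are hypothetical helpers. In DESeq2 the test is requested with `DESeq(dds, test = "LRT", reduced = ...)`, which fits both models for every gene.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus a finite sum of half-integer gamma terms.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lrt(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and its chi-squared p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Made-up log-likelihoods: the full model has 2 more parameters than the reduced one.
stat, p = lrt(-100.0, -104.0, df=2)
```

For these made-up values the statistic is 8.0 and, with 2 degrees of freedom, the p-value is e^-4 (about 0.018). The closed-form series only covers integer degrees of freedom, which is the case for nested linear models.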

Before the next class:

  1. Please study the contents and work through all the code within the following lessons:

    • Time course analysis
      Sometimes we are interested in how gene expression changes over time. The Likelihood Ratio Test (LRT) is particularly well-suited for this task.

      This lesson will cover:
      - Designing an LRT for a time-course analysis in DESeq2
      - Identifying patterns in our list of differentially expressed genes

    • Gene annotation
      Next-generation sequencing analyses rely on annotations to describe genes, transcripts and/or proteins. These annotations are often stored in publicly available databases.

      This lesson will cover:
      - Describing the various annotation databases
      - Accessing annotations from one of these databases using R

    • Functional analysis - over-representation analysis
      Often, after completing an RNA-seq experiment, you will be left with a list of differentially expressed transcripts. You may be interested in knowing whether these transcripts are enriched in certain biologically relevant contexts.

      This lesson will cover:
      - Describing how functional enrichment tools yield statistically enriched functional categories or interactions
      - Identifying enriched Gene Ontology terms using the R package clusterProfiler

    • Functional analysis - functional class scoring / GSEA
      While some functional analyses focus on large changes in a select few genes, functional class scoring (FCS) focuses on weaker but coordinated changes in sets of functionally related genes (i.e., pathways) that can also have significant effects.

      This lesson will cover:
      - Designing a GSEA analysis using GO and/or KEGG gene sets
      - Evaluating the results of a GSEA analysis
      - Discussing other tools and resources for identifying genes of novel pathways or networks

  2. There is no assignment submission, but please use this Google form to ask us questions!
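Over-representation analysis tools such as clusterProfiler score each functional category with a hypergeometric test: given a background of N genes, K of which belong to a category, how surprising is it to find k or more category genes in a list of n significant genes? The Python sketch below computes that one-sided p-value directly; `ora_pvalue` is a hypothetical name and the gene counts are made up. Real tools also apply multiple test correction across all categories tested.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated to the category
    n: genes in the significant list
    k: significant genes annotated to the category
    """
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the observed one.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Made-up example: 20 background genes, 4 in the category,
# 5 significant genes, 3 of which fall in the category.
p = ora_pvalue(N=20, K=4, n=5, k=3)
```

Working with exact integer binomial coefficients and dividing once at the end avoids accumulating rounding error for small toy examples; for genome-scale backgrounds a statistics library is the more practical choice.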

Questions?

  • If you get stuck due to an error while running code in the lesson, email us.

Day 4

| Time | Topic | Instructor |
|---|---|---|
| 10:00 - 11:00 | Questions about self-learning lessons | All |
| 11:00 - 11:15 | Summarizing workflow | Noor |
| 11:15 - 11:45 | Discussion, Q & A | All |
| 11:45 - 12:00 | Wrap Up | Meeta |

Answer keys

Resources

We have covered the inner workings of DESeq2 in a fair amount of detail so that, when using this package, you have a good understanding of what is going on under the hood. For more information on the topics covered, we encourage you to take a look at the following resources:

Building on this workshop

Other helpful links