
6 — Enrichment Analysis

narasi15 edited this page Feb 19, 2020 · 17 revisions

Time estimated: 3 d; time taken: 2 d; date started: 2020-02-18; date completed: 2020-02-19


Process

  • Copy all the HUGO symbols into the query input box.
  • Select the parameters shown in the image and click Run query.

  • If g:Profiler warns about ambiguous identifiers, resolve each one by manually selecting the correct annotation for that gene, then re-run the query.
  • Under the Results tab, g:Profiler draws a Manhattan plot. The x-axis shows the terms, grouped and colour-coded by the selected data sources. The y-axis shows their adjusted p-values on a -log10 scale.
  • Under the Detailed Results tab, the top-ranked terms from each data source are listed in a table.
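Before pasting, it can help to tidy the symbol list. The helper below is a hypothetical sketch and is not part of g:Profiler. It assumes the symbols arrive as text separated by whitespace or commas. It removes exact duplicates while keeping the original order, and it emits one symbol per line. Case is left untouched because some approved symbols contain lowercase letters (for example the "orf" in open reading frame symbols).

```python
from typing import List


def prepare_query(raw_text: str) -> str:
    """Clean a pasted list of gene symbols for the g:Profiler query box.

    Splits on whitespace and commas, drops exact duplicates (keeping the
    first occurrence) and returns one symbol per line.
    """
    tokens: List[str] = raw_text.replace(",", " ").split()
    seen = set()
    unique: List[str] = []
    for symbol in tokens:
        if symbol not in seen:
            seen.add(symbol)
            unique.append(symbol)
    return "\n".join(unique)


if __name__ == "__main__":
    # Hypothetical input: a few symbols with a duplicate and mixed separators.
    print(prepare_query("CD74, HLA-DRA\nTFEC  CD74\nCXCL9"))
```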

What is the top term returned in each data source?
As shown in the image below, the top terms for each data source are:
GO:0006955 (immune response), REAC:R-HSA-168256 (Immune System) and WP:WP2328 (Allograft Rejection)



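Picking the top term for each data source amounts to taking the smallest adjusted p-value within each source. The sketch below assumes the results are available as (source, term ID, adjusted p-value) records, for example transcribed from the Detailed Results tab. The `TermResult` type is hypothetical and is not a g:Profiler structure. When two terms tie, the one that appears first is kept.

```python
from typing import Dict, Iterable, NamedTuple


class TermResult(NamedTuple):
    source: str        # data source, e.g. "GO:BP", "REAC" or "WP"
    term_id: str       # e.g. "GO:0006955"
    adjusted_p: float  # adjusted p-value reported for the term


def top_term_per_source(results: Iterable[TermResult]) -> Dict[str, TermResult]:
    """Keep, for each data source, the term with the smallest adjusted p-value."""
    best: Dict[str, TermResult] = {}
    for result in results:
        current = best.get(result.source)
        # Strict "<" keeps the first-seen term when p-values tie.
        if current is None or result.adjusted_p < current.adjusted_p:
            best[result.source] = result
    return best
```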
How many genes are in each of the returned gene sets above?
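In g:Profiler's notation, T is the term size (the number of genes in the gene set). Q is the number of query genes annotated in that data source. TnQ is the overlap between the two. U is the domain size (all annotated genes considered for that source). The T values for each gene set are shown below.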

Term ID               T       Q      TnQ    U
GO:0006955            2286    427    283    17847
REAC:R-HSA-168256     2146    327    221    10565
WP:WP2328             88      259    32     6507
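These four numbers are the inputs to g:Profiler's over-representation test. That test is a cumulative hypergeometric test: it asks how likely an overlap of at least TnQ is when Q genes are drawn at random from U genes, T of which belong to the term. The sketch below computes that raw, one-sided p-value with exact integer arithmetic from the Python standard library (`math.comb`, Python 3.8+). It is only an illustration under those assumptions. It does not apply g:Profiler's multiple-testing correction (g:SCS by default), so its values will not match the adjusted p-values in the results.

```python
import math


def hypergeom_log10_pvalue(U: int, T: int, Q: int, k: int) -> float:
    """Return log10 P(X >= k) for X ~ Hypergeometric(U, T, Q).

    U: domain size, T: term size, Q: query size, k: observed overlap (TnQ).
    Exact big-integer sums avoid float underflow for very small p-values.
    """
    upper = min(T, Q)
    if k > upper:
        return float("-inf")  # an overlap this large is impossible
    tail = sum(
        math.comb(T, i) * math.comb(U - T, Q - i)
        for i in range(max(k, 0), upper + 1)
    )
    return math.log10(tail) - math.log10(math.comb(U, Q))


# Rows from the table above: (term ID, T, Q, TnQ, U).
ROWS = [
    ("GO:0006955", 2286, 427, 283, 17847),
    ("REAC:R-HSA-168256", 2146, 327, 221, 10565),
    ("WP:WP2328", 88, 259, 32, 6507),
]

if __name__ == "__main__":
    for term_id, T, Q, TnQ, U in ROWS:
        expected = Q * T / U  # overlap expected by chance alone
        score = -hypergeom_log10_pvalue(U, T, Q, TnQ)
        print(f"{term_id}: expected {expected:.1f}, observed {TnQ}, -log10(p) = {score:.1f}")
```

Comparing the expected overlap with the observed TnQ shows why these terms rank so highly. For example, the overlap expected by chance for GO:0006955 is about 55 genes, against 283 observed.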

How many genes from our query are found in the gene sets above?
TnQ is the number of genes shared between our query list and the gene set: 283 for GO:0006955, 221 for REAC:R-HSA-168256 and 32 for WP:WP2328 (see the TnQ column in the table above).
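The overlap itself is a set intersection. The sketch below uses hypothetical gene sets and also derives two ratios that g:Profiler reports alongside the overlap. Precision is TnQ / Q, the fraction of the query found in the term. Recall is TnQ / T, the fraction of the term covered by the query. One simplification: here Q is the full query set passed in, whereas g:Profiler counts only the query genes that are annotated in the data source.

```python
from typing import Dict, Set


def overlap_stats(query: Set[str], term: Set[str]) -> Dict[str, object]:
    """Overlap (TnQ), precision and recall for one query/term pair."""
    shared = query & term
    return {
        "TnQ": len(shared),
        "precision": len(shared) / len(query),  # TnQ / Q
        "recall": len(shared) / len(term),      # TnQ / T
        "genes": sorted(shared),
    }


if __name__ == "__main__":
    query = {"CD74", "HLA-DRA", "CXCL9", "TFEC"}  # hypothetical query genes
    term = {"CD74", "HLA-DRA", "CXCL10"}          # hypothetical term genes
    print(overlap_stats(query, term))
```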

Change the g:Profiler settings to limit the size of the returned gene sets, so that they contain between 5 and 200 genes. Did that change the results?
Yes. After restricting the term size, the top results change, as shown in the image below. GO:0006955 (T = 2286) and REAC:R-HSA-168256 (T = 2146) exceed the 200-gene limit and drop out. WP:WP2328 (T = 88) lies within the 5-200 range and can still be returned.



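The effect of the size limit on the three terms in the table can be checked directly. The sketch below filters terms by size and assumes the 5 and 200 bounds are inclusive. The `Term` type is hypothetical. The sketch filters results after the fact. Changing the setting in g:Profiler may also change which terms are tested, and therefore the multiple-testing correction, so adjusted p-values can shift as well.

```python
from typing import Iterable, List, NamedTuple


class Term(NamedTuple):
    term_id: str
    size: int  # T, the number of genes annotated to the term


def filter_by_size(terms: Iterable[Term], min_size: int = 5, max_size: int = 200) -> List[Term]:
    """Keep terms whose size lies within [min_size, max_size] (inclusive)."""
    return [t for t in terms if min_size <= t.size <= max_size]


# Term sizes (T) from the results table above.
TABLE = [
    Term("GO:0006955", 2286),
    Term("REAC:R-HSA-168256", 2146),
    Term("WP:WP2328", 88),
]

if __name__ == "__main__":
    print([t.term_id for t in filter_by_size(TABLE)])  # ['WP:WP2328']
```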
Which of the 4 ovarian cancer expression subtypes do you think this list represents?
Gene expression subtypes: mesenchymal, immunoreactive, proliferative, and differentiated
The top results under each data source include terms such as "immune response", "Immune System" and "Allograft Rejection" (the immunologic destruction of transplanted tissues or organs). The gene list therefore most likely represents the immunoreactive gene expression subtype.

Bonus!
With the term size limited to 5-200, none of the returned pathways contains the TFEC gene. When the term size is switched back to the default (2-10000), the terms associated with TFEC all come from the GO biological process data source: GO:0051716 (cellular response to stimulus), GO:0006950 (response to stress) and GO:0050896 (response to stimulus). No Reactome or WikiPathways terms contain TFEC.
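The TFEC check is a membership test over the returned terms, combined with the same size limits. The sketch below assumes a hypothetical mapping from term ID to the genes annotated to that term. It treats the size of each mapped set as the term size. The example mapping is invented for illustration and is not g:Profiler output.

```python
from typing import Dict, List, Set


def terms_containing(
    gene: str,
    term_genes: Dict[str, Set[str]],
    min_size: int = 2,
    max_size: int = 10000,
) -> List[str]:
    """Return sorted IDs of terms that include `gene` and fall within the size bounds."""
    return sorted(
        term_id
        for term_id, genes in term_genes.items()
        if gene in genes and min_size <= len(genes) <= max_size
    )


if __name__ == "__main__":
    # Hypothetical annotations: two large terms contain TFEC, one small term does not.
    annotations = {
        "TERM_A": {"TFEC"} | {f"GENE{i}" for i in range(299)},  # 300 genes
        "TERM_B": {"TFEC"} | {f"GENE{i}" for i in range(499)},  # 500 genes
        "TERM_C": {f"GENE{i}" for i in range(50)},              # 50 genes, no TFEC
    }
    print(terms_containing("TFEC", annotations))          # default 2-10000 bounds
    print(terms_containing("TFEC", annotations, 5, 200))  # limited to 5-200
```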

