6 — Enrichment Analysis

Time estimated: 3 d; time taken: 2 d; date started: 2020-02-18; date completed: 2020-02-19

Process

Copy all the HUGO gene symbols into the g:Profiler query input box.
Select the parameters shown in the image and click Run query.

If g:Profiler reports ambiguous identifiers, resolve each one by manually selecting the correct annotation for that gene, then re-run the query.
Under the Results tab, g:Profiler draws a Manhattan-style plot (similar to a scatter plot): the x-axis lists the terms, grouped and colour-coded by data source, and the y-axis shows the adjusted enrichment p-values on a -log10 scale.
Under the Detailed Results tab, we can see the top-ranked terms for each data source.
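The first step assumes a clean list of symbols. Below is a minimal Python sketch for tidying a pasted list before it goes into the query box, assuming the symbols arrive separated by whitespace or commas; the function name `prepare_query` is hypothetical. It drops stray separators and duplicates while keeping the original order, which only matters if the list is ranked.

```python
def prepare_query(raw_text: str) -> str:
    """Tidy a pasted list of gene symbols for the g:Profiler query box.

    Hypothetical helper: splits on commas and any whitespace, drops empty
    tokens and duplicates (keeping the first occurrence, so order is
    preserved), and returns one symbol per line.
    """
    seen = set()
    symbols = []
    for token in raw_text.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            symbols.append(token)
    return "\n".join(symbols)


# Hypothetical input with a stray comma, extra spaces and a duplicate.
raw = "CXCL9, CXCL10\nTFEC  CXCL9\nGZMB"
print(prepare_query(raw))
```

Case is deliberately left untouched, since a few official symbols contain lowercase letters (for example the "orf" in open-reading-frame names).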

What is the top term returned in each data source?
As shown in the image below, the top term in each data source is:
GO:0006955 (immune response) for GO biological process, REAC:R-HSA-168256 (Immune System) for Reactome, and WP:WP2328 (Allograft Rejection) for WikiPathways.

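How many genes are in each of the gene sets returned above?
T is the gene set (term) size; the T values for each gene set are listed in the table below. The other columns are Q, the number of query genes annotated in that data source; TnQ, the overlap between the query and the gene set; and U, the number of annotated genes (domain size) in that data source.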

Term ID	T (term size)	Q (query size)	TnQ (overlap)	U (domain size)
GO:0006955	2286	427	283	17847
REAC:R-HSA-168256	2146	327	221	10565
WP:WP2328	88	259	32	6507
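To make the four columns concrete, here is a small Python sketch that derives them for a single term from plain sets. It assumes the usual g:Profiler meanings (T is the term size, Q the query genes annotated in the data source, TnQ their overlap with the term, U the domain size); the helper `overlap_counts` and the gene sets are hypothetical toy data, not part of g:Profiler or this analysis.

```python
from dataclasses import dataclass


@dataclass
class OverlapCounts:
    T: int    # term size: genes annotated to the term
    Q: int    # query genes that have any annotation in the data source
    TnQ: int  # query genes annotated to the term (the overlap)
    U: int    # domain size: all genes annotated in the data source


def overlap_counts(query, term_genes, domain):
    """Compute T, Q, TnQ and U for one term (illustrative sketch)."""
    domain = set(domain)
    term = set(term_genes) & domain
    query_in_domain = set(query) & domain  # unannotated query genes are ignored
    return OverlapCounts(
        T=len(term),
        Q=len(query_in_domain),
        TnQ=len(term & query_in_domain),
        U=len(domain),
    )


# Hypothetical toy data for illustration only.
domain = {"CXCL9", "CXCL10", "GZMB", "TFEC", "COL1A1", "MKI67"}
term = {"CXCL9", "CXCL10", "GZMB"}
query = ["CXCL9", "GZMB", "TFEC", "NOT_A_GENE"]
print(overlap_counts(query, term, domain))
# prints OverlapCounts(T=3, Q=3, TnQ=2, U=6)
```

This also shows why Q differs between rows of the table: each data source annotates a different subset of the query genes.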

How many genes from our query are found in the gene sets above?
TnQ is the number of genes shared between our query list and the gene set. From the table above: 283 for GO:0006955, 221 for REAC:R-HSA-168256 and 32 for WP:WP2328.
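g:Profiler scores each term with a cumulative hypergeometric test on these four numbers. The sketch below computes the unadjusted upper-tail p-value exactly with Python integers, plus the fold enrichment (observed TnQ divided by the overlap expected by chance, Q*T/U), for the three rows in the table above. It is an illustration only: g:Profiler reports multiple-testing-adjusted p-values (g:SCS by default), so these raw values will not match the web output. The function names are hypothetical.

```python
import math


def hypergeom_upper_tail(U: int, T: int, Q: int, k: int) -> tuple[int, int]:
    """Return P(X >= k) for X ~ Hypergeometric(U, T, Q) as an exact fraction.

    U: domain size, T: term size, Q: query size, k: observed overlap (TnQ).
    Returned as (numerator, denominator) integers so that very small
    p-values do not underflow to 0.0.
    """
    upper = min(T, Q)
    # math.comb(n, r) is 0 when r > n, so impossible overlaps contribute nothing.
    numerator = sum(math.comb(T, i) * math.comb(U - T, Q - i) for i in range(k, upper + 1))
    denominator = math.comb(U, Q)
    return numerator, denominator


def neg_log10_p(U: int, T: int, Q: int, k: int) -> float:
    """-log10 of the raw p-value, computed from the exact integers."""
    num, den = hypergeom_upper_tail(U, T, Q, k)
    if num == 0:
        return math.inf
    # math.log10 accepts arbitrarily large Python ints.
    return math.log10(den) - math.log10(num)


def fold_enrichment(U: int, T: int, Q: int, k: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    return k / (Q * T / U)


# Values from the table above: (T, Q, TnQ, U).
rows = {
    "GO:0006955": (2286, 427, 283, 17847),
    "REAC:R-HSA-168256": (2146, 327, 221, 10565),
    "WP:WP2328": (88, 259, 32, 6507),
}
for term_id, (T, Q, k, U) in rows.items():
    print(f"{term_id}: fold={fold_enrichment(U, T, Q, k):.2f}, "
          f"-log10(p)={neg_log10_p(U, T, Q, k):.1f}")
```

For GO:0006955, for example, only about 55 overlapping genes would be expected by chance (427 * 2286 / 17847), against 283 observed.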

Change the g:Profiler settings to limit the size of the returned gene sets. Make sure the returned gene sets are between 5 and 200 genes in size. Did that change the results?
Yes. After restricting the term size to 5-200 genes, the top term in each data source changes, as shown in the image below.
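The term-size setting acts as a filter on T. A minimal sketch of that filter applied to the three terms in the table above, assuming the bounds are inclusive: both broad terms (over 2,000 genes each) fall outside 5-200, while WP:WP2328 (88 genes) passes. It only reproduces the size cut; it does not recompute g:Profiler's significance or ranking, and `filter_by_term_size` is a hypothetical name.

```python
# (term ID, T = term size) pairs from the table above.
results = [
    ("GO:0006955", 2286),
    ("REAC:R-HSA-168256", 2146),
    ("WP:WP2328", 88),
]


def filter_by_term_size(rows, min_size=5, max_size=200):
    """Keep only terms whose size T lies in [min_size, max_size] (inclusive, assumed)."""
    return [(term_id, size) for term_id, size in rows if min_size <= size <= max_size]


print(filter_by_term_size(results))
```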

Which of the 4 ovarian cancer expression subtypes do you think this list represents?
Gene expression subtypes: mesenchymal, immunoreactive, proliferative, and differentiated
The top results in each data source include terms such as "immune response", "Immune System" and "Allograft Rejection" (the immunologic destruction of transplanted tissues or organs). The gene list therefore most likely represents the immunoreactive expression subtype.

Bonus!
With the term size kept at 5-200, none of the returned terms contain the TFEC gene. When the term size is switched back to the default (2-10000), all of the terms associated with TFEC come from the GO biological process (GO:BP) data source: GO:0051716 (cellular response to stimulus), GO:0006950 (response to stress) and GO:0050896 (response to stimulus). There were no associations with the Reactome or WikiPathways data sources.
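Checking which returned terms include a given gene such as TFEC amounts to scanning each term's overlapping genes and grouping the hits by data source. A sketch, assuming the returned terms have been collected into a mapping from term ID to the query genes each term contains; the helper `terms_containing` and the example mapping are hypothetical and only mirror the pattern described above.

```python
from collections import defaultdict


def terms_containing(gene, term_genes):
    """Group the IDs of terms that include `gene` by data-source prefix.

    term_genes maps a term ID such as "GO:0050896" or "REAC:R-HSA-168256"
    to the set of query genes intersecting that term.
    """
    by_source = defaultdict(list)
    for term_id, genes in term_genes.items():
        if gene in genes:
            source = term_id.split(":", 1)[0]  # "GO", "REAC", "WP", ...
            by_source[source].append(term_id)
    return dict(by_source)


# Hypothetical intersections, for illustration only.
example = {
    "GO:0050896": {"TFEC", "CXCL9"},
    "GO:0006950": {"TFEC", "GZMB"},
    "REAC:R-HSA-168256": {"CXCL9", "GZMB"},
    "WP:WP2328": {"GZMB"},
}
print(terms_containing("TFEC", example))
# prints {'GO': ['GO:0050896', 'GO:0006950']}
```

A source missing from the returned dictionary means no returned term from that source contains the gene.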

External Resources

g:Profiler documentation. (n.d.). Retrieved from https://biit.cs.ut.ee/gprofiler/page/docs
g:Profiler Tutorial. (n.d.). Retrieved from https://enrichmentmap.readthedocs.io/en/docs-2.2/Tutorial_GProfiler.html
g:Profiler archive (r1270_e75_eg22) help page. (n.d.). Retrieved from https://biit.cs.ut.ee/gprofiler_archive/r1270_e75_eg22/web/help.cgi?help_id=4
Over-representation analysis (ORA) practical lab: g:Profiler. (n.d.). Retrieved from https://bioinformaticsdotca.github.io/Pathways_2019_Module2_Lab-GProfiler

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.
