
A Potpourri of Interesting Data Analysis Plots with Brief Explanations


CodeInTheSkies/Plots-Potpourri



This is a growing collection of plots from analysis projects I have worked on recently, each chosen because it is interesting in its own right.

For each plot set, I include brief context and explanations.

1. Expressions across cancers – one gene at a time

Log expressions of selected genes in cancer versus normal tissues

This plot shows log expression values (as box-and-whisker plots) of selected genes in cancer versus normal tissues. Each dot represents one individual: a healthy person in the normal-tissue groups or a cancer patient in the cancer groups. The idea behind this type of plot is to identify genes that are highly expressed across multiple cancers but minimally expressed in normal tissues. Such genes can then be considered as candidate drug targets to improve chemotherapy.

The two genes shown illustrate contrasting scenarios. MAGEA4 turns out to be a good choice (hence the green check mark), whereas MAGEA8 turns out to be a poor choice because it is expressed in many normal tissues (hence the cross mark).

One question that may remain is: why do we need to make sure the gene is not expressed in normal tissues? The drugs target these genes by suppressing or inhibiting their expression, thereby starving the cancer's progression. This is the mechanism of treatment. The criterion therefore keeps the damaging side effect of chemotherapy, in which normal tissues (or cells) are also killed, to a minimum.
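The selection idea described above can be sketched in a few lines. This is only an illustration, not the analysis code behind the plots: the function name `is_candidate`, the thresholds `high`, `low`, and `min_cancers`, the group names, and all expression values are made up for the example.

```python
from statistics import median

def is_candidate(cancer, normal, high=3.0, low=1.0, min_cancers=2):
    """Flag a gene as a candidate target (hypothetical rule, for illustration).

    cancer, normal: dicts mapping a group name to a list of log expression values.
    A gene qualifies if its median is at least `high` in `min_cancers` or more
    cancer groups and at most `low` in every normal-tissue group.
    """
    n_high = sum(1 for values in cancer.values() if median(values) >= high)
    normal_ok = all(median(values) <= low for values in normal.values())
    return n_high >= min_cancers and normal_ok

# Made-up log expression values; not taken from TCGA or GTEx.
cancer_groups = {
    "cancer_1": [4.1, 5.0, 3.8],
    "cancer_2": [3.5, 4.4, 6.0],
    "cancer_3": [0.2, 0.5, 0.1],
}
quiet_normals = {"normal_1": [0.1, 0.0, 0.3], "normal_2": [0.2, 0.4, 0.1]}
active_normals = {"normal_1": [3.9, 4.2, 4.0], "normal_2": [0.2, 0.4, 0.1]}

gene_a_ok = is_candidate(cancer_groups, quiet_normals)   # high in cancers, low in normals
gene_b_ok = is_candidate(cancer_groups, active_normals)  # also high in one normal tissue
print(gene_a_ok, gene_b_ok)
```

Using the median per group mirrors what the box plots show visually; any real cutoff would need to be chosen from the data rather than fixed in advance.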

Data source: The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) project.

Full trend overview heatmap

The plot above is a full overview heatmap: the gene expression values are averaged within each normal tissue or cancer type and then plotted as a heatmap. The row and column labels are not legible at this scale, but the purpose of the plot is to show the overall trend. Each colored cell represents the average expression of a given gene (row) in a given group (column). Before plotting, the average values are z-scored, that is, scaled to zero mean and unit variance.
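As a rough sketch of the averaging and z-scoring steps for a single gene, the snippet below uses only the Python standard library. It assumes the scaling is done per gene (across groups), which the description above does not state; the function names, group labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def group_means(samples):
    """Average one gene's expression within each group.

    samples: iterable of (group_name, expression_value) pairs.
    """
    by_group = defaultdict(list)
    for group, value in samples:
        by_group[group].append(value)
    return {group: mean(values) for group, values in by_group.items()}

def zscore(values_by_group):
    """Scale the group averages of one gene to zero mean and unit variance."""
    values = list(values_by_group.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A gene with identical averages everywhere carries no contrast.
        return {group: 0.0 for group in values_by_group}
    return {group: (v - mu) / sigma for group, v in values_by_group.items()}

# Made-up values for one gene across three groups.
samples = [
    ("cancer_1", 4.0), ("cancer_1", 6.0),
    ("normal_1", 1.0), ("normal_1", 1.0),
    ("normal_2", 0.0), ("normal_2", 2.0),
]
averages = group_means(samples)
scaled = zscore(averages)
print(averages)
print(scaled)
```

Each gene's scaled values would form one row of the heatmap, so colors compare a gene against its own average rather than against other genes.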

Further, the rows are hierarchically clustered so that genes with similar expression patterns appear together. Some interesting patches, as marked, can be considered good candidates for further study.

Below is a zoomed-in portion of a similar overview heatmap, where the gene names are visible and the trend is clearer. Notice the distinct group of genes that is quite red in the cancers and quite blue in the normal tissues.

Zoomed trend overview heatmap

2. Investigating AML data by mutations and disease state

The two plot sets below are box-and-whisker plots showing the expression of two genes of interest, cGAS and STING, in acute myelogenous leukemia (AML) patients. Each dot is a patient belonging to a specific AML subtype or disease state, and the groups have been compared using the Wilcoxon rank-sum test for statistical significance. The p-values are shown between chosen groups of interest.
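For readers unfamiliar with the test, here is a minimal standard-library sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is not the code used for the plots: the function name `rank_sum_test` and the sample values are hypothetical, and no tie or continuity correction is applied to the variance. In practice a library routine such as SciPy's `scipy.stats.ranksums` would normally be used instead.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Tied values receive the average of their ranks;
    the variance is not corrected for ties.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    values = [v for v, _ in pooled]
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Sum of ranks belonging to the first sample.
    w = sum(rank for rank, (_, label) in zip(ranks, pooled) if label == 0)
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Made-up expression values for two hypothetical patient groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
z_stat, p_value = rank_sum_test(group_a, group_b)
print(z_stat, p_value)
```

Because the test works on ranks rather than raw values, it makes no normality assumption, which suits expression data with skewed distributions and outliers.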

CGAS AML grouped by mutations

STING AML grouped by mutations
