Here is a growing collection of plots from analysis projects I have worked on recently, chosen because they are interesting in their own right.
For each plot set, I include brief context and explanations.
This plot shows the log expression of selected genes in cancer versus normal tissues as box-and-whisker plots. Each dot represents either a healthy individual or a cancer patient, depending on the group shown. The idea behind this type of plot is to identify genes that are broadly and strongly expressed across multiple cancers but minimally expressed in normal tissues. Such genes can then be shortlisted as candidate drug targets to improve chemotherapy.
The two genes chosen illustrate contrasting scenarios: MAGEA4 turns out to be a good choice (hence the green check mark), whereas MAGEA8 is a poor choice because it is expressed in many normal tissues (hence the cross mark).
One final question that may be on your mind is, "Why do we need to make sure the gene is not expressed in normal tissues?" The requirement follows from the mechanism of treatment: the drugs target these genes, in the sense that they try to suppress or inhibit their expression and thereby stall cancer progression. This criterion therefore keeps to a minimum the damaging side effect of chemotherapy, in which normal tissues (or cells) are also killed.
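The selection idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the plot: the function name `is_candidate`, the thresholds `high`, `low`, and `min_cancers`, the gene names, and every value below are hypothetical toy inputs, not TCGA or GTEx data.

```python
from statistics import median


def is_candidate(cancer_groups, normal_groups, high=3.0, low=1.0, min_cancers=2):
    """Return True if a gene looks like a candidate target.

    cancer_groups / normal_groups map a group name to a list of log
    expression values. The thresholds are illustrative assumptions.
    """
    # Count cancer types whose median expression clears the "high" bar.
    n_high = sum(1 for values in cancer_groups.values() if median(values) >= high)
    # The most highly expressing normal tissue must stay under the "low" bar.
    max_normal = max(median(values) for values in normal_groups.values())
    return n_high >= min_cancers and max_normal <= low


# Toy data (hypothetical): GENE_A is quiet in normal tissues, GENE_B is not.
toy = {
    "GENE_A": {
        "cancer": {"lung": [4.1, 5.0, 3.8], "skin": [3.5, 4.2, 4.9], "bladder": [0.2, 0.5, 3.9]},
        "normal": {"liver": [0.1, 0.0, 0.3], "heart": [0.2, 0.1, 0.0]},
    },
    "GENE_B": {
        "cancer": {"lung": [4.0, 4.5, 5.1], "skin": [3.2, 3.9, 4.4]},
        "normal": {"liver": [2.5, 3.1, 2.8], "heart": [0.1, 0.2, 0.3]},
    },
}

candidates = [
    gene for gene, groups in toy.items()
    if is_candidate(groups["cancer"], groups["normal"])
]
print(candidates)
```

Using the maximum over normal tissues, rather than an average, is a deliberate choice in this sketch: a single normal tissue with high expression is enough to rule a gene out.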
Data source: The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) project.
The above is a full overview heatmap: gene expression values are averaged within each normal tissue or cancer type, and the averages are plotted as a heatmap. The row and column labels are not legible, but the purpose of this plot is to show the overall trend. Each colored cell in the heatmap represents the average expression of a given gene (row) in a given group (column). Before plotting, the average values are scaled (z-scored) to zero mean and unit variance.
Further, the rows are hierarchically clustered so that genes with similar expression patterns sit together and form visible patches. Some interesting patches, as marked, can be considered good candidates for further study.
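The scaling and clustering steps can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: I assume the z-scoring is applied per gene (row), and I use Euclidean distance with average linkage, neither of which is specified above. The function names and the toy matrix are hypothetical. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would typically do the clustering.

```python
from math import dist
from statistics import fmean, pstdev


def zscore_rows(matrix):
    """Scale each row to zero mean and unit variance (assumed per-gene scaling)."""
    scaled = []
    for row in matrix:
        mu, sigma = fmean(row), pstdev(row)
        # A constant row has zero variance; map it to zeros instead of dividing by 0.
        scaled.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in row])
    return scaled


def cluster_row_order(matrix):
    """Average-linkage agglomerative clustering; returns row indices in merge order."""
    clusters = [[i] for i in range(len(matrix))]

    def avg_distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of rows.
        return fmean(dist(matrix[i], matrix[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        _, p, q = min(
            (avg_distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters[0]


# Toy averages (hypothetical): columns are two normal groups, then two cancer groups.
averages = [
    [1.0, 1.0, 5.0, 5.0],  # gene 0: higher in cancers
    [5.0, 5.0, 1.0, 1.0],  # gene 1: higher in normals
    [2.0, 2.0, 8.0, 8.0],  # gene 2: higher in cancers
    [6.0, 6.0, 2.0, 2.0],  # gene 3: higher in normals
]

scaled = zscore_rows(averages)
order = cluster_row_order(scaled)
print(order)
```

After scaling, genes 0 and 2 share the same profile even though their raw magnitudes differ, so they end up next to each other in the row order. This is the effect that makes patches visible in the heatmap.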
Below is a zoomed-in portion of a similar overview heatmap, where the gene names are visible and the trend is clearer (notice the distinct group of genes that is quite red in the cancers and quite blue in the normals).
The two plot sets below are box-and-whisker plots showing the expression of two genes of interest, cGAS and STING, in acute myelogenous leukemia (AML) patients. Each dot is a patient belonging to a specific AML subtype or disease state, and the groups have been compared for statistical significance using the Wilcoxon rank-sum test. The p-values are shown between chosen groups of interest.
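For readers unfamiliar with the test, here is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is an illustration only, not the code used for these plots: the function name `rank_sum_test` and the sample values are hypothetical, and the sketch applies no tie or continuity correction. For real analyses, an established implementation such as `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu` is the safer choice.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p) where z is the standardized rank sum of sample x.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)

    # Assign 1-based ranks, giving tied values the average of their positions.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under the null
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))                 # two-sided normal tail probability
    return z, p


# Hypothetical expression values for two patient groups.
group_a = [1.2, 0.8, 1.5, 1.1, 0.9]
group_b = [2.4, 3.1, 2.8, 2.2, 3.5]

z, p = rank_sum_test(group_a, group_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the test works on ranks rather than raw values, it makes no normality assumption. With small groups, however, the normal approximation used here is rough, and an exact method is preferable.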