Merge pull request #208 from aying2/main
aying2 EC Assignments
---
layout: post
title: "gganimate: Visualizing IGKC in tSNE Space with Non-linear Dimensionality Reduction on Varying Numbers of PCs"
author: Andrew Ying
jhed: aying2
categories: [ HW EC1 ]
image: homework/hwEC1/hwEC1_aying2.gif
featured: false
---

## What data types are you visualizing?

For the plots on the left side of the animation, I am visualizing the quantitative data of the X1 and X2 tSNE embedding values and the quantitative data of IGKC expression for each spot.

For the plots on the right side of the animation, I am visualizing the quantitative data of the standard deviation of each principal component and the ordinal data of the number assigned to each principal component.

## What data encodings are you using to visualize these data types?

For the plots on the left side of the animation, I am using the geometric primitive of points to represent each spot on the spatial gene expression slide. To encode the X1 embedding value, I am using the visual channel of position along the x axis. To encode the X2 embedding value, I am using the visual channel of position along the y axis. To encode the quantitative IGKC expression, I am using the visual channel of saturation, going from an unsaturated light grey to a saturated dark red.

For the plots on the right side of the animation, I am using the geometric primitive of points, connected by a line, to represent each principal component. To encode the ordinal number assigned to the principal component, I am using the visual channel of position along the x axis. To encode the quantitative standard deviation of the principal component, I am using the visual channel of position along the y axis.

I chose these visual channels based on the data type chart. Position has the best resolving time, so it was used for the X1 and X2 embedding values, the PC standard deviation, and the PC number. Saturation was chosen to encode IGKC expression because it has a moderate resolving time for quantitative data and, unlike area, would not cause points in the tSNE plot to overlap.

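As a rough illustration of how a quantitative value can be mapped onto a light grey to dark red saturation ramp, the sketch below rescales values to [0, 1] and indexes into a palette built with base R's `grDevices::colorRampPalette()`. The helper name `expression_to_colour` and the 100-step palette are assumptions for illustration only; ggplot2's `scale_colour_gradient()` interpolates its colours differently, so its exact colours may not match this sketch.

```{r}
# Hypothetical helper (not from the original post): map quantitative values
# onto a light grey -> dark red ramp, mimicking a saturation encoding.
expression_to_colour <- function(x, n_bins = 100) {
  # palette of n_bins colours, linearly interpolated in RGB
  ramp <- grDevices::colorRampPalette(c("lightgrey", "darkred"))(n_bins)
  rng <- range(x, na.rm = TRUE)
  # rescale to [0, 1]; a constant vector maps to the lowest colour
  scaled <- if (diff(rng) == 0) rep(0, length(x)) else (x - rng[1]) / diff(rng)
  # pick the nearest palette entry for each value
  ramp[1 + round(scaled * (n_bins - 1))]
}

# lowest value -> light grey, highest value -> dark red
cols <- expression_to_colour(c(0, 0.5, 2))
print(cols)
```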
## What type of data visualization is this? What about the data are you trying to make salient through this data visualization? What Gestalt principles have you applied towards achieving this goal, if any?

The plots on the left side of the animation are scatterplots. The plots on the right side of the animation are line plots with a point for each PC (scree plots of the PC standard deviations).

My explanatory data visualization aims to make salient the effect of the number of PCs used as input on the output of nonlinear dimensionality reduction with tSNE. It also makes salient the IGKC expression of each spot in tSNE space; the saturation channel used for IGKC expression helps viewers track groups of spots across frames of the animation. The visualization also makes salient the standard deviation of each principal component, so the viewer can gauge how the changes in the tSNE plot from using more PCs relate to the standard deviation of those PCs.

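One way to relate the number of PCs to how much of the data they capture is to convert the standard deviations returned by `prcomp()` into the cumulative proportion of variance explained, since each PC's variance is the square of its `sdev`. The sketch below uses R's built-in `USArrests` dataset as a stand-in for the spatial transcriptomics data, which are not reproduced here; the result should agree with the "Cumulative Proportion" row of `summary()` on the same `prcomp` object, up to that row's rounding.

```{r}
# Cumulative proportion of variance explained by the first k PCs,
# computed from prcomp()'s standard deviations (variance = sdev^2).
# USArrests is a stand-in dataset, not the data used in this post.
pcs_demo <- prcomp(USArrests, scale. = TRUE)
var_explained <- pcs_demo$sdev^2 / sum(pcs_demo$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))
```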
The Gestalt principles of proximity and similarity are applied because, in every frame, the tSNE plot and the standard deviation plot for the same number of PCs are placed side by side and share the same point marks and theme. The Gestalt principle of continuity is applied because the frames of the animation proceed in increasing order of the number of PCs used for both the tSNE plot and the standard deviation plot.

## Please share the code you used to reproduce this data visualization.
```{r}
data <-
  read.csv("genomic-data-visualization-2024/data/eevee.csv.gz",
           row.names = 1)
data[1:10, 1:10]
pos <- data[, 2:3]
gexp <- data[, 4:ncol(data)]
# from lesson 5
topgene <- names(sort(apply(gexp, 2, var), decreasing = TRUE)[1:1000])
gexpfilter <- gexp[, topgene]
# code taken from Dr. Fan's code-lesson-5.R
# gexpnorm <- log10(gexpfilter/rowSums(gexpfilter) * mean(rowSums(gexpfilter))+1)
# normalize each spot by its total counts, rescale by the mean total, log-transform
gexpnorm <- log10(gexp/rowSums(gexp) * mean(rowSums(gexp))+1)
pcs <- prcomp(gexpnorm)
plot(pcs$sdev, type = 'o')
library(ggplot2)
# IGKC expression of each spot in spatial coordinates
ggplot(data.frame(pos, gexpnorm)) +
  scale_colour_gradient(low = 'lightgrey', high = 'darkred') +
  geom_point(aes(x = aligned_x, y = aligned_y, color = IGKC)) +
  theme_minimal()
library(Rtsne)
# run tSNE on 2, 4, 8, ... PCs, with the last step using all PCs
nsteps = ceiling(log2(length(pcs$sdev)))
sdev_plts <- list()
tsne_plts <- list()
df_anim <- data.frame()
df_sdev <- data.frame()
for (i in 1:nsteps) {
  npcs = 2^i
  s <- ""
  if (npcs > length(pcs$sdev)) {
    npcs = length(pcs$sdev)
    s <- "all "
  }
  print(paste(i, npcs))
  sdev_df = data.frame(sdev = pcs$sdev[1:npcs])
  sdev_df$PC = as.numeric(rownames(sdev_df))
  sdev_plts[[i]] <- ggplot(sdev_df, aes(x = PC, y = sdev, group = 1)) +
    geom_point() + geom_line() +
    labs(title = sprintf('sdev vs. PC (%d PCs total)', npcs)) +
    theme_bw()
  set.seed(42)
  emb <- Rtsne(pcs$x[, 1:npcs])$Y
  df <- data.frame(emb, gexpnorm)
  # accumulate long-format data, tagged by npcs, for the animations
  df_anim <- rbind(df_anim, cbind(df, npcs = npcs))
  df_sdev <- rbind(df_sdev, cbind(sdev_df, npcs = npcs))
  tsne_plts[[i]] <-
    ggplot(df) + geom_point(aes(x = X1, y = X2, color = IGKC)) +
    scale_colour_gradient(low = 'lightgrey', high = 'darkred') +
    labs(title = sprintf('IGKC vs. X2 vs. X1 (tSNE on %s%d PCs)', s, npcs)) +
    theme_bw()
}
sdev_plts[[1]]
tsne_plts[[nsteps]]
library(gganimate)
library(magick)  # provides image_append() and image_write()
main_plt <- ggplot(df_anim) + geom_point(aes(x = X1, y = X2, color = IGKC)) +
  scale_colour_gradient(low = 'lightgrey', high = 'darkred') +
  labs(title = 'IGKC vs. X2 vs. X1 (tSNE on {closest_state} PCs)') +
  theme_bw()
main_anim <- main_plt +
  transition_states(npcs,
                    state_length = 2,
                    transition_length = 1) +
  ease_aes('sine-in-out')
sdev_plt <- ggplot(df_sdev, aes(x = PC, y = sdev, group = 1)) +
  geom_point() + geom_line() +
  labs(title = 'sdev vs. PC ({closest_state} PCs total)') +
  theme_bw()
sdev_anim <- sdev_plt +
  transition_states(npcs,
                    state_length = 2,
                    transition_length = 1) +
  ease_aes('sine-in-out')
# render both animations with the same (default) number of frames
main_gif <- animate(main_anim, renderer = magick_renderer())
sdev_gif <- animate(sdev_anim, renderer = magick_renderer())
# place the tSNE and sdev frames side by side, frame by frame
new_gif <- image_append(c(main_gif[1], sdev_gif[1]))
for (i in 2:length(main_gif)) {
  combined <- image_append(c(main_gif[i], sdev_gif[i]))
  new_gif <- c(new_gif, combined)
}
new_gif
image_write(new_gif, "aying2.gif")
```
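The loop in the code above runs tSNE on a doubling number of PCs (2, 4, 8, ...) and caps the last step at the total number of PCs, which takes `ceiling(log2(total))` steps. The hypothetical helper `pc_schedule()` below isolates that schedule so it can be checked without running tSNE; it is a sketch for illustration, not part of the original code.

```{r}
# Hypothetical helper: the PC counts visited by the animation loop.
# Doubles the count at each step and caps the final step at total_pcs.
pc_schedule <- function(total_pcs) {
  stopifnot(total_pcs >= 1)
  nsteps <- ceiling(log2(total_pcs))
  if (nsteps == 0) return(total_pcs)  # a single PC needs only one step
  pmin(2^seq_len(nsteps), total_pcs)
}

print(pc_schedule(100))  # 2 4 8 16 32 64 100
print(pc_schedule(64))   # 2 4 8 16 32 64
```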
---
layout: post
title: "Analyzing Connectivity of Spleen CODEX dataset using CRAWDAD on K-means Clusters"
author: Andrew Ying
jhed: aying2
categories: [ HW EC3 ]
image: homework/hwEC3/hwEC3_aying2.png
featured: false
---

## What data types are you visualizing?

For plot A, I am visualizing the spatial data of the x and y positions of each cell and the categorical data of the cluster each cell belongs to. I am using the geometric primitive of points to represent each cell. To encode the spatial x position, I am using the visual channel of position along the x axis. To encode the spatial y position, I am using the visual channel of position along the y axis. To encode the cluster each cell belongs to, I am using the visual channel of hue.

For plot B, I am visualizing the quantitative data of the X1 and X2 tSNE embedding values and the categorical data of the cluster each cell belongs to. I am using the geometric primitive of points to represent each cell. To encode the X1 embedding value, I am using the visual channel of position along the x axis. To encode the X2 embedding value, I am using the visual channel of position along the y axis. To encode the cluster each cell belongs to, I am using the visual channel of hue.

For plot C, I am visualizing the categorical data of the neighbor cluster using position along the y axis and the categorical data of the reference cluster using position along the x axis. I am visualizing the quantitative z-score using the visual channel of color hue, and the quantitative spatial scale using the visual channel of area (dot size).

For plots D and E, I am visualizing the quantitative z-score using position along the y axis and the quantitative spatial scale using position along the x axis.

I applied the Gestalt principles of similarity and proximity: plots D and E are both line plots and are placed next to each other, and plots A and B, which share the same cluster color scheme, are also adjacent.

## Please share the code you used to reproduce this data visualization.
```{r}
data <-
  read.csv("genomic-data-visualization-2024/data/codex_spleen_subset.csv.gz",
           row.names = 1)
data[1:10, 1:10]
pos <- data[, 1:2]
area <- data[, 3]
pexp <- data[, 4:ncol(data)]
# normalize protein expression by cell area, rescale by mean area, log-transform
pexpnorm <- log10(pexp/area * mean(area)+1)
library(ggplot2)
ggplot(data.frame(pos, area)) + geom_point(aes(x = x, y = y, col = area)) +
  scale_color_gradient(high = "darkred", low = "gray")
library(Rtsne)
set.seed(42)
emb <- Rtsne(pexpnorm, perplexity = 15)$Y
set.seed(42)
# elbow plot: total within-cluster sum of squares for k = 1..15
tw <- sapply(1:15, function(i) {
  print(i)
  kmeans(pexpnorm, centers = i, iter.max = 50)$tot.withinss
})
plot(tw, type = 'o')
set.seed(42)
com <- as.factor(kmeans(pexpnorm, centers = 7)$cluster)
p1 <- ggplot(data.frame(pos, pexpnorm, com)) +
  geom_point(aes(x = x, y = y, col = com), size = 1)
p2 <- ggplot(data.frame(emb, pexpnorm, com)) +
  geom_point(aes(x = X1, y = X2, col = com), size = 1)
## https://github.com/JEFworks-Lab/CRAWDAD/blob/main/docs/3_spleen.md
library(crawdad)
crawdad_df <- data.frame(x = pos[, 1], y = pos[, 2], com)
ncores <- 8
set.seed(42)
## convert to an sf object of cell positions labeled by cluster
## (named `cells` rather than `seq` to avoid masking base::seq)
cells <- crawdad:::toSF(pos = crawdad_df[, c("x", "y")],
                        celltypes = crawdad_df$com)
set.seed(42)
## generate shuffled background at each scale
shuffle.list <- crawdad::makeShuffledCells(cells,
                                           scales = seq(100, 1000, by = 100),
                                           perms = 3,
                                           ncores = ncores,
                                           seed = 1,
                                           verbose = TRUE)
set.seed(42)
## find trends, passing background as parameter
results <- crawdad::findTrends(cells,
                               dist = 50,
                               shuffle.list = shuffle.list,
                               ncores = ncores,
                               verbose = TRUE,
                               returnMeans = FALSE) # for error bars
set.seed(42)
## convert results to data.frame
dat <- crawdad::meltResultsList(results, withPerms = TRUE)
## multiple-test correction: one test per (reference, neighbor) pair
ntests <- length(unique(dat$reference)) * length(unique(dat$neighbor))
psig <- 0.05/ntests # bonferroni correction
zsig <- round(qnorm(psig/2, lower.tail = FALSE), 2)
p3 <- vizColocDotplot(dat, reorder = FALSE, zsig.thresh = zsig,
                      zscore.limit = zsig*2, dot.sizes = c(6, 20)) +
  theme(legend.position = 'right',
        axis.text.x = element_text(angle = 45, hjust = 0))
p3
library(tidyverse)
## z-score vs. scale for reference cluster 2 and neighbor cluster 6
dat_filter <- dat %>%
  filter(reference == '2') %>%
  filter(neighbor == '6')
p4 <- vizTrends(dat_filter, lines = TRUE, withPerms = TRUE, sig.thresh = zsig)
## z-score vs. scale for reference cluster 1 and neighbor cluster 2
dat_filter <- dat %>%
  filter(reference == '1') %>%
  filter(neighbor == '2')
p5 <- vizTrends(dat_filter, lines = TRUE, withPerms = TRUE, sig.thresh = zsig)
library(patchwork)
p1 + p2 + p3 + p4 + p5 + plot_annotation(tag_levels = 'A') + plot_layout(nrow = 2, ncol = 3)
```
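The significance threshold in the code above comes from a Bonferroni correction over every (reference, neighbor) cluster pair, converted to a two-sided z-score cutoff with `qnorm()`. The hypothetical helper `bonferroni_z()` below reproduces that arithmetic on its own. Assuming all 7 k-means clusters appear as references, there are 7 × 7 = 49 tests, so the per-test level is 0.05 / 49 ≈ 0.00102 and the cutoff is roughly 3.3.

```{r}
# Hypothetical helper reproducing the Bonferroni step from the code above:
# alpha is split across all (reference, neighbor) pairs, then converted to
# a two-sided z-score threshold.
bonferroni_z <- function(n_clusters, alpha = 0.05) {
  ntests <- n_clusters * n_clusters    # every reference/neighbor pair
  psig <- alpha / ntests               # per-test significance level
  qnorm(psig / 2, lower.tail = FALSE)  # two-sided z cutoff
}

zsig_demo <- round(bonferroni_z(7), 2)
print(zsig_demo)
```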