-
Notifications
You must be signed in to change notification settings - Fork 33
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #149 from acheng41/main
Andrea Cheng (acheng41) HW5 Submission
- Loading branch information
Showing
2 changed files
with
139 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
--- | ||
layout: post | ||
title: "Identifying KRT8 Expression in Spot-Based Spatial Transcriptomic Data Set" | ||
author: Andrea Cheng | ||
jhed: acheng41 | ||
categories: [ HW5 ] | ||
image: homework/hw5/hw5_acheng41.png | ||
featured: false | ||
--- | ||
|
||
|
||
In the pikachu Single Cell Data Set, I identified a cluster of cells that had high expression of the KRT8 gene. I hypothesized that these cells were epithelial cells. | ||
|
||
In my orginial investigation, I found that the KRT8 gene was almost exclusively upregulated in the cell cluster I identified. Thus, I used expression of this gene to identify a similar cluster of spots in the same general spatial location and pattern with similar upregulation of the KRT8 gene. Panels A and B show the spatial location of the spots of interest as well as the gene expression. Panels C and D show the lower dimension tSNE embedding. And finally, Panels E and F show the lower dimension PCA graph. | ||
|
||
I made slight modifications to my code to identify this cluster. In addition to adjusting which columns I extracted data from, I also used a different k for k-means clustering. I previously performed k-means clustering with k=5. Now I performed k-means clustering with k=4. This is because I found the optimal K based on total withinness to be 4. While the cluster of interest was usually isolated with many different values of k, I found that higher k values seemed to oversegment the other cells. I think there is a lower optimal number of transcriptionally distinct cell-clusters in the spot-based Eevee dataset compared to the single-cell resolution Pikachu dataset because many cells may be in a single spot. Thus, it may be more difficult to identify/distinguish between distinct cell types. | ||
|
||
|
||
|
||
```{r} | ||
#HW5 | ||
#Andrea Cheng (acheng41) | ||
data = read.csv("~/Desktop/GDV class/data/eevee.csv.gz", row.names =1) | ||
#import libraries | ||
library(Rtsne) | ||
library(ggplot2) | ||
library(patchwork) | ||
#design of visualization | ||
my_color = scale_color_viridis_c(option= "C") | ||
my_theme = theme( | ||
plot.title = element_text(hjust = 0.5, face="bold", size=10), | ||
text = element_text(size = 10) | ||
) | ||
data[1:5,1:10] | ||
pos = data[,2:3] | ||
head(pos) | ||
gexp = data[,4:ncol(data)] | ||
#normalize | ||
gexpnorm = log10(gexp/rowSums(gexp) * mean(rowSums(gexp)) + 1) | ||
#identify cell cluster | ||
df_plot = data.frame(pos, gene = gexpnorm[,'KRT8']) | ||
#plot on tissue | ||
gene_tissue = ggplot(df_plot) + | ||
geom_point(aes(x=aligned_x, y = aligned_y, | ||
col = gene)) + my_color + | ||
labs(col = "Expression of KRT8", | ||
title = "KRT8 - Spacial Visualization")+ my_theme | ||
#PCA | ||
pcs = prcomp(gexpnorm) | ||
plot(pcs$sdev[1:20]) | ||
#tsne | ||
emb = Rtsne(pcs$x[,1:20])$Y #faster w/ pcs which represent genes than tSNE on genes | ||
df_tsne = data.frame(emb) | ||
#check tsne plot | ||
ggplot(df_tsne) + geom_point(aes(x = X1, y = X2)) + theme_classic() | ||
#find optimal clustering coefficient k on tSNE | ||
#look at different tot.within-ness for different k | ||
results <- sapply(2:15, function(i){ | ||
com <- kmeans(emb, centers = i) | ||
return(com$tot.withinss) | ||
}) | ||
plot(results, type = 'l') | ||
#kmeans with k=4 | ||
com = kmeans(emb, centers = 4) | ||
#visualize clusters | ||
df_clusters = data.frame(pos, tsne = emb, pcs = pcs$x, kmeans = as.factor(com$cluster)) | ||
#Plot showing cluster of interest in tSNE | ||
kmeans_tsne = ggplot(df_clusters) + | ||
geom_point(aes(x=tsne.1,y = tsne.2, | ||
col = ifelse(kmeans == 3, "Cluster of Interest", "All Other Cells"))) + | ||
labs( | ||
col = "Clustering on tSNE", | ||
title = "Cluster of Interest - tSNE Visualization" | ||
) + my_theme | ||
#Plot showing cluster of interest in tissue | ||
kmeans_tissue = ggplot(df_clusters) + | ||
geom_point(aes(x=aligned_x,y = aligned_y, | ||
col = ifelse(kmeans == 3, "Cluster of Interest", "All Other Cells")))+ | ||
labs( | ||
col = "Clustering on tSNE", | ||
title = "Cluster of Interest - Spacial Visualization" | ||
)+ my_theme | ||
#Plot showing cluster of interest in PCA | ||
kmeans_pca = ggplot(df_clusters) + | ||
geom_point(aes(x=pcs.PC1,y = pcs.PC2, | ||
col = ifelse(kmeans == 3, "Cluster of Interest", "All Other Cells"))) + | ||
labs( | ||
col = "Clustering on tSNE", | ||
title = "Cluster of Interest - PCA Visualization" | ||
) + xlab("PC1") + ylab ("PC2") + my_theme | ||
#data frame for gene expression of KRT8 | ||
df_gene = data.frame(pos, tsne = emb, pcs = pcs$x, gene = gexpnorm$KRT8) | ||
#Plot showing gene expression in tSNE | ||
gene_tsne = ggplot(df_gene) + geom_point(aes(x = tsne.1, y=tsne.2, col = gene)) + | ||
my_color + labs( | ||
col = "Expression of KRT8", | ||
title = "KRT8 - tSNE Visualization" | ||
)+ my_theme | ||
#Plot showing gene expression in PCA | ||
gene_pca = ggplot(df_gene) + geom_point(aes(x=pcs.PC1,y = pcs.PC2, col = gene)) + | ||
my_color + xlab("PC1") + ylab ("PC2") + | ||
labs( | ||
col = "Expression of KRT8", | ||
title = "KRT8 - PCA Visualization" | ||
)+ my_theme | ||
#display plots | ||
kmeans_tissue + gene_tissue + kmeans_tsne + gene_tsne + kmeans_pca + gene_pca + plot_annotation(tag_levels = 'A')+ | ||
plot_layout(widths = c(5, 5, 5), | ||
guides = "collect", | ||
design = " | ||
12 | ||
34 | ||
56 | ||
") | ||
``` |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.