generated from DS4200-S23-Class/project
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
180 lines (172 loc) · 9.17 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8/>
<link rel="stylesheet" type="text/css" href="style.css">
<title>Gene Expression</title>
</head>
<body>
<h1>Gene Expression Visualization Tool</h1>
<!-- tabs -->
<div class="tab">
<button class="tablinks" onclick="openTab(event, 'tab1')">About</button>
<button class="tablinks" onclick="openTab(event, 'tab2')">Correlation Analysis</button>
<button class="tablinks" onclick="openTab(event, 'tab3')">Overexpression Analysis</button>
</div>
<!-- About tab -->
<div id="tab1" class="tabcontent">
<h2>Motivation</h2>
<p>
One use case for a visualization tool related to cancer data is to visualize gene expression data across normal and cancerous tissue.
The end-user would be a scientist aiming to identify genes that are upregulated in cancerous tissue for potential drug targeting.
A scientist may use this tool for multiple purposes. They may already have a gene in mind and want to observe the distribution of
that gene's expression levels across multiple normal tissues and cancer indications. This could be visualized with multiple boxplots
representing gene expression distributions for the gene of interest, with each boxplot representing a specific tissue or cancer indication.
This would allow them to analyze whether or not the gene may serve as a good target. For example, if the gene of interest is overexpressed in
a specific cancerous tissue and less expressed in normal tissue, it may serve as a good target for a therapeutic drug that would then be able
to target only tissue that overexpresses the gene of interest.
The scientist may also not have a specific gene in mind and could use the visualization tool for more exploratory purposes.
They may filter which genes they would like to see based on criteria they have and then select a cancer indication they are interested in.
The resulting visualization would show them across all genes that matched filter criteria, which genes are most overexpressed in that cancer type
compared to normal tissue. This could be represented with a line plot that plots the ratio of average cancer expression to normal expression on one
axis against the ratio of prevalence of cancer overexpression to normal expression. Thus, from the visualization, the scientist would be able to isolate
only genes with the highest average cancer overexpression as well as highest prevalence of cancer overexpression. This would give them a solid
foundation for further research into best gene targets to utilize for therapeutic drug synthesis against their cancer type.
</p>
<h2>Background</h2>
<p>
<h4>Data</h4>
This data, the TOIL dataset, is a project conducted by UCSC that recomputes expression levels across samples from both cancer and
normal tissues. The original studies (TCGA and TARGET for tumor tissue and GTEx for normal tissue) computed their expression values
through different processing pipelines, so this recomputing project was necessary to be able to compare values across studies.
There are multiple datasets within TOIL, but the primary ones we will use are "RSEM norm_count", which contains the actual gene expression
values, and "TCGA GTEX main categories" and "TCGA TARGET GTEX selected phenotypes" which contain metadata regarding each sample.
The biggest bias consideration for the data is ensuring the expression results were harmonized correctly between the different studies.
If this was done incorrectly, it would be useless to compare results between the different studies. However, UCSC Toil went to great
lengths to create a pipeline that ensured all samples were recomputed in a single, reliable pipeline (at a cost of $1.30 per sample).
Other than this, there are no other huge considerations from the data as all samples were collected by consenting patients, and the data
cannot be traced back to the identity of the actual individuals that gave samples.
There was not much cleaning or processing that had to be done since the data is already well organized. The one big thing we did was aggregate
the different datasets (expression levels and metadata of samples) so that they are all accessible together. This was done in Python using Pandas
DataFrames and concatenating the DataFrames together.
<br>
<a href="https://xenabrowser.net/datapages/?hub=https://toil.xenahubs.net:443">Raw Data</a>
<h4>Demo Video</h4>
<video src="Video/Studio_Project.mp4" width="500px" height="500px" controls>
<track src="Video/captions.vtt" label="English" kind="subtitles" srclang="en-us">
</video>
<h4>Report</h4>
<a href="https://drive.google.com/file/d/11sduwdwiet28WOtPbnwQ__xfUL0QKY07/view?usp=sharing">Final Report</a>
</p>
<h2>Acknowledgements</h2>
<ul>
<li>
<a href="https://xenabrowser.net/datapages/?hub=https://toil.xenahubs.net:443">Data Source</a>
</li>
<li>
<a href="https://ieeexplore.ieee.org/document/9325502">Related Publication - Analysis of Expression Data</a>
</li>
<li>
<a href="https://ieeexplore.ieee.org/document/7881694">Related Publication - Interactive Visual Analysis of mRNA</a>
</li>
<li>
<a href="https://medium.com/codex/an-interactive-scatter-plot-e5a01064b17">Related Publication - Interactive Visual Analysis of mRNA</a>
</li>
<li>
<a href="https://www.w3schools.com/howto/howto_js_tabs.asp">Tabs in HTML</a>
</li>
</ul>
</div>
<!-- Correlation tab -->
<div id="tab2" class="tabcontent">
<form id="correlationForm">
<label for="gene1">Gene 1: </label>
<select name="gene1" id="gene1">
<option value="TNF">TNF</option>
<option value="MS4A1">MS4A1</option>
<option value="EGFR">EGFR</option>
<option value="VEGFA">VEGFA</option>
<option value="ERBB2" selected>ERBB2</option>
<option value="CTLA4">CTLA4</option>
<option value="PDCD1">PDCD1</option>
<option value="IL6">IL6</option>
<option value="TNFSF11">TNFSF11</option>
<option value="PCSK9">PCSK9</option>
</select>
<br>
<label for="gene2">Gene 2:</label>
<select name="gene2" id="gene2" >
<option value="TNF">TNF</option>
<option value="MS4A1">MS4A1</option>
<option value="EGFR" selected>EGFR</option>
<option value="VEGFA">VEGFA</option>
<option value="ERBB2">ERBB2</option>
<option value="CTLA4">CTLA4</option>
<option value="PDCD1">PDCD1</option>
<option value="IL6">IL6</option>
<option value="TNFSF11">TNFSF11</option>
<option value="PCSK9">PCSK9</option>
</select>
<br>
<input type="submit">
</form>
<div id="scatter-wrapper">
<svg id="scatterplot"></svg>
<svg id="scatter-legend"></svg>
<div id="selected-point"></div>
<div id="highlighted-tissue"></div>
</div>
<p>
This visualization allows the user to select two genes
to see how correlated they are. When graphed, the visualization
will show both of the expression data on a scatterplot. In
the scatterplot, red points represent TCGA data, yellow is TARGET data,
and green is GTEX data, showing the different data sources.
When the user clicks on a point, it will show the sample ID it comes
from as well as exact expression counts for both genes.
</p>
</div>
<!-- Overexpression tab -->
<div id="tab3" class="tabcontent">
<form id="overexpressionForm">
<label for="gene">Gene: </label>
<select name="gene" id="gene">
<option value="TNF">TNF</option>
<option value="MS4A1">MS4A1</option>
<option value="EGFR">EGFR</option>
<option value="VEGFA">VEGFA</option>
<option value="ERBB2">ERBB2</option>
<option value="CTLA4">CTLA4</option>
<option value="PDCD1">PDCD1</option>
<option value="IL6">IL6</option>
<option value="TNFSF11" selected>TNFSF11</option>
<option value="PCSK9">PCSK9</option>
</select>
<br>
<label for="disease">Disease: </label>
<select name="disease" id="disease">
</select>
<br>
<input type="submit">
</form>
<div id="bar-wrapper">
<svg id="barplot"></svg>
<svg id="bar-legend"></svg>
<div id="tooltip"></div>
</div>
<p>
This visualization allows the user to select the gene they
would like to look at, as well as the disease they want to
look at expression for. The visualization will then show gene
expression counts for all normal tissue types (colored green)
and compare it to the expression of the tumor tissue, highlighted in red.
Hovering over each tissue types shows the exact maximum expression of
that tissue. Clicking on any tissue will also highlight the correlating
points in the scatterplot with the same tissue type in black.
</p>
</div>
<!-- Link to javascript file-->
<script src="js/d3.v6.1.1/d3.min.js"></script>
<script src = "script.js" type="text/javascript"></script>
</body>
</html>