-
Notifications
You must be signed in to change notification settings - Fork 2
/
session2a.Rmd
179 lines (121 loc) · 9.42 KB
/
session2a.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
---
title: "RNA-Seq Data Network Analysis - Session 2-A"
author: "Akshay Bhat"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output:
html_notebook:
toc: yes
toc_float: yes
css: stylesheets/styles.css
---
<img src="images/logo-sm.png" style="position:absolute;top:40px;right:10px;" width="200" />
# Part A:
# RNA-Seq Data Network Analysis
Cytoscape is an open source software platform for integrating, visualizing, and analysing measurement data in the context of networks.<br></br>
This protocol describes a network analysis workflow in Cytoscape for differentially expressed genes from an RNA-Seq experiment. Overall workflow:<br></br>
• Finding a set of differentially expressed genes. <br></br>
• Retrieving relevant networks from public databases. <br></br>
• Integration and visualization of experimental data. <br></br>
• Network functional enrichment analysis. <br></br>
• Exporting network visualizations. <br></br>
## Setup
Install the ([stringApp](http://apps.cytoscape.org/apps/stringapp)) from the Cytoscape App Store, or via **Apps → App Store → Show App Store.** <br></br>
## OR <br></br>
Just visit the **Cytoscape App store** and install/download it from there.
<br></br><br></br>
![](images/StringApp.png)
<br></br><br></br>
## Experimental Data
For this exercise, we will use a dataset comparing transcriptomic differences between bladder cancer and normal tissue. The study has been published by Radvanyi F et al., and we will get a summarized dataset with fold change and p-value from the **EBI Gene Expression Atlas**. Array-express ID is **E-MTAB-1940**.
* link to the publication and data : ([Here..!](https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-1940?query=E-MTAB-1940))
<br></br>
<br></br>
* Download the data: Transcriptomic analysis of bladder cancer reveals convergent molecular pathology. ([Here..!](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/BCLA-all.tsv)). First Select all the contents (by holding control + A or Command + A (Mac-users)) and Save the file by right-clicking the mouse button and using save-as option. <br></br>
* To open the tsv data file in Excel, first launch Excel and open a blank workbook. Next, go to **Data → Get External Data → Import Text File....** <br></br>
* In the import wizard, select **Delimited** and in the next step select Tab.
<br></br>
* In the third step, you can select the **Data Format** for every column. The file has 4 columns of data: **Gene ID, Gene Name, fold change and p-value**. **Make sure to change the format for the second column, Gene Name, to Text.** You will have to scroll to the right to see the second column.
<br></br>
* Click **Finish** to complete the import.
<br></br>
## Editing experimental data
We are going to define a set of up-regulated genes from the full dataset by filtering for fold change and p-value.
<br></br>
For this reason we will need to edit the raw downloaded file to obtain expression information for the features specific to bladder cancer.
Download the following file ([Here..!](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/E-MTAB-1940-query-results.tsv)) and open in Microsoft Excel.
<br></br>
• Select the row containing data value headers (row 4) and select **Data → Filter.**<br></br>
• In the drop-down for the fold change column, set a filter for fold change greater than 2. This should result in **263** genes.<br></br>
• Next, one would normally filter out non-significant changes by filtering on the p-value as well, for example setting p-value less than 0.05. But in this case, all genes with a fold change greater than 2 already meet that cutoff.<br></br>
• With the filter active, select and copy all entries in the **Gene Name** column.<br></br>
<br></br><br></br>
![](images/Exp_data.png)
<br></br><br></br>
## Retrieve Networks from STRING
To identify a relevant network, we will use the **STRING** database to find a network relevant to the list of up-regulated genes.
<br></br><br></br>
• Launch Cytoscape. In the **Network Search** bar at the top of the **Network Panel**, select **STRING protein query** from the drop-down, and paste in the list of 263 up-regulated genes.<br></br>
• Open the options panel and confirm you are searching **Homo sapiens** with a **Confidence cutoff of 0.40** and **0 Maximum additional interactors.**<br></br>
• Click the **search icon** to search. If any of the search terms are ambiguous, a **Resolve Ambiguous Terms** dialog will appear. Click **Import** to continue with the import using the default choices. The resulting network will load automatically, and should have around **173** nodes. <br></br>
<br></br>
![](images/String-Import.png)
<br></br><br></br>
## STRING Network Up-Regulated Genes
The resulting network contains up-regulated genes recognized by STRING, and interactions between them with a confidence score of 0.4 or greater.
<br></br><br></br>
![](images/String-Image1.png)
<br></br><br></br>
The networks consist of one large connected component, several smaller networks, and some unconnected nodes. We will use only the largest connected component for the rest of the tutorial.<br></br>
• To select the largest connected component, select **Select → Nodes → Largest subnetwork.**<br></br>
• Select **File → New Network → From Selected Nodes, All Edges.**<br></br>
<br></br><br></br>
![](images/String-Image2.png)
<br></br>
## Data Integration
Next we will import the RNA-Seq data and use them to create a visualization.<br></br>
• Load the downloaded **E-MTAB-1940-query-results.tsv** file under File menu by selecting **Import → Table from File.....** Alternatively, drag and drop the data file directly onto the Node Table.<br></br>
• In **Advanced Options**..., in the **Ignore Lines Starting With field**, enter #, to exclude the additional lines at the beginning of the data file.<br></br>
• Select the **query term** column as the **Key column for Network** and select the **Gene Name** column as the key column by clicking on the header and selecting the key symbol.<br></br>
• Click **OK** to import. Two new columns of data will be added to the **Node Table.**<br></br>
<br></br><br></br>
![](images/Import-data-int.png)
![](images/Import-Columns.png)
<br></br><br></br>
## Visualization
Next, we will create a visualization of the imported data on the network. For more detailed information on data visualization, see the Visualizing Data tutorial.<br></br>
• In the **Style** tab of the **Control Panel**, switch the style from **STRING** style to **default** in the drop-down at the top.<br></br>
• Change the default node **Shape** to **ellipse** and **check Lock node width and height.**<br></br>
• Set the default node **Size** to **50.**<br></br>
• Set the default node **Fill Color** to **light gray**.<br></br>
• Set the default **Border Width** to 2, and make the default **Border Paint** dark gray.<br></br>
<br></br><br></br>
![](images/Visualization-Styles.png)
</br><br></br><br></br>
• For node **Fill Color**, create a continuous mapping for 'NMIBC' vs 'normal' .foldChange.<br></br>
• Double-click the color mapping to open the **Continuous Mapping Editor** and click the **Current Palette**. Select the ColorBrewer **yellow-orange-red shades gradient**.<br></br>
• Finally, for node **Label**, set a passthrough mapping for display name.<br></br>
• Save your new visualization under **Copy Style...** in the **Options** menu of the **Style** interface, and name it de genes up.<br></br>
<br></br><br></br>
![](images/Style-options.png)
<br></br><br></br>
Apply the **Prefuse Force Directed** layout by clicking the **Apply Preferred Layout** button in the toolbar. The network will now look something like this:
<br></br><br></br>
![](images/Prefuse-force-layout.png)
<br></br><br></br>
<div class="exercise">
# Exercise
## a. STRING Enrichment
The STRING app has built-in enrichment analysis functionality, which includes enrichment for Gene Ontology, InterPro, KEGG Pathways, and PFAM.<br></br>
* Using the STRING tab of the Results Panel, click the **Functional Enrichment button**. Keep the default settings. What do you see.<br></br>
![](images/FunctionalEnrichmentButton.png)
<br></br><br></br>
* When the enrichment analysis is complete, a new tab titled **STRING** **Enrichment** will open in the **Table Panel**.<br></br>
* The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the **STRING Enrichment tab**. Filter the table to only show **GO Biological Process.**<br></br>
* At the top left of the STRING enrichment tab, click the filter icon `r icons::fontawesome("filter", style = "solid")` . Select **GO Biological Process** and check the **Remove redundant terms check-box**. Then click **OK.**<br></br>
* Next, add a split donut chart to the nodes representing the top terms by clicking on <br></br>
* Explore custom settings via in the top right of the STRING enrichment tab.<br></br>
### b. Repeat the whole experiment using "down-regulated" genes.
### c. Export your Networks
### d. **Save in any of the formats and be ready for publishing.**
</div>
### We will now go to the next session [session (2B)](session2b.nb.html) to understand how to perform Functional Enrichment.