session2a.Rmd

---
title: "RNA-Seq Data Network Analysis - Session 2-A"
author: "Akshay Bhat"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output: 
  html_notebook: 
    toc: yes
    toc_float: yes
    css: stylesheets/styles.css
---
<img src="images/logo-sm.png" style="position:absolute;top:40px;right:10px;" width="200" />


# Part A:


# RNA-Seq Data Network Analysis
Cytoscape is an open source software platform for integrating, visualizing, and analysing measurement data in the context of networks.<br></br>
  
  This protocol describes a network analysis workflow in Cytoscape for differentially expressed genes from an RNA-Seq experiment. Overall workflow:<br></br>
  
  •	Finding a set of differentially expressed genes. <br></br>
  •	Retrieving relevant networks from public databases. <br></br>
  •	Integration and visualization of experimental data. <br></br>
  •	Network functional enrichment analysis. <br></br>
  •	Exporting network visualizations. <br></br>
  
  
## Setup
  
  Install the ([stringApp](http://apps.cytoscape.org/apps/stringapp)) from the Cytoscape App Store, or via **Apps → App Store → Show App Store.** <br></br>
  
## OR <br></br>
  
Just visit the **Cytoscape App store** and install/download it from there.
<br></br><br></br>
  ![](images/StringApp.png)
<br></br><br></br>
  
## Experimental Data
  For this exercise, we will use a dataset comparing transcriptomic differences between bladder cancer and normal tissue. The study has been published by Radvanyi F et al., and we will get a summarized dataset with fold change and p-value from the **EBI Gene Expression Atlas**. Array-express ID is **E-MTAB-1940**.  
  
* link to the publication and data : ([Here..!](https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-1940?query=E-MTAB-1940))
  
  
<br></br>
  <br></br>
  
* Download the data: Transcriptomic analysis of bladder cancer reveals convergent molecular pathology. ([Here..!](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/BCLA-all.tsv)).  First Select all the contents (by holding control + A or Command + A (Mac-users)) and Save the file by right-clicking the mouse button and using save-as option.  <br></br>
* To open the tsv data file in Excel, first launch Excel and open a blank workbook. Next, go to **Data → Get External Data → Import Text File....** <br></br>
* In the import wizard, select **Delimited** and in the next step select Tab.
<br></br>
* In the third step, you can select the **Data Format** for every column. The file has 4 columns of data: **Gene ID, Gene Name, fold change and p-value**. **Make sure to change the format for the second column, Gene Name, to Text.** You will have to scroll to the right to see the second column.
<br></br>
* Click **Finish** to complete the import.

<br></br>
  
## Editing experimental data
  We are going to define a set of up-regulated genes from the full dataset by filtering for fold change and p-value.
<br></br>
For this reason we will need to edit the raw downloaded file to obtain expression information for the features specific to bladder cancer. 

Download the following file ([Here..!](https://raw.githubusercontent.com/a1aks/Cytoscape_Course/main/Data_Files/E-MTAB-1940-query-results.tsv)) and open in Microsoft Excel. 

<br></br>
  •	Select the row containing data value headers (row 4) and select **Data → Filter.**<br></br>
  •	In the drop-down for the fold change column, set a filter for fold change greater than 2. This should result in **263** genes.<br></br>
  •	Next, one would normally filter out non-significant changes by filtering on the p-value as well, for example setting p-value less than 0.05. But in this case, all genes with a fold change greater than 2 already meet that cutoff.<br></br>
  •	With the filter active, select and copy all entries in the **Gene Name** column.<br></br>
  <br></br><br></br>
  ![](images/Exp_data.png)

<br></br><br></br>
  
## Retrieve Networks from STRING
  
  To identify a relevant network, we will use the **STRING** database to find a network relevant to the list of up-regulated genes.
<br></br><br></br>
  •	Launch Cytoscape. In the **Network Search** bar at the top of the **Network Panel**, select **STRING protein query** from the drop-down, and paste in the list of 263 up-regulated genes.<br></br>
  •	Open the options panel  and confirm you are searching **Homo sapiens** with a **Confidence cutoff of 0.40** and **0 Maximum additional interactors.**<br></br>
  •	Click the **search icon** to search. If any of the search terms are ambiguous, a **Resolve Ambiguous Terms** dialog will appear. Click **Import** to continue with the import using the default choices. The resulting network will load automatically, and should have around **173** nodes. <br></br>
  
  <br></br>
  
  ![](images/String-Import.png)
<br></br><br></br>
  
## STRING Network Up-Regulated Genes
  
  The resulting network contains up-regulated genes recognized by STRING, and interactions between them with a confidence score of 0.4 or greater.
<br></br><br></br>
  
  ![](images/String-Image1.png)


<br></br><br></br>
  
  The networks consist of one large connected component, several smaller networks, and some unconnected nodes. We will use only the largest connected component for the rest of the tutorial.<br></br>
  
  •	To select the largest connected component, select **Select → Nodes → Largest subnetwork.**<br></br>
  •	Select **File → New Network → From Selected Nodes, All Edges.**<br></br>
  <br></br><br></br>
  ![](images/String-Image2.png)

<br></br>

## Data Integration
  
  Next we will import the RNA-Seq data and use them to create a visualization.<br></br>
  
  •	Load the downloaded **E-MTAB-1940-query-results.tsv** file under File menu by selecting **Import → Table from File.....** Alternatively, drag and drop the data file directly onto the Node Table.<br></br>
  •	In **Advanced Options**..., in the **Ignore Lines Starting With field**, enter #, to exclude the additional lines at the beginning of the data file.<br></br>
•	Select the **query term** column as the **Key column for Network** and select the **Gene Name** column as the key column by clicking on the header and selecting the key symbol.<br></br>
  •	Click **OK** to import. Two new columns of data will be added to the **Node Table.**<br></br>
  
  <br></br><br></br>
  ![](images/Import-data-int.png)
![](images/Import-Columns.png)
<br></br><br></br>
  
  
## Visualization
  Next, we will create a visualization of the imported data on the network. For more detailed information on data visualization, see the Visualizing Data tutorial.<br></br>
  
  •	In the **Style** tab of the **Control Panel**, switch the style from **STRING** style to **default** in the drop-down at the top.<br></br>
  •	Change the default node **Shape** to **ellipse** and **check Lock node width and height.**<br></br>
  •	Set the default node **Size** to **50.**<br></br>
  •	Set the default node **Fill Color** to **light gray**.<br></br>
  •	Set the default **Border Width** to 2, and make the default **Border Paint** dark gray.<br></br>
  <br></br><br></br>
  ![](images/Visualization-Styles.png)

</br><br></br><br></br>
  •	For node **Fill Color**, create a continuous mapping for 'NMIBC' vs 'normal' .foldChange.<br></br>
  •	Double-click the color mapping to open the **Continuous Mapping Editor** and click the **Current Palette**. Select the ColorBrewer **yellow-orange-red shades gradient**.<br></br>
  •	Finally, for node **Label**, set a passthrough mapping for display name.<br></br>
  •	Save your new visualization under **Copy Style...** in the **Options** menu of the **Style** interface, and name it de genes up.<br></br>
  <br></br><br></br>
  ![](images/Style-options.png)
<br></br><br></br>
  
  
  Apply the **Prefuse Force Directed** layout by clicking the **Apply Preferred Layout** button in the toolbar. The network will now look something like this:
  <br></br><br></br>
  ![](images/Prefuse-force-layout.png)
<br></br><br></br>

<div class="exercise">
# Exercise

## a. STRING Enrichment

The STRING app has built-in enrichment analysis functionality, which includes enrichment for Gene Ontology, InterPro, KEGG Pathways, and PFAM.<br></br>

*	Using the STRING tab of the Results Panel, click the **Functional Enrichment button**. Keep the default settings. What do you see.<br></br>
    ![](images/FunctionalEnrichmentButton.png)
  
  <br></br><br></br>

* When the enrichment analysis is complete, a new tab titled **STRING** **Enrichment** will open in the **Table Panel**.<br></br>

* The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the **STRING Enrichment tab**. Filter the table to only show **GO Biological Process.**<br></br>
   
* At the top left of the STRING enrichment tab, click the filter icon `r icons::fontawesome("filter", style = "solid")` . Select **GO Biological Process** and check the **Remove redundant terms check-box**. Then click **OK.**<br></br>
* Next, add a split donut chart to the nodes representing the top terms by clicking on <br></br>
* Explore custom settings via   in the top right of the STRING enrichment tab.<br></br>   
    
### b. Repeat the whole experiment using "down-regulated" genes. 

### c. Export your Networks

### d. **Save in any of the formats and be ready for publishing.**
  
  </div>
  
  
### We will now go to the next session [session (2B)](session2b.nb.html) to understand how to perform Functional Enrichment.