CSE 185 RNAseq_Easy

RNAseq_Easy is a Python project with embedded R packages, developed for CSE 185. It processes STAR and RSEM result files, checks the correlation between samples within each group, and runs a differential expression analysis whose results can be shown as a volcano plot. Users choose which outputs to produce: the Pearson correlation result, the CSV file generated by the DESeq2 R package, and/or a volcano plot. All outputs are produced only if the input passes the packages/sanity_check.py script.

Installation instructions

Clone the repository and install the required packages using the following commands:

git clone https://github.com/CSE185-Final-Project/RNAseq_Easy/
cd RNAseq_Easy
pip install .

Alternatively, install directly from GitHub:

pip install git+https://github.com/CSE185-Final-Project/RNAseq_Easy.git

Add your local bin directory to PATH (needed if you are using the class Jupyter Notebook server or do not have root access):

export PATH="$HOME/.local/bin:$PATH"

If the installation succeeded, run RNAseq_Easy help to view the user manual.

Basic usage

The basic usage of RNAseq_Easy is:

RNAseq_Easy <group1_zip> <group2_zip> -o <output DIRECTORY> [options]

Example: To run RNAseq_Easy using example files from this repository:
RNAseq_Easy dataset/HFD_Rep.zip dataset/Chow_Rep.zip -o <output DIRECTORY> [options]
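Each input zip holds the STAR/RSEM output for one group. The internal layout of the zip is not documented here, so the sketch below assumes each replicate is an RSEM gene-level file whose name ends in .genes.results (RSEM's standard tab-separated format, with gene_id and expected_count columns). The function name load_rsem_counts is hypothetical and is not part of RNAseq_Easy.

```python
import csv
import io
import zipfile


def load_rsem_counts(zip_path):
    """Return {sample_name: {gene_id: expected_count}}.

    Hypothetical helper: assumes every replicate in the archive is an
    RSEM '*.genes.results' file; other members are ignored.
    """
    suffix = ".genes.results"
    counts = {}
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if not member.endswith(suffix):
                continue  # skip directories, STAR logs, BAMs, etc.
            # Sample name = file name without directory and suffix.
            sample = member.rsplit("/", 1)[-1][: -len(suffix)]
            with zf.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                reader = csv.DictReader(text, delimiter="\t")
                counts[sample] = {
                    row["gene_id"]: float(row["expected_count"]) for row in reader
                }
    return counts
```

Usage (assumed layout): load_rsem_counts("dataset/HFD_Rep.zip") would return one count dictionary per replicate found in the archive.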

RNAseq_Easy Options

The two zip file paths and the output path (-o) are the only required inputs for RNAseq_Easy. Users may additionally specify the options below:

<group1_zip>, <group2_zip> Required. The path to two zipped files containing aligned and quantified gene reads processed by STAR and RSEM.
-o, --output <output_path> Required. Specifies the path where the outputs (CSV and/or plot) will be saved.
-p, --pearson Optional. If set, compute the Pearson correlation among the samples within each group, then stop.
-d, --DESeq2 [-filter int] Optional. If set, pass all files to DESeq2 for processing, then stop. The result is saved to the output path. Genes with a count lower than the -filter value are removed (default: -filter 0). Note: the Pearson correlation check runs automatically.
-v, --visual [-p_value float, -fod int, -filter int] Optional. If set, generate a volcano plot of the given data. Genes are labeled Up, Down, or None based on the p-value and fold-change (fod) thresholds (defaults: -p_value 0.05 -fod 0). Note: the Pearson correlation check and DESeq2 processing run automatically.
-name, --name <file_path> Optional. A two-column file without a header. The first column contains the gene ID measured, and the second column contains the corresponding gene name.
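The option list above can be modeled with Python's standard argparse module. This is a hypothetical reconstruction for illustration, not the project's actual parser: it flattens the bracketed sub-options (-filter, -p_value, -fod) into top-level flags, treats -fod as a float even though it is documented as int, and does not model the help subcommand.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented RNAseq_Easy options."""
    parser = argparse.ArgumentParser(prog="RNAseq_Easy")
    parser.add_argument("group1_zip", help="zip of STAR/RSEM results, group 1")
    parser.add_argument("group2_zip", help="zip of STAR/RSEM results, group 2")
    parser.add_argument("-o", "--output", required=True, help="output path")
    # Mode flags: each one implies the steps before it.
    parser.add_argument("-p", "--pearson", action="store_true")
    parser.add_argument("-d", "--DESeq2", action="store_true")
    parser.add_argument("-v", "--visual", action="store_true")
    # Single-dash long options match the spelling used in this manual.
    parser.add_argument("-filter", type=int, default=0)
    parser.add_argument("-p_value", type=float, default=0.05)
    parser.add_argument("-fod", type=float, default=0.0)  # assumed float
    parser.add_argument("-name", "--name", dest="name_file")
    return parser
```

argparse matches an exact option string before trying abbreviations, so -p and -p_value can coexist as separate options.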

Examples:
Print the usage manual:
RNAseq_Easy
or
RNAseq_Easy help

Check the Pearson correlation within each group:
RNAseq_Easy path/to/group1.zip path/to/group2.zip -o path/to/newdirectory -p
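The -p check compares samples within the same group. As a minimal sketch of that idea, assuming each replicate is a list of per-gene counts in the same gene order, the Pearson coefficient can be computed for every pair of replicates. The helper names pearson and within_group_correlations are hypothetical, and the real tool may transform counts before correlating them.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def within_group_correlations(group):
    """group: {replicate_name: [count per gene, same gene order]}.

    Returns {(rep_a, rep_b): r} for every unordered replicate pair.
    """
    return {
        (a, b): pearson(group[a], group[b])
        for a, b in itertools.combinations(sorted(group), 2)
    }
```

Replicates of the same condition would normally be expected to correlate strongly; a markedly lower value for one pair can flag a problematic sample.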

Process data through DESeq2 and filter out genes with counts lower than 10:
RNAseq_Easy path/to/group1.zip path/to/group2.zip -o path/to/newdirectory -d -filter 10
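The manual does not say whether -filter is applied per sample or to a gene's total count. The sketch below assumes the common pre-filtering approach of summing each gene's counts across all samples and keeping genes whose total is at least the threshold; filter_low_counts is a hypothetical name.

```python
def filter_low_counts(counts, min_count=0):
    """Drop genes whose total count across all samples is below min_count.

    counts: {gene_id: [count in sample 1, count in sample 2, ...]}
    Assumption: the threshold applies to the row sum; the real tool
    may apply it differently (for example, per sample).
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_count}
```

With the default threshold of 0, no gene with non-negative counts is removed, which matches the documented default of -filter 0 leaving the data unfiltered.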

Generate a volcano plot with p-value threshold = 0.05 and fold-change (fod) threshold = 2:
RNAseq_Easy path/to/group1.zip path/to/group2.zip -o path/to/output/graph.png -v -p_value 0.05 -fod 2
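The Up/Down/None labels depend on the -p_value and -fod thresholds. The exact rule is not documented, so the sketch below assumes a typical volcano-plot convention: a gene is significant when its adjusted p-value (padj) is below -p_value, and it is labeled Up or Down when its log2FoldChange is above fod or below -fod. Comparing fod directly against log2FoldChange is an assumption (it is consistent with a default of 0, which a linear fold change could not use). Function names are hypothetical; the plotting itself is left out.

```python
import math


def label_gene(log2_fold_change, padj, p_value=0.05, fod=0.0):
    """Return 'Up', 'Down', or 'None' for one gene (assumed rule)."""
    if padj >= p_value:
        return "None"  # not significant
    if log2_fold_change > fod:
        return "Up"
    if log2_fold_change < -fod:
        return "Down"
    return "None"  # significant, but the change is too small


def volcano_points(rows, p_value=0.05, fod=0.0):
    """Turn DESeq2-style rows into (x, y, label) tuples for plotting.

    x = log2 fold change, y = -log10(adjusted p-value).
    """
    points = []
    for row in rows:
        lfc, padj = row["log2FoldChange"], row["padj"]
        if lfc is None or padj is None:
            continue  # DESeq2 reports NA for some genes; skip them
        y = -math.log10(max(padj, 1e-300))  # avoid log10(0)
        points.append((lfc, y, label_gene(lfc, padj, p_value, fod)))
    return points
```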

File format

The -d option generates a CSV file containing the result data frame produced by DESeq2 (delimiter = ',').
The -v option generates a volcano plot saved in PNG format.
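A standard DESeq2 results table has the columns baseMean, log2FoldChange, lfcSE, stat, pvalue, and padj. Assuming the CSV keeps those columns, puts the gene ID in the first column, and writes missing values as NA (R's default), it can be loaded with Python's csv module as sketched below. parse_deseq2_csv and read_deseq2_csv are hypothetical helpers, not part of RNAseq_Easy.

```python
import csv

NUMERIC_COLUMNS = ("baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj")


def parse_deseq2_csv(lines):
    """Parse comma-delimited DESeq2 results into a list of dicts.

    Assumes the first column holds the gene ID and that missing values
    were written by R as 'NA' (converted to None here).
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = []
    for values in reader:
        record = dict(zip(header[1:], values[1:]))
        record["gene_id"] = values[0]
        for col in NUMERIC_COLUMNS:
            if col in record:
                record[col] = None if record[col] == "NA" else float(record[col])
        rows.append(record)
    return rows


def read_deseq2_csv(path):
    # newline="" is how the csv module recommends opening files.
    with open(path, newline="") as handle:
        return parse_deseq2_csv(handle)
```

Usage (assumed file name from the test guide): rows = read_deseq2_csv("result_deseq2.csv").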

Test guide (important)

We recommend testing on the class Jupyter Notebook server.
Install the package directly from GitHub:

pip install git+https://github.com/CSE185-Final-Project/RNAseq_Easy.git

Then add your local bin directory to PATH:

export PATH="$HOME/.local/bin:$PATH"

Make a new directory for the test: mkdir test_RNA
Change into it: cd test_RNA
Download the test files from GitHub (https://github.com/CSE185-Final-Project/RNAseq_Easy/ -> dataset): Chow_Rep.zip, HFD_Rep.zip, and GRCm38.75.gene_names
Upload them to test_RNA on the server.

Run:

RNAseq_Easy HFD_Rep.zip Chow_Rep.zip -o ~/test_RNA -v -name GRCm38.75.gene_names

Check the volcano plot (result_vol_plot.png) and the DESeq2 result (result_deseq2.csv).
Feel free to modify the options as described in this manual.
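The -name file used above (GRCm38.75.gene_names) is described only as a header-less, two-column file mapping gene IDs to gene names. The sketch below assumes the columns are separated by whitespace (tabs or spaces) and falls back to the raw ID when no name is listed. parse_gene_names, load_gene_names, and display_name are hypothetical helpers for illustration.

```python
def parse_gene_names(lines):
    """Map gene ID -> gene name from a two-column, header-less file.

    Assumes whitespace-separated columns; blank or malformed lines are skipped.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def load_gene_names(path):
    with open(path) as handle:
        return parse_gene_names(handle)


def display_name(gene_id, mapping):
    # Fall back to the ID itself when the file has no name for it.
    return mapping.get(gene_id, gene_id)
```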

Contributors

This project was developed by Zhijun Qian, Sicheng Jing, and Jiarun Liu, inspired by the Lab 4 assignment and many other projects.

We thank Professor Gymrek for the project demo:
https://github.com/gymreklab/cse185-demo-project/

We also drew on example final projects from last year:
https://github.com/BennyXie/CSE185-GWAS-Implementation/
https://github.com/WillardFord/wf-align-CSE185/
https://github.com/kyrafetter/spyglass/

Please submit a pull request with any corrections or suggestions. Thank you!

Testing

Test files are stored in dataset/test_file/. To check that the code works, run it on these files; by default:

RNAseq_Easy dataset/test_file/baby_HFD_Rep.zip dataset/test_file/baby_Chow_Rep.zip -o <output DIRECTORY> [options]
