Jessica Mattick edited this page Apr 30, 2020 · 26 revisions

Welcome to the RNA_Editing_Detection_Pipeline wiki!

Usage:

Download Reference Data

  1. Create a tab-delimited file containing the URLs of all required reference data, keeping the first column identical to the example.

Example reference_data.txt:

genome  ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz
genome_annotation       ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/gencode.v30lift37.annotation.gtf.gz
strand_detection        https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/hg19_RefSeq.bed.gz
rmsk    http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
dbSNP   http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp151.txt.gz
rediportal_db   http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
  2. Run get_ref_data_annotation.py to download all required data into the specified directory and generate annotation files.

Parameters:

  • -i or --input: path to the tab-delimited file containing the data URLs
  • -o or --output: path to output directory

Example:

nohup python3 get_ref_data_annotation.py -i reference_data.txt -o output_path &
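
Before starting a long download, it can help to check that the tab-delimited file is well formed. The following is a minimal sketch, not part of the pipeline: the function names `read_reference_table` and `missing_keys` are hypothetical, and the expected keys are simply copied from the first column of the example reference_data.txt above.

```python
import csv

# Keys copied from the first column of the example reference_data.txt.
EXPECTED_KEYS = {
    "genome", "genome_annotation", "strand_detection",
    "rmsk", "dbSNP", "rediportal_db",
}


def read_reference_table(path):
    """Return {key: url} from a two-column, tab-delimited file."""
    entries = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Drop empty fields and surrounding whitespace; skip blank lines.
            fields = [field.strip() for field in row if field.strip()]
            if not fields:
                continue
            if len(fields) != 2:
                raise ValueError(f"expected 2 tab-separated columns, got: {row!r}")
            key, url = fields
            entries[key] = url
    return entries


def missing_keys(entries):
    """Return the expected first-column keys absent from the table, sorted."""
    return sorted(EXPECTED_KEYS - set(entries))
```

Calling `missing_keys(read_reference_table("reference_data.txt"))` returns an empty list when every key from the example is present.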

Some reference data may need to be reformatted by following the instructions in Box 7 of Lo Giudice et al. This only needs to be done once per genome release. Formatted reference data is provided in the test dataset.

Index Genome for STAR

  1. Run index_genome_STAR.py to index the genome for STAR.

Parameters:

  • -f or --fasta: path to genome fasta file
  • -a or --gtf_annotation: path to genome gtf annotation
  • -o or --output: path to output directory

Example:

nohup python3 index_genome_STAR.py -f genome.fa -a annotation.gtf -o index_output/ &
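
For orientation, the sketch below shows what a STAR index-building command typically looks like when assembled in Python. This wiki does not state which options index_genome_STAR.py actually passes to STAR, so the function name `star_index_command` and the `threads` and `read_length` defaults are assumptions made only for illustration. The STAR options themselves are standard.

```python
import shlex


def star_index_command(fasta, gtf, out_dir, threads=4, read_length=100):
    """Build a STAR genomeGenerate command as an argument list."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(out_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        # The STAR manual suggests read length minus one for this option.
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(star_index_command("genome.fa", "annotation.gtf", "index_output/")))
```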

Retrieve Fastq Files from SRA

  1. Create a txt file containing a list of SRA accession numbers.
  2. Run get_SRA_data.py to download data

Parameters:

  • -a or --acc_list: path to file containing list of SRA accession numbers
  • -o or --output: path to output directory

Example:

nohup python3 get_SRA_data.py -a acc.txt -o output_path &
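
A malformed accession list is easy to catch before any download starts. The sketch below assumes one accession per line, which this wiki does not spell out, and uses the hypothetical helper name `read_accessions`. It accepts run accessions with the SRR, ERR, or DRR prefixes.

```python
import re

# Run accessions start with SRR (SRA), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accessions(path):
    """Return (valid, invalid) accession lists from a one-per-line text file."""
    valid, invalid = [], []
    with open(path) as handle:
        for line in handle:
            accession = line.strip()
            if not accession:
                continue  # ignore blank lines
            if RUN_ACCESSION.match(accession):
                valid.append(accession)
            else:
                invalid.append(accession)
    return valid, invalid
```

Any entries returned in the second list can be corrected in acc.txt before running the download script.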

Seq Reads Quality Check

  1. Run fastqc.py to quality check the sequencing reads

Parameters:

  • -se or --single_end: include at beginning of parameters if data is single end
  • -f or --fastq_dir: path to fastq directory
  • -o or --output: path to output directory

Example:

PE data

nohup python3 fastqc.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastqc.py -se -f fastq_dir -o output_dir &

Trim RNAseq Reads

  1. Run fastp.py to trim RNAseq Reads

Parameters:

  • -se or --single_end: include at beginning of parameters if data is single end
  • -f or --fastq_dir: path to fastq directory
  • -o or --output: path to output directory

Example:

PE data

nohup python3 fastp.py -f fastq_dir -o output_dir &

SE data

nohup python3 fastp.py -se -f fastq_dir -o output_dir &
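
When deciding whether to pass -se, it can be useful to check whether every sample in the fastq directory has both mates. The sketch below assumes a `sample_1.fastq.gz` / `sample_2.fastq.gz` naming convention, which is common for SRA downloads but is not confirmed by this wiki. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.fastq(.gz) and <sample>_2.fastq(.gz).
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def group_fastq_files(filenames):
    """Map sample name -> {mate number: filename}; other files are ignored."""
    samples = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            samples[match["sample"]][match["mate"]] = name
    return dict(samples)


def unpaired_samples(samples):
    """Return the sorted sample names that lack mate 1 or mate 2."""
    return sorted(name for name, mates in samples.items() if set(mates) != {"1", "2"})
```

Passing `os.listdir("fastq_dir")` to `group_fastq_files` and then calling `unpaired_samples` on the result lists the samples that would not work in paired-end mode under this naming assumption.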

Align RNAseq Reads

  1. Make sure genome has been indexed for STAR
  2. Run align_STAR.py to align paired-end data to the genome

Parameters:

  • -f or --fastq_dir: path to directory containing fastq files
  • -g or --genome_idx: path to STAR genome index
  • -o or --output: path to output directory

Example:

nohup python3 align_STAR.py -f fastq_dir -g genome_index -o output_dir &
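
Because alignment depends on the index already existing, launching both steps in the background with nohup and & can start the alignment too early. One option is a small driver that runs the steps in order and stops at the first failure. This is a sketch only; `run_steps` is a hypothetical helper, and the commented usage reuses the example arguments from this page.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; stop at the first non-zero exit status."""
    completed = []
    for command in steps:
        # check=True raises CalledProcessError, so later steps never start.
        subprocess.run(command, check=True)
        completed.append(command)
    return completed


# Hypothetical usage with the example arguments from this page:
# run_steps([
#     [sys.executable, "index_genome_STAR.py", "-f", "genome.fa",
#      "-a", "annotation.gtf", "-o", "index_output/"],
#     [sys.executable, "align_STAR.py", "-f", "fastq_dir",
#      "-g", "index_output/", "-o", "output_dir"],
# ])
```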

Detection of the strand orientation of RNAseq reads

  1. Run infer_strand_direction.py

Parameters:

  • -d or --bam_dir: path to directory containing bams
  • -r or --ref_seq_bed: path to refseq bed file

Example:

nohup python3 infer_strand_direction.py -d bam_dir -r ref_seq_bed &
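
The RefSeq bed file in the reference data comes from the RSeQC project, so this step presumably produces a report in the style of RSeQC's infer_experiment.py. That is an assumption, as is everything in the sketch below: the helper name `classify_strandedness`, the labels it returns, and the 0.8 threshold are illustrative choices, not values taken from the pipeline.

```python
import re

# Matches lines such as: Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
FRACTION_LINE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def classify_strandedness(report, threshold=0.8):
    """Classify an infer_experiment.py-style report by its two read fractions."""
    fractions = {pattern: float(value) for pattern, value in FRACTION_LINE.findall(report)}
    # Paired-end reports use the "1++,..." patterns, single-end the short ones.
    forward = fractions.get("1++,1--,2+-,2-+", fractions.get("++,--", 0.0))
    reverse = fractions.get("1+-,1-+,2++,2--", fractions.get("+-,-+", 0.0))
    if forward >= threshold:
        return "forward"
    if reverse >= threshold:
        return "reverse"
    return "unstranded"
```

Roughly equal fractions fall below the threshold on both sides and are reported as unstranded.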

Download Fastq Files of WGS from SRA

  1. Create a text file containing a list of ERR accession numbers.
  2. Run get_WGS_data.py to download data

Parameters:

  • -a or --acc_list: path to file containing a list of ERR accession numbers
  • -o or --output: path to output directory

Example:

nohup python3 get_WGS_data.py -a acc.txt -o output_path &

Index Genome for BWA

  1. Run index_genome_bwa.py to index the genome for BWA.

Parameters:

  • -f or --fasta_dir: path to genome fasta file

Example:

nohup python3 index_genome_bwa.py -f fasta_dir &

Align DNAseq Reads

  1. Run align_bwa.py to align paired-end data to the genome

Parameters:

  • -fq or --fastq_dir: path to directory containing fastq files
  • -fa or --fasta_dir: path to directory containing genome fasta file

Example:

nohup python3 align_bwa.py -fq fastq_dir -fa fasta_dir &

Select and map reads to a chromosome

  1. Run select_map_chr.py to select and map reads to a specific chromosome

Parameters:

  • -g or --genome_dir: path to directory containing the genome .fai file
  • -f or --fastq_dir: path to directory containing the WGS fastq and sam files
  • -o or --output_dir: path to the directory in which to store the output files
  • -chr or --chrNum: select the chromosome number as 'chr[Int]' (e.g. -chr chr21)

Example:

nohup python3 select_map_chr.py -g genome_dir -f fastq_dir -o output_dir -chr chrNum &
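
The value passed to -chr has to follow the 'chr[Int]' form and should name a sequence that exists in the genome index. The sketch below checks both against a .fai file, whose first two tab-separated columns are the sequence name and its length. The helper names are hypothetical, and this check is not claimed to be part of select_map_chr.py.

```python
import re

# 'chr' followed by an integer, as described for the -chr parameter.
CHROM_NAME = re.compile(r"^chr\d+$")


def read_fai_lengths(fai_path):
    """Return {sequence name: length} from a FASTA index (.fai) file."""
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def check_chromosome(chrom, fai_path):
    """Return the chromosome length, or raise if the name is invalid or absent."""
    if not CHROM_NAME.match(chrom):
        raise ValueError(f"{chrom!r} does not match the 'chr[Int]' form")
    lengths = read_fai_lengths(fai_path)
    if chrom not in lengths:
        raise KeyError(f"{chrom!r} not found in {fai_path}")
    return lengths[chrom]
```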