Update lab note for RNA-seq data processing
xie186 committed Oct 3, 2024
1 parent 669d918 commit f38a101
Showing 1 changed file with 87 additions and 0 deletions.
87 changes: 87 additions & 0 deletions docs/bulkRNAseq_lab.ipynb
@@ -771,6 +771,93 @@
"In the command line above, you again run `multiqc` in a Singularity container. This time, `-B $PWD` is used. `$PWD` is an environment variable that holds the current working directory, which is where the input and output of `multiqc` are stored. "
]
},
{
"cell_type": "markdown",
"id": "8788ed49",
"metadata": {},
"source": [
"## Use RSeQC to generate QC plots"
]
},
{
"cell_type": "markdown",
"id": "08f641c8",
"metadata": {},
"source": [
"```\n",
"%%bash\n",
"cd /scratch/zt1/project/bioi611/user/$USER\n",
"sbatch ../../shared/scripts/bulkRNA_SE_s6_RSeQC_genebody_cov.sub\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "91fb378d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#!/bin/bash\n",
"#SBATCH --partition=standard\n",
"#SBATCH -t 40:00:00\n",
"#SBATCH --nodes=1\n",
"#SBATCH --ntasks=1\n",
"#SBATCH --cpus-per-task=1\n",
"#SBATCH --job-name=bulkRNA_SE_s6_RSeQC_genebody_cov.sub\n",
"#SBATCH --mail-type=FAIL,BEGIN,END\n",
"#SBATCH --error=%x-%J-%u.err\n",
"#SBATCH --output=%x-%J-%u.out\n",
"\n",
"module load singularity\n",
"\n",
"## Binding path and singularity image file\n",
"SIF_BIND=\"/scratch/zt1/project/bioi611/\"\n",
"SIF_TRIMGALORE=\"/scratch/zt1/project/bioi611/shared/software/rseqc_v5.0.3.sif\"\n",
"SIF_BEDOPS=\"/scratch/zt1/project/bioi611/shared/software/bedops_v2.4.39.sif\"\n",
"## Paths to working directory and input fastq files\n",
"WORKDIR=\"/scratch/zt1/project/bioi611/user/$USER\"\n",
"\n",
"cd $WORKDIR\n",
"\n",
"\n",
"mkdir -p bulk_RNAseq_SE_RSeQC/\n",
"singularity exec -B $SIF_BIND $SIF_TRIMGALORE geneBody_coverage.py -r /scratch/zt1/project/bioi611/shared/reference/Caenorhabditis_elegans.WBcel235.111.bed -i bulkRNA_SE_STAR_align/N2_day1_rep1.Aligned.sortedByCoord.out.bam,bulkRNA_SE_STAR_align/N2_day7_rep1.Aligned.sortedByCoord.out.bam -o bulk_RNAseq_SE_RSeQC/geneBody_cov\n",
"\n",
"\n",
"# Test command line which can be completed in less than 2 minutes \n",
"# singularity exec -B $SIF_BIND $SIF_TRIMGALORE geneBody_coverage.py -r test_1000genes.bed -i bulkRNA_SE_STAR_align/N2_day1_rep1.Aligned.sortedByCoord.out.bam -o test_genebody_cov/test\n"
]
}
],
"source": [
"%%bash\n",
"cd /scratch/zt1/project/bioi611/user/$USER\n",
"cat ../../shared/scripts/bulkRNA_SE_s6_RSeQC_genebody_cov.sub"
]
},
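{
"cell_type": "markdown",
"id": "sketch-check-job",
"metadata": {},
"source": [
"After submitting the job, it can help to confirm that it is running and, once it finishes, that output was written. The commands below are a minimal sketch, not part of the course scripts: `squeue` is the standard Slurm command for listing jobs, and the log file names are assumed to follow the `%x-%J-%u` pattern (job name, job ID, user name) set in the script header and to be written to the directory where `sbatch` was run. The exact files produced under `bulk_RNAseq_SE_RSeQC/` depend on the RSeQC version, so no expected output is shown.\n",
"\n",
"```shell\n",
"# List your queued and running jobs\n",
"squeue -u $USER\n",
"\n",
"# After the job ends, look at the Slurm logs and the RSeQC output directory\n",
"cd /scratch/zt1/project/bioi611/user/$USER\n",
"ls bulkRNA_SE_s6_RSeQC_genebody_cov.sub-*-$USER.err bulkRNA_SE_s6_RSeQC_genebody_cov.sub-*-$USER.out\n",
"ls bulk_RNAseq_SE_RSeQC/\n",
"```"
]
},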
{
"cell_type": "markdown",
"id": "d003be4c",
"metadata": {},
"source": [
"`Caenorhabditis_elegans.WBcel235.111.bed` is one of the inputs to `geneBody_coverage.py` in RSeQC (passed with `-r`). To understand the BED file format, please refer to the link below: \n",
"\n",
"https://genome.ucsc.edu/FAQ/FAQformat.html#format1\n",
"\n",
"The BED file can be generated from a GFF3 file. GFF3 is a format similar to GTF. To generate a BED file from a GFF3 file, you can use the commands below:\n",
"\n",
"```\n",
"wget https://ftp.ensembl.org/pub/release-111/gff3/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.111.gff3.gz\n",
"gunzip Caenorhabditis_elegans.WBcel235.111.gff3.gz\n",
"export PATH=\"/scratch/zt1/project/bioi611/shared/software:$PATH\"\n",
"gff3ToGenePred Caenorhabditis_elegans.WBcel235.111.gff3 Caenorhabditis_elegans.WBcel235.111.genePred\n",
"genePredToBed Caenorhabditis_elegans.WBcel235.111.genePred Caenorhabditis_elegans.WBcel235.111.bed\n",
"```"
]
},
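{
"cell_type": "markdown",
"id": "sketch-bed-check",
"metadata": {},
"source": [
"As a quick sanity check on the conversion above, the sketch below counts the columns in the resulting BED file and builds a small subset for test runs. It assumes `Caenorhabditis_elegans.WBcel235.111.bed` is in the current directory. Using `head` to create `test_1000genes.bed` is an assumption for illustration; it is not necessarily how the file named in the test command of the job script was made, and each BED record is one transcript rather than one gene.\n",
"\n",
"```shell\n",
"# Tally lines by number of whitespace-separated columns.\n",
"# genePredToBed is expected to write 12-column (BED12) records.\n",
"awk '{ print NF }' Caenorhabditis_elegans.WBcel235.111.bed | sort -n | uniq -c\n",
"\n",
"# Keep the first 1000 records as a small BED file for a fast test run\n",
"head -n 1000 Caenorhabditis_elegans.WBcel235.111.bed > test_1000genes.bed\n",
"wc -l test_1000genes.bed\n",
"```"
]
},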
{
"cell_type": "markdown",
"id": "5b4d74ef",