In this tutorial, we will walk you through the process of analyzing your own files for the SplicePCA tool.
As a first step, to get familiar with the SplicePCA tool, you can use the analyzed files deposited in data/. These are control and Nova2-KD samples from the developing embryonic neocortex.
Saito, Yuhki, et al. "Differential NOVA2-mediated splicing in excitatory and inhibitory neurons regulates cortical development and cerebellar function." Neuron 101.4 (2019): 707-720.
Once these files are downloaded, you can use them for the custom analysis in SplicePCA.
- Browse to Cortexa's SplicePCA
- Select relevant datasets for the PCA analysis. In this case, Development and NPC/neuron could be a good choice.
- Upload the alternative splicing files by checking Include own datasets
- Define the genes on which the analysis will be done:
  a. To do the analysis on all available genes, select the option Use all Genes
  b. To define a subset of genes, enter the gene symbols in the Add Gene mask
- Press Start PCA. The process can take a while.
Result of SplicePCA using the Cassette Exon file (SE.MATS.JCEC.txt), analyzed together with the Development and NPC/neuron data and visualized with matplotlib after Download PCA.
In order to minimize technical effects, you should do the analysis as described in the manuscript.
Tools:
Reference Genome:
Analysis pipeline for alternative splicing
Caution
This is just a general outline of a pipeline and it has to be adapted to your parameters.
Follow these detailed steps to set up and run the analysis pipeline:
- Download BBDuk from SourceForge
- Extract the downloaded file:
tar -xvzf BBMap_39.01.tar.gz
- Add the BBDuk directory to your PATH:
export PATH=$PATH:/path/to/bbmap
- Download STAR
wget https://github.com/alexdobin/STAR/archive/2.7.11b.tar.gz
- Extract the archive:
tar -xzf 2.7.11b.tar.gz
- Compile STAR:
cd STAR-2.7.11b/source
make STAR
- Add the STAR directory to your PATH:
export PATH=$PATH:/path/to/STAR-2.7.11b/bin/Linux_x86_64
- Clone the rMATS-turbo repository:
git clone https://github.com/Xinglab/rmats-turbo.git
- Build rMATS (ensure you have Python 3.6+ and GCC installed):
cd rmats-turbo
./build_rmats
- Add the rMATS directory to your PATH:
export PATH=$PATH:/path/to/rmats-turbo
- Download Samtools from GitHub
- Extract Samtools
tar -xvjf samtools-1.18.tar.bz2
- Install Samtools
cd samtools-1.18
./configure --prefix=/where/to/install
make
make install
- Install RSeQC using pip
pip install RSeQC
- Download the GENCODE mouse release for GRCm39 (mm39) from GENCODE
- Download the genome FASTA file and the GTF annotation file
- Index the genome for STAR:
STAR --runMode genomeGenerate --genomeDir /path/to/star_index \
--genomeFastaFiles /path/to/GRCm39.primary_assembly.genome.fa \
--sjdbGTFfile /path/to/gencode.vM33.annotation.gtf \
--sjdbOverhang 100
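Before processing any samples, it can help to confirm that the tools set up above are actually reachable through your PATH. The following is a minimal sketch rather than part of the published pipeline: missing_tools is a hypothetical helper name, and the list of commands is simply taken from the command names used in this tutorial.

```shell
# Print every command from the argument list that is not found on PATH,
# one name per line. Returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Command names as they are invoked in this tutorial (adapt to your setup).
if missing=$(missing_tools fastqc bbduk.sh STAR samtools \
                           infer_experiment.py featureCounts rmats.py); then
    echo "All pipeline tools were found on PATH."
else
    echo "Not found on PATH:"
    printf '  %s\n' $missing
fi
```

If a tool is reported as missing, re-check the corresponding export PATH line, or open a new shell after editing your shell profile.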
Follow these steps for each sample:
- Quality control with FastQC:
fastqc -o /path/to/fastqc_output -t <threads> sample_R1.fastq.gz sample_R2.fastq.gz
- Adapter trimming with BBDuk:
bbduk.sh in1=sample_R1.fastq.gz in2=sample_R2.fastq.gz \
out1=sample_trimmed_R1.fastq.gz out2=sample_trimmed_R2.fastq.gz \
ref=/path/to/adapters.fa \
ktrim=r k=23 mink=11 hdist=1 tpe tbo \
qtrim=rl trimq=10 minlen=25
- Alignment with STAR:
STAR --genomeDir /path/to/star_index \
--readFilesIn sample_trimmed_R1.fastq.gz sample_trimmed_R2.fastq.gz \
--readFilesCommand zcat \
--outFileNamePrefix sample_ \
--outSAMtype BAM SortedByCoordinate \
--limitBAMsortRAM 10000000000 \
--runThreadN <threads>
samtools index -@ <threads> sample_Aligned.sortedByCoord.out.bam
- Infer strandedness with RSeQC (the -r argument is the reference gene model in BED format, not the genome sequence):
infer_experiment.py -r /path/to/genome.bed -i sample_Aligned.sortedByCoord.out.bam > sample_strandedness.txt
- Count features with featureCounts:
featureCounts -s <strandedness> \
-p --countReadPairs \
-t exon \
-g gene_name \
-T <threads> \
-a /path/to/gencode.vM33.annotation.gtf \
-o sample_counts.tab \
sample_Aligned.sortedByCoord.out.bam
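- Perform alternative splicing analysis with rMATS. The --b1 option expects a text file containing a comma-separated list of BAM files, here /path/to/b1.txt with the single line sample1_Aligned.sortedByCoord.out.bam,sample2_Aligned.sortedByCoord.out.bam:
rmats.py --b1 /path/to/b1.txt \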
--gtf /path/to/gencode.vM33.annotation.gtf \
--od /path/to/rmats_output \
--tmp /path/to/rmats_tmp \
-t paired \
--libType <strandedness> \
--readLength <read_length> \
--nthread <threads>
Ensure you adjust the following parameters according to your experimental setup:
- <threads>: Number of threads to use for the various processes
- <strandedness>: Strandedness information. featureCounts -s takes 0 for unstranded, 1 for stranded, and 2 for reversely stranded; rMATS --libType takes the corresponding fr-unstranded, fr-secondstrand, or fr-firststrand
- <read_length>: Read length of your sequencing data
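The strandedness value can be read off the RSeQC report by hand, but a small helper makes the choice reproducible across samples. The sketch below assumes the usual infer_experiment.py report lines ("Fraction of reads explained by ..."). The function names strandedness_code and rmats_lib_type are hypothetical, and the 0.8 cutoff is an assumption, not a value from the manuscript. The example report is made up for illustration.

```shell
# Print 0 (unstranded), 1 (stranded) or 2 (reversely stranded) for an
# infer_experiment.py report. $1: report file, $2: cutoff (assumed 0.8).
strandedness_code() {
    awk -v cutoff="${2:-0.8}" '
        /1[+][+],1--,2[+]-,2-[+]/ { fwd = $NF }  # paired-end, forward
        /1[+]-,1-[+],2[+][+],2--/ { rev = $NF }  # paired-end, reverse
        /"[+][+],--"/             { fwd = $NF }  # single-end, forward
        /"[+]-,-[+]"/             { rev = $NF }  # single-end, reverse
        END {
            if (fwd + 0 >= cutoff)      print 1
            else if (rev + 0 >= cutoff) print 2
            else                        print 0
        }' "$1"
}

# Map the featureCounts -s code to the matching rMATS --libType value.
rmats_lib_type() {
    case "$1" in
        0) echo fr-unstranded ;;
        1) echo fr-secondstrand ;;
        2) echo fr-firststrand ;;
        *) echo "unknown strandedness code: $1" >&2; return 1 ;;
    esac
}

# Illustration with a made-up report for a reversely stranded library.
example=$(mktemp)
cat > "$example" <<'EOF'
This is PairEnd Data
Fraction of reads failed to determine: 0.0200
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0300
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9500
EOF
code=$(strandedness_code "$example")
echo "featureCounts -s $code, rMATS --libType $(rmats_lib_type "$code")"
rm -f "$example"
```

In practice you would call strandedness_code sample_strandedness.txt on the file written by the RSeQC step. Libraries whose fractions fall between the cutoff and 0.5 deserve a manual look rather than an automatic call.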
Use the SE.MATS.JCEC.txt file from the rMATS output for upload to SplicePCA.
Follow the steps in the "Use SplicePCA" section to upload and analyze your processed files.
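Before uploading, a quick sanity check of the rMATS table can catch a truncated or wrongly selected file. The sketch below only verifies that the file is tab-separated with a few expected header columns and reports the number of events. The column list is an assumption based on the standard rMATS SE output header, not a documented SplicePCA requirement, and check_se_table is a hypothetical helper name.

```shell
# Check that a file looks like an rMATS SE.MATS.JCEC.txt table.
# Prints the number of events (data rows) and returns 0 on success;
# reports missing columns on stderr and returns 1 otherwise.
check_se_table() {
    awk -F '\t' '
        NR == 1 {
            # Record the header names, then look for the assumed columns.
            for (i = 1; i <= NF; i++) seen[$i] = 1
            n = split("GeneID geneSymbol exonStart_0base exonEnd IncLevel1", need, " ")
            for (j = 1; j <= n; j++)
                if (!(need[j] in seen)) {
                    print "missing column: " need[j] > "/dev/stderr"
                    bad = 1
                }
            next
        }
        NF > 0 { events++ }   # count non-empty data rows
        END {
            if (NR == 0) { print "empty file" > "/dev/stderr"; exit 1 }
            if (bad) exit 1
            print events + 0
        }' "$1"
}

# Usage (path is a placeholder):
#   check_se_table /path/to/rmats_output/SE.MATS.JCEC.txt
```

A count of zero events usually points to a problem upstream, for example a mismatch between the GTF annotation and the genome used for alignment.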
Weißbach, S., Milkovits, J., Pastore, S. et al. Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain. BMC Bioinformatics 25, 293 (2024). https://doi.org/10.1186/s12859-024-05919-y
Please also cite the data sets that you used, which can be found at Cortexa - About the data.