
TheJacksonLaboratory/sbas



The impact of sex on alternative splicing

This repository documents the analysis performed for "The impact of sex on alternative splicing". Note that a manuscript with a modified version of the analysis has been submitted. Reproducing the analysis requires three steps:

  1. Get access to the Genotype-Tissue Expression (GTEx) RNA-seq data (this requires an application to dbGaP for access to dataset phs000424.v8.v2).
  2. Align each RNA-seq sample with hisat2, then use rMATS to generate a matrix of counts for each of several splicing event types. In this study, rMATS was run as a Nextflow script on the CloudOS/Lifebit platform; the script can be modified to run on other platforms.
  3. Run the Jupyter notebooks from this repository to perform the individual analyses.

This repository documents the interactive analysis for the results of running the rmats-nf pipeline.

1. Get access

The RNA-seq samples analyzed in this project are restricted-access (dbGaP accession phs000424.v8.v2). See the database of Genotypes and Phenotypes (dbGaP) for details.

2. Processing the RNA-seq samples

See the manuscript for details of the methods. In brief, we ran the Nextflow script at https://github.com/lifebit-ai/rmats-nf to align the RNA-seq samples with hisat2 and to characterize splicing events with rMATS. Results from individual samples are summarized in 'matrix' files. To run the Jupyter notebooks described in the next section, place these files in a results bucket (if you are using CloudOS) or in another defined location.
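If you stage the files on a local machine rather than in a CloudOS bucket, a small helper can collect them in one place. This is only an illustrative sketch: the function name stage_matrix_files, the default glob *matrix* and the destination directory are hypothetical and should be adjusted to the actual rmats-nf output layout.

```bash
# Hypothetical helper: copy rmats-nf 'matrix' files into one directory that
# the notebooks can read from. The '*matrix*' pattern is an ASSUMPTION.
stage_matrix_files() {
  local src="$1"                  # directory containing the pipeline output
  local dest="$2"                 # location the notebooks will read from
  local pattern="${3:-*matrix*}"  # glob for the files to collect
  mkdir -p "$dest" || return 1
  # Copy every matching regular file. The directory structure is flattened,
  # so files with the same name in different subdirectories overwrite each other.
  find "$src" -type f -name "$pattern" -exec cp {} "$dest"/ \;
  # Print how many matching files are now in the destination.
  find "$dest" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
}
```

For example, stage_matrix_files ./rmats-results ./results would copy the matches into ./results and print how many files it holds (both paths are placeholders).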

3. Running the notebooks

Each result described in the manuscript was generated by one or more Jupyter notebooks in this repository. Several R packages need to be installed before running the notebooks. This document describes the installation process for the CloudOS environment; if you are running the notebooks in another environment, run the setup scripts instead.

3.1 Summarizing events

Most of the notebooks require the raw rMATS files to be processed first into summary files; this is done by the notebook countGenesAndEvents.ipynb. Two further notebooks perform the differential gene expression (DGE) and differential alternative splicing (DAS) analyses. Run these three notebooks first:

  1. differentialGeneExpressionAnalysis.ipynb. Performs differential gene expression analysis with voom.
  2. differentialSplicingJunctionAnalysis.ipynb. Performs regression analysis to characterize sex-biased alternative splicing events.
  3. countGenesAndEvents.ipynb. Sets up the overall analysis and writes various files to the data subdirectory for use by the other notebooks.

The remaining notebooks can be run in any order. Most of them generate a figure, a table, or a result described in the manuscript.
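The run order above can be scripted. The sketch below assumes papermill is installed and that the kernel named in each notebook's metadata is available. It runs the three setup notebooks in the order listed, then every other notebook in the directory, writing executed copies to a hypothetical executed/ directory. The helper names are illustrative, and this is not necessarily how reproduce.sh orders the notebooks.

```bash
# The three notebooks that must run before all others, in the listed order.
SETUP_NOTEBOOKS="differentialGeneExpressionAnalysis.ipynb differentialSplicingJunctionAnalysis.ipynb countGenesAndEvents.ipynb"

# Print the setup notebooks first, then every other notebook found in $1.
# ASSUMES notebook file names contain no spaces.
notebook_order() {
  local dir="${1:-.}" nb path name
  for nb in $SETUP_NOTEBOOKS; do
    echo "$nb"
  done
  for path in "$dir"/*.ipynb; do
    [ -e "$path" ] || continue        # no notebooks: the glob stays literal
    name="$(basename "$path")"
    case " $SETUP_NOTEBOOKS " in
      *" $name "*) ;;                 # already printed as a setup notebook
      *) echo "$name" ;;
    esac
  done
}

# Execute each notebook with papermill, stopping at the first failure.
run_notebooks() {
  local dir="${1:-.}" out="${2:-executed}" nb
  mkdir -p "$out" || return 1
  for nb in $(notebook_order "$dir"); do
    echo "Running $nb"
    # PAPERMILL is deliberately unquoted so extra options can be passed in it.
    ${PAPERMILL:-papermill} "$dir/$nb" "$out/$nb" || return 1
  done
}
```

For example, run_notebooks . executed runs everything from the repository root. Setting PAPERMILL to something like "papermill -k <kernel>" forwards papermill's kernel option.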

4. Reproducibility note: How can I reproduce the Jupyter Notebooks analysis?

To facilitate reproducing the results from the secondary analysis that generates all the plots and tables of the publication, we have created a helper bash script that can be run to perform the following:

  1. Prepare the environment by installing dependencies
  2. Retrieve the data that we have made available on Zenodo (DOI: 10.5281/zenodo.5524975)
  3. Programmatically execute all Jupyter Notebooks using the papermill library.

You can find the file at ./reproduce.sh.
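Step 2 of the list above (retrieving the data from Zenodo) can be approximated with the Zenodo REST API. This is a minimal sketch, not the code in reproduce.sh. It assumes curl and jq are installed, that the numeric record id follows "zenodo." in the DOI, and that the API response lists each file under .files[] with .key and .links.self fields. The helper names are hypothetical.

```bash
# Sketch of data retrieval from Zenodo. ASSUMES curl, jq and the JSON layout
# .files[].key / .files[].links.self; reproduce.sh may do this differently.
ZENODO_DOI="10.5281/zenodo.5524975"

# The record id is everything after the last "zenodo." in the DOI.
zenodo_record_id() {
  printf '%s\n' "${1##*zenodo.}"
}

# REST endpoint that describes the record and its files.
zenodo_api_url() {
  printf 'https://zenodo.org/api/records/%s\n' "$(zenodo_record_id "$1")"
}

# Download every file attached to the record into the destination directory.
fetch_zenodo_files() {
  local doi="$1" dest="${2:-data}" json key url
  json="$(curl -fsSL "$(zenodo_api_url "$doi")")" || return 1
  mkdir -p "$dest" || return 1
  printf '%s\n' "$json" |
    jq -r '.files[] | "\(.key)\t\(.links.self)"' |
    while IFS="$(printf '\t')" read -r key url; do
      echo "Downloading $key"
      curl -fsSL -o "$dest/$key" "$url" || exit 1   # exits the loop subshell
    done
}
```

For example, fetch_zenodo_files "$ZENODO_DOI" data would download the record's files into a placeholder ./data directory.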

a) I have conda available on my system and want to reproduce the analysis


The only prerequisite in this case is a machine with conda installed.

IMPORTANT NOTE: Before executing the bash script, make sure your terminal is initialised for conda. You can do so by running one of the following commands, depending on your default shell:

i) for zsh

## Initialise the terminal for use of conda
conda init zsh && exec -l zsh

ii) for bash

## Initialise the terminal for use of conda
conda init bash && exec -l bash
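If you prefer not to choose between the two commands by hand, the default shell can be read from $SHELL. A small sketch, assuming $SHELL points at your login shell; the helper name conda_init_target is hypothetical, and only the two shells covered above are handled.

```bash
# Hypothetical helper: map a shell path (default: $SHELL) to the argument
# for `conda init`. Only bash and zsh are supported; anything else fails.
conda_init_target() {
  local shell_path="${1:-${SHELL:-}}"
  case "$(basename "$shell_path")" in
    bash) echo bash ;;
    zsh)  echo zsh ;;
    *)    echo "unsupported shell: $shell_path" >&2; return 1 ;;
  esac
}
```

For example: target="$(conda_init_target)" && conda init "$target" && exec -l "$target"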

Copy the following commands into your terminal to reproduce the Jupyter Notebooks analysis:

git clone https://github.com/TheJacksonLaboratory/sbas.git
cd sbas
git checkout adds-rendered-notebooks
conda init zsh && exec -l zsh

After this has finished, run the bash script reproduce.sh:

time bash ./reproduce.sh
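Because the full run can take a long time, it may help to check prerequisites first so that a missing tool fails immediately rather than partway through. A sketch under assumptions: the list of required commands is a guess (conda and git are the ones used in the steps above), and the helper names are hypothetical.

```bash
# Succeed if the command is on PATH, otherwise report it on stderr.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

# Check every command given as an argument and that we are in the sbas
# checkout; report all problems before returning non-zero.
preflight() {
  local rc=0 cmd
  for cmd in "$@"; do
    require_cmd "$cmd" || rc=1
  done
  [ -f ./reproduce.sh ] || { echo "reproduce.sh not found: run from the sbas checkout" >&2; rc=1; }
  return "$rc"
}
```

For example: preflight conda git && time bash ./reproduce.sh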

b) I have docker but not conda available on my system and want to reproduce the analysis


The only prerequisite in this case is a machine with docker installed.

You can use a Docker image that provides conda, for example continuumio/miniconda3. Start the container by copying the following command into your terminal:

## Start the container, mounting the current directory so that input and output data are available in $PWD
docker run -v "$PWD":"$PWD" -w "$PWD" -it continuumio/miniconda3

Continue running the commands below (inside the docker container):

## Initialise the terminal for use of conda (zsh is not installed in the continuumio/miniconda3 image, so use bash)
conda init bash && exec -l bash

Then run the following commands, still inside the container, to reproduce the Jupyter Notebooks analysis:

git clone https://github.com/TheJacksonLaboratory/sbas.git
cd sbas
git checkout adds-rendered-notebooks
conda init bash && exec -l bash

After this has finished, run the bash script reproduce.sh:

time bash ./reproduce.sh
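Docker containers run as root unless told otherwise, so files that reproduce.sh writes into the mounted directory will usually be owned by root on the host. A minimal sketch to hand them back, run inside the container once the script has finished. It assumes GNU stat is available in the container and that the clone lives in $PWD/sbas; the helper name restore_ownership is hypothetical.

```bash
# Give everything under the target back to the user and group that own the
# mounted host directory. ASSUMES GNU stat (-c format option).
restore_ownership() {
  local mount_dir="${1:-$PWD}" target="${2:-$PWD/sbas}" owner
  owner="$(stat -c '%u:%g' "$mount_dir")" || return 1
  chown -R "$owner" "$target"
}
```

Run it from the directory you mounted (the container's working directory), for example restore_ownership "$PWD" "$PWD/sbas".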