Welcome to the repository for RNA Bind-N-Seq Analysis!
The RBNS pipeline is a set of bioinformatics tools for analyzing data from high-throughput sequencing experiments of protein-bound RNAs. The current version includes read splitting, calculation of kmer frequencies and enrichments, QC metrics, production of sequence motif logos, and RNA secondary structure analysis. Functions to detect bipartite motifs and flanking nucleotide context preferences are forthcoming.
The RBNS pipeline is designed to run on a Linux computing cluster, optionally with jobs parallelized by submitting them to a PBS/Torque queue. In addition, it requires the following software to be installed in your computing environment:
- Python (tested on version 2.7.11)
- The Miniconda or Anaconda package manager (if you install it now, be sure to run 'source ~/.bashrc' afterward so that 'conda' is on your $PATH).
- The Weblogo program (if sequence motif logos are to be produced).
- The RNAfold program (if RNA secondary structure analysis is to be performed).
NOTE: The forgi library (used if RNA secondary structure analysis is performed) is included in the conda environment.
If you need help installing any of these tools, see the detailed documentation. When installing dependencies, make sure you agree to the license of each software tool.
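Before installing, it can help to confirm which of the tools listed above are already on your $PATH. The following is a minimal sketch, not part of the pipeline: the executable names (`conda`, `weblogo`, `RNAfold`) are assumptions about how the tools are installed, and `shutil.which` requires Python 3.3 or later, so run it with a Python 3 interpreter rather than the Python 2.7 used by the pipeline.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["conda"]
OPTIONAL = ["weblogo", "RNAfold"]  # logos and secondary structure analysis


def find_missing(names, which=shutil.which):
    """Return the names for which no executable is found on $PATH."""
    return [name for name in names if which(name) is None]


def report():
    """Print one line per tool that could not be found."""
    for name in find_missing(REQUIRED):
        print("missing required tool: %s" % name)
    for name in find_missing(OPTIONAL):
        print("missing optional tool: %s" % name)
```

Calling `report()` prints nothing when every tool is found.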
The easiest way to get the RBNS pipeline software is to clone this repository. This will ensure you always have access to the latest version.
https://[email protected]/pfreese/rbns_pipeline.git
After cloning the repository, run the included installation script:
cd rbns_pipeline
./install.sh
This will use the Conda package manager to ensure that you have all of the key dependencies. This step will create a stable environment in which to run analysis jobs. This approach will keep your results reproducible and will not affect other software that you have installed on your system. While you can also use the RBNS pipeline without this step by installing all necessary packages manually, it is not recommended.
In this version, the RBNS_pipeline is able to analyze RBNS (Dominguez et al., 2017) data. You can find example input files in the test_data/ directory within the repository. These were derived from experiments that assayed the RBFOX3 protein.
Once the script has finished running, you can find the output from the pipeline in the results_dir given in the settings.RBFOX3.json file.
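To locate the output programmatically, one option is to read results_dir back out of the settings file. This is a minimal sketch that assumes results_dir is a top-level key of the JSON file; check settings.RBFOX3.json for the actual layout. The helper names are hypothetical and not part of the pipeline.

```python
import json
import os


def get_results_dir(settings_path):
    """Return the results_dir value from a settings .json file, or None."""
    with open(settings_path) as handle:
        settings = json.load(handle)
    # Assumption: results_dir is stored at the top level of the JSON object.
    return settings.get("results_dir")


def list_outputs(results_dir):
    """Return the sorted names of the entries directly inside results_dir."""
    return sorted(os.listdir(results_dir))
```

For example, `list_outputs(get_results_dir("settings.RBFOX3.json"))` would list the top-level output entries, assuming the settings file is in the current directory.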
The inputs to the RBNS_pipeline are described in more detail here. Here is a quick summary:
- A settings .json file describing the experiment, different libraries assayed, and what counts & optional additional functionalities are to be performed by the pipeline.
- A FASTQ file containing the multiplexed sequencing reads from the different libraries to be split & analyzed.
You can find examples of all of these files in the test_data/ folder. It is probably easiest to just take a look at these files first. You can run the RBNS_pipeline on this example and reproduce the logo below.
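To make the idea of splitting a multiplexed FASTQ file concrete, here is a minimal sketch. It is not the pipeline's implementation. It assumes four-line FASTQ records and that each read's barcode is the last colon-separated field of its header line (as in Illumina CASAVA 1.8+ headers); the barcode-to-library mapping and the library names are hypothetical.

```python
import itertools
from collections import defaultdict


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(itertools.islice(handle, 4))
        if len(record) < 4:
            break
        header, seq, _plus, qual = [line.rstrip("\n") for line in record]
        yield header, seq, qual


def split_by_barcode(handle, barcode_to_library):
    """Group read sequences by library, keyed on the barcode in each header."""
    reads = defaultdict(list)
    for header, seq, _qual in read_fastq(handle):
        # Assumption: the barcode is the last colon-separated header field.
        barcode = header.rsplit(":", 1)[-1]
        library = barcode_to_library.get(barcode, "unassigned")
        reads[library].append(seq)
    return dict(reads)
```

Reads whose barcode is not in the mapping are collected under "unassigned" rather than dropped, which makes it easy to check how many reads failed to match.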
The complete set of output files is described here. Briefly, output should include:
- Split read files for each library, including files containing QC stats about the number of reads in each library & library complexity.
- Enrichment tables of kmers
- Pickled intermediate files of kmer counts & frequencies.
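As an illustration of what the kmer frequency and enrichment tables contain, the sketch below computes kmer frequencies for two sets of reads and takes their ratio. It assumes the definition commonly used for RBNS, in which a kmer's enrichment is its frequency in a protein-bound library divided by its frequency in the input library. The function names are hypothetical, and the pipeline's own counting may differ in detail.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return {kmer: fraction of all kmer occurrences} over all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = float(sum(counts.values()))
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_enrichments(pulldown_reads, input_reads, k):
    """Return {kmer: pulldown frequency / input frequency}.

    Only kmers observed in the input library are reported, which avoids
    dividing by zero (a simplifying assumption of this sketch).
    """
    pulldown = kmer_frequencies(pulldown_reads, k)
    background = kmer_frequencies(input_reads, k)
    return {
        kmer: pulldown.get(kmer, 0.0) / freq
        for kmer, freq in background.items()
    }
```

An enrichment above 1 means the kmer is more frequent in the protein-bound library than in the input library.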
Additional output files depending on functionalities requested include:
- SKA (Streaming kmer Assignment) library fractions, plus nucleotide frequencies and kmer enrichments at different positions of the random region
- Sequence motif logos as shown below
- RNA secondary structure analyses of the top enriched kmers
An example of an output logo for RBFOX3, derived from computing the significantly enriched 5mers through an iterative procedure at a Z-score > 3 cutoff, is:
RBNS_pipeline is developed by Peter Freese and released under a GPL v3 license.
For any questions or comments about the RBNS pipeline, contact Peter Freese (pfreese [at] mit {dot} edu).
- Lambert et al. RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell. 2014 Jun 5;54(5):887-900. doi: 10.1016/j.molcel.2014.04.016
- Lambert et al. RNA Bind-n-Seq: Measuring the Binding Affinity Landscape of RNA-Binding Proteins. Methods Enzymol. 2015;558:465-93. doi: 10.1016/bs.mie.2015.02.007
- Dominguez et al. Sequence, Structure and Context Preferences of Human RNA Binding Proteins. bioRxiv. 2017 Oct 12. doi: 10.1101/201996