Python version of RNAfbinv with extra features, such as accurate sequence per motif

matandro/RNAsfbinv

RNAfbinv 2.0

RNAfbinv is a fragment-based RNA design tool. It uses a simulated annealing process to optimize a candidate sequence against a target 2D RNA structure.
Similarity is measured at the fragment level: a tree alignment is performed on nodes that represent structural motifs.
Two nodes are comparable if both are bounded motifs (stems) or both are unbounded motifs (multi loops, interior loops, bulges, etc.).
In each iteration, the target motif tree is aligned to the current candidate tree.
The best alignment, combined with other valuable features, yields a design score.
A design score of 0 is an exact fit, but candidates with higher scores can still be good.

RNAfbinv 2.0 is available on PyPI (Python 3 compatible). To install it, run pip install rnafbinv.


If you use the tool please cite:
Drory Retwitzer, M., Reinharz, V., Churkin, A., Ponty, Y., Waldispühl, J., & Barash, D. (2019) incaRNAfbinv 2.0 - A webserver and software with motif control for fragment-based design of RNAs. Bioinformatics, accepted.

Attaching Vienna RNA

The Vienna RNA package is required for RNAfbinv to work and must be installed separately.
The current version was tested with Vienna 2.4 and above. RNAfbinv will find the Vienna package if its bin directory is in PATH.
To link a specific installation of Vienna, set the VIENNA_PATH environment variable to the correct bin directory.

You can set the Vienna location in Python:

import os
os.environ['VIENNA_PATH'] = "VIENNA_BIN_DIR_PATH"

or directly via the vienna module:

from rnafbinv import vienna
vienna.set_vienna_path("VIENNA_BIN_DIR_PATH")
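
Before starting a design run it can help to confirm that the directory you configured really contains the Vienna executables. The sketch below is not part of rnafbinv; find_rnafold is a hypothetical helper that only uses the standard library to look for RNAfold (the program named in the -p option) in a given directory, in VIENNA_PATH, or in the system PATH, in that order.

```python
import os
import shutil
from typing import Optional


def find_rnafold(vienna_bin_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path of the RNAfold executable, or None if not found.

    Hypothetical helper, not part of rnafbinv. Searches vienna_bin_dir when
    given, otherwise the directory named by the VIENNA_PATH environment
    variable, otherwise the system PATH.
    """
    bin_dir = vienna_bin_dir or os.environ.get("VIENNA_PATH")
    if bin_dir:
        # Restrict the search to the explicitly configured directory.
        return shutil.which("RNAfold", path=os.path.expanduser(bin_dir))
    # No directory configured: fall back to the normal PATH lookup.
    return shutil.which("RNAfold")
```

If find_rnafold() returns None, the bin directory is probably wrong or Vienna is not installed.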

Usage

The design process can be run using the following code:

from rnafbinv import RNAfbinvCL
RNAfbinvCL.main(command_line_arguments)
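
The snippet above does not say what command_line_arguments looks like. A minimal sketch, assuming main accepts a list of argument strings in the same form as sys.argv[1:] (this is an assumption, not confirmed here), is to assemble the list from the documented flags. build_arguments and the file name design_input.ini are hypothetical names used only for illustration.

```python
from typing import List, Optional


def build_arguments(input_file: str,
                    iterations: int = 100,
                    seed: Optional[int] = None,
                    structure_type: str = "MFE") -> List[str]:
    """Assemble command line tokens from options listed in the usage text."""
    if structure_type not in ("MFE", "centroid"):
        raise ValueError("structure_type must be 'MFE' or 'centroid'")
    args = ["-f", input_file, "-i", str(iterations), "-p", structure_type]
    if seed is not None:
        # Only pass --seed when a reproducible run is wanted.
        args += ["--seed", str(seed)]
    return args


command_line_arguments = build_arguments("design_input.ini", iterations=500, seed=42)

# Assumption: main() parses this list like argparse parses sys.argv[1:].
# from rnafbinv import RNAfbinvCL
# RNAfbinvCL.main(command_line_arguments)
```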

To generate a tree for a specific sequence / structure, where structure is given in dot-bracket notation and sequence is an IUPAC string of the same length:

from rnafbinv import shapiro_tree_aligner
shapiro_tree_aligner.get_tree(structure, sequence)
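
Since the structure and sequence must agree in length and form, a quick sanity check before building a tree can save a confusing failure later. The following is a sketch only; check_design_input is a hypothetical helper, not an rnafbinv function. It assumes plain dot-bracket notation with only '(', ')' and '.', and it accepts lower case letters because the --seq_motif option uses them to mark regions.

```python
# IUPAC nucleotide codes (both U and T are accepted here).
IUPAC_CODES = set("ACGUTRYSWKMBDHVN")


def check_design_input(structure: str, sequence: str) -> None:
    """Raise ValueError if the (structure, sequence) pair is malformed."""
    if len(structure) != len(sequence):
        raise ValueError("structure and sequence must have the same length")
    depth = 0  # number of currently open base pairs
    for index, symbol in enumerate(structure):
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at position {index}")
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if depth != 0:
        raise ValueError("unmatched '(' in structure")
    bad = set(sequence.upper()) - IUPAC_CODES
    if bad:
        raise ValueError(f"non-IUPAC characters in sequence: {sorted(bad)}")


check_design_input("((...))", "GCNNNGC")  # passes silently
```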

To compare two trees and score them (alignment_rules is optional and has a default value):

from rnafbinv import shapiro_tree_aligner
shapiro_tree_aligner.align_trees(source_tree, tree_target, alignment_rules)

GUI / Command line

You can download the RNAfbinv wrapper from the RNAfbinv2.0 git repository.
The main folder includes Python code to run the GUI / command line and a configuration file:

  • RNAfbinv.py - A GUI wrapper for RNAfbinv2.0
  • RNAfbinvCL.py - A command line wrapper for RNAfbinv2.0
  • varna_generator.py (required) - Used to generate images based on VARNA
  • config.ini (required) - Configuration file with paths to required software (information below)
  • img folder with NoImage.png (required) - Used in the GUI as a placeholder

If you remove the VARNA jar or do not have Java installed, images will not be generated but the design process will proceed normally.

To specify the Vienna package binary folder, update the 'VIENNA' parameter in config.ini (or set the VIENNA_PATH environment variable).
To specify the Java binary folder, update the 'JAVA' parameter in config.ini (or set the JAVA_PATH environment variable).
To specify VARNA's jar file, update the 'VARNA' parameter in config.ini (or set the VARNA_PATH environment variable).
If the Java or Vienna package binaries are already on your system path, you may leave the corresponding parameter empty.

Example of a valid config.ini file for a system where Java is installed and on the system's path:

[PATH]
VIENNA=~/ViennaRNA/bin/
#JAVA=
VARNA=~/VARNA/VARNAv3-93.jar
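
To check what such a file resolves to on your machine, you can read it with Python's standard configparser. This is a sketch, not the loader RNAfbinv itself uses; read_paths is a hypothetical helper, and the expansion of "~" to the home directory is an assumption made here for convenience.

```python
import configparser
import os

EXAMPLE_CONFIG = """\
[PATH]
VIENNA=~/ViennaRNA/bin/
#JAVA=
VARNA=~/VARNA/VARNAv3-93.jar
"""


def read_paths(config_text: str) -> dict:
    """Return the VIENNA, JAVA and VARNA entries of the [PATH] section.

    Missing, commented out, or empty entries are reported as None.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    paths = {}
    for key in ("VIENNA", "JAVA", "VARNA"):
        # Option lookup is case-insensitive; '#' lines are treated as comments.
        value = parser.get("PATH", key, fallback="")
        paths[key] = os.path.expanduser(value) if value else None
    return paths


paths = read_paths(EXAMPLE_CONFIG)
```

Here paths["JAVA"] is None because the line is commented out, which matches the case where Java is found through the system path.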

Command line arguments:

usage: RNAfbinvCL.py [-h] [-l LOG_OUTPUT] [--verbose | --debug]
                     [-p {MFE,centroid}] [-i ITERATIONS] [--seed SEED]
                     [-t LOOK_AHEAD] [--reduced_bi REDUCED_BI] [-e]
                     [--seq_motif] [-m MOTIF_LIST] [-s STARTING_SEQUENCE | -r]
                     [--length LENGTH] [-f INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -l LOG_OUTPUT, --log_output LOG_OUTPUT
                        Path to output log file. (default: None)
  --verbose             Increase output verbosity. (default: False)
  --debug               Debug level logging. (default: False)
  -p {MFE,centroid}, --structure_type {MFE,centroid}
                        uses RNAfold centroid or MFE folding. (default: MFE)
  -i ITERATIONS, --iterations ITERATIONS
                        Sets the number of simulated annealing iterations.
                        (default: 100)
  --seed SEED           Random seed used in the random number generator.
                        (default: None)
  -t LOOK_AHEAD, --look_ahead LOOK_AHEAD
                        Number of look head mutation attempts for each
                        iteration. (default: 4)
  --reduced_bi REDUCED_BI
                        Remove extra penalty for removal or addition of bulges
                        and interior loops under the given size. Alignment
                        penalties still occur. (default: 0)
  -e, --circular        Designs a circular RNA. (default: False)
  --seq_motif           Enables increased penalty for insertion or deletions
                        within marked regions (lower case characters in
                        sequence constraint). The feature was added to control
                        multi base sequence constraints (sequence motifs).
                        Only valid within a specific structural motif.
                        (default: False)
  -m MOTIF_LIST, --motif_list MOTIF_LIST
                        A comma separated list of motifs that are targeted for
                        preservation with size.Single motif format: <motif
                        No>[M|H|E|I|S|B]<motif No of bases>. Use
                        rnafbinv.ListMotifs.list_motifs(structure) to retrieve
                        a list of legal motifs for a given structure.
                        (default: [])
  -s STARTING_SEQUENCE, --starting_sequence STARTING_SEQUENCE
                        The initial sequence for the simulated annealing
                        process in IUPAC nucleotide codes. (default: None)
  -r, --random_start    Start simulated annealing with a random sequence.
                        (default: False)
  --length LENGTH       Maximum variation in result length compared to target
                        structure. (default: 0)
  -f INPUT_FILE         Path of ini file that includes mandatory information.
                        Some options can also be set via file. command line
                        options take precedence. (default: None)
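
The -m option packs each motif into a compact token. As a sketch of how that format can be read, the hypothetical helper below splits a comma separated list into (motif number, motif letter, number of bases) tuples, following the pattern given in the help text. The values "3M12,5H4" are made up for illustration; legal motifs for a real structure should come from rnafbinv.ListMotifs.list_motifs(structure).

```python
import re
from typing import List, Tuple

# <motif No>[M|H|E|I|S|B]<motif No of bases>, as described in the help text.
MOTIF_PATTERN = re.compile(r"(\d+)([MHEISB])(\d+)")


def parse_motif_list(motif_list: str) -> List[Tuple[int, str, int]]:
    """Split a comma separated motif list into (number, letter, bases) tuples."""
    motifs = []
    for item in motif_list.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        match = MOTIF_PATTERN.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed motif: {item!r}")
        number, letter, bases = match.groups()
        motifs.append((int(number), letter, int(bases)))
    return motifs


motifs = parse_motif_list("3M12,5H4")
```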

Input file format (the '-f' parameter):

# mandatory
TARGET_STRUCTURE=<target structure>
TARGET_SEQUENCE=<target sequence>
# optional
TARGET_ENERGY=<target energy>
TARGET_MR=<target mutational robustness>
SEED=<random seed>
STARTING_SEQUENCE=<starting sequence>
ITERATION=<number of simulated annealing iterations>
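
An input file in this format can be generated from a script. The sketch below uses a hypothetical helper, render_input_file, that only emits the keys listed above and rejects anything else. The structure, sequence, and option values are illustrative.

```python
OPTIONAL_KEYS = ("TARGET_ENERGY", "TARGET_MR", "SEED",
                 "STARTING_SEQUENCE", "ITERATION")


def render_input_file(target_structure: str, target_sequence: str,
                      **optional) -> str:
    """Return the text of an input file for the '-f' parameter."""
    unknown = set(optional) - set(OPTIONAL_KEYS)
    if unknown:
        raise ValueError(f"unsupported keys: {sorted(unknown)}")
    lines = [f"TARGET_STRUCTURE={target_structure}",
             f"TARGET_SEQUENCE={target_sequence}"]
    # Emit optional keys in the order they are documented.
    for key in OPTIONAL_KEYS:
        if key in optional:
            lines.append(f"{key}={optional[key]}")
    return "\n".join(lines) + "\n"


text = render_input_file("((((...))))", "NNNNNNNNNNN", SEED=42, ITERATION=500)
```

Writing text to a file and passing its path with -f gives a run whose seed and iteration count are recorded alongside the target.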

Webserver

RNAfbinv2.0 is also available as a web server combined with incaRNAtion. The web server generates starting seeds using the incaRNAtion global sampling algorithm.
The seed sequences are then sent to RNAfbinv2.0 for design. See the incaRNAfbinv web server.

The Tree class

The tree alignment is written in an object-oriented style (found in tree_aligner.py). The Tree class generates the best alignment between two trees using a dynamic programming algorithm based on the classic Jiang-Wang-Zhang solution. The TreeValue class is extended to solve the fragment-based comparison of two "shapiro trees", but it can solve other problems depending on the user's needs.
To use the code, define a TreeValue class that specifies the value of a single node in the tree. To align the trees, implement an AlignmentObject class, which is a container that holds four functions:

  • minmax_func - Function that receives two floating point values and returns the better of the two (example: min, max)
  • delete_func - Function that receives a TreeValue and a boolean stating whether the value is from the target or the source tree, and returns a score representing the cost of the deletion and an optional AlignmentResult object that describes the deletion
  • cmp_func - Function that receives two TreeValue objects and compares them; it returns a score and an optional AlignmentResult object that describes the comparison
  • merge_func - Function that receives two TreeValue objects and returns a new TreeValue representing the merge of the two

A reference implementation can be found in the file: shapiro_tree_aligner.py
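
To make the four roles concrete, here is a toy sketch that does not use the rnafbinv classes. ToyValue stands in for a TreeValue, a plain string stands in for the optional AlignmentResult, and the cost values are arbitrary assumptions. Lower scores are treated as better, in line with a design score of 0 being an exact fit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyValue:
    """Stand-in for a TreeValue: a motif kind and its size in bases."""
    kind: str  # "stem" (bounded motif) or "loop" (unbounded motif)
    size: int


def minmax_func(a: float, b: float) -> float:
    # Lower score is better, so the best of two scores is the minimum.
    return min(a, b)


def delete_func(value: ToyValue, is_target: bool) -> Tuple[float, Optional[str]]:
    # Arbitrary choice: losing a target motif costs twice as much.
    weight = 2.0 if is_target else 1.0
    origin = "target" if is_target else "source"
    return weight * value.size, f"delete {value.kind} from {origin}"


def cmp_func(a: ToyValue, b: ToyValue) -> Tuple[float, Optional[str]]:
    # Bounded and unbounded motifs are not comparable with each other.
    if a.kind != b.kind:
        return float("inf"), None
    return float(abs(a.size - b.size)), f"match {a.kind}"


def merge_func(a: ToyValue, b: ToyValue) -> ToyValue:
    # Merged node keeps the first kind and the combined size.
    return ToyValue(a.kind, a.size + b.size)
```

In the real code these four functions would be bundled in an AlignmentObject and handed to the aligner; how that container is constructed is best taken from shapiro_tree_aligner.py.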
