Local/global sequence alignment (Smith Waterman algorithm), edit distance/similarity (Needleman Wunsch algorithm), RNA to amino acids translation

micmarty/dna-sequence-analyzer

Sequence analyzer (bioinformatics)

About

This is a set of simple command-line Python scripts implementing introductory ("101") algorithms used in bioinformatics.

Features

  • Pairwise local alignment (Smith-Waterman algorithm)
  • Pairwise global alignment (Needleman-Wunsch algorithm)
  • Edit distance and similarity (Needleman-Wunsch algorithm)
  • RNA to amino acids translation

Available commands

Usage: analyze.py [OPTIONS] SEQUENCE_A SEQUENCE_B

Options:
  -S, --summary
  -s, --similarity
  -e, --edit-distance
  -a, --alignment [global|local]
  --load-csv                      Load scores.csv and edit_cost.csv
  --help                          Show this message and exit.

Usage: translate.py [OPTIONS] [SEQUENCE]

Options:
  -i, --input-file FILE  Path to text file containing long nucleotide sequences (1 sequence = 1 line)
  --help                 Show this message and exit.

Usage examples

python analyze.py AGCT AGGT --summary
python analyze.py AGCT AGGT --similarity
python analyze.py AGCT AGGT --edit-distance
python analyze.py AGCT AGGT --edit-distance --load-csv
python analyze.py AGCT AGGT --alignment local
python analyze.py AGCT AGGT --alignment global

python translate.py AUGACGGAGCUUCGGAGCUAG
python translate.py --input-file rna.txt
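
As a rough illustration of what RNA to amino acid translation involves, here is a minimal sketch using the standard genetic code. It is not the code from translate.py: the function name `translate`, the choice to start at the first AUG codon, and the choice to stop at the first stop codon or unknown codon are all assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate RNA into one-letter amino acid codes (hypothetical helper)."""
    rna = rna.strip().upper()
    start = rna.find("AUG")  # assumption: translation begins at the first start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino = CODON_TABLE.get(rna[i:i + 3])
        if amino is None or amino == "*":  # assumption: stop at a stop or unknown codon
            break
        protein.append(amino)
    return "".join(protein)


print(translate("AUGACGGAGCUUCGGAGCUAG"))  # MTELRS
```

For an input file with one sequence per line, the same function could be applied to each line in turn.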

Output examples:

python analyze.py ACCC ACCT -e

[[0 1 2 3 4]
 [1 0 1 2 3]
 [2 1 0 1 2]
 [3 2 1 0 1]
 [4 3 2 1 1]]
[['' 'A' 'C' 'C' 'T']
 ['A' '↖' '←' '←' '←']
 ['C' '↑' '↖' '↖' '←']
 ['C' '↑' '↖' '↖' '←']
 ['C' '↑' '↖' '↖' '↖']]
[Edit distance] Cost=1

python translate.py --input-file rna.txt

MNACFSNLCYESKSIGG
MSDTLSQRLRASLGAIRIAFNLGRSAELD
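
The edit-distance output for ACCC and ACCT shown above can be reproduced with a short dynamic-programming sketch. This is not the project's implementation: the function name `edit_distance_tables` is hypothetical, plain lists are used instead of numpy, and the tie-breaking order (diagonal, then up, then left) is an assumption chosen because it yields the same arrows as the printed output.

```python
def edit_distance_tables(seq_a, seq_b, mismatch=1, gap=1):
    """Return (cost, arrows) tables for a Needleman-Wunsch style edit distance."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    cost = [[0] * cols for _ in range(rows)]
    arrows = [[""] * cols for _ in range(rows)]

    # First column and first row: only gaps are possible.
    for i in range(1, rows):
        cost[i][0], arrows[i][0] = i * gap, "↑"
    for j in range(1, cols):
        cost[0][j], arrows[0][j] = j * gap, "←"

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if seq_a[i - 1] == seq_b[j - 1] else mismatch
            candidates = [
                (cost[i - 1][j - 1] + substitution, "↖"),  # match or mismatch
                (cost[i - 1][j] + gap, "↑"),               # gap in seq_b
                (cost[i][j - 1] + gap, "←"),               # gap in seq_a
            ]
            # min() keeps the first of equal candidates, so the diagonal wins ties.
            cost[i][j], arrows[i][j] = min(candidates, key=lambda c: c[0])
    return cost, arrows


cost, arrows = edit_distance_tables("ACCC", "ACCT")
for row in cost:
    print(row)
print("[Edit distance] Cost=%d" % cost[-1][-1])
```

Unlike the tool's output, this sketch stores arrows in the border cells rather than the sequence letters; the inner cells are the ones to compare.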

Requirements

  • Python 3.7 (type annotations)
  • numpy (storing matrices)
  • pandas (loading CSV into DataFrame)
  • click (CLI interface)

We recommend using a conda/virtualenv/pyenv environment (this step is optional):

conda create --name sequence-analyzer-env python=3.7 pip

Requirements installation

pip install -r requirements.txt

Customization

Default scoring values:

# SequenceAnalyzer.py
self.scoring_sys = ScoringSystem(match=1, mismatch=-1, gap=-1)
self.edit_cost_sys = ScoringSystem(match=0, mismatch=1, gap=1)
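
To show how such scoring values feed into an alignment score, here is a minimal sketch. The `ScoringSystem` dataclass below is a hypothetical stand-in that only mirrors the constructor arguments shown above, and `alignment_score` is an illustrative name, not a function from this repository. With `local=False` it follows the Needleman-Wunsch recurrence; with `local=True` it follows the Smith-Waterman variant, where cells never drop below zero and the best cell anywhere in the table is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringSystem:
    """Hypothetical stand-in; the real class lives in the project's sources."""
    match: int = 1
    mismatch: int = -1
    gap: int = -1

    def score(self, a: str, b: str) -> int:
        return self.match if a == b else self.mismatch


def alignment_score(seq_a, seq_b, scoring, local=False):
    """Similarity score of the best global (default) or local alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts at zero.
        for i in range(1, rows):
            table[i][0] = i * scoring.gap
        for j in range(1, cols):
            table[0][j] = j * scoring.gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            cell = max(
                table[i - 1][j - 1] + scoring.score(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + scoring.gap,
                table[i][j - 1] + scoring.gap,
            )
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            table[i][j] = cell
    return best if local else table[-1][-1]


print(alignment_score("AGCT", "AGGT", ScoringSystem()))  # 2
```

With the default values, AGCT against AGGT scores 2: three matches and one mismatch.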

You can set up your own similarity and edit-cost matrices by adding the --load-csv flag.

(With this flag, the two files below are read by default.)

scores.csv

[image: contents of scores.csv]

edit_cost.csv

[image: contents of edit_cost.csv]

(Note: if any of your sequences contains invalid symbols, default values from ScoringSystem will be used instead)
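
The fallback described in the note can be pictured roughly as follows. Everything here is an assumption made for illustration: the CSV layout (a header row and a first column of symbols), the example values, and the helper names `parse_matrix` and `pick_scorer` do not come from this repository. The sketch also uses the standard-library `csv` module to stay self-contained, whereas the project itself loads CSV files with pandas.

```python
import csv
import io

# Made-up example values in an assumed layout; the real scores.csv may differ.
EXAMPLE_CSV = """\
,A,G,C,T
A,2,-1,-2,-2
G,-1,2,-2,-2
C,-2,-2,2,-1
T,-2,-2,-1,2
"""


def parse_matrix(text):
    """Parse a square symbol-by-symbol matrix into nested dicts."""
    rows = list(csv.reader(io.StringIO(text)))
    columns = rows[0][1:]
    return {row[0]: dict(zip(columns, map(int, row[1:]))) for row in rows[1:] if row}


def pick_scorer(matrix, seq_a, seq_b, match=1, mismatch=-1):
    """Use the matrix only if every symbol in both sequences appears in it."""
    if set(seq_a) | set(seq_b) <= set(matrix):
        return lambda x, y: matrix[x][y]
    # Otherwise fall back to the default match/mismatch values.
    return lambda x, y: match if x == y else mismatch


matrix = parse_matrix(EXAMPLE_CSV)
print(pick_scorer(matrix, "AGCT", "AGGT")("A", "A"))  # 2, taken from the matrix
print(pick_scorer(matrix, "AGNT", "AGGT")("A", "A"))  # 1, default: N is not in the matrix
```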

Credits

License

Feel free to play around with our code. If you see any bugs, please tell us about them in issues ❤️!

Apache License 2.0
