This is a set of simple command-line Python scripts implementing basic ("101"-level) algorithms used in bioinformatics.
- Pairwise local alignment (Smith-Waterman algorithm)
- Pairwise global alignment (Needleman-Wunsch algorithm)
- Edit distance and similarity (Needleman-Wunsch algorithm)
- RNA to amino acids translation
Usage: analyze.py [OPTIONS] SEQUENCE_A SEQUENCE_B
Options:
-S, --summary
-s, --similarity
-e, --edit-distance
-a, --alignment [global|local]
--load-csv Load scores.csv and edit_cost.csv
--help Show this message and exit.
Usage: translate.py [OPTIONS] [SEQUENCE]
Options:
-i, --input-file FILE Path to text file containing long nucleotide sequences (1 sequence = 1 line)
--help Show this message and exit.
python analyze.py AGCT AGGT --summary
python analyze.py AGCT AGGT --similarity
python analyze.py AGCT AGGT --edit-distance
python analyze.py AGCT AGGT --edit-distance --load-csv
python analyze.py AGCT AGGT --alignment local
python analyze.py AGCT AGGT --alignment global
python translate.py AUGACGGAGCUUCGGAGCUAG
python translate.py --input-file rna.txt
Output examples:
python analyze.py ACCC ACCT -e
[[0 1 2 3 4]
[1 0 1 2 3]
[2 1 0 1 2]
[3 2 1 0 1]
[4 3 2 1 1]]
[['' 'A' 'C' 'C' 'T']
['A' '↖' '←' '←' '←']
['C' '↑' '↖' '↖' '←']
['C' '↑' '↖' '↖' '←']
['C' '↑' '↖' '↖' '↖']]
[Edit distance] Cost=1
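The cost matrix above can be reproduced with a short dynamic-programming sketch. This is not the code from analyze.py; the function name `edit_distance_matrix` is hypothetical, and the costs assume the default edit cost system (match=0, mismatch=1, gap=1).

```python
from typing import List


def edit_distance_matrix(a: str, b: str, mismatch: int = 1, gap: int = 1) -> List[List[int]]:
    """Fill the (len(a)+1) x (len(b)+1) edit-cost table.

    Rows follow sequence `a`, columns follow sequence `b`.
    A matching pair of symbols costs 0 (assumed default).
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = min(
                table[i - 1][j - 1] + substitution,  # diagonal: match or mismatch
                table[i - 1][j] + gap,               # up: gap in `b`
                table[i][j - 1] + gap,               # left: gap in `a`
            )
    return table


if __name__ == "__main__":
    matrix = edit_distance_matrix("ACCC", "ACCT")
    for row in matrix:
        print(row)
    print("Cost =", matrix[-1][-1])
```

The bottom-right cell holds the edit distance; the arrows in the second matrix above record which of the three neighbours produced each minimum.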
python translate.py --input-file rna.txt
MNACFSNLCYESKSIGG
MSDTLSQRLRASLGAIRIAFNLGRSAELD
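For readers curious how RNA-to-amino-acid translation can work, here is a minimal sketch using the standard genetic code. It is not the implementation in translate.py: the names `CODON_TABLE` and `translate` are hypothetical, and it assumes reading starts at the first nucleotide and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon until a stop codon or the end.

    Assumptions: the reading frame starts at index 0, and trailing
    nucleotides that do not form a full codon are ignored.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGACGGAGCUUCGGAGCUAG"))
```

A symbol outside `UCAG` raises a `KeyError` in this sketch; the real script may handle invalid input differently.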
- Python 3.7 (type annotations)
- numpy (storing matrices)
- pandas (loading CSV into DataFrame)
- click (CLI interface)
We recommend using a conda/virtualenv/pyenv environment (this step is optional)
conda create --name sequence-analyzer-env python=3.7 pip
pip install -r requirements.txt
Default scoring values:
# SequenceAnalyzer.py
self.scoring_sys = ScoringSystem(match=1, mismatch=-1, gap=-1)
self.edit_cost_sys = ScoringSystem(match=0, mismatch=1, gap=1)
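To illustrate how the default similarity values (match=1, mismatch=-1, gap=-1) feed into the two alignment algorithms, here is a score-only sketch. The `Scoring` tuple and both functions are hypothetical stand-ins, not the project's `ScoringSystem` class, and they compute only the optimal score, not the alignment itself.

```python
from typing import NamedTuple


class Scoring(NamedTuple):
    """Hypothetical stand-in holding the default similarity values."""
    match: int = 1
    mismatch: int = -1
    gap: int = -1


def global_score(a: str, b: str, s: Scoring = Scoring()) -> int:
    """Needleman-Wunsch: best score for aligning all of `a` with all of `b`."""
    prev = [j * s.gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * s.gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (s.match if a[i - 1] == b[j - 1] else s.mismatch)
            curr.append(max(diag, prev[j] + s.gap, curr[j - 1] + s.gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, s: Scoring = Scoring()) -> int:
    """Smith-Waterman: best score over all pairs of substrings.

    Cells are clamped at 0, so a poor prefix never penalises a later match.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (s.match if a[i - 1] == b[j - 1] else s.mismatch)
            cell = max(0, diag, prev[j] + s.gap, curr[j - 1] + s.gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("AGCT", "AGGT"), local_score("AGCT", "AGGT"))
```

The only differences between the two functions are the zero-initialised borders, the clamp at 0, and taking the maximum over the whole table instead of the last cell.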
You can set up your own similarity and edit cost matrices by adding the --load-csv flag (these files are read by default):
scores.csv
edit_cost.csv
(Note: if any of your sequences contains invalid symbols, the default values from ScoringSystem will be used instead.)
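The fallback described in the note can be sketched as follows. Everything here is an assumption made for illustration: the layout of the CSV (a square matrix with symbols as the header row and first column), the sample values, and the names `read_matrix` and `make_scorer`. The sketch also uses the standard `csv` module instead of pandas so that it runs without extra dependencies.

```python
import csv
import io
from typing import Callable, Dict, Iterable

# Hypothetical file contents; the real scores.csv may be laid out differently.
SAMPLE_CSV = """,A,C,G,T
A,2,-1,-1,-1
C,-1,2,-1,-1
G,-1,-1,2,-1
T,-1,-1,-1,2
"""


def read_matrix(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse a labelled square matrix into a nested dict: matrix[row][col]."""
    reader = csv.reader(lines)
    header = next(reader)[1:]  # skip the empty top-left cell
    return {row[0]: dict(zip(header, map(int, row[1:]))) for row in reader if row}


def make_scorer(
    matrix: Dict[str, Dict[str, int]],
    seq_a: str,
    seq_b: str,
    match: int = 1,
    mismatch: int = -1,
) -> Callable[[str, str], int]:
    """Use the matrix only if it covers every symbol in both sequences.

    Otherwise fall back to the default match/mismatch values.
    """
    symbols = set(seq_a) | set(seq_b)
    if symbols <= set(matrix):
        return lambda x, y: matrix[x][y]
    return lambda x, y: match if x == y else mismatch


if __name__ == "__main__":
    matrix = read_matrix(io.StringIO(SAMPLE_CSV))
    print(make_scorer(matrix, "AGCT", "AGGT")("A", "A"))  # value from the matrix
    print(make_scorer(matrix, "AGNT", "AGGT")("A", "A"))  # 'N' is not covered, default used
```

How gap penalties are stored in the CSV files is not covered here, since the source does not say.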
- Michał Martyniak (@micmarty)
- Artur Śliwa (@asliwa)
Feel free to play around with our code. If you see any bugs, please tell us about them in the issues ❤️!