BioStockholm.jl

A Julia parser for the Stockholm file format (.sto), which stores multiple sequence alignments of protein, RNA, or DNA sequences and is used by databases such as Pfam and Rfam. The package uses Automa.jl under the hood to generate a finite-state-machine parser.
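
For orientation, a Stockholm file starts with a `# STOCKHOLM 1.0` header, may contain `#=GF`, `#=GS`, `#=GC`, and `#=GR` annotation lines, lists sequences as `name sequence` lines, and ends with `//`. The sketch below illustrates that layout with a naive line-based reader in plain Julia. `naive_stockholm` and the toy alignment are hypothetical and exist only for illustration; this is not part of the package and not how its Automa.jl-generated parser works.

```julia
# Hypothetical illustration only: a naive line-based reader, NOT the
# Automa.jl-based parser that BioStockholm uses.
sto = """
# STOCKHOLM 1.0
#=GF DE A toy alignment
human    ACACGCGAAA.GCGCAA
chimp    GAAUGUGAAAAACACCA
#=GC SS_cons ...<<<.....>>>...
//
"""

function naive_stockholm(text::AbstractString)
    seq = Dict{String,String}()   # sequence name => aligned sequence
    gc  = Dict{String,String}()   # per-column annotations (GC lines)
    for raw in eachline(IOBuffer(text))
        line = strip(raw)
        if isempty(line) || line == "//"
            continue                      # blank line or end-of-alignment marker
        elseif startswith(line, "#=GC")
            fields = split(line)          # ["#=GC", feature, data]
            feature, data = fields[2], fields[3]
            gc[feature] = get(gc, feature, "") * data
        elseif startswith(line, "#")
            continue                      # header and other annotations are skipped here
        else
            fields = split(line)          # [name, data]
            name, data = fields[1], fields[2]
            seq[name] = get(seq, name, "") * data
        end
    end
    return seq, gc
end

seq, gc = naive_stockholm(sto)
```

Appending to any existing entry (rather than overwriting it) lets the same loop handle interleaved files, where one sequence is spread over several blocks. A real parser also has to handle the `#=GF`, `#=GS`, and `#=GR` annotations and malformed input, which this sketch ignores.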

Installation

Enter package mode in the Julia REPL by pressing ], then install with:

add BioStockholm

Usage

using BioStockholm

msa = MSA{Char}(;
    seq = Dict("human"   => "ACACGCGAAA.GCGCAA.CAAACGUGCACGG",
               "chimp"   => "GAAUGUGAAAAACACCA.CUCUUGAGGACCU",
               "bigfoot" => "UUGAG.UUCG..CUCGUUUUCUCGAGUACAC"),
     GC = Dict("SS_cons" => "...<<<.....>>>....<<....>>.....")
)

# load the raw text of an example Stockholm file
# (example2.sto ships with the package's tests)
msa_path = joinpath(dirname(pathof(BioStockholm)), "..",
                    "test", "example2.sto")
msa_str = read(msa_path, String)
print(msa_str)

# read an MSA from a file, or parse one from a String
msa = read(msa_path, MSA)
msa = parse(MSA, msa_str)

# write to a file
write("foobar.sto", msa)

# pretty-print
print(msa)
print(stdout, msa)

Limitations / TODO

  • when writing, long sequences and text are never split across multiple lines
  • integrate with BioJulia string types
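
The first limitation means each sequence is written on a single line, however long it is. As a rough sketch of what a wrapping writer would need, the helper below cuts an aligned sequence into fixed-width blocks. `wrap_sequence` and its `width` parameter are hypothetical and not part of BioStockholm, and the code assumes ASCII-only sequences.

```julia
# Hypothetical helper, not part of BioStockholm: split an aligned
# sequence into fixed-width blocks, as an interleaved writer would need.
# Assumes ASCII-only sequences, so byte indices equal character indices.
function wrap_sequence(seq::AbstractString, width::Integer)
    width > 0 || throw(ArgumentError("width must be positive"))
    n = lastindex(seq)
    return [seq[i:min(i + width - 1, n)] for i in 1:width:n]
end

original = "ACACGCGAAA.GCGCAA.CAAACGUGCACGG"
blocks = wrap_sequence(original, 10)
```

Joining the blocks back together reproduces the original sequence, so wrapping applied consistently to every sequence and per-column annotation would keep the alignment columns in step.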

Related packages

MIToS.jl is a package for analysing protein sequences that also supports parsing the Stockholm format (and many more things).