MIBiG taxonomy handling Python bindings
This package helps MIBiG-related Python code handle NCBI taxid lookups using NCBI taxdump data.
It contains the Python bindings for the MIBiG taxa-rs package, which manages a local JSON-based cache of the taxa of interest. Bulk database imports that read from this cache run faster than imports that parse the taxdump files directly.
To install taxa-py, run the following (assuming you are in a Python virtualenv):
pip install mibig-taxa
To create a cache file, first download the latest NCBI taxdump collection and extract it. You will also need a directory containing the MIBiG BGC entry JSON files.
Then run the following:
from mibig_taxa import TaxonCache
cache = TaxonCache()
cache.initialise(
    taxdump="path/to/taxa/rankedlineage.dmp",
    merged_id_dump="path/to/taxa/merged.dmp",
    datadir="path/to/mibig-json/data",
)
# Save the cache to a file for later use
cache.save("my_cache.json")
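For orientation, the taxdump .dmp files are plain text in which fields are separated by a tab, a pipe, and a tab, and each line ends with a tab and a pipe. The sketch below shows one way to read the first two columns (tax ID and name) of rankedlineage.dmp by hand. It is an illustration only, not how taxa-rs is implemented; the helper names split_dmp_line and names_by_id are hypothetical, and the sample line is illustrative data. The columns after the first two are not interpreted here, since their layout should be checked against the readme shipped with the taxdump.

```python
from typing import Dict, Iterable, List


def split_dmp_line(line: str) -> List[str]:
    """Split one taxdump .dmp line into its fields (hypothetical helper)."""
    line = line.rstrip("\n")
    # Every record ends with a trailing "\t|" terminator; drop it first.
    if line.endswith("\t|"):
        line = line[:-2]
    # The remaining fields are separated by "\t|\t".
    return line.split("\t|\t")


def names_by_id(lines: Iterable[str]) -> Dict[int, str]:
    """Map tax ID to taxon name using the first two columns of each line."""
    names: Dict[int, str] = {}
    for line in lines:
        fields = split_dmp_line(line)
        names[int(fields[0])] = fields[1]
    return names


# Illustrative sample line in the rankedlineage.dmp style.
sample = ["9606\t|\tHomo sapiens\t|\t\t|\tHomo\t|\tHominidae\t|\n"]
print(names_by_id(sample))
```

Doing this for every import means re-reading the full dump each time, which is the cost the JSON cache avoids.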
If you want to use the cache in a different process, simply load the cache like this:
from mibig_taxa import TaxonCache
cache = TaxonCache("my_cache.json")
# Or, if you prefer the longer form
cache = TaxonCache()
cache.load("my_cache.json")
To look up the name of a taxon by its ID, use:
from mibig_taxa import TaxonCache
cache = TaxonCache("my_cache.json")
id_to_map = 123456
name = cache.get_name_by_id(id_to_map)
print(f"Taxon with ID {id_to_map} is called {name}")
If you want to transparently support deprecated IDs, also set the allow_deprecated argument to True:
from mibig_taxa import TaxonCache
cache = TaxonCache("my_cache.json")
deprecated_id_to_map = 123456
name = cache.get_name_by_id(deprecated_id_to_map, allow_deprecated=True)
print(f"Taxon with deprecated ID {deprecated_id_to_map} is called {name}")
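To make the idea of deprecated IDs concrete: NCBI's merged.dmp lists old tax IDs next to the IDs they were merged into. The following is a minimal sketch of how such a fallback lookup could work, assuming a plain dict of names and a dict built from merged.dmp. The functions load_merged and resolve_name are hypothetical, the IDs and names are made-up sample data, and none of this is taken from the taxa-rs implementation.

```python
from typing import Dict, Iterable


def load_merged(lines: Iterable[str]) -> Dict[int, int]:
    """Build an old-ID -> new-ID map from merged.dmp style lines."""
    merged: Dict[int, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing record terminator
        old_id, new_id = line.split("\t|\t")[:2]
        merged[int(old_id)] = int(new_id)
    return merged


def resolve_name(
    tax_id: int,
    names: Dict[int, str],
    merged: Dict[int, int],
    allow_deprecated: bool = False,
) -> str:
    """Look up a name, optionally following a merged (deprecated) ID."""
    if tax_id in names:
        return names[tax_id]
    if allow_deprecated and tax_id in merged:
        return names[merged[tax_id]]
    raise KeyError(tax_id)


# Made-up sample data: ID 111 was merged into ID 222.
merged = load_merged(["111\t|\t222\t|\n"])
names = {222: "Example taxon"}
print(resolve_name(111, names, merged, allow_deprecated=True))
```

Keeping the fallback opt-in means a stale ID is not silently accepted unless the caller asks for it.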
To get the mapping to an antiSMASH --taxon value, use:
from mibig_taxa import TaxonCache
cache = TaxonCache("my_cache.json")
tax_id = 123456
as_taxon = cache.get_antismash_taxon(tax_id)
print(f"For antiSMASH, use --taxon {as_taxon} with tax_id {tax_id}")
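The returned value can then be passed on to an antiSMASH run. Below is a small sketch of building the command line, assuming the value is one of the two choices antiSMASH accepts for --taxon ("bacteria" or "fungi"). The helper build_antismash_args and the input file name are hypothetical, and the sketch only builds the argument list without running anything.

```python
from typing import List

# The choices antiSMASH accepts for its --taxon option.
VALID_TAXA = ("bacteria", "fungi")


def build_antismash_args(as_taxon: str, input_path: str) -> List[str]:
    """Build an antiSMASH argument list for the given taxon (hypothetical helper)."""
    if as_taxon not in VALID_TAXA:
        raise ValueError(f"unsupported antiSMASH taxon: {as_taxon!r}")
    return ["antismash", "--taxon", as_taxon, input_path]


# "input.gbk" is a placeholder file name.
args = build_antismash_args("fungi", "input.gbk")
print(" ".join(args))
```

The resulting list can be handed to subprocess.run if antiSMASH is installed.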
You can also grab individual entries directly:
from mibig_taxa import TaxonCache
cache = TaxonCache("my_cache.json")
tax_id = 123456
entry = cache.get(tax_id)
# "class" is a reserved keyword in Python, so entry.class is a syntax error; use getattr instead
entry_class = getattr(entry, "class")
print(f"{entry.superkingdom} > {entry.kingdom} > {entry.phylum} > {entry_class} > {entry.order} > {entry.family} > {entry.name}")
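If you print lineages in several places, the getattr workaround can be wrapped in a small helper. This is a sketch that uses the attribute names from the example above and assumes that a rank without a value is an empty string or None, which may not match what the real entry object returns. The function format_lineage is hypothetical, and a SimpleNamespace with illustrative values stands in for a cache entry so the example runs on its own.

```python
from types import SimpleNamespace

# Rank attributes in the order used above; "class" must be read via getattr.
RANKS = ("superkingdom", "kingdom", "phylum", "class", "order", "family", "name")


def format_lineage(entry, sep: str = " > ") -> str:
    """Join the non-empty ranks of an entry into one lineage string."""
    parts = [getattr(entry, rank, None) for rank in RANKS]
    return sep.join(part for part in parts if part)


# Stand-in for a cache entry, with illustrative values and an empty kingdom.
entry = SimpleNamespace(**{
    "superkingdom": "Bacteria",
    "kingdom": "",
    "phylum": "Actinomycetota",
    "class": "Actinomycetes",
    "order": "Kitasatosporales",
    "family": "Streptomycetaceae",
    "name": "Streptomyces coelicolor",
})
print(format_lineage(entry))
```

Skipping empty ranks avoids doubled separators for taxa that have no value at a given level.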
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0)
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as Apache-2.0, without any additional terms or conditions.