Release v2.3.0 · HobnobMancer/cazy_webscraper

What's Changed

Issue 111 + 112 uniprot by @HobnobMancer in #115

Full Changelog: v2.2.8...v2.3.0

New in version 2.3.0

Downloading protein data from UniProt is several orders of magnitude faster than before, and should have fewer issues when using older versions of bioservices
- Uses the bioservices mapping service to map directly from NCBI protein version accessions to UniProt
- cw_get_uniprot_data no longer calls NCBI and thus no longer requires an email address as a positional argument
Updated database schema: Changed Genbanks 1--* Uniprots to Genbanks *--1 Uniprots. Uniprots.uniprot_id is now listed in the Genbanks table, instead of listing Genbanks.genbank_id in the Uniprots table
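
As a rough illustration of the new *--1 direction, the sketch below builds a toy in-memory SQLite database in which the foreign key sits in the Genbanks table. Only the table names and the uniprot_id / genbank_id columns come from the notes above; the accession columns and the sample accessions are hypothetical, and the real schema contains more columns than shown here.

```python
import sqlite3

# Toy schema only: accession columns and sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Uniprots (
        uniprot_id INTEGER PRIMARY KEY,
        uniprot_accession TEXT UNIQUE
    );
    CREATE TABLE Genbanks (
        genbank_id INTEGER PRIMARY KEY,
        genbank_accession TEXT UNIQUE,
        -- the foreign key now lives on the Genbanks side
        uniprot_id INTEGER REFERENCES Uniprots (uniprot_id)
    );
    """
)

conn.execute("INSERT INTO Uniprots VALUES (1, 'P00001')")
# Many Genbanks rows may point at the same Uniprots row.
conn.executemany(
    "INSERT INTO Genbanks VALUES (?, ?, ?)",
    [(1, "AAA00001.1", 1), (2, "AAA00002.1", 1)],
)

rows = conn.execute(
    """
    SELECT g.genbank_accession, u.uniprot_accession
    FROM Genbanks AS g
    JOIN Uniprots AS u ON g.uniprot_id = u.uniprot_id
    ORDER BY g.genbank_id
    """
).fetchall()
print(rows)
```

Queries written against the old layout, which joined on Uniprots.genbank_id, would need to join on Genbanks.uniprot_id instead.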
Retrieve taxonomic classifications from UniProt
- Use the --taxonomy/-t flag to retrieve the scientific name (genus and species) for proteins of interest
- Adds downloaded taxonomic information to the UniprotsTaxs table
Clearer control over deleting old records when using cw_get_uniprot_data
- Separate arguments delete Genbanks-EC number and Genbanks-PDB accession relationships that are no longer listed in UniProt. These apply only to proteins in the local CAZyme database for which data is downloaded from UniProt
- New args:
  - --delete_old_ec_relationships = deletes Genbank(protein)-EC number relationships no longer in UniProt
  - --delete_old_ecs = deletes EC numbers in the local db not linked to any proteins
  - --delete_old_pdb_relationships = deletes Genbank(protein)-PDB relationships no longer in UniProt
  - --delete_old_pdbs = deletes PDB accessions in the local db not linked to any proteins
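
The difference between deleting relationships and deleting orphaned records can be sketched as two separate steps. This is a minimal illustration for EC numbers, not the cazy_webscraper implementation: the Ecs and Genbanks_Ecs table names, their columns, and both function names are assumptions made for the example.

```python
import sqlite3


def delete_old_ec_relationships(conn, genbank_id, uniprot_ecs):
    """Remove protein-EC links that UniProt no longer lists; return the count."""
    linked = conn.execute(
        "SELECT e.ec_id, e.ec_number FROM Ecs AS e "
        "JOIN Genbanks_Ecs AS ge ON ge.ec_id = e.ec_id "
        "WHERE ge.genbank_id = ?",
        (genbank_id,),
    ).fetchall()
    stale = [(genbank_id, ec_id) for ec_id, ec in linked if ec not in uniprot_ecs]
    conn.executemany(
        "DELETE FROM Genbanks_Ecs WHERE genbank_id = ? AND ec_id = ?", stale
    )
    return len(stale)


def delete_old_ecs(conn):
    """Remove EC numbers that are not linked to any protein; return the count."""
    cur = conn.execute(
        "DELETE FROM Ecs WHERE NOT EXISTS "
        "(SELECT 1 FROM Genbanks_Ecs AS ge WHERE ge.ec_id = Ecs.ec_id)"
    )
    return cur.rowcount


# Hypothetical toy data: protein 1 is linked to two EC numbers locally.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Ecs (ec_id INTEGER PRIMARY KEY, ec_number TEXT UNIQUE);
    CREATE TABLE Genbanks_Ecs (genbank_id INTEGER NOT NULL, ec_id INTEGER NOT NULL);
    INSERT INTO Ecs VALUES (1, '3.2.1.4'), (2, '3.2.1.21');
    INSERT INTO Genbanks_Ecs VALUES (1, 1), (1, 2);
    """
)

# Suppose UniProt now lists only one of the two EC numbers for protein 1.
removed_links = delete_old_ec_relationships(conn, 1, {"3.2.1.4"})
removed_ecs = delete_old_ecs(conn)
remaining = conn.execute("SELECT ec_number FROM Ecs").fetchall()
```

Running only the first step leaves the unlinked EC number in the database, which is why the two behaviours are exposed as separate arguments.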
Retrieve the local db schema
- New command cw_get_db_schema added.
- Retrieves the SQLite schema of a local CAZyme database and prints it to the terminal
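
For readers who want the same information from within Python, SQLite stores the CREATE statements of a database in its built-in sqlite_master table. The sketch below reads them with the standard sqlite3 module; it is an assumption about one way to do this, not the code behind cw_get_db_schema, and the function name and the toy table are hypothetical.

```python
import sqlite3


def get_db_schema(conn):
    """Return the CREATE statements stored in sqlite_master as one string."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] + ";" for row in rows)


# Toy in-memory database standing in for a local CAZyme database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Genbanks (genbank_id INTEGER PRIMARY KEY)")

schema = get_db_schema(conn)
print(schema)
```

To inspect a real database, pass a connection opened with sqlite3.connect on the path to the local database file.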
Added option to skip retrieving the latest taxonomic classifications from NCBI
- By default, when retrieving data from CAZy, cazy_webscraper retrieves the latest taxonomic classifications from NCBI for proteins listed under multiple taxa
- To reduce scraping time and the burden on the NCBI Entrez server, this step can be skipped with the new --skip_ncbi_tax flag when the data is not needed (e.g. GTDB taxonomies will be used)
- When skipping retrieval of the latest taxonomic classifications from NCBI, cazy_webscraper adds the first taxon retrieved from CAZy for each protein listed under multiple taxa
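
The fallback described above amounts to keeping the first taxon seen for each protein. A minimal sketch, assuming the CAZy data has already been parsed into a dict mapping accession to a list of taxa in the order CAZy listed them (the function name, data structure, and sample values are hypothetical):

```python
def pick_first_taxon(protein_taxa):
    """Map each protein accession to the first taxon listed for it."""
    return {acc: taxa[0] for acc, taxa in protein_taxa.items() if taxa}


# Hypothetical input: one protein listed under two taxa, one under a single taxon.
protein_taxa = {
    "AAA00001.1": ["Aspergillus niger", "Aspergillus niger CBS 513.88"],
    "AAA00002.1": ["Escherichia coli"],
}

chosen = pick_first_taxon(protein_taxa)
print(chosen)
```

Because no lookup against NCBI is made, the chosen taxon may not be the most recent classification for that protein.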
