v2.0.0-beta.1
adraismawur released this 20 Dec 17:43
This version is still in pre-release and is undergoing testing.
The repository is also subject to change.
Major
BiG-SCAPE 2 Workflows
The BiG-SCAPE logic has been separated into three workflows:
- BiG-SCAPE Cluster: The most common use case; performs distance calculations and clustering of BGCs.
- BiG-SCAPE Query: Uses the logic in BiG-SCAPE Cluster to return all BGCs that show similarity to a user-defined query BGC.
- BiG-SCAPE Benchmark: Benchmarks the clustering/GCF calling of a given run (or set of runs) against a user-provided, curated set of GCF assignments.
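The Query workflow's core idea can be pictured as a graph search. The sketch below assumes that "similar" means reachable from the query through edges whose distance falls below a cutoff; the function name `query_component`, the edge-tuple format, the 0.3 cutoff and the record IDs are all hypothetical and are not BiG-SCAPE's API.

```python
from collections import deque


def query_component(query, edges, cutoff=0.3):
    """Return every record reachable from `query` via edges with distance < cutoff.

    Hypothetical helper: `edges` is an iterable of (record_a, record_b, distance).
    """
    # Build an undirected adjacency list, keeping only edges below the cutoff
    neighbours = {}
    for a, b, dist in edges:
        if dist < cutoff:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outward from the query record
    seen = {query}
    frontier = deque([query])
    while frontier:
        current = frontier.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)

    # The query itself is not part of the result
    seen.discard(query)
    return sorted(seen)


edges = [
    ("query", "bgc_A", 0.10),
    ("bgc_A", "bgc_B", 0.25),
    ("bgc_B", "bgc_C", 0.80),  # too distant, so this edge is not followed
    ("bgc_D", "bgc_E", 0.05),  # unrelated to the query
]
print(query_component("query", edges))  # ['bgc_A', 'bgc_B']
```

A stricter reading would return only the direct neighbours of the query; the breadth-first search is used here simply because it mirrors how clustering groups records through chains of edges.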
Record and edge types
- Allows selection of any antiSMASH record type (region, candidate cluster, protocluster, protocore) as the working record to be used throughout the run.
- Each record is shown as a node in the sequence similarity network and, where relevant, topo-links (dashed edges) are drawn between records that originate from the same region.
- Protoclusters/Protocores within interleaved and chemical hybrid candidate clusters are merged into one record.
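The merging rule and the topo-links can be sketched for a single region as follows. This is a minimal illustration, not BiG-SCAPE's code: it assumes each candidate cluster is given as a (kind, protocluster numbers) tuple using antiSMASH's candidate-cluster kinds, merges protoclusters with a small union-find, and assumes (hypothetically) that every pair of records from the same region gets a topo-link.

```python
from itertools import combinations

# antiSMASH candidate-cluster kinds whose protoclusters are merged into one record
MERGE_KINDS = {"interleaved", "chemical_hybrid"}


def merge_protoclusters(protoclusters, candidates):
    """Group the protocluster numbers of one region into records.

    candidates: list of (kind, member_protocluster_numbers) tuples (hypothetical format).
    """
    parent = {p: p for p in protoclusters}

    def find(p):
        # Follow parent pointers to the group representative (with path halving)
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Union all members of interleaved / chemical hybrid candidates
    for kind, members in candidates:
        if kind in MERGE_KINDS:
            first = members[0]
            for other in members[1:]:
                parent[find(other)] = find(first)

    groups = {}
    for p in protoclusters:
        groups.setdefault(find(p), []).append(p)
    return sorted(tuple(sorted(g)) for g in groups.values())


def topo_links(records):
    """Hypothetical rule: link every pair of records from the same region."""
    return list(combinations(records, 2))


records = merge_protoclusters(
    [1, 2, 3, 4],
    [("chemical_hybrid", (1, 2)), ("neighbouring", (1, 2, 3)), ("single", (4,))],
)
print(records)                   # [(1, 2), (3,), (4,)]
print(len(topo_links(records)))  # 3
```

Protoclusters that only share a neighbouring or single candidate stay separate records, while the chemical hybrid pair collapses into one.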
BiG-SCAPE Classes & Weights
- Added classification based on antiSMASH Class and Category, in addition to BiG-SCAPE 1’s legacy classification. By default, BiG-SCAPE 2 classifies records by antiSMASH Category.
- BiG-SCAPE 1’s legacy classification into 8 groups remains compatible with antiSMASH versions up to v7.
- BiG-SCAPE 1’s legacy weights for each distance component can be paired with either the legacy classification mode or the new antiSMASH-based classification (for antiSMASH v6 and up).
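How classification and weights interact can be sketched as: records are binned by the chosen label, only records in the same bin are compared, and each pair's distance is one minus a weighted sum of component similarities (Jaccard, adjacency, domain sequence similarity), as in BiG-SCAPE 1. The dictionary keys, example labels and weight values below are illustrative assumptions; consult the BiG-SCAPE source for the real per-class weights.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative weights only; not guaranteed to match BiG-SCAPE's per-class values
LEGACY_MIX_WEIGHTS = {"jaccard": 0.2, "adjacency": 0.05, "dss": 0.75}


def bin_records(records, mode="category"):
    """Group record IDs by antiSMASH category (default) or class.

    records: list of dicts with hypothetical keys "id", "class" and "category".
    """
    bins = defaultdict(list)
    for rec in records:
        bins[rec[mode]].append(rec["id"])
    return dict(bins)


def pairs_to_compare(bins):
    """Only records that share a bin are compared with each other."""
    return [pair for ids in bins.values() for pair in combinations(ids, 2)]


def weighted_distance(similarities, weights):
    """Distance = 1 - weighted sum of component similarities (each in [0, 1])."""
    return 1.0 - sum(weights[name] * similarities[name] for name in weights)


records = [
    {"id": "r1", "class": "T1PKS", "category": "PKS"},
    {"id": "r2", "class": "transAT-PKS", "category": "PKS"},
    {"id": "r3", "class": "NRPS", "category": "NRPS"},
]
print(pairs_to_compare(bin_records(records)))           # [('r1', 'r2')]
print(pairs_to_compare(bin_records(records, "class")))  # []
sims = {"jaccard": 0.5, "adjacency": 1.0, "dss": 0.8}
print(round(weighted_distance(sims, LEGACY_MIX_WEIGHTS), 2))  # 0.25
```

The example shows why the classification mode matters: grouping by Category lets the two PKS records be compared, whereas grouping by Class keeps them apart.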
Input & References
- A user-defined reference set of antiSMASH-processed .gbks can be provided.
- Using the --mibig-version [version_number] flag will download (if necessary) and use antiSMASH-processed MIBiG versions 3.1 and up.
- MIBiG .gbks are now available pre-processed by a custom antiSMASH version, which ensures that MIBiG entries that would not trigger any antiSMASH rule are still processed. BiG-SCAPE downloads these directly from https://dl.secondarymetabolites.org/mibig/.
- Any custom antiSMASH-processed set of MIBiG .gbk files can also be provided.
- .gbks that were not processed by antiSMASH, but do contain CDS and sequence features, can be provided using the --force-gbk flag. Note: this is in a beta state; use with caution.
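To see which kind of input a given .gbk is, one can look at its GenBank feature keys. The heuristic below is an assumption for illustration, not BiG-SCAPE's actual check: files with antiSMASH `region` or `protocluster` features are treated as antiSMASH output, files with only `CDS` features as candidates for --force-gbk. The function names are hypothetical.

```python
def feature_keys(gbk_text):
    """Collect feature keys (e.g. "CDS", "region") from a GenBank flat file."""
    keys = set()
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            break
        # Feature keys start after 5 spaces; qualifier lines are indented further
        if in_features and line.startswith("     ") and line[5:6].strip():
            keys.add(line.split()[0])
    return keys


def input_kind(gbk_text):
    """Hypothetical heuristic for how a .gbk could be used as input."""
    keys = feature_keys(gbk_text)
    if "region" in keys or "protocluster" in keys:
        return "antismash"
    if "CDS" in keys:
        return "needs --force-gbk"
    return "unusable"


example = "\n".join([
    "LOCUS       demo_record  99 bp  DNA  linear",
    "FEATURES             Location/Qualifiers",
    "     CDS             1..99",
    '                     /locus_tag="demo_1"',
    "ORIGIN",
    "//",
])
print(input_kind(example))  # needs --force-gbk
```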
Output & Database usage
- Replaced all intermediary files with an SQLite database.
- An already-populated SQLite database can be provided so that all relevant information (i.e. records and edges for the data in the input folder) is reused.
- Canceled runs retain their data on disk, so a re-run continues where the previous run stopped.
- Added an option to run entirely in-memory to reduce runtime.
- Added a full .network file, which contains all calculated distances without any cutoffs applied. Distances equal to 1 are not included.
- Removed the arrower.py SVGs that were generated independently for each individual .gbk.
- Removed Clans, a second layer of clustering that attempted to group families into clans.
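The database-backed behaviour (resuming, in-memory runs and the full network export) can be sketched with Python's built-in `sqlite3` module. The table name, schema and helper functions below are hypothetical and do not reflect BiG-SCAPE's real schema or .network layout.

```python
import sqlite3


def open_db(path=None):
    """Open an on-disk database, or an in-memory one when path is None."""
    conn = sqlite3.connect(path if path else ":memory:")
    # Hypothetical schema for illustration only
    conn.execute(
        "CREATE TABLE IF NOT EXISTS distance ("
        " record_a TEXT, record_b TEXT, distance REAL,"
        " PRIMARY KEY (record_a, record_b))"
    )
    return conn


def missing_pairs(conn, pairs):
    """Return only the pairs not yet stored, e.g. by an earlier canceled run."""
    done = set(conn.execute("SELECT record_a, record_b FROM distance"))
    return [p for p in pairs if p not in done]


def store(conn, rows):
    """Insert (record_a, record_b, distance) rows and commit them."""
    conn.executemany("INSERT OR REPLACE INTO distance VALUES (?, ?, ?)", rows)
    conn.commit()


def network_lines(conn):
    """Tab-separated edges with no cutoff applied, omitting distances equal to 1."""
    rows = conn.execute(
        "SELECT record_a, record_b, distance FROM distance"
        " WHERE distance != 1 ORDER BY record_a, record_b"
    )
    return [f"{a}\t{b}\t{d}" for a, b, d in rows]


conn = open_db()  # in-memory; pass a file path to keep data on disk
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
store(conn, [("a", "b", 0.2)])  # pretend the first run stopped here
todo = missing_pairs(conn, pairs)
print(todo)  # [('a', 'c'), ('b', 'c')]
store(conn, [("a", "c", 1.0), ("b", "c", 0.45)])
print(network_lines(conn))  # ['a\tb\t0.2', 'b\tc\t0.45']
```

Only the remaining pairs are computed on the second pass, and the distance-1 edge between `a` and `c` is left out of the exported lines.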
Full Changelog: v1.1.9...v2.0.0-beta.1