v2.0.0-beta.1

@adraismawur adraismawur released this 20 Dec 17:43
· 31 commits to master since this release

This version is still in pre-release and is undergoing testing.

The repository is also subject to change.

Major

BiG-SCAPE 2 Workflows.

The BiG-SCAPE logic has been separated into three workflows:

  • BiG-SCAPE Cluster: The most common use case. Performs distance calculations and clustering of BGCs.
  • BiG-SCAPE Query: Uses the logic in BiG-SCAPE Cluster to return all BGCs that show similarity to a user-defined query BGC.
  • BiG-SCAPE Benchmark: Benchmarks the clustering/GCF calling of a given run or set of runs against a user-provided, curated set of GCF assignments.

Record and edge types

  • Allows selection of any antiSMASH record type (region, candidate cluster, protocluster, protocore) as the working record to be used throughout the run.
  • Each record is shown as a node in the sequence similarity network and, where relevant, topo-links (edges drawn as dashed lines) connect records that originate from the same region.
  • Protoclusters/Protocores within interleaved and chemical hybrid candidate clusters are merged into one record.
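As a rough illustration of how topo-links relate records, the sketch below groups records by the region they come from and links every pair that shares a region. The `Record` type, its fields, and the `topo_links` function are hypothetical names chosen for this example; they are not taken from the BiG-SCAPE code base.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical minimal record; real records carry far more data.
    record_id: str
    gbk_file: str
    region_number: int


def topo_links(records):
    """Return sorted pairs of record ids that originate from the same region."""
    by_region = defaultdict(list)
    for rec in records:
        # A region is identified here by its source file plus region number.
        by_region[(rec.gbk_file, rec.region_number)].append(rec.record_id)

    links = []
    for ids in by_region.values():
        # Every pair of records within one region gets a topo-link.
        links.extend(combinations(sorted(ids), 2))
    return sorted(links)


records = [
    Record("pc1", "genome_a.gbk", 1),
    Record("pc2", "genome_a.gbk", 1),
    Record("pc3", "genome_a.gbk", 2),
    Record("pc4", "genome_b.gbk", 1),
]
print(topo_links(records))
```

Only `pc1` and `pc2` share a region here, so they are the only pair that would be joined by a dashed edge.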

BiG-SCAPE Classes & Weights

  • Added classification based on antiSMASH Class and Category, in addition to BiG-SCAPE 1’s legacy classification. BiG-SCAPE 2’s default is to classify based on antiSMASH Category.
  • BiG-SCAPE 1’s legacy classification into 8 groups remains compatible with antiSMASH versions up to v7.
  • BiG-SCAPE 1’s legacy weight distribution of each distance component can be paired with the legacy classification mode, as well as new antiSMASH-based classification (for antiSMASH versions v6 and up).
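To make the idea of a weight distribution over distance components concrete, here is a minimal sketch that combines per-component similarities into a single distance as one minus their weighted sum. The component names (`jaccard`, `ai`, `dss`) and the weight values are assumptions for illustration only; they are not the weights shipped with any BiG-SCAPE class, and the real calculation may include further terms.

```python
import math

# Illustrative weights only; these are NOT the actual per-class values.
EXAMPLE_WEIGHTS = {"jaccard": 0.2, "ai": 0.05, "dss": 0.75}


def weighted_distance(similarities, weights):
    """Combine per-component similarities (each in [0, 1]) into one distance."""
    if set(similarities) != set(weights):
        raise ValueError("similarities and weights must cover the same components")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    # Weighted similarity, then converted to a distance.
    similarity = sum(weights[name] * similarities[name] for name in weights)
    return 1.0 - similarity


pair = {"jaccard": 0.5, "ai": 0.4, "dss": 0.8}
print(weighted_distance(pair, EXAMPLE_WEIGHTS))
```

Swapping in a different weight table changes how strongly each component influences the final distance, which is what pairing a weight set with a classification mode amounts to in this simplified view.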

Input & References

  • A user-defined reference set of antiSMASH-processed .gbks can be provided.
  • The --mibig-version [version_number] flag downloads (if necessary) and uses the antiSMASH-processed MIBiG release of that version. Versions 3.1 and up are supported.
  • MIBiG .gbks are now available pre-processed by a custom antiSMASH version, which ensures that MIBiG .gbks that would not trigger an antiSMASH rule are still processed. BiG-SCAPE downloads these directly from https://dl.secondarymetabolites.org/mibig/.
  • Any custom antiSMASH-processed set of MIBiG .gbk files can also be provided.
  • .gbks that were not processed by antiSMASH, but that do contain CDS and sequence features, can be provided by using the --force-gbk flag. Note: this option is in beta; use with caution.
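Before passing non-antiSMASH files with --force-gbk, it can help to check that they contain what the flag expects. The sketch below is a simple standalone heuristic, not BiG-SCAPE's own validation: it scans GenBank text for at least one CDS feature line in the FEATURES table and for an ORIGIN section holding the sequence. The function name `has_cds_and_sequence` is made up for this example.

```python
def has_cds_and_sequence(gbk_text: str) -> bool:
    """Heuristic check: does this GenBank text have a CDS feature and a sequence?"""
    in_features = False
    has_cds = False
    has_sequence = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            # ORIGIN ends the feature table and starts the sequence block.
            in_features = False
            has_sequence = True
        elif in_features and line.startswith("     CDS "):
            # Feature keys are indented by five spaces in the flat-file format.
            has_cds = True
    return has_cds and has_sequence


example = """LOCUS       demo 12 bp DNA
FEATURES             Location/Qualifiers
     CDS             1..12
ORIGIN
        1 atggcttaat ga
//
"""
print(has_cds_and_sequence(example))
```

A full parser such as Biopython's would be more robust; this version only avoids extra dependencies.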

Output & Database usage

  • Replaced all intermediary files with an SQLite database.
  • An already populated SQLite database can be provided such that all relevant information, i.e. records and edges that are present in the input folder, can be reused.
  • Canceled runs retain their data on disk, so a re-run can continue from where the previous run stopped.
  • Added an option to run entirely in-memory to reduce runtime.
  • Added a full .network file, which contains all calculated distances without any cutoffs applied. Pairs with a distance of 1 are not included.
  • Removed independently generated arrower.py SVGs of each individual .gbk.
  • Removed Clans, the second layer of clustering that attempted to group families into clans.
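The database-backed design above can be sketched with Python's standard `sqlite3` module: work in an in-memory database for speed, leave out pairs with a distance of 1, apply a cutoff at query time, and copy the result to disk so a later run can reuse it. The `distance` table, its columns, and the helper names are hypothetical and do not reflect BiG-SCAPE's actual schema; treating the cutoff as a strict "less than" is also an assumption.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS distance (
    record_a TEXT NOT NULL,
    record_b TEXT NOT NULL,
    distance REAL NOT NULL,
    PRIMARY KEY (record_a, record_b)
)
"""


def store_edges(db, edges):
    """Store edges, skipping pairs at distance 1 (no similarity)."""
    rows = [(a, b, d) for a, b, d in edges if d < 1.0]
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT OR REPLACE INTO distance VALUES (?, ?, ?)", rows)


def edges_below(db, cutoff):
    """Return stored edges with a distance strictly below the cutoff."""
    cur = db.execute(
        "SELECT record_a, record_b, distance FROM distance "
        "WHERE distance < ? ORDER BY distance",
        (cutoff,),
    )
    return cur.fetchall()


# Work entirely in memory.
mem_db = sqlite3.connect(":memory:")
mem_db.execute(SCHEMA)
store_edges(
    mem_db,
    [("bgc1", "bgc2", 0.12), ("bgc1", "bgc3", 0.45), ("bgc2", "bgc3", 1.0)],
)
print(edges_below(mem_db, 0.3))

# Persist the in-memory database so a later run could reuse it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
disk_db = sqlite3.connect(db_path)
mem_db.backup(disk_db)
disk_db.close()
```

Because the cutoff is applied when querying rather than when storing, the same stored distances can be re-read with different cutoffs without recomputing them.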

Full Changelog: v1.1.9...v2.0.0-beta.1