Home
BiG-SCAPE 2 can be run in 3 workflows: Cluster, Query, and Benchmark.
The bigscape cluster command performs clustering of BGCs into GCFs; this is the equivalent of running BiG-SCAPE 1's bigscape.py. With bigscape query you can search for BGCs that show similarity to a user-provided query BGC (.gbk file), and bigscape benchmark compares the results of a BiG-SCAPE 2 Cluster mode run, a BiG-SCAPE 1 run or a BiG-SLiCE run against a user-provided set of BGC <-> GCF assignments.
BiG-SCAPE Cluster reads BGC information stored in antiSMASH-processed GenBank files. It uses a profile hidden Markov model (pHMM) database (commonly Pfam) and hmmscan from the HMMER suite to predict protein domains in each sequence, thus summarizing each BGC as a linear sequence of protein domains. BGCs can be grouped into bins by their antiSMASH class/category, or grouped into a single mixed bin (by using --mix). Every pair of BGCs in a bin is first aligned (based on a user-defined alignment mode), which defines the region to compare. The pairwise distance between the two BGCs is then calculated as a weighted combination of the Jaccard, Adjacency Index (AI) and Domain Sequence Similarity (DSS) indices (more detail here). Next, a cutoff is applied to these distance values, and Affinity Propagation clustering is used to generate Gene Cluster Families (GCFs). BiG-SCAPE 2 also allows more than one cutoff to be provided per run, in which case a set of GCFs is generated for each cutoff.
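To make the distance and cutoff steps more concrete, here is a minimal Python sketch. It is not BiG-SCAPE's implementation. The helper names (jaccard, adjacency_index, pairwise_distances, families_at_cutoff), the domain lists, the DSS value and the weights are all hypothetical, made-up inputs. DSS is supplied directly because the real index requires comparing domain sequences. The alignment step is skipped, so each index is computed over the full domain list, and adjacent domain pairs are treated as unordered (an assumption). For the family step, the sketch swaps Affinity Propagation for a simpler technique, connected components over the edges below each cutoff (single-linkage grouping). This shows how each cutoff yields its own set of families.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; not part of BiG-SCAPE's API.


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_index(domains_a, domains_b):
    """Jaccard index over the sets of adjacent domain pairs in each BGC.

    Assumption: pairs are treated as unordered, so (X, Y) == (Y, X).
    """
    pairs_a = {tuple(sorted(p)) for p in zip(domains_a, domains_a[1:])}
    pairs_b = {tuple(sorted(p)) for p in zip(domains_b, domains_b[1:])}
    return jaccard(pairs_a, pairs_b)


def pairwise_distances(bgcs, dss, weights=(0.2, 0.05, 0.75)):
    """distance = 1 - (w_jac * Jaccard + w_ai * AI + w_dss * DSS).

    `weights` are illustrative values that sum to 1; `dss` maps a sorted
    (bgc_a, bgc_b) pair to a precomputed DSS value (missing pairs count as 0).
    No alignment step: each index uses the full domain list.
    """
    w_jac, w_ai, w_dss = weights
    distances = {}
    for a, b in combinations(sorted(bgcs), 2):
        similarity = (
            w_jac * jaccard(set(bgcs[a]), set(bgcs[b]))
            + w_ai * adjacency_index(bgcs[a], bgcs[b])
            + w_dss * dss.get((a, b), 0.0)
        )
        distances[(a, b)] = 1.0 - similarity
    return distances


def families_at_cutoff(bgc_ids, distances, cutoff):
    """Group BGCs linked by any edge with distance < cutoff.

    Stand-in for Affinity Propagation: connected components via union-find.
    """
    parent = {b: b for b in bgc_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for b in bgc_ids:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(g) for g in groups.values())


# Made-up example input: each BGC as its linear sequence of domain names.
BGCS = {
    "bgc1": ["PKS_KS", "PKS_AT", "ACP", "TE"],
    "bgc2": ["PKS_KS", "PKS_AT", "ACP"],
    "bgc3": ["Condensation", "AMP-binding", "PCP"],
}
DSS = {("bgc1", "bgc2"): 0.9}  # made-up DSS value

distances = pairwise_distances(BGCS, DSS)
# One set of families per cutoff, as when several cutoffs are given in one run.
families = {c: families_at_cutoff(sorted(BGCS), distances, c) for c in (0.1, 0.3)}
for cutoff, fams in families.items():
    print(cutoff, fams)
```

With these made-up inputs, bgc1 and bgc2 end up at a distance of about 0.14. They therefore share a family at cutoff 0.3 but stay separate at cutoff 0.1, while bgc3 (no shared domains and no DSS value) sits at distance 1.0 from both.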
Learn more about the BiG-SCAPE modes and options with python bigscape.py --help, the tutorials, or by reading through this wiki. We suggest going through this wiki in the following order: