Parameters
Type bigscape --help to display a list of all available parameters. See also the following sections:
--config-file-path
Path to BiG-SCAPE config.yml file, which stores values for a series of advanced use parameters.
Default: bundled big_scape/config.yml.
-v, --verbose
Prints more detailed information of each step in the analysis, outputs all kinds of logs, including debugging log info, and writes to logfile. Toggle to activate.
--quiet
Don't print any log info to output, only write to logfile.
-l, --label
A run label to be added to the output results folder name, as well as to the dropdown menu in the visualization page. By default, BiG-SCAPE runs will have a name such as [label]_YYYY-MM-DD_HH-MM-SS.
-c, --cores
Default: use all available cores
Set the max number of cores available. BiG-SCAPE will try to parallelize some steps in the analysis like domain prediction and distance calculation. Use this option to set the number of cores the script may use. If not specified, BiG-SCAPE will use all available cores.
-o, --output-dir
Output directory for all BiG-SCAPE results files. See more detail here.
--log-path
Default: output_dir/timestamp.log.
Path to output log file.
--profiling
Run profiler and output profile report. Note: currently only available for Linux systems.
-i, --input-dir, --gbk_dir
Input directory containing .gbk files to be used by BiG-SCAPE. See more detail here.
--input-mode
Options: recursive, flat.
Default: recursive.
Tells BiG-SCAPE where to look for input GBK files:
recursive: search for .gbk files recursively in the input directory.
flat: search for .gbk files in the input directory only.
-m, --mibig-version
MIBiG release number (from 3.1 onwards). If not provided, MIBiG gene clusters will not be included in the analysis. If required, BiG-SCAPE will download the antiSMASH-processed MIBiG database to ./big_scape/MIBiG/mibig_antismash_<version>_gbk.
For advanced users: any custom (antiSMASH-processed) MIBiG collection can be used as long as the expected folder is present, e.g. provide -m mymibig with ./big_scape/MIBiG/mibig_antismash_mymibig_gbk. For more detail, see here.
-r, --reference-dir
Path to directory containing user defined, non-MIBiG, antiSMASH processed reference gene clusters. For more detail, see here.
--include-gbk
Default: cluster, region.
A comma separated list of strings. Only .gbk files that have the string(s) in their filename will be used for the analysis. Use an asterisk to accept every file (* overrides --exclude-gbk).
--exclude-gbk
Default: final.
A comma separated list of strings. If any string in this list occurs in the .gbk filename, this file will not be used for the analysis.
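The interaction between --include-gbk and --exclude-gbk can be sketched as a simple substring filter. This is an illustrative sketch, not BiG-SCAPE's own code: the function name gbk_is_selected is hypothetical, and the choice that an exclude match wins over an include match (when no asterisk is given) is an assumption.

```python
def gbk_is_selected(filename, include=("cluster", "region"), exclude=("final",)):
    """Return True if a .gbk filename passes the include/exclude string filters.

    Hypothetical helper; defaults mirror the documented parameter defaults.
    """
    # An asterisk in the include list accepts every file and overrides the exclude list.
    if "*" in include:
        return True
    # Assumption: any exclude string occurring in the filename rejects the file.
    if any(s in filename for s in exclude):
        return False
    # Otherwise at least one include string must occur in the filename.
    return any(s in filename for s in include)
```

With the defaults, a file such as NC_003888.region001.gbk is kept, while genome.final.gbk is skipped unless the include list is set to an asterisk.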
-p, --pfam-path
Path to Pfam database .hmm file (e.g. Pfam-A.hmm). If the .hmm file has already been pressed and the pressed files are included in the same folder as the Pfam .hmm file, BiG-SCAPE will also use these pressed files. If this is not the case, BiG-SCAPE will run hmmpress. Note: this requires the user to have write permissions to the given Pfam folder.
--domain-includelist-all-path
Path to .txt file with phmm domain accessions (commonly, Pfam accessions (e.g. PF00501)). Only regions containing all the listed accessions will be analyzed. In this file, each line contains a single phmm domain accession (with an optional comment, separated by a tab). Lines starting with # are ignored. Domain accessions are case-sensitive. Cannot be provided in conjunction with --domain-includelist-any-path.
--domain-includelist-any-path
Path to .txt file with phmm domain accessions (commonly, Pfam accessions (e.g. PF00501)). Only BGCs containing any of the listed accessions will be analyzed. In this file, each line contains a single phmm domain accession (with an optional comment, separated by a tab). Lines starting with # are ignored. Domain accessions are case-sensitive. Cannot be provided in conjunction with --domain-includelist-all-path.
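The includelist file format and the difference between the "all" and "any" variants can be illustrated as follows. This is a minimal sketch under stated assumptions: parse_includelist and record_passes are hypothetical names, blank lines are assumed to be skipped, and accessions are compared as exact, case-sensitive strings.

```python
def parse_includelist(text):
    """Parse includelist content: one accession per line, optional tab-separated comment."""
    accessions = []
    for line in text.splitlines():
        # Lines starting with '#' are ignored; skipping blank lines is an assumption.
        if not line.strip() or line.startswith("#"):
            continue
        # Everything after the first tab is treated as a comment.
        accessions.append(line.split("\t", 1)[0].strip())
    return accessions


def record_passes(record_domains, accessions, mode):
    """Check a record's domain accessions against the list in 'all' or 'any' mode."""
    wanted = set(accessions)
    present = set(record_domains)
    if mode == "all":
        return wanted <= present  # every listed accession must be present
    if mode == "any":
        return bool(wanted & present)  # at least one listed accession is present
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, a record with domains PF00501 and PF00109 passes an "any" list of PF00501 and PF00668 but fails the same list in "all" mode.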
--legacy-weights
Use BiG-SCAPE 1 class-based weights in distance calculations. If not selected, the distance metric will be based on the mix weights distribution. Warning: these weights have not been validated for record types other than region (see option --record-type). See here for more detail.
--alignment-mode
Default: glocal
Options: global, glocal, local, auto
Alignment mode for each pair of gene clusters.
global: the whole list of domains of each BGC record is compared.
local: seeds the subset of the domains used to calculate distance by trying to find the longest slice of common domain content (Longest Common Subcluster, LCS) between both records, then extends each side (see --extend-strategy).
glocal: starts by performing local, but domain selection is then extended to the shortest upstream/downstream arms in a compared record pair.
auto: use glocal when at least one of the BGCs in each pair has the contig_edge annotation from antiSMASH v4+, otherwise use global mode on that pair.
For more detail see here.
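The per-pair decision made by the auto mode can be written down in a few lines. This is only a sketch of the rule as described above; resolve_alignment_mode is a hypothetical name, and the two boolean arguments stand in for the contig_edge annotation of each record.

```python
def resolve_alignment_mode(mode, a_on_contig_edge, b_on_contig_edge):
    """Return the alignment mode actually applied to one pair of BGC records."""
    # Any explicit mode is used unchanged for every pair.
    if mode != "auto":
        return mode
    # auto: glocal if at least one record is on a contig edge, otherwise global.
    return "glocal" if (a_on_contig_edge or b_on_contig_edge) else "global"
```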
--extend-strategy
Default: legacy
Options: legacy, greedy, simple match
Strategy to extend the BGC record pair comparable region. legacy will use the original BiG-SCAPE extend strategy, while greedy and simple match are newly introduced in BiG-SCAPE 2. Legacy and simple match both examine the domain architecture of the record pair in order to find the most relevant extended borders. Greedy is a very simple method that takes the coordinates of the outermost matching domains as the extended borders. For more detail see here.
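The idea behind the greedy strategy can be sketched on plain domain lists. This is a simplified illustration, not the BiG-SCAPE implementation: greedy_borders is a hypothetical name, and list indices are used here in place of real domain coordinates.

```python
def greedy_borders(domains_a, domains_b):
    """Return ((start_a, stop_a), (start_b, stop_b)) spanning the outermost shared domains.

    Indices are positions in each domain list (inclusive); None if nothing is shared.
    """
    shared = set(domains_a) & set(domains_b)
    if not shared:
        return None
    # Positions of every domain that also occurs in the other record.
    idx_a = [i for i, d in enumerate(domains_a) if d in shared]
    idx_b = [i for i, d in enumerate(domains_b) if d in shared]
    # The outermost matches on each side become the extended borders.
    return (idx_a[0], idx_a[-1]), (idx_b[0], idx_b[-1])
```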
--gcf-cutoffs
Default: 0.3
A comma separated list of floats. Generate networks using multiple distance cutoff values. Values should be in the range [0.0, 1.0]. Example: --gcf-cutoffs 0.1,0.25,0.5,1.0. For more detail see here.
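The expected value format can be checked with a small parser. This sketch only illustrates the documented format (comma separated floats in [0.0, 1.0]); parse_gcf_cutoffs is a hypothetical helper, not part of BiG-SCAPE.

```python
def parse_gcf_cutoffs(value):
    """Parse a comma separated list of distance cutoffs and validate their range."""
    cutoffs = [float(item) for item in value.split(",")]
    for cutoff in cutoffs:
        # Each cutoff must lie in the closed interval [0.0, 1.0].
        if not 0.0 <= cutoff <= 1.0:
            raise ValueError(f"cutoff out of range [0.0, 1.0]: {cutoff}")
    return cutoffs
```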
--profile-path
Default: output_dir/.
Path to output profile file.
-db, --db-path
Default: output_dir/data_sqlite.db.
Path to sqlite db output file.
--record-type
Options: region, cand_cluster, protocluster, proto_core
Default: region
Use a specific type of antiSMASH record for comparison. For every .gbk, BiG-SCAPE will try to extract the requested record type. If this is not present, BiG-SCAPE will try to extract the next higher level record type, i.e. if a proto_core feature is not present, BiG-SCAPE will look for a protocluster feature, and so on. The record type hierarchy is: region > cand_cluster > protocluster > proto_core. See more detail here.
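The fallback through the record type hierarchy can be sketched as a walk from the requested type towards region. The function name pick_record_type and the representation of a .gbk as a set of available record types are assumptions made for illustration.

```python
# Hierarchy from highest to lowest level, as listed in the documentation.
HIERARCHY = ["region", "cand_cluster", "protocluster", "proto_core"]


def pick_record_type(requested, available):
    """Return the requested record type, or the next higher level type that is present."""
    start = HIERARCHY.index(requested)
    # Walk from the requested level up towards 'region'.
    for level in range(start, -1, -1):
        if HIERARCHY[level] in available:
            return HIERARCHY[level]
    return None  # nothing at or above the requested level is present
```

For instance, requesting proto_core from a file that only carries region and protocluster features falls back to protocluster.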
--no-db-dump
Do not dump the sqlite database to disk until the very end of the run. This will speed up the run, but in case of a crashed run no info will be stored and you’ll have to re-start the run from scratch.
--db-only-output
Do not generate any output besides the data stored in the SQLite database. Suitable for advanced users that wish to only make use of the results stored in the SQLite database.
--no-trees
Do not generate any GCF newick trees. Suitable for users that do not utilize our output visualization, but only make use of the results stored in the output .tsv files (which include the network files) and/or the SQLite database.
--force-gbk
Recommended for advanced users only. Allows BiG-SCAPE to use non-antiSMASH processed .gbk files. If GBK files are found without antiSMASH annotations (specifically, BiG-SCAPE checks for the absence of an antiSMASH version feature), BiG-SCAPE will still read and parse these files, and will create internal gbk record objects, each of which will have a region feature covering the full sequence length and a product feature other. Warning: BiG-SCAPE still needs CDS features and a sequence feature to work with non-antiSMASH .gbks. Furthermore, the --include-gbk and --exclude-gbk parameters might need to be adjusted if .gbk file names also do not follow antiSMASH format. Disclaimer: this feature is still under development, use at own risk.
--mix
Calculate distances using a mix bin, wherein no classification is applied. This will do an all-vs-all comparison of all input BGC records. This bin will use weights from the 'mix' weights distribution: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}. For more detail, see here.
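To make the role of the 'mix' weights concrete, the sketch below combines three similarity indices into one distance. It rests on assumptions not stated on this page: that JC, AI and DSS are similarity scores in [0, 1] (Jaccard, adjacency and domain sequence similarity indices), and that the distance is one minus their weighted sum. The anchor boost is listed with the weights but is not modeled here, and combined_distance is a hypothetical name.

```python
# 'mix' weights as listed above; the three index weights sum to 1.0.
MIX_WEIGHTS = {"JC": 0.2, "AI": 0.05, "DSS": 0.75}
ANCHOR_BOOST = 2.0  # listed with the mix weights; not used in this simplified sketch


def combined_distance(jc, ai, dss, weights=MIX_WEIGHTS):
    """Combine three similarity indices into a distance (assumed form: 1 - weighted sum)."""
    similarity = weights["JC"] * jc + weights["AI"] * ai + weights["DSS"] * dss
    return 1.0 - similarity
```

Under these assumptions, a pair that is identical on every index has a distance of 0, and DSS dominates the result because it carries three quarters of the weight.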
--classify
Options: none, class, category, legacy
Default: category
Define which method BiG-SCAPE should use to separate BGC records into analysis bins. --classify class and --classify category use antiSMASH/BGC classes (e.g. T2PKS) or categories (e.g. PKS) to run analyses on class/category-based bins, respectively.
--classify legacy is based on BiG-SCAPE v1 predefined groups: PKS1, PKSOther, NRPS, NRPS-PKS-hybrid, RiPP, Saccharide, Terpene, Others, and will automatically use complementary --legacy-weights. --classify legacy is available for backwards compatibility with input .gbks generated with antiSMASH versions up to version 7. For higher antiSMASH versions, use at your own risk, as BGC classes may have changed. All antiSMASH classes that this legacy mode does not recognise will be grouped in others. To update the antiSMASH classes list yourself, see the config.yml file.
--classify class and --classify category can be used in combination with --legacy-weights if input .gbks have been generated by antiSMASH version 6 or higher. For older antiSMASH versions, either use --classify legacy or do not select --legacy-weights, which will perform the weighted distance calculations based on the generic 'mix' weights.
--hybrids-off
Toggle to add BGC records with hybrid predicted classes/categories to each subclass instead of a hybrid class/network (e.g. a terpene-nrps BGC would be added to both the terpene and NRPS classes/networks instead of the terpene.nrps network). Only works if any --classify mode is selected.
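The effect of --hybrids-off on bin assignment can be illustrated with a short sketch. The function bins_for_record is hypothetical, and joining class names with a dot for the hybrid bin is an assumption based on the terpene.nrps example above.

```python
def bins_for_record(classes, hybrids_off=False):
    """Return the analysis bin name(s) a BGC record would be placed in."""
    # Non-hybrid records always go to their single class bin.
    # With hybrids_off, a hybrid record is added to each of its subclass bins.
    if len(classes) == 1 or hybrids_off:
        return list(classes)
    # Otherwise the record goes to one combined hybrid bin (dot-joined name is assumed).
    return [".".join(classes)]
```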
--exclude-categories
A comma separated list of antiSMASH product categories. BGCs that have at least one of the product categories in this list will be excluded from the comparison (e.g. 'NRPS,PKS' will exclude all NRPS or PKS BGC records, even hybrids like NRPS-terpene). Only available for .gbks produced by antiSMASH version 6 or higher.
--include-categories
A comma separated list of antiSMASH product categories. Only BGCs that have at least one of the product categories in this list will be included in the comparison (e.g. NRPS will include only NRPS BGC records, including hybrids like NRPS-PKS). Only available for .gbks produced by antiSMASH version 6 or higher.
--exclude-classes
A comma separated list of antiSMASH product classes. BGC records that have at least one of the product classes in this list will be excluded from the comparison (e.g. 'T1PKS,T2PKS' will exclude all T1PKS and T2PKS BGC records, even hybrids like NRPS-T1PKS).
--include-classes
A comma separated list of antiSMASH product classes. Only BGCs that have at least one of the product classes in this list will be included in the comparison (e.g. 'T1PKS' will include only T1PKS BGC records, including hybrids like NRPS-T1PKS).
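The include/exclude filters for classes and categories share the same "at least one match" logic, which can be sketched as below. keep_record is a hypothetical helper, and applying the exclude list before the include list when both are given is an assumption.

```python
def keep_record(record_labels, include=None, exclude=None):
    """Decide whether a BGC record stays in the comparison.

    record_labels: the record's product classes (or categories); hybrids have several.
    """
    labels = set(record_labels)
    # Excluded if at least one label is in the exclude list (hybrids included).
    if exclude and labels & set(exclude):
        return False
    # With an include list, at least one label must be in it.
    if include and not labels & set(include):
        return False
    return True
```

A hybrid NRPS-T1PKS record is therefore dropped by an exclude list of T1PKS,T2PKS and kept by an include list of T1PKS.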
--include-singletons
Include singletons in the network and all respective output. Reference singletons will not be included even if this is toggled.
-q, --query-bgc-path
Path to query .gbk file. BiG-SCAPE will compare all BGC records in the input and reference folders to the query record in a one-vs-all mode.
-n, --query-record-number
Query BGC record number. Used to select the specific record from the query BGC .gbk, and is only relevant when running --record-type cand_cluster, protocluster or proto_core. Warning: if interleaved or chemical hybrid proto cluster/cores are merged (see config file), the relevant number is that of the first record of the merged cluster (the one with the lowest number). For example, if records 1 and 2 get merged, the relevant number is 1.
--propagate
By default, BiG-SCAPE will only generate edges between the query and reference BGC records. With the propagate flag, BiG-SCAPE will go through multiple cycles of edge generation until no new reference BGCs are connected to the query connected component. For more detail see here
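The cycles of edge generation described for --propagate can be sketched as an iterative expansion of the query connected component. This is a conceptual illustration only: query_component, the distance callable and the use of "distance <= cutoff" as the edge criterion are assumptions, not BiG-SCAPE's actual implementation.

```python
def query_component(query, references, distance, cutoff, propagate=False):
    """Grow the query connected component; return (component, edges)."""
    component = {query}
    frontier = {query}  # records whose neighbours are searched in the current cycle
    remaining = set(references)  # reference records not yet connected
    edges = []
    while frontier and remaining:
        newly_connected = set()
        for src in sorted(frontier):
            for ref in sorted(remaining):
                # Assumed edge criterion: distance at or below the cutoff.
                if distance(src, ref) <= cutoff:
                    edges.append((src, ref))
                    newly_connected.add(ref)
        remaining -= newly_connected
        component |= newly_connected
        # Without propagate only the first cycle (query vs. references) is run.
        frontier = newly_connected if propagate else set()
    return component, edges
```

Without propagation only records directly linked to the query are returned; with it, records reachable through already connected references are added until a cycle connects nothing new.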
--classify
Options: none, class, category
Default: none
By default BiG-SCAPE will compare the query BGC record against any other supplied reference BGC records regardless of antiSMASH product class/category. Instead, select 'class' or 'category' to run analyses on one class-specific bin, in which case only reference BGC records with the same class/category as the query record will be compared. Can be used in combination with --legacy-weights for .gbks produced by antiSMASH version 6 or higher. For older antiSMASH versions or if --legacy-weights is not selected, BiG-SCAPE will use the generic 'mix' weights: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}.
--GCF-assignment-file
Path to GCF assignments file. BiG-SCAPE will compare a run output to these assignments.
--BiG-dir
Path to BiG-SCAPE (v1/v2) or BiG-SLICE (v1/v2) output directory.