parameters

Catarina Loureiro edited this page Dec 13, 2024 · 2 revisions

Parameters

Type bigscape --help to display a list of all available parameters. The sections below group the parameters by the modes they apply to.


Common to all modes

--config-file-path

Path to BiG-SCAPE config.yml file, which stores values for a series of advanced use parameters.

Default: bundled big_scape/config.yml.

-v, --verbose

Prints more detailed information about each step in the analysis, outputs all kinds of logs (including debugging log info), and writes to the logfile. Toggle to activate.

--quiet

Don't print any log info to output, only write to logfile.

-l, --label

A run label to be added to the output results folder name, as well as to the dropdown menu in the visualization page. By default, BiG-SCAPE runs will have a name such as [label]_YYYY-MM-DD_HH-MM-SS.
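As a rough illustration of the naming pattern above, the following sketch builds a string of the form [label]_YYYY-MM-DD_HH-MM-SS. The helper run_name is hypothetical and not part of BiG-SCAPE; it only mirrors the format described here.

```python
from datetime import datetime


def run_name(label: str, start: datetime) -> str:
    """Build a name of the form [label]_YYYY-MM-DD_HH-MM-SS (hypothetical helper)."""
    stamp = start.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{label}_{stamp}"


print(run_name("my_run", datetime(2024, 12, 13, 9, 30, 5)))
```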

-c, --cores

Default: use all available cores

Set the maximum number of cores BiG-SCAPE may use. BiG-SCAPE will try to parallelize some steps in the analysis, such as domain prediction and distance calculation. If not specified, BiG-SCAPE will use all available cores.

-o, --output-dir

Output directory for all BiG-SCAPE results files. See more detail here.

--log-path

Default: output_dir/timestamp.log.

Path to output log file.

Common to the Cluster & Query modes

--profiling

Run profiler and output profile report. Note: currently only available for Linux systems.

-i, --input-dir, --gbk_dir

Input directory containing .gbk files to be used by BiG-SCAPE. See more detail here.

--input-mode

Options: recursive, flat.

Default: recursive.

Tells BiG-SCAPE where to look for input GBK files:

recursive: search for .gbk files recursively in the input directory.

flat: search for .gbk files in the input directory only.
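The difference between the two input modes can be sketched with the standard library. The function find_gbks below is a hypothetical illustration, not BiG-SCAPE's own file discovery code.

```python
from pathlib import Path


def find_gbks(input_dir: Path, mode: str = "recursive") -> list[Path]:
    """Collect .gbk files under input_dir (hypothetical helper)."""
    if mode == "recursive":
        hits = input_dir.rglob("*.gbk")  # descends into subdirectories
    elif mode == "flat":
        hits = input_dir.glob("*.gbk")   # top-level directory only
    else:
        raise ValueError(f"unknown input mode: {mode}")
    return sorted(p for p in hits if p.is_file())
```

With flat, files in nested folders are skipped; with recursive, they are included.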

-m, --mibig-version

MIBiG release number (from 3.1 onwards). If not provided, MIBiG gene clusters will not be included in the analysis. If required, BiG-SCAPE will download the antiSMASH-processed MIBiG database to ./big_scape/MIBiG/mibig_antismash_<version>_gbk. For advanced users: any custom (antiSMASH-processed) MIBiG collection can be used as long as the expected folder is present, e.g. provide -m mymibig with ./big_scape/MIBiG/mibig_antismash_mymibig_gbk. For more detail, see here.
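To make the expected folder layout concrete, this small sketch derives the folder name from the value passed to -m. The helper mibig_dir and its base argument are hypothetical; only the naming pattern comes from the description above.

```python
from pathlib import Path


def mibig_dir(version: str, base: Path = Path("big_scape/MIBiG")) -> Path:
    """Return the folder expected for a given -m value (hypothetical helper)."""
    return base / f"mibig_antismash_{version}_gbk"


print(mibig_dir("3.1"))
print(mibig_dir("mymibig"))
```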

-r, --reference-dir

Path to directory containing user defined, non-MIBiG, antiSMASH processed reference gene clusters. For more detail, see here.

--include-gbk

Default: cluster, region.

A comma-separated list of strings. Only .gbk files that have the string(s) in their filename will be used for the analysis. Use an asterisk to accept every file (* overrides --exclude-gbk).

--exclude-gbk

Default: final.

A comma separated list of strings. If any string in this list occurs in the .gbk filename, this file will not be used for the analysis.
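The interplay of --include-gbk and --exclude-gbk can be sketched as follows. This is a minimal illustration that assumes a file is kept when at least one include string occurs in its name; the function keep_gbk is hypothetical and not taken from the BiG-SCAPE code base.

```python
def keep_gbk(filename, include=("cluster", "region"), exclude=("final",)):
    """Decide whether a .gbk filename passes the filters (hypothetical helper)."""
    if "*" in include:
        return True  # asterisk accepts every file and overrides the exclude list
    if any(s in filename for s in exclude):
        return False
    # Assumption: one matching include string is enough.
    return any(s in filename for s in include)


print(keep_gbk("genome.region001.gbk"))
print(keep_gbk("genome.final.gbk"))
print(keep_gbk("genome.final.gbk", include=("*",)))
```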

-p, --pfam-path

Path to Pfam database .hmm file (e.g. Pfam-A.hmm). If the .hmm file has already been pressed and the pressed files are included in the same folder as the Pfam .hmm file, BiG-SCAPE will also use these pressed files. If this is not the case, BiG-SCAPE will run hmmpress. Note: this requires the user to have write permissions to the given Pfam folder.
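For orientation, hmmpress (part of HMMER) writes four index files next to the .hmm file, with the suffixes .h3f, .h3i, .h3m and .h3p. The sketch below shows one way to check whether they are present; is_pressed is a hypothetical helper, and how BiG-SCAPE itself performs this check is not described here.

```python
from pathlib import Path

# Suffixes appended by hmmpress to the original .hmm filename.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def is_pressed(hmm_path: Path) -> bool:
    """Return True if all four pressed files sit next to hmm_path."""
    return all(Path(str(hmm_path) + s).is_file() for s in PRESSED_SUFFIXES)
```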

--domain-includelist-all-path

Path to .txt file with phmm domain accessions (commonly, Pfam accessions (e.g. PF00501)). Only regions containing all the listed accessions will be analyzed. In this file, each line contains a single phmm domain accession (with an optional comment, separated by a tab). Lines starting with # are ignored. Domain accessions are case-sensitive. Cannot be provided in conjunction with --domain-includelist-any-path.

--domain-includelist-any-path

Path to .txt file with phmm domain accessions (commonly, Pfam accessions (e.g. PF00501)). Only BGCs containing any of the listed accessions will be analyzed. In this file, each line contains a single phmm domain accession (with an optional comment, separated by a tab). Lines starting with # are ignored. Domain accessions are case-sensitive. Cannot be provided in conjunction with --domain-includelist-all-path.
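The file format and the difference between the all and any variants can be sketched as follows. Both functions are hypothetical illustrations of the rules described above (one accession per line, optional tab-separated comment, # lines ignored, case-sensitive matching), not BiG-SCAPE's parser.

```python
def parse_includelist(text: str) -> set[str]:
    """Read accessions from includelist text, skipping blank and # lines."""
    accessions = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Keep only the accession; anything after the first tab is a comment.
        accessions.add(line.split("\t", 1)[0].strip())
    return accessions


def passes(record_domains: set[str], wanted: set[str], mode: str) -> bool:
    """mode 'all': every wanted accession present; 'any': at least one."""
    if mode == "all":
        return wanted <= record_domains
    if mode == "any":
        return bool(wanted & record_domains)
    raise ValueError(f"unknown mode: {mode}")


wanted = parse_includelist("# my domains\nPF00501\tAMP-binding\nPF00668\n")
print(passes({"PF00501", "PF00109"}, wanted, "any"))
print(passes({"PF00501", "PF00109"}, wanted, "all"))
```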

--legacy-weights

Use BiG-SCAPE 1 class-based weights in distance calculations. If not selected, the distance metric will be based on the mix weights distribution. Warning: these weights have not been validated for record types other than region (see option --record-type). See here for more detail.

--alignment-mode

Default: glocal

Options: global, glocal, local, auto

Alignment mode for each pair of gene clusters.

global: the whole list of domains of each BGC record is compared.

local: seeds the subset of domains used to calculate distance by finding the longest slice of common domain content (Longest Common Subcluster, LCS) between both records, then extends each side (see --extend-strategy).

glocal: starts as local, but domain selection is then extended to the shortest upstream/downstream arms in a compared record pair.

auto: uses glocal when at least one of the BGCs in a pair has the contig_edge annotation from antiSMASH v4+, and otherwise uses global mode on that pair.

For more detail see here.
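Two of the ideas above lend themselves to a short sketch: the per-pair choice made by auto, and the notion of a longest common slice of domains. Both functions are hypothetical and simplified; in particular, the slice search below treats each record as a plain list of domain accessions and ignores strand or reversed orientation, which a real implementation may need to consider.

```python
def pick_mode(a_on_contig_edge: bool, b_on_contig_edge: bool) -> str:
    """auto: glocal if either record is on a contig edge, else global."""
    return "glocal" if (a_on_contig_edge or b_on_contig_edge) else "global"


def longest_common_slice(a: list[str], b: list[str]) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest shared run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


print(pick_mode(True, False))
print(longest_common_slice(["PF1", "PF2", "PF3", "PF4"],
                           ["PF9", "PF2", "PF3", "PF8"]))
```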

--extend-strategy

Default: legacy

Options: legacy, greedy, simple match

Strategy to extend the BGC record pair comparable region. legacy will use the original BiG-SCAPE extend strategy, while greedy and simple match are newly introduced in BiG-SCAPE 2. Legacy and simple match both examine the domain architecture of the record pair in order to find the most relevant extended borders. Greedy is a very simple method that takes the coordinates of the outermost matching domains as the extended borders. For more detail see here.
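The greedy idea can be sketched on plain domain lists. This is an assumption-laden simplification: it uses list indices in place of sequence coordinates, and the function greedy_borders is hypothetical rather than BiG-SCAPE's implementation.

```python
def greedy_borders(a: list[str], b: list[str]):
    """Return half-open index ranges spanning the outermost shared domains.

    Returns None when the two records share no domain.
    """
    shared = set(a) & set(b)
    idx_a = [i for i, dom in enumerate(a) if dom in shared]
    idx_b = [i for i, dom in enumerate(b) if dom in shared]
    if not idx_a:
        return None
    return (idx_a[0], idx_a[-1] + 1), (idx_b[0], idx_b[-1] + 1)


print(greedy_borders(["X", "PF1", "PF2", "Y", "PF3"], ["PF3", "Z", "PF1"]))
```

Everything between the first and last matching domain is included, whether or not the domains in between match.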

--gcf-cutoffs

Default: 0.3

A comma-separated list of floats. Generate networks using multiple distance cutoff values. Values should be in the range [0.0, 1.0]. Example: --gcf-cutoffs 0.1,0.25,0.5,1.0. For more detail see here.
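A minimal sketch of parsing and validating such a value, assuming the comma-separated format and the [0.0, 1.0] range described above. The function parse_cutoffs is hypothetical.

```python
def parse_cutoffs(value: str) -> list[float]:
    """Parse a comma-separated list of floats and check the [0.0, 1.0] range."""
    cutoffs = [float(part) for part in value.split(",")]
    for cutoff in cutoffs:
        if not 0.0 <= cutoff <= 1.0:
            raise ValueError(f"cutoff out of range [0.0, 1.0]: {cutoff}")
    return cutoffs


print(parse_cutoffs("0.1,0.25,0.5,1.0"))
```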

--profile-path

Default: output_dir/.

Path to output profile file.

-db, --db-path

Default: output_dir/data_sqlite.db.

Path to sqlite db output file.

--record-type

Options: region, cand_cluster, protocluster, proto_core

Default: region

Use a specific type of antiSMASH record for comparison. For every .gbk, BiG-SCAPE will try to extract the requested record type. If it is not present, BiG-SCAPE will try to extract the next higher-level record type: for example, if a proto_core feature is not present, BiG-SCAPE will look for a protocluster feature, and so on. The record type hierarchy is: region > cand_cluster > protocluster > proto_core. See more detail here.
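The fallback through the hierarchy can be sketched as follows. The function resolve_record_type is hypothetical; it only encodes the rule stated above, starting at the requested type and moving up one level at a time.

```python
# Highest level first, as in: region > cand_cluster > protocluster > proto_core
HIERARCHY = ["region", "cand_cluster", "protocluster", "proto_core"]


def resolve_record_type(requested: str, available: set[str]):
    """Return the requested type if present, else the next higher one found."""
    start = HIERARCHY.index(requested)
    for level in reversed(HIERARCHY[: start + 1]):
        if level in available:
            return level
    return None  # nothing at or above the requested level


print(resolve_record_type("proto_core", {"region", "protocluster"}))
```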

--no-db-dump

Do not dump the sqlite database to disk until the very end of the run. This will speed up the run, but in case of a crashed run no info will be stored and you’ll have to re-start the run from scratch.

--db-only-output

Do not generate any output besides the data stored in the SQLite database. Suitable for advanced users that wish to only make use of the results stored in the SQLite database.

--no-trees

Do not generate any GCF newick trees. Suitable for users that do not utilize our output visualization, but only make use of the results stored in the output .tsv files (which include the network files) and/or the SQLite database.

--force-gbk

Recommended for advanced users only. Allows BiG-SCAPE to use non-antiSMASH processed .gbk files. If GBK files are found without antiSMASH annotations (specifically, BiG-SCAPE checks for the absence of an antiSMASH version feature), BiG-SCAPE will still read and parse these files, and will create internal gbk record objects, each of which will have a region feature covering the full sequence length and a product feature other. Warning: BiG-SCAPE still needs CDS features and a sequence feature to work with non-antiSMASH .gbks. Furthermore, the --include-gbk and --exclude-gbk parameters might need to be adjusted if the .gbk file names also do not follow the antiSMASH format. Disclaimer: this feature is still under development; use at your own risk.

Cluster mode specific parameters

--mix

Calculate distances using a mix bin, wherein no classification is applied. This will do an all-vs-all comparison of all input BGC records. This bin will use weights from the 'mix' weights distribution: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}. For more detail, see here.
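To give a feel for how the mix weights act, here is a minimal sketch that assumes the three indices (JC, AI, DSS) are similarity values in [0, 1] and are combined as a weighted sum, with distance taken as one minus that sum. This combination is an assumption made for illustration, the function mix_distance is hypothetical, and the way the anchor boost enters the DSS index is not modeled.

```python
MIX_WEIGHTS = {"JC": 0.2, "AI": 0.05, "DSS": 0.75}
ANCHOR_BOOST = 2.0  # listed with the mix weights; not used in this sketch


def mix_distance(jc: float, ai: float, dss: float, weights=MIX_WEIGHTS) -> float:
    """Assumed form: distance = 1 - weighted sum of the similarity indices."""
    similarity = (
        weights["JC"] * jc + weights["AI"] * ai + weights["DSS"] * dss
    )
    return 1.0 - similarity


print(round(mix_distance(0.5, 0.8, 0.6), 4))
```

Under this reading the DSS index dominates the result, since it carries three quarters of the total weight.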

--classify

Options: none, class, category, legacy

Default: category

Define which method BiG-SCAPE should use to separate BGC records into analysis bins. --classify class and --classify category use antiSMASH/BGC classes (e.g. T2PKS) or categories (e.g. PKS) to run analyses on class/category-based bins, respectively.

--classify legacy is based on BiG-SCAPE v1 predefined groups: PKS1, PKSOther, NRPS, NRPS-PKS-hybrid, RiPP, Saccharide, Terpene, Others, and will automatically use complementary --legacy-weights. --classify legacy is available for backwards compatibility with input .gbks generated with antiSMASH versions up to version 7. For higher antiSMASH versions, use at your own risk, as BGC classes may have changed. All antiSMASH classes that this legacy mode does not recognise will be grouped in Others. To update the antiSMASH classes list yourself, see the config.yml file.

--classify class and --classify category can be used in combination with --legacy-weights if input .gbks have been generated by antiSMASH version 6 or higher. For older antiSMASH versions, either use --classify legacy or do not select --legacy-weights, which will perform the weighted distance calculations based on the generic 'mix' weights.

--hybrids-off

Toggle to add BGC records with hybrid predicted classes/categories to each subclass instead of a hybrid class/network (e.g. a terpene-nrps BGC would be added to both the terpene and NRPS classes/networks instead of the terpene.nrps network). Only works if any --classify mode is selected.
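The effect of the toggle on bin assignment can be sketched as follows. The function bins_for is hypothetical; the dot-joined bin name follows the terpene.nrps example above, while the ordering of the parts is an assumption.

```python
def bins_for(classes: list[str], hybrids_off: bool) -> list[str]:
    """Return the bin name(s) a record with these classes would land in."""
    if len(classes) == 1 or hybrids_off:
        return list(classes)      # one bin per individual class
    return [".".join(classes)]    # a single hybrid bin


print(bins_for(["terpene", "nrps"], hybrids_off=False))
print(bins_for(["terpene", "nrps"], hybrids_off=True))
```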

--exclude-categories

A comma separated list of antiSMASH product categories. BGCs that have at least one of the product categories in this list will be excluded from the comparison (e.g. 'NRPS,PKS' will exclude all NRPS or PKS BGC records, even hybrids like NRPS-terpene). Only available for .gbks produced by antiSMASH version 6 or higher.

--include-categories

A comma separated list of antiSMASH product categories. Only BGCs that have at least one of the product categories in this list will be included in the comparison (e.g. NRPS will include only NRPS BGC records, including hybrids like NRPS-PKS). Only available for .gbks produced by antiSMASH version 6 or higher.

--exclude-classes

A comma separated list of antiSMASH product classes. BGC records that have at least one of the product classes in this list will be excluded from the comparison (e.g. 'T1PKS,T2PKS' will exclude all T1PKS and T2PKS BGC records, even hybrids like NRPS-T1PKS).

--include-classes

A comma separated list of antiSMASH product classes. Only BGCs that have at least one of the product classes in this list will be included in the comparison (e.g. 'T1PKS' will include only T1PKS BGC records, including hybrids like NRPS-T1PKS).
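The include and exclude filters for categories and classes follow the same pattern, which can be sketched in one function. keep_record is a hypothetical helper; letting the exclude list take precedence when both lists are supplied is an assumption, not documented behaviour.

```python
from typing import Optional


def keep_record(
    labels: set[str],
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> bool:
    """Apply include/exclude lists to a record's categories or classes."""
    if exclude and labels & exclude:
        return False  # any excluded label removes the record, hybrids too
    if include:
        return bool(labels & include)  # at least one included label required
    return True


print(keep_record({"NRPS", "terpene"}, exclude={"NRPS", "PKS"}))
print(keep_record({"NRPS", "PKS"}, include={"NRPS"}))
```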

--include-singletons

Include singletons in the network and all respective output. Reference singletons will not be included even if this is toggled.

Query mode specific parameters

-q, --query-bgc-path

Path to query .gbk file. BiG-SCAPE will compare all BGC records in the input and reference folders to the query record in a one-vs-all mode.

-n, --query-record-number

Query BGC record number. Used to select the specific record from the query BGC .gbk, and is only relevant when running --record-type cand_cluster, protocluster or proto_core. Warning: if interleaved or chemical hybrid proto cluster/cores are merged (see config file), the relevant number is that of the first record of the merged cluster (the one with the lowest number). For example, if records 1 and 2 get merged, the relevant number is 1.

--propagate

By default, BiG-SCAPE will only generate edges between the query and reference BGC records. With the propagate flag, BiG-SCAPE will go through multiple cycles of edge generation until no new reference BGCs are connected to the query connected component. For more detail see here.
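The cycle described above can be sketched as a simple expansion loop. This is an illustration under stated assumptions: the function propagate, the dist callable, and the rule that an edge exists when the distance is at or below the cutoff are all hypothetical and not taken from the BiG-SCAPE code.

```python
def propagate(query, references, dist, cutoff):
    """Grow the query's connected component until a cycle adds nothing new."""
    connected = {query}
    frontier = {query}  # records added in the previous cycle
    while frontier:
        newly = {
            ref
            for ref in references
            if ref not in connected
            and any(dist(ref, node) <= cutoff for node in frontier)
        }
        connected |= newly
        frontier = newly
    return connected


# Toy distances: A is close to the query, B is close only to A, C to nothing.
pairs = {
    frozenset(("query", "A")): 0.2,
    frozenset(("A", "B")): 0.25,
    frozenset(("query", "B")): 0.9,
}


def toy_dist(x, y):
    return pairs.get(frozenset((x, y)), 1.0)


print(sorted(propagate("query", ["A", "B", "C"], toy_dist, 0.3)))
```

In this toy example B is reached only in the second cycle, through A, which is the kind of indirect connection a single query-versus-reference pass would miss.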

--classify

Options: none, class, category

Default: none

By default BiG-SCAPE will compare the query BGC record against any other supplied reference BGC records regardless of antiSMASH product class/category. Instead, select 'class' or 'category' to run analyses on one class-specific bin, in which case only reference BGC records with the same class/category as the query record will be compared. Can be used in combination with --legacy-weights for .gbks produced by antiSMASH version 6 or higher. For older antiSMASH versions or if --legacy-weights is not selected, BiG-SCAPE will use the generic 'mix' weights: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}.

Benchmark mode specific parameters

--GCF-assignment-file

Path to GCF assignments file. BiG-SCAPE will compare a run output to these assignments.

--BiG-dir

Path to BiG-SCAPE (v1/v2) or BiG-SLICE (v1/v2) output directory.