parameters
Run `python bigscape.py -h` (or `--help`) to display a list of all currently available parameters.
--label <string>
By default, BiG-SCAPE runs are named `YYYY-MM-DD_HH-MM-SS_[extra]`, where the extra string is `hybrids` (if activated) and the mode (e.g. `glocal`). The `--label` option adds an extra string to the run name. This is reflected in the dropdown menu on the visualization page.
-i INPUTDIR, --inputdir INPUTDIR
Specify the path from which to start looking for `.gbk` files. If empty, the search will start where the BiG-SCAPE files are located. The search is recursive. See more information here
-o OUTPUTDIR, --outputdir OUTPUTDIR
Output directory, which will contain all output data files. See its structure and more details about each type of result here
--include_gbk_str INCLUDE_GBK_STR
If any string in this list occurs in the filename, the file will be included in the analysis (default: `cluster`, `region`).
--exclude_gbk_str EXCLUDE_GBK_STR
If any string in this list occurs in the filename, the file will not be used for the analysis (default: `final`).
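The two filename filters above can be pictured with a short sketch. It is only an illustration: the function name `select_gbk_files` is hypothetical, and the assumption that an exclude match overrides an include match is mine, not something stated by BiG-SCAPE.

```python
from pathlib import Path


def select_gbk_files(input_dir, include=("cluster", "region"), exclude=("final",)):
    """Recursively collect .gbk files whose names pass the include/exclude filters.

    Hypothetical helper for illustration; not part of BiG-SCAPE.
    """
    selected = []
    for path in sorted(Path(input_dir).rglob("*.gbk")):
        name = path.name
        # Keep the file only if at least one include string occurs in its name.
        if not any(s in name for s in include):
            continue
        # Assumption: a match against any exclude string always drops the file.
        if any(s in name for s in exclude):
            continue
        selected.append(path)
    return selected
```

With the default strings, a file such as `strain.cluster001.gbk` would be selected, while `strain.final.gbk` or a plain `genome.gbk` would not.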
--pfam_dir PFAM_DIR
Location of the Pfam files. The default is the same location as BiG-SCAPE. See how to prepare these files in the installation instructions
-c CORES, --cores CORES
BiG-SCAPE will try to parallelize some steps in the analysis, like domain prediction and distance calculation. Use this option to set the number of cores the script may use. If not specified, BiG-SCAPE will use all available cores.
-v, --verbose
Prints more detailed information about each step in the analysis. Toggle to activate. Because of the amount of information, it might be a good idea to redirect the output to a file, e.g.:
$> python bigscape.py <options> --verbose > run.log &
--include_singletons
Toggle to activate. This will include singletons, i.e. BGCs that have no distance to any other BGC lower than the specified cutoff distance.
-d DOMAIN_OVERLAP_CUTOFF, --domain_overlap_cutoff DOMAIN_OVERLAP_CUTOFF
Specify the overlap fraction at which domains are considered to overlap. The domain with the best score is kept (default: 0.1). See also [domain prediction](domain prediction).
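As a rough sketch of the idea behind this cutoff, the snippet below filters overlapping domain hits on one gene and keeps the best-scoring one. The hit representation `(start, end, score)`, the helper names, and the choice to measure overlap relative to the shorter hit are all assumptions made for illustration; BiG-SCAPE's actual calculation may differ.

```python
def overlap_fraction(a, b):
    """Overlap length of two (start, end, score) hits divided by the shorter hit's length."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return shared / min(a[1] - a[0], b[1] - b[0])


def filter_overlapping(hits, cutoff=0.1):
    """Visit hits from best to worst score; drop any hit that overlaps a kept one."""
    kept = []
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        # Assumption: two hits "overlap" when the fraction exceeds the cutoff.
        if all(overlap_fraction(hit, k) <= cutoff for k in kept):
            kept.append(hit)
    return sorted(kept)


# Three hits that overlap heavily: only the best-scoring one survives.
print(filter_overlapping([(0, 100, 50.0), (90, 200, 30.0), (50, 150, 80.0)]))
```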
-m MIN_BGC_SIZE, --min_bgc_size MIN_BGC_SIZE
Provide the minimum size of a BGC to be included in the analysis. Default is 0 base pairs. This includes the sum of all loci in a multi-record GenBank file.
--mix
By default, BiG-SCAPE separates the analysis according to the BGC product and will create network directories for each class (see [BiG-SCAPE classes](BiG-SCAPE classes)). Toggle to include an analysis mixing all classes. As BiG-SCAPE needs to calculate an all-vs-all distance network, this might use a lot of memory.
--no_classify
By default, BiG-SCAPE classifies the output files based on the BGC product. Toggle to deactivate (note that if the `--mix` parameter is not activated, BiG-SCAPE will not create any network file, but all intermediate files will be processed).
--banned_classes {PKSI, PKSother, NRPS, RiPPs, Saccharides, Terpene, PKS-NRP_Hybrids, Others}
BiG-SCAPE classes that should NOT be included in the classification, e.g. `--banned_classes PKSI PKSother`. Strings in lowercase are also allowed.
--cutoffs {0.0-1.0}
Generate networks using multiple raw distance cutoff values, given as a space-separated list like the other multi-value options. Example: `--cutoffs 0.1 0.25 0.5 1.0`. Default: `c=0.3`. A separate network file is generated for every cutoff value, and automatic clustering into Gene Cluster Families is done for each cutoff. The interactive visualization shows only the highest cutoff.
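To make the effect of several cutoffs concrete, here is a minimal sketch that derives one edge list per cutoff from a table of pairwise distances. The function `networks_by_cutoff` and the data layout are hypothetical, and treating a distance equal to the cutoff as included is an assumption.

```python
def networks_by_cutoff(distances, cutoffs=(0.3,)):
    """Return {cutoff: [(bgc_a, bgc_b, distance), ...]} for pairs at or below each cutoff.

    `distances` maps (bgc_a, bgc_b) tuples to raw distances in [0.0, 1.0].
    """
    return {
        c: [(a, b, d) for (a, b), d in sorted(distances.items()) if d <= c]
        for c in sorted(cutoffs)
    }


distances = {("A", "B"): 0.2, ("A", "C"): 0.3, ("B", "C"): 0.9}
networks = networks_by_cutoff(distances, cutoffs=(0.25, 0.5))
# The visualization would correspond to the network of the highest cutoff.
shown = networks[max(networks)]
```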
--clans-off
By default, BiG-SCAPE will perform a second layer of clustering to group Gene Cluster Families (GCFs) into Gene Cluster Clans (GCCs). Toggle to deactivate this.
--clan_cutoff {0.0-1.0} {0.0-1.0}
Cutoff parameters, in raw distance, used when clustering families into clans. The first value is the cutoff used to find the GCFs on which clan calling is based (default: 0.3). If this GCF cutoff value is not included within `--cutoffs`, it will be added automatically. The second value is the GCC cutoff value for clustering families into clans (default: 0.7). The average distance between the members of each pair of GCFs is used as the inter-GCF distance. Every pair of GCFs linked with a distance of the GCC cutoff value or less will be taken into account. Example: `--clan_cutoff 0.5 0.8`
Learn more about [GCFs and GCCs](GCFs and GCCs).
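The inter-GCF distance described above (the average distance between the members of two families) can be sketched as follows. The function names and data structures are hypothetical, and the sketch stops at listing linked family pairs; it does not reproduce the clustering step BiG-SCAPE performs afterwards.

```python
from itertools import combinations, product
from statistics import mean


def inter_gcf_distance(family_a, family_b, bgc_distance):
    """Average raw distance over all pairs with one member from each family."""
    return mean(bgc_distance[frozenset((a, b))] for a, b in product(family_a, family_b))


def linked_gcf_pairs(families, bgc_distance, gcc_cutoff=0.7):
    """List (gcf_a, gcf_b, distance) for family pairs at or below the GCC cutoff."""
    links = []
    for (name_a, members_a), (name_b, members_b) in combinations(sorted(families.items()), 2):
        d = inter_gcf_distance(members_a, members_b, bgc_distance)
        if d <= gcc_cutoff:
            links.append((name_a, name_b, d))
    return links


families = {"GCF1": ["A", "B"], "GCF2": ["C"], "GCF3": ["D"]}
bgc_distance = {
    frozenset(("A", "C")): 0.4, frozenset(("B", "C")): 0.6,
    frozenset(("A", "D")): 0.9, frozenset(("B", "D")): 1.0,
    frozenset(("C", "D")): 0.8,
}
# Only GCF1 and GCF2 are close enough on average to be linked at the default cutoff.
links = linked_gcf_pairs(families, bgc_distance)
```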
--hybrids-off
By default, BGCs with hybrid predicted products from the PKS/NRPS Hybrids and Others classes will be included in each subclass (e.g. a `terpene-nrps` BGC that would usually be classified in Others will be added to both the Terpene and NRPS classes). This means that the same cluster may appear in different classes. Toggle to deactivate.
--mode {global,glocal,auto}
Alignment mode used when comparing each pair of gene clusters. `global`: the whole list of domains of each BGC is compared. `glocal` (default): Longest Common Subcluster mode, in which the subset of domains used to calculate the distance is redefined by finding the longest slice of common domain content per gene in both BGCs and then expanding each slice. `auto`: use glocal mode when at least one of the BGCs in a pair has the `contig_edge` annotation from antiSMASH v4+, otherwise use global mode on that pair. Learn more about the alignment modes here
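The per-pair decision made in `auto` mode can be summarized in a few lines. This is a sketch under the assumption that each BGC carries a boolean telling whether antiSMASH annotated it with `contig_edge`; the function `pick_mode` is hypothetical.

```python
def pick_mode(mode, a_on_contig_edge, b_on_contig_edge):
    """Resolve the alignment mode to use for a single pair of BGCs."""
    if mode not in ("global", "glocal", "auto"):
        raise ValueError(f"unknown mode: {mode}")
    if mode != "auto":
        # An explicit mode applies to every pair.
        return mode
    # auto: glocal if at least one BGC of the pair lies on a contig edge.
    return "glocal" if (a_on_contig_edge or b_on_contig_edge) else "global"


print(pick_mode("auto", True, False))   # glocal
print(pick_mode("auto", False, False))  # global
```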
--anchorfile ANCHORFILE
Point to a custom anchor file. The default is `anchor_domains.txt`, included with the repository. Learn more about the anchor file [here](anchor file).
--force_hmmscan
Force domain prediction using `hmmscan` even if BiG-SCAPE finds processed domtable files (e.g. to use a new version of the Pfam database).
--skip_ma
Skip the multiple alignment of domain sequences. Use this if the alignments have been generated in a previous run. Domain sequence alignment will also be skipped if BiG-SCAPE reuses an output directory and no new BGCs are found within the input folder.
--mibig, --mibig14, --mibig13
Use included BGCs from the MIBiG database. Currently, versions 2.1 (`--mibig`), 1.4 (`--mibig14`) and 1.3 (`--mibig13`) of the database are included in the BiG-SCAPE project as a compressed file, which will be unzipped the first time these options are used.
Note that these sets are different from the bundle found in the downloads section of the MIBiG site: these GenBank files have been processed by antiSMASH to annotate the BGC type. Additionally, the latest version is an unofficial version of the current content of the MIBiG 'repository' page (i.e. it contains a few more BGCs than the official 2.0 bundle).
--query_bgc QUERY_BGC
Instead of making an all-vs-all comparison of all the input BGCs, choose one BGC to compare with the rest of the set (one-vs-all). The query BGC does not have to be within inputdir. The distances used for the GCF and GCC analysis are all those equal to or lower than the maximum cutoff value. Only the BiG-SCAPE class(es) that the query BGC belongs to are taken into account.
--domain_includelist
Only include BGCs that contain any of the domains whose Pfam accessions are listed in the `domain_includelist.txt` file. In this file, each line contains a single Pfam accession (with an optional comment, separated by a tab). Lines starting with "#" are ignored. See the file for an example using the P450 domain. Pfam accessions are case-sensitive.
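The file format described above is simple enough to parse in a few lines. The sketch below is an illustration only: the function names are hypothetical, and whether accessions in the real file carry a version suffix is not specified here, so the example uses the bare accession `PF00067` (the Pfam cytochrome P450 family).

```python
def read_domain_includelist(lines):
    """Return the set of Pfam accessions, skipping blank lines and '#' comments."""
    accessions = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # The accession is the first tab-separated field; the rest is a comment.
        accessions.add(line.split("\t", 1)[0].strip())
    return accessions


def bgc_is_included(bgc_domains, accessions):
    """True if the BGC contains at least one listed domain (case-sensitive match)."""
    return any(domain in accessions for domain in bgc_domains)


example = ["# Pfam accession<TAB>comment\n", "PF00067\tp450\n"]
wanted = read_domain_includelist(example)
print(bgc_is_included(["PF00109", "PF00067"], wanted))  # True
```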
--version
Show program's version number and exit