Nextflow handles job submissions on SLURM or other environments, and supervises the running jobs. The Nextflow process must therefore keep running until the pipeline is finished. We recommend running the process in the background through `screen`/`tmux` or a similar tool. Alternatively, you can run Nextflow within a cluster job submitted via your job scheduler.
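For example, a minimal sketch of both approaches (the SLURM resource values are placeholders to adjust for your cluster):

```bash
# Option 1: keep the Nextflow process alive in a detachable tmux session
tmux new -s nanortax
nextflow run main.nf -name 'example_run' --reads '/path_to_sample_reads/*.fastq' -profile docker
# Detach with Ctrl+b d; reattach later with: tmux attach -t nanortax

# Option 2: submit Nextflow itself as a SLURM job (resource values are placeholders)
sbatch --job-name=nanortax --time=24:00:00 --mem=8G \
    --wrap "nextflow run main.nf -name 'example_run' --reads '/path_to_sample_reads/*.fastq' -profile docker"
```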
It is recommended to limit the Nextflow Java virtual machine's memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`):
```bash
NXF_OPTS='-Xms1g -Xmx4g'
```
The typical command for running the pipeline on a sample is as follows:
```bash
nextflow run main.nf -name 'example_run' --reads '/path_to_sample_reads/*.fastq' -profile docker
```
This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
Note that the pipeline will create the following files in your working directory:
```
work            # Directory containing the Nextflow working files
results         # Finished results (configurable, see below)
.nextflow.log   # Log file from Nextflow
# Other Nextflow hidden files, e.g. history of pipeline runs and old logs
```
Use `-name` to identify all the samples/barcodes contained in the pipeline run. This name will be used by the web application to group samples that were analysed with the same configuration parameters in the same run.
Use `-profile` to choose a configuration profile. Profiles provide configuration and parameter presets for different compute environments or use cases.
We strongly recommend using profile configuration files to specify the pipeline parameters and options for the classification steps, instead of writing the whole set of arguments on the command line. Profiles simplify the pipeline execution command and provide an easy way to keep track of the parameterization of past runs.
Apart from NanoRTax parameters, profile configuration files can be used to tune performance options such as the CPU threads and memory limits for each process. Check the Nextflow documentation for more information.
Several generic profiles are bundled with the pipeline; they instruct it to obtain software packages via different methods (Docker, Conda). A `test` profile is also included with preloaded parameters for running a test execution of NanoRTax.
A good starting point for creating your own profile is to copy the content of the included default profile (`./conf/default.config`) and edit it with the new parameters of choice.
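As an illustration (the copy name `conf/my_run.config` is arbitrary; `-c` is Nextflow's standard option for supplying an additional configuration file):

```bash
# Copy the default profile and edit the parameters of choice
cp conf/default.config conf/my_run.config
# Run the pipeline with the customized configuration
nextflow run main.nf -name 'custom_run' --reads '/path_to_sample_reads/*.fastq' \
    -c conf/my_run.config -profile docker
```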
We encourage the use of Docker; however, when this is not possible, Conda is also supported.
Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! They are loaded in sequence, so later profiles can overwrite earlier profiles.
If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is not recommended.
- `docker`
  - A generic configuration profile to be used with Docker
  - Pulls software from Docker Hub: `hecrp/nanortax`
- `conda`
  - A generic configuration profile to be used with Conda
- `test`
  - A profile with a complete configuration for automated testing
  - Includes links to test data, so it needs no other parameters
- `default`
  - A profile with a complete configuration using all classifiers and default parameters
  - Edit this file to quickly set up your own configuration for the classification workflow
Use `--reads` to specify the location of your input FastQ file(s). For example:

```bash
--reads '/seq_path/fastq_pass/**/*.fastq'
```

For real-time workflows, use `--reads_rt` instead:

```bash
--reads_rt '/seq_path/fastq_pass/**/*.fastq'
```

IMPORTANT: the `--reads_rt` parameter is used for real-time workflows and provides automatic processing of newly generated read files in the specified directory. When using this mode, the pipeline must be terminated manually (Ctrl+C on the command line) once all the read files have been generated and fully processed.
Please note the following requirements:
- The path must be enclosed in quotes
- The path may use `*` wildcard characters to select several directories/read files
If left unspecified, NanoRTax will load the testing dataset (see `conf/test.config`).
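For example, a sketch of a real-time run watching a sequencing output directory (paths are illustrative):

```bash
# Process new read files as they appear under fastq_pass/;
# terminate manually (Ctrl+C) once sequencing and processing are done
nextflow run main.nf -name 'realtime_run' \
    --reads_rt '/seq_path/fastq_pass/**/*.fastq' -profile docker
```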
Use these flags on the command line, or set them to true/false in the configuration profile, to enable or disable a specific classifier for the analysis.
Important: whether running a complete-run analysis or a real-time execution, make sure you have enough computing resources before enabling the BLAST classifier.
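As an illustrative sketch only: the flag names `--kraken`, `--centrifuge` and `--blast` below are assumptions; check `conf/default.config` for the switches your NanoRTax version actually exposes.

```bash
# Hypothetical classifier switches, for illustration only
nextflow run main.nf -name 'example_run' \
    --reads '/path_to_sample_reads/*.fastq' \
    --kraken true --centrifuge true --blast false \
    -profile docker
```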
The following parameters define the databases used for classification. Make sure to download your preferred ones, or check README.md for example 16S database download commands.
- Kraken database path.
- Centrifuge database path.
- BLAST database path.
- BLAST taxdb path. This database is important for retrieving the original taxa names from BLAST classification outputs.
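A hedged sketch, assuming parameter names `--kraken_db`, `--centrifuge_db`, `--blast_db` and `--blast_taxdb` (check `conf/default.config` for the actual names) and illustrative local database paths:

```bash
# Database locations are illustrative; the parameter names are assumptions
nextflow run main.nf -name 'example_run' \
    --reads '/path_to_sample_reads/*.fastq' \
    --kraken_db /db/kraken2_16S \
    --centrifuge_db /db/centrifuge_16S/p_compressed \
    --blast_db /db/blast_16S/16S_ribosomal_RNA \
    --blast_taxdb /db/blast_16S/taxdb \
    -profile docker
```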
Parameter descriptions are taken directly from the official documentation:

- BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279684/
- Centrifuge: https://ccb.jhu.edu/software/centrifuge/manual.shtml
Minimum length of partial hits, which must be greater than 15.
Expect value (E) for saving hits.
Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
Read length thresholds for the QC step. Default (1400-1700) selects near full-length 16S rRNA reads for more accurate taxonomic classification.
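For illustration, a sketch combining these options; the flag names `--min_hitlen`, `--evalue`, `--max_hsps`, `--min_read_length` and `--max_read_length` are assumptions, so check the configuration profiles for the real ones.

```bash
# All flag names below are illustrative assumptions
# min_hitlen: Centrifuge minimum partial-hit length (must be >15)
# evalue / max_hsps: BLAST expect value and HSP limit per query-subject pair
# read length window: QC filter for near full-length 16S reads (default 1400-1700)
nextflow run main.nf -name 'example_run' \
    --reads '/path_to_sample_reads/*.fastq' \
    --min_hitlen 16 --evalue 1e-10 --max_hsps 1 \
    --min_read_length 1400 --max_read_length 1700 \
    -profile docker
```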
The output directory where the results will be saved.
Add the `-resume` flag to the command to continue a stopped run.
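For example, resuming an interrupted run while redirecting the results (a sketch: `-resume` is Nextflow's built-in flag, note the single dash, while `--outdir` is assumed here to follow the usual nf-core-style naming):

```bash
nextflow run main.nf -name 'example_run' \
    --reads '/path_to_sample_reads/*.fastq' \
    --outdir /path/to/results \
    -profile docker -resume
```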