- Author(s): Alejandra Hernández Segura and Varisha Ganesh
- Organization: Rijksinstituut voor Volksgezondheid en Milieu (RIVM)
- Department: Infectieziekteonderzoek, Diagnostiek en Laboratorium Surveillance (IDS), Bacteriologie (BPD)
- Start date: 17-06-2021
- Commissioned by: Antoni Hendrickx and Varisha Ganesh
This is a short pipeline that takes (multi-)FASTA files containing one or more DNA sequences as input. These sequences are then BLASTed against a local copy of the BLAST 'nt' database. Before running BLAST, the pipeline downloads or updates the database if necessary, so that the latest version is always used.
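To make the expected input concrete, here is a minimal sketch (not code from this repository) of reading a (multi-)FASTA file and collecting input files the way the -i option describes: only files directly inside the input directory whose names end in '.fasta'. Using the file stem as the sample name is an assumption made for illustration, and `read_fasta` and `collect_samples` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_fasta(path: Path) -> List[Tuple[str, str]]:
    """Parse a (multi-)FASTA file into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines may be wrapped; join them per record
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collect_samples(input_dir: Path) -> Dict[str, Path]:
    """Map a (hypothetical) sample name, the file stem, to each top-level '*.fasta' file.

    glob() is not recursive, so files in subdirectories are ignored.
    """
    return {p.stem: p for p in sorted(input_dir.glob("*.fasta")) if p.is_file()}
```

For example, `collect_samples(Path("my_input_files"))` would return one entry per '.fasta' file in that directory, and `len(read_fasta(path))` gives the number of sequences in a file.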
- Linux + conda: A Linux-like environment with at least 'miniconda' installed.
- Python 3.7.6.
- Clone the repository:
git clone https://github.com/RIVM-bioinformatics/juno-blast.git
Alternatively, you can download it manually as a zip file (you will then need to unzip it).
- Enter the directory with the pipeline and install the master environment:
cd juno-blast
conda env create -f envs/master_env.yaml
-h, --help
Shows the help message of the pipeline.
-i, --input
Directory with the input (FASTA) files. All FASTA files should be placed directly in this directory (no subdirectories) and have the extension '.fasta'.
-o, --output
Directory where the output of the pipeline will be collected (it will be created if it does not exist). By default, a folder called 'output' is created within the pipeline directory.
-d, --db_dir
Directory where the databases used by this pipeline will be downloaded, or where they are expected to be present (it will be created if it does not exist). Default is '/mnt/db/juno/Jovian/NT_database' (RIVM path to the databases of the Juno pipelines). It is advisable to provide your own path if you are not working inside the RIVM Linux environment or if the databases are not available at that location.
-e, --evalue
Numeric value used as threshold for the e-value in BLAST. The e-value is the number of expected hits of similar quality (score) that could be found just by chance. Default is 1e-10.
-mh, --max-hsps
Integer value used as threshold for the max_hsps parameter in BLAST. The max_hsps is the maximum number of HSPs (alignments) to keep for any single query-subject pair. Default is 10.
-cl, --culling-limit
Integer value used as threshold for the culling_limit parameter in BLAST. With culling_limit, a hit is deleted if its query range is enveloped by that of at least this many higher-scoring hits. Default is 10.
-c, --cores
Maximum number of cores to be used to run the pipeline. Defaults to 300 (it assumes you work on an HPC cluster).
-l, --local
If this flag is present, the pipeline will be run locally (without attempting to send the jobs to a cluster). Keep in mind that if you use this flag, you also need to adjust the number of cores (for instance, to 2) to avoid crashes. By default, the pipeline assumes that you are working on a cluster, because it was developed in an environment where that is the case.
-q, --queue
If you are running the pipeline on a cluster, you need to provide the name of the queue. It defaults to 'bio' (the default queue at RIVM).
-n, --dryrun, -u, --unlock and --rerunincomplete
These are all parameters passed to Snakemake. For an explanation of these parameters, please refer to the Snakemake documentation.
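The BLAST-related options above correspond to standard `blastn` parameters from BLAST+. The sketch below shows how a single sample might be searched with them; it is an illustration, not the pipeline's actual Snakemake rule. It assumes `blastn` is on the PATH, that the database inside --db_dir is named 'nt', and that the .asn output is the BLAST archive format (`-outfmt 11`); the helper name `build_blastn_command` and the `threads` argument are hypothetical.

```python
from pathlib import Path
from typing import List


def build_blastn_command(query: Path, db_dir: Path, out: Path,
                         evalue: float = 1e-10, max_hsps: int = 10,
                         culling_limit: int = 10, threads: int = 1) -> List[str]:
    """Build a blastn call using the pipeline's documented defaults (sketch)."""
    return [
        "blastn",
        "-query", str(query),
        "-db", str(db_dir / "nt"),        # assumption: database name 'nt' inside db_dir
        "-out", str(out),
        "-outfmt", "11",                  # 11 = BLAST archive (ASN.1); assumed for .asn output
        "-evalue", str(evalue),           # --evalue
        "-max_hsps", str(max_hsps),       # --max-hsps
        "-culling_limit", str(culling_limit),  # --culling-limit
        "-num_threads", str(threads),
    ]
```

Running it would then be `subprocess.run(build_blastn_command(Path("s1.fasta"), Path("my_db_dir"), Path("s1.asn")), check=True)`, with `check=True` so that a failing search raises an error instead of passing silently.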
python juno_blast.py -i [dir/to/input_directory]
python juno_blast.py -i my_input_files -o my_results --db_dir my_db_dir --local --cores 2
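The second example runs everything locally with 2 cores. If you launch the pipeline from another Python script, a hypothetical helper like the one below (not part of this repository) can build the same command line from the documented options. When running locally, it caps --cores at the number of CPUs reported by `os.cpu_count()`, following the advice given for the --local option.

```python
import os
import sys
from typing import List, Optional


def juno_blast_command(input_dir: str, output_dir: str = "output",
                       db_dir: Optional[str] = None, local: bool = False,
                       cores: Optional[int] = None) -> List[str]:
    """Build a juno_blast.py command line from the documented options (sketch)."""
    # sys.executable is the Python of the active (conda) environment
    cmd = [sys.executable, "juno_blast.py", "-i", input_dir, "-o", output_dir]
    if db_dir is not None:
        cmd += ["--db_dir", db_dir]
    if local:
        # Locally, never request more cores than the machine has
        # (the pipeline's own default of 300 assumes a cluster)
        available = os.cpu_count() or 1
        cmd += ["--local", "--cores", str(min(cores or 2, available))]
    elif cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd
```

For instance, `subprocess.run(juno_blast_command("my_input_files", "my_results", "my_db_dir", local=True), check=True)` corresponds to the local example above on any machine with at least 2 CPUs.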
- log: Log files with output and error files from each Snakemake rule/step that is performed.
- output: One output file (.asn extension) per sample, containing the BLAST results.
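The .asn files are compact but not human-readable. Assuming they are BLAST archives written with `-outfmt 11`, the BLAST+ tool `blast_formatter` can re-render them in another format without repeating the search. The sketch below only builds one `blast_formatter -archive ... -outfmt 6` call (tabular output) per archive found anywhere under the output directory; `formatter_commands` is a hypothetical helper, and the database referenced in the archives probably needs to be reachable when the commands are actually run.

```python
from pathlib import Path
from typing import List


def formatter_commands(output_dir: Path, outfmt: str = "6") -> List[List[str]]:
    """Build one blast_formatter call per .asn archive under output_dir (sketch)."""
    commands = []
    # rglob avoids assuming how samples are laid out inside the output folder
    for archive in sorted(output_dir.rglob("*.asn")):
        commands.append([
            "blast_formatter",
            "-archive", str(archive),
            "-outfmt", outfmt,                         # 6 = tabular
            "-out", str(archive.with_suffix(".tsv")),  # written next to the archive
        ])
    return commands
```

Each command can then be executed with `subprocess.run(cmd, check=True)`.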
- All default values have been chosen to work with the RIVM Linux environment; therefore, they might not be applicable to other environments (although the pipeline should work when the appropriate arguments/parameters are provided).
- Any issue can be reported in the Issues section of this repository.
- Suggestions are welcome in the Issues section of this repository.
This pipeline is licensed under an AGPL-3.0 license. Detailed information can be found in the 'LICENSE' file in this repository.
- Contact person: Alejandra Hernández Segura
- Email: [email protected]