NanoCLUST toolkit

Python scripts that simplify running NanoCLUST V1.0dev and processing its output.

Scripts included

NC_runner.py
NC_summarizer.py
NC_gridsearch.py

To be included

summary_to_phyloseq.py (create OTU and TAX files for phyloseq from the CSV summary)
phyloseqize.py (create OTU and TAX files for phyloseq directly from NanoCLUST output, without a summary; possibly to be merged with summary_to_phyloseq.py)
NC_cluster_concat.py (concatenate cluster consensus sequences from multiple NanoCLUST runs into a .fasta file)
concat_fastqgz.py (recursively concatenate the .fastq.gz files in a directory into separate .fastq files, which NanoCLUST needs as input; possibly to be merged with NC_runner.py)
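concat_fastqgz.py is still planned, but the operation it describes can be sketched with the Python standard library. This sketch assumes a MinKNOW-style layout in which each subdirectory (for example barcode01) holds several .fastq.gz parts, and it writes one decompressed .fastq file per such directory, named after that directory. The function name concat_fastqgz and the naming scheme are hypothetical, not the planned script's interface.

```python
import gzip
import shutil
from pathlib import Path


def concat_fastqgz(root: Path, out_dir: Path) -> list[Path]:
    """Merge the .fastq.gz files of each directory under `root` into one .fastq file.

    Hypothetical behaviour: one output per directory that directly contains
    .fastq.gz files, named after that directory (barcode01/*.fastq.gz -> barcode01.fastq).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Visit `root` itself and every subdirectory below it, in a stable order.
    directories = sorted([root, *(p for p in root.rglob("*") if p.is_dir())])
    for directory in directories:
        parts = sorted(directory.glob("*.fastq.gz"))
        if not parts:
            continue
        target = out_dir / f"{directory.name}.fastq"
        with open(target, "wb") as merged:
            for part in parts:
                # Decompress each part and append its reads to the merged file.
                with gzip.open(part, "rb") as src:
                    shutil.copyfileobj(src, merged)
        written.append(target)
    return written
```

Note that two directories with the same name in different branches would write to the same output file; a real script would need to handle that case.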

NC_runner.py

The NC_runner.py script runs NanoCLUST on multiple files in a directory with a single command.

This is useful when you have many FASTQ files and selecting them as input with wildcards does not work.

Usage

Download the NC_runner.py script to a convenient location, preferably your home directory, to minimize path-related errors.
(Optional) In the script's argparse default section, provide the absolute paths for your database and tax-database to reduce command length and prevent path-related errors.
(Optional) Include the path to the main.nf file from NanoCLUST in the argparse default section.
(Optional) Modify the default output directory location.
(Optional) Adjust the default suffix in the argparse section to a suffix you commonly use.
Execute the script!
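The argparse defaults mentioned in the steps above might look something like this minimal sketch. Only the short flags (-o, -s, -n, -d, -t) come from the example commands in this README; the destination names (output_dir, suffix, main_nf, db, tax_db) and the /path/to/... defaults are hypothetical placeholders to replace with absolute paths on your own system.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Sketch of an argparse setup with editable defaults (values are placeholders)."""
    parser = argparse.ArgumentParser(description="Run NanoCLUST on every file in a directory.")
    # Required positional argument: argparse exits with a usage error if it is missing.
    parser.add_argument("input_dir", help="Directory containing the read files")
    parser.add_argument("-o", dest="output_dir", default=os.getcwd(),
                        help="Output directory (default: current working directory)")
    # Edit the placeholder defaults below to match your own installation.
    parser.add_argument("-s", dest="suffix", default=".fastq", help="Suffix of the input files")
    parser.add_argument("-n", dest="main_nf", default="/path/to/NanoCLUST/main.nf",
                        help="Path to NanoCLUST's main.nf")
    parser.add_argument("-d", dest="db", default="/path/to/db/16S_ribosomal_RNA",
                        help="Path to the database")
    parser.add_argument("-t", dest="tax_db", default="/path/to/db/taxdb",
                        help="Path to the tax database")
    return parser
```

For example, build_parser().parse_args(["sequencedata", "-s", ".fastq.gz"]) overrides only the suffix and keeps every other default.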

If no input directory is specified or the input directory doesn't exist, the script will exit.

If the specified output directory doesn't exist, the script will notify you and create it.
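The two checks described above (exit on a missing input directory, create a missing output directory) could be implemented roughly as follows. This sketch assumes that every file in the input directory ending with the chosen suffix is one sample; prepare_run is a hypothetical name, not a function from NC_runner.py.

```python
import sys
from pathlib import Path


def prepare_run(input_dir: str, output_dir: str, suffix: str) -> list[Path]:
    """Validate the directories and return the input files to process."""
    in_path = Path(input_dir)
    if not in_path.is_dir():
        # Non-existent input directory: stop immediately with an error message.
        sys.exit(f"Input directory '{input_dir}' does not exist.")
    out_path = Path(output_dir)
    if not out_path.exists():
        print(f"Output directory '{output_dir}' does not exist; creating it.")
        out_path.mkdir(parents=True)
    # Only regular files ending in the chosen suffix are treated as samples.
    return sorted(p for p in in_path.iterdir() if p.is_file() and p.name.endswith(suffix))
```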

Output

NanoCLUST's outputs are written to the specified output directory, or to your current working directory by default. Each NanoCLUST run gets its own folder, named after the sample, containing the three output directories generated by NanoCLUST (classification data, FastQC results, and pipeline info).
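For each sample, a runner needs one NanoCLUST command whose output goes to a folder named after the sample. The sketch below assumes NanoCLUST is launched through Nextflow with the Docker profile; --reads, --db and --tax appear in NanoCLUST's own usage example, while --outdir is assumed here, so verify all of them against the NanoCLUST version you have installed. The helper nanoclust_command is hypothetical.

```python
from pathlib import Path


def nanoclust_command(reads: Path, suffix: str, output_dir: Path,
                      main_nf: str, db: str, tax_db: str) -> list[str]:
    """Build one NanoCLUST command whose results go to <output_dir>/<sample name>.

    Assumption: NanoCLUST runs via `nextflow run` with the Docker profile and the
    --reads/--db/--tax/--outdir parameters.
    """
    # Sample name = file name without the input suffix (e.g. sample1.fastq -> sample1).
    if suffix and reads.name.endswith(suffix):
        sample = reads.name[: -len(suffix)]
    else:
        sample = reads.stem
    return [
        "nextflow", "run", main_nf, "-profile", "docker",
        "--reads", str(reads),
        "--db", db,
        "--tax", tax_db,
        "--outdir", str(output_dir / sample),
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) to start a run; building the command as a list rather than a string avoids shell quoting problems with paths that contain spaces.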

Example commands

Basic command, only input directory specified (default settings):

python NC_runner.py sequencedata

Input and output directories, file suffix, main.nf path, and database paths specified:

python NC_runner.py sequencedata -o NanoCLUST_out -s .fastq.gz -n project1/programs/NanoCLUST/main.nf -d project1/db/16S_ribosomal_RNA -t project1/db/taxdb

Getting help

python NC_runner.py -h

NC_summarizer.py

The NC_summarizer.py script concatenates the taxonomic classification results of multiple NanoCLUST runs into a single CSV file. You can specify the taxonomic level at which the results are concatenated.

This script is useful for comparing taxonomic classification results across multiple NanoCLUST runs. Barplots of relative abundances across samples can easily be created from the generated CSV.
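The concatenation step can be sketched with the csv module. This sketch assumes that every subfolder of the NanoCLUST output directory is one run named after its sample, and that each run contains a relative-abundance table for the chosen taxonomic level whose file name matches a pattern you pass in, with "taxid" and "rel_abundance" columns. That file pattern and the summarize function are hypothetical; check the actual file names in your NanoCLUST output before relying on it.

```python
import csv
from pathlib import Path


def summarize(nc_out: Path, pattern: str, out_csv: Path) -> int:
    """Concatenate per-run abundance tables into one long-format CSV.

    Hypothetical assumptions: each sub-folder of `nc_out` is one NanoCLUST run,
    and the table for the chosen level matches `pattern` somewhere inside it.
    Returns the number of data rows written.
    """
    rows = 0
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["runname", "taxid", "rel_abundance"])
        # Only directories are runs; a summary file inside nc_out is ignored.
        for run_dir in sorted(p for p in nc_out.iterdir() if p.is_dir()):
            for table in sorted(run_dir.rglob(pattern)):
                with open(table, newline="") as src:
                    for record in csv.DictReader(src):
                        writer.writerow([run_dir.name, record["taxid"], record["rel_abundance"]])
                        rows += 1
    return rows
```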

Usage

Download the NC_summarizer.py script to a convenient location, preferably your home directory, to minimize path-related errors.
(Optional) Modify the default output directory location in the argparse section of the script.
(Optional) Modify the default taxonomic level in the argparse section of the script.
Execute the script!

Output

The program generates a single CSV file with three columns: "runname", "taxid", and "rel_abundance". Each row contains a taxonomic ID detected by NanoCLUST ("taxid"), the name of the run it was detected in ("runname"), and its relative abundance in that run ("rel_abundance").
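For barplots, or as a starting point for an OTU-style table, this long-format summary is usually pivoted so that each taxid becomes a row and each run a column. A standard-library sketch, assuming only the three column names described above; to_wide is a hypothetical helper.

```python
import csv
from collections import defaultdict


def to_wide(summary_csv: str) -> tuple[list[str], dict[str, dict[str, float]]]:
    """Pivot the runname/taxid/rel_abundance summary into one row per taxid.

    Returns the sorted run names and a mapping taxid -> {runname: abundance};
    taxa missing from a run get 0.0, which suits stacked barplots.
    """
    table: dict[str, dict[str, float]] = defaultdict(dict)
    runs: set[str] = set()
    with open(summary_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            runs.add(row["runname"])
            table[row["taxid"]][row["runname"]] = float(row["rel_abundance"])
    run_names = sorted(runs)
    # Fill in zeros so every taxid has a value for every run.
    for abundances in table.values():
        for run in run_names:
            abundances.setdefault(run, 0.0)
    return run_names, dict(table)
```

With pandas installed, pd.read_csv(path).pivot_table(index="taxid", columns="runname", values="rel_abundance", fill_value=0) produces an equivalent table in one call.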

Example commands

Basic command, only input directory specified (default settings):

python NC_summarizer.py NanoCLUST_out

Input directory, output file location, and taxonomic level specified:

python NC_summarizer.py NanoCLUST_out -o NanoCLUST_out/NCsummary.csv -l species

Getting help

python NC_summarizer.py -h

NC_gridsearch.py

The NC_gridsearch.py script runs NanoCLUST once for each of a set of combinations of UMAP set size and minimum cluster size values.

This script is useful when determining the optimal UMAP set size and minimum cluster size parameters for your data.

For more information on UMAP set size and minimum cluster size, refer to the NanoCLUST GitHub page.
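Conceptually, a grid search pairs each candidate UMAP set size with each candidate minimum cluster size and starts one NanoCLUST run per pair. The sketch below takes the two value lists separately and builds every pairing with itertools.product; how NC_gridsearch.py derives its combinations from the -p values is not described here, so treat this as an illustration rather than the script's logic. The NanoCLUST options --umap_set_size and --min_cluster_size are assumed to be the relevant Nextflow parameters, and the umap{size}_mcs{size} folder names are hypothetical.

```python
import itertools
from pathlib import Path


def grid_commands(reads: str, umap_sizes: list[int], cluster_sizes: list[int],
                  output_dir: str, main_nf: str, db: str, tax_db: str) -> list[list[str]]:
    """Build one NanoCLUST command per (UMAP set size, minimum cluster size) pair."""
    commands = []
    for umap, mcs in itertools.product(umap_sizes, cluster_sizes):
        run_dir = Path(output_dir) / f"umap{umap}_mcs{mcs}"  # hypothetical naming scheme
        commands.append([
            "nextflow", "run", main_nf, "-profile", "docker",
            "--reads", reads, "--db", db, "--tax", tax_db,
            # Assumed NanoCLUST parameters for the two values being searched.
            "--umap_set_size", str(umap), "--min_cluster_size", str(mcs),
            "--outdir", str(run_dir),
        ])
    return commands
```

The number of runs grows as the product of the two list lengths, so a grid of six values per parameter already means 36 NanoCLUST runs.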

Usage

Download the NC_gridsearch.py script to a convenient location, preferably your home directory, to minimize path-related errors.
(Optional) Modify the default set of parameters to be tested in the argparse section of the script.
(Optional) Modify the default output directory location in the argparse section of the script to reduce command length and prevent path-related errors.
(Optional) In the script's argparse default section, provide the absolute paths for your database and tax-database.
(Optional) Include the path to the main.nf file from NanoCLUST in the argparse default section.
Execute the script!

Output

NanoCLUST's outputs for all runs are written to the specified output directory, or to your current working directory by default. Each NanoCLUST run gets its own folder, named after the UMAP set size and minimum cluster size used, containing the three output directories generated by NanoCLUST (classification data, FastQC results, and pipeline info).
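Once all runs have finished, it helps to map each output folder back to the parameters that produced it, for example to compare classification results side by side. A minimal sketch assuming a hypothetical umap{size}_mcs{size} folder naming scheme; adapt RUN_NAME to the folder names NC_gridsearch.py actually creates.

```python
import re
from pathlib import Path

# Hypothetical folder naming scheme, e.g. "umap100000_mcs50".
RUN_NAME = re.compile(r"umap(\d+)_mcs(\d+)$")


def index_runs(gridsearch_dir: Path) -> dict[tuple[int, int], Path]:
    """Map (UMAP set size, minimum cluster size) to the folder holding that run."""
    runs = {}
    for folder in gridsearch_dir.iterdir():
        match = RUN_NAME.match(folder.name)
        # Skip files and folders that do not follow the naming scheme.
        if folder.is_dir() and match:
            runs[(int(match.group(1)), int(match.group(2)))] = folder
    return runs
```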

Example commands

Basic command, only input reads specified (default settings):

python NC_gridsearch.py reads.fastq

Input reads, output directory, parameters, database, tax database, and main.nf path specified:

python NC_gridsearch.py reads.fastq -o gridsearch -p 100 120 140 160 180 200 -d databases/NanoCLUST/db -t databases/NanoCLUST/taxdb -n NanoCLUST/main.nf

Getting help

python NC_gridsearch.py -h

BirgitRijvers/NanoCLUST_toolkit
