assemblycomparator2

Assemblycomparator is a genomes-to-report pipeline. It is a bit like nullarbor, but it takes in genomes (assemblies) instead of reads.

It works by calling an alias that activates a conda environment and then runs a snakemake pipeline on the fasta files in the current working directory of your terminal.

Assemblycomparator performs a palette of analyses on your genomes and compares them. The main results from these analyses are summarized in an HTML report that can be easily distributed.

Usage

Make a directory with the assembly files you want to investigate with assemblycomparator2. Go into that directory in the terminal and run the command assemblycomparator2 (the alias set up in the Installation section, for either Slurm or local execution). assemblycomparator2 will then create a sub-directory containing the results of its analyses.
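Before launching the pipeline, it can help to confirm that the directory really contains the assemblies you expect. The sketch below is not part of assemblycomparator2: count_assemblies is a hypothetical helper, and the extensions it matches (.fa, .fasta, .fna) are only an assumption about how your files are named.

```shell
# Hypothetical helper (not part of assemblycomparator2): count the
# assembly files directly inside a directory, without descending into
# sub-directories. The extensions are an assumption; adjust them to
# match your own files.
count_assemblies() {
    dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.fa' -o -name '*.fasta' -o -name '*.fna' \) |
        wc -l | tr -d ' '
}
```

Calling count_assemblies with no argument inspects the current directory, which is the same directory the pipeline would pick its input from.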

Some useful commands

Execute a 'dry run'. That is, show the jobs which will run, without triggering the computation:

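assemblycomparator2 -n

Run assemblycomparator2 on the genomes in the current directory:

assemblycomparator2

If you're not sure your connection to the cluster will last for the full assemblycomparator2 run, put a & at the end to run it in the background. Depending on your shell settings, background jobs may still be stopped when the session ends, so running inside screen or tmux is more robust if one of them is available.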

assemblycomparator2 &

A bit more advanced controls

Execute all jobs up until (inclusive of) a specific job in the job graph:

assemblycomparator2 --until mlst

Select a specific MLST scheme to use on all of the samples (defaults to automatic):

assemblycomparator2 --config mlst_scheme=hpylori

Select a specific roary blastp identity (defaults to 95):

assemblycomparator2 --config roary_blastp_identity=90

Rerun a specific rule (might be necessary if some parts of the report are missing):

assemblycomparator2 -R report
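Because the alias simply appends whatever you type to the underlying snakemake call, the options above can be combined, and a dry run can be used as a gate before the real run. The following is a minimal sketch of that idea; asscom2_checked_run is a hypothetical name, not something shipped with assemblycomparator2.

```shell
# Hypothetical wrapper: do a dry run first and start the real run only
# if the dry run succeeds. All arguments (for example "--until mlst")
# are passed unchanged to both invocations.
asscom2_checked_run() {
    assemblycomparator2 -n "$@" && assemblycomparator2 "$@"
}
```

For example, asscom2_checked_run --until mlst --config mlst_scheme=hpylori would preview and then run the job graph up to the mlst rule. Note that bash expands aliases when a function definition is read, so this sketch assumes the function is defined in an interactive shell after the assemblycomparator2 alias, for instance further down in ~/.bashrc.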

What analyses does it do?

For each assembly

any2fasta (wide input format support)
prokka (annotation)
kraken2 (species identification)
mlst (multi locus sequence typing)
abricate (virulence/resistance gene identification)
assembly-stats (generic assembly statistics)
clusterProfiler KEGG (pathway enrichment analysis)

For each group

roary (pan and core genome)
snp-dists (core genome pairwise snp-distances)
FastTree (phylogenetic tree of the core genome)
Mashtree (super fast distance measurement)
A nice report easy to share with your friends (demo)

(Figure: snakemake-exported directed graph of the rules involved.)

Installation

Assemblycomparator2 needs Snakemake and whatever additional dependencies your specific setup requires, e.g. DRMAA for Slurm-managed HPCs. You can either follow the official Snakemake instructions or use our guide below.

1. Preliminary Setup

We recommend that you use mamba instead of conda:
```
conda install -n base -c conda-forge mamba
```

Set the base directory for assemblycomparator2. You can change it to anything you'd like.

ASSCOM2_BASE=~/assemblycomparator2

mkdir -p $ASSCOM2_BASE
 
# And save it into your .bashrc
echo "export ASSCOM2_BASE=$ASSCOM2_BASE" >> ~/.bashrc
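If you repeat the setup, the echo command above appends the same line to ~/.bashrc again. One possible way to avoid duplicates is sketched below; append_once is a hypothetical helper, not part of assemblycomparator2.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeating the setup does not pile up
# duplicate entries.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example, equivalent to the echo command above:
# append_once "export ASSCOM2_BASE=$ASSCOM2_BASE" ~/.bashrc
```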

Clone the assemblycomparator2 GitHub-repository into that base

git clone https://github.com/cmkobel/assemblycomparator2.git $ASSCOM2_BASE

# Optionally clone over SSH instead:
# git clone git@github.com:cmkobel/assemblycomparator2.git $ASSCOM2_BASE

# Set up an asscom2 base environment, which is used to call snakemake
cd $ASSCOM2_BASE && mamba env create -f environment.yaml

Next, set an alias that makes it easy to run assemblycomparator2 from anywhere in your filesystem.
You have to decide whether you want to use Singularity (recommended if possible) or Conda for package management.

2. Install the alias

Select A or B depending on whether you want to install on a slurm-enabled HPC or a local system without slurm.

Option A) For HPCs with Slurm using Conda

# Main alias for running assemblycomparator2
echo "alias assemblycomparator2='conda run --live-stream --name assemblycomparator2 \
    snakemake --snakefile ${ASSCOM2_BASE}/snakefile \
        --profile ${ASSCOM2_BASE}/profile/slurm/ \
        --configfile ${ASSCOM2_BASE}/config.yaml'" >> ~/.bashrc

# Set the SNAKEMAKE_CONDA_PREFIX-variable, so the package installations can be reused between runs.
echo "export SNAKEMAKE_CONDA_PREFIX=${ASSCOM2_BASE}/conda_base" >> ~/.bashrc

Option B) For local setups using Conda

# Main alias for running assemblycomparator2
echo "alias assemblycomparator2='conda run --live-stream --name assemblycomparator2 \
    snakemake --snakefile ${ASSCOM2_BASE}/snakefile \
        --profile ${ASSCOM2_BASE}/profile/local/ \
        --configfile ${ASSCOM2_BASE}/config.yaml'" >> ~/.bashrc

# Set the SNAKEMAKE_CONDA_PREFIX-variable, so the package installations can be reused between runs.
echo "export SNAKEMAKE_CONDA_PREFIX=${ASSCOM2_BASE}/conda_base" >> ~/.bashrc
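A detail worth knowing about the commands in options A and B: the alias definition is wrapped in double quotes, so ${ASSCOM2_BASE} is expanded at the moment echo runs, and the absolute path is what ends up in ~/.bashrc. The sketch below illustrates this on a scratch file with a shortened alias; DEMO_BASE, demo_alias, and the path /opt/asscom2 are placeholders that stand in for the real names.

```shell
# Illustration only: write a shortened alias line to a scratch file
# instead of ~/.bashrc, to show what is stored on disk.
DEMO_BASE=/opt/asscom2          # placeholder standing in for ASSCOM2_BASE
scratch_rc="$(mktemp)"          # stands in for ~/.bashrc

echo "alias demo_alias='snakemake --snakefile ${DEMO_BASE}/snakefile'" >> "$scratch_rc"

# The variable was expanded at write time, so the file holds the literal path.
cat "$scratch_rc"
```

Two practical consequences follow: if you later move the base directory, the alias has to be installed again, and the alias only becomes available after you open a new shell or run source ~/.bashrc.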

Setup of databases

Kraken2: If you already have a local copy of a kraken2 database, you can set the ASSCOM2_KRAKEN_DB environment variable to its path.
GTDB-tk: Download the GTDB-tk database and set the GTDBTK_DATA_PATH variable to point to its directory.
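As a rough sketch of how these two variables might be set and sanity-checked, consider the following. check_db_var is a hypothetical helper, and the paths in the commented example are placeholders for wherever your databases actually live.

```shell
# Hypothetical helper: check that the named environment variable is set
# and points to an existing directory. Prints NAME=value and returns 0
# on success; prints a message to stderr and returns 1 otherwise.
check_db_var() {
    name="$1"
    eval "value=\${$name:-}"    # look up the variable by its name
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value" >&2
        return 1
    fi
    echo "$name=$value"
}

# Example (placeholder paths), e.g. in ~/.bashrc:
# export ASSCOM2_KRAKEN_DB=/path/to/kraken2_db
# export GTDBTK_DATA_PATH=/path/to/gtdbtk_db
# check_db_var ASSCOM2_KRAKEN_DB && check_db_var GTDBTK_DATA_PATH
```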

Testing installation (optional)

assemblycomparator2 comes with a handful of E. faecium assemblies (illumina/skesa) which can be used to check that everything works as expected. To run this test, go into the directory containing these assemblies and run the assemblycomparator2 command:

cd ${ASSCOM2_BASE}/tests/E._faecium_plasmids
assemblycomparator2

If you encounter problems testing your installation, please refer to the issues tab of this repository.

Updating an existing installation (optional)

If you later wish to update the installation, run these commands and you should be all set:

cd $ASSCOM2_BASE && git pull

# You might also want to update snakemake
conda env update --name assemblycomparator2 --file environment.yaml

# If you wish to update the job environments, you can simply delete the contents of $SNAKEMAKE_CONDA_PREFIX.
# The :? guard aborts if the variable is unset or empty, so the command can never expand to "rm -r /*".
rm -r "${SNAKEMAKE_CONDA_PREFIX:?}"/*
# The environments will then be reinstalled from scratch the next time you run assemblycomparator2.

Note: If new databases have been added to kraken or mashscreen, you can rerun the set_up_*.sh scripts.

Future functionality

In the future we might add some of the following pieces of software into assemblycomparator2.

Sample basis

Oriloc (Identify possible replication origins, and thereby help identify chromids)
RFplasmid (Identify plasmids using the pentamer-random-forest method)
Kaptive (surface polysaccharide loci for Klebsiella and Acinetobacter baumannii)
mash screen (recognition of plasmids-of-interest)

Batch basis

IQ-tree (phylogenetic tree of core genome with bootstrapping)
GC3-profiling ("fingerprinting" of the distribution of GC-content)
Identification of horizontally transferred genes?
panito (average nucleotide identity)
GenAPI (alternative to roary)

Development will continue.
