324 improve the pipeline description #325
Merged
20 commits
236dc12 WIP update README (mhkc)
b5edc6c Added CONTRIBUTION info (mhkc)
2345eee WIP README (mhkc)
997715f fixed issue with getting the freebayes version (mhkc)
321e690 Updated changelog (mhkc)
1e506b0 Updated README (mhkc)
d0f12c6 Added documentation (mhkc)
5f1eb0a Added script for formatting version files to a csv (mhkc)
005990d Added readthedocs config (mhkc)
affdf4a Merge branch 'master' into 324-improve-the-pipeline-description (mhkc)
42702dd added some info on how to report issues (mhkc)
bf843d3 Merge branch '324-improve-the-pipeline-description' of github.com:gen… (mhkc)
cc9339d Merge branch '324-improve-the-pipeline-description' of github.com:gen… (mhkc)
1998465 Updated path to config.py (mhkc)
9635de0 removed some old installation instructions (mhkc)
906a6fc Updated some text (mhkc)
c95442c Update docs/source/overview.md (mhkc)
92f691c Update docs/source/overview.md (mhkc)
7e806b5 Fixed typo in filename (mhkc)
d85ee67 Updated installation instructions (mhkc)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# .readthedocs.yaml | ||
# Read the Docs configuration file | ||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details | ||
|
||
# Required | ||
version: 2 | ||
|
||
# Set the OS, Python version and other tools you might need | ||
build: | ||
  os: ubuntu-22.04 | ||
  tools: | ||
    python: "3.12" | ||
    # You can also specify other tool versions: | ||
    # nodejs: "19" | ||
    # rust: "1.64" | ||
    # golang: "1.19" | ||
|
||
# Build documentation in the "docs/" directory with Sphinx | ||
sphinx: | ||
  configuration: docs/source/conf.py | ||
|
||
# Optionally build your docs in additional formats such as PDF and ePub | ||
# formats: | ||
# - epub | ||
|
||
# Optional but recommended, declare the Python requirements required | ||
# to build your documentation | ||
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html | ||
python: | ||
  install: | ||
    - requirements: docs/requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Contributing to Jasen | ||
|
||
This guide describes how to contribute to Jasen. You can contribute by reporting bugs, helping implement new types of analysis, or improving the pipeline documentation. | ||
|
||
## Documentation changes | ||
|
||
Propose changes to the documentation by opening a pull request. | ||
|
||
To edit the documentation, change the source files in the [docs](docs/) directory. The [README](docs/source/README.md) describes how to build and preview the documentation on your local machine. | ||
|
||
## Bug reports | ||
|
||
Submitting a bug report is one of the simplest and most useful ways to help the project. | ||
|
||
Report a bug by creating a new issue (use the *New issue* button) on the [issues page](https://github.com/genomic-medicine-sweden/jasen/issues). A good bug report should include a description of the error (with the error message) and steps on how to reproduce the error. | ||
|
||
## New feature, analysis, or tool | ||
|
||
We welcome suggestions on new types of analysis, bacterial species or other features to add to the pipeline. Before contributing these, please create a feature proposal on the [issues page](https://github.com/genomic-medicine-sweden/jasen/issues) so it can be discussed. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,223 +9,32 @@ _Just Another System for Epityping NGS data_ | |
>[!WARNING] | ||
>**JASEN is in beta stage and the results are unverified. There is no guarantee that the pipeline can execute, output format consistency, or that it produces accurate results until there is an official 1.0 release.** | ||
|
||
Jasen produces results for epidemiological and surveillance purposes. | ||
Jasen has been developed for a small set of microbiota (primarily MRSA), but will likely work with any bacteria with a stable cgMLST scheme. | ||
Jasen produces results for antibiotic resistance and virulence prediction and epidemiological typing for surveillance purposes. The pipeline is developed in collaboration with several Swedish hospitals. The development was funded by [Genomic Medicine Sweden](https://genomicmedicine.se/). | ||
|
||
## Requirements | ||
|
||
* [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/installation.html#install-on-windows-or-mac) | ||
* [JRE 8 - 21](https://www.java.com/en/download/manual.jsp) | ||
* Nextflow (`curl -s https://get.nextflow.io | bash`) | ||
|
||
### Recommended | ||
|
||
* Conda | ||
* Singularity Remote Login | ||
|
||
## Usage | ||
|
||
### Simple self-test | ||
|
||
``` | ||
nextflow run main.nf -profile staphylococcus_aureus -config configs/nextflow.base.config --csv assets/test_data/samplelist.csv | ||
``` | ||
|
||
#### Usage arguments | ||
|
||
| Argument type | Options | Required | | ||
| ------------- | -------------------------------------- | -------- | | ||
| -profile | **staphylococcus_aureus**, escherichia_coli, klebsiella_pneumoniae, mycobacterium_tuberculosis| True | | ||
| -config | **configs/nextflow.base.config**, configs/nextflow.dev.config, configs/nextflow.hopper.config, configs/nextflow.ngp.config| True | | ||
| -entry | bacterial_default | True | | ||
| --output | User specified directory | False | | ||
| -resume | Not applicable | False | | ||
|
||
|
||
### Input file format | ||
|
||
```csv | ||
id,platform,read1,read2 | ||
p1,illumina,assets/test_data/sequencing_data/saureus_10k/saureus_large_R1_001.fastq.gz,assets/test_data/sequencing_data/saureus_10k/saureus_large_R2_001.fastq.gz | ||
``` | ||
|
||
### Update databases | ||
|
||
#### Update MLST database | ||
|
||
``` | ||
bash /path/to/jasen/assets/mlst_db/update_mlst_db.sh | ||
``` | ||
The pipeline currently supports a small set of microbiota, and support for each species is at its own stage of development. See the documentation for information on the supported analyses for each species and what each development status means. | ||
|
||
| Species | Development status | | ||
|------------------------------|--------------------| | ||
| *Staphylococcus aureus* | Draft | | ||
| *Escherichia coli* | Draft | | ||
| *Mycobacterium tuberculosis* | Draft | | ||
|
||
## Installation | ||
|
||
### Copy code locally | ||
|
||
``` | ||
git clone --recurse-submodules --single-branch --branch master https://github.com/genomic-medicine-sweden/jasen.git && cd jasen | ||
``` | ||
|
||
### Create singularity images. | ||
|
||
The containers will be attempted to be built and downloaded as part of | ||
the main Makefile (that is, when running `make install` in the main repo | ||
folder). | ||
|
||
``` | ||
cd container | ||
make | ||
``` | ||
|
||
|
||
### Download references and databases using singularity. | ||
|
||
First, make sure you stand in the `container` folder. Then run the `make` commands: | ||
|
||
``` | ||
cd .. | ||
make install | ||
make check | ||
``` | ||
|
||
Any errors produced during this step will hinder pipeline execution in | ||
unexpected ways. | ||
|
||
## Configuration | ||
|
||
### Nextflow configuration | ||
Source: `configs/nextflow.base.config` | ||
|
||
* Edit the `root` parameter in `configs/nextflow.base.config` | ||
* Edit the `krakenDb`, `workDir` and `outdir` parameters in `configs/nextflow.base.config` | ||
* Edit the `runOptions` in `configs/nextflow.base.config` in order to mount directories to your run | ||
|
||
When analysing Nanopore data: | ||
* Edit the `ext.args` for Flye: specify genome size for the organism of interest with flag `--genome-size` | ||
* Edit the `ext.seqmethod`for Flye depending on the input data | ||
* Edit the `ext.args` for Medaka: specify the model with flag `-m`. Currently it is set to `r941_min_sup_g507`, but one should always set it based on how the data was produced. More about choosing the right model can be found [here](https://github.com/nanoporetech/medaka#models). | ||
|
||
### Test data configuration | ||
Source: `assets/test_data/samplelist.csv` | ||
|
||
* Edit the read1 and read2 columns in `assets/test_data/samplelist.csv` | ||
|
||
### Temporary directories configuration | ||
Source: `~/.bashrc` | ||
|
||
* Add the export line to `~/.bashrc` | ||
* Change `SINGULARITY_TMPDIR` to `APPTAINER_TMPDIR` if you are using apptainer | ||
|
||
``` | ||
export SINGULARITY_TMPDIR="/tmp" #or equivalent filepath to tmp dir | ||
``` | ||
|
||
### Database configuration | ||
|
||
#### Kraken database configuration | ||
Choose between Kraken DB (64GB [Highly recommended]) or MiniKraken DB (8GB). | ||
Or customize [your own](https://benlangmead.github.io/aws-indexes/k2). | ||
|
||
##### Download standard Kraken database | ||
|
||
``` | ||
wget -O /path/to/kraken_db/krakenstd.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz | ||
tar -xf /path/to/kraken_db/krakenstd.tar.gz | ||
``` | ||
|
||
##### (Alternatively) Download miniKraken database | ||
|
||
``` | ||
wget -O /path/to/kraken_db/krakenmini.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz | ||
tar -xf /path/to/kraken_db/krakenmini.tar.gz | ||
``` | ||
|
||
#### Create TBProfiler database | ||
|
||
##### Install jasentool | ||
|
||
``` | ||
git clone [email protected]:ryanjameskennedy/jasentool.git && cd jasentool | ||
pip install . | ||
``` | ||
|
||
##### Create input csv that is used as tbdb input (composed of FoHM, WHO & tbdb variants) | ||
|
||
``` | ||
jasentool converge --output_dir /path/to/jasen/assets/tbdb | ||
``` | ||
|
||
##### Create tbdb (ensure tb-profiler is installed) | ||
|
||
``` | ||
cd /path/to/jasen/assets/tbdb | ||
tb-profiler create_db --prefix converged_who_fohm_tbdb | ||
tb-profiler load_library converged_who_fohm_tbdb | ||
``` | ||
|
||
##### Bgzip and index gms TBProfiler db | ||
|
||
``` | ||
bgzip -c converged_who_fohm_tbdb.bed > /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz | ||
tabix -p bed /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz | ||
``` | ||
|
||
|
||
## Component Breakdown | ||
|
||
### QC | ||
|
||
* [Kraken2](https://ccb.jhu.edu/software/kraken2/): Species detection. | ||
* [Bracken](https://ccb.jhu.edu/software/bracken/): Combined with Kraken2 for species detection. | ||
* [bwa mem](https://github.com/lh3/bwa): Maps reads to cgMLST loci (demarcated by bed file) in order to estimate genome coverage. Low levels of Intra-species contamination or erroneous mapping is removed using bwa and filtering away the heterozygous mapped bases. | ||
* [interquartile range](https://en.wikipedia.org/wiki/Interquartile_range): Calculates evenness of coverage. | ||
|
||
### Assembly | ||
|
||
* [SPAdes](http://cab.spbu.ru/software/spades/): De novo assembly for Ion Torrent. | ||
* [SKESA](https://www.ridom.de/seqsphere/ug/v60/SKESA_Assembler.html): De novo assembly for Illumina. | ||
* [QUAST](http://cab.spbu.ru/software/quast/): Extracts QC data (De novo assembly parameters) from the assembly. | ||
* [Flye](https://github.com/fenderglass/Flye/tree/flye): De novo assembly for Oxford Nanopore Technologies (ONT). | ||
* [Medaka](https://github.com/nanoporetech/medaka): Creates consensus sequences from ONT data. | ||
|
||
### Epidemiological typing | ||
|
||
* [chewBBACA](https://github.com/B-UMMI/chewBBACA/wiki): Calculates cgMLST of extracted alleles decided by schema. Number of missing loci is calculated and used as a QC parameter. | ||
* [cgmlst.net](https://www.cgmlst.org/ncs/schema/141106/): The cgMLST reference schema. | ||
* [mlst](https://github.com/tseemann/mlst): Calculates traditional 7-locus MLST. | ||
|
||
#### Supported profiles: | ||
|
||
* `staphylococcus_aureus` | ||
* `escherichia_coli` | ||
|
||
#### Future profiles that will be supported: | ||
|
||
* `klebsiella_pneumoniae` | ||
* `mycobacterium_tuberculosis` | ||
|
||
### Virulence and resistance markers | ||
See the documentation for installation instructions. | ||
|
||
* [resfinder](https://bitbucket.org/genomicepidemiology/resfinder/src/master/): Detects antimicrobial resistance genes as well as environmental and chemical resistance genes. | ||
* [pointfinder](https://bitbucket.org/genomicepidemiology/pointfinder/src/master/): Combines with resfinder to detect variants. | ||
* [virulencefinder](https://bitbucket.org/genomicepidemiology/virulencefinder/src/master/): Detects virulence genes. | ||
* [amrfinderplus](https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus): Detects antimicrobial resistance genes as well as environmental, chemical resistance and virulence genes. | ||
* [resfinder_db](https://bitbucket.org/genomicepidemiology/resfinder_db/src/master/): Resfinder database. | ||
* [pointfinder_db](https://bitbucket.org/genomicepidemiology/pointfinder_db/src/master/): Pointfinder database. | ||
* [virulencefinder_db](https://bitbucket.org/genomicepidemiology/virulencefinder_db/src/master/): Virulencefinder database. | ||
### Tips | ||
|
||
### Relatedness | ||
* You can use [Bonsai](https://github.com/Clinical-Genomics-Lund/cgviz) to visualise jasen outputs. | ||
|
||
* [sourmash](https://github.com/sourmash-bio/sourmash): Determine relatedness between samples. | ||
## Documentation | ||
|
||
## Report and visualisation | ||
The documentation is available for the latest stable release. | ||
|
||
* [Bonsai](https://github.com/Clinical-Genomics-Lund/cgviz): Visualises jasen outputs. | ||
* [GrapeTree](https://github.com/achtman-lab/GrapeTree): Visualise phylogenetic relationship using cgmlst data. | ||
## Contributing | ||
|
||
## Frequent issues / Tips | ||
Contributions to the pipeline are more than welcome. See the [CONTRIBUTING](CONTRIBUTING.md) file for details. | ||
|
||
* Always run the latest versions of the bioinformatical software. | ||
* Verify you have execution permission for jasens `*.sif` images. | ||
* Old Singularity versions may sporadically produce the error `FATAL: could not open image jasen/container/*.sif: image format not recognized!` | ||
## License | ||
|
||
Jasen is released under the GPLv3 license. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
#!/usr/bin/env python | ||
"""Concatenate software versions.""" | ||
|
||
import click | ||
import yaml | ||
from yaml import Loader | ||
import pandas as pd | ||
from pathlib import Path | ||
|
||
|
||
def get_versions(version_obj: dict[str, dict]) -> dict[str, dict]: | ||
    """Return the software entries of one versions file, tagged with their workflow name.""" | ||
    workflow_name = list(version_obj.keys())[0].split(":")[-1] | ||
    raw_softwares = list(version_obj.values())[0] | ||
    # add workflow name to the list of all softwares | ||
    softwares = {} | ||
    for sw, version_info in raw_softwares.items(): | ||
        version_info["workflow"] = workflow_name | ||
        softwares[sw] = version_info | ||
        # get container; keep it only if it is a URL | ||
        if "http" not in version_info["container"]: | ||
            version_info["container"] = None | ||
    return softwares | ||
|
||
|
||
@click.command() | ||
@click.option("-o", "--output", type=click.File("w"), help="Path to write output file to.") | ||
@click.argument("version_files", nargs=-1) | ||
def cli(output, version_files): | ||
    """Concatenate the versions of softwares.""" | ||
|
||
    all_versions = {} | ||
    for file in version_files: | ||
        with open(file) as vfile: | ||
            sw_version = get_versions(yaml.load(vfile, Loader=Loader)) | ||
        # combine new sw versions with existing sw versions | ||
        all_versions = {**all_versions, **sw_version} | ||
|
||
    # convert version dict to csv tables | ||
    df = (pd.DataFrame | ||
          .from_dict(all_versions, orient="index") | ||
          .drop("workflow", axis=1) | ||
          .fillna("-") | ||
          ) | ||
    df.index.name = "software" | ||
    df.reset_index(inplace=True) | ||
    df.sort_values("software", inplace=True) | ||
    df.columns = [col.capitalize() for col in df.columns] | ||
    # export to csv | ||
    df.to_csv(output, index=False) | ||
    click.secho(f"Wrote output file: {output.name}", fg="green") | ||
|
||
|
||
if __name__ == "__main__": | ||
    cli() |
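
To make the expected input shape of `get_versions` easier to review, here is a minimal sketch that runs the function on a hand-written dictionary instead of a parsed YAML file. The workflow key, the tool names `tool_a` and `tool_b`, and all version and container values are hypothetical and only chosen to exercise the logic; the real versions files may contain other fields.

```python
def get_versions(version_obj: dict[str, dict]) -> dict[str, dict]:
    """Tag each software entry with its workflow name; drop non-URL containers."""
    workflow_name = list(version_obj.keys())[0].split(":")[-1]
    raw_softwares = list(version_obj.values())[0]
    softwares = {}
    for sw, version_info in raw_softwares.items():
        version_info["workflow"] = workflow_name
        softwares[sw] = version_info
        if "http" not in version_info["container"]:
            version_info["container"] = None
    return softwares


# Hypothetical content of two parsed versions files (all values are made up).
first_file = {
    "JASEN:BACTERIAL_DEFAULT": {
        "tool_a": {"version": "1.0.0", "container": "https://example.org/tool_a.sif"},
        "tool_b": {"version": "2.3.1", "container": "tool_b.sif"},
    }
}
second_file = {
    "JASEN:BACTERIAL_DEFAULT": {
        "tool_b": {"version": "2.4.0", "container": "https://example.org/tool_b.sif"},
    }
}

# Same merge strategy as in cli(): entries from later files replace earlier ones.
all_versions = {}
for parsed in (first_file, second_file):
    all_versions = {**all_versions, **get_versions(parsed)}

for name, info in sorted(all_versions.items()):
    print(name, info["version"], info["container"])
```

Because the merge is a plain dictionary update keyed on the software name, a tool reported by several versions files keeps only the entry from the last file passed on the command line.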
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = source | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Do you reckon this is true? Should we not say beta?
The exact nomenclature doesn't matter to me, but I think draft is perhaps more expressive than beta. We also define the difference between the statuses in the text.