Merge branch 'dev'
arnikz committed Jan 18, 2023
2 parents 21c6063 + 6ef4937 commit 950e04c
Showing 34 changed files with 963 additions and 17,098 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,57 @@
+name: CI
+on:
+  push:
+  pull_request:
+  schedule:
+    - cron: "0 0 1 * *" # run monthly
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        schedulers: ["gridengine", "slurm"]
+    env:
+      REGISTRY: ghcr.io
+      SCH: ${{ matrix.schedulers }}
+      IMAGE: ${{ github.repository_owner }}/sv-gen-${{ matrix.schedulers }}
+      TAG: "dev"
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v2
+      - name: Set up Python
+        uses: actions/setup-python@v1
+        with:
+          python-version: "3.9"
+      - name: Python info
+        run: |
+          which python
+          python --version
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r test-requirements.txt
+      - name: Show pip list
+        run: pip list
+      - name: Run unit tests
+        run: |
+          pytest --cov=helper_functions --cov-report=xml
+          mv coverage.xml ${{ github.workspace }}
+        working-directory: workflow
+      - name: Log into registry
+        uses: docker/login-action@v1
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Run workflow
+        run: |
+          docker run -d -p 10000:22 --name $SCH ${REGISTRY}/${IMAGE,,}:${TAG}
+          sleep 10
+          docker ps -a
+          docker exec -u xenon -t $SCH bash -c "cd sv-gen && ./run.sh $SCH"
+      - name: Upload coverage report to Codacy
+        uses: codacy/codacy-coverage-reporter-action@master
+        with:
+          project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
+          coverage-reports: coverage.xml
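
The `Run workflow` step uses `${IMAGE,,}`, a Bash parameter expansion that lower-cases the value. This is presumably needed because container image references must be lower-case, while the repository owner (`GooglingTheCancerGenome`) is mixed-case. Below is a minimal sketch of the resulting image reference. It uses `tr` as a portable stand-in for `${IMAGE,,}`, and the variable values are illustrative copies of the `env` block for the `slurm` matrix entry, not output captured from CI:

```bash
# Illustrative values mirroring the workflow's env block (slurm matrix entry).
REGISTRY="ghcr.io"
IMAGE="GooglingTheCancerGenome/sv-gen-slurm"
TAG="dev"

# Lower-case the image name; in Bash 4+ this is what ${IMAGE,,} does.
image_lc=$(printf '%s' "$IMAGE" | tr '[:upper:]' '[:lower:]')

image_ref="${REGISTRY}/${image_lc}:${TAG}"
echo "$image_ref"
# prints: ghcr.io/googlingthecancergenome/sv-gen-slurm:dev
```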
4 changes: 4 additions & 0 deletions .snakemake-workflow-catalog.yml
@@ -0,0 +1,4 @@
+usage:
+  software-stack-deployment:
+    conda: true
+  report: false
27 changes: 0 additions & 27 deletions .travis.yml

This file was deleted.

62 changes: 32 additions & 30 deletions .zenodo.json
@@ -1,33 +1,35 @@
 {
-    "creators": [
-        {
-            "affiliation": "Netherlands eScience Center",
-            "name": "Kuzniar, Arnold",
-            "orcid": "0000-0003-1711-7961"
-        },
-        {
-            "affiliation": "University Medical Center Utrecht",
-            "name": "Santuari, Luca",
-            "orcid": "0000-0001-8784-2507"
-        }
-    ],
-    "keywords": [
-        "bioinformatics",
-        "structural variants",
-        "cancer genomics",
-        "whole genome sequencing",
-        "workflow",
-        "simulation",
-        "high-performance computing",
-        "HPC",
-        "WGS",
-        "FASTA",
-        "BAM",
-        "VCF",
-        "BED"
-    ],
-    "license": {
-        "id": "Apache-2.0"
+  "creators": [
+    {
+      "affiliation": "Netherlands eScience Center",
+      "name": "Kuzniar, Arnold",
+      "orcid": "0000-0003-1711-7961"
     },
-    "title": "sv-gen"
+    {
+      "affiliation": "University Medical Center Utrecht",
+      "name": "Santuari, Luca",
+      "orcid": "0000-0001-8784-2507"
+    }
+  ],
+  "keywords": [
+    "bioinformatics",
+    "structural variants",
+    "cancer genomics",
+    "whole genome sequencing",
+    "workflow",
+    "simulation",
+    "high-performance computing",
+    "HPC",
+    "WGS",
+    "FASTA",
+    "BAM",
+    "VCF",
+    "BED"
+  ],
+  "license": {
+    "id": "Apache-2.0"
+  },
+  "publication_date": "2023-01-18",
+  "title": "sv-gen",
+  "version": "1.1.0"
 }
6 changes: 3 additions & 3 deletions CITATION.cff
@@ -15,8 +15,8 @@ authors:
 orcid: "https://orcid.org/0000-0001-8784-2507"

 cff-version: "1.0.3"
-date-released: "2020-03-24"
-doi: 10.5281/zenodo.3725664
+date-released: 2023-01-18
+doi: 10.5281/zenodo.3725663
 keywords:
 - "bioinformatics"
 - "structural variants"
@@ -35,4 +35,4 @@ license: Apache-2.0
 message: "If you use this software, please cite it using these metadata"
 repository-code: "https://github.com/GooglingTheCancerGenome/sv-gen"
 title: sv-gen
-version: "1.0.0"
+version: "1.1.0"
35 changes: 16 additions & 19 deletions README.md
@@ -1,8 +1,9 @@
 # sv-gen

-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3725664.svg)](https://doi.org/10.5281/zenodo.3725664)
-[![Build Status](https://travis-ci.org/GooglingTheCancerGenome/sv-gen.svg?branch=master)](https://travis-ci.org/GooglingTheCancerGenome/sv-gen)
-[![Codacy Badge](https://api.codacy.com/project/badge/Grade/7d9a698a93fa44ec8ad79b96842d48ee)](https://www.codacy.com/gh/GooglingTheCancerGenome/sv-gen?utm_source=github.com&utm_medium=referral&utm_content=GooglingTheCancerGenome/sv-gen&utm_campaign=Badge_Grade)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3725663.svg)](https://doi.org/10.5281/zenodo.3725663)
+[![CI](https://github.com/GooglingTheCancerGenome/sv-gen/actions/workflows/ci.yaml/badge.svg?branch=master)](https://github.com/GooglingTheCancerGenome/sv-gen/actions/workflows/ci.yaml)
+[![Codacy Badge](https://app.codacy.com/project/badge/Grade/7d9a698a93fa44ec8ad79b96842d48ee)](https://www.codacy.com/gh/GooglingTheCancerGenome/sv-gen/dashboard?utm_source=github.com&utm_medium=referral&utm_content=GooglingTheCancerGenome/sv-gen&utm_campaign=Badge_Grade)
+[![Codacy Badge](https://app.codacy.com/project/badge/Coverage/7d9a698a93fa44ec8ad79b96842d48ee)](https://www.codacy.com/gh/GooglingTheCancerGenome/sv-gen/dashboard?utm_source=github.com&utm_medium=referral&utm_content=GooglingTheCancerGenome/sv-gen&utm_campaign=Badge_Coverage)

 Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. _sv-gen_ is a Snakemake-based workflow to generate artificial short-read alignments based on a reference genome with(out) SVs. The workflow is easy to use and deploy on any Linux-based machine. In particular, the workflow supports automated software deployment, easy configuration and addition of new analysis tools as well as enables to scale from a single computer to different HPC clusters with minimal effort.

@@ -22,7 +23,7 @@ The workflow ([DAG](/doc/sv-gen.svg)) includes the following tools:
 - [BWA](https://github.com/lh3/bwa)
 - [Samtools](https://github.com/samtools/samtools)

-The software dependencies and versions can be found in the conda `environment.yaml` files ([1](/environment.yaml), [2](/snakemake/environment.yaml)).
+The software dependencies and versions can be found in the conda `environment.yaml` files ([1](/environment.yaml), [2](/workflow/environment.yaml)).

 **1. Clone this repo.**

@@ -40,40 +41,36 @@ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O mi
 bash miniconda.sh
 # update Conda
 conda update -y conda
-# create & activate new env with installed deps
-conda env create -n wf -f environment.yaml
+# install Mamba
+conda install -n base -c conda-forge -y mamba
+# create a new environment with dependencies & activate it
+mamba env create -n wf -f environment.yaml
 conda activate wf
-cd snakemake
 ```

 **3. Configure the workflow.**

 - **config files**:
-  - [`analysis.yaml`](/snakemake/analysis.yaml) - analysis-specific settings
-  - [`environment.yaml`](/snakemake/environment.yaml) - software dependencies and versions
+  - [`analysis.yaml`](/config/analysis.yaml) - analysis-specific settings
+  - [`environment.yaml`](/workflow/environment.yaml) - software dependencies and versions

 **4. Execute the workflow.**

 ```bash
+cd workflow
 # 'dry' run only checks I/O files
 snakemake -np

 # run the workflow locally
-snakemake --use-conda
+snakemake --use-conda --cores
 ```

-_Submit jobs to Grid Engine-based cluster_
-
-```bash
-snakemake --use-conda --latency-wait 30 --jobs \
-  --cluster 'xenon scheduler gridengine --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log' &>smk.log&
-```
-
-_Submit jobs to Slurm-based cluster_
+_Submit jobs to Slurm/GridEngine-based cluster_

 ```bash
+SCH=slurm # or gridengine
 snakemake --use-conda --latency-wait 30 --jobs \
-  --cluster 'xenon scheduler slurm --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log' &>smk.log&
+  --cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&
 ```

 _Query job accounting information_
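
The switch from single to double quotes around the `--cluster` argument in the README hunk matters. Inside single quotes the shell passes `$SCH` through literally, whereas double quotes expand it before Snakemake receives the string. `{rule}` and `%j` are not shell syntax inside quotes, so they are left untouched either way. A small sketch with a shortened, illustrative command string (not the full Xenon invocation):

```bash
SCH=slurm # or gridengine

# Single quotes: $SCH stays literal, so the scheduler name would never be filled in.
single='xenon scheduler $SCH submit --name smk.{rule} --stderr stderr-%j.log'

# Double quotes: $SCH is expanded by the shell; {rule} and %j pass through unchanged.
double="xenon scheduler $SCH submit --name smk.{rule} --stderr stderr-%j.log"

printf '%s\n' "$single" "$double"
# prints:
# xenon scheduler $SCH submit --name smk.{rule} --stderr stderr-%j.log
# xenon scheduler slurm submit --name smk.{rule} --stderr stderr-%j.log
```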
1 change: 1 addition & 0 deletions config/README.md
51 changes: 51 additions & 0 deletions config/analysis.yaml
@@ -0,0 +1,51 @@
+---
+threads: -1 # Samtools & BWA (default: -1 = set dynamically based on available cores)
+memory: -1 # Samtools (default: -1 = set dynamically based on free memory per core [MB])
+tmpspace: 0 # Samtools (default: 0 [MB])
+
+# I/O files
+input:
+  fasta: data/test.fasta # filepath of ref. genome (haploid)
+  seqids: [12, 22] # zero or more SeqIDs (e.g. chromosomes)
+
+output:
+  basedir: data/out # relative or absolute path
+  genotype: # diploid genomes
+    - hmz # homozygous
+    - hmz-sv # homozygous with SVs
+    - htz-sv # heterozygous with SVs
+
+# registered I/O file extensions
+filext:
+  fasta: .fasta
+  fasta_idx:
+    - .fasta.ann # BWA v0.6.x index files
+    - .fasta.amb #
+    - .fasta.bwt #
+    - .fasta.pac #
+    - .fasta.sa #
+  fastq: .fq
+  bam: .bam
+  bam_idx: .bam.bai
+  bed: .bed
+  vcf: .vcf
+
+simulation:
+  # SURVIVOR parameters
+  config: survivor.cfg
+  svtype:
+    dup: [0, 100, 10000] # duplication: [count, min_len, max_len]
+    inv: [0, 600, 800] # inversion: ""
+    tra: [10, 1000, 3000] # translocation: ""
+    indel: [10, 20, 500] # insertion+deletion: ""
+    invdel: [0, 600, 800] # inversion+deletion: ""
+    invdup: [0, 600, 800] # inversion+duplication: ""
+  # ART parameters
+  seed: 1000
+  profile: HSXt
+  coverage: [10, 30] # [cov1, cov2, ...]
+  read:
+    length: [150] # [len1, len2, ...]
+  insert:
+    stdev: [10] # standard deviation of the fragment length (bp)
+    length: [500] # [len1, len2, ...]
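
The `threads: -1` convention in `config/analysis.yaml` means the value is resolved at run time from the available cores. How the workflow itself does this is not shown in this diff. The following is only a sketch of one possible approach: `resolve_threads` is a hypothetical helper, and `nproc` (GNU coreutils) is assumed to be available on the Linux host:

```bash
# Hypothetical helper: turn a configured thread count into a concrete number.
# A negative value (e.g. -1) means "use all available cores".
resolve_threads() {
  requested="$1"
  if [ "$requested" -lt 0 ]; then
    nproc              # number of processing units available to this process
  else
    echo "$requested"  # an explicit setting is used unchanged
  fi
}

threads=$(resolve_threads -1)
echo "using $threads thread(s)"
```

An explicit setting such as `resolve_threads 4` is returned as-is, so the same code path would serve both the dynamic default and a user override.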