snakemake-class/exercise04 at main · NIH-HPC/snakemake-class

Configuration of workflows

Goal: Extract the sample information and the path to the hisat index from a YAML configuration file. As in the other exercises, the starting file is Snakefile and the final product is Snakefile.finished.

Snakemake workflows can use configuration files in YAML or JSON format. The configuration file can be specified at the top level of the Snakefile with the configfile directive. For example:

configfile: "config.yml"

rule all:
    input: expand("04count/{s}", s=samples)

or on the command line with the --configfile option.

The config file is parsed and made available as a global dictionary named config. So, given the following config file

samples:
  ERR458502:
    gt: SNF2
    rep: 1
  ERR458509:
    gt: SNF2
    rep: 2
  ERR458516:
    gt: SNF2
    rep: 3
  ERR458495:
    gt: WT
    rep: 1
  ERR458880:
    gt: WT
    rep: 2
  ERR458887:
    gt: WT
    rep: 3
reference:
  ensembl_ver: 88
  genome_build: R64-1-1
  hisat_index: 00ref/hisat_index/R64-1-1
  genome_file: 00ref/R64-1-1.fa
  cdna_file: 00ref/R64-1-1.cdna_nc.fa

the config dict would look like this:

{'reference': {'cdna_file': '00ref/R64-1-1.cdna_nc.fa',
               'ensembl_ver': 88,
               'genome_build': 'R64-1-1',
               'genome_file': '00ref/R64-1-1.fa',
               'hisat_index': '00ref/hisat_index/R64-1-1'},
 'samples': {'ERR458495': {'gt': 'WT', 'rep': 1},
             'ERR458502': {'gt': 'SNF2', 'rep': 1},
             'ERR458509': {'gt': 'SNF2', 'rep': 2},
             'ERR458516': {'gt': 'SNF2', 'rep': 3},
             'ERR458880': {'gt': 'WT', 'rep': 2},
             'ERR458887': {'gt': 'WT', 'rep': 3}}}

The sample list can be extracted like so:

configfile: "config.yml"
samples = config["samples"].keys()

rule all:
    input: expand("04count/{s}", s=samples)

and the hisat index in the hisat rule like this:

rule hisat2:
    input: fq = "00fastq/{sample}.fastq.gz",
           idx = config["reference"]["hisat_index"]
    ...
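
Both lookups above can be tried in plain Python before they go into the Snakefile. The sketch below rests on a few assumptions: the config dict is a hand-written stand-in (three of the six samples), a list comprehension mimics what expand produces for this single-wildcard pattern, and hisat_index is a hypothetical helper that is not part of Snakemake or of this exercise.

```python
# Hand-written stand-in for the dict Snakemake builds from config.yml
# (three of the six samples, reference section trimmed).
config = {
    "samples": {
        "ERR458502": {"gt": "SNF2", "rep": 1},
        "ERR458509": {"gt": "SNF2", "rep": 2},
        "ERR458495": {"gt": "WT", "rep": 1},
    },
    "reference": {"hisat_index": "00ref/hisat_index/R64-1-1"},
}

# Sample IDs are the keys of the "samples" mapping.
samples = list(config["samples"])

# Plain-Python equivalent of expand("04count/{s}", s=samples)
# for this single-wildcard pattern.
targets = ["04count/{s}".format(s=s) for s in samples]


def hisat_index(cfg):
    """Hypothetical helper: return the hisat index prefix or fail with a readable error."""
    try:
        return cfg["reference"]["hisat_index"]
    except KeyError as err:
        raise KeyError(
            f"config is missing {err}; expected reference -> hisat_index"
        ) from None


print(targets)
print(hisat_index(config))
```

Failing early on a missing key is a design choice for this sketch; indexing config directly, as in the rule above, raises a plain KeyError instead.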
