Skip to content

Latest commit

 

History

History

exercise04

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Configuration of workflows

Goal: Extract sample information and the path to the hisat index from a configuration file in yaml. As in the other exercises, the starting file is Snakemake and the final product is Snakemake.finished.

Snakemake workflows can make use of configuration files in yaml or json format. The configuration file can be specified at the top level of the Snakefile. For example:

configfile: "config.yml"

rule all:
    input: expand("04count/{s}", s=samples)

or on the command line.

The config file is parsed and made available as a global dictionary named config. So, given the following config file

samples:
  ERR458502:
    gt: SNF2
    rep: 1
  ERR458509:
    gt: SNF2
    rep: 2
  ERR458516:
    gt: SNF2
    rep: 3
  ERR458495:
    gt: WT
    rep: 1
  ERR458880:
    gt: WT
    rep: 2
  ERR458887:
    gt: WT
    rep: 3
reference:
  ensembl_ver: 88
  genome_build: R64-1-1
  hisat_index: 00ref/hisat_index/R64-1-1
  genome_file: 00ref/R64-1-1.fa
  cdna_file: 00ref/R64-1-1.cdna_nc.fa

the config dict would look like this:

{'reference': {'cdna_file': '00ref/R64-1-1.cdna_nc.fa',
               'ensembl_ver': 88,
               'genome_build': 'R64-1-1',
               'genome_file': '00ref/R64-1-1.fa',
               'hisat_index': '00ref/hisat_index/R64-1-1'},
 'samples': {'ERR458495': {'gt': 'WT', 'rep': 1},
             'ERR458502': {'gt': 'SNF2', 'rep': 1},
             'ERR458509': {'gt': 'SNF2', 'rep': 2},
             'ERR458516': {'gt': 'SNF2', 'rep': 3},
             'ERR458880': {'gt': 'WT', 'rep': 2},
             'ERR458887': {'gt': 'WT', 'rep': 3}}}

The sample list can be extracted like so:

configfile: "config.yml"
samples = config["samples"].keys()

rule all:
    input: expand("04count/{s}", s=samples)

and the hisat index in the hisat rule like this:

rule hisat2:                                       
    input: fq = "00fastq/{sample}.fastq.gz",       
           idx = config["reference"]["hisat_index"]
    ...