An input JSON file specifies all genomic data files, parameters and metadata for running the pipeline. The pipeline uses default values for any parameters that are not defined in the input JSON file. We provide two template JSON files: minimum and full. We recommend using the minimum template rather than the full one. The full template includes all parameters of the pipeline with their default values defined.
Please read through the following step-by-step instructions to compose an input JSON file.
Pipeline metadata
| Parameter | Description |
|---|---|
| atac.title | Title of the experiment, shown in the final HTML report |
| atac.description | Description of the experiment, shown in the final HTML report |
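As a minimal sketch, the two metadata entries might appear in an input JSON as follows. Both strings are placeholder values chosen for illustration, not taken from a real experiment.

```json
{
    "atac.title": "Example ATAC-seq experiment",
    "atac.description": "Placeholder description: two paired-end replicates"
}
```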
Pipeline parameters
| Parameter | Default | Description |
|---|---|---|
| atac.pipeline_type | atac | atac for ATAC-seq or dnase for DNase-seq |
| atac.align_only | false | Disable peak calling and its downstream analyses. Useful if you only want to map your FASTQs into filtered BAMs/TAG-ALIGNs and don't want to call peaks on them |
| atac.true_rep_only | false | Disable pseudo-replicate generation and all related analyses |
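For illustration, a run that only aligns and filters reads, without calling peaks, could set the flags like this. This is a sketch showing only the keys from the table above; a real input JSON would also need genome and input data entries.

```json
{
    "atac.pipeline_type": "atac",
    "atac.align_only": true,
    "atac.true_rep_only": false
}
```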
Reference genome
All reference-genome-specific files and parameters can be defined in a single TSV file, atac.genome_tsv. However, you can also define each file/parameter individually instead of using a TSV file. If both a TSV file and individual parameters are defined, the individual parameters override those defined in the TSV file. For example, if you define both atac.genome_tsv and atac.blacklist, then atac.blacklist overrides the blacklist defined in atac.genome_tsv. This is useful when you want to use your own file for a specific parameter while keeping all other parameters the same as in the original TSV.
| Parameter | Type | Description |
|---|---|---|
| atac.genome_tsv | File | Choose one of the TSV files listed below or build your own |
| atac.ref_fa | File | Reference FASTA file |
| atac.bowtie2_idx_tar | File | Bowtie2 index TAR file (uncompressed) built from FASTA file |
| atac.chrsz | File | 2-col chromosome sizes file built from FASTA file with faidx |
| atac.blacklist | File | 3-col BED file. Peaks overlapping these regions will be filtered out |
| atac.gensz | String | MACS2's genome sizes (hs for human, mm for mouse or sum of 2nd col in chrsz) |
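The override behavior described above can be sketched as follows: the genome TSV supplies every reference file, and a separately defined atac.blacklist replaces only the blacklist from that TSV. Both paths are hypothetical placeholders.

```json
{
    "atac.genome_tsv": "/path/to/genome/example_genome.tsv",
    "atac.blacklist": "/path/to/custom/example_blacklist.bed.gz"
}
```

All other reference parameters (FASTA, Bowtie2 index, chromosome sizes, genome size) would still come from the TSV file.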
| Parameter | Description |
|---|---|
| atac.paired_end | Boolean to define endedness for ALL replicates. This will override the per-replicate definition in atac.paired_ends |
| atac.paired_ends | Array of Booleans to define endedness for each replicate |
Define atac.paired_end if all replicates in your dataset have the same endedness. You can also define endedness individually for each replicate with atac.paired_ends. For example, if rep1 and rep2 are PE and rep3 is SE:
{"atac.paired_ends" : [true,true,false]}
The pipeline can start from any of the following data types: FASTQ, BAM, NODUP_BAM and TAG-ALIGN.
| Parameter | Description |
|---|---|
| atac.fastqs_repX_R1 | Array of R1 FASTQ files for replicate X. These files will be merged into one FASTQ file for rep X |
| atac.fastqs_repX_R2 | Array of R2 FASTQ files for replicate X. These files will be merged into one FASTQ file for rep X. Do not define for a single-ended dataset |
| atac.bams | Array of BAM files, one for each replicate (e.g. ["rep1.bam", "rep2.bam", ...]) |
| atac.nodup_bams | Array of filtered/deduped BAM files, one for each replicate |
| atac.tas | Array of TAG-ALIGN files, one for each replicate |
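As a sketch of the FASTQ case, the following fragment defines two paired-end replicates, where rep1 has two FASTQ files per read end (to be merged) and rep2 has one. All file paths are hypothetical placeholders, and X in the parameter names is replaced by the replicate number.

```json
{
    "atac.paired_end": true,
    "atac.fastqs_rep1_R1": [
        "/path/to/example/rep1_L1_R1.fastq.gz",
        "/path/to/example/rep1_L2_R1.fastq.gz"
    ],
    "atac.fastqs_rep1_R2": [
        "/path/to/example/rep1_L1_R2.fastq.gz",
        "/path/to/example/rep1_L2_R2.fastq.gz"
    ],
    "atac.fastqs_rep2_R1": ["/path/to/example/rep2_R1.fastq.gz"],
    "atac.fastqs_rep2_R2": ["/path/to/example/rep2_R2.fastq.gz"]
}
```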
You can mix different data types across individual replicates/control replicates. For example, the pipeline can start from FASTQs for rep1 and rep3, BAMs for rep2, NODUP_BAMs for rep4 and TAG-ALIGNs for rep5. Control replicates can be defined similarly.
If you choose to use auto-detection for adapters, remove the adapter arrays from the input JSON. Otherwise, define an adapter for each FASTQ.
WARNING: Individually defined adapter arrays should have the same dimensions as the FASTQ arrays.
| Parameter | Description |
|---|---|
| atac.adapter | You can define an adapter sequence for ALL FASTQs. If defined, this will override the adapter sequence definitions below for individual FASTQs |
| atac.adapters_repX_R1 | Array of adapter sequences for R1 FASTQs of replicate X |
| atac.adapters_repX_R2 | Array of adapter sequences for R2 FASTQs of replicate X. Do not define it for a single-ended dataset |
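The dimension requirement from the warning above can be illustrated with a sketch: rep1 has two R1 and two R2 FASTQs, so each adapter array for rep1 also has two entries, one per FASTQ in the same order. File paths are hypothetical placeholders; the adapter used here is the Nextera sequence CTGTCTCTTATA, chosen only for illustration.

```json
{
    "atac.fastqs_rep1_R1": [
        "/path/to/example/rep1_L1_R1.fastq.gz",
        "/path/to/example/rep1_L2_R1.fastq.gz"
    ],
    "atac.fastqs_rep1_R2": [
        "/path/to/example/rep1_L1_R2.fastq.gz",
        "/path/to/example/rep1_L2_R2.fastq.gz"
    ],
    "atac.adapters_rep1_R1": ["CTGTCTCTTATA", "CTGTCTCTTATA"],
    "atac.adapters_rep1_R2": ["CTGTCTCTTATA", "CTGTCTCTTATA"]
}
```

If every FASTQ shares the same adapter, defining the single atac.adapter string instead avoids having to keep the array dimensions in sync.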
Optional adapter-trimming parameters
| Parameter | Default | Description |
|---|---|---|
| atac.auto_detect_adapter | false | Enable auto-detection of adapters. Adapters that can be detected: AGATCGGAAGAGC (Illumina), CTGTCTCTTATA (Nextera) and TGGAATTCTCGG (smallRNA) |
bowtie2 parameters for each read-endedness (paired-end). See bowtie2 --help for details.
| Parameter | Default | Description |
|---|---|---|
| atac.bowtie2_param_se | --local | bowtie2 parameters for each read-endedness (single-ended). See bowtie2 --help for details |
Optional filtering parameters
| Parameter | Default | Description |
|---|---|---|
| atac.mapq_thresh | 30 | Threshold for mapped reads quality (samtools view -q) |
| atac.dup_marker | picard | Choose a dup marker between picard and sambamba. picard is recommended; use sambamba only when picard fails |
| atac.no_dup_removal | false | Skip dup removal in the BAM filtering stage |
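For example, if picard fails on your data, a fallback to sambamba could be sketched like this, keeping the default MAPQ threshold. Only the keys from the table above are shown.

```json
{
    "atac.mapq_thresh": 30,
    "atac.dup_marker": "sambamba",
    "atac.no_dup_removal": false
}
```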
Optional subsampling parameters
| Parameter | Default | Description |
|---|---|---|
| atac.subsample_reads | 0 | Subsample reads (0: no subsampling). Subsampled reads will be used for all downstream analyses, including peak calling |
| atac.xcor_subsample_reads | 15000000 | Subsample reads for cross-correlation analysis only (0: no subsampling). These subsampled reads are not used for any other analysis |
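A sketch of the two subsampling settings side by side: the first value caps the reads used by all downstream analyses, while the second affects cross-correlation analysis only. The value 25000000 is an arbitrary illustrative number, not a recommendation; 15000000 is the default from the table.

```json
{
    "atac.subsample_reads": 25000000,
    "atac.xcor_subsample_reads": 15000000
}
```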
Optional peak-calling parameters
| Parameter | Default | Description |
|---|---|---|
| atac.cap_num_peak | 500000 | Cap on the number of peaks called by the peak caller (MACS2) |
| atac.pval_thresh | 0.01 | P-value threshold for MACS2 (macs2 callpeak -p) |
| atac.enable_idr | true | Enable IDR (irreproducible discovery rate) |
| atac.idr_thresh | 0.05 | Threshold for IDR |
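As an illustration, the fragment below keeps the default peak cap and IDR settings but tightens the MACS2 p-value threshold. The value 0.001 is a hypothetical choice made for this sketch, not a recommended setting.

```json
{
    "atac.cap_num_peak": 500000,
    "atac.pval_thresh": 0.001,
    "atac.enable_idr": true,
    "atac.idr_thresh": 0.05
}
```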
Optional pipeline flags
| Parameter | Default | Description |
|---|---|---|
| atac.enable_xcor | false | Enable cross-correlation analysis |
| atac.enable_count_signal_track | false | Enable count signal track generation |
| atac.keep_irregular_chr_in_bfilt_peak | false | Keep irregular chromosome names. Use this for custom genomes without canonical chromosome names (chr1, chrX, ...) |
| atac.disable_ataqc | false | Disable ATAqC (including all annotation-based analyses in it) |
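For a custom genome without canonical chromosome names, the flags might be combined as in the sketch below. Disabling ATAqC here is an assumption made for illustration (a custom genome may lack the annotation files ATAqC relies on); the source does not require it.

```json
{
    "atac.enable_xcor": true,
    "atac.enable_count_signal_track": true,
    "atac.keep_irregular_chr_in_bfilt_peak": true,
    "atac.disable_ataqc": true
}
```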
Other optional parameters
| Parameter | Default | Description |
|---|---|---|
| atac.mito_chr_name | chrM | Name of the mitochondrial chromosome. THIS IS NOT A REG-EX! You can define only one chromosome name for mito |
| atac.regex_filter_reads | chrM | Regular expression to filter out reads by chromosome name (1st column of BED/TAG-ALIGN). Any read whose chromosome name matches this reg-ex pattern will be removed from outputs. If you have changed atac.mito_chr_name and still want to filter out mito reads, make sure that atac.mito_chr_name and atac.regex_filter_reads are the same |
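As a sketch of keeping the two parameters in sync, suppose a genome names its mitochondrial chromosome MT instead of chrM (a hypothetical case chosen for illustration). Both entries are then set to the same value so that mito reads are still filtered out.

```json
{
    "atac.mito_chr_name": "MT",
    "atac.regex_filter_reads": "MT"
}
```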
Resource parameters
WARNING: It is recommended not to change the following parameters unless you get resource-related errors for a certain task and want to increase resources for that task. The following parameters are provided for users who want to run our pipeline with Caper's local mode on HPCs.
Resources defined here are PER REPLICATE. Therefore, the total number of cores will be approximately atac.bowtie2_cpu x NUMBER_OF_REPLICATES, because bowtie2 is the bottleneck task of the pipeline. Use this total number of cores if you manually qsub or sbatch your job (using the local mode of Caper). disks is used for Google Cloud and DNAnexus only.
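To make the per-replicate arithmetic concrete, here is a sketch with a hypothetical value of 4 cores for bowtie2 (the value is an illustrative assumption, not the pipeline default).

```json
{
    "atac.bowtie2_cpu": 4
}
```

With 3 replicates, this setting would call for approximately 4 x 3 = 12 cores in total when requesting resources for a manually submitted job.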