The segmentation script WGBS_Seq_Pipeline.sh is a wrapper for the MethylSeekR tool, which performs genome-wide DNA methylation segmentation. Its inputs are the *.bed(.gz) files of methylation and SNP calls generated by the MCSvX bisulfite DEEP pipeline (DOI: 10.17617/1.2W). It generates segmentation files (.bed) that can be visualized in a genome browser, along with a user-friendly HTML report. UMRs are CpG-dense regions with a low methylation level (<m; 'm' is 50% by default), while LMRs are CpG-poor regions with a low methylation level (<m) and a number of CpGs n<30, where 'n' is calculated at a given FDR.
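The UMR/LMR distinction described above can be illustrated with a small sketch. It is not MethylSeekR's actual procedure, which works on smoothed methylation levels and derives its CpG cut-off from the FDR. The function name `classify_segments`, the five-column input layout, and the rule that segments with 30 or more CpGs count as UMRs are assumptions made for this example.

```shell
# Hypothetical helper, not part of the pipeline.
# Assumed input columns (tab-separated): chr, start, end, number of CpGs, mean methylation.
# Usage: classify_segments segments.tsv [methylation_cutoff]
classify_segments() {
    local m="${2:-0.5}"   # methylation cut-off 'm' (0.5 mirrors the documented default)
    awk -F'\t' -v OFS='\t' -v m="$m" -v cpg=30 '
        # Keep only low-methylation segments and label them by CpG count.
        # Assumption: >= 30 CpGs is a UMR, fewer is an LMR.
        $5 < m { print $1, $2, $3, ($4 >= cpg ? "UMR" : "LMR") }
    ' "$1"
}
```

Segments whose mean methylation is at or above the cut-off are dropped, so the output only contains candidate UMRs and LMRs.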
bash WGBS_Seq_Pipeline.sh -i input.bed -s SNP.vcf [-g genomeLength] [-n SampleName] [-o output folder] [-l number of cores] [-m methylation cut-off] [-f FDR cut-off] [-c minCover]
- Mandatory:
- -i input BED file (chr, start, end, meth.Level, total_reads, strand); may be gzipped
- -s SNP file (VCF); may be gzipped
- Optional:
- -g genome length (Shortcuts: hg19 (Default), hg38 or mm10)
- -n Sample name
- -o output folder
- -l number of cores (Default: 2 cores)
- -m methylation level cut-off (Default: 0.5)
- -f FDR cut-off (Default: 5%)
- -c minimum coverage for calling UMRs/LMRs (Default: 5)
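Before running the wrapper, it can help to confirm that the input file really has the six columns listed for -i. The following is a minimal sketch, assuming tab-separated columns; `check_bed` is a hypothetical helper and not part of the pipeline.

```shell
# Hypothetical helper: check that every line of a (possibly gzipped) BED file
# has exactly 6 tab-separated fields. Returns non-zero if any line does not.
check_bed() {
    local f="$1"
    local reader="cat"
    # Assumption: gzipped inputs are recognised by their .gz extension.
    case "$f" in
        *.gz) reader="gzip -dc" ;;
    esac
    $reader "$f" | awk -F'\t' '
        NF != 6 { printf "line %d: expected 6 columns, found %d\n", NR, NF; bad = 1 }
        END     { exit bad + 0 }
    '
}
```

For example, `check_bed input.bed && bash WGBS_Seq_Pipeline.sh -i input.bed -s SNP.vcf` would only start the segmentation when the column check passes.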
Caution: you have to set the following before running:
- the temporary directory where the calculations are done (TMPDIR)
- the paths of Rscript and bedtools in WGBS_Seg_Pipeline
- the genome length and the corresponding gaps file paths in WGBS_Seg_Pipeline.sh
ChromH3M (short for ChromHMM-meta segmentation) is a straightforward workflow for measuring the similarity of PMDs/LMRs/UMRs produced by MethylSeekR across many samples. It takes the segment files produced by the MethylSeekR segmentation wrapper script as input and binarizes the binned genome accordingly across all samples. ChromHMM is then applied to this binarized signal for each number of states between a user-defined minimum and maximum. The emission probabilities are hierarchically clustered, and annotations are added to the heatmap based on a sample sheet provided by the user.
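The binarization step can be pictured with the sketch below. It marks each fixed-size bin of one chromosome with 1 if the bin overlaps a segment from one sample and 0 otherwise. The function name `binarize_segments`, the overlap rule, and the bin size passed by the caller are assumptions for illustration; the rule and bin size that ChromH3M.sh actually uses may differ.

```shell
# Hypothetical helper, not part of ChromH3M.
# Usage: binarize_segments segments.bed chrom chrom_length bin_size
# Prints one line per bin: 1 if the bin overlaps any segment on chrom, else 0.
binarize_segments() {
    local bed="$1" chrom="$2" chrom_len="$3" bin="$4"
    awk -F'\t' -v chrom="$chrom" -v len="$chrom_len" -v bin="$bin" '
        $1 == chrom {
            # BED intervals are 0-based and half-open: [start, end)
            first = int($2 / bin)
            last  = int(($3 - 1) / bin)
            for (b = first; b <= last; b++) hit[b] = 1
        }
        END {
            nbins = int((len + bin - 1) / bin)   # round up to cover the chromosome end
            for (b = 0; b < nbins; b++) print ((b in hit) ? 1 : 0)
        }
    ' "$bed"
}
```

Running this once per sample and joining the resulting columns side by side (for instance with `paste`) gives one 0/1 column per sample, which is the kind of multi-sample binary signal that ChromHMM learns its states from.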
bash ChromH3M.sh -i dir -g genome -n name -o output folder -a min -b max -s sample sheet
- Mandatory:
- -i directory containing the BED files (give the full path!)
- -g genome length (Shortcuts: hg19 or mm10)
- -n output name
- -o output folder name (give the full path!)
- -s sample annotations sheet (give the full path!)
- -a minimum number of ChromHMM states
- -b maximum number of ChromHMM states
Caution: you have to set the following before running:
- the temporary directory where the calculations are done (TMPDIR)
- the paths of ChromHMM.jar and Rscript in ChromH3M.sh
- the genome length and the corresponding gaps file paths in meth_avg.sh
Please cite the following paper if you use ChromH3M in your analysis:
Salhab A, Nordström K, Gasparoni G, Kattler K, Ebert P, Ramirez F, Arrigoni L, Müller F, Polansky JK, Cadenas C, et al. A comprehensive analysis of 195 DNA methylomes reveals shared and cell-specific features of partially methylated domains. Genome Biology. 2018; 19(1):150.
@article{salhab2018comprehensive,
title={A comprehensive analysis of 195 DNA methylomes reveals shared and cell-specific features of partially methylated domains},
author={Salhab, Abdulrahman and Nordstr{\"o}m, Karl and Gasparoni, Gilles and Kattler, Kathrin and Ebert, Peter and Ramirez, Fidel and Arrigoni, Laura and M{\"u}ller, Fabian and Polansky, Julia K and Cadenas, Cristina and others},
journal={Genome Biology},
volume={19},
number={1},
pages={150},
year={2018},
publisher={BioMed Central},
doi={10.1186/s13059-018-1510-5},
url={https://doi.org/10.1186/s13059-018-1510-5}
}