ChIPseq

Automation of ChIP-seq Workflow

About

This repository uses software installed in a conda virtual environment to process multiple ChIP-seq samples. Before running the workflow, make sure the conda environment is installed and the run settings are specified in the 'settings.conf' file.

Getting Started

Cloning Repository

To get started, clone the ChIPseq automation repository. Navigate to the directory where you would like to keep the repository and enter: git clone https://github.com/zentnerlab/ChIPseq.git

Preparing Conda Environment

This workflow takes advantage of the conda package manager and virtual environment. The conda package manager installs both the main software and all dependencies into a 'virtual environment' to ensure compatibility. Additionally, the provided 'environments.yml' file can be used to install the same major software versions that were used to develop the script, ensuring long-term compatibility and reproducibility.

Before creating the environment, you must first install Miniconda.

Install Miniconda, and make sure that conda is in your PATH.
Update conda to the latest version: conda update -n base -c defaults conda

You are now ready to create the virtual software environment and download all software and dependencies. It is recommended to reproduce the environment used when creating the script, but instructions for installing the latest software are provided as an alternative.

Reproducing the Development Environment (Recommended)

To install the major software versions used when developing this script, navigate to the 'DOCS' directory, and use the provided 'environments.yml' file to create your conda environment.

conda env create -f environment.yml

For posterity, all software and versions used when developing the script are provided in the 'development_environment.yml' file located in the 'DOCS' directory of the repository. This file cannot be used to install the environment on your computer, because many of the dependencies and software builds are system-specific. However, it may help you troubleshoot any dependency errors that occur in your environment.

Installing The Latest Software Versions

Create the new environment and specify the software to include in it.

conda create -n chipseq-automation -y -c conda-forge -c bioconda \
fastqc bowtie2 samtools macs2 deeptools bedtools sra-tools r-tidyverse r-getopt \
bioconductor-chipseeker bioconductor-rtracklayer bioconductor-genomicranges \
bioconductor-genomicfeatures

Update the software to the latest compatible versions.

conda update -n chipseq-automation -y -c conda-forge -c bioconda --all

If you wish to use any of the software in the environment outside of the workflow you can type conda activate chipseq-automation. You can deactivate the environment by closing your terminal or entering conda deactivate.
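As a quick sanity check after running conda activate chipseq-automation, the sketch below lists any workflow tools that are not reachable through PATH. The helper name missing_tools is hypothetical, and the executable names are assumptions based on the packages installed above (for example, bamCoverage for deepTools and fasterq-dump for sra-tools).

```shell
# missing_tools is a hypothetical helper, not part of the repository.
# It prints every command name from its arguments that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions based on the packages listed above
# (bamCoverage ships with deepTools, fasterq-dump with sra-tools).
missing=$(missing_tools fastqc bowtie2 samtools macs2 bedtools bamCoverage fasterq-dump Rscript)

if [ -n "$missing" ]; then
    echo "Not found in the active environment:"
    echo "$missing"
else
    echo "All workflow tools are available."
fi
```

If anything is reported as missing, the environment was probably not activated or the installation did not complete.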

Creating Sample Sheet

To keep track of samples, this workflow requires a sample sheet. An example sheet, 'samples.tsv', is provided in the 'DOCS' directory of the repository. It is important to follow the exact formatting of this sheet, as the information in it is used at various stages of the workflow.

Column	Description
sample_ID	Short sample identifier (e.g. A001).
condition	Experimental condition (e.g. EWSR1_KD).
replicate	Sample replicate number (e.g. 1).
R1	Name of R1 fastq file of experimental condition.
R2	Name of R2 fastq file of experimental condition (leave blank if single end).
control_ID	Name of input/control sample (e.g. KD_input_1) (leave blank if there is no input/control).
R1_control	Name of R1 fastq of input/control (leave blank if there is no input/control).
R2_control	Name of R2 fastq of input/control (leave blank if input/control is single end).

After creating the sample sheet, set the path and file name in the 'settings.conf' file.
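The following sketch writes a small tab-separated sheet with the eight columns described above and checks that every row has eight fields. It assumes the sheet begins with a header row of the column names; the output file name and all fastq file names are invented for illustration, so treat the 'samples.tsv' example in 'DOCS' as the authoritative template.

```shell
# Hypothetical output name; point SAMPLE_SHEET in settings.conf at the real file.
sheet="samples_example.tsv"

# Header row (assumed): the eight columns described above, tab-separated.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    sample_ID condition replicate R1 R2 control_ID R1_control R2_control > "$sheet"

# A paired-end sample with a paired-end input control (file names are invented).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    A001 EWSR1_KD 1 A001_R1.fastq A001_R2.fastq KD_input_1 input_R1.fastq input_R2.fastq >> "$sheet"

# A single-end sample without a control: the unused fields stay empty.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    A002 EWSR1_KD 2 A002_R1.fastq '' '' '' '' >> "$sheet"

# Every row should have exactly eight tab-separated fields, blanks included.
awk -F '\t' 'NF != 8 { printf "line %d has %d fields\n", NR, NF; bad = 1 } END { exit bad }' "$sheet"
```

Keeping the empty fields as consecutive tabs, rather than dropping them, preserves the column positions for single-end samples and samples without a control.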

Specifying Run Settings

The last step is to set a few settings in the 'settings.conf' file in the main repository directory. An example settings file is provided in the 'DOCS' directory of the repository.

Setting	Description
BASEDIR	The directory for the ChIPseq repository (e.g. /analysis/ChIPseq).
OUTDIR	The output directory for the workflow results (e.g. /analysis/results).
CORES	The number of CPU cores/threads (e.g. 2).
SAMPLE_SHEET	The directory and name of the sample sheet (e.g. /analysis/samples.tsv).
DOWNLOAD	Whether the files need to be downloaded from SRA (e.g. 'TRUE').
SEQDIR	The directory containing the fastq files (e.g. /analysis/sequences).
GENOME_FASTA	The directory and name of the genome assembly fasta (e.g. /analysis/genome/genome.fasta).
GENOME_GTF	The directory and name of the genome annotation GTF/GFF (e.g. /analysis/genome/genes.gtf).
GENOME_SIZE	Effective genome size (e.g. hg38 is ~3000000000).
UPSTREAM	Bases upstream of TSS for defining promoter (e.g. 1000).
DOWNSTREAM	Bases downstream of TSS for defining promoter (e.g. 1000).
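
To illustrate how the settings above fit together, here is a minimal sketch that writes the example values from the table to a scratch file and checks that the numeric ones are whole numbers. It assumes, without confirmation from the repository, that 'settings.conf' uses shell-style KEY=value lines; the file name settings_example.conf is hypothetical, and the example file in 'DOCS' defines the real format.

```shell
# ASSUMPTION: settings.conf holds shell-style KEY=value lines; the real format
# is defined by the example file in DOCS. Written to a scratch name on purpose.
cat > settings_example.conf <<'EOF'
BASEDIR=/analysis/ChIPseq
OUTDIR=/analysis/results
CORES=2
SAMPLE_SHEET=/analysis/samples.tsv
DOWNLOAD=TRUE
SEQDIR=/analysis/sequences
GENOME_FASTA=/analysis/genome/genome.fasta
GENOME_GTF=/analysis/genome/genes.gtf
GENOME_SIZE=3000000000
UPSTREAM=1000
DOWNSTREAM=1000
EOF

# Load the values and confirm the numeric settings are whole numbers.
. ./settings_example.conf
for name in CORES GENOME_SIZE UPSTREAM DOWNSTREAM; do
    eval "value=\$$name"
    case "$value" in
        ''|*[!0-9]*) echo "$name must be a whole number, got '$value'" >&2 ;;
        *) echo "$name=$value" ;;
    esac
done
```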

Running the Workflow

Once the conda environment is ready, the sample sheet is prepared, and the settings are specified, you can run the workflow. Navigate to the main repository directory and enter 'bash main.sh'.
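
For longer runs it can help to keep a log of everything the workflow prints. The sketch below is one way to do that: run_workflow is a hypothetical wrapper, not part of the repository, that checks for 'main.sh' and 'settings.conf' in the given directory and then runs the workflow while copying its output to a timestamped log file.

```shell
# run_workflow is a hypothetical wrapper, not part of the repository.
# Usage: run_workflow /path/to/ChIPseq
run_workflow() {
    repo_dir="$1"

    # Stop early if the files the workflow needs are not in place.
    for required in main.sh settings.conf; do
        if [ ! -f "$repo_dir/$required" ]; then
            echo "missing $repo_dir/$required" >&2
            return 1
        fi
    done

    log="chipseq_$(date +%Y%m%d_%H%M%S).log"

    # The subshell keeps the caller's working directory unchanged.
    # The pipeline's exit status is that of tee, so check the log for errors.
    ( cd "$repo_dir" && bash main.sh 2>&1 | tee "$log" )
}

# Example call (path taken from the BASEDIR example above):
#   run_workflow /analysis/ChIPseq
```

The log file is written inside the repository directory, next to 'main.sh'.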

Notes for IU Folks

If you wish to submit the workflow to a compute node, you can do so through the TORQUE resource manager by entering qsub -l nodes=1:ppn=8,vmem=64gb,walltime=12:00:00 main.sh, where 'ppn' specifies the number of threads and 'vmem' the amount of virtual memory.
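
As an alternative to passing the resources on the command line, the same request can be kept in a small job script. This is only a sketch: the file name chipseq_job.sh and the job name are hypothetical, and the cd step assumes the job is submitted from the main repository directory. The resource values are the ones given above; 'ppn' presumably should not be lower than the CORES value in 'settings.conf'.

```shell
# Write a TORQUE job script; chipseq_job.sh is a hypothetical file name.
cat > chipseq_job.sh <<'EOF'
#!/bin/bash
#PBS -N chipseq
#PBS -l nodes=1:ppn=8,vmem=64gb,walltime=12:00:00
#PBS -j oe

# TORQUE starts jobs in the home directory; return to the submission directory.
cd "$PBS_O_WORKDIR" || exit 1
bash main.sh
EOF

# Submit from the main repository directory on a system with TORQUE:
#   qsub chipseq_job.sh
```

The '-j oe' directive merges standard error into the job's standard output file, which keeps all workflow messages in one place.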

Built With

This workflow would not be possible without the great software listed below.

Anaconda - Software package manager and virtual environment.
FastQC - Read quality control.
Bowtie 2 - Short read aligner.
Samtools - SAM/BAM manipulation.
MACS - Peak caller.
bedtools - Bed file manipulation.
deepTools - Read normalization and heatmap generation.
ChIPseeker - Peak annotation.
Tidyverse - Data manipulation and visualization in R.
rtracklayer - Easy genomic file manipulation.
GenomicRanges - Robust data structure for genomics data.
GenomicFeatures - Working with genomic annotation files.
