A pipeline to perform automated mapping, haplotype-assignment, and Hi-C data generation, as in Dynamic reorganization of the genome shapes the recombination landscape in meiotic prophase.
The pipeline ingests paired-end .fastq files from a Hi-C experiment and runs a complete Hi-C analysis pipeline, specifically for the haplotype analysis.
This pipeline is optimized for PBS/TORQUE systems. To get started, you'll need to set up a few things:
- Clone the repository to your working directory.
- Add your index files to the
/index
directory - Add your SNP files to the
/snp
directory- The required format is one file per chromosome named
chr#.snps.txt.gz
with each file containing SNPs in the tab-seperated formatPOS REF ALT
- The required format is one file per chromosome named
- Add your sequence files to the
/fastq
directory- By default, the pipeline expects bz2-compressed .fastq files
- The compression format can be configured in
config.json
- Configure the
config.json
file with the paths to the required software packages, .fastq prefixes, assemblies, and chromosome references for the pipeline - Configure the jobscript.pbs with your respective email and username used in your job submission queue. Also load any modules that may be needed here.
This pipeline leverages several bioinformatics tools, namely:
- Samtools
- BWA (MEM)
- WASP
- Juicer
- Cooler
- Pairtools
To run the pipeline, simply execute run.sh
. I recommend backgrounding the process, or submitting it to a job scheduler.
./run.sh &
What packages are necessary to run the pipeline?
The main packages are listed above, but a full conda environment can be found in environment.yml
.
The pipeline failed or completed before generating the final output files, what happened?
Consult the log files in /mapping/log
and /pbslog
for more information.
I can't restart the pipeline after it failed!
Snakemake locks the working directory while running. You may have to unlock the working directory when restarting the pipeline with snakemake --unlock
.
- Python
- The pipeline must be executed in a Python 3 environment, but Python 2.7 is still required for a few legacy scripts.
- Snakemake - A pipeline tool for Python
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
- Lucas Patel - Initial and ongoingn work
- Yunjiang Qiu - Initial pipeline structure and Hi-C WASP adaptation
- David Gorkin - Hi-C WASP adaptation
- Kevin Corbett - Experimental design and scientific direction
This pipeline was created to facilitate the experimental design of the corresponding publicatio, Dynamic reorganization of the genome shapes the recombination landscape in meiotic prophase. If this pipeline is used in your own analyis, please cite the above approrpriately.
-
Thanks to Grant McVicker for early guidance.
-
Thanks to Anton Goloborodko for pairing score guidance and code.
-
The
homolog_pairing.py
script is adapted from_____