Merge pull request #21 from databio/dev
Release 0.4
Showing 17 changed files with 1,463 additions and 514 deletions.

@@ -1,2 +1,10 @@
# General
*.pyc
.~lock*

# JetBrains
.idea/

# Tests
.cache/

@@ -16,26 +16,38 @@ These features are explained in more detail later in this README.

## Installing

### Prerequisites

**Python packages**. This pipeline uses [pypiper](https://github.com/epigen/pypiper) to run a single sample, [looper](https://github.com/epigen/looper) to handle multi-sample projects (for either local or cluster computation), and [pararead](https://github.com/databio/pararead) for parallel processing of sequence reads. You can do a user-specific install of these like this:

```
pip install --user https://github.com/epigen/pypiper/zipball/master
pip install --user https://github.com/epigen/looper/zipball/master
pip install --user https://github.com/databio/pararead/zipball/master
```
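
To confirm the installs worked, you can check that each package is importable from the same Python interpreter the pipeline will use. This is a minimal sketch that assumes the import names are `pypiper`, `looper`, and `pararead`; it only checks that the modules can be found, not their versions.

```python
import importlib.util
import sys

# Import names assumed for the three prerequisite packages installed above.
REQUIRED_MODULES = ["pypiper", "looper", "pararead"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the current sys.path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
        sys.exit(1)
    print("All required Python packages are importable.")
```

Run it with the interpreter you plan to use for the pipeline, since a `--user` install is only visible to that user and Python version.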

Version 0.3 of this pipeline requires looper version 0.6 or greater. You can upgrade looper with: `pip install --user --upgrade https://github.com/epigen/looper/zipball/master`.

**R packages**. This pipeline uses R to generate QC metric plots. These packages are **optional**: if you don't install them (or R itself), the pipeline will still work, but you will not get the QC plot outputs.

The following packages are used by the QC scripts:
- ggplot2
- gplots (v3.0.1)
- reshape2 (v1.4.2)

You can install these packages like this:
```
R # start R
install.packages(c("ggplot2", "gplots", "reshape2"))
```

**Required executables**. You will need some common bioinformatics tools installed. The full list is given in the tools section of the pipeline configuration file ([pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)).
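
Before a first run, you can check that each required tool resolves. In this sketch the entries in `EXAMPLE_TOOLS` are placeholders, not the pipeline's actual list; copy the real entries from the tools section of [pipelines/ATACseq.yaml](pipelines/ATACseq.yaml). Values may be bare command names (looked up on `PATH`) or absolute paths.

```python
import shutil
import sys

# PLACEHOLDER entries: replace with the tools section of pipelines/ATACseq.yaml
# (keys are labels, values are command names or absolute paths).
EXAMPLE_TOOLS = {
    "samtools": "samtools",
    "bowtie2": "bowtie2",
}


def missing_executables(tools):
    """Return the labels of tools whose command is not found or not executable."""
    # shutil.which searches PATH for bare names; for a value containing a
    # directory component it checks only that exact path.
    return [label for label, cmd in tools.items() if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_executables(EXAMPLE_TOOLS)
    for label in missing:
        print(f"Not found: {label} ({EXAMPLE_TOOLS[label]})")
    sys.exit(1 if missing else 0)
```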

**Genome resources**. This pipeline requires genome assemblies produced by [refgenie](https://github.com/databio/refgenie). You may [download pre-indexed references](http://cloud.databio.org/refgenomes) or you may index your own (see [refgenie](https://github.com/databio/refgenie) instructions). Any prealignments you want to use will also require refgenie assemblies. Some common examples are provided by [ref_decoy](https://github.com/databio/ref_decoy).

### Configuring the pipeline

**Clone the pipeline**. Clone this repository using one of these methods:
- using SSH: `git clone git@github.com:databio/ATACseq.git`
- using HTTPS: `git clone https://github.com/databio/ATACseq.git`

## Configuring

There are two configuration options: you can either set up environment variables to fit the default configuration, or change the configuration file to fit your environment. For the Chang lab, you may use the pre-made config file and project template described on the [Chang lab configuration](examples/chang_project) page. Otherwise, choose one:

**Option 1: Default configuration** (recommended; [pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)).

@@ -56,14 +68,15 @@ There are two configuration options: You can either set up environment variables

**Option 2: Custom configuration**. Alternatively, you can put absolute paths to each tool or resource in the configuration file to fit your local setup: just edit the pipeline configuration file ([pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)) accordingly.

## Usage

## Running the pipeline

You have two options for running the pipeline. This is a looper-compatible pipeline, so you never need to interface with the pipeline directly, but you can if you want.

### Option 1: Running the pipeline script directly

To see the command-line options, see [usage.txt](usage.txt), which you can also get on the command line by running `python pipelines/ATACseq.py --help` (or `-h`). You just need to pass a few command-line parameters to specify the sample name, reference genome, input files, etc. See the example command in [cmd.sh](cmd.sh), which uses test data.

To run on multiple samples, you can just write a loop to process each sample independently with the pipeline, or you can use *Option 2*...
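
A minimal sketch of such a loop in Python. The flag names (`--sample-name`, `--genome`, `--input`) and the sample values are hypothetical; confirm the real option names in [usage.txt](usage.txt) or with `python pipelines/ATACseq.py --help` before running. By default it only prints the commands (a dry run).

```python
import shlex
import subprocess

PIPELINE = "pipelines/ATACseq.py"

# HYPOTHETICAL flag names: check usage.txt / `pipelines/ATACseq.py --help`.
FLAG_SAMPLE = "--sample-name"
FLAG_GENOME = "--genome"
FLAG_INPUT = "--input"


def build_command(sample_name, genome, input_files):
    """Assemble one pipeline invocation as an argument list."""
    return ["python", PIPELINE, FLAG_SAMPLE, sample_name,
            FLAG_GENOME, genome, FLAG_INPUT, *input_files]


def run_all(samples, dry_run=True):
    """Process each sample independently, one after another.

    `samples` is a list of (sample_name, genome, [input files]) tuples.
    Returns the commands as shell-quoted strings.
    """
    commands = []
    for sample_name, genome, input_files in samples:
        cmd = build_command(sample_name, genome, input_files)
        commands.append(shlex.join(cmd))
        if dry_run:
            print(commands[-1])
        else:
            # check=True stops the loop if any sample's run fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical samples; replace with your own names, genome, and files.
    run_all([
        ("sample1", "hg38", ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"]),
        ("sample2", "hg38", ["sample2_R1.fastq.gz", "sample2_R2.fastq.gz"]),
    ])
```

This runs samples sequentially on the local machine; for cluster submission, looper handles that for you.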

### Option 2: Running the pipeline through looper

@@ -129,6 +142,24 @@ grep "level 1" ${GENOME}.gtf | grep "gene" | awk '{if($7=="+"){print $1"\t"$4"\
```

### Optional summary plots

1. Run `looper summarize` to generate a summary table in tab-separated values (TSV) format:

```
looper summarize examples/test_project/test_config.yaml
```

2. Run `ATAC_Looper_Summary_plot.R` to produce summary plots. You must pass the full path to the TSV file produced by `looper summarize`:
```
Rscript ATAC_Looper_Summary_plot.R </path/to/looper/summarize/summary.TSV>
```

This writes multiple PDF plots to the directory containing the input TSV file.

## Using a cluster

Once you've specified your project to work with this pipeline, you also inherit all the power of looper for your project. You can submit these jobs to a cluster with a simple change to your configuration file. Follow the instructions in [configuring looper to use a cluster](http://looper.readthedocs.io/en/latest/cluster-computing.html).

@@ -29,4 +29,4 @@ ATACseq.py:
file_size: "6"
cores: "8"
mem: "32000"
time: "3-00:00:00"

@@ -1,2 +1,2 @@
ATAC: ATACseq.py
ATAC-SEQ: ATACseq.py