Merge pull request #21 from databio/dev

Release 0.4

nsheff authored Jul 21, 2017
2 parents 7a41ca3 + a426e17 commit 43cdb39
Showing 17 changed files with 1,463 additions and 514 deletions.
10 changes: 9 additions & 1 deletion .gitignore
@@ -1,2 +1,10 @@
# General
*.pyc
.~lock*

# JetBrains
.idea/

# Tests
.cache/

19 changes: 17 additions & 2 deletions CHANGELOG.md
@@ -1,11 +1,26 @@
# Change log
All notable changes to this project will be documented in this file.

## [0.4.0] -- 2017-07-21

### Added
- Added [fseq](https://github.com/aboyle/F-seq) as a peak caller option
- Peak caller is specified by a command line argument (defaults to macs2)
- Count of called peaks is now reported as a pipeline result.
- Added R and ggplot2 as requirements

### Changed
- Changed TSS plotting to use R instead of python
- TSS plot failures no longer fail the pipeline.
- Changed `Read_type` to `read_type` to prevent duplicate columns
- Read trimmer is now specified in option + argument style rather than as a flag.

## [0.3.0] -- 2017-06-22

### Added
- Added exact cuts calculation
- Added command-line version display
- Added skewer as a trimmer option
- Uses looper 'implied columns' (from looper v0.6) to derive multiple variables from organism value

### Changed
@@ -32,4 +47,4 @@ All notable changes to this project will be documented in this file.

## [0.1.0]
### Added
- First release of ATAC-seq pypiper pipeline
47 changes: 39 additions & 8 deletions README.md
@@ -16,26 +16,38 @@ These features are explained in more detail later in this README.

## Installing

### Prerequisites

**Python packages**. This pipeline uses [pypiper](https://github.com/epigen/pypiper) to run a single sample, [looper](https://github.com/epigen/looper) to handle multi-sample projects (for either local or cluster computation), and [pararead](https://github.com/databio/pararead) for parallel processing of sequence reads. You can do a user-specific install of these like this:

```
pip install --user https://github.com/epigen/pypiper/zipball/master
pip install --user https://github.com/epigen/looper/zipball/master
pip install --user https://github.com/databio/pararead/zipball/master
```
**R packages**. This pipeline uses R to generate QC metric plots. These are **optional** and if you don't install these R packages (or R in general), the pipeline will still work, but you will not get the QC plot outputs.

The following packages are used by the QC scripts:
- ggplot2
- gplots (v3.0.1)
- reshape2 (v1.4.2)

You can install these packages like this:
```
R # start R
install.packages(c("ggplot2", "gplots", "reshape2"))
```
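If you prefer not to start an interactive R session, the same install can be done non-interactively from the shell. This sketch builds the equivalent one-liner (the CRAN mirror URL is an illustrative choice, and `Rscript` on your `PATH` is assumed); it prints the command rather than running it, so remove the `echo` to actually install:

```shell
# Non-interactive equivalent of the interactive install above.
# The repos URL is an assumption; any CRAN mirror works.
r_install_cmd='Rscript -e '\''install.packages(c("ggplot2", "gplots", "reshape2"), repos = "https://cloud.r-project.org")'\'''
# Printed rather than executed so this sketch works even without R installed.
echo "$r_install_cmd"
```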

**Required executables**. You will need some common bioinformatics tools installed. The list is specified in the `tools` section of the pipeline configuration file ([pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)).

**Genome resources**. This pipeline requires genome assemblies produced by [refgenie](https://github.com/databio/refgenie). You may [download pre-indexed references](http://cloud.databio.org/refgenomes) or you may index your own (see [refgenie](https://github.com/databio/refgenie) instructions). Any prealignments you want to do will also require refgenie assemblies. Some common examples are provided by [ref_decoy](https://github.com/databio/ref_decoy).

### Configuring the pipeline

**Clone the pipeline**. Clone this repository using one of these methods:
- using SSH: `git clone [email protected]:databio/ATACseq.git`
- using HTTPS: `git clone https://github.com/databio/ATACseq.git`


There are two configuration options: you can either set up environment variables to fit the default configuration, or change the configuration file to fit your environment. For the Chang lab, you may use the pre-made config file and project template described on the [Chang lab configuration](examples/chang_project) page. For others, choose one:

**Option 1: Default configuration** (recommended; [pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)).
@@ -56,14 +68,15 @@ There are two configuration options: You can either set up environment variables

**Option 2: Custom configuration**. Instead, you can also put absolute paths to each tool or resource in the configuration file to fit your local setup. Just change the pipeline configuration file ([pipelines/ATACseq.yaml](pipelines/ATACseq.yaml)) appropriately.
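For example, the tool entries might be pointed at absolute paths like this (a hypothetical excerpt; the actual keys and layout are whatever [pipelines/ATACseq.yaml](pipelines/ATACseq.yaml) already contains):

```
tools:
  samtools: /usr/local/bin/samtools
  bowtie2: /opt/bowtie2/bowtie2
  macs2: /home/user/.local/bin/macs2
```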

## Running the pipeline

You have two options for running the pipeline.

### Option 1: Running the pipeline script directly

For usage, see [usage.txt](usage.txt), which you can get on the command line by running `pipelines/ATACseq.py --help`. You just need to pass a few command-line parameters to specify sample name, reference genome, input files, etc. See the example command in [cmd.sh](cmd.sh), which uses test data.
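As an illustration, a single-sample invocation might look like the following. This is a hedged sketch: the flag names are assumed from pypiper's standard command-line arguments and the paths are placeholders, so confirm both against `pipelines/ATACseq.py --help` and [cmd.sh](cmd.sh) before use:

```shell
# Assemble an illustrative single-sample command; flags and paths are
# placeholders, not taken verbatim from the pipeline's argument parser.
pipeline_cmd="python pipelines/ATACseq.py \
  --sample-name test1 \
  --genome hg38 \
  --input /path/to/test1_R1.fastq.gz \
  --single-or-paired single \
  --output-parent output"
# Echoed rather than executed so the sketch stands alone.
echo "$pipeline_cmd"
```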

To run on multiple samples, you can just write a loop to process each sample independently with the pipeline, or you can use *option 2*...
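Such a loop could be sketched like this (sample names, flags, and paths are illustrative assumptions, not the pipeline's actual interface):

```shell
# Loop over a few hypothetical samples, building one pipeline command each.
# Commands are echoed, not executed; drop the `echo` to actually run them.
count=0
for sample in sample1 sample2 sample3; do
  echo "python pipelines/ATACseq.py --sample-name ${sample}" \
       "--genome hg38 --input /path/to/${sample}.fastq.gz"
  count=$((count + 1))
done
```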

### Option 2: Running the pipeline through looper

@@ -129,6 +142,24 @@ grep "level 1" ${GENOME}.gtf | grep "gene" | awk '{if($7=="+"){print $1"\t"$4"\
```

### Optional summary plots

1. Run `looper summarize` to generate a summary table in tab-separated values (TSV) format

```
looper summarize examples/test_project/test_config.yaml
```

2. Run `ATAC_Looper_Summary_plot.R` to produce summary plots.

You must pass the full path to the TSV file produced by `looper summarize`.
```
Rscript ATAC_Looper_Summary_plot.R </path/to/looper/summarize/summary.TSV>
```

This outputs multiple PDF plots into the directory containing the input TSV file.


## Using a cluster

Once you've specified your project to work with this pipeline, you will also inherit all the power of looper for your project. You can submit these jobs to a cluster with a simple change to your configuration file. Follow instructions in [configuring looper to use a cluster](http://looper.readthedocs.io/en/latest/cluster-computing.html).
2 changes: 1 addition & 1 deletion config/pipeline_interface.yaml
@@ -29,4 +29,4 @@ ATACseq.py:
file_size: "6"
cores: "8"
mem: "32000"
time: "3-00:00:00"
2 changes: 1 addition & 1 deletion config/protocol_mappings.yaml
@@ -1,2 +1,2 @@
ATAC: ATACseq.py
ATAC-SEQ: ATACseq.py
2 changes: 1 addition & 1 deletion pipeline_interface.yaml
@@ -30,7 +30,7 @@ pipelines:
file_size: "0.001"
cores: "1"
mem: "4000"
time: "00:40:00"
pico:
file_size: "0.05"
cores: "1"
