Merge pull request #33 from databio/dev
release v0.6
nsheff authored Dec 14, 2017
2 parents 8bec5f0 + 57ae0b6 commit f678667
Showing 11 changed files with 439 additions and 281 deletions.
29 changes: 24 additions & 5 deletions CHANGELOG.md
@@ -1,22 +1,41 @@
# Change log
All notable changes to this project will be documented in this file.

## [0.7.0] -- Unreleased



## [0.6.0] -- 2017-12-13

### Added
- Single-end reads are now allowed
- Pipeline can now accommodate `.bam` input files
- Added a single-base bed file output in addition to the smoothed version

### Changed
- Fixed a bug with peak counting for fseq
- Fixed a bug with reporting estimated library sizes
- Fixed issues with TSS enrichment calculation that could lead to stalled jobs or excess CPU use
- Reduced verbosity of error messages for some tools
- Reduced amount of resources requested by default
- Introduced requirement on pypiper v0.7

## [0.5.0] -- 2017-09-13

### Added
- Adds rudimentary figure reporting
- Added rudimentary figure reporting

### Changed
- Changed default trimmer from trimmomatic to skewer
- Make output from several tasks less verbose to make logs cleaner
- Fixes an issue that left behind temporary samtools files if the job was killed
- Made output from several tasks less verbose to make logs cleaner
- Fixed an issue that left behind temporary samtools files if the job was killed

## [0.4.0] -- 2017-07-21

### Added
- Added [fseq](https://github.com/aboyle/F-seq) as a peak caller option
- Peak caller is specified by a command line argument (defaults to macs2)
- Count of called peaks is now reported as a pipeline result.
- Peak caller is now specified by a command line argument (defaults to macs2)
- Count of called peaks is now reported as a pipeline result
- Add R and ggplot2 as requirements

### Changed
7 changes: 6 additions & 1 deletion README.md
@@ -76,7 +76,7 @@ You have two options for running the pipeline.

### Run option 1: Running the pipeline script directly

To see the command-line options for usage, see [usage.txt](usage.txt), which you can get on the command line by running `pipelines/ATACseq.py --help`. You just need to pass a few command-line parameters to specify sample_name, reference genome, input files, etc. See example command in [cmd.sh](cmd.sh) using test data.
To see the command-line options for usage, see [usage.txt](usage.txt), which you can get on the command line by running `pipelines/ATACseq.py --help`. You just need to pass a few command-line parameters to specify sample_name, reference genome, input files, etc. See [example commands](example_cmd.txt) that use test data.

To run on multiple samples, you can just write a loop to process each sample independently with the pipeline, or you can use *option 2*...
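
A minimal sketch of such a loop, written in Python. The flag names are taken from the arguments listed in `pipeline_interface.yaml`; the sample names, file paths, and the `single`/`paired` values passed to `--single-or-paired` are assumptions made for illustration. The code only builds and prints the commands, so nothing runs until each one is handed to `subprocess.run`.

```python
import shlex

# Hypothetical sample table: (sample_name, genome, read1, read2 or None).
SAMPLES = [
    ("sample_a", "hg38", "data/sample_a_R1.fastq.gz", "data/sample_a_R2.fastq.gz"),
    ("sample_b", "mm10", "data/sample_b_R1.fastq.gz", None),  # single-end
]


def build_command(sample_name, genome, read1, read2=None):
    """Return the argument list for one independent pipeline run."""
    cmd = [
        "pipelines/ATACseq.py",
        "--sample-name", sample_name,
        "--genome", genome,
        "--input", read1,
        # Assumed values; check `pipelines/ATACseq.py --help` for the real choices.
        "--single-or-paired", "paired" if read2 else "single",
    ]
    if read2:
        cmd += ["--input2", read2]
    return cmd


commands = [build_command(*sample) for sample in SAMPLES]
for cmd in commands:
    # Print instead of executing; swap in subprocess.run(cmd, check=True) to run.
    print(shlex.join(cmd))
```

Because each sample gets its own command, a failure in one run does not affect the others.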

@@ -139,6 +139,11 @@ URL="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz"
wget -O ${GENOME}_TSS_full.txt.gz ${URL}
zcat ${GENOME}_TSS_full.txt.gz | awk '{if($4=="+"){print $3"\t"$5"\t"$5"\t"$4"\t"$13}else{print $3"\t"$6"\t"$6"\t"$4"\t"$13}}' | LC_COLLATE=C sort -k1,1 -k2,2n -u > ${GENOME}_TSS.tsv
echo ${GENOME}_TSS.tsv
Mouse:
GENOME="mm10"
URL="http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz"
```
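
For readers less familiar with `awk`, the TSS extraction above can be sketched in Python. This is an illustration, not part of the pipeline: it assumes tab-delimited refGene rows with the chromosome, strand, transcript start, transcript end, and gene name in columns 3, 4, 5, 6, and 13, as the `awk` command uses them. The example rows are made up, and the de-duplication only approximates `sort -u` by keeping the first row seen for each chromosome and position.

```python
def tss_rows(refgene_lines):
    """Return sorted, de-duplicated (chrom, tss, tss, strand, gene) rows."""
    seen = {}
    for line in refgene_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, strand = fields[2], fields[3]
        tx_start, tx_end, gene = fields[4], fields[5], fields[12]
        # The TSS is the transcript start on "+" and the transcript end otherwise.
        tss = tx_start if strand == "+" else tx_end
        # Keep the first row per (chrom, position), as an approximation of `sort -u`.
        seen.setdefault((chrom, int(tss)), (chrom, tss, tss, strand, gene))
    # Sort by chromosome name, then numerically by position.
    return [seen[key] for key in sorted(seen)]


# Made-up rows in refGene column order, for illustration only.
example = [
    "\t".join(["0", "NM_A", "chr1", "+", "100", "500", "0", "0", "1", "100,", "500,", "0", "GENE_A"]),
    "\t".join(["0", "NM_B", "chr1", "-", "50", "90", "0", "0", "1", "50,", "90,", "0", "GENE_B"]),
    "\t".join(["0", "NM_A2", "chr1", "+", "100", "400", "0", "0", "1", "100,", "400,", "0", "GENE_A"]),
]
rows = tss_rows(example)
for row in rows:
    print("\t".join(row))
```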

Another option from Gencode GTF:
4 changes: 2 additions & 2 deletions examples/gold_atac/README.md
@@ -13,10 +13,10 @@ python get_geo.py -i ~/code/ATACseq/examples/gold_atac/metadata/gold_atac_gse.cs

I used resulting file [metadata/annocomb_gold_atac_gse.csv](metadata/annocomb_gold_atac_gse.csv) to create the looper metadata sheet, [metadata/gold_atac_annotation.csv](metadata/gold_atac_annotation.csv).

I create project config file and sampled test data. The SRA fastq files should be stored in a folder `${SRAFQ}`, and then this will run with looper with no additional changes.
I create project config file and sampled test data. The SRA fastq files should be stored in a folder pointed to by environment variable `SRAFQ`, and then this will run with `looper` with no additional changes.

## Run pipeline

```
looper run ${CODE}ATACseq/examples/gold_atac/metadata/project_config.yaml -d
```
```
2 changes: 1 addition & 1 deletion examples/test_project/test_config.yaml
@@ -30,4 +30,4 @@ implied_columns:
mouse:
genome: mm10
macs_genome_size: "mm"
prealignments: null
prealignments: "mouse_chrM2x"
14 changes: 7 additions & 7 deletions pipeline_interface.yaml
@@ -1,22 +1,22 @@
protocol_mapping:
ATAC: ATACseq.py
ATAC-SEQ: ATACseq
ATAC-SEQ: ATACseq.py

pipelines:
ATACseq.py:
name: ATACseq
path: pipelines/ATACseq.py
looper_args: True
required_input_files: [read1, read2]
required_input_files: [read1]
all_input_files: [read1, read2]
ngs_input_files: [read1, read2]
arguments:
"--sample-name": sample_name
"--genome": genome
"--input": read1
"--input2": read2
"--single-or-paired": read_type
optional_arguments:
"--input2": read2
"--frip-ref-peaks": FRIP_ref
"--prealignments": prealignments
"--genome-size": macs_genome_size
@@ -40,14 +40,14 @@ pipelines:
file_size: "0.5"
cores: "4"
mem: "16000"
time: "02-00:00:00"
time: "00-04:00:00"
micro:
file_size: "1"
cores: "8"
mem: "32000"
time: "07-00:00:00"
time: "02-00:00:00"
milli:
file_size: "4"
file_size: "10"
cores: "16"
mem: "64000"
time: "07-00:00:00"
time: "03-00:00:00"
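
A rough sketch of how resource packages like these might be selected. It assumes, without confirmation from this diff, that `file_size` is a lower bound in GB and that the package with the largest threshold not exceeding the input size is chosen. The values are the post-change ones as read from this hunk. The name `small_pkg` is hypothetical, because the name of the first package falls outside the hunk.

```python
# Post-change values as read from the hunk above; "small_pkg" is a placeholder name.
PACKAGES = {
    "small_pkg": {"file_size": "0.5", "cores": "4", "mem": "16000", "time": "00-04:00:00"},
    "micro": {"file_size": "1", "cores": "8", "mem": "32000", "time": "02-00:00:00"},
    "milli": {"file_size": "10", "cores": "16", "mem": "64000", "time": "03-00:00:00"},
}


def choose_package(input_size_gb, packages=PACKAGES):
    """Pick the package with the largest file_size threshold <= input size.

    Returns None when the input is smaller than every threshold. This mirrors
    an assumed selection rule, not a confirmed looper implementation.
    """
    eligible = [
        (float(spec["file_size"]), name)
        for name, spec in packages.items()
        if float(spec["file_size"]) <= input_size_gb
    ]
    return max(eligible)[1] if eligible else None


for size in (0.7, 5, 12):
    name = choose_package(size)
    print(size, name, PACKAGES[name]["time"])
```

Under this assumed rule, raising the `milli` threshold from 4 to 10 means inputs between 4 and 10 GB now fall into `micro` and request fewer resources.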