Merge pull request #33 from databio/dev

release v0.6
databio · Dec 14, 2017 · f678667
2 parents 8bec5f0 + 57ae0b6

Showing 11 changed files with 439 additions and 281 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,22 +1,41 @@
 # Change log
 All notable changes to this project will be documented in this file.
 
+## [0.7.0] -- Unreleased
+
+
+
+## [0.6.0] -- 2017-12-13
+
+### Added
+- Single-end reads are now allowed
+- Pipeline can now accommodate `.bam` input files
+- Added a single-base bed file output in addition to the smoothed version
+
+### Changed
+- Fixed a bug with peak counting for fseq
+- Fixed a bug with reporting estimated library sizes
+- Fixed issues with TSS enrichment calculation that could lead to stalled jobs or excess CPU use
+- Reduced verbosity of error messages for some tools
+- Reduced amount of resources requested by default
+- Introduced a requirement for pypiper v0.7
+
 ## [0.5.0] -- 2017-09-13
 
 ### Added
-- Adds rudimentary figure reporting
+- Added rudimentary figure reporting
 
 ### Changed
 - Changed default trimmer from trimmomatic to skewer
-- Make output from several tasks less verbose to make logs cleaner
-- Fixes an issue that left behind temporary samtools files if the job was killed
+- Made output from several tasks less verbose to make logs cleaner
+- Fixed an issue that left behind temporary samtools files if the job was killed
 
 ## [0.4.0] -- 2017-07-21
 
 ### Added
 - Added [fseq](https://github.com/aboyle/F-seq) as a peak caller option
-- Peak caller is specified by a command line argument (defaults to macs2)
-- Count of called peaks is now reported as a pipeline result.
+- Peak caller is now specified by a command line argument (defaults to macs2)
+- Count of called peaks is now reported as a pipeline result
 - Add R and ggplot2 as requirements
 
 ### Changed

diff --git a/README.md b/README.md
@@ -76,7 +76,7 @@ You have two options for running the pipeline.
 
 ### Run option 1: Running the pipeline script directly
 
-To see the command-line options for usage, see [usage.txt](usage.txt), which you can get on the command line by running `pipelines/ATACseq.py --help`. You just need to pass a few command-line parameters to specify sample_name, reference genome, input files, etc. See example command in [cmd.sh](cmd.sh) using test data.
+For the command-line options, see [usage.txt](usage.txt), which you can also print on the command line by running `pipelines/ATACseq.py --help`. You only need to pass a few command-line parameters to specify sample_name, reference genome, input files, etc. See the [example commands](example_cmd.txt), which use test data.
 
 To run on multiple samples, you can just write a loop to process each sample independently with the pipeline, or you can use *option 2*...
 
@@ -139,6 +139,11 @@ URL="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz"
 wget -O ${GENOME}_TSS_full.txt.gz ${URL}
 zcat ${GENOME}_TSS_full.txt.gz | awk  '{if($4=="+"){print $3"\t"$5"\t"$5"\t"$4"\t"$13}else{print $3"\t"$6"\t"$6"\t"$4"\t"$13}}'  | LC_COLLATE=C sort -k1,1 -k2,2n -u > ${GENOME}_TSS.tsv
 echo ${GENOME}_TSS.tsv
+
+# Mouse: rerun the wget and zcat commands above with these values
+GENOME="mm10"
+URL="http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz"
+
 ```
 
 Another option from Gencode GTF:
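The `awk` one-liner above takes, for each refGene transcript, `txStart` on the `+` strand and `txEnd` otherwise, then sorts by chromosome and position and drops repeated positions. A minimal Python sketch of the same logic, assuming refGene's tab-separated layout with chrom, strand, txStart, txEnd, and name2 in columns 3, 4, 5, 6, and 13 (as the awk field numbers imply). It splits on tabs rather than awk's default whitespace splitting, and `tss_records`/`write_tss_file` are hypothetical helper names:

```python
import gzip
from typing import Dict, Iterable, List, Tuple

TssRecord = Tuple[str, int, int, str, str]


def tss_records(lines: Iterable[str]) -> List[TssRecord]:
    """Return (chrom, tss, tss, strand, gene) rows, sorted and deduplicated."""
    first_seen: Dict[Tuple[str, int], TssRecord] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # awk $3, $4, $5, $6, $13 -> zero-based indices 2, 3, 4, 5, 12
        chrom, strand = fields[2], fields[3]
        tx_start, tx_end, gene = int(fields[4]), int(fields[5]), fields[12]
        tss = tx_start if strand == "+" else tx_end
        # Keep the first row per (chrom, position), roughly like sort -u
        # restricted to the two sort keys.
        first_seen.setdefault((chrom, tss), (chrom, tss, tss, strand, gene))
    # Tuple keys sort by chromosome (byte order for ASCII, like LC_COLLATE=C)
    # and then numerically by position, like sort -k1,1 -k2,2n.
    return [first_seen[key] for key in sorted(first_seen)]


def write_tss_file(ref_gene_gz: str, out_tsv: str) -> None:
    """Read a gzipped refGene.txt and write the TSS table as TSV."""
    with gzip.open(ref_gene_gz, "rt") as src, open(out_tsv, "w") as dst:
        for row in tss_records(src):
            dst.write("\t".join(map(str, row)) + "\n")
```

Following the README's file naming, `write_tss_file("mm10_TSS_full.txt.gz", "mm10_TSS.tsv")` would produce the mouse table.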

diff --git a/examples/gold_atac/README.md b/examples/gold_atac/README.md
@@ -13,10 +13,10 @@ python get_geo.py -i ~/code/ATACseq/examples/gold_atac/metadata/gold_atac_gse.cs
 
 I used resulting file [metadata/annocomb_gold_atac_gse.csv](metadata/annocomb_gold_atac_gse.csv) to create the looper metadata sheet, [metadata/gold_atac_annotation.csv](metadata/gold_atac_annotation.csv).
 
-I create project config file and sampled test data. The SRA fastq files should be stored in a folder `${SRAFQ}`, and then this will run with looper with no additional changes.
+I created a project config file and sampled the test data. The SRA fastq files should be stored in a folder pointed to by the environment variable `SRAFQ`; `looper` will then run with no additional changes.
 
 ## Run pipeline
 
 ```
 looper run ${CODE}ATACseq/examples/gold_atac/metadata/project_config.yaml -d
-```
+```
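If the project config refers to the fastq folder as `${SRAFQ}`, those paths only resolve once the variable is exported. A minimal sketch of that expansion using Python's `os.path.expandvars`; the helper name `resolve_data_path` and the file name `example_1.fastq.gz` are hypothetical, and looper's own path handling may differ:

```python
import os


def resolve_data_path(template: str) -> str:
    """Expand environment variables such as ${SRAFQ} in a path template."""
    expanded = os.path.expandvars(template)
    # expandvars leaves unset variables untouched, so a leftover "$" means
    # a referenced variable was never exported.
    if "$" in expanded:
        raise ValueError(f"unresolved environment variable in {template!r}")
    return expanded


# Hypothetical usage, after e.g. `export SRAFQ=/path/to/sra_fastq` in the shell:
# resolve_data_path("${SRAFQ}/example_1.fastq.gz")
```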
diff --git a/examples/test_project/test_config.yaml b/examples/test_project/test_config.yaml
@@ -30,4 +30,4 @@ implied_columns:
     mouse:
       genome: mm10
       macs_genome_size: "mm"
-      prealignments: null
+      prealignments: "mouse_chrM2x"
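The `implied_columns` section derives sample attributes from the value of another column; this change gives mouse samples the `mouse_chrM2x` prealignment instead of `null`. A simplified model of that expansion, not looper's implementation. It assumes the triggering column is named `organism` (the key sits above this hunk and is not shown) and that implied values overwrite attributes already present; the real precedence may differ:

```python
from typing import Any, Dict, Mapping

# column name -> column value -> attributes implied by that value
ImpliedColumns = Mapping[str, Mapping[str, Mapping[str, Any]]]


def apply_implied_columns(sample: Mapping[str, Any], implied: ImpliedColumns) -> Dict[str, Any]:
    """Return a copy of `sample` with attributes implied by its column values."""
    result = dict(sample)
    for column, by_value in implied.items():
        value = sample.get(column)
        if value in by_value:
            # Assumption: implied values overwrite anything already present.
            result.update(by_value[value])
    return result


# Mirrors the mouse entry above; "organism" is a hypothetical column name.
IMPLIED = {"organism": {"mouse": {"genome": "mm10", "macs_genome_size": "mm",
                                  "prealignments": "mouse_chrM2x"}}}
```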
diff --git a/pipeline_interface.yaml b/pipeline_interface.yaml
@@ -1,22 +1,22 @@
 protocol_mapping:
   ATAC: ATACseq.py
-  ATAC-SEQ: ATACseq
+  ATAC-SEQ: ATACseq.py
 
 pipelines:
   ATACseq.py:
     name: ATACseq
     path: pipelines/ATACseq.py
     looper_args: True
-    required_input_files: [read1, read2]
+    required_input_files: [read1]
     all_input_files: [read1, read2]
     ngs_input_files: [read1, read2]
     arguments:
       "--sample-name": sample_name
       "--genome": genome
       "--input": read1
-      "--input2": read2
       "--single-or-paired": read_type
     optional_arguments:
+      "--input2": read2
       "--frip-ref-peaks": FRIP_ref
       "--prealignments": prealignments
       "--genome-size": macs_genome_size
@@ -40,14 +40,14 @@ pipelines:
         file_size: "0.5"
         cores: "4"
         mem: "16000"
-        time: "02-00:00:00"
+        time: "00-04:00:00"
       micro:
         file_size: "1"
         cores: "8"
         mem: "32000"
-        time: "07-00:00:00"
+        time: "02-00:00:00"
       milli:
-        file_size: "4"
+        file_size: "10"
         cores: "16"
         mem: "64000"
-        time: "07-00:00:00"
+        time: "03-00:00:00"
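The interface change moves `--input2` into `optional_arguments` and drops `read2` from `required_input_files`, which is what lets single-end samples run; it also points `ATAC-SEQ` at the `ATACseq.py` pipeline key instead of the nonexistent `ATACseq`. A simplified model of how such an interface could become a command line and a resource choice, not looper's actual code. Assumptions: a missing required attribute is an error while a missing optional one is skipped, and `file_size` is a minimum input size in GB, with the largest threshold not exceeding the input selected. The package name `small` is hypothetical, since the real name of the 0.5 GB package lies outside the hunk:

```python
from typing import List, Mapping, Optional

# Hypothetical in-memory mirror of the pipeline_interface.yaml shown above.
PROTOCOL_MAPPING = {"ATAC": "ATACseq.py", "ATAC-SEQ": "ATACseq.py"}
PIPELINES = {
    "ATACseq.py": {
        "path": "pipelines/ATACseq.py",
        "arguments": {"--sample-name": "sample_name", "--genome": "genome",
                      "--input": "read1", "--single-or-paired": "read_type"},
        "optional_arguments": {"--input2": "read2", "--frip-ref-peaks": "FRIP_ref",
                               "--prealignments": "prealignments",
                               "--genome-size": "macs_genome_size"},
    }
}


def build_command(protocol: str, sample: Mapping[str, Optional[str]]) -> List[str]:
    """Turn one sample into a pipeline command line (simplified model)."""
    # The mapping value must name a key under `pipelines`; the old value
    # "ATACseq" would raise KeyError at this lookup.
    pipeline = PIPELINES[PROTOCOL_MAPPING[protocol]]
    cmd = [pipeline["path"]]
    for flag, attr in pipeline["arguments"].items():
        value = sample.get(attr)
        if value is None:
            raise ValueError(f"missing required attribute {attr!r} for {flag}")
        cmd += [flag, str(value)]
    for flag, attr in pipeline["optional_arguments"].items():
        value = sample.get(attr)
        if value is not None:  # skip absent optional inputs such as read2
            cmd += [flag, str(value)]
    return cmd


def choose_package(input_gb: float, packages: Mapping[str, Mapping[str, str]]) -> str:
    """Pick the package with the largest file_size threshold <= input size.

    Falls back to the smallest package if the input is below every threshold.
    """
    by_size = sorted(packages, key=lambda name: float(packages[name]["file_size"]))
    chosen = by_size[0]
    for name in by_size:
        if float(packages[name]["file_size"]) <= input_gb:
            chosen = name
    return chosen
```

Reading the `time` values as `days-hours:minutes:seconds`, the commit shortens the 0.5 GB package from 2 days to 4 hours, `micro` from 7 days to 2 days, and `milli` from 7 days to 3 days, while raising the `milli` threshold from 4 to 10.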