Merge pull request #75 from databio/dev
Change default deduplication tool and improve messaging
jpsmith5 authored Jan 3, 2019
2 parents 33ac444 + d9d0ac7 commit 13fa79c
Showing 188 changed files with 44,002 additions and 266 deletions.
13 changes: 10 additions & 3 deletions .gitignore
@@ -2,9 +2,16 @@
*.pyc
.~lock*

# JetBrains
.idea/

# Tests
.cache/

# Jekyll files
_site
.DS_store
.jekyll
.bundle
.sass-cache
_site/
/_site/
.sass-cache/
.jekyll-metadata
55 changes: 55 additions & 0 deletions BiocProject/PEPATAC_BiocProject.Rmd
@@ -0,0 +1,55 @@
---
title: "PEPATAC BiocProject"
author: "Michal Stolarczyk"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Vignette Title}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---


```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Before you start, see the [Getting started with `BiocProject` vignette](http://code.databio.org/pepr/articles/gettingStarted.html) for basic `BiocProject` information and installation instructions, and the [`PEPATAC` website](http://code.databio.org/PEPATAC/) for information about this ATAC-seq pipeline.

# Read the results of `PEPATAC`

The function shown below reads in the [`BED` files](https://genome.ucsc.edu/FAQ/FAQformat.html) from the `output_dir` specified in the [PEP](https://pepkit.github.io/docs/simple_example/) (more precisely, in its YAML config file).

```{r include=FALSE, eval=TRUE}
processFunction = "readPepatacPeakBeds.R"
source(processFunction)
```
```{r echo=FALSE, comment=""}
readPepatacPeakBeds
```

Get the project config
```{r echo=T,message=FALSE}
library(BiocProject)
ProjectConfig = "gold_hg19.yaml"
```
## Create the `BiocProject` object

```{r}
bp = BiocProject(file=ProjectConfig)
```

## Get the read data

```{r}
data = getData(bp)
```
The data are packed into a nested list; to access specific elements, run, e.g.:
```{r}
data[[1]]$gold1
```
6 changes: 6 additions & 0 deletions BiocProject/gold_atac_annotation.csv
@@ -0,0 +1,6 @@
sample_name,sample_description,treatment_description,organism,protocol,data_source,SRR,SRX,Sample_geo_accession,Sample_series_id,read_type,Sample_instrument_model,read1,read2
gold1,ATAC-seq from dendritic cell (ENCLB065VMV),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210416,SRX2523872,GSM2471255,GSE94182,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2
gold2,ATAC-seq from dendritic cell (ENCLB811FLK),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210450,SRX2523906,GSM2471300,GSE94222,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2
gold3,ATAC-seq from dendritic cell (ENCLB887PKE),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210398,SRX2523862,GSM2471249,GSE94177,PAIRED,Illumina NextSeq 500,SRA_1,SRA_2
gold4,ATAC-seq from dendritic cell (ENCLB586KIS),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210428,SRX2523884,GSM2471269,GSE94196,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2
gold5,ATAC-seq from dendritic cell (ENCLB384NOX),Homo sapiens dendritic in vitro differentiated cells treated with 0 ng/mL Lipopolysaccharide for 0 hours,human,ATAC-seq,SRA,SRR5210390,SRX2523854,GSM2471245,GSE94173,PAIRED,Illumina HiSeq 2000,SRA_1,SRA_2
23 changes: 23 additions & 0 deletions BiocProject/gold_hg19.yaml
@@ -0,0 +1,23 @@
# Run gold standard samples through ATACseq pipeline.
name: gold_hg19

metadata:
  sample_annotation: "$PROCESSED/gold/pepatac/hg19/gold_atac_annotation.csv"
  output_dir: "$PROCESSED/gold/pepatac/hg19/10_08_18_wo"
  pipeline_interfaces: "$CODE/pepatac/pipeline_interface.yaml"

derived_columns: [read1, read2]

data_sources:
  SRA_1: "${SRAFQ}{SRR}_1.fastq.gz"
  SRA_2: "${SRAFQ}{SRR}_2.fastq.gz"

implied_columns:
  organism:
    human:
      genome: hg19
      macs_genome_size: hs

bioconductor:
  read_fun_name: readPepatacPeakBeds
  read_fun_path: readPepatacPeakBeds.R
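
How the `derived_columns`, `data_sources`, and `implied_columns` sections of this config interact may be easier to see in code. The following is a minimal Python sketch, not peppy's actual implementation. The helper names `derive` and `imply` and the `/data/sra/` value for `SRAFQ` are hypothetical. It assumes that the value of a derived column (e.g. `SRA_1`) is a key into `data_sources`, and that the matching template is filled first from environment variables and then from the sample's own attributes.

```python
import os

# Fragments of gold_hg19.yaml, written out as plain Python dicts.
DATA_SOURCES = {
    "SRA_1": "${SRAFQ}{SRR}_1.fastq.gz",
    "SRA_2": "${SRAFQ}{SRR}_2.fastq.gz",
}
DERIVED_COLUMNS = ["read1", "read2"]
IMPLIED_COLUMNS = {
    "organism": {"human": {"genome": "hg19", "macs_genome_size": "hs"}},
}


def derive(sample, data_sources=DATA_SOURCES, derived=DERIVED_COLUMNS):
    """Replace each derived column's key (e.g. SRA_1) with a concrete path."""
    out = dict(sample)
    for column in derived:
        template = data_sources[sample[column]]
        # Expand $VAR / ${VAR} first, then fill {attribute} placeholders.
        # An unset variable is left in place and makes format() raise KeyError.
        out[column] = os.path.expandvars(template).format(**sample)
    return out


def imply(sample, implied=IMPLIED_COLUMNS):
    """Add the attributes implied by the value of another column."""
    out = dict(sample)
    for column, by_value in implied.items():
        out.update(by_value.get(sample.get(column), {}))
    return out


if __name__ == "__main__":
    os.environ["SRAFQ"] = "/data/sra/"  # hypothetical location of the FASTQ files
    gold1 = {"sample_name": "gold1", "organism": "human", "SRR": "SRR5210416",
             "read1": "SRA_1", "read2": "SRA_2"}
    print(imply(derive(gold1)))
```

Under these assumptions, the `gold1` row of the annotation sheet ends up with `read1` and `read2` pointing at per-run FASTQ files, plus the `genome` and `macs_genome_size` attributes implied by `organism: human`.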
48 changes: 48 additions & 0 deletions BiocProject/readPepatacPeakBeds.R
@@ -0,0 +1,48 @@
readPepatacPeakBeds = function(project) {
  # define default column names in GenomicRanges::GRanges objects
  DEFAULT_GRANGES_COLS = c('chr', 'start', 'end')
  # infer the suffix, which is "peak_calling_" + genome_assembly,
  # see: pepatac.py
  if (length(unique(samples(project)$genome)) != 1)
    stop(paste0("Need one genome assembly, got ",
                length(unique(samples(project)$genome)),
                ".\nCouldn't infer the path to the files."))
  genome_assembly = unique(samples(project)$genome)
  suffix = paste0("peak_calling_", genome_assembly)
  # infer the prefix, which is "results_pipeline"
  # if not provided in the PEP config, see: python peppy package
  prefix = ifelse(is.null(config(project)$metadata$results_subdir),
                  "results_pipeline", config(project)$metadata$results_subdir)
  # get output directory from PEP
  outputDir = config(project)$metadata$output_dir
  # get sample names from PEP
  samples_names = samples(project)$sample_name
  # read the data for each sample
  result = lapply(samples_names, function(sample) {
    # use the provided arguments to construct the path
    dir = file.path(outputDir, prefix, sample, suffix)
    # find BED files in the path (the pattern is a regular expression, not a glob)
    bedFiles = list.files(path=dir, pattern="\\.bed$")
    # get absolute paths to the BED files
    bedFilesAbs = file.path(dir, bedFiles)
    gr = list()
    # for each BED file of this sample
    message("reading ", length(bedFiles), " files for sample: ", sample)
    for (i in seq_along(bedFilesAbs)) {
      # read BED file
      df = read.table(bedFilesAbs[i])
      # since the number of columns varies, name the first 3 as default and
      # the rest metadataX (there may be no extra columns at all)
      nExtra = NCOL(df) - length(DEFAULT_GRANGES_COLS)
      colNames = c(
        DEFAULT_GRANGES_COLS,
        if (nExtra > 0) paste0("metadata", seq_len(nExtra)))
      colnames(df) = colNames
      # convert the data.frame to GenomicRanges::GRanges object
      gr[[i]] = GenomicRanges::GRanges(df)
    }
    names(gr) = bedFiles
    return(gr)
  })
  names(result) = samples_names
  return(result)
}
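
The lookup logic in `readPepatacPeakBeds` can be restated compactly. The sketch below is a Python paraphrase for illustration only and is not part of this commit: `peak_dir` and `column_names` are hypothetical names, and the example paths are made up. It assumes the layout `<output_dir>/<results_subdir>/<sample>/peak_calling_<genome>`, with `results_pipeline` as the fallback subdirectory, and it names the first three BED columns `chr`, `start`, `end` and any further ones `metadata1`, `metadata2`, and so on.

```python
import posixpath

DEFAULT_GRANGES_COLS = ["chr", "start", "end"]


def peak_dir(output_dir, sample, genome, results_subdir=None):
    """Directory searched for one sample's BED files."""
    # Fall back to "results_pipeline" when the config sets no results_subdir.
    prefix = results_subdir or "results_pipeline"
    return posixpath.join(output_dir, prefix, sample, "peak_calling_" + genome)


def column_names(n_columns):
    """Column names for a BED table with n_columns columns (at least three)."""
    if n_columns < len(DEFAULT_GRANGES_COLS):
        raise ValueError("a BED file needs at least chr, start and end")
    n_extra = n_columns - len(DEFAULT_GRANGES_COLS)
    extra = ["metadata%d" % i for i in range(1, n_extra + 1)]
    return DEFAULT_GRANGES_COLS + extra


if __name__ == "__main__":
    print(peak_dir("/processed/gold", "gold1", "hg19"))
    print(column_names(5))
```

Handling the extra columns with an explicit count keeps the three-column case well defined: a plain BED3 file simply gets no `metadata` names.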
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,21 @@
# Change log
All notable changes to this project will be documented in this file.

## [0.8.4] -- 2019-01-03

### Changed
- Switched to samblaster for default deduplication
- Improved readability of reported results
- Improved help messages
- Fix mitochondrial counting and remainding removal
- Use gunzip instead of zcat for MacOS compatibility

### Added
- Report total mapped and unmapped reads
- Add website docs
- Zip unmapped files
- Add `--lite` option to minimize size of output directory

## [0.8.3] -- 2018-10-04

### Changed
2 changes: 1 addition & 1 deletion README.md
@@ -14,5 +14,5 @@ Pull requests welcome. Active development should occur in a development or featu
* Nathan Sheffield, [email protected]
* Jason Smith, [email protected]
* Ryan Corces, [email protected]
* Vince Reuter, vince.reuter@gmail.com
* Vince Reuter, vreuter@protonmail.com
* Others... (add your name)
32 changes: 0 additions & 32 deletions config/pipeline_interface.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions config/protocol_mappings.yaml

This file was deleted.

21 changes: 14 additions & 7 deletions containers/pepatac.Dockerfile
@@ -1,11 +1,11 @@
# Pull base image
FROM phusion/baseimage:0.10.1
FROM phusion/baseimage:0.10.2

# Who maintains this image
LABEL maintainer Jason Smith "[email protected]"

# Version info
LABEL version 0.8.5
LABEL version 0.9.1

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]
@@ -48,6 +48,7 @@ RUN pip install virtualenv && \
RUN DEBIAN_FRONTEND=noninteractive apt-get --assume-yes install r-base r-base-dev && \
echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
Rscript -e "install.packages('argparser')" && \
Rscript -e "install.packages('data.table')" && \
Rscript -e "install.packages('devtools')" && \
Rscript -e "devtools::install_github('pepkit/pepr')" && \
Rscript -e "install.packages('data.table')" && \
@@ -62,7 +63,6 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get --assume-yes install r-base r-base-de
Rscript -e "install.packages('scales')" && \
Rscript -e "install.packages('stringr')"


# Install bedtools
RUN DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes \
ant \
@@ -104,10 +104,12 @@ RUN wget https://downloads.sourceforge.net/project/bowtie-bio/bowtie2/2.3.4.1/bo
make install && \
ln -s /home/src/bowtie2-2.3.4.1/bowtie2 /usr/bin/

# Install picard
WORKDIR /home/tools/bin
RUN wget https://github.com/broadinstitute/picard/releases/download/2.18.0/picard.jar && \
chmod +x picard.jar
# Install samblaster
WORKDIR /home/tools/
RUN git clone git://github.com/GregoryFaust/samblaster.git && \
cd /home/tools/samblaster && \
make && \
ln -s /home/tools/samblaster/samblaster /usr/bin/

# Install UCSC tools
WORKDIR /home/tools/
@@ -135,6 +137,11 @@ RUN git clone git://github.com/relipmoc/skewer.git && \
make install

# OPTIONAL REQUIREMENTS
# Install picard
WORKDIR /home/tools/bin
RUN wget https://github.com/broadinstitute/picard/releases/download/2.18.0/picard.jar && \
chmod +x picard.jar

# Install F-seq
WORKDIR /home/src/
RUN wget https://github.com/aboyle/F-seq/archive/master.zip && \
21 changes: 21 additions & 0 deletions docs/README.md
@@ -0,0 +1,21 @@
# [PEPATAC documentation](http://code.databio.org/PEPATAC)

This repository is viewable at [code.databio.org/PEPATAC](http://code.databio.org/PEPATAC). It holds HTML documentation for the PEPATAC pipeline.

## Building PEPATAC documentation with jekyll:

`jekyll build pepatac`

## Do it with `docker` or `singularity`!

1. Grab the container

`docker pull nsheff/jim`
*or*
`singularity build jim docker://nsheff/jim`

2. Build the website

`docker run nsheff/jim jekyll build pepatac`
*or*
`singularity exec jim jekyll build pepatac`
5 changes: 5 additions & 0 deletions docs/_config.yaml
@@ -0,0 +1,5 @@
name: PEPATAC
title: PEPATAC
url: "http://code.databio.org/PEPATAC"
baseurl: ""
include: ['pages', "howto", "assets"]
33 changes: 33 additions & 0 deletions docs/_includes/footer.html
@@ -0,0 +1,33 @@
<hr>
<footer>
<div class="container">
<ul id="contact">
<li><a href="{{ "/contact/" | prepend: site.baseurl }}"><span class="far fa-envelope"></span> Contact Us</a></li>
<li><a href="http://databio.org">Learn more about the Databio team!</a></li>
</ul>
</div>
</footer>
<!-- JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
<script src="{{ "/assets/js/bootstrap.min.js" | prepend: site.baseurl }}" ></script>
<script src="{{ "/assets/js/bootstrap-toc.js" | prepend: site.baseurl }}" ></script>
<script src="{{ "/assets/js/clipboard.js" | prepend: site.baseurl }}" ></script>
<script src="{{ "/assets/js/prism.js" | prepend: site.baseurl }}" ></script>
<script>
$(function () {
$('.tree li:has(ul)').addClass('parent_li').find(' > span').attr('title', 'Collapse this branch');
$('.tree li.parent_li > span').on('click', function (e) {
var children = $(this).parent('li.parent_li').find(' > ul > li');
if (children.is(":visible")) {
children.hide('fast');
$(this).attr('title', 'Expand this branch').find(' > i').addClass('icon-plus-sign').removeClass('icon-minus-sign');
} else {
children.show('fast');
$(this).attr('title', 'Collapse this branch').find(' > i').addClass('icon-minus-sign').removeClass('icon-plus-sign');
}
e.stopPropagation();
});
});
</script>
10 changes: 10 additions & 0 deletions docs/_includes/header.html
@@ -0,0 +1,10 @@
<!-- Bootstrap stylesheet -->
<link rel="stylesheet" type="text/css" href="{{ "/assets/css/bootstrap.min.css" | prepend: site.baseurl }}" >
<!-- Bootstrap ToC -->
<link rel="stylesheet" type="text/css" href="{{ "/assets/css/bootstrap-toc.css" | prepend: site.baseurl }}" >
<!-- Tree -->
<link rel="stylesheet" type="text/css" href="{{ "/assets/css/tree.css" | prepend: site.baseurl }}" >
<!-- Prism syntax highlighting -->
<link rel="stylesheet" type="text/css" href="{{ "/assets/css/prism.css" | prepend: site.baseurl }}" >
<!-- FontAwesome -->
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">
40 changes: 40 additions & 0 deletions docs/_includes/navbar.html
@@ -0,0 +1,40 @@
<nav class="navbar sticky-top navbar-expand-lg navbar-dark bg-dark" style="z-index: 100000">
<a class="navbar-left" href="#top"><img src="{{ "/assets/images/logo_pepatac_white.png" | prepend: site.baseurl }}" class="d-inline-block align-middle img-responsive" alt="PEPATAC" style="max-height:20px; margin-top:-10px; margin-bottom:-10px"></a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarPrimary" aria-controls="navbarPrimary" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarPrimary">
<ul class="navbar-nav mr-auto">
<li class="nav-item active">
<a class="nav-link" href="{{ "/" | prepend: site.baseurl }}">Home<span class="sr-only">(current)</span></a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="get-started-Dropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"><span class="fas fa-play-circle"></span> Getting started</a>
<div class="dropdown-menu" aria-labelledby="get-started-Dropdown">
<a class="dropdown-item" href="{{ "/intro/" | prepend: site.baseurl }}">Introduction</a>
<a class="dropdown-item" href="{{ "/features/" | prepend: site.baseurl }}">Features and benefits</a>
<a class="dropdown-item" href="{{ "/install/" | prepend: site.baseurl }}">Install and run test example</a>
<a class="dropdown-item" href="{{ "/tutorial/" | prepend: site.baseurl }}">Extended tutorial</a>
<a class="dropdown-item" href="{{ "/glossary/" | prepend: site.baseurl }}">Glossary</a>
</div>
</li>
<li class="nav-item">
<a class="nav-link" href="{{ "/howto/" | prepend: site.baseurl }}"><span class="fas fa-chalkboard-teacher"></span> How-to guides</a>
</li>
<li class="nav-item">
<a class="nav-link" href="{{ "/assets/files/examples/gold/summary.html" | prepend: site.baseurl }}" rel="noopener noreferrer" target="_blank"><span class="fas fa-desktop"></span> Example output</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://github.com/databio/pepatac"><span class="fab fa-github fa-lg"></span> GitHub</a>
</li>
</ul>
<ul class="navbar-nav navbar-right">
<li class="nav-item">
<a class="nav-link" href="http://databio.org/">Databio.org</a>
</li>
<li class="nav-item">
<a class="nav-link" href="http://databio.org/software/">Software & Data</a>
</li>
</ul>
</div>
</nav>