Pd 2804 rc (broadinstitute#1441)
* update optimus methods

* add atac methods

* add snm3c methods

* update snm3c methods

* atac snm3c and docsite

* rewrite snm3c

* Update website/docs/Pipelines/ATAC/atac.methods.md

Co-authored-by: Elizabeth Kiernan <[email protected]>

* Update website/docs/Pipelines/Optimus_Pipeline/optimus.methods.md

Co-authored-by: Elizabeth Kiernan <[email protected]>

* Update website/docs/Pipelines/Optimus_Pipeline/optimus.methods.md

Co-authored-by: Elizabeth Kiernan <[email protected]>

* Update website/docs/Pipelines/Optimus_Pipeline/optimus.methods.md

Co-authored-by: Elizabeth Kiernan <[email protected]>

* Update website/docs/Pipelines/Optimus_Pipeline/optimus.methods.md

Co-authored-by: Elizabeth Kiernan <[email protected]>

---------

Co-authored-by: Elizabeth Kiernan <[email protected]>
rsc3 and ekiernan authored Dec 5, 2024
1 parent c563774 commit 145b6bf
Showing 4 changed files with 32 additions and 22 deletions.
7 changes: 7 additions & 0 deletions website/docs/Pipelines/ATAC/atac.methods.md
@@ -0,0 +1,7 @@
# ATAC v2.3.1 Methods

# Methods

Data preprocessing and analysis for 10x chromatin accessibility data were performed using the ATAC workflow v2.3.1 (RRID:SCR_025042). Briefly, FASTQ files were processed with the custom tool fastqprocess, which corrects cell barcodes against a reference whitelist and splits reads by barcode to enable parallel processing. Adapter sequences were then removed from reads using Cutadapt v4.4. Reads were then aligned to the reference genome using BWA-MEM2 v2.2.1 with default parameters, which writes the corrected barcodes to the CB:Z tag of the output BAM. The resulting BAM was then processed with SnapATAC2 v2.7.0 to produce a fragment file, index, and h5ad file containing fragments as well as per-barcode quality metrics.
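The workflow itself is written in WDL, but the core steps correspond roughly to the commands below. This is a minimal sketch: file names, the adapter sequence, the thread count, and the reference path are placeholders, and it assumes fastqprocess has already written the corrected barcode into the FASTQ read comment as a `CB:Z:` tag.

```
# Remove adapters with Cutadapt v4.4 (Nextera adapter shown as a placeholder)
cutadapt -a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT \
  -o R1.trimmed.fastq.gz -p R3.trimmed.fastq.gz R1.fastq.gz R3.fastq.gz

# Align with BWA-MEM2 defaults; -C copies the FASTQ comment (the corrected
# barcode) into the output BAM, yielding the CB:Z tag
bwa-mem2 mem -C -t 8 GRCh38.fa R1.trimmed.fastq.gz R3.trimmed.fastq.gz \
  | samtools sort -o atac.sorted.bam -
samtools index atac.sorted.bam

# Produce the fragment file with SnapATAC2 v2.7.0 (Python API via a heredoc)
python - <<'EOF'
import snapatac2 as snap
snap.pp.make_fragment_file("atac.sorted.bam", "fragments.tsv.gz", barcode_tag="CB")
EOF
```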

An overview of the pipeline is available in [WARP Documentation](https://broadinstitute.github.io/warp/docs/Pipelines/ATAC/README) and examples of genomic references, whitelists, and other inputs are available in the [WARP repository](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/multiome/test_inputs).
19 changes: 8 additions & 11 deletions website/docs/Pipelines/Optimus_Pipeline/optimus.methods.md
@@ -2,31 +2,28 @@
sidebar_position: 2
---

# Optimus v5.3.0 Methods
# Optimus v7.8.1 Methods

Below we provide an example methods section for a publication, separated into single-cell and single-nucleus use cases. For the complete pipeline documentation, see the [Optimus Overview](./README.md).

# Methods

## Single-cell (sc_rna mode)
Data preprocessing and count matrix construction were performed using the Optimus v5.3.0 pipeline (RRID:SCR_018908). Briefly, FASTQ files were partitioned by barcodes using sctools v0.3.13. The files were then trimmed, aligned, UMI-corrected against the 10x Genomics barcodes whitelist, and converted to a raw count matrix using STAR v2.7.9a. CB correction was performed using the `--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts` parameter which allowed for multiple matches in the whitelist with 1 mismatched base, used posterior probability calculation to choose one of the matches, added pseudocounts of 1 to all whitelist barcodes, and allowed multi-matching of CBs with N-bases to the whitelist.
Data preprocessing and count matrix construction were performed using the Optimus v7.8.1 pipeline (RRID:SCR_018908). Briefly, FASTQ files were partitioned by barcodes using fastqprocess. The files were then trimmed, aligned, barcode- and UMI-corrected, and converted to a raw count matrix using STARsolo v2.7.11a. Cell barcode (CB) correction against the 10x Genomics barcode whitelist was performed using the `--soloCBmatchWLtype 1MM_multi` parameter.

Reads were trimmed using the solo parameter `--clipAdapterType CellRanger4` and `--outFilterScoreMin 30` which matches read trimming performed by CellRanger4. Reads were then aligned to GENCODE mouse (M21) or human (V27) references in unstranded mode. Genes were annotated using the STAR "Gene" COUNTING_MODE and UMIs were corrected with the `--soloUMIdedup 1MM_Directional_UMItoolsdirectional` parameter, which uses a directional correction method. The resulting BAM was then used for cell and gene metric correction using the sctools v0.3.13 TagSortBam tool. The STAR TSV outputs for gene counts, features, and barcodes were converted to numpy arrays for downstream empty droplet detection using DropletUtils v1.2.1 emptyDrops with the parameters
```--fdr-cutoff 0.01 --emptydrops-niters 10000 --min-molecules 100 --emptydrops-lower 100```.
Reads were trimmed using the solo parameters `--clipAdapterType CellRanger4` and `--outFilterScoreMin 30`, which match the read trimming performed by Cell Ranger 4. Reads were then aligned to GENCODE mouse (M32) or human (V43) references in stranded mode. Genes were annotated using the STARsolo "Gene" COUNTING_MODE and UMIs were corrected with the `--soloUMIdedup 1MM_CR` parameter, which uses Cell Ranger's correction method. The resulting BAM was then used to calculate cell and gene metrics with the warp-tools TagSort tool. The STAR TSV outputs for gene counts, features, and barcodes were converted to numpy arrays for downstream empty droplet detection using DropletUtils v1.2.1 emptyDrops with the parameters `--fdr-cutoff 0.01 --emptydrops-niters 10000 --min-molecules 100 --emptydrops-lower 100`.
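For orientation, the flags above would sit in a STARsolo invocation roughly like the one below. This is a sketch, not the exact Optimus command: the index, whitelist, FASTQ names, and thread count are placeholders, and the CB/UMI geometry shown assumes 10x v3 chemistry. Note that the `--fdr-cutoff`-style emptyDrops flags quoted above are arguments to the pipeline's DropletUtils wrapper, not to STAR.

```
STAR --runMode alignReads --runThreadN 16 \
     --genomeDir star_index/ \
     --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat \
     --soloType CB_UMI_Simple \
     --soloCBwhitelist whitelist.txt \
     --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 \
     --soloFeatures Gene \
     --soloCBmatchWLtype 1MM_multi \
     --soloUMIdedup 1MM_CR \
     --clipAdapterType CellRanger4 \
     --outFilterScoreMin 30 \
     --outSAMtype BAM Unsorted \
     --outSAMattributes CB UB
```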


All cell and gene metrics (alignment, mitochondrial, and other QC metrics), count matrices, and emptyDrops results were aggregated into a final Loom-formatted cell-by-gene matrix. The final outputs included the unfiltered Loom and unfiltered (but tagged) BAM file.
All cell and gene metrics (alignment, mitochondrial, and other QC metrics), count matrices, and emptyDrops results were aggregated into a final h5ad-formatted cell-by-gene matrix. The final outputs included the unfiltered h5ad and unfiltered (but tagged) BAM file.

An example of the pipeline and outputs is available on the Terra HCA Optimus Pipeline Featured Workspace (https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline), and an additional pipeline overview is available in [WARP documentation](https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README). Examples of genomic references, whitelists, and other inputs are available in the WARP repository (see the [example inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/optimus/example_inputs)).
An example of the pipeline and outputs is available on the [Terra HCA Optimus Pipeline Featured Workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline), and an additional pipeline overview is available in [WARP documentation](https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README). Examples of genomic references, whitelists, and other inputs are available in the WARP repository (see the [example inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/optimus/example_inputs)).

## Single-nucleus (sn_rna mode)

Data preprocessing and count matrix construction were performed using the Optimus v5.3.0 pipeline (RRID:SCR_018908). Briefly, FASTQ files were partitioned by barcodes using sctools v0.3.13. The files were then trimmed, aligned, UMI-corrected against the 10x Genomics barcodes whitelist, and converted to a raw count matrix using STAR v2.7.9a. CB correction was performed using the `--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts` parameter which allowed for multiple matches in the whitelist with 1 mismatched base, used posterior probability calculation to choose one of the matches, added pseudocounts of 1 to all whitelist barcodes, and allowed multi-matching of CBs with N-bases to the whitelist.
Data preprocessing and count matrix construction were performed using the Optimus v7.8.1 pipeline (RRID:SCR_018908). Briefly, FASTQ files were partitioned by barcodes using fastqprocess. The files were then trimmed, aligned, barcode- and UMI-corrected, and converted to a raw count matrix using STARsolo v2.7.11a. Cell barcode (CB) correction against the 10x Genomics barcode whitelist was performed using the `--soloCBmatchWLtype 1MM_multi` parameter.

Reads were trimmed using the solo parameter `--clipAdapterType CellRanger4` and `--outFilterScoreMin 30` which matches read trimming performed by CellRanger4. Reads were then aligned to GENCODE mouse (M21) or human (V27) references in unstranded mode. Genes were annotated using the STAR "GeneFull" COUNTING_MODE and UMIs were corrected with the `--soloUMIdedup 1MM_Directional_UMItoolsdirectional` parameter, which uses a directional correction method. The resulting BAM was then used for cell and gene metric correction using the sctools v0.3.13 TagSortBam tool. The STAR TSV outputs for gene counts, features, and barcodes were converted to numpy arrays for downstream Loom conversion.
Reads were trimmed using the solo parameters `--clipAdapterType CellRanger4` and `--outFilterScoreMin 30`, which match the read trimming performed by Cell Ranger 4. Reads were then aligned to GENCODE mouse (M32) or human (V43) references in stranded mode. Genes were annotated using the STARsolo "GeneFull_Ex50pAS" COUNTING_MODE and UMIs were corrected with the `--soloUMIdedup 1MM_CR` parameter, which uses Cell Ranger's correction method. The resulting BAM was then used to calculate cell and gene metrics with the warp-tools TagSort tool. The STAR TSV outputs for gene counts, features, and barcodes were converted to numpy arrays for downstream h5ad conversion. All cell and gene metrics (alignment, mitochondrial, and other QC metrics) and count matrices were aggregated into a final h5ad-formatted cell-by-gene matrix. The final outputs included the unfiltered h5ad and the unfiltered (but tagged) BAM file.

All cell and gene metrics (alignment, mitochondrial, and other QC metrics) and count matrices were aggregated into a final Loom-formatted cell-by-gene matrix. The final outputs included the unfiltered Loom and unfiltered (but tagged) BAM file.
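The single-nucleus variant of the sketch above differs chiefly in the counting mode; under the same placeholder assumptions:

```
# As in the single-cell sketch, but counting pre-mRNA (exons + introns)
STAR --runMode alignReads --runThreadN 16 \
     --genomeDir star_index/ \
     --readFilesIn R2.fastq.gz R1.fastq.gz --readFilesCommand zcat \
     --soloType CB_UMI_Simple --soloCBwhitelist whitelist.txt \
     --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 \
     --soloFeatures GeneFull_Ex50pAS \
     --soloCBmatchWLtype 1MM_multi --soloUMIdedup 1MM_CR \
     --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
     --outSAMtype BAM Unsorted --outSAMattributes CB UB
```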

An example of the pipeline and outputs is available on the [Terra HCA Optimus Pipeline Featured Workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline), and an additional pipeline overview is available in [WARP documentation](https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README). Examples of genomic references, whitelists, and other inputs are available in the WARP repository (see the [example inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/optimus/example_inputs).
An example of the pipeline and outputs is available on the [Terra HCA Optimus Pipeline Featured Workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline), and an additional pipeline overview is available in [WARP documentation](https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README). Examples of genomic references, whitelists, and other inputs are available in the WARP repository (see the [example inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/optimus/example_inputs)).


7 changes: 7 additions & 0 deletions website/docs/Pipelines/snM3C/snm3c.methods.md
@@ -0,0 +1,7 @@
# snM3C v4.0.1 Methods

# Methods

Methylome and chromatin contact sequencing data were preprocessed for downstream analysis using the snm3C v4.0.1 pipeline (RRID:SCR_025041). Briefly, [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) was used to demultiplex paired-end sequencing reads from a single 384-well plate into cell-level FASTQ files based on a list of random primer indices, and then further used to sort, filter, and trim reads. Paired-end reads were then aligned to the human hg38 (v43) reference genome using HISAT-3N. Custom Python scripts from the [CEMBA GitHub repository](https://github.com/DingWB/cemba_data) were then called to separate unmapped, uniquely mapped, and multi-mapped reads. The unmapped reads were saved to a FASTQ file and used for single-end alignment with HISAT-3N. Overlapping reads were removed and all resulting aligned reads were merged into a single BAM. All mapped reads were deduplicated using samtools and Picard. The resulting BAM was used as input to a custom CEMBA Python script for chromatin contact calling based on a 2,500 base pair threshold, and as input to the [ALLCools software](https://lhqing.github.io/ALLCools/intro.html) for methylation site calling. Key summary statistics for read trimming, mapping, deduplication, and chromatin contacts were then calculated and exported to a summary metrics file.
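The per-cell alignment core of the pipeline corresponds roughly to the commands below. This is a sketch only: the index, FASTQ, and reference paths are placeholders, and the ALLCools flag spellings follow its documentation rather than this pipeline's WDL.

```
# Paired-end bisulfite-aware alignment with HISAT-3N (C-to-T base change)
hisat-3n -x hg38_3n_index --base-change C,T \
  -1 cell_R1.trimmed.fastq.gz -2 cell_R2.trimmed.fastq.gz \
  | samtools sort -o cell.sorted.bam -

# Deduplicate mapped reads with Picard
picard MarkDuplicates I=cell.sorted.bam O=cell.dedup.bam \
  M=cell.dup_metrics.txt REMOVE_DUPLICATES=true
samtools index cell.dedup.bam

# Methylation site calling with ALLCools
allcools bam-to-allc --bam_path cell.dedup.bam \
  --reference_fasta hg38.fa --output_path cell.allc.tsv.gz
```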

Further details regarding tools, parameters, and references used in the pipeline are available in the [YAP documentation](https://hq-1.gitbook.io/mc).
@@ -6,6 +6,8 @@ sidebar_position: 1

This documentation site is built on top of the `React`-based framework [Docusaurus](https://docusaurus.io), so most of the recommended configuration can be found in the framework's documentation.

## Versions and dependabot

Due to the nature of the frontend framework, most of the plugins this docsite uses may be deprecated or updated over time. GitHub's `dependabot` feature periodically scans the code base and flags potential vulnerabilities. There are usually 2 types of updates:

- Major version bump
- Minor/patch version bump
@@ -15,17 +17,14 @@
The former one requires more human engagement and you should follow the official upgrade guide.

The latter one in most cases will be identified by `dependabot` and only requires a human review and approval before commenting `@dependabot merge` on the PR. In rare cases, the PR opened by `dependabot` will have merge conflicts against the `develop` branch. Developers will need to `cherry-pick` the commit made by `dependabot`, resolve the conflicts, and open a separate PR for it.
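That cherry-pick flow looks roughly like this (the branch name and commit SHA are placeholders):

```
git checkout develop && git pull               # start from an up-to-date develop
git checkout -b deps/manual-dependabot-bump    # new branch for the fix
git cherry-pick <dependabot-commit-sha>        # resolve any conflicts, then:
git cherry-pick --continue
git push -u origin deps/manual-dependabot-bump # open a PR from this branch
```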

When you want to test your changes to the docs, you can deploy the site locally.
If doing this for the first time, install the dev version of the site using the root of the repo:
```
yarn --cwd=website install
```
Then deploy the dev version using:
```
yarn --cwd=website start
```
to spin up a local server to preview your changes.

## Deploying local site for testing and editing

When you want to test your changes to the docs, you can deploy the site locally.
* First you need `yarn`, which is installed with npm: [mac install](https://classic.yarnpkg.com/lang/en/docs/install/#mac-stable)
* Then build the local website from the root of the repo (the docs directory is `warp/website`) on your desired branch; both commands are collected into a single snippet below
* `yarn --cwd=website install` # install the site
* `yarn --cwd=website start` # spin up a local server to preview your changes
* The local website should pop up automatically in your default browser
* The website should update automatically as you edit the markdown files
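Putting the steps above together, a typical first-time session run from the repo root looks roughly like this (the branch name is a placeholder):

```
cd warp                      # repo root; the docs live in warp/website
git checkout <your-branch>   # the branch whose docs you want to preview
yarn --cwd=website install   # first time only: install dependencies
yarn --cwd=website start     # start the dev server; a browser tab opens automatically
```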

For more details on deployment, check this [page](https://docusaurus.io/docs/deployment).
