Km fix broken links in docs #1090

Merged: 8 commits, Oct 6, 2023

4 changes: 2 additions & 2 deletions website/docs/Best_practices/GC_cost_optimization.md
@@ -4,7 +4,7 @@ sidebar_position: 1

# WDL cost optimization: Tips and tricks when working with Google Cloud in Terra

Reducing the cost of your WDL workflow is always a priority. Below, the Broad Institute’s Pipelines team provides some tips and tricks for optimizing workflow costs for runs using Google Cloud virtual machines (VM) from the bioinformatics platform, [Terra](app.terra.bio).
Reducing the cost of your WDL workflow is always a priority. Below, the Broad Institute’s Pipelines team provides some tips and tricks for optimizing workflow costs for runs using Google Cloud virtual machines (VM) from the bioinformatics platform, [Terra](https://app.terra.bio/).

Overall, the majority of optimization comes down to understanding the size of your VM and how long you use it. Keep this in mind as you read the tips below and remember: one size does not necessarily fit all when it comes to optimizing your workflow.
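
The Terra link fix above adds the missing `https://` scheme; without it, Markdown renderers typically treat `app.terra.bio` as a path relative to the current page. As a rough illustration of how such links could be caught before merging, here is a minimal sketch in Python. The function name and the bare-domain heuristic are assumptions made for this example and are not part of the WARP repository or its CI.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def find_schemeless_links(markdown: str) -> list[tuple[str, str]]:
    """Return (text, target) pairs whose target looks like a bare domain.

    Heuristic (an assumption of this sketch): a target with no URL scheme,
    no leading '/', '.', or '#', and a dot in its first path segment that is
    not a Markdown file name is probably an external URL missing 'https://'.
    """
    suspects = []
    for text, target in LINK_RE.findall(markdown):
        if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target):
            continue  # has a scheme such as https: or mailto:
        if target.startswith(("/", ".", "#")):
            continue  # site-relative path, relative path, or in-page anchor
        first_segment = target.split("/", 1)[0]
        if "." in first_segment and not first_segment.endswith((".md", ".mdx")):
            suspects.append((text, target))
    return suspects
```

Run against the old line, this sketch would flag `[Terra](app.terra.bio)` while leaving the corrected `https://app.terra.bio/` link, relative paths such as `./count-matrix-overview.md`, and anchors such as `#quickstart-table` alone.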

@@ -57,7 +57,7 @@ Just like starting a VM has some overhead cost, localizing files also has some d

Each time you move cloud files, you pay egress for the network transitions, so it’s important to find the balance in the number of files you decide to move. You have to weigh whether it costs more to move a large file vs. moving several smaller files. For example, you might find that it’s more cost-efficient to move a zipped \~100 GB file than to move 100, 1 GB files. After running a test workflow, check your workflow logs to see what the timing is for localizing and moving files vs. running your tool. This might require trial and error when developing your workflow.

## Tip 4: Run files in parallel when possible
## Tip 4: Run files in parallel when possible
If you need to run multiple files through a tool, you’ll have to decide whether to scatter those files across multiple VMs (running the tool in parallel), or run the files sequentially through the tool in one VM.

Similar to the problem of running multiple tools per VM, running multiple files per VM also comes with the risk of rerunning your tool if a file should fail. If you’re running files that are large or prone to transient failures, it’s best to scatter them across VMs in parallel.
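
To make the rerun risk in Tip 4 concrete, the sketch below estimates how likely it is that a single VM processing several files sequentially hits at least one failure. It assumes every file fails independently with the same probability, which is a simplification for illustration and not a measured Terra or Google Cloud statistic; the function name is hypothetical.

```python
def batch_failure_probability(per_file_failure: float, n_files: int) -> float:
    """Probability that at least one of n_files fails on a single VM.

    Assumes independent failures with identical probability per file.
    """
    if not 0.0 <= per_file_failure <= 1.0:
        raise ValueError("per_file_failure must be between 0 and 1")
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    # Complement of "every file succeeds".
    return 1.0 - (1.0 - per_file_failure) ** n_files
```

Under these assumptions, 50 files with a 2% failure rate each give `batch_failure_probability(0.02, 50)` of roughly 0.64, so a sequential run on one VM would more likely than not need a rerun. When the files are scattered, a failure only forces a rerun of the affected shard.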
2 changes: 1 addition & 1 deletion website/docs/Best_practices/task_execution.md
@@ -3,7 +3,7 @@ sidebar_position: 6
---

# Task execution - tips for using the WDL task command section
Every WDL task has a [command section](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#command-section) where you can call software tools and specify parameters to help transform your data into meaningful output. This section is like a terminal for whatever environment you’re using to execute your WDL script. That environment can be a virtual computer set up by a Docker container or it can be your local computer. If you’re using Cromwell to execute your WDL (as happens in the cloud-based platform [Terra](app.terra.bio)), the command section is run after Cromwell has resolved the task inputs but before it assesses outputs.
Every WDL task has a [command section](https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#command-section) where you can call software tools and specify parameters to help transform your data into meaningful output. This section is like a terminal for whatever environment you’re using to execute your WDL script. That environment can be a virtual computer set up by a Docker container or it can be your local computer. If you’re using Cromwell to execute your WDL (as happens in the cloud-based platform [Terra](https://app.terra.bio/)), the command section is run after Cromwell has resolved the task inputs but before it assesses outputs.

The actual commands you use in the task command section depend on what tools and operating systems are available in the execution environment. [WARP](https://github.com/broadinstitute/warp/tree/master) pipelines, for example, often set up virtual machines using Docker containers with Alpine Linux-based operating systems. This means that the command section should contain commands that work in Alpine. If additional software is installed on top of Alpine, that software's commands will also work. WARP workflows often require custom python scripts and that’s why python is installed on top of WARP’s Alpine-based (or other OS) Dockers. Python is one example, but you can install any language on a docker and then use language-specific commands or scripts from the WDL task command section. In addition to these commands, you can also point to paths for software (such as the path to a jar in the Docker) as well as input/output files that are in the Docker container.

2 changes: 1 addition & 1 deletion website/docs/Pipelines/Multiome_Pipeline/README.md
@@ -109,7 +109,7 @@ The Multiome workflow calls two subworkflows, which are described briefly in the

## Versioning and testing

All Multiome pipeline releases are documented in the [Multiome changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/Multiome.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/test_inputs/test_data_overview.md). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines).
All Multiome pipeline releases are documented in the [Multiome changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/Multiome.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/multiome/test_inputs). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines).

## Citing the Multiome Pipeline
Please identify the pipeline in your methods section using the Multiome Pipeline's [SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024217/resolver?q=SCR_024217&l=SCR_024217&i=rrid:scr_024217).
12 changes: 6 additions & 6 deletions website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/README.md
@@ -73,10 +73,10 @@ The [scATAC workflow](https://github.com/broadinstitute/warp/blob/master/pipelin

| Task | Task Description | Tool Docker Image | Parameter Descriptions or Code |
|--- | --- | --- | --- |
| AlignPairedEnd | Align the modified FASTQ files to the genome | [snaptools:0.0.1](https://github.com/broadinstitute/warp/blob/master/dockers/skylab/snaptools/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| SnapPre | Initial generation of snap file | [snaptools:0.0.1](https://github.com/broadinstitute/warp/blob/master/dockers/skylab/snaptools/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| SnapCellByBin | Binning of data by genomic bins | [snaptools:0.0.1](https://github.com/broadinstitute/warp/blob/master/dockers/skylab/snaptools/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| MakeCompliantBAM | Generation of a GA4GH compliant BAM | [snaptools:0.0.1](https://github.com/broadinstitute/warp/blob/master/dockers/skylab/snaptools/Dockerfile) | [Code](https://github.com/broadinstitute/warp/blob/develop/dockers/skylab/pytools/tools/makeCompliantBAM.py) |
| AlignPairedEnd | Align the modified FASTQ files to the genome | [snaptools-bwa:1.0.0-1.4.8-0.7.17-1690310027](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/snaptools-bwa/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| SnapPre | Initial generation of snap file | [snaptools-bwa:1.0.0-1.4.8-0.7.17-1690310027](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/snaptools-bwa/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| SnapCellByBin | Binning of data by genomic bins | [snaptools-bwa:1.0.0-1.4.8-0.7.17-1690310027](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/snaptools-bwa/Dockerfile) | [SnapTools documentation](https://github.com/r3fang/SnapTools) |
| MakeCompliantBAM | Generation of a GA4GH compliant BAM | [warp-tools:1.0.1-1690997141](https://github.com/broadinstitute/warp-tools/blob/develop/tools/Dockerfile) | [Code](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/makeCompliantBAM.py) |
| BreakoutSnap | Extraction of tables from snap file into text format (for testing and user availability) | [snap-breakout:0.0.1](https://github.com/broadinstitute/warp/tree/master/dockers/skylab/snap-breakout) | [Code](https://github.com/broadinstitute/warp/tree/master/dockers/skylab/snap-breakout/breakoutSnap.py) |

### Task Summary
@@ -109,7 +109,7 @@ The SnapCellByBin task uses the Snap file to create cell-by-bin count matrices i

#### MakeCompliantBAM

The MakeCompliantBAM task uses a [custom python script (here)](https://github.com/broadinstitute/warp/blob/develop/dockers/skylab/pytools/tools/makeCompliantBAM.py) to make a GA4GH compliant BAM by moving the cellular barcodes in the read names to the CB tag.
The MakeCompliantBAM task uses a [custom python script](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/makeCompliantBAM.py) to make a GA4GH compliant BAM by moving the cellular barcodes in the read names to the CB tag.
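
The barcode move described above can be sketched on a single SAM text record. This is not the linked `makeCompliantBAM.py` script; it is a minimal illustration that assumes the read name has the form `<barcode>:<original name>` (the separator and the function name are assumptions of this example). `CB` is the standard SAM optional tag for the cell barcode.

```python
def move_barcode_to_cb_tag(sam_line: str, sep: str = ":") -> str:
    """Move a barcode prefix from the read name (QNAME) into a CB:Z: tag.

    Assumes QNAME looks like '<barcode><sep><original name>'. Header lines
    (starting with '@') and names without the separator are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # Split on the first separator only, so names with more colons survive.
    barcode, found, name = fields[0].partition(sep)
    if not found or not barcode or not name:
        return sam_line
    fields[0] = name
    fields.append(f"CB:Z:{barcode}")
    newline = "\n" if sam_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

For a record whose first field is `ACGT:read1`, the sketch renames the read to `read1` and appends `CB:Z:ACGT` as the last field. A real implementation would operate on BAM records through a library rather than on text lines.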

#### BreakoutSnap

@@ -160,7 +160,7 @@ All scATAC workflow releases are documented in the [scATAC changelog](https://gi
Please identify the pipeline in your methods section using the scATAC Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_018919/resolver?q=SCR_018919&l=SCR_018919).
* Ex: *scATAC Pipeline (RRID:SCR_018919)*

## Consortia Support
## Consortia Support
This pipeline is supported and used by the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN).

If your organization also uses this pipeline, we would love to list you! Please reach out to us by contacting [the WARP team](mailto:[email protected]).
8 changes: 4 additions & 4 deletions website/docs/Pipelines/SlideSeq_Pipeline/README.md
@@ -141,7 +141,7 @@ Poly(A) tails are trimmed from reads using the STARsolo parameter `--clip3pAdapt

**Alignment**

STAR maps barcoded reads to the genome primary assembly reference (see the [Quickstart table](https://broadinstitute.github.io/warp/docs/Pipelines/Slide-seq_Pipeline/README#quickstart-table) above for version information). The example references for the Slide-seq workflow were generated using the [BuildIndices pipeline](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/build_indices/BuildIndices.wdl).
STAR maps barcoded reads to the genome primary assembly reference (see the [Quickstart table](#quickstart-table) above for version information). The example references for the Slide-seq workflow were generated using the [BuildIndices pipeline](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/build_indices/BuildIndices.wdl).

**Gene annotation and counting**

@@ -155,7 +155,7 @@ The task’s output includes a coordinate-sorted BAM file containing the bead ba

#### 4. Calculating metrics

The [CalculateGeneMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl), [CalculateUMIsMetrics](https://github.com/broadinstitute/warp/blob/develop/master/skylab/Metrics.wdl), and [CalculateCellMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl) tasks use [warp-tools](https://github.com/broadinstitute/warp-tools) to calculate summary metrics that help assess the per-bead and per-UMI quality of the data output each time this pipeline is run.
The [CalculateGeneMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl), [CalculateUMIsMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl), and [CalculateCellMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl) tasks use [warp-tools](https://github.com/broadinstitute/warp-tools) to calculate summary metrics that help assess the per-bead and per-UMI quality of the data output each time this pipeline is run.

These metrics output from both tasks are included in the output Loom matrix. A detailed list of these metrics is found in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md).

@@ -208,9 +208,9 @@ The following table lists the output files produced from the pipeline. For sampl
| fastq_reads_per_umi | `<input_id>.numReads_perCell_XM.txt` | Metric file containing the number of reads per UMI that were calculated prior to alignment. | TXT |
| loom_output_file | `<input_id>.loom` | Loom file containing count data and metadata. | Loom |

The Loom matrix is the default output. See the [create_loom_slide_seq.py](https://github.com/broadinstitute/warp-tools/blob/develop/scripts/create_loom_optimus.py) script for the detailed code. This matrix contains the unnormalized (unfiltered) count matrices, as well as the gene and bead barcode metrics detailed in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md).
The Loom matrix is the default output. See the [create_loom_slide_seq.py](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/create_loom_optimus.py) script for the detailed code. This matrix contains the unnormalized (unfiltered) count matrices, as well as the gene and bead barcode metrics detailed in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md).

The output Loom matrix can be converted to an H5AD file for downstream processing using a [custom script](https://github.com/broadinstitute/warp-tools/blob/develop/scripts/loom_to_h5ad.py) available in the [warp-tools GitHub repository](https://github.com/broadinstitute/warp-tools).
The output Loom matrix can be converted to an H5AD file for downstream processing using a [custom script](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/loom_to_h5ad.py) available in the [warp-tools GitHub repository](https://github.com/broadinstitute/warp-tools).

## Validation against on-prem pipeline

@@ -13,7 +13,7 @@ slug: /Pipelines/Smart-seq2_Multi_Sample_Pipeline/README

The Smart-seq2 Multi-Sample (Multi-SS2) Pipeline is a wrapper around the [Smart-seq2 Single Sample](../Smart-seq2_Single_Sample_Pipeline/README) pipeline. It is developed by the Data Coordination Platform of the Human Cell Atlas to process single-cell RNAseq (scRNAseq) data generated by Smart-seq2 assays. The workflow processes multiple cells by importing and running the [Smart-seq2 Single Sample workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_sample/SmartSeq2SingleSample.wdl) for each cell (sample) and then merging the resulting Loom matrix output into a single Loom matrix containing raw counts and TPMs.

Full details about the Smart-seq2 Pipeline can be read in the [Smart-seq2 Single Sample Overview](../Smart-seq2_Single_Sample_Pipeline/README) in GitHub.
Full details about the Smart-seq2 Pipeline can be read in the [Smart-seq2 Single Sample Overview](https://broadinstitute.github.io/warp/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README) in GitHub.

The Multi-SS2 workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The Terra [Smart-seq2 public workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA%20Smart-seq2%20Multi%20Sample%20Pipeline) contains the Smart-seq2 workflow, workflow configurations, required reference data and other inputs, and example testing data.

@@ -101,7 +101,7 @@ Release information for the Multi-SS2 Pipeline can be found in the [changelog](h
Please identify the pipeline in your methods section using the Smart-seq2 Multi-Sample Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_018920/resolver?q=Smart-seq2&l=Smart-seq2).
* Ex: *Smart-seq2 Multi-Sample Pipeline (RRID:SCR_018920)*

## Consortia Support
## Consortia Support
This pipeline is supported and used by the [Human Cell Atlas](https://www.humancellatlas.org/) (HCA) project.

If your organization also uses this pipeline, we would love to list you! Please reach out to us by contacting [the WARP team](mailto:[email protected]).
2 changes: 1 addition & 1 deletion website/docs/Pipelines/snM3C/README.md
@@ -33,7 +33,7 @@ The snM3C pipeline can be deployed using [Cromwell](https://cromwell.readthedocs

### Inputs

The snM3C workflow requires a JSON configuration file specifying the input files and parameters for the analysis. An example configuration file can be found in the [snM3C directory](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C_inputs.json).
The snM3C workflow requires a JSON configuration file specifying the input files and parameters for the analysis. Example configuration files can be found in the [snM3C `test_inputs` directory](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/snM3C/test_inputs) in the WARP repository.

The main input files and parameters include:
