diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 913600b..c8117ab 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/scnanoseq, the standard workflow is 1. Check that there isn't already an issue about your idea in the [nf-core/scnanoseq issues](https://github.com/nf-core/scnanoseq/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/scnanoseq repository](https://github.com/nf-core/scnanoseq) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. @@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards: Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. 
### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index a0314bc..d79d108 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/scna - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/scnanoseq/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/scnanoseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 490d087..f73f6e0 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,18 +1,35 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the master branch. 
# It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/scnanoseq' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/scnanoseq' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest steps: + - uses: octokit/request-action@v2.x + id: check_approvals + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 349e712..dda9dd8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,27 +20,66 @@ concurrency: jobs: test: - name: Run pipeline with test data + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/scnanoseq') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" + profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + profile: "conda" + - isMaster: false + profile: "singularity" steps: - name: Check out pipeline code uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Install Nextflow + - name: Set up Nextflow uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup + - name: Set up Apptainer + if: matrix.profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main + + - name: Set up Singularity + if: matrix.profile == 'singularity' + run: | + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR + + - name: Set up Miniconda + if: matrix.profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 + with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda + + - name: Set up Conda + if: matrix.profile == 'conda' + run: | + echo $(realpath $CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk 
space uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with test data + - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}" run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 2d20d64..713dc3e 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,7 +8,7 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." required: true default: "dev" pull_request: @@ -39,9 +39,11 @@ jobs: with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -54,33 +56,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline if: ${{ job.steps.stub_run_pipeline.status == failure() }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir 
./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were downloaded at runtime. The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1a08eb1..16a79c9 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -42,17 +42,32 @@ jobs: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23..42e519b 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index 03ecfcf..c6ba35d 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk
'{print "#"$0}' | tr '\n' ' ')" >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 0000000..e8aafe4 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. + +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). 
+ # diff --git a/.gitignore b/.gitignore index f3113c8..b74c0fd 100644 --- a/.gitignore +++ b/.gitignore @@ -11,3 +11,4 @@ params.yml samplesheet.csv *.swp input* +null/ diff --git a/.gitpod.yml b/.gitpod.yml index 105a182..4611863 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index 1b1135d..6db149f 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,13 +1,26 @@ -repository_type: pipeline -nf_core_version: "2.14.1" - +bump_version: null lint: - template_strings: False # "Jinja string found in" bin/create_regex.py and bin/seurat_qc.R files_unchanged: - .github/workflows/linting.yml - lib/NfcoreTemplate.groovy - docs/images/nf-core-scnanoseq_logo_dark.png - docs/images/nf-core-scnanoseq_logo_light.png + - .gitignore pipeline_todos: - README.md - main.nf + template_strings: false +nf_core_version: 3.0.2 +org_path: null +repository_type: pipeline +template: + author: Austyn Trull, Lara Ianov + description: Single-cell/nuclei pipeline for data derived from Oxford Nanopore + force: false + is_nfcore: true + name: scnanoseq + org: nf-core + outdir: . + skip_features: null + version: 1.0.0 +update: null diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4dc0f1d..9e9f0e1 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,7 +7,7 @@ repos: - prettier@3.2.5 - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index b336102..dbc2f15 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,14 +3,26 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0.0 [2024-10-07] +## v1.1.0 [TBD] -Initial release of nf-core/scnanoseq, created with the [nf-core](https://nf-co.re/) template. 
+### Enhancements + +- Inputs for IsoQuant are split on chromosome to allow for faster processing +- The read counts QC metric can now leverage NanoPlot counts if FastQC is skipped +- Added `oarfish` as an option for quantification + +### Fixes -### `Added` +- The 'Post Trim Read QC' and 'Post Extract Read QC' nodes on the metro diagram have been placed in the correct locations +- The BLAZE process in the example config has been corrected to use `cpus` instead of `--threads` -### `Fixed` +### Software dependencies -### `Dependencies` +| Dependency | Old version | New version | +| ---------- | ----------- | ----------- | +| `IsoQuant` | 3.5.0 | 3.6.1 | +| `MultiQC` | 1.25 | 1.25.1 | -### `Deprecated` +## v1.0.0 [2024-10-07] + +Initial release of nf-core/scnanoseq, created with the [nf-core](https://nf-co.re/) template. diff --git a/CITATIONS.md b/CITATIONS.md index 3e9f38a..ecd3867 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -16,7 +16,7 @@ - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) - > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. +> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. - [IsoQuant](https://pubmed.ncbi.nlm.nih.gov/36593406/) @@ -28,7 +28,7 @@ - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) - > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. +> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. - [NanoComp](https://pubmed.ncbi.nlm.nih.gov/29547981/) @@ -42,6 +42,10 @@ > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794. +- [oarfish](https://github.com/COMBINE-lab/oarfish) + + > Jousheghani ZZ, Patro R. Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification. bioRxiv [Preprint]. 2024 Mar 1:2024.02.28.582591. doi: 10.1101/2024.02.28.582591. PMID: 38464200; PMCID: PMC10925290.
+ - [pigz](https://zlib.net/pigz/) - [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/) diff --git a/README.md b/README.md index d884531..a1f3ea2 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ [![GitHub Actions Linting Status](https://github.com/nf-core/scnanoseq/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/scnanoseq/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/scnanoseq/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.13899279-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.13899279) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) @@ -34,18 +34,18 @@ On release, automated continuous integration tests run the pipeline on a full-si 1. Optional: Split FASTQ for faster processing ([`split`](https://linux.die.net/man/1/split)) 3. Trim and filter reads ([`Nanofilt`](https://github.com/wdecoster/nanofilt)) 4. Post trim QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), [`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`NanoComp`](https://github.com/wdecoster/nanocomp) and [`ToulligQC`](https://github.com/GenomiqueENS/toulligQC)) -5. Barcode detection using a custom whitelist or 10X whitelist. [`BLAZE`](https://github.com/shimlab/BLAZE) +5. Barcode detection using a custom whitelist or 10X whitelist. ([`BLAZE`](https://github.com/shimlab/BLAZE)) 6. Extract barcodes. Consists of the following steps: 1. Parse FASTQ files into R1 reads containing barcode and UMI and R2 reads containing sequencing without barcode and UMI (custom script `./bin/pre_extract_barcodes.py`) 2. Re-zip FASTQs ([`pigz`](https://github.com/madler/pigz)) 7. Barcode correction (custom script `./bin/correct_barcodes.py`) 8. Post-extraction QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), [`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`NanoComp`](https://github.com/wdecoster/nanocomp) and [`ToulligQC`](https://github.com/GenomiqueENS/toulligQC)) -9. Alignment ([`minimap2`](https://github.com/lh3/minimap2)) +9. Alignment to the genome, transcriptome, or both ([`minimap2`](https://github.com/lh3/minimap2)) 10. Post-alignment filtering of mapped reads and gathering mapping QC ([`SAMtools`](http://www.htslib.org/doc/samtools.html)) 11. Post-alignment QC in unfiltered BAM files ([`NanoComp`](https://github.com/wdecoster/nanocomp), [`RSeQC`](https://rseqc.sourceforge.net/)) 12. Barcode (BC) tagging with read quality, BC quality, UMI quality (custom script `./bin/tag_barcodes.py`) -13. UMI-based deduplication [`UMI-tools`](https://github.com/CGATOxford/UMI-tools) -14. Gene and transcript level matrices generation [`IsoQuant`](https://github.com/ablab/IsoQuant) +13. 
UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools)) +14. Gene- and transcript-level matrix generation with [`IsoQuant`](https://github.com/ablab/IsoQuant) and/or transcript-level matrix generation with [`oarfish`](https://github.com/COMBINE-lab/oarfish) 15. Preliminary matrix QC ([`Seurat`](https://github.com/satijalab/seurat)) 16. Compile QC for raw reads, trimmed reads, pre and post-extracted reads, mapping metrics and preliminary single-cell/nuclei QC ([`MultiQC`](http://multiqc.info/)) @@ -77,14 +77,15 @@ nextflow run nf-core/scnanoseq \ ``` > [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/scnanoseq/usage) and the [parameter documentation](https://nf-co.re/scnanoseq/parameters). ## Pipeline output -This pipeline produces feature-barcode matrices at both the gene and transcript level and can be configured to retain introns within the counts themselves. These feature-barcode matrices are able to be ingested directly by most packages used for downstream analyses such as `Seurat`. Additionally, the pipeline produces a number of quality control metrics to ensure that the samples processed meet expected metrics for single-cell/nuclei data. +This pipeline produces feature-barcode matrices as its main output. These matrices can be ingested directly by most packages used for downstream analyses, such as `Seurat`. Additionally, the pipeline produces a number of quality control metrics to confirm that the processed samples meet the metrics expected for single-cell/nuclei data. + +The pipeline provides two tools for producing these feature-barcode matrices, `IsoQuant` and `oarfish`, and users can choose to run either one or both. `IsoQuant` requires a genome FASTA as pipeline input and produces both gene- and transcript-level matrices. `oarfish` requires a transcriptome FASTA as pipeline input and produces only transcript-level matrices. To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/scnanoseq/results) tab on the nf-core website pipeline page. For more details about the full set of output files and reports, please refer to the @@ -115,17 +116,11 @@ process } } -//NOTE: reminder that params set in modules.config need to be copied over to a custom config process { withName: '.*:BLAZE' { - ext.args = { - [ - "--threads 30", - params.barcode_format == "10X_3v3" ? "--kit-version 3v3" : params.barcode_format == "10X_5v2" ?
"--kit-version 5v2" : "" - ].join(' ').trim() - } + cpus = 30 } } diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index 36e2841..dd13543 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,6 +1,7 @@ report_comment: > - This report has been generated by the nf-core/scnanoseq analysis pipeline. For information about how to interpret these results, please see the documentation. - + This report has been generated by the nf-core/scnanoseq + analysis pipeline. For information about how to interpret these results, please see the + documentation. report_section_order: "nf-core-scnanoseq-methods-description": order: -1000 @@ -61,8 +62,9 @@ custom_content: seurat_section: parent_id: seurat_section order: - - transcript_seurat_stats_module - - gene_seurat_stats_module + - isoquant_transcript_seurat_stats_module + - isoquant_gene_seurat_stats_module + - oarfish_transcript_seurat_stats_module custom_data: read_counts_module: @@ -73,30 +75,42 @@ custom_data: file_format: "csv" plot_type: "table" - gene_seurat_stats_module: + isoquant_gene_seurat_stats_module: + parent_id: seurat_section + parent_name: "Seurat Section" + parent_description: "Preliminary expression analysis summary completed with + Seurat using IsoQuant generated matrices. Note that these numbers are generated + without any filtering done on the dataset" + section_name: "IsoQuant Gene Seurat Stats" + file_format: "tsv" + plot_type: "table" + + isoquant_transcript_seurat_stats_module: parent_id: seurat_section parent_name: "Seurat Section" parent_description: "Preliminary expression analysis summary completed with - Seurat. Note that these numbers are generated + Seurat using IsoQuant generated matrices. Note that these numbers are generated without any filtering done on the dataset" - section_name: "Gene Seurat Stats" + section_name: "IsoQuant Transcript Seurat Stats" file_format: "tsv" plot_type: "table" - transcript_seurat_stats_module: + oarfish_transcript_seurat_stats_module: parent_id: seurat_section parent_name: "Seurat Section" parent_description: "Preliminary expression analysis summary completed with - Seurat. Note that these numbers are generated + Seurat using OARFISH generated matrices. 
Note that these numbers are generated without any filtering done on the dataset" - section_name: "Transcript Seurat Stats" + section_name: "OARFISH Transcript Seurat Stats" file_format: "tsv" plot_type: "table" sp: - gene_seurat_stats_module: - fn: "gene.*.tsv" - transcript_seurat_stats_module: - fn: "transcript.*.tsv" + isoquant_gene_seurat_stats_module: + fn: "isoquant_gene.tsv" + isoquant_transcript_seurat_stats_module: + fn: "isoquant_transcript.tsv" + oarfish_transcript_seurat_stats_module: + fn: "oarfish_transcript.tsv" read_counts_module: fn: "read_counts.csv" diff --git a/assets/schema_input.json b/assets/schema_input.json index 55e7095..3340046 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/scnanoseq/master/assets/schema_input.json", "title": "nf-core/scnanoseq pipeline - params.input schema", "description": "Schema for the file provided with params.input", diff --git a/assets/scnanoseq_tube_map.png b/assets/scnanoseq_tube_map.png index 8b80566..73b14c2 100644 Binary files a/assets/scnanoseq_tube_map.png and b/assets/scnanoseq_tube_map.png differ diff --git a/assets/scnanoseq_tube_map.svg b/assets/scnanoseq_tube_map.svg index 7d71130..5d28bca 100644 --- a/assets/scnanoseq_tube_map.svg +++ b/assets/scnanoseq_tube_map.svg @@ -2,9 +2,9 @@ HTMLHTMLfastqTSVTSVRead QCRead QCFastQC &Nanoplot &NanoPlot &Nanocomp &NanoComp &ToulligQCRead Filtering &Read Filtering &TrimmingNanofiltNanoFiltPost Trim Post Extract Read QCFastQC &Nanoplot &NanoPlot &Nanocomp &NanoComp &ToulligQCBarcode Barcode DetectionBLAZEPost Extract Post Trim Read QCFastQC &Nanoplot &NanoPlot &Nanocomp &NanoComp &ToulligQCBarcodeBarcodeExtractionExtractionAlignment Alignment StatsSamtools &SAMtools &RSeQC &NanocompNanoCompAlignment StatsSAMtools &RSeQC &NanoCompCell Counts QC MergeSeuratGene CellCount QCSeuratFinal ReportFinal ReportMultiQCFinal ReportMultiQCHTMLQC ReportTranscript QuantificationQuantification: oarfishGene QuantificationQuantification: IsoQuantFile inputFile inputFile OutputFile OutputBarcode Barcode TaggingTranscriptomeAlignmentMinimap2TranscriptomeAlignmentMinimap2GenomeAlignmentAlignmentMinimap2Minimap2Barcode GenomeCorrectionAlignmentMinimap2UMI UMI DeduplicationumitoolsUMI-toolsTranscript CellCount QCSeuratCell Feature QuantificationIsoquantIsoQuantBarcode CorrectionHTMLQC ReportLegendLegendHTMLQC ReportCell Feature QuantificationIsoQuantCell Feature QuantificationoarfishTXTMappingStatsBAMTXTMappingStatsBAMAlignmentFileBAMAlignmentFileGene-LevelBarcode-FeatureMatrixTranscript-LevelBarcode-FeatureMatrixTSVTSVGene-LevelBarcode-FeatureMatrixTranscript-LevelBarcode-FeatureMatrixTSVTranscript-LevelBarcode-FeatureMatrix + x="7.9847655" + y="181.98756" />Barcode CorrectionUMI DeduplicationUMI-toolsTranscript CellCount QCSeuratGene CellCount QCSeuratCell Counts QC MergeSeurat diff --git a/bin/generate_read_counts.sh b/bin/generate_read_counts.sh index 4c42399..3b6af98 100755 --- a/bin/generate_read_counts.sh +++ b/bin/generate_read_counts.sh @@ -9,6 +9,13 @@ get_fastqc_counts() } +get_nanoplot_counts() +{ + nanoplot_file=$1 + counts=$(grep 'Number of reads' $nanoplot_file | awk '{print $NF}' | cut -f1 -d'.' 
| sed 's/,//g') + echo $counts +} + output="" input="" @@ -30,47 +37,73 @@ data="" header="sample,base_fastq_counts,trimmed_read_counts,extracted_read_counts,corrected_read_counts" echo "$header" > $output -for sample_name in $(for file in $(readlink -f $input)/*.zip; do echo $file; done | cut -f1 -d'.' | sort -u) +for sample_name in $(for file in $(readlink -f $input)/*.tsv; do basename $file; done | cut -f1 -d'.' | sort -u) do + ############### + # INPUT_FILES # + ############### + raw_fastqc="${sample_name}.raw_fastqc.zip" + raw_nanoplot="${sample_name}.raw_NanoStats.txt" + trim_fastqc="${sample_name}.trimmed_fastqc.zip" + trim_nanoplot="${sample_name}.trimmed_NanoStats.txt" + extract_fastqc="${sample_name}.extracted_fastqc.zip" + extract_nanoplot="${sample_name}.extracted_NanoStats.txt" + correct_csv="${sample_name}.corrected_bc_umi.tsv" data="$(basename $sample_name)" - # RAW FASTQ COUNTS - + #################### + # RAW FASTQ COUNTS # + #################### if [[ -s "$raw_fastqc" ]] then fastqc_counts=$(get_fastqc_counts "$raw_fastqc") data="$data,$fastqc_counts" + elif [[ -s "$raw_nanoplot" ]] + then + nanoplot_counts=$(get_nanoplot_counts "$raw_nanoplot") + data="$data,$nanoplot_counts" else data="$data," fi - # TRIM COUNTS - + ############### + # TRIM COUNTS # + ############### if [[ -s "$trim_fastqc" ]] then trim_counts=$(get_fastqc_counts "$trim_fastqc") data="$data,$trim_counts" + elif [[ -s "$trim_nanoplot" ]] + then + nanoplot_counts=$(get_nanoplot_counts "$trim_nanoplot") + data="$data,$nanoplot_counts" else data="$data," fi - # PREEXTRACT COUNTS - - if [ -s "$extract_fastqc" ] + ##################### + # PREEXTRACT COUNTS # + ##################### + if [[ -s "$extract_fastqc" ]] then extract_counts=$(get_fastqc_counts "$extract_fastqc") data="$data,$extract_counts" + elif [[ -s "$extract_nanoplot" ]] + then + nanoplot_counts=$(get_nanoplot_counts "$extract_nanoplot") + data="$data,$nanoplot_counts" else data="$data," fi - # CORRECT COUNTS - - if [ -s $correct_csv ] + ################## + # CORRECT COUNTS # + ################## + if [[ -s $correct_csv ]] then correct_counts=$(cut -f6 $correct_csv | awk '{if ($0 != "") {print $0}}' | wc -l) data="$data,$correct_counts" diff --git a/bin/mtx_merge.py b/bin/mtx_merge.py new file mode 100755 index 0000000..c5f5467 --- /dev/null +++ b/bin/mtx_merge.py @@ -0,0 +1,91 @@ +#!/usr/bin/env python3 + +import argparse +import os +import pandas as pd + +LSUFFIX='_left' +RSUFFIX='_right' + +def parse_args(): + """ Parse commandline arguments """ + parser = argparse.ArgumentParser() + + parser.add_argument( + "-i", + "--in_dir", + default=None, + type=str, + required=True, + help="The input directory containing all matrices to merge" + ) + + parser.add_argument( + "-x", + "--in_extension", + default=".mtx", + type=str, + required=True, + help="The file extension for matrices to merge" + ) + + parser.add_argument( + "-o", + "--out_file", + default="out.mtx", + type=str, + required=False, + help="The name of the resulting matrix" + ) + + return parser.parse_args() + +def get_mtx_list(in_dir, mtx_ext): + mtx_list = [] + + for mtx in os.listdir(os.fsencode(in_dir)): + mtx_name = os.fsdecode(mtx) + + if mtx_name.endswith(mtx_ext): + mtx_list.append('/'.join([in_dir,mtx_name])) + + return mtx_list + +def merge_matrices(in_mtx): + final_mtx = None + for mtx in in_mtx: + print(mtx) + mtx_df = pd.read_csv(mtx, delimiter="\t", header=0, index_col=0).transpose() + + # This means the matrix is empty so we can skip it + if len(mtx_df.columns) <= 1 and 
'count' in mtx_df.columns: + continue + + if final_mtx is None: + final_mtx = mtx_df + + else: + final_mtx = final_mtx.join(mtx_df, how='outer', lsuffix=LSUFFIX, rsuffix=RSUFFIX) + + # We do expect duplicate feature names, we just need to combine them + if final_mtx.columns.str.contains(LSUFFIX).any(): + dupe_bc_cols = final_mtx.columns[final_mtx.columns.str.contains(LSUFFIX)].str.replace(LSUFFIX, '') + + # Iterate through all duplicated columns and sum them + for dupe_bc_col in dupe_bc_cols: + bc_cols = final_mtx.columns[final_mtx.columns.str.contains(dupe_bc_col)] + final_mtx[dupe_bc_col] = final_mtx[bc_cols].sum(axis=1) + final_mtx = final_mtx.drop(columns = bc_cols) + + return final_mtx.transpose().fillna(value=0.0) + +def main(): + args = parse_args() + + mtx_list = get_mtx_list(args.in_dir, args.in_extension) + + final_mtx = merge_matrices(mtx_list) + final_mtx.to_csv(args.out_file, sep='\t') + +if __name__ == '__main__': + main() diff --git a/bin/seurat_qc.R b/bin/seurat_qc.R index 18b13ee..900a2fa 100755 --- a/bin/seurat_qc.R +++ b/bin/seurat_qc.R @@ -57,6 +57,7 @@ plotSingleCellDensity <- function(input_obj, params_list <- list( make_option(c("-i", "--input_matrix" ), type="character", default=NULL , metavar="path" , help="Count file matrix where rows are genes and columns are cells/nuclei."), + make_option(c("-j", "--input_dir" ), type="character", default=NULL , metavar="path" , help="Directory containing matrix.mtx, genes.tsv (or features.tsv) , and barcodes.tsv."), make_option(c("-s", "--flagstat" ), type="character", default=NULL , metavar="path" , help="Flagstat file from samtools QC." ), make_option(c("-d", "--id" ), type="character", default="scnanoseq", metavar="string" , help="Project name for Seurat object." ), make_option(c("-o", "--outdir" ), type="character", default="./" , metavar="path" , help="Output directory." ), @@ -66,9 +67,9 @@ params_list <- list( opt_parser <- OptionParser(option_list=params_list) opt <- parse_args(opt_parser) -if (is.null(opt$input_matrix)) { +if (is.null(opt$input_matrix) && is.null(opt$input_dir)) { print_help(opt_parser) - stop("Please provide a single-cell/nuclei matrix.", call. = FALSE) + stop("Please provide either a single-cell/nuclei matrix or a directory containing a matrix.mtx, genes.tsv (or features.tsv) and barcodes.tsv.", call. 
= FALSE) } if (is.null(opt$flagstat)) { @@ -80,8 +81,26 @@ if (is.null(opt$flagstat)) { ### READ IN INPUTs ### ###################### +# Create the Seurat object +#NOTE: we do not perform any pre-filtering at this point + # cell or nuclei matrix (calling it cell for simplicity) -cell_bc_matrix <- read.table(opt$input_matrix, sep="\t", header = TRUE, row.names = 1) + +if (!is.null(opt$input_dir)) { + cell_bc_matrix <- Read10X(data.dir = opt$input_dir, + gene.column = 1, + cell.column = 2) + seurat_obj <- CreateSeuratObject(counts = cell_bc_matrix, + min.cells = 0, + min.features = 0, + project = opt$id) +} else { + cell_bc_matrix <- read.table(opt$input_matrix, sep="\t", header = TRUE, row.names = 1) + seurat_obj <- CreateSeuratObject(counts = cell_bc_matrix, + min.cells = 0, + min.features = 0, + project = opt$id) +} flagstat_lines <- readLines(opt$flagstat) @@ -95,16 +114,6 @@ index_nums <- grep("total", flagstat_lines) # This will parse out the total read count total_reads <- as.numeric(gsub("([0-9]+).*$", "\\1", flagstat_lines[index_nums])) -##################### -### SEURAT OBJECT ### -##################### - -# Create the Seurat object -#NOTE: we do not perform any pre-filtering at this point -seurat_obj <- CreateSeuratObject(counts = cell_bc_matrix, - min.cells = 0, - min.features = 0, - project = opt$id) ###################### ### GENERATE PLOTS ### diff --git a/conf/base.config b/conf/base.config index 546de6b..1532158 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,9 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -25,30 +25,30 @@ process { // adding in your local modules too. 
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 20.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 20.h * task.attempt } } withLabel:process_long { - time = { check_max( 60.h * task.attempt, 'time' ) } + time = { 60.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config new file mode 100644 index 0000000..b4034d8 --- /dev/null +++ b/conf/igenomes_ignored.config @@ -0,0 +1,9 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for iGenomes paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Empty genomes dictionary to use when igenomes is ignored. +---------------------------------------------------------------------------------------- +*/ + +params.genomes = [:] diff --git a/conf/modules.config b/conf/modules.config index cc4fc8c..3b1b357 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -69,6 +69,7 @@ if (!params.skip_qc && !params.skip_fastqc) { if (!params.skip_qc && !params.skip_nanoplot) { process { withName: '.*:FASTQC_NANOPLOT_PRE_TRIM:NANOPLOT' { + ext.prefix = { "${meta.id}.raw" } publishDir = [ path: { "${params.outdir}/${meta.id}/qc/nanoplot/pre_trim/" }, mode: params.publish_dir_mode, @@ -78,6 +79,7 @@ if (!params.skip_qc && !params.skip_nanoplot) { if (!params.skip_trimming) { withName: '.*:FASTQC_NANOPLOT_POST_TRIM:NANOPLOT' { + ext.prefix = { "${meta.id}.trimmed" } publishDir = [ path: { "${params.outdir}/${meta.id}/qc/nanoplot/post_trim/" }, mode: params.publish_dir_mode, @@ -87,6 +89,7 @@ if (!params.skip_qc && !params.skip_nanoplot) { } withName: '.*:FASTQC_NANOPLOT_POST_EXTRACT:NANOPLOT' { + ext.prefix = { "${meta.id}.extracted"} publishDir = [ path: { "${params.outdir}/${meta.id}/qc/nanoplot/post_extract/" }, mode: params.publish_dir_mode, @@ -109,17 +112,6 @@ if (!params.skip_qc && !params.skip_fastq_nanocomp) { } } -if (!params.skip_qc && !params.skip_bam_nanocomp) { - process { - withName: '.*:NANOCOMP_BAM' { - publishDir = [ - path: { "${params.outdir}/batch_qcs/nanocomp/bam" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } - ] - } - } -} // TOULLIGQC if (!params.skip_qc && !params.skip_toulligqc) { @@ -157,27 +149,6 @@ if (!params.skip_qc && !params.skip_toulligqc) { // SAMTOOLS if (!params.skip_qc){ - process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_MINIMAP:BAM_STATS_SAMTOOLS:.*' { - ext.prefix = { "${meta.id}.minimap" } - publishDir = [ - path: { "${params.outdir}/${meta.id}/qc/samtools/minimap" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - } - - process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_FILTERED:BAM_STATS_SAMTOOLS:.*' { - ext.prefix = { "${meta.id}.mapped_only" } - publishDir = [ - path: { "${params.outdir}/${meta.id}/qc/samtools/mapped_only" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - } process { withName:'.*:BAM_SORT_STATS_SAMTOOLS_CORRECTED:BAM_STATS_SAMTOOLS:.*' { @@ -190,35 +161,11 @@ if (!params.skip_qc){ } } - if (!params.skip_dedup){ - process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_DEDUP:BAM_STATS_SAMTOOLS:.*' { - ext.prefix = { "${meta.id}.dedup.sorted" } - publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/dedup" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - } - } } -if (!params.skip_qc && !params.skip_rseqc) { - - process { - withName:'.*:RSEQC_READDISTRIBUTION' { - publishDir = [ - path: { "${params.outdir}/${meta.id}/qc/rseqc" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - } -} // READ COUNTS -if (!params.skip_qc && !params.skip_fastqc) { +if (!params.skip_qc) { process { withName:'.*:READ_COUNTS' { @@ -237,34 +184,13 @@ if (!params.skip_qc && !params.skip_fastqc) { // PREPARE_REFERENCE_FILES process { - withName: '.*:PREPARE_REFERENCE_FILES:SAMTOOLS_FAIDX' { + withName: '.*:PREPARE_REFERENCE_FILES:.*' { publishDir = [ enabled: false ] } } -// MINIMAP2_INDEX -if (!params.skip_save_minimap2_index) { - process { - withName:'.*:MINIMAP2_INDEX' { - ext.args = { - [ - "-ax splice", - params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? "-ub" : "-un", - "-k${params.kmer_size}", - params.save_secondary_alignment == false ? "--secondary=no " : "--secondary=yes " - ].join(' ').trim() - } - publishDir = [ - path: { "${params.outdir}/references/minimap_index" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - } -} - process { withName: '.*:UCSC_GTFTOGENEPRED' { publishDir = [ @@ -342,9 +268,7 @@ if (params.split_amount > 0) { process { withName: '.*:PIGZ_COMPRESS' { publishDir = [ - path: { "${params.outdir}/${meta.id}/fastq/extracted" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + enabled: false ] } } @@ -390,15 +314,6 @@ if (!params.skip_trimming) { } } -// PREEXTRACT_FASTQ -process { - withName: '.*:PREEXTRACT_FASTQ' { - ext.prefix = { params.split_amount <= 0 ? "${meta.id}" : "${reads}".toString().replace('.fastq', '') } - publishDir = [ - enabled: false - ] - } -} /////////////////////// // BARCODE DETECTION // @@ -420,59 +335,51 @@ process { } } -/////////////// -// ALIGNMENT // -/////////////// - -// MINIMAP +// PREEXTRACT_FASTQ process { - withName:'.*:MINIMAP2_ALIGN' { - ext.args = { - [ - "--MD -ax splice", - params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? 
"-ub" : "-un", - "-k${params.kmer_size}", - params.save_secondary_alignment == false ? "--secondary=no " : "--secondary=yes " - ].join(' ').trim() - } + withName: '.*:PREEXTRACT_FASTQ' { + ext.prefix = { params.split_amount <= 0 ? "${meta.id}" : "${reads}".toString().replace('.fastq', '') } publishDir = [ enabled: false ] } } -//////////////////// -// BAM PROCESSING // -//////////////////// - -// SAMTOOLS_VIEW_FILTER +// CORRECT_BARCODES process { - withName:'.*:SAMTOOLS_VIEW_FILTER' { - ext.args = "-b -F 4" - ext.prefix = { "${meta.id}.mapped_only" } + withName: '.*:CORRECT_BARCODES' { + ext.prefix = { params.split_amount <= 0 ? "${meta.id}" : "${bc_info}".toString().replace('.extracted.putative_bc_umi.tsv', '') } publishDir = [ enabled: false ] } } -// SAMTOOLS_SORT +// TAG_BARCODES +process { + withName: '.*PROCESS_LONGREAD_SCRNA_GENOME.*:TAG_BARCODES' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/bam/barcode_tagged" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_MINIMAP:SAMTOOLS_SORT' { - ext.prefix = { "${meta.id}.sorted" } + withName: '.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:TAG_BARCODES' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/original" }, + path: { "${params.outdir}/${meta.id}/transcriptome/bam/barcode_tagged" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } +// SAMTOOLS_INDEX process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_SORT' { - ext.prefix = { "${meta.id}.mapped_only.sorted" } + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:SAMTOOLS_INDEX_TAGGED' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/mapped_only" }, + path: { "${params.outdir}/${meta.id}/genome/bam/barcode_tagged" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] @@ -480,52 +387,146 @@ process { } process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_CORRECTED:SAMTOOLS_SORT' { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:SAMTOOLS_INDEX_TAGGED' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/corrected" }, + path: { "${params.outdir}/${meta.id}/transcriptome/bam/barcode_tagged" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } +// SAMTOOLS_FLAGSTAT +process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:SAMTOOLS_FLAGSTAT_TAGGED' { + ext.prefix = { "${meta.id}.genome.tagged" } + publishDir = [ + enabled: false + ] + } +} process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_TAGGED:.*' { - ext.prefix = { "${meta.id}.sorted" } + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:SAMTOOLS_FLAGSTAT_TAGGED' { + ext.prefix = { "${meta.id}.transcriptome.tagged" } publishDir = [ enabled: false ] } } -if (!params.skip_dedup){ +///////////////////// +// ALIGN_LONGREADS // +///////////////////// + +// MINIMAP2_INDEX +if (!params.skip_save_minimap2_index) { process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_DEDUP:SAMTOOLS_SORT' { - ext.prefix = { "${meta.id}.dedup.sorted" } + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:MINIMAP2_INDEX' { + ext.args = { + [ + "-ax splice", + params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? "-ub" : "-un", + "-k${params.kmer_size}", + params.save_genome_secondary_alignment == false ? 
"--secondary=no " : "--secondary=yes " + ].join(' ').trim() + } publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/dedup" }, + path: { "${params.outdir}/references/genome/minimap_index" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } - process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_SPLIT:.*' { - ext.prefix = { "${meta.id}.sorted" } + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:MINIMAP2_INDEX' { + ext.args = { + [ + "-ax splice", + params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? "-ub" : "-un", + "-k${params.kmer_size}", + params.save_transcript_secondary_alignment == false ? "--secondary=no " : "--secondary=yes " + ].join(' ').trim() + } publishDir = [ - enabled: false + path: { "${params.outdir}/references/transcriptome/minimap_index" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } } -// SAMTOOLS_INDEX +// MINIMAP +process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:MINIMAP2_ALIGN' { + ext.args = { + [ + "--MD -ax splice", + params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? "-ub" : "-un", + "-k${params.kmer_size}", + params.save_genome_secondary_alignment == false ? "--secondary=no " : "--secondary=yes " + ].join(' ').trim() + } + publishDir = [ + enabled: false + ] + } +} + +process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:MINIMAP2_ALIGN' { + ext.args = { + [ + "--MD -ax map-ont --eqx -N 100", + params.stranded == "forward" ? "-uf" : params.stranded == "reverse" ? "-ub" : "-un", + "-k${params.kmer_size}", + params.save_transcript_secondary_alignment == false ? "--secondary=no " : "--secondary=yes ", + ].join(' ').trim() + } + publishDir = [ + enabled: false + ] + } +} + +// SAMTOOLS_VIEW +process { + withName:'.*:SAMTOOLS_VIEW' { + ext.args = "-b -F 4" + ext.prefix = { "${meta.id}.mapped_only" } + publishDir = [ + enabled: false + ] + } +} + +// SAMTOOLS_SORT +process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT' { + ext.prefix = { "${meta.id}.genome.sorted" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/bam/original" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} +process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT' { + ext.prefix = { "${meta.id}.transcript.sorted" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/bam/original" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_MINIMAP:SAMTOOLS_INDEX' { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_SORT' { + ext.prefix = { "${meta.id}.genome_mapped_only.sorted" } publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/original" }, + path: { "${params.outdir}/${meta.id}/genome/bam/mapped_only" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] @@ -533,19 +534,21 @@ process { } process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_INDEX' { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_SORT' { + ext.prefix = { "${meta.id}.transcript_mapped_only.sorted" } publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/mapped_only" }, + path: { "${params.outdir}/${meta.id}/transcriptome/bam/mapped_only" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } +// SAMTOOLS_INDEX process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_CORRECTED:SAMTOOLS_INDEX' { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/corrected" }, + path: { "${params.outdir}/${meta.id}/genome/bam/original" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] @@ -553,54 +556,178 @@ process { } process { - withName:'.*:BAM_SORT_STATS_SAMTOOLS_DEDUP:SAMTOOLS_INDEX' { - ext.prefix = { "${meta.id}.dedup.sorted" } + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/dedup" }, + path: { "${params.outdir}/${meta.id}/transcriptome/bam/original" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } -if (!params.skip_dedup){ +process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_INDEX' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/bam/mapped_only" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} + +process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS_FILTERED:SAMTOOLS_INDEX' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/bam/mapped_only" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} + +if (!params.skip_qc) { + + // SAMTOOLS FLAGSTAT/STAT/IDXSTAT process { - withName:'.*:SAMTOOLS_MERGE'{ + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.genome.minimap" } publishDir = [ - enabled: false + path: { "${params.outdir}/${meta.id}/genome/qc/samtools/minimap" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.transcriptome.minimap" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/qc/samtools/minimap" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS_FILTERED:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.genome.mapped_only" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/qc/samtools/mapped_only" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:BAM_SORT_STATS_SAMTOOLS_FILTERED:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.transcriptome.mapped_only" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/qc/samtools/mapped_only" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + // RSEQC + if (!params.skip_rseqc) { + process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:RSEQC_READDISTRIBUTION' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/qc/rseqc" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:RSEQC_READDISTRIBUTION' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/qc/rseqc" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + } + + // NANOCOMP + if (!params.skip_bam_nanocomp) { + process { + withName: '.*PROCESS_LONGREAD_SCRNA_GENOME.*:ALIGN_LONGREADS:NANOCOMP' { + ext.prefix = { "${meta.id}.genome" } + publishDir = [ + path: { "${params.outdir}/batch_qcs/genome/nanocomp/bam" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName: '.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:ALIGN_LONGREADS:NANOCOMP' { + ext.prefix = { "${meta.id}.transcriptome" } + publishDir = [ + path: { "${params.outdir}/batch_qcs/transcriptome/nanocomp/bam" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + } } -//////////////////////// -// BARCODE CORRECTION // -//////////////////////// +//////////////////// +// BAM PROCESSING // +//////////////////// + + +// SAMTOOLS SORT -// TAG_BARCODES process { - withName: '.*:TAG_BARCODES' { + withName:'.*:BAM_SORT_STATS_SAMTOOLS_CORRECTED:SAMTOOLS_SORT' { publishDir = [ - path: { "${params.outdir}/${meta.id}/bam/barcode_tagged" }, + path: { "${params.outdir}/${meta.id}/bam/corrected" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } -// CORRECT_BARCODES +if (!params.skip_dedup) { + process { + withName:'.*:BAM_SORT_STATS_SAMTOOLS_MERGED:SAMTOOLS_SORT' { + ext.prefix = { "${meta.id}.merged.sorted" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/bam/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } +} + +// SAMTOOLS_INDEX process { - withName: '.*:CORRECT_BARCODES' { - ext.prefix = { params.split_amount <= 0 ? "${meta.id}" : "${bc_info}".toString().replace('.extracted.putative_bc_umi.tsv', '') } + withName:'.*:BAM_SORT_STATS_SAMTOOLS_CORRECTED:SAMTOOLS_INDEX' { publishDir = [ - enabled: false + path: { "${params.outdir}/${meta.id}/bam/corrected" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] } } -/////////////////////// -// UMI DEDUPLICATION // -/////////////////////// + + + +///////////////////////////// +// UMI DEDUPLICATION SPLIT // +///////////////////////////// if (!params.skip_dedup){ process { @@ -617,23 +744,135 @@ if (!params.skip_dedup){ } process { - withName: '.*:UMITOOLS_DEDUP' { + withName: '.*:SAMTOOLS_INDEX_SPLIT' { + publishDir = [ + enabled: false + ] + } + } + + process { + withName: '.*PROCESS_LONGREAD_SCRNA_GENOME.*:UMITOOLS_DEDUP' { + ext.prefix = { "${meta.id}.genome.umi_dedup" } + ext.args = { + [ + '--per-cell' + ].join(' ').trim() + } + publishDir = [ + enabled: false + ] + } + } + + process { + withName: '.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:UMITOOLS_DEDUP' { + ext.prefix = { "${meta.id}.transcriptome.umi_dedup" } ext.args = { [ '--per-cell' ].join(' ').trim() } - ext.prefix = { "${meta.id}.dedup" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/bam/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:SAMTOOLS_INDEX_DEDUP' { + ext.prefix = { "${meta.id}.genome.dedup.sorted" } publishDir = [ enabled: false ] } } + + process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:SAMTOOLS_INDEX_DEDUP' { + ext.prefix = { "${meta.id}.genome.dedup.sorted" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/bam/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName:'.*:SAMTOOLS_INDEX_MERGED' { + ext.prefix = { "${meta.id}.merged.sorted" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/bam/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + process { + withName:'.*:SAMTOOLS_MERGE'{ + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/bam/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + + if (!params.skip_qc) { + process { + withName:'.*PROCESS_LONGREAD_SCRNA_GENOME.*:UMITOOLS_DEDUP_SPLIT:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.genome.umi_dedup" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/qc/samtools/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } + process { + withName:'.*PROCESS_LONGREAD_SCRNA_TRANSCRIPT.*:UMITOOLS_DEDUP_SPLIT:BAM_STATS_SAMTOOLS:.*' { + ext.prefix = { "${meta.id}.transcriptome.umi_dedup" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/qc/samtools/dedup" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + } + } +} + +///////////////////////////// +// QUANTIFY SCRNA ISOQUANT // +///////////////////////////// + +process { + withName: '.*:QUANTIFY_SCRNA_ISOQUANT:SPLIT_FASTA' { + publishDir = [ + enabled: false + ] + } +} + +process { + withName: '.*:QUANTIFY_SCRNA_ISOQUANT:SAMTOOLS_FAIDX_SPLIT' { + publishDir = [ + enabled: false + ] + } } -////////////// -// ISOQUANT // -////////////// +process { + withName: '.*:QUANTIFY_SCRNA_ISOQUANT:SPLIT_GTF' { + publishDir = [ + enabled: false + ] + } +} process { withName: '.*:ISOQUANT' { @@ -648,22 +887,38 @@ process { ].join(' ').trim() } publishDir = [ - path: { "${params.outdir}/${meta.id}/isoquant" }, + enabled: false + ] + } +} + +process { + withName: '.*:MERGE_MTX_GENE' { + ext.prefix = { "${meta.id}.gene" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/isoquant/feature_bc_mtx" }, mode: params.publish_dir_mode, - saveAs: {filename -> filename.equals('versions.yml') ? null: filename } + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } } -/////////////// -// SEURAT_QC // -/////////////// +process { + withName: '.*:MERGE_MTX_TRANSCRIPT' { + ext.prefix = { "${meta.id}.transcript" } + publishDir = [ + path: { "${params.outdir}/${meta.id}/genome/isoquant/feature_bc_mtx" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } +} if (!params.skip_qc && !params.skip_seurat) { process { - withName: '.*:SEURAT_GENE' { + withName: '.*QUANTIFY_SCRNA_ISOQUANT:QC_SCRNA_GENE:SEURAT' { publishDir = [ - path: { "${params.outdir}/${meta.id}/qc/seurat/gene" }, + path: { "${params.outdir}/${meta.id}/genome/qc/seurat_isoquant/gene" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] @@ -671,9 +926,9 @@ if (!params.skip_qc && !params.skip_seurat) { } process { - withName: '.*:SEURAT_TRANSCRIPT' { + withName: '.*QUANTIFY_SCRNA_ISOQUANT:QC_SCRNA_TRANSCRIPT:SEURAT' { publishDir = [ - path: { "${params.outdir}/${meta.id}/qc/seurat/transcript" }, + path: { "${params.outdir}/${meta.id}/genome/qc/seurat_isoquant/transcript" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] @@ -681,17 +936,71 @@ if (!params.skip_qc && !params.skip_seurat) { } process { - withName: '.*:COMBINE_SEURAT_STATS_GENE' { - ext.args = "-o gene.corrected.tsv -f gene" + withName: '.*:QUANTIFY_SCRNA_ISOQUANT:QC_SCRNA_GENE:COMBINE_SEURAT_STATS' { + ext.args = "-o isoquant_gene.tsv -f gene" + publishDir = [ + enabled: false + ] + } + } + + process { + withName: '.*:QUANTIFY_SCRNA_ISOQUANT:QC_SCRNA_TRANSCRIPT:COMBINE_SEURAT_STATS' { + ext.args = "-o isoquant_transcript.tsv -f transcript" publishDir = [ enabled: false ] } } +} + +///////////////////////////// +// QUANTIFY SCRNA OARFISH // +///////////////////////////// + +process { + withName:'.*:QUANTIFY_SCRNA_OARFISH:SAMTOOLS_SORT' { + ext.args = { + [ + "-t CB" + ].join(' ').trim() + } + ext.prefix = { "${meta.id}.bc_sort" } + publishDir = [ + enabled: false + ] + } +} + +process { + withName: '.*:OARFISH' { + ext.args = { + [ + "--single-cell --model-coverage --filter-group no-filters" + ].join(' ').trim() + } + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/oarfish" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } +} + +if (!params.skip_qc && !params.skip_seurat) { + process { + withName: '.*QUANTIFY_SCRNA_OARFISH:QC_SCRNA:SEURAT' { + publishDir = [ + path: { "${params.outdir}/${meta.id}/transcriptome/qc/seurat_oarfish/" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + } process { - withName: '.*:COMBINE_SEURAT_STATS_TRANSCRIPT' { - ext.args = "-o transcript.corrected.tsv -f transcript" + withName: '.*:QUANTIFY_SCRNA_OARFISH:QC_SCRNA:COMBINE_SEURAT_STATS' { + ext.args = "-o oarfish_transcript.tsv -f transcript" publishDir = [ enabled: false ] diff --git a/conf/test.config b/conf/test.config index 6539ed9..96131ce 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,23 +10,29 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data input = 'https://raw.githubusercontent.com/nf-core/test-datasets/scnanoseq/samplesheet/samplesheet_test.csv' // Genome references - fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/scnanoseq/reference/chr21.fa" + genome_fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/scnanoseq/reference/chr21.fa" gtf = "https://raw.githubusercontent.com/nf-core/test-datasets/scnanoseq/reference/chr21.gtf" // Barcode options barcode_format = "10X_3v3" + // Analysis options + quantifier = "isoquant" + } diff --git a/conf/test_full.config b/conf/test_full.config index 0ed950c..026fc5f 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -18,12 +18,14 @@ params { input = "https://raw.githubusercontent.com/U-BDS/test-datasets/scnanoseq/samplesheet/samplesheet_full.csv" // Genome references - fasta = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/GRCh38.primary_assembly.genome.fa.gz" + genome_fasta = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/GRCh38.primary_assembly.genome.fa.gz" + transcript_fasta = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.transcripts.fa.gz" gtf = "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.annotation.gtf.gz" // Barcode options barcode_format = "10X_3v3" split_amount = 500000 + quantifier = "isoquant,oarfish" } diff --git a/docs/output.md b/docs/output.md index 9ddbf0a..35b233b 100644 --- a/docs/output.md +++ b/docs/output.md @@ -19,10 +19,10 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [Alignment Post-processing](#alignment-post-processing) - [Samtools](#samtools) - Sort and index alignments and make alignment qc - [Barcode Tagging](#barcode-tagging) - Barcode tagging with quality metrics and barcode information - - [Barcode Correction](#barcode-correction) - Barcode whitelist correction - [UMI Deduplication](#umi-deduplication) - UMI-based deduplication - [Feature-Barcode Quantification](#feature-barcode-quantification) - [IsoQuant](#isoquant) - Feature-barcode quantification (gene and transcript level) + - [oarfish](#oarfish) - Feature-barcode quantification (transcript-level only) - [Seurat](#seurat) - Feature-barcode matrix QC - [Other steps](#other-steps) - [UCSC](#ucsc) - Annotation 
BED file @@ -82,10 +82,16 @@ The knee plot (an example is listed above) that is provided by BLAZE shows all b Output files - `/` - - `bam/` - - `original/` - - `*.sorted.bam` : The mapped and sorted bam. - - `*.sorted.bam.bai` : The bam index for the mapped and sorted bam. + - `genome` + - `bam/` + - `original/` + - `*.sorted.bam` : The genome mapped and sorted bam. + - `*.sorted.bam.bai` : The bam index for the genome mapped and sorted bam. + - `transcriptome` + - `bam/` + - `original/` + - `*.sorted.bam` : The transcriptome mapped and sorted bam. + - `*.sorted.bam.bai` : The bam index for the transcriptome mapped and sorted bam. @@ -99,28 +105,44 @@ The knee plot (an example is listed above) that is provided by BLAZE shows all b Output files - `/` - - `bam/` - - `mapped_only/` - - `*.sorted.bam` : The bam contaning only reads that were able to be mapped. - - `*.sorted.bam.bai` : The bam index for the bam containing only reads that were able to be mapped. - - `qc/` - - `samtools/` - - `minimap/` - - `*.minimap.flagstat` : The flagstat file for the bam obtained from minimap. - - `*.minimap.idxstats` : The idxstats file for the bam obtained from minimap. - - `*.minimap.stats` : The stats file for the bam obtained from minimap. + - `genome/` + - `bam/` - `mapped_only/` - - `*.mapped_only.flagstat` : The flagstat file for the bam containing only mapped reads. - - `*.mapped_only.idxstats` : The idxstats file for the bam containing only mapped reads. - - `*.mapped_only.stats` : The stats file for the bam containing only mapped reads. - - `corrected/` - - `*.corrected.flagstat` : The flagstat file for the bam containing corrected barcodes. - - `*.corrected.idxstats` : The idxstat file for the bam containing corrected barcodes. - - `*.corrected.stats` : The stat file for the bam containing corrected barcodes. - - `dedup/` - - `*.dedup.flagstat` : The flagstat file for the bam containing deduplicated umis. - - `*.dedup.idxstats` : The idxstats file for the bam containing deduplicated umis. - - `*.dedup.stats` : The stats file for the bam containing deduplicated umis. + - `*.sorted.bam` : The genome aligned bam contaning only reads that were able to be mapped. + - `*.sorted.bam.bai` : The genome aligned bam index for the bam containing only reads that were able to be mapped. + - `qc/` + - `samtools/` + - `minimap/` + - `*.minimap.flagstat` : The flagstat file for the genome aligned bam obtained from minimap. + - `*.minimap.idxstats` : The idxstats file for the genome aligned bam obtained from minimap. + - `*.minimap.stats` : The stats file for the genome aligned bam obtained from minimap. + - `mapped_only/` + - `*.mapped_only.flagstat` : The flagstat file for the genome aligned bam containing only mapped reads. + - `*.mapped_only.idxstats` : The idxstats file for the genome aligned bam containing only mapped reads. + - `*.mapped_only.stats` : The stats file for the genome aligned bam containing only mapped reads. + - `dedup/` + - `*.dedup.flagstat` : The flagstat file for the genome aligned bam containing deduplicated umis. + - `*.dedup.idxstats` : The idxstats file for the genome aligned bam containing deduplicated umis. + - `*.dedup.stats` : The stats file for the genome aligned bam containing deduplicated umis. + - `transcriptome/` + - `bam/` + - `mapped_only/` + - `*.sorted.bam` : The transcriptome aligned bam contaning only reads that were able to be mapped. + - `*.sorted.bam.bai` : The transcriptome aligned bam index for the bam containing only reads that were able to be mapped. 
+ - `qc/` + - `samtools/` + - `minimap/` + - `*.minimap.flagstat` : The flagstat file for the transcriptome aligned bam obtained from minimap. + - `*.minimap.idxstats` : The idxstats file for the transcriptome aligned bam obtained from minimap. + - `*.minimap.stats` : The stats file for the transcriptome aligned bam obtained from minimap. + - `mapped_only/` + - `*.mapped_only.flagstat` : The flagstat file for the transcriptome aligned bam containing only mapped reads. + - `*.mapped_only.idxstats` : The idxstats file for the transcriptome aligned bam containing only mapped reads. + - `*.mapped_only.stats` : The stats file for the transcriptome aligned bam containing only mapped reads. + - `dedup/` + - `*.dedup.flagstat` : The flagstat file for the transcriptome aligned bam containing deduplicated umis. + - `*.dedup.idxstats` : The idxstats file for the transcriptome aligned bam containing deduplicated umis. + - `*.dedup.stats` : The stats file for the transcriptome aligned bam containing deduplicated umis. @@ -135,9 +157,14 @@ The knee plot (an example is listed above) that is provided by BLAZE shows all b Output files - `/` - - `bam/` - - `barcode_tagged/` - - `*.tagged.bam` : The bam containing tagged barcode and UMI metadata. + - `genome/` + - `bam/` + - `barcode_tagged/` + - `*.tagged.bam` : The genome aligned bam containing tagged barcode and UMI metadata. + - `transcriptome/` + - `bam/` + - `barcode_tagged/` + - `*.tagged.bam` : The transcriptome aligned bam containing tagged barcode and UMI metadata. @@ -152,51 +179,67 @@ UMI quality tag = "UY" Please see [Barcode Correction](#barcode-correction) below for metadata added post-correction. -### Barcode Correction +### UMI Deduplication
Output files - `/` - - `bam/` - - `corrected/` - - `*.corrected.bam` : The bam containing corrected barcodes. - - `*.corected.bam.bai` : The bam index for the bam containing corrected barcodes. + - `genome/` + - `bam/` + - `dedup/` + - `*.dedup.bam` : The genome aligned bam containing corrected barcodes and deduplicated umis. + - `*.dedup.bam.bai` : The genome aligned bam index for the bam containing corrected barcodes and deduplicated umis. + - `transcriptome/` + - `bam/` + - `dedup/` + - `*.dedup.bam` : The transcriptome aligned bam containing corrected barcodes and deduplicated umis. + - `*.dedup.bam.bai` : The transcriptome aligned bam index for the bam containing corrected barcodes and deduplicated umis.
-Barcode correction is a custom script that uses the whitelist generated by BLAZE in order to correct barcodes that are not on the whitelist into a whitelisted barcode. During this step, an additional BAM tag is added, `CB`, to indicate a barcode sequence that is error-corected.
+[UMI-Tools](https://umi-tools.readthedocs.io/en/latest/reference/dedup.html) deduplicates reads based on the mapping co-ordinate and the UMI attached to the read. The identification of duplicate reads is performed in an error-aware manner by building networks of related UMIs.

-### UMI Deduplication
+Users should note that `oarfish` requires its input reads to be deduplicated, so the `skip_dedup` option is only applicable to `IsoQuant`. By default, `scnanoseq` performs deduplication for IsoQuant unless `skip_dedup` is explicitly enabled, while deduplication is always performed for `oarfish` quantification.
+
+## Feature-Barcode Quantification
+
+### IsoQuant
Output files - `/` - - `bam/` - - `dedup/` - - `*.dedup.bam` : The bam containing corrected barcodes and deduplicated umis. - - `*.dedup.bam.bai` : The bam index for the bam containing corrected barcodes and deduplicated umis. + - `genome/` + - `isoquant/` + - `*.gene_counts.tsv` : The feature-barcode matrix from gene quantification. + - `*.transcript_counts.tsv` : The feature-barcode matrix from transcript quantification.
-[UMI-Tools](https://umi-tools.readthedocs.io/en/latest/reference/dedup.html) deduplicate reads based on the mapping co-ordinate and the UMI attached to the read. The identification of duplicate reads is performed in an error-aware manner by building networks of related UMIs
+[IsoQuant](https://github.com/ablab/IsoQuant) is a tool for the genome-based analysis of long RNA reads, such as those from PacBio or Oxford Nanopore. IsoQuant can reconstruct and quantify transcript models with high precision and decent recall. If a reference annotation is given, IsoQuant also assigns reads to the annotated isoforms based on their intron and exon structure. IsoQuant further performs annotated gene, isoform, exon and intron quantification. The outputs of IsoQuant can be important for downstream analysis with tools specialized in single-cell/nuclei analysis (e.g. `Seurat`).

-## Feature-Barcode Quantification
+To assist with the performance of IsoQuant, the inputs are split by chromosome to add a further degree of parallelization.

-### IsoQuant
+It should also be noted that IsoQuant can only accurately perform quantification on a **genome**-aligned bam, and will produce both gene- and transcript-level matrices.
+
+### oarfish
Output files - `/` - - `isoquant/` - - `*.gene_counts.tsv` : The feature-barcode matrix from gene quantification. - - `*.transcript_counts.tsv` : The feature-barcode matrix from transcript quantification. + - `transcriptome/` + - `oarfish/` + - `barcodes.tsv.gz` + - `features.tsv.gz` + - `matrix.mtx.gz`
-[IsoQuant](https://github.com/ablab/IsoQuant) is a tool for the genome-based analysis of long RNA reads, such as PacBio or Oxford Nanopores. IsoQuant allows to reconstruct and quantify transcript models with high precision and decent recall. If the reference annotation is given, IsoQuant also assigns reads to the annotated isoforms based on their intron and exon structure. IsoQuant further performs annotated gene, isoform, exon and intron quantification. The outputs of IsoQuant can be important for downstream analysis with tools specialized in single-cell/nuclei analysis (e.g.: `Seurat`). +[oarfish](https://github.com/COMBINE-lab/oarfish) is a program, written in Rust (https://www.rust-lang.org/), for quantifying transcript-level expression from long-read (i.e. Oxford nanopore cDNA and direct RNA and PacBio) sequencing technologies. oarfish requires a sample of sequencing reads aligned to the transcriptome (currntly not to the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm. + +It should also be noted that oarfish can only accurately perform quantification on a **transcript** aligned bam, and will only produce transcript level matrices. It's also recommended to ensure that the `--save_transcript_secondary_alignment` is enabled to produce the most accurate oarfish results (true by default for `oarfish` quantification). Notably, this can lead to much higher number of reads reported as aligned, however, this is expected behavior when secondary aligments are included in the analysis. ### Seurat @@ -204,19 +247,22 @@ Barcode correction is a custom script that uses the whitelist generated by BLAZE Output files - `/` - - `qc/` - - `gene/` - - `*.csv`: A file containing statistics about the cell-read distribution for genes. - - `*.png`: A series of qc images to determine the quality of the gene quantification. - - `transcript/` - - `*.csv`: A file containing statistics about the cell-read distribution for transcript. - - `*.png`: A series of qc images to determine the quality of the transcript quantification. + - `genome/` + - `qc/` + - `gene/` + - `*.csv`: A file containing statistics about the isoquant generated cell-read distribution for genes. + - `*.png`: A series of qc images to determine the quality of the isoquant generated gene quantification. + - `transcript/` + - `*.csv`: A file containing statistics about the isoquant generated cell-read distribution for transcript. + - `*.png`: A series of qc images to determine the quality of the isoquant generated transcript quantification. + - `transcriptome/` + - `qc/` + - `transcript/` + - `*.csv`: A file containing statistics about the oarfish generated cell-read distribution for transcript. + - `*.png`: A series of qc images to determine the quality of the oarfish generated transcript quantification. -![MultiQC - seurat](images/seurat.png) -_High level statistics are provided in the MultiQC report, as show in this image. These provide an overview of the quality of the data in order to assess if the results are suitable for tertiary analysis._ - [Seurat](https://satijalab.org/seurat/) is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data. ## Other steps @@ -269,11 +315,25 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m - `batch_qcs/` - `nanocomp/` - - `fastq/` and `bam/` + - `fastq/` - `NanoComp_*.log`: This is the log file detailing the nanocomp run. 
- `NanoComp-report.html` - This is browser-viewable report that contains all the figures in a single location. - `*.html`: Nanocomp outputs all the figures in the report as individual files that can be inspected separately. - `NanoStats.txt`: This file contains quality control statistics about the dataset. + - `genome` + - `nanocomp/` + - `bam/` + - `NanoComp_*.log`: This is the log file detailing the nanocomp run. + - `NanoComp-report.html` - This is browser-viewable report that contains all the figures in a single location. + - `*.html`: Nanocomp outputs all the figures in the report as individual files that can be inspected separately. + - `NanoStats.txt`: This file contains quality control statistics about the dataset. + - `transcriptome` + - `nanocomp/` + - `bam/` + - `NanoComp_*.log`: This is the log file detailing the nanocomp run. + - `NanoComp-report.html` - This is browser-viewable report that contains all the figures in a single location. + - `*.html`: Nanocomp outputs all the figures in the report as individual files that can be inspected separately. + - `NanoStats.txt`: This file contains quality control statistics about the dataset. @@ -356,6 +416,8 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m This is a custom script written using BASH scripting. Its purpose is to report the amount of reads that are filtered out at steps in the pipeline that will result in filtered reads, such as barcode detection, barcode correction, alignment, etc. Elevated levels of filtering can be indicative of quality concerns. +For performance, this step parses the read counts from the output of either FastQC or NanoPlot rather than computing it. If the options `--skip_fastqc` and `--skip_nanoplot` or `--skip_qc` is used, this file will not be produced. + ### MultiQC
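The parsing described above can also be reproduced by hand when debugging a missing count. A minimal sketch, assuming the standard FastQC and NanoPlot output names (`*_fastqc.zip`, `NanoStats.txt`) rather than the pipeline's own helper script:

```bash
# Illustrative only: pull a total read count from standard QC outputs.
# File names are assumptions; the pipeline's read_counts step may parse them differently.

# FastQC records "Total Sequences" in fastqc_data.txt inside each *_fastqc.zip
unzip -p sample_fastqc.zip '*/fastqc_data.txt' \
    | awk -F'\t' '/^Total Sequences/ {print $2}'

# NanoPlot reports the read count in NanoStats.txt
grep -i 'number of reads' NanoStats.txt | awk '{print $NF}'
```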
diff --git a/docs/usage.md b/docs/usage.md index 0543091..d8ca278 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -57,12 +57,16 @@ The typical command for running the pipeline is as follows: nextflow run nf-core/scnanoseq \ --input ./samplesheet.csv \ --outdir ./results \ - --genome /path/to/genome.fa \ - --gtf /path/to/genome.gtf \ + --genome_fasta /path/to/genome.fa \ + --transcript_fasta /path/to/transcriptome.fa \ + --gtf /path/to/file.gtf \ + --quantifier "isoquant|oarfish|isoquant,oarfish" \ --barcode_format 10X_3v3 \ -profile ``` +Please note that while the above command specifies both transcriptome and genome fasta files, only one is needed for the pipeline and is dependent on which quantifier you wish to use. + Note that the pipeline will create the following files in your working directory: ```bash @@ -91,8 +95,10 @@ with ```yaml title="params.yaml" input: "./samplesheet.csv" outdir: "./results/" -fasta: "/path/to/genome.fa" -gtf: "/path/to/genome.gtf" +genome_fasta: "/path/to/genome.fa" +transcript_fasta: "/path/to/transcript.fa" +gtf: "/path/to/file.gtf" +quantifier: "isoquant|oarfish|isoquant,oarfish" barcode_format: "10X_3v3" <...> ``` @@ -202,14 +208,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -## Azure Resource Requests - -To be used with the `azurebatch` profile by specifying the `-profile azurebatch`. -We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required. - -Note that the choice of VM size depends on your quota and the overall workload during the analysis. -For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). - ## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. 
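For long runs it is usually easiest to launch the pipeline detached from the terminal. A minimal sketch using Nextflow's `-bg` flag, reusing the parameters from the typical command above (paths and profile are placeholders, and the log file name is arbitrary):

```bash
# Launch nf-core/scnanoseq in the background; -bg detaches Nextflow from the
# terminal so the run keeps going if the session is closed.
nextflow run nf-core/scnanoseq \
    --input ./samplesheet.csv \
    --outdir ./results \
    --genome_fasta /path/to/genome.fa \
    --gtf /path/to/file.gtf \
    --quantifier isoquant \
    --barcode_format 10X_3v3 \
    -profile docker \
    -bg > scnanoseq.log 2>&1

# Alternatively, keep a foreground run alive in a terminal multiplexer:
#   screen -S scnanoseq    # or: tmux new -s scnanoseq
```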
diff --git a/main.nf b/main.nf index 814ee12..37f3018 100644 --- a/main.nf +++ b/main.nf @@ -9,8 +9,6 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS @@ -20,17 +18,8 @@ nextflow.enable.dsl = 2 include { SCNANOSEQ } from './workflows/scnanoseq' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_scnanoseq_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_scnanoseq_pipeline' - include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_scnanoseq_pipeline' -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - GENOME PARAMETER VALUES -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -params.fasta = getGenomeAttribute('fasta') - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NAMED WORKFLOWS FOR PIPELINE @@ -52,10 +41,8 @@ workflow NFCORE_SCNANOSEQ { SCNANOSEQ ( samplesheet ) - emit: multiqc_report = SCNANOSEQ.out.multiqc_report // channel: /path/to/multiqc_report.html - } /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -66,13 +53,11 @@ workflow NFCORE_SCNANOSEQ { workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, @@ -86,7 +71,6 @@ workflow { NFCORE_SCNANOSEQ ( PIPELINE_INITIALISATION.out.samplesheet ) - // // SUBWORKFLOW: Run completion tasks // diff --git a/modules.json b/modules.json index 7dcf8ef..7311a0d 100644 --- a/modules.json +++ b/modules.json @@ -27,7 +27,7 @@ }, "fastqc": { "branch": "master", - "git_sha": "f4ae1d942bd50c5c0b9bd2de1393ce38315ba57c", + "git_sha": "dc94b6ee04a05ddb9f7ae050712ff30a13149164", "installed_by": ["modules"] }, "minimap2/align": { @@ -43,7 +43,7 @@ }, "multiqc": { "branch": "master", - "git_sha": "7c316cae26baf55e0add993bed2b0c9f7105c653", + "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d", "installed_by": ["modules"] }, "nanocomp": { @@ -141,17 +141,17 @@ }, "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "1b89f75f1aa2021ec3360d0deccd0f6e97240551", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/local/blaze.nf b/modules/local/blaze.nf index b1a0a18..fe34848 100644 --- a/modules/local/blaze.nf +++ b/modules/local/blaze.nf @@ -3,7 +3,7 @@ process BLAZE { label 'process_medium' label 'process_long' - conda "atrull314::fast_edit_distance=1.2.1 conda-forge::matplotlib=3.8.4 conda-forge::biopython=1.83 conda-forge::pandas=2.2.2 conda-forge::numpy=2.0.0rc2 conda-forge::tqdm=4.66.4" + conda "atrull314::fast_edit_distance=1.2.1 conda-forge::matplotlib=3.8.4 conda-forge::biopython=1.83 conda-forge::pandas=2.2.2 conda-forge::numpy=2.0.2 conda-forge::tqdm=4.66.4" container "${ 
workflow.containerEngine == 'singularity' ? 'docker://agtrull314/blaze:2.2.0' : @@ -24,9 +24,10 @@ process BLAZE { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" - def VERSION = '2.2.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping BLAZE code + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + // WARN: Version information not provided by tool on CLI. Please update this string when upgrading BLAZE code + def VERSION = '2.2.0' def cell_count = "${meta.cell_count}" """ @@ -52,4 +53,19 @@ process BLAZE { blaze: $VERSION END_VERSIONS """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def VERSION = '2.2.0' + """ + touch ${prefix}.putative_bc.no_header.csv + touch ${prefix}.whitelist.csv + touch ${prefix}.bc_count.txt + touch ${prefix}.knee_plot.png + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + blaze: $VERSION + END_VERSIONS + """ } diff --git a/modules/local/combine_seurat_stats.nf b/modules/local/combine_seurat_stats.nf index aeb5874..9936942 100644 --- a/modules/local/combine_seurat_stats.nf +++ b/modules/local/combine_seurat_stats.nf @@ -29,4 +29,14 @@ process COMBINE_SEURAT_STATS { cat: \$(echo \$(cat --version) | sed 's/^.*cat (GNU coreutils) //; s/ .*//') END_VERSIONS """ + + stub: + """ + touch combined_seurat.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version) | sed 's/^.*cat (GNU coreutils) //; s/ .*//') + END_VERSIONS + """ } diff --git a/modules/local/correct_barcodes.nf b/modules/local/correct_barcodes.nf index ca15456..65aa370 100644 --- a/modules/local/correct_barcodes.nf +++ b/modules/local/correct_barcodes.nf @@ -18,7 +18,7 @@ process CORRECT_BARCODES { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ @@ -35,4 +35,15 @@ process CORRECT_BARCODES { python: \$(python --version | sed 's/Python //g') END_VERSIONS """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.corrected_bc_umi.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ } diff --git a/modules/local/isoquant.nf b/modules/local/isoquant.nf index 5317a4a..b3e6ebb 100644 --- a/modules/local/isoquant.nf +++ b/modules/local/isoquant.nf @@ -1,17 +1,14 @@ process ISOQUANT { tag "$meta.id" - label 'process_high' + label 'process_medium' - conda "bioconda::isoquant=3.5.0" + conda "bioconda::isoquant=3.6.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/isoquant:3.5.0--hdfd78af_0' : - 'biocontainers/isoquant:3.5.0--hdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/isoquant:3.6.1--hdfd78af_0' : + 'biocontainers/isoquant:3.6.1--hdfd78af_0' }" input: - tuple val(meta), path(bam), path(bai) - tuple val(meta_gtf), path(gtf) - tuple val(meta_fa), path(fasta) - tuple val(meta_fai), path(fai) + tuple val(meta), path(bam), path(bai), path(fasta), path(fai), path(gtf) val group_category output: @@ -23,7 +20,7 @@ process ISOQUANT { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" // setting custom home via export (see issue #30) @@ -71,6 +68,17 @@ process ISOQUANT { isoquant: \$(isoquant.py -v | sed 's#IsoQuant ##') END_VERSIONS """ - } + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.gene_counts.tsv + touch ${prefix}.transcript_counts.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + isoquant: \$(isoquant.py -v | sed 's#IsoQuant ##') + END_VERSIONS + """ } diff --git a/modules/local/merge_mtx.nf b/modules/local/merge_mtx.nf new file mode 100644 index 0000000..eaa23ef --- /dev/null +++ b/modules/local/merge_mtx.nf @@ -0,0 +1,47 @@ +process MERGE_MTX { + tag "$meta.id" + label 'process_high' + + conda "conda-forge::pandas=1.5.2" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.5.2' : + 'biocontainers/pandas:1.5.2' }" + + input: + tuple val(meta), path(files_in) + + output: + tuple val(meta), path("*.merged.tsv"), emit: merged_mtx + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + mtx_merge.py \\ + -i \$(pwd) \\ + -x ".tsv" \\ + -o ${prefix}.merged.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + touch ${prefix}.merged.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ +} diff --git a/modules/local/nanofilt.nf b/modules/local/nanofilt.nf index fa59775..70f0175 100644 --- a/modules/local/nanofilt.nf +++ b/modules/local/nanofilt.nf @@ -18,7 +18,7 @@ process NANOFILT { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ @@ -34,4 +34,14 @@ process NANOFILT { nanofilt: \$( NanoFilt --version | sed -e "s/NanoFilt //g" ) END_VERSIONS """ + + stub: + """ + touch ${prefix}.filtered.fastq + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + nanofilt: \$( NanoFilt --version | sed -e "s/NanoFilt //g" ) + END_VERSIONS + """ } diff --git a/modules/local/oarfish.nf b/modules/local/oarfish.nf new file mode 100644 index 0000000..5a7f402 --- /dev/null +++ b/modules/local/oarfish.nf @@ -0,0 +1,59 @@ +process OARFISH { + tag "$meta.id" + label 'process_low' + + conda "bioconda::oarfish=0.6.5" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/oarfish:0.6.5--h43eeafb_0' : + 'biocontainers/oarfish:0.6.5--h43eeafb_0' }" + + input: + tuple val(meta), path(bam) + + output: + tuple val(meta), path("*features.tsv.gz") , emit: features + tuple val(meta), path("*barcodes.tsv.gz") , emit: barcodes + tuple val(meta), path("*matrix.mtx.gz") , emit: mtx + tuple val(meta), path("*meta_info.json") , emit: meta_info + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + oarfish \\ + --output ${prefix} \\ + --alignments $bam \\ + --threads ${task.cpus} \\ + ${args} + + mv *features.txt features.tsv + mv *barcodes.txt barcodes.tsv + + grep '^%' *count.mtx > matrix.mtx + grep -v '^%' *count.mtx | awk '{print \$2" "\$1" "\$3}' >> matrix.mtx + + for tsv_file in *features.tsv *barcodes.tsv *matrix.mtx + do + gzip \$tsv_file + done + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + oarfish: \$(oarfish --version | sed 's#oarfish ##g') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.features.tsv.gz + touch ${prefix}.barcodes.tsv.gz + touch ${prefix}.matrix.mtx.gz + touch ${prefix}.meta_info.json + """ +} diff --git a/modules/local/preextract_fastq.nf b/modules/local/preextract_fastq.nf index d4440e2..4f38220 100644 --- a/modules/local/preextract_fastq.nf +++ b/modules/local/preextract_fastq.nf @@ -21,7 +21,7 @@ process PREEXTRACT_FASTQ { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ @@ -38,4 +38,16 @@ process PREEXTRACT_FASTQ { python: \$(python --version | sed 's/Python //g') END_VERSIONS """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.putative_bc_umi.tsv + touch ${prefix}.extracted.fastq + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ } diff --git a/modules/local/read_counts.nf b/modules/local/read_counts.nf index bebca0b..f4a8d0e 100644 --- a/modules/local/read_counts.nf +++ b/modules/local/read_counts.nf @@ -33,4 +33,14 @@ process READ_COUNTS { perl: \$(perl --version | head -n2 | tail -n1 | sed -n 's/.*(v\\([^)]*\\)).*/\\1/p') END_VERSIONS """ + + stub: + """ + touch read_counts.csv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + perl: \$(perl --version | head -n2 | tail -n1 | sed -n 's/.*(v\\([^)]*\\)).*/\\1/p') + END_VERSIONS + """ } diff --git a/modules/local/seurat.nf b/modules/local/seurat.nf index 8c5ffce..1cc7cb2 100644 --- a/modules/local/seurat.nf +++ b/modules/local/seurat.nf @@ -1,14 +1,15 @@ process SEURAT { tag "$meta.id" - label 'process_low' + label 'process_medium' - conda "conda-forge::r-base conda-forge::r-seurat=4.1.1 conda-forge::r-ggplot2 conda-forge::r-optparse" + conda "conda-forge::r-base conda-forge::r-seurat=4.1.1 conda-forge::r-ggplot2 conda-forge::r-optparse conda-forge::r-stringi" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
'https://depot.galaxyproject.org/singularity/mulled-v2-b4cd78f1471a75cb2d338d3be506b2352723c0d2:4d30026c33d2809f4bf7b3a62f0e2b8529cb6915-0' : 'biocontainers/mulled-v2-b4cd78f1471a75cb2d338d3be506b2352723c0d2:4d30026c33d2809f4bf7b3a62f0e2b8529cb6915-0' }" input: tuple val(meta), path(counts), path(flagstat) + val mtx_format output: tuple val(meta), path("*.csv"), emit: seurat_stats @@ -19,15 +20,51 @@ process SEURAT { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + if (mtx_format.equals("MEX")) { + """ + mkdir indir + for file in $counts + do + mv \$file indir + done + + seurat_qc.R \\ + $args \\ + -j indir \\ + -s $flagstat \\ + -d $prefix \\ + -r $prefix + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//') + r-seurat: \$(Rscript -e "library(Seurat); cat(as.character(packageVersion('Seurat')))") + END_VERSIONS + """ + } else { + """ + seurat_qc.R \\ + $args \\ + -i $counts \\ + -s $flagstat \\ + -d $prefix \\ + -r $prefix + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//') + r-seurat: \$(Rscript -e "library(Seurat); cat(as.character(packageVersion('Seurat')))") + END_VERSIONS + """ + } + stub: def prefix = task.ext.prefix ?: "${meta.id}" """ - seurat_qc.R \\ - $args \\ - -i $counts \\ - -s $flagstat \\ - -d $prefix \\ - -r $prefix + touch ${prefix}.stats.csv + touch ${prefix}.png cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/split_fasta.nf b/modules/local/split_fasta.nf new file mode 100644 index 0000000..00d9fbb --- /dev/null +++ b/modules/local/split_fasta.nf @@ -0,0 +1,30 @@ +process SPLIT_FASTA { + label 'process_low' + + conda "conda-forge::sed=4.7" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'nf-core/ubuntu:20.04' }" + + input: + tuple val(meta), path(fasta) + + output: + path "*.split.fa" , emit: split_fasta + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + + """ + awk '/^>/{chrom=(split(substr(\$0,2), a, " ")); filename=( a[1] ".split.fa"); print > filename; next}{print >> filename}' $fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version) | sed 's/^.*cat (GNU coreutils) //; s/ .*//') + END_VERSIONS + """ +} diff --git a/modules/local/split_file.nf b/modules/local/split_file.nf index 0caac3c..ee0748b 100644 --- a/modules/local/split_file.nf +++ b/modules/local/split_file.nf @@ -20,7 +20,7 @@ process SPLIT_FILE { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ split -a10 -l ${split_amount} -d --additional-suffix ${file_ext} ${unsplit_file} ${prefix}. 
@@ -30,4 +30,15 @@ process SPLIT_FILE { split: \$(echo \$(split --version 2>&1 | head -n1 | sed 's#split (GNU coreutils) ##g')) END_VERSIONS """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.${file_ext} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + split: \$(echo \$(split --version 2>&1 | head -n1 | sed 's#split (GNU coreutils) ##g')) + END_VERSIONS + """ } diff --git a/modules/local/split_gtf.nf b/modules/local/split_gtf.nf new file mode 100644 index 0000000..1f91ee7 --- /dev/null +++ b/modules/local/split_gtf.nf @@ -0,0 +1,40 @@ +process SPLIT_GTF { + label 'process_low' + + conda "conda-forge::sed=4.7" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'nf-core/ubuntu:20.04' }" + + input: + tuple val(meta), path(gtf) + + output: + path "*.split.gtf" , emit: split_gtf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + + """ + grep -v '^#' $gtf | awk -F \$'\\t' '{print > \$1".split.gtf"}' + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version) | sed 's/^.*cat (GNU coreutils) //; s/ .*//') + END_VERSIONS + """ + + stub: + """ + touch test.split.gtf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(echo \$(cat --version) | sed 's/^.*cat (GNU coreutils) //; s/ .*//') + END_VERSIONS + """ +} diff --git a/modules/local/tag_barcodes.nf b/modules/local/tag_barcodes.nf index b6f70a9..bbb4840 100644 --- a/modules/local/tag_barcodes.nf +++ b/modules/local/tag_barcodes.nf @@ -1,6 +1,6 @@ process TAG_BARCODES { tag "$meta.id" - label 'process_medium' + label 'process_high' conda "bioconda::pysam=0.19.1" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? @@ -18,7 +18,7 @@ process TAG_BARCODES { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' + def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" """ @@ -32,4 +32,15 @@ process TAG_BARCODES { python: \$(python --version | sed 's/Python //g') END_VERSIONS """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.tagged.bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version | sed 's/Python //g') + END_VERSIONS + """ } diff --git a/modules/local/ucsc_genepredtobed.nf b/modules/local/ucsc_genepredtobed.nf index c822d3c..8296076 100644 --- a/modules/local/ucsc_genepredtobed.nf +++ b/modules/local/ucsc_genepredtobed.nf @@ -18,13 +18,28 @@ process UCSC_GENEPREDTOBED { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' - def VERSION = '447' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${genepred.baseName}" + // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + def VERSION = '447' """ genePredToBed \\ $args \\ $genepred \\ - ${genepred.baseName}.bed + ${prefix}.bed + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ucsc: $VERSION + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${genepred.baseName}" + // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. 
+ def VERSION = '447' + """ + touch ${prefix}.bed cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/ucsc_gtftogenepred.nf b/modules/local/ucsc_gtftogenepred.nf index 62b1741..5aed347 100644 --- a/modules/local/ucsc_gtftogenepred.nf +++ b/modules/local/ucsc_gtftogenepred.nf @@ -18,14 +18,28 @@ process UCSC_GTFTOGENEPRED { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' - def VERSION = '447' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${gtf.baseName}" + // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + def VERSION = '447' """ gtfToGenePred \\ $args \\ $gtf \\ - ${gtf.baseName}.genepred + ${prefix}.genepred + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ucsc: $VERSION + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${gtf.baseName}" + // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. + def VERSION = '447' + """ + touch ${prefix}.genepred cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/nf-core/fastqc/environment.yml b/modules/nf-core/fastqc/environment.yml index 1787b38..691d4c7 100644 --- a/modules/nf-core/fastqc/environment.yml +++ b/modules/nf-core/fastqc/environment.yml @@ -1,7 +1,5 @@ -name: fastqc channels: - conda-forge - bioconda - - defaults dependencies: - bioconda::fastqc=0.12.1 diff --git a/modules/nf-core/fastqc/main.nf b/modules/nf-core/fastqc/main.nf index 9e19a74..752c3a1 100644 --- a/modules/nf-core/fastqc/main.nf +++ b/modules/nf-core/fastqc/main.nf @@ -24,7 +24,15 @@ process FASTQC { // Make list of old name and new name pairs to use for renaming in the bash while loop def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] } def rename_to = old_new_pairs*.join(' ').join(' ') - def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ') + def renamed_files = old_new_pairs.collect{ _old_name, new_name -> new_name }.join(' ') + + // The total amount of allocated RAM by FastQC is equal to the number of threads defined (--threads) time the amount of RAM defined (--memory) + // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222 + // Dividing the task.memory by task.cpu allows to stick to requested amount of RAM in the label + def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') / task.cpus + // FastQC memory value allowed range (100 - 10000) + def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 100 : memory_in_mb) + """ printf "%s %s\\n" $rename_to | while read old_name new_name; do [ -f "\${new_name}" ] || ln -s \$old_name \$new_name @@ -33,6 +41,7 @@ process FASTQC { fastqc \\ $args \\ --threads $task.cpus \\ + --memory $fastqc_memory \\ $renamed_files cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml index ee5507e..2b2e62b 100644 --- a/modules/nf-core/fastqc/meta.yml +++ b/modules/nf-core/fastqc/meta.yml @@ -11,40 +11,50 @@ tools: FastQC gives general quality metrics about your reads. 
It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other overrepresented sequences. homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ licence: ["GPL-2.0-only"] + identifier: biotools:fastqc input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: FastQC report + pattern: "*_{fastqc.html}" - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.zip": + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test index 70edae4..e9d79a0 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test +++ b/modules/nf-core/fastqc/tests/main.nf.test @@ -23,17 +23,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. - // looks like this:
- //     Mon 2 Oct 2023
- //     test.gz
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_single") } + { assert process.success }, + // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. + // looks like this:
+ //     Mon 2 Oct 2023
+ //     test.gz
+ // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -54,16 +51,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_paired") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -83,13 +78,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -109,13 +102,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_bam") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -138,22 +129,20 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, - { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, - { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, - { assert 
path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_multiple") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, + { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, + { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -173,21 +162,18 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } test("sarscov2 single-end [fastq] - stub") { - options "-stub" - + options "-stub" when { process { """ @@ -201,12 +187,123 @@ nextflow_process { then { assertAll ( - { assert process.success }, - { assert snapshot(process.out.html.collect { file(it[1]).getName() } + - process.out.zip.collect { file(it[1]).getName() } + - process.out.versions ).match("fastqc_stub") } + { assert process.success }, + { assert snapshot(process.out).match() } ) } } + test("sarscov2 paired-end [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 interleaved [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert 
snapshot(process.out).match() } + ) + } + } + + test("sarscov2 paired-end [bam] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 multiple [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 custom_prefix - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [ id:'mysample', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } } diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap index 86f7c31..d5db309 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test.snap +++ b/modules/nf-core/fastqc/tests/main.nf.test.snap @@ -1,88 +1,392 @@ { - "fastqc_versions_interleaved": { + "sarscov2 custom_prefix": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:07.293713" + "timestamp": "2024-07-22T11:02:16.374038" }, - "fastqc_stub": { + "sarscov2 single-end [fastq] - stub": { "content": [ - [ - "test.html", - "test.zip", - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:24.993809" + }, + "sarscov2 custom_prefix - stub": { + "content": [ + { + "0": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { 
+ "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:31:01.425198" + "timestamp": "2024-07-22T11:03:10.93942" }, - "fastqc_versions_multiple": { + "sarscov2 interleaved [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:55.797907" + "timestamp": "2024-07-22T11:01:42.355718" }, - "fastqc_versions_bam": { + "sarscov2 paired-end [bam]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:26.795862" + "timestamp": "2024-07-22T11:01:53.276274" }, - "fastqc_versions_single": { + "sarscov2 multiple [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:27.043675" + "timestamp": "2024-07-22T11:02:05.527626" }, - "fastqc_versions_paired": { + "sarscov2 paired-end [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:31.188871" + }, + "sarscov2 paired-end [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:34.273566" + }, + "sarscov2 multiple [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:47.584191" + "timestamp": "2024-07-22T11:03:02.304411" }, - "fastqc_versions_custom_prefix": { + "sarscov2 single-end [fastq]": { "content": [ [ 
"versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:19.095607" + }, + "sarscov2 interleaved [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:44.640184" + }, + "sarscov2 paired-end [bam] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:41:14.576531" + "timestamp": "2024-07-22T11:02:53.550742" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/environment.yml b/modules/nf-core/multiqc/environment.yml index 0fe1264..6f5b867 100644 --- a/modules/nf-core/multiqc/environment.yml +++ b/modules/nf-core/multiqc/environment.yml @@ -2,4 +2,4 @@ channels: - conda-forge - bioconda dependencies: - - bioconda::multiqc=1.25 + - bioconda::multiqc=1.25.1 diff --git a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index b9ccebd..cc0643e 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -3,8 +3,8 @@ process MULTIQC { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/multiqc:1.25--pyhdfd78af_0' : - 'biocontainers/multiqc:1.25--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : + 'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" input: path multiqc_files, stageAs: "?/*" @@ -52,7 +52,7 @@ process MULTIQC { stub: """ mkdir multiqc_data - touch multiqc_plots + mkdir multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/multiqc/meta.yml b/modules/nf-core/multiqc/meta.yml index 382c08c..b16c187 100644 --- a/modules/nf-core/multiqc/meta.yml +++ b/modules/nf-core/multiqc/meta.yml @@ -1,5 +1,6 @@ name: multiqc -description: Aggregate results from bioinformatics analyses across many samples into a single report +description: Aggregate results from bioinformatics analyses across many samples into + a single report keywords: - QC - bioinformatics tools @@ -12,53 +13,59 @@ tools: homepage: https://multiqc.info/ documentation: https://multiqc.info/docs/ licence: ["GPL-3.0-or-later"] + identifier: biotools:multiqc input: - - multiqc_files: - type: file - description: | - List of reports / files recognised by MultiQC, for example the html and zip output of FastQC - - multiqc_config: - type: file - description: Optional config yml for MultiQC - pattern: "*.{yml,yaml}" - - extra_multiqc_config: - type: file - description: Second optional config yml for MultiQC. Will override common sections in multiqc_config. - pattern: "*.{yml,yaml}" - - multiqc_logo: - type: file - description: Optional logo file for MultiQC - pattern: "*.{png}" - - replace_names: - type: file - description: | - Optional two-column sample renaming file. First column a set of - patterns, second column a set of corresponding replacements. Passed via - MultiQC's `--replace-names` option. - pattern: "*.{tsv}" - - sample_names: - type: file - description: | - Optional TSV file with headers, passed to the MultiQC --sample_names - argument. - pattern: "*.{tsv}" + - - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC + - - multiqc_config: + type: file + description: Optional config yml for MultiQC + pattern: "*.{yml,yaml}" + - - extra_multiqc_config: + type: file + description: Second optional config yml for MultiQC. Will override common sections + in multiqc_config. + pattern: "*.{yml,yaml}" + - - multiqc_logo: + type: file + description: Optional logo file for MultiQC + pattern: "*.{png}" + - - replace_names: + type: file + description: | + Optional two-column sample renaming file. First column a set of + patterns, second column a set of corresponding replacements. Passed via + MultiQC's `--replace-names` option. + pattern: "*.{tsv}" + - - sample_names: + type: file + description: | + Optional TSV file with headers, passed to the MultiQC --sample_names + argument. 
+ pattern: "*.{tsv}" output: - report: - type: file - description: MultiQC report file - pattern: "multiqc_report.html" + - "*multiqc_report.html": + type: file + description: MultiQC report file + pattern: "multiqc_report.html" - data: - type: directory - description: MultiQC data dir - pattern: "multiqc_data" + - "*_data": + type: directory + description: MultiQC data dir + pattern: "multiqc_data" - plots: - type: file - description: Plots created by MultiQC - pattern: "*_data" + - "*_plots": + type: file + description: Plots created by MultiQC + pattern: "*_data" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@abhi18av" - "@bunop" diff --git a/modules/nf-core/multiqc/tests/main.nf.test.snap b/modules/nf-core/multiqc/tests/main.nf.test.snap index b779e46..2fcbb5f 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test.snap +++ b/modules/nf-core/multiqc/tests/main.nf.test.snap @@ -2,14 +2,14 @@ "multiqc_versions_single": { "content": [ [ - "versions.yml:md5,8c8724363a5efe0c6f43ab34faa57efd" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-10T12:41:34.562023" + "timestamp": "2024-10-02T17:51:46.317523" }, "multiqc_stub": { "content": [ @@ -17,25 +17,25 @@ "multiqc_report.html", "multiqc_data", "multiqc_plots", - "versions.yml:md5,8c8724363a5efe0c6f43ab34faa57efd" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-10T11:27:11.933869532" + "timestamp": "2024-10-02T17:52:20.680978" }, "multiqc_versions_config": { "content": [ [ - "versions.yml:md5,8c8724363a5efe0c6f43ab34faa57efd" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "24.04.2" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-07-10T11:26:56.709849369" + "timestamp": "2024-10-02T17:52:09.185842" } -} +} \ No newline at end of file diff --git a/modules/nf-core/nanoplot/main.nf b/modules/nf-core/nanoplot/main.nf index 232a11e..9ee1540 100644 --- a/modules/nf-core/nanoplot/main.nf +++ b/modules/nf-core/nanoplot/main.nf @@ -24,12 +24,21 @@ process NANOPLOT { def args = task.ext.args ?: '' def input_file = ("$ontfile".endsWith(".fastq.gz") || "$ontfile".endsWith(".fastq")) ? "--fastq ${ontfile}" : ("$ontfile".endsWith(".txt")) ? "--summary ${ontfile}" : '' + def prefix = task.ext.prefix ?: "${meta.id}" """ NanoPlot \\ $args \\ -t $task.cpus \\ $input_file + for nanoplot_file in *.html *.png *.txt *.log + do + if [[ -s \$nanoplot_file ]] + then + mv \$nanoplot_file ${prefix}_\$nanoplot_file + fi + done + cat <<-END_VERSIONS > versions.yml "${task.process}": nanoplot: \$(echo \$(NanoPlot --version 2>&1) | sed 's/^.*NanoPlot //; s/ .*\$//') diff --git a/modules/nf-core/nanoplot/nanoplot.diff b/modules/nf-core/nanoplot/nanoplot.diff index 4cb0591..43b57f3 100644 --- a/modules/nf-core/nanoplot/nanoplot.diff +++ b/modules/nf-core/nanoplot/nanoplot.diff @@ -9,14 +9,29 @@ Changes in module 'nf-core/nanoplot' conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
-@@ -22,7 +22,7 @@ +@@ -22,13 +22,22 @@ script: def args = task.ext.args ?: '' - def input_file = ("$ontfile".endsWith(".fastq.gz")) ? "--fastq ${ontfile}" : + def input_file = ("$ontfile".endsWith(".fastq.gz") || "$ontfile".endsWith(".fastq")) ? "--fastq ${ontfile}" : ("$ontfile".endsWith(".txt")) ? "--summary ${ontfile}" : '' ++ def prefix = task.ext.prefix ?: "${meta.id}" """ NanoPlot \\ + $args \\ + -t $task.cpus \\ + $input_file ++ ++ for nanoplot_file in *.html *.png *.txt *.log ++ do ++ if [[ -s \$nanoplot_file ]] ++ then ++ mv \$nanoplot_file ${prefix}_\$nanoplot_file ++ fi ++ done + + cat <<-END_VERSIONS > versions.yml + "${task.process}": ************************************************************ diff --git a/nextflow.config b/nextflow.config index de5ba81..2bd4143 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,112 +10,95 @@ params { // Input options - input = null + input = null + // References - genome = null - fasta = null - gtf = null - igenomes_base = 's3://ngi-igenomes/igenomes' - igenomes_ignore = true + genome = null + genome_fasta = null + transcript_fasta = null + gtf = null + igenomes_base = 's3://ngi-igenomes/igenomes/' + igenomes_ignore = true // Fastq Options - split_amount = 0 + split_amount = 0 // Read Trimming Options - min_length = 1 - min_q_score = 10 - skip_trimming = false + min_length = 1 + min_q_score = 10 + skip_trimming = false // Cell barcode options - whitelist = null - barcode_format = null + whitelist = null + barcode_format = null // Library strandness option - stranded = null + stranded = null // Mapping - skip_save_minimap2_index = false - kmer_size = 14 + skip_save_minimap2_index = false + kmer_size = 14 // Analysis options - retain_introns = true + retain_introns = true + quantifier = '' // Process Skipping options - skip_qc = false - skip_nanoplot = false - skip_toulligqc = false - skip_fastqc = false - skip_fastq_nanocomp = false - skip_bam_nanocomp = false - skip_rseqc = false - save_secondary_alignment = false - skip_dedup = false - skip_seurat = false - skip_multiqc = false + skip_qc = false + skip_nanoplot = false + skip_toulligqc = false + skip_fastqc = false + skip_fastq_nanocomp = false + skip_bam_nanocomp = false + skip_rseqc = false + save_genome_secondary_alignment = false + save_transcript_secondary_alignment = true + skip_dedup = false + skip_seurat = false + skip_multiqc = false // MultiQC options - multiqc_config = null - multiqc_title = null - multiqc_logo = null - max_multiqc_email_size = '25.MB' - multiqc_methods_description = null + multiqc_config = null + multiqc_title = null + multiqc_logo = null + max_multiqc_email_size = '25.MB' + multiqc_methods_description = null // Boilerplate options - outdir = null - publish_dir_mode = 'copy' - email = null - email_on_fail = null - plaintext_email = false - monochrome_logs = false - hook_url = null - help = false - version = false - pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' + outdir = null + publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + hook_url = null + help = false + help_full = false + show_hidden = false + version = false + pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' // Config options - config_profile_name = null - config_profile_description = null - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - 
config_profile_contact = null - config_profile_url = null - - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' + config_profile_name = null + config_profile_description = null - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = 'genomes,igenomes_base' - validationShowHiddenParams = false - validate_params = true + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + config_profile_contact = null + config_profile_url = null + // Schema validation default options + validate_params = true } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/scnanoseq custom profiles from different institutions. -try { - includeConfig "${params.custom_config_base}/pipeline/scnanoseq.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config/scnanoseq profiles: ${params.custom_config_base}/pipeline/scnanoseq.config") -} profiles { debug { - dumpHashes = true - process.beforeScript = 'echo $HOSTNAME' - cleanup = false + dumpHashes = true + process.beforeScript = 'echo $HOSTNAME' + cleanup = false nextflow.enable.configProcessNamesValidation = true } conda { @@ -125,7 +108,7 @@ profiles { podman.enabled = false shifter.enabled = false charliecloud.enabled = false - conda.channels = ['conda-forge', 'bioconda', 'defaults'] + conda.channels = ['conda-forge', 'bioconda'] apptainer.enabled = false } mamba { @@ -214,25 +197,24 @@ profiles { test_full { includeConfig 'conf/test_full.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Load nf-core/scnanoseq custom profiles from different institutions. +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? 
"${params.custom_config_base}/pipeline/scnanoseq.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} else { - params.genomes = [:] -} +includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config' + // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. // See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable. @@ -244,8 +226,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. Use debug profile to enable warnings. nextflow.enable.configProcessNamesValidation = false @@ -274,43 +263,46 @@ manifest { homePage = 'https://github.com/nf-core/scnanoseq' description = """Single-cell/nuclei pipeline for data derived from Oxford Nanopore""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '1.0.0' + nextflowVersion = '!>=24.04.2' + version = '1.1.0' doi = '' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! 
Using default value: $obj" - return obj - } +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? "\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x + +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText } } + +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 6d73bae..a01c787 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/scnanoseq/master/nextflow_schema.json", "title": "nf-core/scnanoseq pipeline parameters", "description": "Single-cell/nuclei pipeline for data derived from Oxford Nanopore", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -49,22 +49,25 @@ "fa_icon": "fas fa-dna", "description": "Reference genome related files and options required for the workflow.", "properties": { - "genome": { - "type": "string", - "description": "Name of iGenomes reference.", - "fa_icon": "fas fa-book", - "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." - }, - "fasta": { + "genome_fasta": { "type": "string", "format": "file-path", "exists": true, "mimetype": "text/plain", "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", - "description": "Path to FASTA genome file.", - "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", + "description": "Path to genome FASTA file.", + "help_text": "This parameter is *mandatory* if `--quantifier \"both\"` or `--quantifier \"isoquant\"` is selected. ", "fa_icon": "far fa-file-code" }, + "transcript_fasta": { + "type": "string", + "description": "Path to transcriptome FASTA file.", + "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", + "format": "file-path", + "mimetype": "text/plain", + "fa_icon": "far fa-file-code", + "help_text": "This parameter is *mandatory* if `--quantifier \"both\"` or `--quantifier \"oarfish\"` is selected. 
" + }, "gtf": { "type": "string", "fa_icon": "far fa-file-code", @@ -72,13 +75,11 @@ "format": "file-path", "pattern": "^\\S+\\.gtf(\\.gz)?$" }, - "igenomes_base": { + "genome": { "type": "string", - "format": "directory-path", - "description": "Directory / URL base for iGenomes references.", - "default": "s3://ngi-igenomes/igenomes", - "fa_icon": "fas fa-cloud-download-alt", - "hidden": true + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." }, "igenomes_ignore": { "type": "boolean", @@ -87,9 +88,17 @@ "hidden": true, "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.", "default": true + }, + "igenomes_base": { + "type": "string", + "format": "directory-path", + "description": "The base path to the igenomes reference files", + "fa_icon": "fas fa-ban", + "hidden": true, + "default": "s3://ngi-igenomes/igenomes/" } }, - "required": ["fasta", "gtf"] + "required": ["gtf"] }, "fastq_options": { "title": "Fastq options", @@ -172,10 +181,14 @@ "description": "Minimizer k-mer length.", "fa_icon": "fas fa-sort-amount-down" }, - "save_secondary_alignment": { + "save_genome_secondary_alignment": { + "type": "boolean", + "description": "Save the secondary alignments when aligning to the genome" + }, + "save_transcript_secondary_alignment": { "type": "boolean", - "description": "Save secondary alignment outputs.", - "fa_icon": "far fa-save" + "default": true, + "description": "Save the secondary alignments when aligning to the transcriptome" } }, "fa_icon": "far fa-map" @@ -191,8 +204,15 @@ "default": true, "description": "Indicate whether to include introns in the count matrices", "fa_icon": "fas fa-filter" + }, + "quantifier": { + "type": "string", + "description": "Provide a comma-delimited options of quantifiers for the pipeline to use. Options: (isoquant, oarfish)", + "pattern": "^(oarfish|isoquant)(,(oarfish|isoquant))*$" } - } + }, + "required": ["quantifier"], + "description": "Options related to post-mapping analysis" }, "process_skipping_options": { "title": "Process skipping options", @@ -307,41 +327,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. 
See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, "generic_options": { "title": "Generic options", "type": "object", @@ -349,12 +334,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -430,27 +409,6 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warinig." - }, - "validationLenientMode": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)." 
- }, "pipelines_testdata_base_path": { "type": "string", "fa_icon": "far fa-check-circle", @@ -463,37 +421,34 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" - }, - { - "$ref": "#/definitions/reference_genome_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/fastq_options" + "$ref": "#/$defs/reference_genome_options" }, { - "$ref": "#/definitions/read_trimming_options" + "$ref": "#/$defs/fastq_options" }, { - "$ref": "#/definitions/cell_barcode_options" + "$ref": "#/$defs/read_trimming_options" }, { - "$ref": "#/definitions/mapping" + "$ref": "#/$defs/cell_barcode_options" }, { - "$ref": "#/definitions/analysis_options" + "$ref": "#/$defs/mapping" }, { - "$ref": "#/definitions/process_skipping_options" + "$ref": "#/$defs/analysis_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/process_skipping_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/local/align_longreads.nf b/subworkflows/local/align_longreads.nf new file mode 100644 index 0000000..d20009d --- /dev/null +++ b/subworkflows/local/align_longreads.nf @@ -0,0 +1,134 @@ +// +// Performs alignment +// + +// SUBWORKFLOWS +include { BAM_SORT_STATS_SAMTOOLS } from '../../subworkflows/nf-core/bam_sort_stats_samtools/main' +include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_FILTERED } from '../../subworkflows/nf-core/bam_sort_stats_samtools/main' + +// MODULES +include { MINIMAP2_INDEX } from '../../modules/nf-core/minimap2/index' +include { MINIMAP2_ALIGN } from '../../modules/nf-core/minimap2/align' +include { SAMTOOLS_VIEW } from '../../modules/nf-core/samtools/view' + +include { RSEQC_READDISTRIBUTION } from '../../modules/nf-core/rseqc/readdistribution/main' +include { NANOCOMP } from '../../modules/nf-core/nanocomp/main' + + +workflow ALIGN_LONGREADS { + take: + fasta // channel: [ val(meta), path(fasta) ] + fai // channel: [ val(meta), path(fai) ] + gtf // channel: [ val(meta), path(gtf) ] + fastq // channel: [ val(meta), path(fastq) ] + rseqc_bed // channel: [ val(meta), path(rseqc_bed) ] + + skip_save_minimap2_index // bool: Skip saving the minimap2 index + skip_qc // bool: Skip qc steps + skip_rseqc // bool: Skip RSeQC + skip_bam_nanocomp // bool: Skip Nanocomp + + main: + ch_versions = Channel.empty() + // + // MINIMAP2_INDEX + // + if (skip_save_minimap2_index) { + MINIMAP2_INDEX ( fasta ) + ch_minimap_ref = MINIMAP2_INDEX.out.index + ch_versions = ch_versions.mix(MINIMAP2_INDEX.out.versions) + } else { + ch_minimap_ref = fasta + } + + // + // MINIMAP2_ALIGN + // + + MINIMAP2_ALIGN ( + fastq, + ch_minimap_ref, + true, + "bai", + "", + "" + ) + + ch_versions = ch_versions.mix(MINIMAP2_ALIGN.out.versions) + + // + // SUBWORKFLOW: BAM_SORT_STATS_SAMTOOLS + // The subworkflow is called in both the minimap2 bams and filtered (mapped only) version + BAM_SORT_STATS_SAMTOOLS ( MINIMAP2_ALIGN.out.bam, fasta ) + ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS.out.versions) + + // acquire only mapped reads from bam for downstream processing + // NOTE: some QCs steps are performed on the full BAM + SAMTOOLS_VIEW ( + BAM_SORT_STATS_SAMTOOLS.out.bam.join( BAM_SORT_STATS_SAMTOOLS.out.bai, by: 0 ), + [[],[]], + [] + ) + + ch_minimap_mapped_only_bam = SAMTOOLS_VIEW.out.bam + ch_versions = ch_versions.mix(SAMTOOLS_VIEW.out.versions) + + BAM_SORT_STATS_SAMTOOLS_FILTERED ( 
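+        // The mapped-only BAM is sorted and indexed here; the resulting BAM/BAI are emitted as
+        // sorted_bam/sorted_bai and feed barcode tagging and quantification downstream.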
+ ch_minimap_mapped_only_bam, + fasta + ) + + ch_minimap_filtered_sorted_bam = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bam + ch_minimap_filtered_sorted_bai = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bai + ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS_FILTERED.out.versions) + + // + // MODULE: RSeQC read distribution for BAM files (unfiltered for QC purposes) + // + ch_rseqc_read_dist = Channel.empty() + if (!skip_qc && !skip_rseqc) { + RSEQC_READDISTRIBUTION ( BAM_SORT_STATS_SAMTOOLS.out.bam, rseqc_bed ) + ch_rseqc_read_dist = RSEQC_READDISTRIBUTION.out.txt + ch_versions = ch_versions.mix(RSEQC_READDISTRIBUTION.out.versions) + } + + // + // MODULE: NanoComp for BAM files (unfiltered for QC purposes) + // + ch_nanocomp_bam_html = Channel.empty() + ch_nanocomp_bam_txt = Channel.empty() + + if (!skip_qc && !skip_bam_nanocomp) { + + NANOCOMP ( + BAM_SORT_STATS_SAMTOOLS.out.bam + .collect{it[1]} + .map{ + [ [ 'id': 'nanocomp_bam.' ] , it ] + } + ) + + ch_nanocomp_bam_html = NANOCOMP.out.report_html + ch_nanocomp_bam_txt = NANOCOMP.out.stats_txt + ch_versions = ch_versions.mix( NANOCOMP.out.versions ) + } + + emit: + versions = ch_versions + + // Bam and Bai + sorted_bam = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bam + sorted_bai = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bai + + // SAMtool stats from initial mapping + stats = BAM_SORT_STATS_SAMTOOLS.out.stats + flagstat = BAM_SORT_STATS_SAMTOOLS.out.flagstat + idxstats = BAM_SORT_STATS_SAMTOOLS.out.idxstats + + // RSeQC stats + rseqc_read_dist = ch_rseqc_read_dist + + // Nanoplot stats + nanocomp_bam_html = ch_nanocomp_bam_html + nanocomp_bam_txt = ch_nanocomp_bam_txt +} diff --git a/subworkflows/local/prepare_reference_files.nf b/subworkflows/local/prepare_reference_files.nf index 66c0df5..4028290 100644 --- a/subworkflows/local/prepare_reference_files.nf +++ b/subworkflows/local/prepare_reference_files.nf @@ -1,17 +1,18 @@ // -// Creates gtfs to that add introns as features +// Modifies the reference files for easier analysis // -include { PIGZ_UNCOMPRESS as UNZIP_FASTA } from '../../modules/nf-core/pigz/uncompress/main' -include { PIGZ_UNCOMPRESS as UNZIP_GTF } from '../../modules/nf-core/pigz/uncompress/main' -include { SAMTOOLS_FAIDX } from '../../modules/nf-core/samtools/faidx/main' +include { PIGZ_UNCOMPRESS as UNZIP_GENOME_FASTA } from '../../modules/nf-core/pigz/uncompress/main' +include { PIGZ_UNCOMPRESS as UNZIP_TRANSCRIPT_FASTA } from '../../modules/nf-core/pigz/uncompress/main' +include { PIGZ_UNCOMPRESS as UNZIP_GTF } from '../../modules/nf-core/pigz/uncompress/main' +include { SAMTOOLS_FAIDX as GENOME_FAIDX } from '../../modules/nf-core/samtools/faidx/main' +include { SAMTOOLS_FAIDX as TRANSCRIPT_FAIDX } from '../../modules/nf-core/samtools/faidx/main' workflow PREPARE_REFERENCE_FILES { take: - fasta_preparation_method - gtf_preparation_method - fasta - gtf + genome_fasta // file: path/to/genome.fasta + transcript_fasta // file: path/to/transcript.fasta + gtf // file: path/to/genome.gtf main: ch_versions = Channel.empty() @@ -19,16 +20,52 @@ workflow PREPARE_REFERENCE_FILES { // Check if fasta and gtf are zipped // - ch_prepared_fasta = Channel.empty() - if (fasta.endsWith('.gz')){ - UNZIP_FASTA( [ [:], fasta ]) + // MODULE: Unzip Genome FASTA + // + ch_genome_fasta = Channel.empty() + ch_genome_fai = Channel.empty() + if (genome_fasta) { + if (genome_fasta.endsWith('.gz')){ + UNZIP_GENOME_FASTA( [ [:], genome_fasta ]) - ch_prepared_fasta = UNZIP_FASTA.out.file - ch_versions = ch_versions.mix(UNZIP_FASTA.out.versions) - } else { - 
ch_prepared_fasta = [ [:], fasta ] + ch_genome_fasta = UNZIP_GENOME_FASTA.out.file + ch_versions = ch_versions.mix(UNZIP_GENOME_FASTA.out.versions) + // + // MODULE: Index the genome fasta + // + } else { + ch_genome_fasta = [ [:], genome_fasta ] + } + + GENOME_FAIDX( ch_genome_fasta, [ [:], "$projectDir/assets/dummy_file.txt" ]) + ch_genome_fai = GENOME_FAIDX.out.fai + } + + // + // MODULE: Unzip Transcript FASTA + // + ch_transcript_fasta = Channel.empty() + ch_transcript_fai = Channel.empty() + if (transcript_fasta) { + if (transcript_fasta.endsWith('.gz')){ + UNZIP_TRANSCRIPT_FASTA( [ [:], transcript_fasta ]) + + ch_transcript_fasta = UNZIP_TRANSCRIPT_FASTA.out.file + ch_versions = ch_versions.mix(UNZIP_TRANSCRIPT_FASTA.out.versions) + } else { + ch_transcript_fasta = [ [:], transcript_fasta ] + } + + // + // MODULE: Index the transcript fasta + // + TRANSCRIPT_FAIDX( ch_transcript_fasta, [ [:], "$projectDir/assets/dummy_file.txt" ]) + ch_transcript_fai = TRANSCRIPT_FAIDX.out.fai } + // + // MODULE: Unzip GTF + // ch_prepared_gtf = Channel.empty() if (gtf.endsWith('.gz')){ UNZIP_GTF( [ [:], gtf ]) @@ -39,15 +76,11 @@ workflow PREPARE_REFERENCE_FILES { ch_prepared_gtf = [ [:], gtf] } - // - // MODULE: Index the fasta - // - SAMTOOLS_FAIDX( ch_prepared_fasta, [ [:], "$projectDir/assets/dummy_file.txt" ]) - ch_prepared_fai = SAMTOOLS_FAIDX.out.fai - emit: - prepped_fasta = ch_prepared_fasta - prepped_fai = ch_prepared_fai - prepped_gtf = ch_prepared_gtf - versions = ch_versions + prepped_genome_fasta = ch_genome_fasta + genome_fai = ch_genome_fai + prepped_transcript_fasta = ch_transcript_fasta + transcript_fai = ch_transcript_fai + prepped_gtf = ch_prepared_gtf + versions = ch_versions } diff --git a/subworkflows/local/process_longread_scrna.nf b/subworkflows/local/process_longread_scrna.nf new file mode 100644 index 0000000..5efaacc --- /dev/null +++ b/subworkflows/local/process_longread_scrna.nf @@ -0,0 +1,178 @@ +// +// Performs alignment +// + +// SUBWORKFLOWS +include { ALIGN_LONGREADS } from '../../subworkflows/local/align_longreads' +include { QUANTIFY_SCRNA_ISOQUANT } from '../../subworkflows/local/quantify_scrna_isoquant' +include { QUANTIFY_SCRNA_OARFISH } from '../../subworkflows/local/quantify_scrna_oarfish' +include { UMITOOLS_DEDUP_SPLIT } from '../../subworkflows/local/umitools_dedup_split' + +// MODULES +include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_TAGGED } from '../../modules/nf-core/samtools/index' +include { SAMTOOLS_FLAGSTAT as SAMTOOLS_FLAGSTAT_TAGGED } from '../../modules/nf-core/samtools/flagstat' + +include { TAG_BARCODES } from '../../modules/local/tag_barcodes' + + +workflow PROCESS_LONGREAD_SCRNA { + take: + fasta // channel: [ val(meta), path(fasta) ] + fai // channel: [ val(meta), path(fai) ] + gtf // channel: [ val(meta), path(gtf) ] + fastq // channel: [ val(meta), path(fastq) ] + rseqc_bed // channel: [ val(meta), path(rseqc_bed) ] + read_bc_info // channel: [ val(meta), path(read_barcode_info) ] + quant_list // list: List of quantifiers to use + + skip_save_minimap2_index // bool: Skip saving the minimap2 index + skip_qc // bool: Skip qc steps + skip_rseqc // bool: Skip RSeQC + skip_bam_nanocomp // bool: Skip Nanocomp + skip_seurat // bool: Skip seurat qc steps + skip_dedup // bool: Skip umitools deduplication + split_umitools_bam // bool: Skip splitting on chromsome for umitools + + main: + ch_versions = Channel.empty() + + // + // SUBWORKFLOW: Align long Read Data + // + + ALIGN_LONGREADS( + fasta, + fai, + gtf, + fastq, + rseqc_bed, + 
skip_save_minimap2_index, + skip_qc, + skip_rseqc, + skip_bam_nanocomp + ) + ch_versions = ch_versions.mix(ALIGN_LONGREADS.out.versions) + + // + // MODULE: Tag Barcodes + // + + TAG_BARCODES ( + ALIGN_LONGREADS.out.sorted_bam + .join( ALIGN_LONGREADS.out.sorted_bai, by: 0 ) + .join( read_bc_info, by: 0) + ) + ch_versions = ch_versions.mix(TAG_BARCODES.out.versions) + + // + // MODULE: Index Tagged Bam + // + SAMTOOLS_INDEX_TAGGED ( TAG_BARCODES.out.tagged_bam ) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_TAGGED.out.versions) + + // + // MODULE: Flagstat Tagged Bam + // + SAMTOOLS_FLAGSTAT_TAGGED ( + TAG_BARCODES.out.tagged_bam + .join( SAMTOOLS_INDEX_TAGGED.out.bai, by: [0]) + ) + ch_versions = ch_versions.mix(SAMTOOLS_FLAGSTAT_TAGGED.out.versions) + + // + // SUBWORKFLOW: UMI Deduplication + // + ch_bam = Channel.empty() + ch_bai = Channel.empty() + ch_flagstat = Channel.empty() + ch_dedup_log = Channel.empty() + ch_idxstats = Channel.empty() + + if (!skip_dedup) { + UMITOOLS_DEDUP_SPLIT( + fasta, + fai, + TAG_BARCODES.out.tagged_bam, + SAMTOOLS_INDEX_TAGGED.out.bai, + split_umitools_bam + ) + + ch_bam = UMITOOLS_DEDUP_SPLIT.out.dedup_bam + ch_bai = UMITOOLS_DEDUP_SPLIT.out.dedup_bai + ch_log = UMITOOLS_DEDUP_SPLIT.out.dedup_log + ch_flagstat = UMITOOLS_DEDUP_SPLIT.out.dedup_flagstat + ch_idxstats = UMITOOLS_DEDUP_SPLIT.out.dedup_idxstats + + ch_versions = ch_versions.mix(UMITOOLS_DEDUP_SPLIT.out.versions) + } else { + ch_bam = TAG_BARCODES.out.tagged_bam + ch_bai = SAMTOOLS_INDEX_TAGGED.out.bai + ch_flagstat = SAMTOOLS_FLAGSTAT_TAGGED.out.flagstat + } + + // + // SUBWORKFLOW: Quantify Features + // + + ch_gene_qc_stats = Channel.empty() + ch_transcript_qc_stats = Channel.empty() + + if (quant_list.contains("oarfish")) { + QUANTIFY_SCRNA_OARFISH ( + ch_bam, + ch_bai, + ch_flagstat, + fasta, + skip_qc, + skip_seurat + ) + ch_versions = ch_versions.mix(QUANTIFY_SCRNA_OARFISH.out.versions) + ch_transcript_qc_stats = QUANTIFY_SCRNA_OARFISH.out.transcript_qc_stats + } + + if (quant_list.contains("isoquant")) { + QUANTIFY_SCRNA_ISOQUANT ( + ch_bam, + ch_bai, + ch_flagstat, + fasta, + fai, + gtf, + skip_qc, + skip_seurat + ) + + ch_versions = ch_versions.mix(QUANTIFY_SCRNA_ISOQUANT.out.versions) + ch_gene_qc_stats = QUANTIFY_SCRNA_ISOQUANT.out.gene_qc_stats + ch_transcript_qc_stats = QUANTIFY_SCRNA_ISOQUANT.out.transcript_qc_stats + } + + emit: + // Versions + versions = ch_versions + + // Minimap results + qc's + minimap_bam = ALIGN_LONGREADS.out.sorted_bam + minimap_bai = ALIGN_LONGREADS.out.sorted_bai + minimap_stats = ALIGN_LONGREADS.out.stats + minimap_flagstat = ALIGN_LONGREADS.out.flagstat + minimap_idxstats = ALIGN_LONGREADS.out.idxstats + minimap_rseqc_read_dist = ALIGN_LONGREADS.out.rseqc_read_dist + minimap_nanocomp_bam_txt = ALIGN_LONGREADS.out.nanocomp_bam_txt + + // Barcode tagging results + qc's + bc_tagged_bam = TAG_BARCODES.out.tagged_bam + bc_tagged_bai = SAMTOOLS_INDEX_TAGGED.out.bai + bc_tagged_flagstat = SAMTOOLS_FLAGSTAT_TAGGED.out.flagstat + + // Deduplication results + dedup_bam = ch_bam + dedup_bai = ch_bai + dedup_log = ch_dedup_log + dedup_flagstat = ch_flagstat + dedup_idxstats = ch_idxstats + + // Seurat QC Stats + gene_qc_stats = ch_gene_qc_stats + transcript_qc_stats = ch_transcript_qc_stats +} diff --git a/subworkflows/local/qc_scrna.nf b/subworkflows/local/qc_scrna.nf new file mode 100644 index 0000000..03800c0 --- /dev/null +++ b/subworkflows/local/qc_scrna.nf @@ -0,0 +1,32 @@ +// +// Performs feature quantification for long read single-cell rna data +// + 
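+// More precisely, this subworkflow runs Seurat-based QC on a count matrix (MEX or base format)
+// and merges the per-sample Seurat stats into one summary via COMBINE_SEURAT_STATS.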
+include { SEURAT } from '../../modules/local/seurat' +include { COMBINE_SEURAT_STATS } from '../../modules/local/combine_seurat_stats' + +workflow QC_SCRNA { + take: + in_mtx + in_flagstat + mtx_format + + main: + ch_versions = Channel.empty() + + // + // MODULE: Seurat + // + SEURAT ( in_mtx.join(in_flagstat, by: [0]), mtx_format ) + ch_versions = ch_versions.mix(SEURAT.out.versions) + + // + // MODULE: Combine Seurat Stats + // + COMBINE_SEURAT_STATS ( SEURAT.out.seurat_stats.collect{it[1]} ) + ch_versions = ch_versions.mix(COMBINE_SEURAT_STATS.out.versions) + + emit: + seurat_stats = COMBINE_SEURAT_STATS.out.combined_stats + versions = ch_versions +} diff --git a/subworkflows/local/quantify_scrna_isoquant.nf b/subworkflows/local/quantify_scrna_isoquant.nf new file mode 100644 index 0000000..11858cf --- /dev/null +++ b/subworkflows/local/quantify_scrna_isoquant.nf @@ -0,0 +1,139 @@ +// +// Performs feature quantification for long read single-cell rna data +// + +include { ISOQUANT } from '../../modules/local/isoquant' +include { MERGE_MTX as MERGE_MTX_GENE } from '../../modules/local/merge_mtx' +include { MERGE_MTX as MERGE_MTX_TRANSCRIPT } from '../../modules/local/merge_mtx' +include { SPLIT_GTF } from '../../modules/local/split_gtf' +include { SPLIT_FASTA } from '../../modules/local/split_fasta' +include { SAMTOOLS_FAIDX as SAMTOOLS_FAIDX_SPLIT } from '../../modules/nf-core/samtools/faidx/main' +include { QC_SCRNA as QC_SCRNA_GENE } from '../../subworkflows/local/qc_scrna' +include { QC_SCRNA as QC_SCRNA_TRANSCRIPT } from '../../subworkflows/local/qc_scrna' + +workflow QUANTIFY_SCRNA_ISOQUANT { + take: + in_bam + in_bai + in_flagstat + in_fasta + in_fai + in_gtf + skip_qc + skip_seurat + + main: + ch_versions = Channel.empty() + + // + // MODULE: Split the FASTA + // + SPLIT_FASTA( in_fasta ) + ch_versions = ch_versions.mix(SPLIT_FASTA.out.versions) + ch_split_fasta = SPLIT_FASTA.out.split_fasta + .flatten() + .map{ + fasta -> + fasta_basename = fasta.toString().split('/')[-1] + meta = [ 'chr': fasta_basename.split(/\./)[0] ] + [ meta, fasta ] + } + + SAMTOOLS_FAIDX_SPLIT( ch_split_fasta, [ [:], "$projectDir/assets/dummy_file.txt" ]) + ch_split_fai = SAMTOOLS_FAIDX_SPLIT.out.fai + ch_versions = ch_versions.mix(SAMTOOLS_FAIDX_SPLIT.out.versions) + + // + // MODULE: Split the GTF + // + SPLIT_GTF( in_gtf ) + ch_split_gtf = SPLIT_GTF.out.split_gtf + .flatten() + .map{ + gtf -> + gtf_basename = gtf.toString().split('/')[-1] + meta = ['chr': gtf_basename.split(/\./)[0]] + [ meta, gtf ] + } + ch_versions = ch_versions.mix(SPLIT_GTF.out.versions) + + // + // MODULE: Isoquant + // + ISOQUANT ( + in_bam + .join(in_bai, by: [0]) + .map{ + meta, bam, bai -> + bam_basename = bam.toString().split('/')[-1] + split_bam_basename = bam_basename.split(/\./) + chr = [ + 'chr': split_bam_basename[1].replace("REF_","") + ] + [ chr, meta, bam, bai] + } + .combine(ch_split_fasta, by: [0]) + .combine(ch_split_fai, by: [0]) + .combine(ch_split_gtf, by: [0]) + .map{ + chr, meta, bam, bai, fasta, fai, gtf -> + [ meta, bam, bai, fasta, fai, gtf ] + }, + 'tag:CB' + ) + ch_versions = ch_versions.mix(ISOQUANT.out.versions) + + // + // MODULE: Merge Matrix + // + MERGE_MTX_GENE ( + ISOQUANT.out.gene_count_mtx + .map{ + meta, mtx -> + basename = mtx.toString().split('/')[-1] + split_basename = basename.split(/\./) + meta = [ 'id': split_basename[0] ] + [ meta, mtx ] + } + .groupTuple() + ) + ch_merged_gene_mtx = MERGE_MTX_GENE.out.merged_mtx + ch_versions = ch_versions.mix(MERGE_MTX_GENE.out.versions) + + 
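The chromosome-keyed `combine` above depends on the `REF_<chr>` infix that `bamtools split` writes into each split BAM's name, and the same basename parsing is reused just below to regroup the per-chromosome matrices by sample before merging. A small worked sketch of the keying, using an assumed file name (illustrative only, not part of the patch):

// For an assumed split BAM named 'sample1.REF_chr1.bam':
def bam_basename = 'sample1.REF_chr1.bam'
def chr_meta     = [ 'chr': bam_basename.split(/\./)[1].replace("REF_", "") ]   // -> [ chr: 'chr1' ]
// chr_meta keys the .combine() with the matching split fasta/fai/gtf and is then dropped,
// so ISOQUANT receives tuples of [ meta, bam, bai, fasta, fai, gtf ]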
MERGE_MTX_TRANSCRIPT (
+ ISOQUANT.out.transcript_count_mtx
+ .map{
+ meta, mtx ->
+ basename = mtx.toString().split('/')[-1]
+ split_basename = basename.split(/\./)
+ meta = [ 'id': split_basename[0] ]
+ [ meta, mtx ]
+ }
+ .groupTuple()
+ )
+ ch_merged_transcript_mtx = MERGE_MTX_TRANSCRIPT.out.merged_mtx
+ ch_versions = ch_versions.mix(MERGE_MTX_TRANSCRIPT.out.versions)
+
+ if (!params.skip_qc && !params.skip_seurat){
+ QC_SCRNA_GENE (
+ MERGE_MTX_GENE.out.merged_mtx,
+ in_flagstat,
+ "BASE"
+ )
+ ch_versions = ch_versions.mix(QC_SCRNA_GENE.out.versions)
+
+ QC_SCRNA_TRANSCRIPT (
+ MERGE_MTX_TRANSCRIPT.out.merged_mtx,
+ in_flagstat,
+ "BASE"
+ )
+ ch_versions = ch_versions.mix(QC_SCRNA_TRANSCRIPT.out.versions)
+ }
+
+ emit:
+ versions = ch_versions
+ gene_mtx = ch_merged_gene_mtx
+ transcript_mtx = ch_merged_transcript_mtx
+ gene_qc_stats = QC_SCRNA_GENE.out.seurat_stats
+ transcript_qc_stats = QC_SCRNA_TRANSCRIPT.out.seurat_stats
+}
diff --git a/subworkflows/local/quantify_scrna_oarfish.nf b/subworkflows/local/quantify_scrna_oarfish.nf
new file mode 100644
index 0000000..0a38aea
--- /dev/null
+++ b/subworkflows/local/quantify_scrna_oarfish.nf
@@ -0,0 +1,51 @@
+//
+// Performs feature quantification for long read single-cell rna data
+//
+
+include { SAMTOOLS_SORT } from '../../modules/nf-core/samtools/sort/main'
+include { OARFISH } from '../../modules/local/oarfish'
+include { QC_SCRNA } from '../../subworkflows/local/qc_scrna'
+
+workflow QUANTIFY_SCRNA_OARFISH {
+ take:
+ in_bam
+ in_bai
+ in_flagstat
+ in_fasta
+ skip_qc
+ skip_seurat
+
+ main:
+ ch_versions = Channel.empty()
+
+ //
+ // MODULE: Samtools Sort
+ //
+ SAMTOOLS_SORT ( in_bam, in_fasta )
+ ch_versions = ch_versions.mix(SAMTOOLS_SORT.out.versions)
+
+ //
+ // MODULE: Oarfish
+ //
+ OARFISH ( SAMTOOLS_SORT.out.bam )
+ ch_versions = ch_versions.mix(OARFISH.out.versions)
+
+ if (!params.skip_qc && !params.skip_seurat) {
+ QC_SCRNA(
+ OARFISH.out.features
+ .join(OARFISH.out.barcodes, by: [0])
+ .join(OARFISH.out.mtx, by: [0])
+ .map{
+ meta,features,barcodes,mtx ->
+ [ meta, [ features, barcodes, mtx ]]
+ },
+ in_flagstat,
+ "MEX"
+ )
+ ch_versions = ch_versions.mix(QC_SCRNA.out.versions)
+ }
+
+ emit:
+ versions = ch_versions
+ transcript_qc_stats = QC_SCRNA.out.seurat_stats
+}
diff --git a/subworkflows/local/umitools_dedup_split.nf b/subworkflows/local/umitools_dedup_split.nf
new file mode 100644
index 0000000..afd6a25
--- /dev/null
+++ b/subworkflows/local/umitools_dedup_split.nf
@@ -0,0 +1,126 @@
+//
+// Run UMI deduplication and optionally split the bam for better parallel processing
+//
+
+//
+// MODULES
+//
+include { BAMTOOLS_SPLIT } from '../../modules/nf-core/bamtools/split/main'
+include { UMITOOLS_DEDUP } from '../../modules/nf-core/umitools/dedup/main'
+include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_SPLIT } from '../../modules/nf-core/samtools/index/main'
+include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_DEDUP } from '../../modules/nf-core/samtools/index/main'
+include { SAMTOOLS_INDEX as SAMTOOLS_INDEX_MERGED } from '../../modules/nf-core/samtools/index/main'
+include { SAMTOOLS_MERGE } from '../../modules/nf-core/samtools/merge/main'
+
+//
+// SUBWORKFLOWS
+//
+include { BAM_STATS_SAMTOOLS } from '../../subworkflows/nf-core/bam_stats_samtools/main'
+
+workflow UMITOOLS_DEDUP_SPLIT {
+ take:
+ fasta // channel: [ val(meta), path(fasta) ]
+ fai // channel: [ val(meta), path(fai) ]
+ in_bam // channel: [ val(meta), path(bam) ]
+ in_bai // channel: [ val(meta), path(bai) ]
+ split_bam // bool: Split the bam
+
+ main:
+
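A compact sketch of the control flow below, for orientation (not part of the patch itself; file names are illustrative):

// split_bam = true  : tagged BAM -> BAMTOOLS_SPLIT (per-chromosome BAMs, e.g. sample1.REF_chr1.bam)
//                     -> UMITOOLS_DEDUP on each split -> SAMTOOLS_MERGE + SAMTOOLS_INDEX_MERGED back to one BAM
// split_bam = false : tagged BAM -> UMITOOLS_DEDUP directly -> SAMTOOLS_INDEX_DEDUP
// In both cases the final deduplicated BAM goes through BAM_STATS_SAMTOOLS for stats/flagstat/idxstats.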
ch_versions = Channel.empty() + + if (split_bam) { + // + // MODULE: Bamtools Split + // + BAMTOOLS_SPLIT ( in_bam ) + ch_versions = ch_versions.mix(BAMTOOLS_SPLIT.out.versions.first()) + ch_undedup_bam = BAMTOOLS_SPLIT.out.bam + .map{ + meta, bam -> + [bam] + } + .flatten() + .map{ + bam -> + bam_basename = bam.toString().split('/')[-1] + split_bam_basename = bam_basename.split(/\./) + meta = [ + 'id': split_bam_basename.take(split_bam_basename.size()-1).join("."), + ] + [ meta, bam ] + } + // + // MODULE: Samtools Index + // + SAMTOOLS_INDEX_SPLIT( ch_undedup_bam ) + ch_undedup_bai = SAMTOOLS_INDEX_SPLIT.out.bai + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_SPLIT.out.versions.first()) + } + else { + ch_undedup_bam = in_bam + ch_undedup_bai = in_bai + } + + // + // MODULE: Umitools Dedup + // + UMITOOLS_DEDUP ( ch_undedup_bam.join(ch_undedup_bai, by: [0]), true ) + ch_versions = ch_versions.mix(UMITOOLS_DEDUP.out.versions) + + // + // MODULE: Samtools Index + // + SAMTOOLS_INDEX_DEDUP( UMITOOLS_DEDUP.out.bam ) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_DEDUP.out.versions) + + if (split_bam) { + // + // MODULE: Samtools Merge + // + SAMTOOLS_MERGE ( + UMITOOLS_DEDUP.out.bam + .map{ + meta, bam -> + bam_basename = bam.toString().split('/')[-1] + split_bam_basename = bam_basename.split(/\./) + meta = [ 'id': split_bam_basename[0] ] + [ meta, bam ] + } + .groupTuple(), + fasta, + fai) + ch_dedup_single_bam = SAMTOOLS_MERGE.out.bam + ch_versions = ch_versions.mix(SAMTOOLS_MERGE.out.versions) + + // + // MODULE: Samtools Index + // + SAMTOOLS_INDEX_MERGED( ch_dedup_single_bam ) + ch_dedup_single_bai = SAMTOOLS_INDEX_MERGED.out.bai + ch_versions = ch_versions.mix(SAMTOOLS_INDEX_MERGED.out.versions) + } + else { + ch_dedup_single_bam = UMITOOLS_DEDUP.out.bam + ch_dedup_single_bai = SAMTOOLS_INDEX_DEDUP.out.bai + } + + // + // SUBWORKFLOW: BAM_STATS_SAMTOOLS + // + BAM_STATS_SAMTOOLS ( + ch_dedup_single_bam.join(ch_dedup_single_bai), + fasta + ) + ch_versions = ch_versions.mix(BAM_STATS_SAMTOOLS.out.versions) + + emit: + versions = ch_versions + dedup_bam = UMITOOLS_DEDUP.out.bam + dedup_log = UMITOOLS_DEDUP.out.log + dedup_bai = SAMTOOLS_INDEX_DEDUP.out.bai + dedup_flagstat = BAM_STATS_SAMTOOLS.out.flagstat + + // TODO: Do we need these? 
+ dedup_stats = BAM_STATS_SAMTOOLS.out.stats + dedup_idxstats = BAM_STATS_SAMTOOLS.out.idxstats +} diff --git a/subworkflows/local/utils_nfcore_scnanoseq_pipeline/main.nf b/subworkflows/local/utils_nfcore_scnanoseq_pipeline/main.nf index b7625a3..8a62f6e 100644 --- a/subworkflows/local/utils_nfcore_scnanoseq_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_scnanoseq_pipeline/main.nf @@ -8,29 +8,25 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: Display help text validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args @@ -54,14 +50,8 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN ( + workflow, validate_params, "nextflow_schema.json" ) @@ -72,6 +62,7 @@ workflow PIPELINE_INITIALISATION { UTILS_NFCORE_PIPELINE ( nextflow_cli_args ) + // // Custom validation for pipeline parameters // @@ -82,14 +73,14 @@ workflow PIPELINE_INITIALISATION { // Channel - .fromSamplesheet("input") + .fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")) .map{ meta, fastq, cell_count_val -> return [ meta.id, meta + [ single_end:true, cell_count: cell_count_val ], [ fastq ] ] } .groupTuple() - .map { - validateInputSamplesheet(it) + .map { samplesheet -> + validateInputSamplesheet(samplesheet) } .map { meta, fastqs -> @@ -103,9 +94,9 @@ workflow PIPELINE_INITIALISATION { } /* 
-======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -120,7 +111,6 @@ workflow PIPELINE_COMPLETION { multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -128,11 +118,18 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + multiqc_report.toList() + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } @@ -144,15 +141,25 @@ workflow PIPELINE_COMPLETION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Check and validate pipeline parameters // def validateInputParameters() { genomeExistsError() + + if ((params.quantifier.equals('isoquant') || params.quantifier.equals('both')) && !params.genome_fasta) { + def error_string = "In order to quantify with isoquant, a genome fasta must be provided" + error(error_string) + } + + if ((params.quantifier.equals('oarfish') || params.quantifier.equals('both')) && !params.transcript_fasta) { + def error_string = "In order to quantify with oarfish, a transcript fasta must be provided" + error(error_string) + } } // @@ -162,7 +169,7 @@ def validateInputSamplesheet(input) { def (metas, fastqs) = input[1..2] // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end - def endedness_ok = metas.collect{ it.single_end }.unique().size == 1 + def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 if (!endedness_ok) { error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") } @@ -194,7 +201,6 @@ def genomeExistsError() { error(error_string) } } - // // Generate methods description for MultiQC // @@ -236,8 +242,10 @@ def methodsDescriptionText(mqc_methods_yaml) { // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list def temp_doi_ref = "" - String[] manifest_doi = meta.manifest_map.doi.tokenize(",") - for (String doi_ref: manifest_doi) temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + def manifest_doi = meta.manifest_map.doi.tokenize(",") + manifest_doi.each { doi_ref -> + temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + } meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2) } else meta["doi_text"] = "" meta["nodoi_text"] = meta.manifest_map.doi ? "" : "
  • If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used.
  • " @@ -258,3 +266,4 @@ def methodsDescriptionText(mqc_methods_yaml) { return description_html.toString() } + diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index ac31f28..0fcbf7b 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -2,18 +2,13 @@ // Subworkflow with functionality that may be useful for any Nextflow pipeline // -import org.yaml.snakeyaml.Yaml -import groovy.json.JsonOutput -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -49,16 +44,16 @@ workflow UTILS_NEXTFLOW_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Generate version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -76,13 +71,13 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = JsonOutput.toJson(params) - temp_pf.text = JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) - FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") + nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() } @@ -90,37 +85,40 @@ def dumpParametersToJSON(outdir) { // When running with -profile conda, warn if channels have not been set-up appropriately // def checkCondaChannels() { - Yaml parser = new Yaml() + def parser = new org.yaml.snakeyaml.Yaml() def channels = [] try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration." 
- return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926b..a09572e 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index a8b55d6..4cd3362 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } @@ -65,20 +60,22 @@ def checkProfileProvided(nextflow_cli_args) { // Citation string for pipeline // def workflowCitation() { - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - " ${workflow.manifest.doi}\n\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + def temp_doi_ref = "" + def manifest_doi = workflow.manifest.doi.tokenize(",") + // Handling multiple DOIs + // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers + // Removing ` ` since the manifest.doi is a string and not a proper list + manifest_doi.each { doi_ref -> + temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" + } + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -96,8 +93,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -107,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -116,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -128,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    <p style=\"font-size:110%\"><b>$group</b></p>\n"
- summary_section += "    <dl class=\"dl-horizontal\">\n"
- for (param in group_params.keySet()) {
- summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: 'N/A'}</samp></dd>\n"
+ summary_params
+ .keySet()
+ .each { group ->
+ def group_params = summary_params.get(group)
+ // This gets the parameters of that particular group
+ if (group_params) {
+ summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n"
+ summary_section += "    <dl class=\"dl-horizontal\">\n"
+ group_params
+ .keySet()
+ .sort()
+ .each { param ->
+ summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: 'N/A'}</samp></dd>\n"
+ }
+ summary_section += "    </dl>
    \n" } - summary_section += "
    \n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -155,7 +154,7 @@ def paramsSummaryMultiqc(summary_params) { // nf-core logo // def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map String.format( """\n ${dashedLine(monochrome_logs)} @@ -174,7 +173,7 @@ def nfCoreLogo(monochrome_logs=true) { // Return dashed line // def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map return "-${colors.dim}----------------------------------------------------${colors.reset}-" } @@ -182,7 +181,7 @@ def dashedLine(monochrome_logs=true) { // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -194,54 +193,54 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? 
'' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -256,14 +255,16 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception msg) { + log.debug(msg) if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -275,26 +276,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -332,39 +342,43 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { +new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception msg) { + log.debug(msg) + log.debug("Trying with mail instead of sendmail") // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -372,15 +386,17 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Print pipeline summary on completion // def completionSummary(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + 
log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed successfully${colors.reset}-") + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -389,21 +405,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -428,13 +453,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 0000000..4994303 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 0000000..f7d9f02 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 0000000..842dc43 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 0000000..0907ac5 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c..331e0d2 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b0..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cff..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/workflows/scnanoseq.nf b/workflows/scnanoseq.nf index 8372f88..9bc4c52 100644 --- a/workflows/scnanoseq.nf +++ b/workflows/scnanoseq.nf @@ -4,6 +4,7 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ +// Whitelist if (params.whitelist) { blaze_whitelist = params.whitelist } @@ -16,6 +17,25 @@ else { } } +// Quantifiers + +// Associate the quantifiers with the kind of alignment needed +GENOME_QUANT_OPTS = [ 'isoquant' ] +TRANSCRIPT_QUANT_OPTS = [ 'oarfish' ] + +genome_quants = [] +transcript_quants = [] +for (quantifier in params.quantifier.split(',')) { + if (quantifier in GENOME_QUANT_OPTS) { + genome_quants.add(quantifier) + } + + if (quantifier in TRANSCRIPT_QUANT_OPTS) { + transcript_quants.add(quantifier) + } +} + + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CONFIG FILES @@ -37,73 +57,55 @@ ch_multiqc_custom_methods_description = params.multiqc_methods_description ? 
f // MODULE: Loaded from modules/local/ // -include { NANOFILT } from "../modules/local/nanofilt" -include { SPLIT_FILE } from "../modules/local/split_file" -include { SPLIT_FILE as SPLIT_FILE_BC_FASTQ } from "../modules/local/split_file" -include { SPLIT_FILE as SPLIT_FILE_BC_CSV } from "../modules/local/split_file" -include { BLAZE } from "../modules/local/blaze" -include { PREEXTRACT_FASTQ } from "../modules/local/preextract_fastq.nf" -include { READ_COUNTS } from "../modules/local/read_counts.nf" -include { TAG_BARCODES } from "../modules/local/tag_barcodes" -include { CORRECT_BARCODES } from "../modules/local/correct_barcodes" -include { ISOQUANT } from "../modules/local/isoquant" -include { SEURAT as SEURAT_GENE } from "../modules/local/seurat" -include { SEURAT as SEURAT_TRANSCRIPT } from "../modules/local/seurat" -include { COMBINE_SEURAT_STATS as COMBINE_SEURAT_STATS_GENE } from "../modules/local/combine_seurat_stats" -include { COMBINE_SEURAT_STATS as COMBINE_SEURAT_STATS_TRANSCRIPT } from "../modules/local/combine_seurat_stats" -include { UCSC_GTFTOGENEPRED } from "../modules/local/ucsc_gtftogenepred" -include { UCSC_GENEPREDTOBED } from "../modules/local/ucsc_genepredtobed" +include { NANOFILT } from "../modules/local/nanofilt" +include { SPLIT_FILE } from "../modules/local/split_file" +include { SPLIT_FILE as SPLIT_FILE_BC_FASTQ } from "../modules/local/split_file" +include { SPLIT_FILE as SPLIT_FILE_BC_CSV } from "../modules/local/split_file" +include { BLAZE } from "../modules/local/blaze" +include { PREEXTRACT_FASTQ } from "../modules/local/preextract_fastq.nf" +include { READ_COUNTS } from "../modules/local/read_counts.nf" +include { CORRECT_BARCODES } from "../modules/local/correct_barcodes" +include { UCSC_GTFTOGENEPRED } from "../modules/local/ucsc_gtftogenepred" +include { UCSC_GENEPREDTOBED } from "../modules/local/ucsc_genepredtobed" // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules // -include { PREPARE_REFERENCE_FILES } from "../subworkflows/local/prepare_reference_files" +include { PREPARE_REFERENCE_FILES } from "../subworkflows/local/prepare_reference_files" +include { PROCESS_LONGREAD_SCRNA as PROCESS_LONGREAD_SCRNA_GENOME } from "../subworkflows/local/process_longread_scrna" +include { PROCESS_LONGREAD_SCRNA as PROCESS_LONGREAD_SCRNA_TRANSCRIPT } from "../subworkflows/local/process_longread_scrna" /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT NF-CORE MODULES/SUBWORKFLOWS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ - // // MODULE: Installed directly from nf-core/modules // -include { PIGZ_UNCOMPRESS } from "../modules/nf-core/pigz/uncompress/main" -include { PIGZ_COMPRESS } from "../modules/nf-core/pigz/compress/main" -include { NANOCOMP as NANOCOMP_FASTQ } from "../modules/nf-core/nanocomp/main" -include { NANOCOMP as NANOCOMP_BAM } from "../modules/nf-core/nanocomp/main" -include { MULTIQC as MULTIQC_RAWQC } from "../modules/nf-core/multiqc/main" -include { MULTIQC as MULTIQC_FINALQC } from "../modules/nf-core/multiqc/main" -include { CUSTOM_DUMPSOFTWAREVERSIONS } from "../modules/nf-core/custom/dumpsoftwareversions/main" -include { UMITOOLS_DEDUP } from "../modules/nf-core/umitools/dedup/main" -include { SAMTOOLS_VIEW as SAMTOOLS_VIEW_FILTER } from "../modules/nf-core/samtools/view/main" -include { CAT_CAT } from "../modules/nf-core/cat/cat/main" -include { CAT_CAT as CAT_CAT_PREEXTRACT } from "../modules/nf-core/cat/cat/main" -include { 
CAT_CAT as CAT_CAT_BARCODE } from "../modules/nf-core/cat/cat/main" -include { CAT_FASTQ } from "../modules/nf-core/cat/fastq/main" -include { MINIMAP2_INDEX } from "../modules/nf-core/minimap2/index/main" -include { MINIMAP2_ALIGN } from "../modules/nf-core/minimap2/align/main" -include { RSEQC_READDISTRIBUTION } from "../modules/nf-core/rseqc/readdistribution/main" -include { BAMTOOLS_SPLIT } from "../modules/nf-core/bamtools/split/main" -include { SAMTOOLS_MERGE } from "../modules/nf-core/samtools/merge/main" -include { paramsSummaryMap } from "plugin/nf-validation" +include { PIGZ_UNCOMPRESS } from "../modules/nf-core/pigz/uncompress/main" +include { PIGZ_COMPRESS } from "../modules/nf-core/pigz/compress/main" +include { NANOCOMP as NANOCOMP_FASTQ } from "../modules/nf-core/nanocomp/main" +include { MULTIQC as MULTIQC_RAWQC } from "../modules/nf-core/multiqc/main" +include { MULTIQC as MULTIQC_FINALQC } from "../modules/nf-core/multiqc/main" +include { CUSTOM_DUMPSOFTWAREVERSIONS } from "../modules/nf-core/custom/dumpsoftwareversions/main" +include { CAT_CAT } from "../modules/nf-core/cat/cat/main" +include { CAT_CAT as CAT_CAT_PREEXTRACT } from "../modules/nf-core/cat/cat/main" +include { CAT_CAT as CAT_CAT_BARCODE } from "../modules/nf-core/cat/cat/main" +include { CAT_FASTQ } from "../modules/nf-core/cat/fastq/main" +include { paramsSummaryMap } from "plugin/nf-schema" /* - * SUBWORKFLOW: Consisting entirely of nf-core/modules + * SUBWORKFLOW: Consisting entirely of nf-core/subworkflows */ include { QCFASTQ_NANOPLOT_FASTQC as FASTQC_NANOPLOT_PRE_TRIM } from "../subworkflows/nf-core/qcfastq_nanoplot_fastqc" include { QCFASTQ_NANOPLOT_FASTQC as FASTQC_NANOPLOT_POST_TRIM } from "../subworkflows/nf-core/qcfastq_nanoplot_fastqc" include { QCFASTQ_NANOPLOT_FASTQC as FASTQC_NANOPLOT_POST_EXTRACT } from "../subworkflows/nf-core/qcfastq_nanoplot_fastqc" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_MINIMAP } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_FILTERED } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_TAGGED } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_CORRECTED } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_SPLIT } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" -include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_DEDUP } from "../subworkflows/nf-core/bam_sort_stats_samtools/main" include { paramsSummaryMultiqc } from "../subworkflows/nf-core/utils_nfcore_pipeline" include { softwareVersionsToYAML } from "../subworkflows/nf-core/utils_nfcore_pipeline" include { methodsDescriptionText } from "../subworkflows/local/utils_nfcore_scnanoseq_pipeline" + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN MAIN WORKFLOW @@ -114,14 +116,11 @@ workflow SCNANOSEQ { take: ch_samplesheet // channel: samplesheet read in from --input - main: ch_versions = Channel.empty() ch_multiqc_report = Channel.empty() - Channel.of(blaze_whitelist).view() - // // SUBWORKFLOW: Read in samplesheet, validate and stage input files // @@ -159,6 +158,7 @@ workflow SCNANOSEQ { ch_versions = ch_versions.mix(FASTQC_NANOPLOT_PRE_TRIM.out.fastqc_version.first().ifEmpty(null)) ch_fastqc_multiqc_pretrim = 
FASTQC_NANOPLOT_PRE_TRIM.out.fastqc_multiqc.ifEmpty([]) + ch_nanostat_pretrim = FASTQC_NANOPLOT_PRE_TRIM.out.nanoplot_txt.ifEmpty([]) } // @@ -188,16 +188,18 @@ workflow SCNANOSEQ { // SUBWORKFLOW: Prepare reference files // - PREPARE_REFERENCE_FILES ( "", - "", - params.fasta, - params.gtf ) + PREPARE_REFERENCE_FILES ( + params.genome_fasta, + params.transcript_fasta, + params.gtf + ) - fasta = PREPARE_REFERENCE_FILES.out.prepped_fasta - fai = PREPARE_REFERENCE_FILES.out.prepped_fai + genome_fasta = PREPARE_REFERENCE_FILES.out.prepped_genome_fasta + genome_fai = PREPARE_REFERENCE_FILES.out.genome_fai + transcript_fasta = PREPARE_REFERENCE_FILES.out.prepped_transcript_fasta + transcript_fai = PREPARE_REFERENCE_FILES.out.transcript_fai gtf = PREPARE_REFERENCE_FILES.out.prepped_gtf - ch_versions = ch_versions.mix( PREPARE_REFERENCE_FILES.out.versions ) // @@ -276,6 +278,7 @@ workflow SCNANOSEQ { FASTQC_NANOPLOT_POST_TRIM ( ch_trimmed_reads_combined, params.skip_nanoplot, params.skip_toulligqc, params.skip_fastqc ) ch_fastqc_multiqc_postrim = FASTQC_NANOPLOT_POST_TRIM.out.fastqc_multiqc.ifEmpty([]) + ch_nanostat_posttrim = FASTQC_NANOPLOT_POST_TRIM.out.nanoplot_txt.ifEmpty([]) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_TRIM.out.nanoplot_version.first().ifEmpty(null)) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_TRIM.out.toulligqc_version.first().ifEmpty(null)) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_TRIM.out.fastqc_version.first().ifEmpty(null)) @@ -312,6 +315,7 @@ workflow SCNANOSEQ { .set { ch_split_bc } } + // // MODULE: Extract barcodes // @@ -365,267 +369,158 @@ workflow SCNANOSEQ { FASTQC_NANOPLOT_POST_EXTRACT ( ch_extracted_fastq, params.skip_nanoplot, params.skip_toulligqc, params.skip_fastqc ) ch_fastqc_multiqc_postextract = FASTQC_NANOPLOT_POST_EXTRACT.out.fastqc_multiqc.ifEmpty([]) + ch_nanostat_postextract = FASTQC_NANOPLOT_POST_EXTRACT.out.nanoplot_txt.ifEmpty([]) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_EXTRACT.out.nanoplot_version.first().ifEmpty(null)) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_EXTRACT.out.toulligqc_version.first().ifEmpty(null)) ch_versions = ch_versions.mix(FASTQC_NANOPLOT_POST_EXTRACT.out.fastqc_version.first().ifEmpty(null)) + // + // MODULE: Generate read counts + // + + ch_pretrim_counts = Channel.empty() + ch_posttrim_counts = Channel.empty() + ch_postextract_counts = Channel.empty() if (!params.skip_fastqc){ + ch_pretrim_counts = ch_fastqc_multiqc_pretrim.collect{it[0]} + ch_posttrim_counts = ch_fastqc_multiqc_postrim.collect{it[0]} + ch_postextract_counts = ch_fastqc_multiqc_postextract.collect{it[0]} - READ_COUNTS ( - ch_fastqc_multiqc_pretrim.collect{it[0]}, - ch_fastqc_multiqc_postrim.collect{it[0]}.ifEmpty([]), - ch_fastqc_multiqc_postextract.collect{it[0]}, - ch_corrected_bc_info.collect{it[1]}) + } else if (!params.skip_nanoplot){ + ch_pretrim_counts = ch_nanostat_pretrim.collect{it[1]} + ch_posttrim_counts = ch_nanostat_posttrim.collect{it[1]} + ch_postextract_counts = ch_nanostat_postextract.collect{it[1]} - ch_read_counts = READ_COUNTS.out.read_counts - ch_versions = ch_versions.mix(READ_COUNTS.out.versions) } - } - - // - // MINIMAP2_INDEX - // - ch_minimap_ref = fasta - - if (!params.skip_save_minimap2_index) { - MINIMAP2_INDEX ( fasta ) - ch_minimap_ref = MINIMAP2_INDEX.out.index - ch_versions = ch_versions.mix(MINIMAP2_INDEX.out.versions) - } - - // - // MINIMAP2_ALIGN - // - - MINIMAP2_ALIGN ( - ch_extracted_fastq, - ch_minimap_ref, - true, - "bai", - "", - "" ) - - ch_versions = 
ch_versions.mix(MINIMAP2_ALIGN.out.versions) - ch_minimap_bam = MINIMAP2_ALIGN.out.bam + READ_COUNTS ( + ch_pretrim_counts.ifEmpty([]), + ch_posttrim_counts.ifEmpty([]), + ch_postextract_counts.ifEmpty([]), + ch_corrected_bc_info.collect{it[1]}) - // - // SUBWORKFLOW: BAM_SORT_STATS_SAMTOOLS - // The subworkflow is called in both the minimap2 bams and filtered (mapped only) version - BAM_SORT_STATS_SAMTOOLS_MINIMAP ( ch_minimap_bam, - fasta ) - ch_minimap_sorted_bam = BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.bam - ch_minimap_sorted_bai = BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.bai - - // these stats go for multiqc - ch_minimap_sorted_stats = BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.stats - ch_minimap_sorted_flagstat = BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.flagstat - ch_minimap_sorted_idxstats = BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.idxstats - ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS_MINIMAP.out.versions) - - // acquire only mapped reads from bam for downstream processing - // NOTE: some QCs steps are performed on the full BAM - SAMTOOLS_VIEW_FILTER ( - ch_minimap_sorted_bam.join( ch_minimap_sorted_bai, by: 0 ), - [[],[]], - [] - ) - - ch_minimap_mapped_only_bam = SAMTOOLS_VIEW_FILTER.out.bam - ch_versions = ch_versions.mix(SAMTOOLS_VIEW_FILTER.out.versions) - - BAM_SORT_STATS_SAMTOOLS_FILTERED ( - ch_minimap_mapped_only_bam, - fasta - ) - - ch_minimap_filtered_sorted_bam = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bam - ch_minimap_filtered_sorted_bai = BAM_SORT_STATS_SAMTOOLS_FILTERED.out.bai - ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS_FILTERED.out.versions) - - // - // MODULE: RSeQC read distribution for BAM files (unfiltered for QC purposes) - // - ch_rseqc_read_dist = Channel.empty() - if (!params.skip_qc && !params.skip_rseqc) { - RSEQC_READDISTRIBUTION ( ch_minimap_sorted_bam, ch_rseqc_bed ) - ch_rseqc_read_dist = RSEQC_READDISTRIBUTION.out.txt - ch_versions = ch_versions.mix(RSEQC_READDISTRIBUTION.out.versions) + ch_read_counts = READ_COUNTS.out.read_counts + ch_versions = ch_versions.mix(READ_COUNTS.out.versions) } // - // MODULE: NanoComp for BAM files (unfiltered for QC purposes) + // SUBWORKFLOW: Align Long Read Data // - ch_nanocomp_bam_html = Channel.empty() - ch_nanocomp_bam_txt = Channel.empty() - - if (!params.skip_qc && !params.skip_bam_nanocomp) { - NANOCOMP_BAM ( - ch_minimap_sorted_bam - .collect{it[1]} - .map{ - [ [ 'id': 'nanocomp_bam.' 
] , it ] - } + ch_multiqc_finalqc_files = Channel.empty() + if (genome_quants){ + PROCESS_LONGREAD_SCRNA_GENOME( + genome_fasta, + genome_fai, + gtf, + ch_extracted_fastq, + ch_rseqc_bed, + ch_corrected_bc_info, + genome_quants, + params.skip_save_minimap2_index, + params.skip_qc, + params.skip_rseqc, + params.skip_bam_nanocomp, + params.skip_seurat, + params.skip_dedup, + true ) + ch_versions = ch_versions.mix(PROCESS_LONGREAD_SCRNA_GENOME.out.versions) - ch_nanocomp_bam_html = NANOCOMP_BAM.out.report_html - ch_nanocomp_bam_txt = NANOCOMP_BAM.out.stats_txt - ch_versions = ch_versions.mix( NANOCOMP_BAM.out.versions ) - } - - // - // MODULE: Tag Barcodes - // - - TAG_BARCODES ( - ch_minimap_filtered_sorted_bam - .join( ch_minimap_filtered_sorted_bai, by: 0) - .join( ch_corrected_bc_info, by: 0 ) - ) - - ch_tagged_bam = TAG_BARCODES.out.tagged_bam - ch_versions = ch_versions.mix(TAG_BARCODES.out.versions) - - // - // SUBWORKFLOW: BAM_SORT_STATS_SAMTOOLS - BAM_SORT_STATS_SAMTOOLS_TAGGED ( ch_tagged_bam, - fasta ) - - ch_tagged_sorted_bam = BAM_SORT_STATS_SAMTOOLS_TAGGED.out.bam - ch_tagged_sorted_bai = BAM_SORT_STATS_SAMTOOLS_TAGGED.out.bai - - // these stats go for multiqc - ch_tagged_sorted_stats = BAM_SORT_STATS_SAMTOOLS_TAGGED.out.stats - ch_tagged_sorted_flagstat = BAM_SORT_STATS_SAMTOOLS_TAGGED.out.flagstat - ch_tagged_sorted_idxstats = BAM_SORT_STATS_SAMTOOLS_TAGGED.out.idxstats - ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS_TAGGED.out.versions) - - ch_dedup_sorted_bam = ch_tagged_sorted_bam - ch_dedup_sorted_bai = ch_tagged_sorted_bai - ch_dedup_sorted_flagstat = ch_tagged_sorted_flagstat - ch_dedup_sorted_idxstats = Channel.empty() - ch_dedup_log = Channel.empty() - - if (!params.skip_dedup) { - - // - // MODULE: Bamtools Split - // - BAMTOOLS_SPLIT ( ch_tagged_sorted_bam ) - ch_split_bams = BAMTOOLS_SPLIT.out.bam - - ch_split_tagged_bam = ch_split_bams - .map{ - meta, bam -> - [bam] - } - .flatten() - .map{ - bam -> - bam_basename = bam.toString().split('/')[-1] - split_bam_basename = bam_basename.split(/\./) - meta = [ 'id': split_bam_basename.take(split_bam_basename.size()-1).join(".") ] - [ meta, bam ] - } - - // - // SUBWORKFLOW: BAM_SORT_STATS_SAMTOOLS - // The subworkflow is called in both the minimap2 bams and filtered (mapped only) version - BAM_SORT_STATS_SAMTOOLS_SPLIT ( ch_split_tagged_bam, - fasta ) - - ch_split_sorted_bam = BAM_SORT_STATS_SAMTOOLS_SPLIT.out.bam - ch_split_sorted_bai = BAM_SORT_STATS_SAMTOOLS_SPLIT.out.bai + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.minimap_flagstat.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.minimap_idxstats.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.minimap_rseqc_read_dist.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.minimap_nanocomp_bam_txt.collect{it[1]}.ifEmpty([]) + ) - // - // MODULE: Umitools Dedup - // - UMITOOLS_DEDUP ( ch_split_sorted_bam.join(ch_split_sorted_bai, by: [0]), true ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.bc_tagged_flagstat.collect{it[1]}.ifEmpty([]) + ) - ch_dedup_bam = UMITOOLS_DEDUP.out.bam - ch_dedup_log = UMITOOLS_DEDUP.out.log - ch_versions = ch_versions.mix(UMITOOLS_DEDUP.out.versions) + if (!params.skip_dedup) { + ch_multiqc_finalqc_files = 
ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.dedup_flagstat.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.dedup_idxstats.collect{it[1]}.ifEmpty([]) + ) + } - // - // MODULE: Samtools merge - // - ch_bams_to_merge = ch_dedup_bam - .map{ - meta, bam -> - bam_basename = bam.toString().split('/')[-1] - split_bam_basename = bam_basename.split(/\./) - meta = [ 'id': split_bam_basename[0] ] - [ meta, bam ] - } - .groupTuple() - - SAMTOOLS_MERGE ( ch_bams_to_merge, fasta, fai) - - ch_dedup_merged_bam = SAMTOOLS_MERGE.out.bam - - // SUBWORKFLOW: BAM_SORT_STATS_SAMTOOLS - // The subworkflow is called in both the minimap2 bams and filtered (mapped only) version - BAM_SORT_STATS_SAMTOOLS_DEDUP ( ch_dedup_merged_bam, - fasta ) - - ch_dedup_sorted_bam = BAM_SORT_STATS_SAMTOOLS_DEDUP.out.bam - ch_dedup_sorted_bai = BAM_SORT_STATS_SAMTOOLS_DEDUP.out.bai - - // these stats go for multiqc - ch_dedup_sorted_stats = BAM_SORT_STATS_SAMTOOLS_DEDUP.out.stats - ch_dedup_sorted_flagstat = BAM_SORT_STATS_SAMTOOLS_DEDUP.out.flagstat - ch_dedup_sorted_idxstats = BAM_SORT_STATS_SAMTOOLS_DEDUP.out.idxstats - ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS_DEDUP.out.versions) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + ch_read_counts.collect().ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.gene_qc_stats.collect().ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_GENOME.out.transcript_qc_stats.collect().ifEmpty([]) + ) } - // - // MODULE: Isoquant - // - ISOQUANT ( ch_dedup_sorted_bam.join(ch_dedup_sorted_bai, by: [0]), gtf, fasta, fai, 'tag:CB') - ch_gene_count_mtx = ISOQUANT.out.gene_count_mtx - ch_transcript_count_mtx = ISOQUANT.out.transcript_count_mtx - ch_versions = ch_versions.mix(ISOQUANT.out.versions) + // oarfish expects deduplicated reads + if (transcript_quants) { + PROCESS_LONGREAD_SCRNA_TRANSCRIPT ( + transcript_fasta, + transcript_fai, + gtf, + ch_extracted_fastq, + ch_rseqc_bed, + ch_corrected_bc_info, + transcript_quants, + params.skip_save_minimap2_index, + params.skip_qc, + true, + params.skip_bam_nanocomp, + params.skip_seurat, + false, + false + ) - if (!params.skip_qc && !params.skip_seurat){ - // - // MODULE: Seurat - // - SEURAT_GENE ( ch_gene_count_mtx.join(ch_dedup_sorted_flagstat, by: [0]) ) - ch_gene_seurat_qc = SEURAT_GENE.out.seurat_stats - ch_versions = ch_versions.mix(SEURAT_GENE.out.versions) + ch_versions = ch_versions.mix(PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.versions) - SEURAT_TRANSCRIPT ( ch_transcript_count_mtx.join(ch_dedup_sorted_flagstat, by: [0]) ) - ch_transcript_seurat_qc = SEURAT_TRANSCRIPT.out.seurat_stats - ch_versions = ch_versions.mix(SEURAT_TRANSCRIPT.out.versions) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.minimap_flagstat.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.minimap_rseqc_read_dist.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.minimap_nanocomp_bam_txt.collect{it[1]}.ifEmpty([]) + ) - // - // MODULE: Combine Seurat Stats - // + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.bc_tagged_flagstat.collect{it[1]}.ifEmpty([]) + ) - ch_gene_stats = 
SEURAT_GENE.out.seurat_stats.collect{it[1]} - COMBINE_SEURAT_STATS_GENE ( ch_gene_stats ) - ch_gene_stats_combined = COMBINE_SEURAT_STATS_GENE.out.combined_stats - ch_versions = ch_versions.mix(COMBINE_SEURAT_STATS_GENE.out.versions) + if (!params.skip_dedup) { + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.dedup_log.collect{it[1]}.ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.dedup_flagstat.collect{it[1]}.ifEmpty([]) + ) + } - ch_transcript_stats = SEURAT_TRANSCRIPT.out.seurat_stats.collect{it[1]} - COMBINE_SEURAT_STATS_TRANSCRIPT ( ch_transcript_stats ) - ch_transcript_stats_combined = COMBINE_SEURAT_STATS_TRANSCRIPT.out.combined_stats - ch_versions = ch_versions.mix(COMBINE_SEURAT_STATS_TRANSCRIPT.out.versions) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + ch_read_counts.collect().ifEmpty([]) + ) + ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix( + PROCESS_LONGREAD_SCRNA_TRANSCRIPT.out.transcript_qc_stats.collect().ifEmpty([]) + ) } // // SOFTWARE_VERSIONS // - // - // Collate and save software versions - // - //softwareVersionsToYAML(ch_versions) - // .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_pipeline_software_mqc_versions.yml', sort: true, newLine: true) - // .set { ch_collated_versions } - CUSTOM_DUMPSOFTWAREVERSIONS ( ch_versions.unique().collectFile(name: 'collated_versions.yml') ) @@ -658,7 +553,6 @@ workflow SCNANOSEQ { summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - ch_multiqc_finalqc_files = Channel.empty() ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_multiqc_config) ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_multiqc_custom_config.collect().ifEmpty([])) ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) @@ -667,23 +561,6 @@ workflow SCNANOSEQ { ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_fastqc_multiqc_postrim.collect().ifEmpty([])) ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_fastqc_multiqc_postextract.collect().ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_minimap_sorted_stats.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_minimap_sorted_flagstat.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_minimap_sorted_idxstats.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_rseqc_read_dist.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_nanocomp_bam_txt.collect{it[1]}.ifEmpty([])) - - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_tagged_sorted_flagstat.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_tagged_sorted_idxstats.collect{it[1]}.ifEmpty([])) - - if (!params.skip_dedup) { - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_dedup_sorted_flagstat.collect{it[1]}.ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_dedup_sorted_idxstats.collect{it[1]}.ifEmpty([])) - } - - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_read_counts.collect().ifEmpty([])) - ch_multiqc_finalqc_files = ch_multiqc_finalqc_files.mix(ch_gene_stats_combined.collect().ifEmpty([])) - ch_multiqc_finalqc_files = 
ch_multiqc_finalqc_files.mix(ch_transcript_stats_combined.collect().ifEmpty([])) MULTIQC_FINALQC ( ch_multiqc_finalqc_files.collect(),