feat: Release v16.0.0 #1491

Merged
khurrammaqbool merged 43 commits into master from release_v16.0.0
Nov 19, 2024

mathiasbio (Collaborator) commented Oct 17, 2024

Description

Big update with a few primary features:

  • MSI for tumor normal analyses
  • UMIs used by default with 1,0,0 for all TGA analyses
  • New PONs built for 6 panels
  • TGA CNVkit results for GENS implemented
  • TNscope merged with VarDict results for TGA
  • Added artefact database for SNVs to all workflows

Added:

Changed:

Removed:

Fixed:

Pre-Validation Checklist

Before proceeding with the validation process, ensure that the following tasks have been completed:

  • Install Balsamic in stage and production environments in hasta and build its cache.

    • BALSAMIC was installed on stage after making the BALSAMIC release_v16.0.0 branch, from the instructions given above:
    1. sudo <...>
    2. tmux new -s <...>
    3. Activate stage and conda environments.
    4. pip uninstall balsamic
    5. pip install --no-cache-dir -U git+https://github.com/Clinical-Genomics/BALSAMIC@release_v16.0.0
    6. balsamic init --out-dir <STAGE_PATH> --account <...> --cosmic-key ${COSMIC_KEY} --genome-version hg19 --cache-version 16.0.0 --run-mode local --snakemake-opt "--cores 40" -r
    7. balsamic init --out-dir /home/proj/stage/cancer/balsamic_cache --account development --cosmic-key ${COSMIC_KEY} --genome-version hg38 --cache-version 16.0.0 --run-mode local --snakemake-opt "--cores 8" -r
    8. balsamic init --out-dir /home/proj/stage/cancer/balsamic_cache --account development --cosmic-key ${COSMIC_KEY} --genome-version canfam3 --cache-version 16.0.0 --run-mode local --snakemake-opt "--cores 8" -r
  • Confirm the availability of necessary resources, such as test cases. (Made script to verify this automatically: /home/proj/stage/cancer/validation/scripts/verify_presence_of_test_samples.sh)

  • Review the changelog and ensure all changes and updates are documented:

    | Document | Sections to Be Updated | Pull Request |
    |---|---|---|
    | Balsamic Documentation | Already updated in development PRs | NA |
    | Atlas Documentation | Multiple places: PONs, new tools, new validation samples, new delivered files, new loqusdb | https://github.com/Clinical-Genomics/atlas/pull/2859 |
  • Set up the stage environment with the necessary software and configurations:

    | Software | Current Version | Pull Request with the Required Updates |
    |---|---|---|
    | CG | v64.3.3 | Balsamic v16 release cg#3408 |
    | target_capture_bed | v0.14.30 | https://github.com/Clinical-Genomics/target_capture_bed/pull/136 |
    | hermes | 5.1.0 | Update files and tags for balsamic release 16.0.0 hermes#117 |

Workflow integrity results

Workflow Integrity Verification Cases

More details here: https://docs.google.com/spreadsheets/d/1g6uXPjCT0INrYS9n5mLblC_7TOgX3hz6j2mXoqp9apQ/edit?gid=0#gid=0

| CaseID | limsID | AnalysisType | T/TN | SequencingType | ExpectedQC | Status (Pass/Fail) | Warnings | Warnings observed before (yes/no) |
|---|---|---|---|---|---|---|---|---|
| invitingemu | ACC15922A8 | balsamic-qc | tumor-only | TGA | Pass | Pass 🟢 | | |
| civilsole | ACC7204A2 | balsamic | tumor-only | WGS | Fail (PCT_60X=0.004532) | Fail QC metric PCT_60X: 0.004522 🟢 | | |
| fleetjay | ACC6307A1:ACC5821A7 | balsamic | tumor-normal | WGS | Fail (PCT_60X=0.00836) | Fail QC metric PCT_60X: 0.006589 🟢 | | |
| setamoeba | ACC8254A2 | balsamic | tumor-only | TGA | Pass | Pass 🟢 | | |
| unitedbeagle | ACC6225A18:ACC6225A14 | balsamic | tumor-normal | TGA | Pass | Pass 🟢 | | |
| uphippo | ACC5611A3 | balsamic-umi | tumor-only | TGA | Fail (GC dropout=1.650392) | Fail: GC_DROPOUT: 1.62827 🟢 | PCT_TARGET_BASES_1000X: 0.915272 | yes (and improved) |
| equalbug | ACC7363A2:ACC7356A4 | balsamic-umi | tumor-normal | TGA | Fail (GC_DROPOUT=1.087173 and RELATEDNESS=-0.524) | ACC7356A4 GC_DROPOUT: 1.069459 and RELATEDNESS: -0.512 | | |

Hg38 integrity verifications:

| AnalysisType | T/TN | SequencingType | ExpectedQC | Status (Pass/Fail) |
|---|---|---|---|---|
| balsamic | tumor-normal | WGS | should complete without errors | FAIL |

Issues discovered for HG38:

cg command: cg workflow balsamic start --genome-version hg38 wgscase. This command collects loqusdb and other references that are specific to hg19 and adds them to the case.

Even when the case is started manually, the analysis fails in multiple places. The failed case is being copied to a folder for investigation after the release: /home/proj/development/cancer/failed_cases/hg38_release16.0.0_failed

Release specific integration verifications:

AnalysisType T/TN SequencingType ExpectedQC Status(Pass/Fail)
  • This section has been verified successfully

Storage, Delivery and Upload Integrity Verifications

| Processes | Affected in current version | Affected workflows |
|---|---|---|
| New files to be stored | yes | wgs, tga, umi (tumor only and tumor normal for all) |
| New files to be delivered | yes | wgs, tga, umi (tumor only and tumor normal for all) |
| New files to be uploaded to Scout | no | N/A |
| Changes to Housekeeper IDs | yes | wgs, tga, umi (tumor only and tumor normal for all) |
| Changes to Scout upload | yes | tga (tumor only and tumor normal) |

| AnalysisType | T/TN | Storage status | Delivery status | Upload status |
|---|---|---|---|---|
| balsamic-qc | tumor-only | Successful | Successful | N/A |
| balsamic wgs | tumor-only | Successful | Successful | Successful |
| balsamic wgs | tumor-normal | Successful | Successful | Successful |
| balsamic | tumor-only | Successful | Successful | Successful |
| balsamic | tumor-normal | Successful | Successful | Successful |
| balsamic-umi | tumor-only | Successful | Successful | Successful |
| balsamic-umi | tumor-normal | Successful | Successful | Successful |
  • This section has been verified successfully, or been identified as irrelevant for the current verification

Validation and implementation plan status

Pull-request for validation-report made here: https://github.com/Clinical-Genomics/validations/pull/241

  • Validation report signed

Pull-request for implementation-plan here: LINK

  • Implementation plan signed

khurrammaqbool and others added 30 commits May 7, 2024 10:40

feat: add msisensorpro container
The new version of multiqc supports picard mimicked reports from Sentieon tools: MultiQC/MultiQC#2110

This should resolve #1290, where a workaround was added to the Dedup rule so that MultiQC would accept dedup stats from Sentieon dedup.

It may also let us move away from Picard for generating our QC reports and use the Sentieon tools instead. They should be faster, and the switch would let us remove some rules for a more streamlined workflow.

#### Added

- separate container for multiqc

#### Changed

- updated multiqc from 1.12 to 1.22.3

#### Removed

- no longer necessary sed command in dedup rule 
- deprecated and unused TNhaplotyper rule
* add msisensorpro

* changelog

* change filenames

* MSI to PDF

* Refactor

* update docs

* fix quotation mark
* update MSI table

* changelog
#### Added

- Sentieon install directory path and Sentieon license to case config arguments
* add MSI TN to storage

* changelog

* fix blank line

* changelog
#### Added

- QC threshold for lymphoma_MRD panel
#### Added

- UMI extraction and deduplication to TGA workflow
- Adapter trimming of fastqs to UMI workflow
- Cap base quality in bam for Manta input

#### Changed

- Refactored multi workflow rule-files to separate files to decrease complexity
- Refactored output files to in general comply with format {sample_type}.{sample_name}
- Replaced Picard QC tools with matching Sentieon QC tools

#### Removed

- UMI specific rules for UMI-extraction and alignment (using new TGA-rules instead) 
- Fastq and UMI trimming command-line options


Merged this PR into this one: #1465

#### Added

- Added extension of target bed regions to a minimum size of 100 for CNV analysis
- PON for: Exome comprehensive 10.2 
- PON for: GMSsolid 15.2 
- PON for: GMCKsolid 4.2

#### Changed

- updated PON for GMCKSolid v4.1 
- updated PON for GMSMyeloid v5.3 
- updated PON for GMSlymphoid v7.3
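
The extension of target BED regions to a minimum size of 100 listed under Added can be illustrated with a minimal sketch. It assumes symmetric padding around each region and 0-based, half-open BED coordinates. The names `extend_region` and `extend_bed_lines` are hypothetical, and the padding strategy is an assumption, not necessarily what BALSAMIC does.

```python
from typing import Iterable, List, Tuple


def extend_region(start: int, end: int, min_size: int = 100) -> Tuple[int, int]:
    """Pad a 0-based, half-open interval so it spans at least min_size bases.

    Assumption: padding is split evenly between both sides; if the left
    side would go below 0, the remainder is added on the right instead.
    """
    size = end - start
    if size >= min_size:
        return start, end
    missing = min_size - size
    new_start = max(0, start - missing // 2)
    new_end = new_start + min_size
    return new_start, new_end


def extend_bed_lines(lines: Iterable[str], min_size: int = 100) -> List[str]:
    """Apply extend_region to every data line of a tab-separated BED file."""
    extended = []
    for line in lines:
        line = line.rstrip("\n")
        # Header and empty lines are passed through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            extended.append(line)
            continue
        fields = line.split("\t")
        start, end = extend_region(int(fields[1]), int(fields[2]), min_size)
        fields[1], fields[2] = str(start), str(end)
        extended.append("\t".join(fields))
    return extended
```

Regions that already span 100 bases or more are returned unchanged, so the sketch only affects short targets. It does not clip at the chromosome end, which a real implementation would need to handle.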

Merged this PR into this one: #1448

#### Added

- Script to post-process CNVkit output to GENS-format
- DNAscope gnomad calling to TGA for GENS

#### Changed

- Parsing of GENS arguments changed to account for TGA

Merged this PR: #1475 into this one

#### Changed

- Refactored rules for bcftools filters
- Renamed final UMI bamfile to ensure hsmetrics are collected in multiqc json
- Changed ranked VCF from research to clinical
- Lowered min AF for TGA from 0.007 to 0.005
- Lowered maximal SOR for TNscope in TGA tumor only cases from 3 to 2.7
- Changed filter settings for research TNscope vcf, now either PASS or triallelic_site (fixing this issue: #1293)
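
To make the effect of the lowered thresholds concrete, here is a small illustrative sketch. The values 0.005 and 2.7 come from the list above. The function name, the inclusive boundary handling, and the plain-Python form are assumptions; the workflow applies its filters through bcftools rules, and this sketch does not reproduce them.

```python
# Thresholds taken from the changelog entries above.
MIN_AF_TGA = 0.005                  # previously 0.007
MAX_SOR_TNSCOPE_TUMOR_ONLY = 2.7    # previously 3


def passes_tga_thresholds(af: float, sor: float, tumor_only: bool = True) -> bool:
    """Return True if a variant passes the AF and SOR thresholds.

    Assumptions: a variant with AF exactly at the minimum is kept, and
    the SOR cap is only applied to tumor-only cases, as the changelog
    describes it for TNscope in TGA tumor-only analyses.
    """
    if af < MIN_AF_TGA:
        return False
    if tumor_only and sor > MAX_SOR_TNSCOPE_TUMOR_ONLY:
        return False
    return True
```

With these values, a variant at AF 0.006 that was previously filtered out now passes, while a tumor-only call with SOR 2.8 that was previously kept is now removed.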

#### Added

- TNscope for TGA workflows, merged with VarDict results
- New filter for VarDict for tumor in normal contamination
- Export TMP environment variables to rules that lack them
- Added genmod ranked VCFs to be delivered
- Added family-id to genmod in order to get ranked variants to Scout (solved this: #1045)
- Added DP and AF to INFO-field of TNscope vcfs for ranking model
- Raw TNscope calls and unfiltered research-annotated SNVs to delivery

#### Removed

- ML-model for TNscope is removed due to license issue with new version of Sentieon
- All code associated with TNhaplotyper
- Removed research.filtered.pass VCFs from delivery and storage list
* changelog

* fix to version 1.2.0

* update to version 1.3.0

* split tasks

* merge tasks

* fix somalier container

* fix nimble htslib bcftools version

* fix depth

* update nim to 1.6.20

* update nim to 2.2.0

* update somalier to 0.2.19

* fix nimble version to 1.6.6

* revert changes for somalier container

* refactor

* update software version information

---------

Co-authored-by: Mathias Johansson <[email protected]>
* update somalier and dependencies

* update docs

* fix security hotspot

* restrict redirects

* update docs

* changelog
#### Changed

- Updated lychee-actions to 2.0.2 and increased maximal redirects to 10
- Fixing broken links in documentation
#### Changed

- Renamed UMI bamfile so that the sample id is unique in multiqc_data.json and not overwritten by hsmetrics.
### Changed

- Updated docs tools versions

codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.49%. Comparing base (b94831b) to head (5699764).
Report is 57 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
+ Coverage   99.44%   99.49%   +0.05%     
==========================================
  Files          40       40              
  Lines        1983     1991       +8     
==========================================
+ Hits         1972     1981       +9     
+ Misses         11       10       -1     
Flag Coverage Δ
unittests 99.49% <100.00%> (+0.05%) ⬆️

Comment on lines +15 to +16
apt-get install -y --no-install-recommends git make build-essential \
liblzma-dev libbz2-dev zlib1g-dev libncurses5-dev libncursesw5-dev ca-certificates && \

Check notice

Code scanning / SonarCloud

Arguments in long RUN instructions should be sorted Low

Sort these package names alphanumerically. See more on SonarCloud
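
As a quick illustration of the ordering the notice asks for, the package names from the flagged lines can be sorted with a few lines of Python. This assumes that plain lexicographic ordering is what the rule means by "alphanumerically"; the variable names are arbitrary.

```python
# Package names copied from the flagged apt-get install lines.
packages = (
    "git make build-essential liblzma-dev libbz2-dev zlib1g-dev "
    "libncurses5-dev libncursesw5-dev ca-certificates"
).split()

# Plain lexicographic sort (assumed to match the rule's expectation).
sorted_packages = sorted(packages)

# Print one package per line with shell line continuations,
# ready to paste back into the RUN instruction.
print(" \\\n    ".join(sorted_packages))
```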
mathiasbio and others added 4 commits October 19, 2024 10:59
* fix vardict memory error

* changelog

* changelog

---------

Co-authored-by: Mathias Johansson <[email protected]>
#### Fixed

- TNscope tag to variant info-field for TGA workflow

#### Removed

- Removed VarDict germline filter, replaced by relative normal af / tumor af filter
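
The idea of a relative normal AF / tumor AF filter can be sketched as follows. The changelog does not state the cutoff or the exact expression, so `max_ratio` is a hypothetical parameter with no default, and the function name and the handling of a zero tumor AF are assumptions made only for illustration.

```python
def exceeds_normal_tumor_ratio(tumor_af: float, normal_af: float, max_ratio: float) -> bool:
    """Return True if normal AF is too high relative to tumor AF.

    max_ratio is a hypothetical cutoff; the value used in the workflow
    is not given in this pull request.
    """
    if tumor_af <= 0:
        # Assumption: with no tumor signal, any normal signal is flagged.
        return normal_af > 0
    return (normal_af / tumor_af) > max_ratio
```

For example, with a made-up cutoff of 0.3, a variant at 40% in the tumor and 2% in the normal is kept, while one at 10% in the tumor and 8% in the normal is flagged.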

sonarcloud bot commented Oct 26, 2024

@khurrammaqbool khurrammaqbool marked this pull request as ready for review November 19, 2024 14:43
@khurrammaqbool khurrammaqbool requested a review from a team as a code owner November 19, 2024 14:43
@khurrammaqbool khurrammaqbool merged commit b68594e into master Nov 19, 2024
28 checks passed
@khurrammaqbool khurrammaqbool deleted the release_v16.0.0 branch November 19, 2024 14:45