Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: compute running SV allele frequencies #1047

Open
wants to merge 57 commits into
base: main
Choose a base branch
from
Open

Conversation

bpblanken
Copy link
Collaborator

No description provided.

##INFO=<ID=SEQR_INTERNAL_TRUTH_VID,Number=1,Type=String,Description="Annotation added by seqr loading pipeline run of SVConcordance between the callset & the all previously loaded variants exported to VCF">
##FORMAT=<ID=CONC_ST,Number=.,Type=String,Description="The genotype concordance contingency state">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RGP_164_1 RGP_164_2 RGP_164_3 RGP_164_4 RGP_164_5
chr1 180929 new_variant1 N <BND> 657 HIGH_SR_BACKGROUND;UNRESOLVED ALGORITHMS=manta;CHR2=chr5;END=180928;END2=20404;EVIDENCE=PE,SR;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=OR4F5,PLEKHG4B;PREDICTED_NONCODING_BREAKPOINT=DNase;STRANDS=+-;SVLEN=-1;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-;AN=8;AC=1;AF=0.04775;N_BI_GENOS=2911;N_HOMREF=2633;N_HET=278;N_HOMALT=0;FREQ_HOMREF=0.9045;FREQ_HET=0.0954998;FREQ_HOMALT=0;MALE_AN=2894;MALE_AC=137;MALE_AF=0.047339;MALE_N_BI_GENOS=1447;MALE_N_HOMREF=1310;MALE_N_HET=137;MALE_N_HOMALT=0;MALE_FREQ_HOMREF=0.905321;MALE_FREQ_HET=0.0946786;MALE_FREQ_HOMALT=0;FEMALE_AN=2906;FEMALE_AC=139;FEMALE_AF=0.047832;FEMALE_N_BI_GENOS=1453;FEMALE_N_HOMREF=1314;FEMALE_N_HET=139;FEMALE_N_HOMALT=0;FEMALE_FREQ_HOMREF=0.904336;FEMALE_FREQ_HET=0.0956641;FEMALE_FREQ_HOMALT=0;SEQR_INTERNAL_TRUTH_VID=BND_chr1_6 GT:EV:GQ:PE_GQ:PE_GT:SR_GQ:SR_GT 0/0:PE,SR:99:99:0:2:0 0/1:SR:31:99:0:31:1 0/0:PE,SR:99:99:0:99:0 0/0:PE,SR:99:99:0:2:0 0/0:PE,SR:99:99:0:2:0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test asserts that this variant, with a callset variatn_id of new_variant1, is mapped appropriately to BND_chr1_6 and the allele frequencies are incremented.

@@ -88,6 +88,17 @@ def update_table(self, ht: hl.Table) -> hl.Table:
# have been removed from the lookup table during modification.
# Ensure we don't proceed with those variants.
ht = ht.semi_join(lookup_ht)
elif self.dataset_type.gt_stats_from_hl_call_stats:
callset_mt = get_callset_mt(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to use hl.agg.call_stats!

annotation_ht_versions = dict(
hl.eval(hl.read_table(self.output().path).globals.versions),
)
annotation_ht_versions = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incredibly annoying bug here.... though gnomad_svs are excluded from the "reference datasets for annotation updates", this code would try to drop it because it was present on the annotations table.

this fix excludes on both sides of the set symmetric difference.

@bpblanken bpblanken marked this pull request as ready for review February 19, 2025 15:05
@bpblanken bpblanken requested a review from a team as a code owner February 19, 2025 15:05
# Note, SEQR_INTERNAL_TRUTH_VID is an entirely optional
# and unvalidated field. It will not be properly imported.
# It simply overrides the existing variant_id if present.
variant_id=(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this only run on new variants, or will this also be run on the existing annotations table?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is run on callset import, so new variants only when importing from vcf to hail table.

my comment on this was out of concern that we won't have a clean way of knowing if the SVConcordance fails in some way upstream and doesn't annotate of the variants in the vcf.

Base automatically changed from benb/gnomad_svs_reference_dataset to main February 19, 2025 17:01
@@ -428,6 +450,15 @@ def get_ht(
PATH: 'https://www.biorxiv.org/content/biorxiv/early/2023/01/27/2022.12.16.520778/DC3/embed/media-3.zip',
},
},
ReferenceDataset.gnomad_svs: {
EXCLUDE_FROM_ANNOTATIONS_UPDATES: True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are gnomad_svs annotations excluded?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're excluded from automically updating the annotations table because there's an addition step that's required that's not automate able at the moment (extracting the annotations table to vcf, running SVConcordance against the new gnomad vcf, then annotating with the new gnomad_svs reference dataset). It's doable in the future but we're punting on it for now.

gnomad_svs will be annotated on new variants though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants