-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: compute running SV allele frequencies #1047
base: main
Are you sure you want to change the base?
Conversation
* First pass * add comment * ruff
tdr updates for sample qc
…s into benb/gnomad_svs_reference_dataset
…-loading-pipelines into benb/gnomad_svs_reference_dataset
…s into benb/x_males_non_par
…-loading-pipelines into benb/gnomad_svs_reference_dataset
…nstitute/seqr-loading-pipelines into benb/sv_gt_stats
…es into benb/gnomad_svs_reference_dataset
…nstitute/seqr-loading-pipelines into benb/sv_gt_stats
…ding-pipelines into benb/sv_gt_stats
##INFO=<ID=SEQR_INTERNAL_TRUTH_VID,Number=1,Type=String,Description="Annotation added by seqr loading pipeline run of SVConcordance between the callset & the all previously loaded variants exported to VCF"> | ||
##FORMAT=<ID=CONC_ST,Number=.,Type=String,Description="The genotype concordance contingency state"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RGP_164_1 RGP_164_2 RGP_164_3 RGP_164_4 RGP_164_5 | ||
chr1 180929 new_variant1 N <BND> 657 HIGH_SR_BACKGROUND;UNRESOLVED ALGORITHMS=manta;CHR2=chr5;END=180928;END2=20404;EVIDENCE=PE,SR;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=OR4F5,PLEKHG4B;PREDICTED_NONCODING_BREAKPOINT=DNase;STRANDS=+-;SVLEN=-1;SVTYPE=BND;UNRESOLVED_TYPE=SINGLE_ENDER_+-;AN=8;AC=1;AF=0.04775;N_BI_GENOS=2911;N_HOMREF=2633;N_HET=278;N_HOMALT=0;FREQ_HOMREF=0.9045;FREQ_HET=0.0954998;FREQ_HOMALT=0;MALE_AN=2894;MALE_AC=137;MALE_AF=0.047339;MALE_N_BI_GENOS=1447;MALE_N_HOMREF=1310;MALE_N_HET=137;MALE_N_HOMALT=0;MALE_FREQ_HOMREF=0.905321;MALE_FREQ_HET=0.0946786;MALE_FREQ_HOMALT=0;FEMALE_AN=2906;FEMALE_AC=139;FEMALE_AF=0.047832;FEMALE_N_BI_GENOS=1453;FEMALE_N_HOMREF=1314;FEMALE_N_HET=139;FEMALE_N_HOMALT=0;FEMALE_FREQ_HOMREF=0.904336;FEMALE_FREQ_HET=0.0956641;FEMALE_FREQ_HOMALT=0;SEQR_INTERNAL_TRUTH_VID=BND_chr1_6 GT:EV:GQ:PE_GQ:PE_GT:SR_GQ:SR_GT 0/0:PE,SR:99:99:0:2:0 0/1:SR:31:99:0:31:1 0/0:PE,SR:99:99:0:99:0 0/0:PE,SR:99:99:0:2:0 0/0:PE,SR:99:99:0:2:0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test asserts that this variant, with a callset variatn_id of new_variant1
, is mapped appropriately to BND_chr1_6
and the allele frequencies are incremented.
@@ -88,6 +88,17 @@ def update_table(self, ht: hl.Table) -> hl.Table: | |||
# have been removed from the lookup table during modification. | |||
# Ensure we don't proceed with those variants. | |||
ht = ht.semi_join(lookup_ht) | |||
elif self.dataset_type.gt_stats_from_hl_call_stats: | |||
callset_mt = get_callset_mt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to use hl.agg.call_stats
!
annotation_ht_versions = dict( | ||
hl.eval(hl.read_table(self.output().path).globals.versions), | ||
) | ||
annotation_ht_versions = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incredibly annoying bug here.... though gnomad_svs
are excluded from the "reference datasets for annotation updates", this code would try to drop it because it was present on the annotations table.
this fix excludes on both sides of the set symmetric difference.
# Note, SEQR_INTERNAL_TRUTH_VID is an entirely optional | ||
# and unvalidated field. It will not be properly imported. | ||
# It simply overrides the existing variant_id if present. | ||
variant_id=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this only run on new variants, or will this also be run on the existing annotations table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is run on callset import, so new variants only when importing from vcf to hail table.
my comment on this was out of concern that we won't have a clean way of knowing if the SVConcordance fails in some way upstream and doesn't annotate of the variants in the vcf.
@@ -428,6 +450,15 @@ def get_ht( | |||
PATH: 'https://www.biorxiv.org/content/biorxiv/early/2023/01/27/2022.12.16.520778/DC3/embed/media-3.zip', | |||
}, | |||
}, | |||
ReferenceDataset.gnomad_svs: { | |||
EXCLUDE_FROM_ANNOTATIONS_UPDATES: True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are gnomad_svs
annotations excluded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they're excluded from automically updating the annotations table because there's an addition step that's required that's not automate able at the moment (extracting the annotations table to vcf, running SVConcordance
against the new gnomad vcf, then annotating with the new gnomad_svs reference dataset). It's doable in the future but we're punting on it for now.
gnomad_svs will be annotated on new variants though.
No description provided.