Revisit development of code to handle less common syntaxes #556
@leicray, there are examples in the files, so there is no need to trawl for them.
OK, @ifokkema. I took a look at https://varnomen.hgvs.org/recommendations/DNA/variant/repeated/ and there is a relevant section that will change the nomenclature for expanded repeats.
I was already aware of this, as I believe Ivo and I were amongst those who asked for the consultation. So I will leave this work until the consultation is complete.
We also need to look at uncertain positions. For example, here is the nomenclature page: https://varnomen.hgvs.org/recommendations/uncertain/
OK, here are the examples from that page:

- `NC_000003.12:g.(63912602_63912844)delN[15]` or `NM_000333.3:c.(4_246)delN[15]`
- `NC_000023.11:g.(31729716_31774235)_(32216847_32287541)del` (`LRG_199t1:c.(6278_6438+69)_(7310-43_7575)del`)
- `NC_000023.11:g.(31729663_31774080)_(32216973_32287624)del` (`LRG_199t1:c.(6195_6381)_(7422_7628)del`)
- `NC_000023.10:g.(32218983_32238146)_(32984039_33252615)del`

Rearrangements detected using FISH (Fluorescence In Situ Hybridisation) can be described using ISCN guidelines. When probe positions are known, variants can be described using genomic coordinates. The basic format is `(position-last-normal-probe_position-first-variant-probe)_(position-last-variant-probe_position-first-normal-probe)` (see also ISCN<>HGVS). In this description, the "probe position" is based on the centre of the labelled probe used during hybridisation.

- `NC_000023.11:g.(31775822_31819974)_(32217064_32278336)del` (`LRG_199t1:c.(6290+9193_6291-1)_(7309+1_7310-1630)del`)

insertion
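A minimal sketch of how the genomic uncertain-range deletions listed above could be parsed, and how the FISH probe format could be assembled from four probe positions. The helper names (`parse_uncertain_del`, `from_probes`) and the regular expression are hypothetical, not part of VariantValidator or any existing syntax checker, and they only cover `g.` descriptions whose positions are integers or `?`; `c.` offsets such as `6438+69` are out of scope here.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical pattern for g. deletions whose start and end are both
# uncertain ranges, e.g. NC_000023.10:g.(32218983_32238146)_(32984039_33252615)del
UNCERTAIN_DEL = re.compile(
    r"^(?P<ref>[A-Z]{2}_\d+\.\d+):g\."
    r"\((?P<s1>\d+|\?)_(?P<s2>\d+|\?)\)_\((?P<e1>\d+|\?)_(?P<e2>\d+|\?)\)del$"
)


class UncertainDeletion(NamedTuple):
    ref: str
    start_range: Tuple[Optional[int], Optional[int]]  # range in which the deletion starts
    end_range: Tuple[Optional[int], Optional[int]]    # range in which the deletion ends


def _pos(token: str) -> Optional[int]:
    # '?' marks an unknown position in HGVS; represent it as None
    return None if token == "?" else int(token)


def parse_uncertain_del(description: str) -> Optional[UncertainDeletion]:
    """Return the parsed ranges, or None if the description is not this syntax."""
    m = UNCERTAIN_DEL.match(description)
    if m is None:
        return None
    return UncertainDeletion(
        m["ref"],
        (_pos(m["s1"]), _pos(m["s2"])),
        (_pos(m["e1"]), _pos(m["e2"])),
    )


def from_probes(ref: str, last_normal: int, first_variant: int,
                last_variant: int, first_normal: int) -> str:
    # (last normal probe_first variant probe)_(last variant probe_first normal probe)del
    return f"{ref}:g.({last_normal}_{first_variant})_({last_variant}_{first_normal})del"


d = parse_uncertain_del("NC_000023.10:g.(32218983_32238146)_(32984039_33252615)del")
print(d.start_range, d.end_range)
```

Keeping `?` as `None` rather than rejecting it means open-ended ranges such as `(?_60061180)` parse with the same code path.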
@ifokkema, to save duplication of effort: have you handled these in your syntax checker?
Agreed! Also related: #328. I've been wanting to make a finalized list of variant descriptions, with reference sequences, so that we can also test them with VV, where we have defined:
Also related to LOVDnl/LOVD3#573, and perhaps to Reece's HGVS eval.
Made a little progress on the unknown-position syntax:

```python
import json
import VariantValidator

# Validate one uncertain-position deletion against GRCh38 RefSeq transcripts
vval = VariantValidator.Validator()
variant = 'NM_006138.4:c.(1_20)_(30_36)del'  # variant 1
genome_build = 'GRCh38'
select_transcripts = 'all'
transcript_set = 'refseq'
validate = vval.validate(variant, genome_build, select_transcripts, transcript_set)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
```

Output:

```json
{
    "flag": "warning",
    "metadata": {
        "variantvalidator_hgvs_version": "2.2.0",
        "variantvalidator_version": "2.2.1.dev465+gd6addb3.d20231211",
        "vvdb_version": "vvdb_2023_8",
        "vvseqrepo_db": "VV_SR_2023_05/master",
        "vvta_version": "vvta_2023_05"
    },
    "validation_warning_1": {
        "alt_genomic_loci": [],
        "annotations": {},
        "gene_ids": {},
        "gene_symbol": "",
        "genome_context_intronic_sequence": "",
        "hgvs_lrg_transcript_variant": "",
        "hgvs_lrg_variant": "",
        "hgvs_predicted_protein_consequence": {
            "lrg_slr": "",
            "lrg_tlr": "",
            "slr": "",
            "tlr": ""
        },
        "hgvs_refseqgene_variant": "",
        "hgvs_transcript_variant": "NM_006138.4:c.(1_20)_(30_36)del",
        "primary_assembly_loci": {
            "grch38": {
                "hgvs_genomic_description": "NC_000011.10:g.(60061161_60061180)_(60061190_60061196)del"
            }
        },
        "reference_sequence_records": {
            "transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_006138.4"
        },
        "refseqgene_context_intronic_sequence": "",
        "rna_variant_descriptions": null,
        "selected_assembly": "GRCh38",
        "submitted_variant": "NM_006138.4:c.(1_20)_(30_36)del",
        "transcript_description": "",
        "validation_warnings": [
            "Uncertain positions are not fully supported, however the syntax is valid"
        ],
        "variant_exonic_positions": null
    }
}
```

g. variants will map to a transcript (either a Select transcript or a single selected transcript). Transcript variants will map to g. for the selected assembly only.
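When running a batch of test descriptions, it may help to pull the key fields out of that dictionary. A minimal sketch working on a trimmed copy of the output above, assuming (as in that output) that every top-level key other than `flag` and `metadata` holds one per-variant entry; `summarise` is a hypothetical helper, not a VariantValidator function.

```python
# Trimmed copy of the output shown above, not the full VariantValidator result
validation = {
    "flag": "warning",
    "metadata": {"variantvalidator_version": "2.2.1.dev465+gd6addb3.d20231211"},
    "validation_warning_1": {
        "submitted_variant": "NM_006138.4:c.(1_20)_(30_36)del",
        "primary_assembly_loci": {
            "grch38": {
                "hgvs_genomic_description": "NC_000011.10:g.(60061161_60061180)_(60061190_60061196)del"
            }
        },
        "validation_warnings": [
            "Uncertain positions are not fully supported, however the syntax is valid"
        ],
    },
}


def summarise(result):
    """Yield (submitted variant, {build: g. description}, warnings) per entry."""
    for key, entry in result.items():
        # "flag" and "metadata" describe the whole run, not a single variant
        if key in ("flag", "metadata") or not isinstance(entry, dict):
            continue
        genomic = {
            build: loci.get("hgvs_genomic_description", "")
            for build, loci in entry.get("primary_assembly_loci", {}).items()
        }
        yield (entry.get("submitted_variant", ""), genomic,
               entry.get("validation_warnings", []))


for submitted, genomic, warnings in summarise(validation):
    print(submitted, genomic.get("grch38"), warnings)
```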
Is your feature request related to a problem? Please describe.
There are a few less commonly used formats that we need to develop code to handle.
Describe the solution you'd like
We need a few examples of each (this part is one for @leicray), and then we need to map out workflows for each.
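As a possible first step towards per-format workflows, descriptions could be routed by syntax before validation. A rough sketch only: the route names and patterns are hypothetical and cover just the syntaxes raised in this thread (uncertain ranges, deletions of a stated length within an uncertain range such as `delN[15]`, and repeat counts). Patterns are checked in order, so the more specific `delN[n]` form is tested before the generic repeat pattern.

```python
import re

# Hypothetical routing table: the first matching pattern decides the workflow.
ROUTES = [
    # (a_b)_(c_d): both ends of the variant lie in an uncertain range
    ("uncertain_range", re.compile(r":[cgmnor]\.\([^)]*\)_\([^)]*\)")),
    # (a_b)delN[n]: n nucleotides deleted somewhere within (a_b)
    ("uncertain_del_count", re.compile(r"\)delN\[\d+\]$")),
    # any remaining [n] count, treated as repeat syntax
    ("repeat", re.compile(r"\[\d+\]")),
]


def route(description: str) -> str:
    """Return the name of the workflow a description should be sent to."""
    for name, pattern in ROUTES:
        if pattern.search(description):
            return name
    return "standard"


for d in ("NM_006138.4:c.(1_20)_(30_36)del",
          "NM_000333.3:c.(4_246)delN[15]"):
    print(d, route(d))
```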
Describe alternatives you've considered
A couple of STP students did start to tackle some of this, so we have code that we can merge in and adapt.