Collecting recently failed variants as a list. please add #545
Comments
19-40397933-ATCT-A b38 All seem to be the same error Traceback (most recent call last): NOTE: These are now fixed |
The attached text file contains a long list of variants that have triggered ERROR messages from the interactive validation tool since the start of September this year. Some of these might now be handled correctly since the recent patches. variants that trigger error messages.txt GRCh37 variants fixed |
It looks like a user is trying to validate NM_024496.4:c.369_374del which does validate correctly in the interactive tool. However, the error message says:
That looks like the vcf2hgvs tool is being used. However, that would require the user to place the variant in a text file and then upload that file to the vcf2hgvs tool. Possible, but unlikely. |
Variant: 1-156138613-C-T Hello, I'm having a problem validating the synonymous variant in LMNA (ClinVar ID 14500) - NM_170707.4(LMNA):c.1824C>T p.(Gly608=). I tried different ways, including chr1(GRCh38):g.156138613C>T and 1-156138613-C-T. Message error: Unable to validate the submitted variant against the GRCh38 assembly Thank you in advance. |
This is the code trying to create a UCSC link I believe. Not VCF. Thanks for logging it |
Here is another one that ought not to trip up the system: It generates error messages from the interactive service and submission to the batch tools also fails. The reference sequence is the MANE Select transcript for the MSH6 gene. The traceback message for failure to validate via the batch tool is: Traceback (most recent call last): In addition, this triggers a further exception: Traceback (most recent call last): |
Thanks.
I think we have an issue open for debugging. Can you please add it? I want to do some debugging in a couple of weeks to release a new build.
From: leicray ***@***.***>
Date: Tuesday, 31 October 2023 at 09:33
To: openvar/variantValidator ***@***.***>
Cc: Peter Freeman ***@***.***>, Author ***@***.***>
Subject: Re: [openvar/variantValidator] Collecting recently failed variants as a list. please add (Issue #545)
Here is another one that ought not to trip up the system: NM_000179.3:c.4083dup
It generates error messages from the interactive service and submission to the batch tools also fails. The reference sequence is the MANE Select transcript for the MSH6 gene.
The traceback message for failure to validate via the batch tool is:
Traceback (most recent call last):
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinCore.py", line 752, in validate
toskip = mappers.transcripts_to_gene(my_variant, self, select_transcripts_dict_plus_version)
File "/local/py3Repos/variantValidator/VariantValidator/modules/mappers.py", line 643, in transcripts_to_gene
protein_dict = validator.myc_to_p(hgvs_coding, variant.evm, re_to_p=False, hn=variant.hn)
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinInit.py", line 535, in myc_to_p
start_aa = utils.one_to_three(aa_seq[0])
IndexError: string index out of range
In addition, this triggers a further exception:
Traceback (most recent call last):
File "/local/miniconda3/envs/vvweb_v2/lib/python3.10/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/local/miniconda3/envs/vvweb_v2/lib/python3.10/site-packages/celery/app/trace.py", line 704, in protected_call
return self.run(*args, **kwargs)
File "/local/VVweb/web/tasks.py", line 60, in batch_validate
output = validator.validate(variant, genome, transcripts)
File "/local/py3Repos/variantValidator/VariantValidator/modules/vvMixinCore.py", line 1462, in validate
raise fn.VariantValidatorError('Validation error')
VariantValidator.modules.utils.VariantValidatorError: Validation error
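For reference, the IndexError at the end of the first traceback is what Python raises when an empty string is indexed, so `aa_seq` was presumably empty when `myc_to_p` reached that line. The thread does not say why it was empty. A minimal sketch of a guard, using a hypothetical helper name (`first_residue`) that is not part of VariantValidator:

```python
def first_residue(aa_seq):
    """Return the first residue of a translated sequence, or None if it is empty.

    Hypothetical helper for illustration only; not part of VariantValidator.
    """
    if not aa_seq:
        return None
    return aa_seq[0]


# Indexing an empty string raises the same error class seen in the traceback.
try:
    ""[0]
except IndexError as exc:
    print(f"IndexError: {exc}")

print(first_residue(""))     # None, so the caller can emit a warning instead
print(first_residue("MSR"))  # M
```

A caller could then turn the `None` into a validation warning rather than letting the exception reach the batch task.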
|
What do you mean by "add it"? This is the report. |
Sorry, I meant to the open git issue. You already collated a few variants that fail processing I believe??
Grant coming well. Should be able on time.
Dr Peter Freeman
|
Here is another one that trips up the interactive and batch validators:
|
Thanks @leicray. Realised it's a git email this time. I'm gonna do a little debugging now. Need time away from grant writing |
And another one:
|
Will come back to this one NG_059281.1:g.4962G>C (GRCh38). It's a database issue. Missing records |
This one too NG_061374.1:g.11229T>C (b38) |
So, the issue was that RefSeq are not maintaining the RefSeqGene lookup tables. I added code to get the data from the API when the lookup fails. These variants are now fixed, but the fix will not be live until I do a new database build |
or at least do an interim update on the live servers, which may be quicker for now. |
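The "fall back to the API when the lookup fails" idea can be sketched roughly as below. This is not the VariantValidator code: the function name, the dictionary standing in for the lookup table, and the `fetch_remote` callable are all assumptions made for illustration, as is the choice to store the fetched record locally.

```python
def lookup_with_fallback(accession, local_table, fetch_remote):
    """Return the record for an accession, trying the local table first.

    local_table: dict standing in for the RefSeqGene lookup table (assumption).
    fetch_remote: callable standing in for the remote API request (assumption).
    """
    record = local_table.get(accession)
    if record is None:
        record = fetch_remote(accession)
        if record is not None:
            # Keep the fetched record so the next lookup is served locally.
            local_table[accession] = record
    return record


calls = []


def fake_fetch(accession):
    """Stand-in for the remote API; records each call it receives."""
    calls.append(accession)
    return {"accession": accession, "source": "remote"}


table = {}
first = lookup_with_fallback("NG_059281.1", table, fake_fetch)
second = lookup_with_fallback("NG_059281.1", table, fake_fetch)
print(first["source"], len(calls))
```

The second call is answered from the local table, so the stand-in API is only hit once.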
I don't know if I have the words. |
I did wonder about that one. However, there is a genome build provided, a chromosome, a nucleotide number, and the nature of the change to that nucleotide. In a sense, it's little different from |
It's not that simple, sadly. I will need to figure out where to put a regex to catch it. I'm sure it'll fit, hopefully with the code that allows chr17:50198002C>A. The difference is that chr17:50198002C>A is derived as part of pseudo-VCF re-formatting. The description 11:2587692del is a bit different because 50198002C>A comes from 50198002:C:A, whereas 11:2587692del should be derived from something like 50198002:CC:C, not "del". Hopefully it's a quick tweak though. Fun times! At least you came up with a reasonable explanation as to where the description came from |
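A rough sketch of the kind of regex discussed above, covering both the chr17:50198002C>A form and the bare 11:2587692del form. The pattern, the function name and the chr-prefixed `g.` output are assumptions for illustration, not the code that handles pseudo-VCF input in VariantValidator.

```python
import re

# Hypothetical pattern: optional "chr", chromosome, colon, position,
# then either REF>ALT (single bases) or a bare "del".
_SHORTHAND = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|M|MT):(?P<pos>\d+)"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|(?P<del>del))$",
    re.IGNORECASE,
)


def to_genomic_shorthand(description):
    """Rewrite chrom:pos shorthand as a g. description, or return None."""
    match = _SHORTHAND.match(description.strip())
    if match is None:
        return None
    chrom = match.group("chrom").upper()
    pos = match.group("pos")
    if match.group("del"):
        return f"chr{chrom}:g.{pos}del"
    ref = match.group("ref").upper()
    alt = match.group("alt").upper()
    return f"chr{chrom}:g.{pos}{ref}>{alt}"


print(to_genomic_shorthand("chr17:50198002C>A"))      # chr17:g.50198002C>A
print(to_genomic_shorthand("11:2587692del"))          # chr11:g.2587692del
print(to_genomic_shorthand("NM_000179.3:c.4083dup"))  # None
```

Anything the pattern does not recognise is returned as `None`, so ordinary HGVS descriptions would pass through to the normal parser untouched.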
NC_000023.11:r.650_831del |
chr11:g,108121787G>A GRCh37 The anonymous submitter also tried GRCh38 and that failed too, of course. This should be easy to trap and correct as the comma just needs to be replaced by a full stop. |
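Trapping the comma could look something like the sketch below. The helper name and the exact pattern are assumptions, not existing VariantValidator code; the idea is only to replace a comma that directly follows the type letter and precedes a position.

```python
import re

# Hypothetical pattern: a type letter followed by a comma where a full stop
# is expected, with a position-like character right after it.
_TYPE_COMMA = re.compile(r":([gcnrmop]),(?=[\d*\-(])")


def fix_type_separator(description):
    """Replace a comma typed after the HGVS type letter with a full stop."""
    return _TYPE_COMMA.sub(r":\1.", description)


print(fix_type_separator("chr11:g,108121787G>A"))   # chr11:g.108121787G>A
print(fix_type_separator("NM_000179.3:c.4083dup"))  # unchanged
```

A correction like this would presumably be reported back to the user as a warning rather than applied silently.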
Will get this one done asap. Easy one hopefully |
An anonymous user has tried to validate If I rewrite the variant description as
Ought to be easy to trap. |
I might be wrong, but are you suggesting that is valid syntax? Because a change to the first codon leads to an unpredictable result. The docs say:
(source) |
You are quite correct. I simply wanted to generate a variant description that would not cause the validator to fall over. I have no idea what comes next after Met1 in the DMD protein sequence, so pushed on with that. Of course, there ought to be an additional warning that |
This should be triggering the warning and I wonder if it is trying to and failing. Will look into it |
Ah, OK, you were just testing the reference sequence 😅 Never mind me! |
I'm still worried that the Met1 warning wasn't generated. So 2 fixes here. A chance to increase code coverage :P |
Hmm... I don't think that has ever been observed in humans... ClinVar reports this variant, but ClinVar always lies when it comes to protein descriptions 🙄 |
How about this?
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev729+g86e62d8.d20241105",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "",
"tlr": ""
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": "",
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NM_000179.3:r.3646_3646+1insugagauaugcauag",
"transcript_description": "",
"validation_warnings": [
"VariantSyntaxError: RNA (r.) reference sequences do not contain introns. Intronic descriptions are described in the context of a c. description"
],
"variant_exonic_positions": null
}
}
Also @leicray, can you comment on this issue so we can mark it as completed and update the text as necessary? |
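The check behind that warning could be sketched as follows. This is an illustration only: the function name and the offset pattern are assumptions, and the real implementation may detect intronic positions differently. The warning text is copied from the output above.

```python
import re

# Hypothetical pattern: a digit followed by +N or -N, i.e. an intronic offset.
_INTRON_OFFSET = re.compile(r"\d[+-]\d")

RNA_INTRON_WARNING = (
    "VariantSyntaxError: RNA (r.) reference sequences do not contain introns. "
    "Intronic descriptions are described in the context of a c. description"
)


def rna_intronic_warning(description):
    """Return the warning if an r. description uses an intronic offset."""
    _accession, sep, change = description.partition(":")
    if sep and change.startswith("r.") and _INTRON_OFFSET.search(change):
        return RNA_INTRON_WARNING
    return None


print(rna_intronic_warning("NM_000179.3:r.3646_3646+1insugagauaugcauag"))
print(rna_intronic_warning("NC_000023.11:r.650_831del"))  # None
```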
Related information:
|
I would change the second part of the warning to read: |
I honestly don't know when IUPAC made this change, but a quick check online doesn't show me any pages still using the old nomenclature. So it's probably been a while. I'll keep you updated on the vote! |
A user submitted the incorrect variant description |
…e and intrins in r. descriptions as referred to in #545
I doubt |
I agree with everything that you say regarding I always try to respond to user "errors" such as this if the user has logged in so that I can figure out their email address from their login ID. I did that in this case too and asked the user to get back to me with more info. I have received no reply and that's what happens in the majority of cases. Most users are rather unthinking (rude) when invited to respond. |
A user has tried unsuccessfully (and three times) to validate the variant description |
The dev version of the LOVD HGVS syntax checker says: "Protein reference sequences are not supported. Please submit a DNA variant using a DNA reference sequence." |
I have added the following
{
"flag": "warning",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev709+g6340024",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_09/master",
"vvta_version": "vvta_2024_09"
},
"validation_warning_1": {
"alt_genomic_loci": [],
"annotations": {},
"gene_ids": {},
"gene_symbol": "",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "",
"tlr": ""
},
"hgvs_refseqgene_variant": "",
"hgvs_transcript_variant": "",
"primary_assembly_loci": {},
"reference_sequence_records": "",
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh38",
"submitted_variant": "NP_000483.3:c.579+3A>G.",
"transcript_description": "",
"validation_warnings": [
"Protein reference sequence input as Nucleotide (:c.) variant."
],
"variant_exonic_positions": null
}
}
This will work for both NP_ and ENSP and with variant types g., c., r. and n.
@ifokkema, I am not ignoring your email about requests for the LOVD API. We need to meet up and discuss integration with you |
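For readers following along, a check of this kind might look roughly like the sketch below. It is not the code that was added: the function name and accession pattern are assumptions, and so is the idea that the type letter in the message follows the submitted type (the output above only shows the c. case).

```python
import re

# Hypothetical pattern for protein accessions: NP_ (RefSeq) or ENSP (Ensembl),
# with an optional version suffix.
_PROTEIN_ACCESSION = re.compile(r"^(NP_|ENSP)\d+(\.\d+)?$")
_NUCLEOTIDE_TYPES = ("g.", "c.", "r.", "n.")


def protein_reference_warning(description):
    """Warn when a protein accession is paired with a nucleotide variant type."""
    accession, sep, change = description.partition(":")
    if not sep:
        return None
    variant_type = change[:2]
    if _PROTEIN_ACCESSION.match(accession) and variant_type in _NUCLEOTIDE_TYPES:
        return f"Protein reference sequence input as Nucleotide (:{variant_type}) variant."
    return None


print(protein_reference_warning("NP_000483.3:c.579+3A>G."))
print(protein_reference_warning("NM_033116.6:c.1715G>T"))  # None
```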
This is not a "failed variant" issue but it probably belongs here anyway as it's an input-parsing issue of a sort. An anonymous user has twice tried to search for transcripts for a gene using the HGNC gene ID. The first submitted |
An anonymous user has twice tried to validate the variant description This looks like a failure to properly parse the input. |
An anonymous user has tried to validate the variant description This looks like a failure to properly parse the input. |
A user has submitted the variant description The basic problem is that position |
An anonymous user tried to validate the variant description If corrected to The gene symbol is redundant, as is the number and "identity" of the deleted nucleotides. |
An anonymous user's submission generated these ERROR message lines:
So much wrong here. The irony is that the corrected description, NM_033116.6:c.1715G>T, does validate and yields the protein-level variant p.(G572V). |
Hmm... what do you think the user meant with "105"? For what it's worth, my HGVS library responds to
|
I see LOTS of failed validation requests for which the submitted variant description is simply a long numeric string. By comparison "105" is modestly short. I really have no idea why users are doing this. I can only reply to registered users to offer help and to ask what they were trying to do, but none have ever replied with respect to this type of error. |
I also see no link between "105" and the gene (NEK9). Is the use of a variant description in the "transcripts" field common enough to add some code that checks that, perhaps only when the "variant" field itself can't be interpreted and tries to handle that? |
I do not commonly see users placing the variant description in the "transcripts" field. The most common error of this type is simply submitting a long numeric string with no pretence to it being a valid variant description. I wonder occasionally if it's perhaps intended to be malicious in some way. |
That's definitely a possibility, although then I would expect attempts like SQL injection, XSS, local file inclusion, etc. We get those all the time, but they are mostly handled by the LOVD infrastructure before they reach the rest of the code, so they rarely get included in reports. |
An anonymous user has twice submitted the variant description When the description is changed to |
The LOVD/HGVS library returns, in this case: |
A user twice submitted the variant description The submission form perhaps needs a guidance message about submitting LRG-based descriptions or needs to auto "correct" to RefSeq, if necessary, when LRG-based descriptions are submitted. |
A user has submitted the variant description The latter description is probably what was intended, but the original variant description ought to have validated successfully as there appear to be no obvious syntax errors. The only possible issue is that normalization pushes the duplication 2bp into an intron and that would not be possible for an "r." description. Errors like this need to be better handled with a clearer error message. |
A user submitted the variant description User errors like this need to be trapped and guidance be sent to the user. EDIT: Three days later, an anonymous user submitted |
An anonymous user has submitted I'm not saying that just removing the brackets solves every issue with this description. |
This one is similar to the previous two. Whether this is valid or not depends; the HGVS guidelines are unclear about this. I had previously opened a discussion about it. Currently, Jeroen, Johan, and I have commented on it. If you have suggestions or recommendations, please add them there if you have time. |
chr5:112839840_112839842delGGCinsTGA b38