Add mapping support for variants only partially intergenic #333
Comments
My initial thought on this, @ifokkema and @leicray, is to set up a video call. I have huge concerns about reopening the can of worms of allowing variant descriptions in the context of transcript reference sequences where variation would need to be described beyond the boundaries of the reference sequence. It is not a good idea, so I think we need to think very carefully about this and make recommendations to the HGVS SVD group.
Sounds good! Another thing that popped into my head is that this is also related to fusion transcripts. Deletions like these can cause fusion transcripts, and those do have a transcript-based description. So we might also go in that direction, even though that doesn't solve whole-gene deletions yet, only deletions where half of a gene is deleted.
Fusions are on the agenda for description formats that we need to crack. @leicray has certainly been working in this area. We should definitely talk about those too. Let's sort out some dates via email. Not sure about the SVD recruitment. Another thing to chat about.
Please include me in any proposed chat session.
You are needed.
As a heads-up: the HVNC just decided that, in a genomic context, the transcript coordinate system (NM:c) can be used to indicate upstream and downstream positions, just as intronic positions are currently described.
@ifokkema Thanks for the heads-up. I'll add this to the to-do list. I object, by the way, but rules is rules :P Can I ask you to open a specific feature request? Are there any links to this on the HGVS sites yet? Not a problem if not; we will just track them once there are.
Haha! Since it solves a major issue in LOVD, as well as this issue, I'm all for it 😅 Also, it re-aligns VV and Mutalyzer.
Yep, I just created #652. So far, we only have the issue in the HGVS Nomenclature repo; we'll work on updating the website soon. As you can imagine, quite a few pages need to be adjusted.
Is your feature request related to a problem? Please describe.
For LOVD to store a gene-specific effect of a variant, LOVD must store the mapped gene-level representation of that variant. While it is understandable that intergenic variants cannot be mapped to genes, variants that do overlap genes should always have a gene-level representation, even if they are also partially intergenic. However, variants that delete entire genes, with both of the deletion's endpoints outside of the gene's bounds, currently do not report any mapping (tested with both the LOVD endpoint and the VV endpoint). Variants that delete part of a gene, with the other endpoint outside of the gene's bounds, also do not report any mapping. E.g.,
NC_000016.9:g.2106894_2161281del
Describe the solution you'd like
In order for LOVD to "discover" an effect on the gene, VV should return a mapping. The issue, however, is that the HGVS nomenclature currently has no rules that can describe such a variant.
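To make the cases in this request concrete, here is a minimal Python sketch of how a deletion could be classified relative to a gene. It is only an illustration: the `Interval` type, the `classify_overlap` function, and the gene bounds are hypothetical placeholders, not real annotation and not part of VV, LOVD, or Mutalyzer. Only the deletion coordinates come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based, inclusive genomic position
    end: int    # 1-based, inclusive genomic position


def classify_overlap(variant: Interval, gene: Interval) -> str:
    """Classify a variant's extent relative to a single gene's bounds."""
    if variant.end < gene.start or variant.start > gene.end:
        return "intergenic"
    starts_inside = gene.start <= variant.start
    ends_inside = variant.end <= gene.end
    if starts_inside and ends_inside:
        return "within gene"
    if not starts_inside and not ends_inside:
        return "whole gene deleted"
    return "partially intergenic"


# Deletion from the example: NC_000016.9:g.2106894_2161281del
deletion = Interval(2106894, 2161281)
# HYPOTHETICAL gene bounds, chosen only to illustrate the partial case.
hypothetical_gene = Interval(2140000, 2185000)
print(classify_overlap(deletion, hypothetical_gene))
```

The request is that both the "whole gene deleted" and "partially intergenic" cases return a gene-level mapping, rather than only the "within gene" case.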
We conducted a poll among 400 LOVD curators, asking them which description would be best to use on the gene level. It was highlighted to them that none of the possibilities were HGVS compliant, so they were purely asked about their preference. In total, 54 curators replied.
Note, for all given descriptions, the intended reference sequence is NC_000016.9(NM_001009944.2), equal to intronic variation. The given options were;
A. "-" (an empty description)
This gives no details on what region of the coding DNA reference sequence is affected.
B. "c.3887_*32834del"
This is the current output that the Mutalyzer tool generates, linked to the LOVD database, to check and create variant descriptions. Mutalyzer maps the variant's endpoint assuming c.* numbering continues forever. It gives a clear indication of the full deletion's size but is not supported by HGVS.
C. "c.3887_*1017del"
This does give details on what region of the coding DNA reference sequence is affected (c.*1017 is the last base of this reference sequence), but it suggests the deletion has been sequenced as c.3887_*1017del. Since, in fact, the deletion extends beyond c.*1017, this description is not correct.
D. "c.3887_(*1017_?)del"
This does give details on what region of the coding DNA reference sequence is affected and shows more sequence has been deleted, although it suggests the endpoint of the deletion is not known while it is.
E. "c.3887_*1017[0]"
This new format does give details on what region of the coding DNA reference sequence is affected, and the [0] suggests it is present in 0 copies, so deleted. However, the format may be confused with the HGVS allele format, which also uses []. NOTE: For a duplication, we would use [2].
F. "c.3887_*1017{0}"
This new format does give details on what region of the coding DNA reference sequence is affected, and the {0} suggests it is present in 0 copies. Since HGVS does not use {}, there can not be any confusion. NOTE: For a duplication, we would use {2}.
Note, as a response to the survey, Peter Taschner noted another option;
G. "c.3887_*1017+d31817del"
This has been proposed before but was rejected by the HGVS. It indicates clearly the extent of the deletion, including the extent of the reference sequence, and more closely resembles the intronic variant notation.
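Options B and G carry the same information and differ only in where the counting stops: B keeps counting c.* positions past the end of the reference sequence, while G stops at the last base (c.*1017) and adds the remainder as an offset (32834 - 1017 = 31817). A minimal Python sketch of that arithmetic follows. The function name `option_b_to_g` is hypothetical, it only handles the simple `c.X_*Ndel` form used in this example, and it is not how Mutalyzer, LOVD, or VV implement anything.

```python
import re


def option_b_to_g(description: str, last_utr_base: int) -> str:
    """Rewrite an option-B style 'c.X_*Ndel' as option-G style 'c.X_*L+dKdel'.

    last_utr_base is the last c.* position present in the transcript
    reference sequence (1017 in the example from this issue).
    """
    match = re.fullmatch(r"c\.(\S+?)_\*(\d+)del", description)
    if match is None:
        raise ValueError(f"unsupported description: {description!r}")
    start, end = match.group(1), int(match.group(2))
    if end <= last_utr_base:
        # The endpoint lies within the reference sequence; nothing to rewrite.
        return description
    offset = end - last_utr_base  # bases beyond the end of the transcript
    return f"c.{start}_*{last_utr_base}+d{offset}del"


print(option_b_to_g("c.3887_*32834del", 1017))  # c.3887_*1017+d31817del
```

Because the offset is kept, either form can be converted to the other, and so both retain the full extent of the deletion; options C to F drop the offset.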
The results were as follows;
My personal worry is also about generating any description that cannot be mapped back to the genome. I.e., options A, C, D, E, and F cannot be mapped back to the genome if their source was the transcript, so information is lost. Personally, I feel that the "we cannot describe positions not mentioned in the reference sequence" objection is solved by using the NC(NM) construct, just like intronic variants are handled now. I haven't heard any argument for why it can't work like this that would not also apply to how we describe intronic variants.
Describe alternatives you've considered
Note that Mutalyzer currently uses option B and that descriptions like these are currently widespread in LOVD.
Additional context
Note that Johan decided to ignore the wishes of the curators and implement option F in the GV shared LOVD. For many "new" submissions (up to one and a half years old or so), option F is used and not B.