Add mapping support for variants only partially intergenic #333

ifokkema · 2022-01-20T17:43:23Z

Is your feature request related to a problem? Please describe.
For LOVD to store a gene-specific effect of a variant, LOVD must store the mapped gene-level representation of that variant. While it is understandable that intergenic variants cannot be mapped to genes, variants that do overlap genes should always have a gene-level representation, even if they are also partially intergenic. However, variants that delete entire genes, with both of the deletion's endpoints outside of the gene's bounds, currently do not report any mapping (tested on the LOVD endpoint and the VV endpoint). Variants that delete half of a gene, with the other endpoint outside of the gene's bounds, do not report any mapping either. E.g., NC_000016.9:g.2106894_2161281del.
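To make the cases above concrete, here is a minimal sketch of how a variant's interval could be classified against a gene's interval. The names `Interval` and `classify_overlap` and the gene bounds are hypothetical and only serve as illustration; this is not how LOVD or VV implement their mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed genomic interval, 1-based, with start <= end."""
    start: int
    end: int


def classify_overlap(variant: Interval, gene: Interval) -> str:
    """Describe how a variant interval relates to a gene interval."""
    if variant.end < gene.start or variant.start > gene.end:
        return "intergenic"       # no base shared with the gene
    starts_inside = gene.start <= variant.start
    ends_inside = variant.end <= gene.end
    if starts_inside and ends_inside:
        return "within_gene"      # the ordinary case, which already maps
    if not starts_inside and not ends_inside:
        return "whole_gene"       # both endpoints outside the gene's bounds
    return "partial"              # one endpoint inside, the other outside


# Hypothetical gene bounds, chosen only to illustrate the four cases.
gene = Interval(1_000, 2_000)
```

The "whole_gene" and "partial" results correspond to the two situations described above that currently return no mapping.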

Describe the solution you'd like
In order for LOVD to "discover" an effect on the gene, VV should return a mapping. An issue is, however, that the HGVS nomenclature doesn't have any valid rules currently that can describe such a variant.
We conducted a poll among 400 LOVD curators, asking them what description would be best to be used on the gene level. It was highlighted to them that none of the possibilities were HGVS compliant, so they were purely asked about their preference. In total, 54 curators replied.

Note, for all given descriptions, the intended reference sequence is NC_000016.9(NM_001009944.2), the same construct that is used for intronic variation.
The given options were;

A. "-" (an empty description)
This gives no details on what region of the coding DNA reference sequence is affected.

B. "c.3887_*32834del"
This is the current output that the Mutalyzer tool generates, linked to the LOVD database, to check and create variant descriptions. Mutalyzer maps the variant's endpoint assuming c.* numbering continues forever. It gives a clear indication of the full deletion's size but is not supported by HGVS.

C. "c.3887_*1017del"
This does give details on what region of the coding DNA reference sequence is affected (c.*1017 is the last base of this reference sequence), but it suggests the deletion has been sequenced as c.3887_*1017del. Since, in fact, the deletion extends beyond c.*1017, this description is not correct.

D. "c.3887_(*1017_?)del"
This does give details on what region of the coding DNA reference sequence is affected and shows more sequence has been deleted, although it suggests the endpoint of the deletion is not known while it is.

E. "c.3887_*1017[0]"
This new format does give details on what region of the coding DNA reference sequence is affected, and the [0] suggests it is present in 0 copies, so deleted. However, the format may be confused with the HGVS allele format, which also uses []. NOTE: For a duplication, we would use [2].

F. "c.3887_*1017{0}"
This new format does give details on what region of the coding DNA reference sequence is affected, and the {0} suggests it is present in 0 copies. Since HGVS does not use {}, there can not be any confusion. NOTE: For a duplication, we would use {2}.

Note, as a response to the survey, Peter Taschner noted another option;
G. "c.3887_*1017+d31817del"
This has been proposed before but was rejected by the HGVS. It indicates clearly the extent of the deletion, including the extent of the reference sequence, and more closely resembles the intronic variant notation.
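The options above differ only in how they encode the same three numbers. As a rough sketch, the hypothetical helper below (not part of LOVD, VV, or Mutalyzer) builds options B through G from the first deleted c. position, the last c.* position present in the transcript, and the number of deleted bases beyond it. It also shows that options B and G carry the same information, since *1017 plus 31817 gives *32834.

```python
def candidate_descriptions(start: str, last_base: int, beyond: int) -> dict[str, str]:
    """Build the surveyed gene-level notations for one deletion that
    extends downstream of the transcript.

    start     -- c. position of the first deleted base, e.g. "3887"
    last_base -- last c.* position present in the transcript, e.g. 1017
    beyond    -- number of deleted bases past that position, e.g. 31817
    """
    end_in_transcript = f"*{last_base}"
    return {
        # Option B continues c.* numbering past the end of the transcript.
        "B": f"c.{start}_*{last_base + beyond}del",
        # Options C to F stop at the last base of the transcript.
        "C": f"c.{start}_{end_in_transcript}del",
        "D": f"c.{start}_({end_in_transcript}_?)del",
        "E": f"c.{start}_{end_in_transcript}[0]",
        "F": f"c.{start}_{end_in_transcript}{{0}}",
        # Option G keeps the transcript end and adds the downstream offset.
        "G": f"c.{start}_{end_in_transcript}+d{beyond}del",
    }


options = candidate_descriptions("3887", 1017, 31817)
```

Only the values for B and G depend on `beyond`; the other options produce the same string regardless of how far the deletion extends.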

The results were as follows;

My personal worry is also that we would generate descriptions that cannot be mapped back to the genome. I.e., options A, C, D, E, and F cannot be mapped back to the genome if their source was the transcript, so information is lost. Personally, I feel that the objection "we cannot describe positions not mentioned in the reference sequence" is solved by using the NC(NM) construct, just like intronic variants are handled now. I haven't heard any argument against this that would not also apply to how we describe intronic variants.
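The information loss can be illustrated with a small sketch. The hypothetical function below tries to recover, from a gene-level description alone, how many deleted bases lie beyond the last base of the transcript. The regular expressions only cover the example notations from this issue and are an assumption, not a general HGVS parser.

```python
import re
from typing import Optional


def bases_beyond_transcript(description: str, last_base: int) -> Optional[int]:
    """Return how many deleted bases past c.*<last_base> a description encodes.

    Returns None when the description does not carry that information.
    """
    # Option G style: an explicit "+d<offset>" after the last transcript base.
    match = re.search(r"_\*(\d+)\+d(\d+)del$", description)
    if match and int(match.group(1)) == last_base:
        return int(match.group(2))
    # Option B style: c.* numbering that runs past the end of the transcript.
    match = re.search(r"_\*(\d+)del$", description)
    if match and int(match.group(1)) > last_base:
        return int(match.group(1)) - last_base
    # Options C, D, E and F stop at the transcript end; the offset is gone.
    return None
```

With the offset in hand, the genomic endpoint can be recomputed from the genomic position of the transcript's last base. For the descriptions that return None, that is not possible.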

Describe alternatives you've considered
Note that Mutalyzer currently uses option B and that descriptions like these are currently widely spread in LOVD.

Additional context
Note, that Johan decided to ignore the wishes of the curators, and decided to implement option F in the GV shared LOVD. For many "new" submissions (up to one and a half years old or so), option F is used and not B.


Peter-J-Freeman · 2022-01-25T09:51:04Z

My initial thought on this, @ifokkema and @leicray, is to set up a video call. I have huge concerns about reopening the can of worms of allowing variant descriptions in the context of transcript reference sequences to describe variation beyond the boundaries of the reference sequence. It is not a good idea, so I think we need to have a very good think about this and make recommendations for the HGVS SVD group.

ifokkema · 2022-01-25T10:16:27Z

Sounds good! Another thing that popped up in my head is that this is also related to fusion transcripts. Deletions like these can cause fusion transcripts, and those do have a transcript-based description. So we might also go in that direction, even though that doesn't solve whole-gene deletions yet but only deletions where half genes are deleted.
On a related note; did "recruiting" for the SVD group already start? I'm interested to join. Same for the VIJ group. Even though I'm already incredibly busy, it's important for me to be involved in these.

Peter-J-Freeman · 2022-01-25T10:38:56Z

Fusions are on the agenda for description formats that we need to crack. @leicray has certainly been working in this area. We should definitely talk about those too. Let's sort out some dates via email.

Not sure about the SVD recruitment. Another thing to chat about

leicray · 2022-01-25T11:30:40Z

Please include me in any proposed chat session.

Peter-J-Freeman · 2022-01-25T11:31:25Z

You are needed

ifokkema · 2024-10-07T16:59:23Z

As a heads-up; the HVNC just decided that in a genomic context, the transcript coordinate system (NM:c) can be used to indicate upstream and downstream positions, just like currently intronic positions are described that way.

Peter-J-Freeman · 2024-10-08T09:03:45Z

@ifokkema Thanks for the heads up. I'll add this to the to-do list. I object by the way, but rules is rules :P

Can I ask you to open a specific feature request? Are there any links to this on the HGVS sites yet? Not a problem if not, we will just track it once there are.

ifokkema · 2024-10-08T12:47:21Z

> @ifokkema Thanks for the heads up. I'll add this to the to-do list. I object by the way, but rules is rules :P

Haha! Since it solves a major issue in LOVD, as well as this issue, I'm all for it 😅 Also, it re-aligns VV and Mutalyzer.

> Can I ask you to open a specific feature request? Are there any links to this on the HGVS sites yet? Not a problem if not, we will just track it once there are.

Yep, I just created #652. So far, we only have the issue in the HGVS Nomenclature repo; we'll work on updating the website soon. As you can imagine, quite a few pages should be adjusted.

ifokkema mentioned this issue Jan 20, 2022

LOVD endpoint: Variants crossing gene boundaries generate "porcessing_error". #173

Closed

ifokkema mentioned this issue Jan 27, 2022

Handle variants unsupported by VV by translating them first LOVDnl/LOVD3#584

Open

11 tasks

ifokkema mentioned this issue Jul 22, 2022

"No transcripts found" in a region that spans several genes #399

Open

ifokkema mentioned this issue Apr 17, 2024

LOVD associated improvement requests and bugs #603

Open

5 tasks

ifokkema mentioned this issue Oct 8, 2024

Allow c. positions for intergenic variants in the context of a genomic reference sequence #652

Open
