Here is another bioinformatics issue for data sets: handling multiple co-resident variants. For example, there are (at least) six variants of the A1CF gene (APOBEC1 complementation factor). Most people have all of these variants.
Sometimes multiple variants need to be treated as separate genes; sometimes they should be averaged and treated as one gene. There are probably other strategies.
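To make the two strategies concrete, here is a minimal Python sketch. It assumes each variant carries one numeric value per sample (for instance an expression level), which the issue does not specify. The function name `collapse` and the strategy names `"separate"` and `"average"` are hypothetical, and the numbers in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Accession-to-symbol mapping, using the A1CF accessions listed in this issue.
TRANSCRIPT_TO_GENE = {
    "NM_138933": "A1CF",
    "NM_014576": "A1CF",
    "NM_138932": "A1CF",
    "NM_001198820": "A1CF",
    "NM_001198818": "A1CF",
    "NM_001198819": "A1CF",
}


def collapse(values, strategy="average"):
    """Collapse per-variant values into per-label values.

    values: dict mapping an accession to a number (assumed data shape).
    strategy: "separate" keeps every variant as its own entry;
              "average" groups variants by gene symbol and takes the mean.
    """
    if strategy == "separate":
        return dict(values)
    if strategy == "average":
        by_gene = defaultdict(list)
        for accession, value in values.items():
            # Accessions missing from the mapping are kept under their own label.
            gene = TRANSCRIPT_TO_GENE.get(accession, accession)
            by_gene[gene].append(value)
        return {gene: mean(vals) for gene, vals in by_gene.items()}
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, `collapse({"NM_138933": 2.0, "NM_014576": 4.0})` returns a single `A1CF` entry holding the mean of the two values, while `strategy="separate"` returns the input unchanged. Other strategies (maximum, sum, longest isoform only) would fit the same interface.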
Rob and Holly should comment.
We should store the number of variants per gene and also store the individual variants. Protein-coding variants are commonly stored as an offset, and each isoform could have a different offset. For non-coding variants, store the offset as well. It is also important to store the truncating variants.
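One possible shape for such a record, as a rough Python sketch rather than a proposed schema. All class and field names here (`Variant`, `GeneVariants`, `reference`, `offset`, `coding`, `truncating`) are hypothetical, and the sketch assumes an offset is an integer position counted against a named reference sequence such as an isoform accession.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variant:
    reference: str            # sequence the offset is counted against (assumed: an isoform accession)
    offset: int               # position of the variant within that reference
    coding: bool              # True for protein-coding variants, False for non-coding
    truncating: bool = False  # flagged explicitly so truncating variants are not lost


@dataclass
class GeneVariants:
    symbol: str
    variants: List[Variant] = field(default_factory=list)

    @property
    def variant_count(self) -> int:
        """Number of variants stored for this gene."""
        return len(self.variants)

    def truncating_variants(self) -> List[Variant]:
        """Return only the variants flagged as truncating."""
        return [v for v in self.variants if v.truncating]
```

Because the same change can sit at a different offset in each isoform, the sketch stores one `Variant` per (reference, offset) pair instead of a single offset per gene. The count is derived from the list, so it cannot drift out of sync with the stored variants.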
I'm not sure I understand the scope of the question, but everything Robert says sounds good to me. The variant nomenclature scheme at http://varnomen.hgvs.org/ is clunky but pretty good.
@tedgoldstein are the things in the first column of that table transcript labels? It doesn't seem to me that this has a compelling use case in our current roadmap.
| NCBI label | HUGO label |
| --- | --- |
| NM_138933 | A1CF |
| NM_014576 | A1CF |
| NM_138932 | A1CF |
| NM_001198820 | A1CF |
| NM_001198818 | A1CF |
| NM_001198819 | A1CF |