Feature request: Search function to return all transcript mappings spanned by a query reference region #5

Open
John-F-Wagstaff opened this issue Aug 17, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@John-F-Wagstaff
Collaborator

Feature request description, and associated problem
vv_hgvs is currently the main interface to VVTA databases and is used by VariantValidator for this purpose. Users expect to be able to query VariantValidator for genomic variants and receive all affected transcripts in response. This is not currently the case: the HVNC has yet to decide on a consistent HGVS nomenclature for variants that extend beyond the bounds of a transcript, so vv_hgvs lacks features for querying mapped transcripts in these cases. This has caused issues in VariantValidator such as https://github.com/openvar/variantValidator/issues/399. It would therefore be good to add a query that lets users detect such transcripts, to help fulfil these expectations.

Current proposed solution
vv_hgvs already has a number of related functions, so adding a similar one to handle this case should be reasonably straightforward. The underlying SQL should look something like
SELECT * FROM current_valid_mapped_transcript_spans_mv WHERE alt_ac=$target_acc AND start_i > $query_start AND end_i < $query_end
for transcripts that lie entirely within the query region (total overlap), or
SELECT * FROM current_valid_mapped_transcript_spans_mv WHERE alt_ac=$target_acc AND end_i > $query_start AND start_i < $query_end
for transcripts that overlap the query region at all. Relevant tests will also need to be added.
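A minimal sketch of how the two proposed queries could be wrapped and tested. This is not existing vv_hgvs code: the function name `transcripts_for_region`, the `contained_only` flag, and the demo accessions are hypothetical, and an in-memory SQLite table stands in for the real VVTA view so the example is self-contained. The strict inequalities simply mirror the SQL above and would change once the inclusive/exclusive question is settled.

```python
import sqlite3

VIEW = "current_valid_mapped_transcript_spans_mv"

# Transcript span lies strictly inside the query region.
CONTAINED_SQL = f"""
    SELECT tx_ac FROM {VIEW}
    WHERE alt_ac = :alt_ac AND start_i > :query_start AND end_i < :query_end
    ORDER BY tx_ac
"""

# Transcript span shares at least part of the query region.
OVERLAP_SQL = f"""
    SELECT tx_ac FROM {VIEW}
    WHERE alt_ac = :alt_ac AND end_i > :query_start AND start_i < :query_end
    ORDER BY tx_ac
"""


def transcripts_for_region(conn, alt_ac, query_start, query_end, contained_only=False):
    """Return transcript accessions whose mapped span matches the query region."""
    sql = CONTAINED_SQL if contained_only else OVERLAP_SQL
    params = {"alt_ac": alt_ac, "query_start": query_start, "query_end": query_end}
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """Build a toy stand-in for the view; all rows are made-up test data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {VIEW} (tx_ac TEXT, alt_ac TEXT, start_i INTEGER, end_i INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {VIEW} VALUES (?, ?, ?, ?)",
        [
            ("TX_INSIDE", "CHR_DEMO", 120, 180),     # fully inside 100..200
            ("TX_PARTIAL", "CHR_DEMO", 50, 150),     # overlaps the left edge
            ("TX_OUTSIDE", "CHR_DEMO", 250, 300),    # no overlap
            ("TX_OTHER_CHR", "CHR_OTHER", 120, 180), # different reference sequence
        ],
    )
    return conn


conn = make_demo_db()
print(transcripts_for_region(conn, "CHR_DEMO", 100, 200, contained_only=True))
print(transcripts_for_region(conn, "CHR_DEMO", 100, 200))
```

With the toy rows, the first call returns only the transcript fully inside the region, while the second also returns the partially overlapping one.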

Alternatives
It is possible that we could just expect users to query the VVTA directly, but this would complicate the usage of the VVTA by breaking through the expected layering.

Additional context
We need to decide, and specify, whether the spans are exclusive or inclusive, document which, and test for this as well.
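To make the inclusive/exclusive question concrete, here is a small sketch of how the two conventions disagree when two spans only touch at a boundary. The helper names are hypothetical and the coordinates are made up; which convention the new query should follow is exactly what remains to be decided.

```python
def overlaps_half_open(a_start, a_end, b_start, b_end):
    """Overlap test when each span is [start, end): the end position is excluded."""
    return a_start < b_end and b_start < a_end


def overlaps_closed(a_start, a_end, b_start, b_end):
    """Overlap test when each span is [start, end]: the end position is included."""
    return a_start <= b_end and b_start <= a_end


# A transcript span ending at 200 and a query region starting at 200.
transcript = (100, 200)
query = (200, 300)

print(overlaps_half_open(*transcript, *query))  # spans only abut
print(overlaps_closed(*transcript, *query))     # spans share position 200
```

A boundary case like this one is the kind of test that would pin down the documented behaviour.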

@John-F-Wagstaff added the enhancement (New feature or request) label on Aug 17, 2022
@mashok-acog

mashok-acog commented Oct 11, 2022

Hi John,
I am getting the following error when running this vv_hgvs test script:

```python
import vvhgvs.parser
import vvhgvs.dataproviders.uta
import vvhgvs.assemblymapper

# Parse a genomic variant and inspect it
hp = vvhgvs.parser.Parser()
hgvs_g = 'NC_000007.13:g.36561662C>T'
hgvs_c = 'NM_001637.3:c.1582G>A'
var_g = hp.parse_hgvs_variant(hgvs_g)
var_g
var_g.posedit.pos.start
str(var_g)

# Connect to the transcript database and build an assembly mapper
hdp = vvhgvs.dataproviders.uta.connect()
am = vvhgvs.assemblymapper.AssemblyMapper(hdp, assembly_name='GRCh37', alt_aln_method='splign', replace_reference=True)
```

 ERROR:  relation "current_valid_mapped_transcript_spans_mv" does not exist at character 9
 select tx_ac,alt_ac,alt_strand,alt_aln_method,start_i,end_i
 from current_valid_mapped_transcript_spans_mv
 where alt_ac='NC_000007.13' and alt_aln_method='splign' and start_i < 36561662 and 36561662 <= end_i

The materialized view "current_valid_mapped_transcript_spans_mv" does not exist.

Please help

@John-F-Wagstaff
Collaborator Author

@mashok-acog Sorry for not replying to your other bug. It was unclear what you meant there, compared with this much clearer post, and I am also currently not working full time on this project. If you need further help with this issue, please return to the original bug, fill in the extra detail, and '@' me. This issue is a feature request and is not associated with your problem, so please do not reply in this thread.

You probably just need to install the VVTA (and its own Seqrepo release) instead of the UTA, and it should work fine. The "current_valid_mapped_transcript_spans_mv" view is one of the first views used when searching for relevant transcripts at an input chromosomal location, so this complaint is characteristic of a missing, outdated, or misconfigured database.

As noted at the top of the readme, however, this project is mainly used by the VariantValidator pipeline and is not recommended for standalone use. In some respects it represents a snapshot of an older hgvs version, though upgraded to work with the newer VVTA database. This is required to work with the existing VariantValidator code base, which then tweaks the output to improve it. If you need a project recommended for end users, you should normally use either VariantValidator or mainline hgvs; for that reason the documentation has not been updated for this project as a standalone system. The actual documentation for installing this project is here (VariantValidator install docs). If you want to install this code standalone, you would need to follow the "Setting up Seqrepo" and "Setting up VVTA database" sections there, as well as installing Seqrepo and the vvhgvs code. The configuration for vvhgvs should point at these data sources, not the UTA versions of either. But again, this is not the recommended usage method, so please consider whether the other options would be better for your use case.
