missing novel transcript in the VEP output #1801

Open
tuncayseker opened this issue Nov 24, 2024 · 2 comments

@tuncayseker

A novel transcript (ENST00000484532) does not appear in the output of my local VEP (v108) or of the current Ensembl UI. Could this be due to a pre-filtering step, or are certain transcripts intentionally excluded from annotation? I would like to determine whether this is an isolated issue, a general problem, or expected behavior.

genome-version: hg38
transcript: ENST00000484532 (ENSG00000285629)
transcript-start-end: 1: 6,159,430-6,197,757

Variant: chr1:6168218 G->A
dbSNP: rs1667284547

result of the VEP: https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=mEODJGJsk7few3CQ-10650484
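
As a quick sanity check on the coordinates quoted above, the following minimal Python sketch confirms that the variant position lies inside the reported transcript span, so a missing annotation is not explained by the variant falling outside the transcript. The function names are made up for illustration, and the check ignores strand and exon structure.

```python
def parse_region(region: str) -> tuple[str, int, int]:
    """Parse a region such as '1: 6,159,430-6,197,757' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start_text, end_text = span.strip().split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return chrom.strip().removeprefix("chr"), start, end


def variant_in_region(variant_chrom: str, variant_pos: int, region: str) -> bool:
    """Return True if a 1-based position falls inside the inclusive region."""
    chrom, start, end = parse_region(region)
    return variant_chrom.removeprefix("chr") == chrom and start <= variant_pos <= end


# Values copied from the issue text above.
print(variant_in_region("chr1", 6168218, "1: 6,159,430-6,197,757"))
```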

@dglemos dglemos self-assigned this Nov 25, 2024
@dglemos
Contributor

dglemos commented Nov 25, 2024

Hi @tuncayseker,
ENST00000484532 is a readthrough transcript.

A readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).
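
To make the definition concrete, here is a minimal Python sketch that flags a transcript as a readthrough candidate when its exons overlap exons from at least two other loci. The locus names, coordinates, and function names are hypothetical and for illustration only; this is not the procedure Ensembl uses to annotate readthrough transcripts, and it assumes all exons are on the same chromosome and strand.

```python
Exon = tuple[int, int]  # 1-based inclusive (start, end)


def overlaps(a: Exon, b: Exon) -> bool:
    """Two inclusive intervals overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def loci_overlapped(candidate_exons: list[Exon],
                    exons_by_locus: dict[str, list[Exon]],
                    own_locus: str) -> set[str]:
    """Return the other loci whose exons overlap any exon of the candidate."""
    hit = set()
    for locus, exons in exons_by_locus.items():
        if locus == own_locus:
            continue  # the candidate's own locus does not count
        if any(overlaps(c, e) for c in candidate_exons for e in exons):
            hit.add(locus)
    return hit


def looks_like_readthrough(candidate_exons: list[Exon],
                           exons_by_locus: dict[str, list[Exon]],
                           own_locus: str) -> bool:
    """Apply the 'two or more different loci' rule from the definition above."""
    return len(loci_overlapped(candidate_exons, exons_by_locus, own_locus)) >= 2


# Toy data: hypothetical loci, not real annotation.
toy_loci = {
    "LOCUS_A": [(100, 200), (300, 400)],
    "LOCUS_B": [(900, 1000), (1200, 1300)],
}
spanning = [(100, 200), (300, 400), (900, 1000)]  # touches LOCUS_A and LOCUS_B
ordinary = [(150, 200), (300, 350)]               # touches LOCUS_A only

print(looks_like_readthrough(spanning, toy_loci, "LOCUS_RT"))
print(looks_like_readthrough(ordinary, toy_loci, "LOCUS_C"))
```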

In this blog post you can read more about annotating readthrough transcription in Ensembl:
https://www.ensembl.info/2019/02/11/annotating-readthrough-transcription-in-ensembl/

In release 107 we decided to filter out readthrough transcripts from the cache and database (release/107.0).
If you want to annotate this transcript, you can run VEP with a GFF file: https://ftp.ensembl.org/pub/current_gff3/homo_sapiens/
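
A minimal sketch of what running VEP against a GFF file might look like, written as a Python helper that only builds the command lines. It assumes the GFF3 has been downloaded and decompressed, that `bgzip` and `tabix` are installed, and that a matching FASTA is available; VEP needs the GFF sorted, bgzip-compressed, and tabix-indexed, and needs `--fasta` alongside `--gff`. All file names and function names below are placeholders, not values from this thread.

```python
import shlex


def prepare_gff_commands(gff_plain: str, gff_bgz: str) -> list[str]:
    """Shell pipelines that sort, bgzip-compress, and tabix-index a GFF file."""
    src, dst = shlex.quote(gff_plain), shlex.quote(gff_bgz)
    tab = "\"$(printf '\\t')\""  # a literal tab as the sort field separator
    return [
        f"grep -v '^#' {src} | sort -t {tab} -k1,1 -k4,4n -k5,5n | bgzip -c > {dst}",
        f"tabix -p gff {dst}",
    ]


def vep_gff_command(input_vcf: str, gff_bgz: str, fasta: str, output: str) -> list[str]:
    """Argument list for a VEP run that takes transcripts from a GFF file."""
    return ["vep", "-i", input_vcf, "-o", output, "--gff", gff_bgz, "--fasta", fasta]


# Placeholder file names (assumptions, adjust to your setup).
for command in prepare_gff_commands("Homo_sapiens.gff3", "Homo_sapiens.sorted.gff3.gz"):
    print(command)
print(shlex.join(vep_gff_command("variants.vcf", "Homo_sapiens.sorted.gff3.gz",
                                 "genome.fa", "vep_output.txt")))
```

The helper prints the commands rather than executing them, so they can be reviewed before being run in a shell.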

Let me know if you have more questions.

Best wishes,
Diana

@tuncayseker
Author

Hi @dglemos,

Thank you for the detailed explanation and the reference to the blog post—it makes sense.

Having a list or a tag to identify which transcripts are excluded from the default VEP run (beyond relying on a full transcript dump) would be very helpful, especially if other transcript groups are subject to similar exclusions.

I checked the GFF file for the transcript, but I couldn't find any specific tag or attribute that marks it as a readthrough gene/transcript. Here are the relevant lines from the GFF dump (release 113):

1	havana	gene	6159430	6197757	.	-	.	ID=gene:ENSG00000285629;biotype=protein_coding;description=novel transcript;gene_id=ENSG00000285629;logic_name=havana_homo_sapiens;version=1
1	havana	mRNA	6159430	6197757	.	-	.	ID=transcript:ENST00000484532;Parent=gene:ENSG00000285629;biotype=nonsense_mediated_decay;tag=gencode_basic,gencode_primary,Ensembl_canonical;transcript_id=ENST00000484532;transcript_support_level=2 (assigned to previous version 5);version=6

It would be helpful if there were a more explicit tag or attribute in the GFF file to identify readthrough transcripts for easier reference (similar to how Gencode dumps include readthrough_transcript and readthrough_gene tags).
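
The check described above can be reproduced with a small Python sketch that parses the attribute column of the transcript line and looks for the two Gencode-style tag names mentioned. The helper names are illustrative; the GFF line is the one quoted above, rebuilt with tab separators.

```python
def parse_gff3_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def gff3_tags(line: str) -> set[str]:
    """Return the comma-separated values of the 'tag' attribute, if any."""
    columns = line.rstrip("\n").split("\t")
    attrs = parse_gff3_attributes(columns[8])
    return set(filter(None, attrs.get("tag", "").split(",")))


# Transcript line from the release 113 GFF dump quoted above.
TRANSCRIPT_LINE = "\t".join([
    "1", "havana", "mRNA", "6159430", "6197757", ".", "-", ".",
    "ID=transcript:ENST00000484532;Parent=gene:ENSG00000285629;"
    "biotype=nonsense_mediated_decay;"
    "tag=gencode_basic,gencode_primary,Ensembl_canonical;"
    "transcript_id=ENST00000484532;"
    "transcript_support_level=2 (assigned to previous version 5);version=6",
])

# Tag names as they appear in Gencode dumps, per the comment above.
READTHROUGH_TAGS = {"readthrough_transcript", "readthrough_gene"}

tags = gff3_tags(TRANSCRIPT_LINE)
print(sorted(tags))
print("readthrough tag present:", bool(tags & READTHROUGH_TAGS))
```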

Thanks again for your help.

Best regards,
Tuncay
