missing novel transcript in the VEP output #1801

tuncayseker · 2024-11-24T11:30:01Z

A novel transcript (ENST00000484532) does not appear in the output of my local VEP (v108) or of the current Ensembl web UI. Could this be due to a pre-filtering step, or are certain transcripts intentionally excluded from annotation? I would like to determine whether this is an isolated issue, a general problem, or expected behavior.

genome-version: hg38
transcript: ENST00000484532 (ENSG00000285629)
transcript-start-end: 1: 6,159,430-6,197,757

Variant: chr1:6168218 G->A
dbSNP: rs1667284547

result of the VEP: https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=mEODJGJsk7few3CQ-10650484

dglemos · 2024-11-25T10:06:27Z

Hi @tuncayseker,
ENST00000484532 is a readthrough transcript.

A readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).

In this blog post you can read more about annotating readthrough transcription in Ensembl:
https://www.ensembl.info/2019/02/11/annotating-readthrough-transcription-in-ensembl/

In release 107 we decided to filter out readthrough transcripts from the cache and database (release/107.0).
If you want to annotate this transcript, you can run VEP with a GFF file: https://ftp.ensembl.org/pub/current_gff3/homo_sapiens/
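
A minimal sketch of that GFF route in Python, assuming `bgzip`, `tabix`, and `vep` are on the PATH and that a FASTA file for the same assembly is available (as far as I know, VEP needs `--fasta` to build transcript sequences when annotating from a GFF rather than a cache). The function names and every file path are hypothetical placeholders. The GFF has to be sorted, bgzip-compressed, and tabix-indexed before VEP can read it.

```python
import gzip
import subprocess


def sorted_gff_lines(lines):
    """Drop comment/blank lines and sort features by sequence name, start, end."""
    features = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def sort_key(line):
        cols = line.split("\t")
        return (cols[0], int(cols[3]), int(cols[4]))

    return sorted(features, key=sort_key)


def vep_argv(gff_bgz, fasta, variants, output):
    """Build the VEP command line for annotating against a custom GFF."""
    return ["vep", "-i", variants, "-o", output, "--gff", gff_bgz, "--fasta", fasta]


def prepare_and_run(gff_gz, fasta, variants, output, work="sorted.gff3"):
    """Sort, compress, index the downloaded GFF3, then run VEP with it."""
    # Reads the whole file into memory; acceptable for a single-species GFF3.
    with gzip.open(gff_gz, "rt") as handle:
        lines = sorted_gff_lines(handle)
    with open(work, "w") as out:
        out.writelines(lines)
    subprocess.run(["bgzip", "-f", work], check=True)  # produces work + ".gz"
    subprocess.run(["tabix", "-p", "gff", work + ".gz"], check=True)
    subprocess.run(vep_argv(work + ".gz", fasta, variants, output), check=True)
```

Calling `prepare_and_run(...)` with the downloaded GFF3, a matching FASTA, and a VCF containing the variant should then report consequences for transcripts that are present in the GFF but filtered from the cache.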

Let me know if you have more questions.

Best wishes,
Diana

tuncayseker · 2024-11-26T11:23:15Z

Hi @dglemos,

Thank you for the detailed explanation and the reference to the blog post—it makes sense.

Having a list or a tag to identify which transcripts are excluded from the default VEP run (beyond relying on a full transcript dump) would indeed be very helpful, especially if there are additional transcript groups subject to similar exclusions.

I checked the GFF file for the transcript, but I couldn't find any specific tag or attribute that marks it as a readthrough gene/transcript. Here are the relevant lines from the GFF dump (release 113):

1	havana	gene	6159430	6197757	.	-	.	ID=gene:ENSG00000285629;biotype=protein_coding;description=novel transcript;gene_id=ENSG00000285629;logic_name=havana_homo_sapiens;version=1
1	havana	mRNA	6159430	6197757	.	-	.	ID=transcript:ENST00000484532;Parent=gene:ENSG00000285629;biotype=nonsense_mediated_decay;tag=gencode_basic,gencode_primary,Ensembl_canonical;transcript_id=ENST00000484532;transcript_support_level=2 (assigned to previous version 5);version=6

It would be helpful if there were a more explicit tag or attribute in the GFF file to identify readthrough transcripts for easier reference (similar to how Gencode dumps include readthrough_transcript and readthrough_gene tags).
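
For reference, this is roughly how the check can be scripted. It is a small Python sketch with hypothetical helper names that splits the ninth GFF3 column into attributes and looks for a `readthrough_transcript` tag on the mRNA line quoted above. It does not handle percent-encoded attribute values.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def transcript_tags(gff_line):
    """Return (transcript_id, set of tags) for one GFF3 feature line."""
    cols = gff_line.rstrip("\n").split("\t")
    attrs = parse_gff3_attributes(cols[8])
    tags = {t for t in attrs.get("tag", "").split(",") if t}
    return attrs.get("transcript_id"), tags


# The mRNA line from the release 113 GFF dump quoted above.
line = (
    "1\thavana\tmRNA\t6159430\t6197757\t.\t-\t.\t"
    "ID=transcript:ENST00000484532;Parent=gene:ENSG00000285629;"
    "biotype=nonsense_mediated_decay;"
    "tag=gencode_basic,gencode_primary,Ensembl_canonical;"
    "transcript_id=ENST00000484532;"
    "transcript_support_level=2 (assigned to previous version 5);version=6"
)

transcript_id, tags = transcript_tags(line)
print(transcript_id, sorted(tags))
print("readthrough_transcript" in tags)  # False for this line
```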

Thanks again for your help.

Best regards,
Tuncay

dglemos self-assigned this Nov 25, 2024
