
VEP 113 Interrupts on Large VCF Files with Numerous Transcript-assembly Mismatch Warnings #1787

Open
GSYongWu opened this issue Nov 11, 2024 · 3 comments

@GSYongWu

Hi,
I tested release 113 of VEP. Small VCF files work fine, but when I provide a larger VCF file with about 100,000 lines, the run is interrupted after printing numerous warnings: WARNING: "Transcript-assembly mismatch".
My command line is:

/usr/bin/perl /dssg02/AnalysisTemp/wuy/VEP113/softwares/ensembl-vep-release-113.0/vep --offline --no_stats --buffer_size 10000 --fork 4 --ccds --uniprot --hgvs --symbol --shift_3prime 1 --numbers --canonical --protein --biotype --hgvsg --variant_class --total_length --force_overwrite --allele_number --no_escape --vcf --dir /dssg02/AnalysisTemp/wuy/VEP113/softwares/vepdb --fasta /dssg/gitlab/gscap/gscap2020/resources/public/databases/genome/hs37d5.fa --format vcf --input_file test.in.all.vcf --output_file test.out.all --refseq --use_given_ref --no_check_variants_order 

Example warnings:

WARNING: 135747 : WARNING: Transcript-assembly mismatch in 1_144994660_C/A
WARNING: Transcript-assembly mismatch in 1_144994660_C/A
WARNING: Transcript-assembly mismatch in 1_144955253_C/A
WARNING: Transcript-assembly mismatch in 1_144955253_C/A
WARNING: Transcript-assembly mismatch in 1_144952627_C/A
WARNING: Transcript-assembly mismatch in 1_144952627_C/A
WARNING: Transcript-assembly mismatch in 1_144952288_A/T
WARNING: Transcript-assembly mismatch in 1_144952288_A/T
WARNING: Transcript-assembly mismatch in 1_144946683_G/A
WARNING: Transcript-assembly mismatch in 1_144946683_G/A
WARNING: Transcript-assembly mismatch in 1_144923755_A/T
WARNING: Transcript-assembly mismatch in 1_144923755_A/T
WARNING: Transcript-assembly mismatch in 1_144882663_A/T
WARNING: Transcript-assembly mismatch in 1_144882663_A/T
WARNING: Transcript-assembly mismatch in 1_144879298_A/T
WARNING: Transcript-assembly mismatch in 1_144879298_A/T
WARNING: Transcript-assembly mismatch in 1_144877153_A/T
WARNING: Transcript-assembly mismatch in 1_144877153_A/T
WARNING: Transcript-assembly mismatch in 1_144874792_A/T
WARNING: Transcript-assembly mismatch in 1_144874792_A/T
WARNING: Transcript-assembly mismatch in 1_144866657_A/T
WARNING: Transcript-assembly mismatch in 1_144866657_A/T
WARNING: Transcript-assembly mismatch in 1_145039599_G/A
WARNING: Transcript-assembly mismatch in 1_145039599_G/A
WARNING: Transcript-assembly mismatch in 1_145021139_T/A
WARNING: Transcript-assembly mismatch in 1_145021139_T/A
WARNING: Transcript-assembly mismatch in 1_145015935_A/T
WARNING: Transcript-assembly mismatch in 1_145015935_A/T
WARNING: Transcript-assembly mismatch in 1_144994645_C/A
WARNING: Transcript-assembly mismatch in 1_144994645_C/A

@dglemos dglemos self-assigned this Nov 11, 2024
@dglemos
Contributor

dglemos commented Nov 11, 2024

Hi @GSYongWu,
This warning should not stop the VEP run. Is the output truncated?

This warning is thrown when there is an allele mismatch between the RefSeq and the Ensembl transcript at that specific location.
I can see from your command that you run these two options:

--refseq 
--use_given_ref

Running with --use_given_ref should not throw this warning; however, there is an issue with this option in release 113.

Can you please send an example of your output file? For example, the output for 1_144994660_C/A.
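One quick way to answer the truncation question is to compare the number of data records in the input VCF with the number in VEP's output. The sketch below assumes that, with --vcf, VEP writes one output record per input record, so matching counts suggest the run completed. The helper names count_records and looks_complete are hypothetical; the file names are the ones from the command above.

```python
import gzip
from pathlib import Path


def count_records(path):
    """Count non-empty, non-header lines (not starting with '#') in a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def looks_complete(input_vcf, output_vcf):
    """True if the output has as many records as the input.

    Assumption: VEP's --vcf output keeps one line per input variant line.
    """
    return count_records(input_vcf) == count_records(output_vcf)


if __name__ == "__main__":
    # File names taken from the command in this issue; adjust to your own run.
    if Path("test.in.all.vcf").exists() and Path("test.out.all").exists():
        print(looks_complete("test.in.all.vcf", "test.out.all"))
```

A mismatch in counts does not say why the run stopped, only that the output is likely incomplete.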

@GSYongWu
Author

Hi @dglemos
Yesterday, I ran the process several times, and it terminated each time. The message before termination was: "ensembl-vep-release-113.0/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 936.
Died in forked process 74883"
Today I reran it; the run produced only warnings and did not terminate.
The result for locus 1_144994660_C/A is here:
1_144994660_C-A.vep.zip

@dglemos
Contributor

dglemos commented Nov 12, 2024

Thanks for providing the data.
The variant output looks correct.

If you run VEP with the option --use_given_ref, you can ignore the warning WARNING: Transcript-assembly mismatch in <>. This warning indicates that the alleles at this position don't match between RefSeq and Ensembl, but because you are running with the option to use the input allele, this mismatch won't be a problem for your job.
We are working on fixing this warning.
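Because each affected variant appears more than once in the warnings quoted at the top of this issue, it can help to reduce them to a list of distinct variant identifiers before spot-checking their output lines, as was done here for 1_144994660_C/A. A minimal sketch, assuming the warnings have been saved to a text file; the default file name vep_warnings.txt and the function mismatched_variants are hypothetical.

```python
import re
import sys
from collections import Counter

# Captures the identifier at the end of lines such as
# "WARNING: Transcript-assembly mismatch in 1_144994660_C/A"
MISMATCH = re.compile(r"Transcript-assembly mismatch in (\S+)")


def mismatched_variants(lines):
    """Map each variant identifier to how many times a mismatch warning named it."""
    counts = Counter()
    for line in lines:
        for variant in MISMATCH.findall(line):
            counts[variant] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical default log name; pass the path of your saved warnings instead.
    path = sys.argv[1] if len(sys.argv) > 1 else "vep_warnings.txt"
    with open(path) as handle:
        counts = mismatched_variants(handle)
    for variant, n in sorted(counts.items()):
        print(f"{variant}\t{n}")
```

The regex also matches lines carrying a prefix such as "WARNING: 135747 : ", since it only looks for the warning text itself.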
