VEP needs sorted VCFs again #1784

Open
gernophil opened this issue Nov 5, 2024 · 1 comment
@gernophil

Hey, this bug has been described here earlier: #530

For larger inputs in VCF format, VEP 113 (used with cache version 112) crashes with: "Exiting the program. The input file appears to be unsorted. Please sort by chromosome and by location and re-submit."

In my situation, I rely on the original order of the VCF. Sorting and then restoring the original order would of course be doable, but it would be easier not to. It is not an issue for me if it just takes longer, and your website states:

VEP is optimised to run on input files that are sorted in chromosomal order. Unsorted files will still work, albeit more slowly. (http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html)
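The sort-and-restore workaround mentioned above could be sketched roughly as follows. This is only an illustration, not part of VEP: the helper names (`chrom_key`, `sort_records`, `restore_order`) are made up, the chromosome ordering is an assumption about what counts as "sorted by chromosome and by location", and it assumes the annotated output has exactly one record per input record, in the same order as the sorted input.

```python
def chrom_key(chrom):
    """Order numeric chromosomes numerically, then all others alphabetically.

    Assumption: this ordering is acceptable to the tool; adjust if it is not.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_records(records):
    """Return (sorted_records, order).

    order[i] is the original index of the record at sorted position i.
    """
    def key(i):
        chrom, pos = records[i].split("\t")[:2]
        return (chrom_key(chrom), int(pos))

    order = sorted(range(len(records)), key=key)
    return [records[i] for i in order], order


def restore_order(annotated, order):
    """Undo sort_records, assuming annotated[i] belongs to sorted record i."""
    restored = [None] * len(annotated)
    for sorted_pos, original_pos in enumerate(order):
        restored[original_pos] = annotated[sorted_pos]
    return restored
```

The sorted records would be passed to VEP, and `restore_order` applied to the annotated data lines (header lines excluded) afterwards.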

Small inputs of up to several thousand variants seem to work fine. I haven't been able to find the exact threshold yet, but from around 2000 to 3000 lines upwards VEP reproducibly crashes with "Exiting the program. The input file appears to be unsorted. Please sort by chromosome and by location and re-submit." As I said, I haven't been able to pin it down to a specific number. Maybe this also depends on the entropy of the VCF?

  • VEP version: 113
  • VEP Cache version: 112
  • Perl version: 5.32
  • OS: macOS and CentOS
  • tabix installed: no

I pipe my variants like this:

echo (or printf) CHROM\tPOS\t.\tREF\tALT\t.\t.\t.\n(next variant) | vep --species species --assembly assembly --dir_cache dir_cache --cache --cache_version cache_version --force --format vcf --no_stats --offline --output_file STDOUT --quiet --vcf
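For anyone trying to reproduce this, the header-less, tab-separated input that the command above pipes in could be generated along these lines. This is a minimal sketch: `make_vcf_text` is a hypothetical helper, and the example variants are invented purely for illustration.

```python
def make_vcf_text(variants):
    """Build header-less VCF data lines from (chrom, pos, ref, alt) tuples.

    ID, QUAL, FILTER and INFO are filled with '.', matching the layout
    CHROM, POS, ., REF, ALT, ., ., . used in the command above.
    """
    lines = []
    for chrom, pos, ref, alt in variants:
        fields = [str(chrom), str(pos), ".", ref, alt, ".", ".", "."]
        lines.append("\t".join(fields))
    # One record per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Deliberately unsorted, made-up example variants.
    example = [("2", 100, "A", "T"), ("1", 500, "G", "C")]
    print(make_vcf_text(example), end="")
```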

Full error message:
Exiting the program. The input file appears to be unsorted. Please sort by chromosome and by location and re-submit.

@nakib103 nakib103 self-assigned this Nov 6, 2024
@nakib103
Contributor

nakib103 commented Nov 6, 2024

Hi @gernophil,

Thanks for the query!

Can you try running with the --no_check_variants_order option? VEP should then run your unsorted file, though more slowly.

Best regards,
Nakib
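A rough sketch of how the suggested option could be added to the original invocation when VEP is driven from a script. The function names are hypothetical, the argument values are placeholders to be filled in, and all flags are taken from this thread rather than checked against the VEP documentation.

```python
import subprocess


def build_vep_command(species, assembly, dir_cache, cache_version):
    """Return the argument list from the report plus --no_check_variants_order."""
    return [
        "vep",
        "--species", species,
        "--assembly", assembly,
        "--dir_cache", dir_cache,
        "--cache",
        "--cache_version", str(cache_version),
        "--force",
        "--format", "vcf",
        "--no_stats",
        "--offline",
        "--output_file", "STDOUT",
        "--quiet",
        "--vcf",
        "--no_check_variants_order",  # option suggested in the reply above
    ]


def annotate(vcf_text, command):
    """Pipe VCF text into the command on stdin and return its stdout.

    Requires a working vep installation on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        command, input=vcf_text, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Building the argument list separately from running it makes it easy to print and inspect the exact command before piping any variants through it.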
