Issue with Adding Custom BED File to Ensembl VEP as a Coverage Database #1785

Closed
mrymkdnz opened this issue Nov 7, 2024 · 2 comments
@mrymkdnz

mrymkdnz commented Nov 7, 2024

I'm trying to integrate a custom BED file as a custom database in Ensembl VEP, but I am not getting the expected results. The BED file I’m using has a value for each base position, not a variant-specific value, and is structured as follows:
chr1 11819 11819 0.11177
chr1 11820 11820 0.11177
chr1 11821 11821 0.11177
chr1 11822 11822 0.11177
chr1 11823 11823 0.11177
chr1 11824 11824 0.11177
chr1 11825 11825 0.11177
chr1 11826 11826 0.11177
chr1 11827 11827 0.11177
chr1 11828 11828 0.11177

Since the fourth column represents position-specific values (not variants), I've used a BED file where the start and end values are the same, as I only want the value at that specific position. I followed the instructions from the Ensembl VEP custom annotation documentation to compress and index the BED file.

I’ve tried adding this file to my VEP command with the following option:

--custom [PATH]/gnomad.exomes.v4.0.coverage.summary.bed.gz,GnomadExomeCoverage,bed,exact,0
However, I am not getting the expected results. Some variants receive values while others, which should have values, do not. Additionally, the values for those that do appear are incorrect.

For example, let's look at this variant:
Variant: chr2-101008009-T-C
grep '101008009' gnomad.exomes.v4.0.coverage.summary.bed
chr2 101008009 101008009 77.4
chr10 101008009 101008009 2.1976

This shows the entry does exist in the BED file.

However, in the VEP VCF output, this entry appears as:

#Uploaded_variation Location Gene GnomadExomeCoverage GnomadGenomeCoverage NMD gnomADe_FILTER gnomADg_FILTER
chr2_101008009_T/C chr2:101008009 ENSG00000204634 - - - - -
The GnomadExomeCoverage field is empty where I expected a value.

I tried using both exact,0 and overlap options, but neither worked.
I double-checked the syntax and verified that the BED file is formatted and indexed as described in the documentation.

Expected Behavior
For each variant position that matches an entry in the BED file, I expect the GnomadExomeCoverage field to be populated with the corresponding value from the BED file.

Actual Behavior
Some variants do not retrieve any value, and the values for others are incorrect.

System

  • VEP version: 112
  • VEP Cache version: 112
  • OS: [Ubuntu]

Any help in troubleshooting this issue would be greatly appreciated!

@nakib103 nakib103 self-assigned this Nov 7, 2024
@nakib103
Contributor

nakib103 commented Nov 7, 2024

Hi @mrymkdnz ,

Thanks for your query!

The BED format uses a zero-based start and a non-inclusive end (see documentation). I am assuming you are using an input file format that uses one-based positions, like VCF.

This means that a start of 101008009 in BED is interpreted as position 101008010 in one-based coordinates. Also, because the end position is non-inclusive in BED, to specify a single base you have to use end = start + 1. So the correct way to represent the line in your BED file would be:

chr2 101008008 101008009 77.4

This associates the value 77.4 with position 101008009 on chromosome 2 in one-based coordinates, so you will find a match.
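
To make the coordinate arithmetic concrete, here is a minimal Python sketch of the standard BED convention (zero-based start, non-inclusive end). The function name `one_based_positions` is hypothetical and only for illustration. It shows which one-based positions an interval covers under that convention; it does not claim to reproduce how VEP handles a zero-length interval internally.

```python
def one_based_positions(bed_start: int, bed_end: int) -> list[int]:
    """List the one-based positions covered by the BED interval [bed_start, bed_end)."""
    # A zero-based, half-open interval [s, e) covers one-based positions s+1 .. e.
    return list(range(bed_start + 1, bed_end + 1))


# Row as written in the original file: start == end, so no base is covered.
print(one_based_positions(101008009, 101008009))

# Corrected row: start = position - 1, end = position, so one base is covered.
print(one_based_positions(101008008, 101008009))
```

Under this convention the original row describes an empty interval, while the corrected row covers exactly position 101008009.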

Hope that helps,
Nakib
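
For anyone with a file in the same shape, the fix can be applied to every row at once. The following is a rough sketch, assuming each row has four whitespace-separated columns and that columns two and three both hold the same one-based position, as in the rows quoted in the question. The helper names `to_bed_row` and `convert` are hypothetical.

```python
from typing import Iterable, Iterator


def to_bed_row(line: str) -> str:
    """Turn 'chrom pos pos value' (one-based pos) into a tab-separated BED row."""
    chrom, start, end, value = line.split()
    if start != end:
        # Assumption of this sketch: both columns carry the same one-based position.
        raise ValueError(f"expected start == end, got: {line!r}")
    position = int(start)
    # BED: zero-based start, non-inclusive end, so a single base is [pos - 1, pos).
    return f"{chrom}\t{position - 1}\t{position}\t{value}"


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Convert every non-empty line, preserving the input order."""
    for line in lines:
        if line.strip():
            yield to_bed_row(line)


rows = ["chr1 11819 11819 0.11177", "chr2 101008009 101008009 77.4"]
for row in convert(rows):
    print(row)
```

Shifting every start by one keeps the rows in their existing sort order, but the converted file still has to be compressed and indexed again, as described in the custom annotation documentation, before it is passed to `--custom`.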

@mrymkdnz
Author

thank you!
