eggd_vep

What does this app do?

Annotates a vcf using the Ensembl Variant Effect Predictor (VEP). The default docker image uses VEP v110.1.

What are typical use cases for this app?

This app was designed to annotate vcfs with specified fields based on provided annotation.

A variable level of annotation can be achieved by different combinations of custom annotation and vep plugins, in addition to the required VEP cache annotation bundle.

What data are required for this app to run?

An input vcf to be annotated (vcf)

Annotation configuration file (config_file):

A JSON file providing information about annotations and plugins.
Example config file:

 {
         "config_information":{
             "genome_build": "GrCh37",
             "assay":"TWE",
             "config_version": "1.0.0"
         },

         "vep_resources":{
             "vep_docker":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G8V2Vz0433Gp5bYPF2f6vg9X",
             "vep_cache":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G8V4bGj433Gz96K3Fb1VfbG3",
             "plugin_config":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G8V57Yj433Gfg3vF9jPq1ZFk",
             "reference_fasta":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G6BYyyj4YV3pYBkgFVGP2K4P",
             "reference_fai":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G6P32kj4YV3y58KyP4k4qG2p",
             "reference_gzi":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G6BYz104YV3X5qp463K5b5vp",
             "ref_bcftools":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-F3zxG0Q4fXX9YFjP1v5jK9jf"

         },
         "custom_annotations": [
             {
                 "name": "ClinVar",
                 "type": "vcf",
                 "annotation_type": "exact",
                 "force_coordinates": "0",
                 "vcf_fields": "CLNSIG,CLNREVSTAT,CLNDN",
                 "required_fields":"ClinVar,ClinVar_CLNSIG,ClinVar_CLNDN",
                 "resource_files": [
                     {
                     "file_id":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G6BY8X0433GbVBG06pFPvjp7",
                     "index_id":"project-Fkb6Gkj433GVVvj73J7x8KbV:file-G6BY8Pj433GjPF7j6kK5QZ75"
                     }
                 ]
             }
         ],
         "plugins": [
             {
                 "name": "SpliceAI",
                 "pm_file": "project-G86K7XQ4jKXPgK4Z8Zj585Zj:file-G8YXQQQ4jKX4bVz53zK7VJX5",
                 "required_fields":"SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL",
                 "suffix": "cutoff=0.5",
                 "resource_files": [
                 {
                     "file_id": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-G9FFfBj433GV57Zf8ZvxfbBg",
                     "index_id": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-G9FFfz0433GZ1X13FqjjQJFF",
                     "prefix": "snv="
                 },
                 {
                     "file_id": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-G9FF6zj433Gv5jxkFqpK6p7J",
                     "index_id": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-G9FF6qQ433GxvVp497P5V05X",
                     "prefix": "indel="
                 }
             ]
         }
     ]
 }
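Before launching a job it can help to sanity-check a config file locally. The following is a minimal sketch, not the app's own validation: the function names `check_config` and `check_config_file` are hypothetical, the choice of `config_information` and `vep_resources` as mandatory sections is an assumption based on the example above, and the ID pattern only checks the general `project-...:file-...` shape.

```python
import json
import re

# General shape of the "project-...:file-..." IDs used in the example config.
DX_FILE_ID = re.compile(r"^project-[A-Za-z0-9]+:file-[A-Za-z0-9]+$")

# Sections assumed to be mandatory (an assumption, not taken from the app).
REQUIRED_SECTIONS = ("config_information", "vep_resources")


def _is_dx_id(value):
    """True if value is a string shaped like project-xxx:file-xxx."""
    return isinstance(value, str) and DX_FILE_ID.match(value) is not None


def check_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")

    # Every vep_resources value should look like a project:file ID.
    for key, value in config.get("vep_resources", {}).items():
        if not _is_dx_id(value):
            problems.append(f"vep_resources.{key}: unexpected ID format")

    # Each annotation or plugin resource should carry a file and an index.
    for section in ("custom_annotations", "plugins"):
        for entry in config.get(section, []):
            name = entry.get("name", "<unnamed>")
            for resource in entry.get("resource_files", []):
                for field in ("file_id", "index_id"):
                    if not _is_dx_id(resource.get(field)):
                        problems.append(f"{section}.{name}: bad or missing {field}")
    return problems


def check_config_file(path):
    """Load a JSON config from disk and run the checks above."""
    with open(path) as handle:
        return check_config(json.load(handle))
```

An empty list from `check_config_file("config.json")` means none of these checks failed. It does not guarantee that the referenced files exist or that the app will accept the config.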

In theory the app should run with any combination of annotations. When passing large datasets, please bear in mind the instance type used.

What are the optional inputs for this app?

Number of variants VEP will annotate per core (buffer_size) [default = 500]: to allow for parallelisation, the app detects the instance type and splits annotation across the available cores.
A panel bed file to filter the vcf on (panel_bed)
A list of transcripts to filter on (transcript_list). One transcript per line. VEP annotates with all possible transcripts; if this list is passed, the annotation is filtered to the listed transcripts.
A boolean flag of whether to normalise the input vcf or not (normalise) [ default=true ].
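As an illustration of how the required and optional inputs fit together, the sketch below assembles a `dx run` command line in Python. The input names (`vcf`, `config_file`, `buffer_size`, `panel_bed`, `transcript_list`, `normalise`) come from this README; the helper `build_dx_run`, the app reference and the `file-...` values are placeholders and should be replaced with real identifiers from your own project.

```python
import shlex


def build_dx_run(app, vcf, config_file, buffer_size=500, panel_bed=None,
                 transcript_list=None, normalise=True):
    """Assemble a `dx run` argument list from the inputs described above."""
    args = [
        "dx", "run", app,
        f"-ivcf={vcf}",
        f"-iconfig_file={config_file}",
        f"-ibuffer_size={buffer_size}",
        f"-inormalise={'true' if normalise else 'false'}",
    ]
    # Optional filters are only passed when they are actually provided.
    if panel_bed is not None:
        args.append(f"-ipanel_bed={panel_bed}")
    if transcript_list is not None:
        args.append(f"-itranscript_list={transcript_list}")
    return args


if __name__ == "__main__":
    # Placeholder IDs: substitute real file IDs and the real app reference.
    command = build_dx_run("eggd_vep", "file-xxxx", "file-yyyy",
                           panel_bed="file-zzzz")
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is only used here for display.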

This app uses the following tools which are app assets:

htslib (v1.14)
bedtools (v2.30.0)

For larger vcfs please consider the appropriate instance type and buffer size to use.
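A rough way to reason about instance type and buffer size is to work out how many buffers a vcf needs and how many variants could be held in memory at once. The sketch below is back-of-the-envelope arithmetic only: `buffer_plan` is a hypothetical helper, the even spread of buffers across cores is an assumption, and it does not reflect the app's or VEP's actual scheduling.

```python
import math


def buffer_plan(n_variants, buffer_size=500, cores=16):
    """Rough sizing figures for a run; not the app's own scheduling logic."""
    if n_variants < 0 or buffer_size <= 0 or cores <= 0:
        raise ValueError("n_variants must be >= 0; buffer_size and cores must be > 0")

    # Number of buffers needed to cover every variant.
    total_buffers = math.ceil(n_variants / buffer_size)
    # Upper bound on variants in memory if every core fills one buffer.
    variants_in_flight = min(n_variants, buffer_size * cores)
    # Buffers handled per core, assuming work is spread evenly.
    buffers_per_core = math.ceil(total_buffers / cores)

    return {
        "total_buffers": total_buffers,
        "variants_in_flight": variants_in_flight,
        "buffers_per_core": buffers_per_core,
    }
```

With the defaults (500 variants per buffer, 16 cores), a vcf of 100,000 variants works out to 200 buffers, at most 8,000 variants in flight, and 13 buffers per core. Raising buffer_size reduces the number of buffers but increases the memory needed per core.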

What does this app output?

Annotated (and if requested, filtered) vcf.

Notes

By default, this app uses a buffer_size of 500 variants and parallelises across the maximum number of cores available. The default instance type is mem1_ssd1_v2_x16, which provides 16 cores. This was chosen to speed up setup.
By default, this app normalises the input vcf because all default annotation resources are also normalised, which allows variants to be compared and annotated appropriately. The normalise option was built to ensure compatibility with copy-number vcfs, which do not require normalisation.
If there are additional flags, the additional_flags section in the config file should contain the flag name and the additional_fields section in the config file should contain the output field name for that flag. Example flags and output fields are linked here
