Conclusion
The Ensembl VEP web tool enables the flexible configuration of variant
analysis from an extensive range of options via a simple interface. It
allows customisable filtering so you can interrogate and understand your
results. It links out to detailed resources, both within the Ensembl
browser and on other key websites. Regular updates to the reference
data and analysis tools supported within Ensembl VEP make it an
essential tool for variant annotation, filtering and prioritisation.
Acknowledgments
We thank members of the Ensembl team for gene, regulatory and
comparative genomics annotation, and web development. We thank previous
team members, in particular William McLaren and Laurent Gil, for their
contributions to Ensembl VEP. We also wish to thank EMBL-EBI’s
technical services cluster for their support, and the VEP community,
who have helped to improve Ensembl VEP by suggesting new functionality,
giving feedback and reporting bugs.
References
1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R.
M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L.,
McCarthy, S., McVean, G. A., & Abecasis, G. R. (2015). A global
reference for human genetic variation. Nature, 526(7571),
68–74. https://doi.org/10.1038/nature15393
Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasimova,
A., Bork, P., Kondrashov, A. S., & Sunyaev, S. R. (2010). A method and
server for predicting damaging missense mutations. Nature
Methods, 7(4), 248–249. https://doi.org/10.1038/nmeth0410-248
Chunn, L. M., Nefcy, D. C., Scouten, R. W., Tarpey, R. P., Chauhan, G.,
Lim, M. S., Elenitoba-Johnson, K., Schwartz, S. A., & Kiel, M. J.
(2020). Mastermind: A Comprehensive Genomic Association Search Engine
for Empirical Evidence Curation and Genetic Variant Interpretation.
Frontiers in Genetics, 11, 577152.
https://doi.org/10.3389/fgene.2020.577152
Cunningham, F., Moore, B., Ruiz-Schultz, N., Ritchie, G. R., & Eilbeck,
K. (2015). Improving the Sequence Ontology terminology for genomic
variant annotation. Journal of Biomedical Semantics, 6,
32. https://doi.org/10.1186/s13326-015-0030-4
den Dunnen, J. T., Dalgleish, R., Maglott, D. R., Hart, R. K.,
Greenblatt, M. S., McGowan-Jordan, J., Roux, A. F., Smith, T.,
Antonarakis, S. E., & Taschner, P. E. (2016). HGVS Recommendations for
the Description of Sequence Variants: 2016 Update. Human
Mutation, 37(6), 564–569. https://doi.org/10.1002/humu.22981
Eilbeck, K., Lewis, S. E., Mungall, C. J., Yandell, M., Stein, L.,
Durbin, R., & Ashburner, M. (2005). The Sequence Ontology: a tool for
the unification of genome annotations. Genome Biology, 6(5), R44.
https://doi.org/10.1186/gb-2005-6-5-r44
Frankish, A., Diekhans, M., Jungreis, I., Lagarde, J., Loveland, J. E.,
Mudge, J. M., Sisu, C., Wright, J. C., Armstrong, J., Barnes, I., Berry,
A., Bignell, A., Boix, C., Carbonell Sala, S., Cunningham, F., Di
Domenico, T., Donaldson, S., Fiddes, I. T., García Girón, C., Gonzalez,
J. M., … Flicek, P. (2021). GENCODE 2021. Nucleic Acids
Research, 49(D1), D916–D923.
https://doi.org/10.1093/nar/gkaa1087
Howe, K. L., Achuthan, P., Allen, J., Allen, J., Alvarez-Jarreta, J.,
Amode, M. R., Armean, I. M., Azov, A. G., Bennett, R., Bhai, J., Billis,
K., Boddu, S., Charkhchi, M., Cummins, C., Da Rin Fioretto, L.,
Davidson, C., Dodiya, K., El Houdaigui, B., Fatima, R., Gall, A.,
… Flicek, P. (2021). Ensembl 2021. Nucleic Acids Research, 49(D1),
D884–D891. https://doi.org/10.1093/nar/gkaa942
Hunt, S. E., McLaren, W., Gil, L., Thormann, A., Schuilenburg, H.,
Sheppard, D., Parton, A., Armean, I. M., Trevanion, S. J., Flicek, P.,
& Cunningham, F. (2018). Ensembl variation resources. Database:
The Journal of Biological Databases and Curation, 2018, bay119.
https://doi.org/10.1093/database/bay119
Jaganathan, K., Kyriazopoulou Panagiotopoulou, S., McRae, J. F.,
Darbandi, S. F., Knowles, D., Li, Y. I., Kosmicki, J. A., Arbelaez, J.,
Cui, W., Schwartz, G. B., Chow, E. D., Kanterakis, E., Gao, H., Kia, A.,
Batzoglou, S., Sanders, S. J., & Farh, K. K. (2019). Predicting
Splicing from Primary Sequence with Deep Learning. Cell, 176(3),
535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015
Jones, P., Binns, D., Chang, H. Y., Fraser, M., Li, W., McAnulla, C.,
McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn,
A. F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S. Y., Lopez, R., &
Hunter, S. (2014). InterProScan 5: genome-scale protein function
classification. Bioinformatics (Oxford, England), 30(9),
1236–1240. https://doi.org/10.1093/bioinformatics/btu031
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi,
J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D.
P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes,
D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A.,
Walters, R. K., … MacArthur, D. G. (2020). The mutational
constraint spectrum quantified from variation in 141,456 humans.
Nature, 581(7809), 434–443.
https://doi.org/10.1038/s41586-020-2308-7
Kumar, P., Henikoff, S., & Ng, P. C. (2009). Predicting the effects of
coding non-synonymous variants on protein function using the SIFT
algorithm. Nature Protocols, 4(7), 1073–1081.
https://doi.org/10.1038/nprot.2009.86
Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4: a
comprehensive database of transcript-specific functional predictions and
annotations for human nonsynonymous and splice-site SNVs. Genome
Medicine, 12(1), 103. https://doi.org/10.1186/s13073-020-00803-9
McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R.,
Thormann, A., Flicek, P., & Cunningham, F. (2016). The Ensembl Variant
Effect Predictor. Genome Biology, 17(1), 122.
https://doi.org/10.1186/s13059-016-0974-4
O’Leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D.,
McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D.,
Astashyn, A., Badretdin, A., Bao, Y., Blinkova, O., Brover, V.,
Chetvernin, V., Choi, J., Cox, E., Ermolaeva, O., Farrell, C. M.,
… Pruitt, K. D. (2016). Reference sequence (RefSeq) database at
NCBI: current status, taxonomic expansion, and functional annotation.
Nucleic Acids Research, 44(D1), D733–D745.
https://doi.org/10.1093/nar/gkv1189
Piñero, J., Ramírez-Anguita, J. M., Saüch-Pitarch, J., Ronzano, F.,
Centeno, E., Sanz, F., & Furlong, L. I. (2020). The DisGeNET knowledge
platform for disease genomics: 2019 update. Nucleic Acids
Research, 48(D1), D845–D855.
https://doi.org/10.1093/nar/gkz1021
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J., & Kircher, M.
(2019). CADD: predicting the deleteriousness of variants throughout the
human genome. Nucleic Acids Research, 47(D1), D886–D894.
https://doi.org/10.1093/nar/gky1016
Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J.,
Grody, W. W., Hegde, M., Lyon, E., Spector, E., Voelkerding, K., Rehm,
H. L., & ACMG Laboratory Quality Assurance Committee. (2015). Standards
and guidelines for the interpretation of sequence variants: a joint
consensus recommendation of the American College of Medical Genetics and
Genomics and the Association for Molecular Pathology. Genetics in
Medicine: Official Journal of the American College of Medical
Genetics, 17(5), 405–424. https://doi.org/10.1038/gim.2015.30
Rodriguez, J. M., Rodriguez-Rivas, J., Di Domenico, T., Vázquez, J.,
Valencia, A., & Tress, M. L. (2018). APPRIS 2017: principal isoforms
for multiple gene sets. Nucleic Acids Research, 46(D1),
D213–D217. https://doi.org/10.1093/nar/gkx997
Ward, A. J., & Cooper, T. A. (2010). The pathobiology of splicing.
The Journal of Pathology, 220(2), 152–163.
https://doi.org/10.1002/path.2649
Yeo, G., & Burge, C. B. (2004). Maximum entropy modeling of short
sequence motifs with applications to RNA splicing signals. Journal
of Computational Biology: A Journal of Computational Molecular Cell
Biology, 11(2-3), 377–394.
https://doi.org/10.1089/1066527041410418
Zerbino, D. R., Wilder, S. P., Johnson, N., Juettemann, T., & Flicek,
P. R. (2015). The Ensembl regulatory build. Genome Biology, 16(1), 56.
https://doi.org/10.1186/s13059-015-0621-5
Zhang, F., & Lupski, J. R. (2015). Non-coding genetic variants in human
disease. Human Molecular Genetics, 24(R1), R102–R110.
https://doi.org/10.1093/hmg/ddv259
Figure legends
Figure 1. The Ensembl VEP web interface showing species/assembly
selection, data input, transcript set selection and additional groups of
configuration options.
Figure 2. The ‘Identifiers’ section, which allows the selection of gene,
protein and HGVS identifiers.
Figure 3. The ‘Variants and frequency data’ section, which allows the
selection of information known about variants at the same location.
Figure 4. The ‘Additional annotations’ section, which allows the
selection of transcript, protein domain, regulatory region and phenotype
annotations.
Figure 5. The ‘Predictions’ section, which allows the selection of
different pathogenicity, splicing and conservation predictions.
Figure 6. Filtering and advanced options.
Figure 7. The results page with summary statistics and options for
filtering and downloading the results table.
Figure 8. The results table showing predicted molecular consequences,
with links to the location, overlapping gene and variant displays within
the Ensembl genome browser.
Conflicts of interest statement
Paul Flicek is a member of the scientific advisory boards of Fabric
Genomics, Inc., and Eagle Genomics, Ltd.
Data Availability Statement
No new data were created or analysed in this study.
Publicly available data are integrated into the Ensembl variation
resources. Reference data packaged for use in Ensembl VEP are available
from our FTP site in release-specific directories, for example:
http://ftp.ensembl.org/pub/release-103/variation/vep/.
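The release-specific directory layout can be captured in a small helper when scripting downloads. This is a minimal sketch, assuming that other releases follow the same `release-<N>/variation/vep/` pattern as the release-103 example above; the function name `vep_data_dir_url` is hypothetical and is not part of Ensembl VEP.

```python
def vep_data_dir_url(release: int) -> str:
    """Return the FTP directory URL holding packaged VEP reference data.

    Hypothetical helper: assumes every Ensembl release uses the same
    layout as the release-103 example (pub/release-<N>/variation/vep/).
    """
    # Reject non-integers and non-positive release numbers early.
    if not isinstance(release, int) or release < 1:
        raise ValueError("release must be a positive integer")
    return f"http://ftp.ensembl.org/pub/release-{release}/variation/vep/"


if __name__ == "__main__":
    # Reproduces the example directory given in the text.
    print(vep_data_dir_url(103))
```

Because the pattern is inferred from a single example, check that the generated directory exists on the FTP site before relying on it.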
The Ensembl VEP command line tool is available from
https://github.com/Ensembl/ensembl-vep
The Ensembl VEP plugins are available from
https://github.com/Ensembl/VEP_plugins
Ensembl VEP plugins are used to integrate datasets that have
redistribution restrictions. Each plugin contains full instructions for
obtaining and formatting the data. We have described the use of the
following datasets via plugins:
CADD (https://cadd.gs.washington.edu/download)
dbNSFP (ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.2a.zip)
dbscSNV (https://drive.google.com/file/d/0B60wROKy6OqcQ0IyYnh5bmdHMW8/view)
DisGeNET (https://www.disgenet.org/downloads)
Mastermind (https://www.genomenon.com/cvr/)
SpliceAI (https://pypi.org/project/spliceai/)