Conclusion
The Ensembl VEP web tool enables the flexible configuration of variant analysis from an extensive range of options via a simple interface. It allows customisable filtering so you can interrogate and understand your results. It links out to detailed resources, both within the Ensembl browser and on other key websites. The regular updating of the reference data and analysis tools supported within Ensembl VEP makes it an essential tool for variant annotation, filtering and prioritisation.
Acknowledgments
We thank members of the Ensembl team for gene, regulatory and comparative genomics annotation, and web development. We thank previous team members, in particular William McLaren and Laurent Gil, for their contributions to Ensembl VEP. We also wish to thank EMBL-EBI’s technical services cluster for their support and the VEP community, who have helped to improve Ensembl VEP by suggesting new functionality, giving feedback and reporting bugs.
References
1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S., McVean, G. A., & Abecasis, G. R. (2015). A global reference for human genetic variation. Nature, 526 (7571), 68–74. https://doi.org/10.1038/nature15393
Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasimova, A., Bork, P., Kondrashov, A. S., & Sunyaev, S. R. (2010). A method and server for predicting damaging missense mutations. Nature methods, 7 (4), 248–249. https://doi.org/10.1038/nmeth0410-248
Chunn, L. M., Nefcy, D. C., Scouten, R. W., Tarpey, R. P., Chauhan, G., Lim, M. S., Elenitoba-Johnson, K., Schwartz, S. A., & Kiel, M. J. (2020). Mastermind: A Comprehensive Genomic Association Search Engine for Empirical Evidence Curation and Genetic Variant Interpretation. Frontiers in genetics, 11, 577152. https://doi.org/10.3389/fgene.2020.577152
Cunningham, F., Moore, B., Ruiz-Schultz, N., Ritchie, G. R., & Eilbeck, K. (2015). Improving the Sequence Ontology terminology for genomic variant annotation. Journal of biomedical semantics, 6, 32. https://doi.org/10.1186/s13326-015-0030-4
den Dunnen, J. T., Dalgleish, R., Maglott, D. R., Hart, R. K., Greenblatt, M. S., McGowan-Jordan, J., Roux, A. F., Smith, T., Antonarakis, S. E., & Taschner, P. E. (2016). HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Human mutation, 37 (6), 564–569. https://doi.org/10.1002/humu.22981
Eilbeck, K., Lewis, S. E., Mungall, C. J., Yandell, M., Stein, L., Durbin, R., & Ashburner, M. (2005). The Sequence Ontology: a tool for the unification of genome annotations. Genome biology, 6 (5), R44. https://doi.org/10.1186/gb-2005-6-5-r44
Frankish, A., Diekhans, M., Jungreis, I., Lagarde, J., Loveland, J. E., Mudge, J. M., Sisu, C., Wright, J. C., Armstrong, J., Barnes, I., Berry, A., Bignell, A., Boix, C., Carbonell Sala, S., Cunningham, F., Di Domenico, T., Donaldson, S., Fiddes, I. T., García Girón, C., Gonzalez, J. M., … Flicek, P. (2021). GENCODE 2021. Nucleic acids research, 49 (D1), D916–D923. https://doi.org/10.1093/nar/gkaa1087
Howe, K. L., Achuthan, P., Allen, J., Allen, J., Alvarez-Jarreta, J., Amode, M. R., Armean, I. M., Azov, A. G., Bennett, R., Bhai, J., Billis, K., Boddu, S., Charkhchi, M., Cummins, C., Da Rin Fioretto, L., Davidson, C., Dodiya, K., El Houdaigui, B., Fatima, R., Gall, A., … Flicek, P. (2021). Ensembl 2021. Nucleic acids research, 49 (D1), D884–D891. https://doi.org/10.1093/nar/gkaa942
Hunt, S. E., McLaren, W., Gil, L., Thormann, A., Schuilenburg, H., Sheppard, D., Parton, A., Armean, I. M., Trevanion, S. J., Flicek, P., & Cunningham, F. (2018). Ensembl variation resources. Database: the journal of biological databases and curation, 2018, bay119. https://doi.org/10.1093/database/bay119
Jaganathan, K., Kyriazopoulou Panagiotopoulou, S., McRae, J. F., Darbandi, S. F., Knowles, D., Li, Y. I., Kosmicki, J. A., Arbelaez, J., Cui, W., Schwartz, G. B., Chow, E. D., Kanterakis, E., Gao, H., Kia, A., Batzoglou, S., Sanders, S. J., & Farh, K. K. (2019). Predicting Splicing from Primary Sequence with Deep Learning. Cell, 176 (3), 535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015
Jones, P., Binns, D., Chang, H. Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn, A. F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S. Y., Lopez, R., & Hunter, S. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics (Oxford, England), 30 (9), 1236–1240. https://doi.org/10.1093/bioinformatics/btu031
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., Walters, R. K., … MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581 (7809), 434–443. https://doi.org/10.1038/s41586-020-2308-7
Kumar, P., Henikoff, S., & Ng, P. C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature protocols, 4 (7), 1073–1081. https://doi.org/10.1038/nprot.2009.86
Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome medicine, 12 (1), 103. https://doi.org/10.1186/s13073-020-00803-9
McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R., Thormann, A., Flicek, P., & Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome biology, 17 (1), 122. https://doi.org/10.1186/s13059-016-0974-4
O’Leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D., Astashyn, A., Badretdin, A., Bao, Y., Blinkova, O., Brover, V., Chetvernin, V., Choi, J., Cox, E., Ermolaeva, O., Farrell, C. M., … Pruitt, K. D. (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic acids research, 44 (D1), D733–D745. https://doi.org/10.1093/nar/gkv1189
Piñero, J., Ramírez-Anguita, J. M., Saüch-Pitarch, J., Ronzano, F., Centeno, E., Sanz, F., & Furlong, L. I. (2020). The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic acids research, 48 (D1), D845–D855. https://doi.org/10.1093/nar/gkz1021
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J., & Kircher, M. (2019). CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic acids research, 47 (D1), D886–D894. https://doi.org/10.1093/nar/gky1016
Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W. W., Hegde, M., Lyon, E., Spector, E., Voelkerding, K., Rehm, H. L., & ACMG Laboratory Quality Assurance Committee (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in medicine: official journal of the American College of Medical Genetics, 17 (5), 405–424. https://doi.org/10.1038/gim.2015.30
Rodriguez, J. M., Rodriguez-Rivas, J., Di Domenico, T., Vázquez, J., Valencia, A., & Tress, M. L. (2018). APPRIS 2017: principal isoforms for multiple gene sets. Nucleic acids research, 46 (D1), D213–D217. https://doi.org/10.1093/nar/gkx997
Ward, A. J., & Cooper, T. A. (2010). The pathobiology of splicing. The Journal of pathology, 220 (2), 152–163. https://doi.org/10.1002/path.2649
Yeo, G., & Burge, C. B. (2004). Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. Journal of computational biology: a journal of computational molecular cell biology, 11 (2-3), 377–394. https://doi.org/10.1089/1066527041410418
Zhang, F., & Lupski, J. R. (2015). Non-coding genetic variants in human disease. Human molecular genetics, 24 (R1), R102–R110. https://doi.org/10.1093/hmg/ddv259
Zerbino, D. R., Wilder, S. P., Johnson, N., Juettemann, T., & Flicek, P. R. (2015). The Ensembl regulatory build. Genome biology, 16 (1), 56. https://doi.org/10.1186/s13059-015-0621-5
Figure legends
Figure 1. The Ensembl VEP web interface showing species/assembly selection, data input, transcript set selection and additional groups of configuration options.
Figure 2. The ‘Identifiers’ section which allows the selection of gene, protein and HGVS identifiers.
Figure 3. The ‘Variants and frequency data’ section which allows the selection of information known about variants at the same location.
Figure 4. The ‘Additional annotations’ section which allows the selection of transcript, protein domain, regulatory region and phenotype annotations.
Figure 5. The ‘Predictions’ section, which allows the selection of different pathogenicity, splicing and conservation predictions.
Figure 6. Filtering and advanced options.
Figure 7. The results page with summary statistics and options for filtering and downloading the results table.
Figure 8. The results table showing predicted molecular consequences and links to the location and overlapping genes and variant displays within the Ensembl genome browser.
Conflicts of interest statement
Paul Flicek is a member of the scientific advisory boards of Fabric Genomics, Inc., and Eagle Genomics, Ltd.
Data Availability Statement
No new data were created or analysed in this study.
Publicly available data are integrated into the Ensembl variation resources. Reference data packaged for use in Ensembl VEP are available from our FTP site in release-specific directories, for example: http://ftp.ensembl.org/pub/release-103/variation/vep/.
The Ensembl VEP command line tool is available from https://github.com/Ensembl/ensembl-vep
The Ensembl VEP plugins are available from https://github.com/Ensembl/VEP_plugins
Ensembl VEP plugins are created to integrate datasets with redistribution restrictions. These plugins contain full instructions for data collection and formatting. Here we have described the use of the following datasets via plugins:
CADD (https://cadd.gs.washington.edu/download)
dbNSFP (ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.2a.zip)
dbscSNV (https://drive.google.com/file/d/0B60wROKy6OqcQ0IyYnh5bmdHMW8/view)
DisGeNET (https://www.disgenet.org/downloads)
Mastermind (https://www.genomenon.com/cvr/)
SpliceAI (https://pypi.org/project/spliceai/)