Discussion
Considering the substantial differences amongst diseases in terms of inheritance pattern, disease mechanism, phenotype, genetic and allelic heterogeneity, and prevalence, disease-specific guidelines are necessary for accurate and reliable interpretations (Rehm, Berg, & Plon, 2018). Following the variant interpretation guidelines for genetic hearing loss (Oza et al., 2018), we developed a new computational tool, named VIP-HL, which is publicly available through a web interface (http://hearing.genetics.bgi.com/). To our knowledge, this is the first tool designed for automated variant interpretation in genetic hearing loss. Given the high prevalence of hearing loss in the population, the availability of VIP-HL should substantially relieve the interpretation burden on clinicians and curators.
Compared with the rules activated by the ClinGen HL-EP, VIP-HL showed high concordance (96%), indicating that VIP-HL interprets hearing loss variants reliably. Of note, all three discrepant activations (variants #1-3, Table 1) were attributable to population-based rules (BA1 and PM2), which depend on the popmax filtering allele frequency derived from extensive population studies (Whiffin et al., 2017). The ClinGen HL-EP used the ExAC database at the time of their research, whereas we employed a larger dataset (gnomAD), as encouraged by the ClinGen HL-EP (Oza et al., 2018). Using these stringent allele frequencies empowers clinical genome interpretation without the removal of true pathogenic variants (Whiffin et al., 2019).
VIP-HL activated several rules that were not activated by the ClinGen HL-EP, including PM1 and BS2. The ClinGen HL-EP did not perform a systematic review of mutational hot spots or functional domains for all genes associated with hearing loss, and proposed that PM1 can be applied to the KCNQ4 pore-forming region (Oza et al., 2018). In this study, we used the enrichment of pathogenic/likely pathogenic variants to construct a set of important regions (Xiang, Peng, et al., 2020), which includes the KCNQ4 pore-forming region. Additionally, although the HL-EP did not elaborate on the cutoff for BS2, we used a conservative cutoff to automate this rule. It should be noted that penetrance affects the application of BS2 but was not considered by VIP-HL. This led to activations of BS2 for NM_004004.6:c.109G>A and NM_004004.6:c.101T>C in the GJB2 gene, because 50 and 16 homozygotes, respectively, were identified in the gnomAD control dataset. Both are well-known pathogenic variants with low penetrance (Shen et al., 2019). Nevertheless, VIP-HL is a semi-automatic tool, and our user interface enables curators to manually adjust codes to avoid such possible misclassifications.
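The BS2 behaviour described above can be illustrated with a minimal sketch. It is not the VIP-HL implementation: the cutoff value, the `Variant` class, and both function names are hypothetical, since the text states only that a conservative cutoff was used. The sketch shows how a rule driven purely by homozygote counts activates for the two GJB2 variants, and how a curator-supplied penetrance flag could reverse that activation.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the text mentions only a "conservative cutoff" for BS2
# and does not state its value.
BS2_MIN_HOMOZYGOTES = 10


@dataclass
class Variant:
    hgvs: str
    control_homozygotes: int      # homozygote count in the control dataset
    low_penetrance: bool = False  # set by a curator; unknown to the automated step


def bs2_automated(variant: Variant, min_hom: int = BS2_MIN_HOMOZYGOTES) -> bool:
    """Activate BS2 from the homozygote count alone (penetrance is ignored)."""
    return variant.control_homozygotes >= min_hom


def bs2_after_review(variant: Variant, min_hom: int = BS2_MIN_HOMOZYGOTES) -> bool:
    """Keep BS2 only if a curator has not flagged the variant as low-penetrance."""
    return bs2_automated(variant, min_hom) and not variant.low_penetrance


# Homozygote counts are those reported in the text; the penetrance flags
# stand in for the manual adjustment a curator would make.
variants = [
    Variant("NM_004004.6:c.109G>A", control_homozygotes=50, low_penetrance=True),
    Variant("NM_004004.6:c.101T>C", control_homozygotes=16, low_penetrance=True),
]

for v in variants:
    print(v.hgvs, bs2_automated(v), bs2_after_review(v))
```

Under the assumed cutoff, both variants trigger BS2 in the automated step and lose it after review, which mirrors the manual code adjustment available in the interface.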
A further comparison between VIP-HL and ClinVar showed an overall interpretation concordance of 88.0%. For pathogenic/likely pathogenic variants, the concordance was lower (57.1%). This can be explained by the fact that VIP-HL automated only 13 out of 24 ACMG/AMP rules. The remaining nine case-level and segregation evidence codes and two functional evidence codes require manual curation of the scientific literature. Among these, pathogenic rules are more frequently activated than benign rules (Oza et al., 2018). Prospectively, text-mining and machine learning techniques might serve as potential solutions. For example, Birgmeier and co-authors developed an end-to-end machine learning tool, named AVADA, for the automatic retrieval of variant evidence directly from full-text literature (Birgmeier et al., 2020). If sufficiently large datasets of evidence-related sentences or figures can be accumulated, it may become possible to apply machine-learning approaches to evidence retrieval and to automate the remaining ACMG/AMP rules in the next version of VIP-HL. In the meantime, our interface enables curators to manually activate the relevant codes after manual literature curation.
Compared with ClinVar, VIP-HL classified three variants as P/LP that ClinVar classified as B/LB. All three variants were related to the consideration of splicing impact. The discrepancy for NM_153676.3:c.2547-1G>T was attributable to a lack of consideration of exon expression data, which ultimately led to an inappropriate classification. A splicing variant affecting a non-expressed exon would be expected to have less functional impact (DiStefano et al., 2018). Recently, transcript-level information from the GTEx project (Consortium, 2017) was used to show that incorporating exon expression data can improve the interpretation of putative loss-of-function variants (Cummings et al., 2020). The second and third variants (NM_206933.3:c.949C>A and NM_022124.6:c.7362G>A) were synonymous variants, and their splicing impact should be curated from the published literature if available. Nevertheless, these results indicate the importance of expression data in variant interpretation.
To improve user experience and further facilitate variant interpretation via VIP-HL, we developed a user-friendly web interface, which we continue to extend with useful features over time. For example, PM3, one of the most frequently activated rules in genetic hearing loss (Oza et al., 2018), relies on the pathogenicity of the variant on the second allele. If this second variant is entered (in HGVS nomenclature) during the curation of PM3, VIP-HL can now provide its pathogenicity as a reference for users. We expect such features and ongoing improvements to save curators time and relieve the burden of variant interpretation.
VIP-HL has limitations. First, it is currently not applicable to exon-level copy number variations. Second, the allele frequency cutoffs differ between dominant and recessive hearing loss disorders. For variants in a gene with both dominant and recessive inheritance, we first applied the cutoffs for the inheritance pattern curated by the ClinGen HL-EP. If both patterns were available, we conservatively chose the cutoffs for recessive disorders. To help users avoid this pitfall, we highlight the selected inheritance pattern in the web interface of VIP-HL.
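The cutoff selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the VIP-HL code: the function names are hypothetical, and the numeric cutoffs are placeholders that should be replaced with the values specified in the hearing loss guidelines (Oza et al., 2018).

```python
from typing import Optional, Set

# Placeholder cutoffs for illustration only; consult the guidelines for the
# actual values. BA1 fires at or above its cutoff, PM2 at or below its cutoff.
CUTOFFS = {
    "recessive": {"BA1": 0.005, "PM2": 0.00007},
    "dominant": {"BA1": 0.001, "PM2": 0.00002},
}


def select_inheritance(gene_modes: Set[str], curated: Optional[str] = None) -> str:
    """Pick the inheritance pattern whose cutoffs will be applied.

    A curated pattern takes priority; otherwise, when a gene shows both
    patterns, the recessive cutoffs are chosen as the conservative option.
    """
    if curated is not None:
        return curated
    if not gene_modes:
        raise ValueError("no inheritance pattern known for this gene")
    return "recessive" if "recessive" in gene_modes else "dominant"


def population_rule(filtering_af: float, mode: str) -> Optional[str]:
    """Return the population-based rule activated at this allele frequency."""
    cutoffs = CUTOFFS[mode]
    if filtering_af >= cutoffs["BA1"]:
        return "BA1"
    if filtering_af <= cutoffs["PM2"]:
        return "PM2"
    return None


mode = select_inheritance({"dominant", "recessive"})
print(mode, population_rule(0.002, mode))
```

With these placeholder values, the same allele frequency of 0.002 activates no rule under the recessive cutoffs but would activate BA1 under the dominant cutoffs, which is why the selected inheritance pattern is worth surfacing to the user.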
In conclusion, VIP-HL is an integrated online tool and search engine for variants in genetic hearing loss genes. It is also the first tool, to our knowledge, to incorporate the specifications proposed by the ClinGen HL-EP for variants related to genetic hearing loss. By providing reliable and reproducible annotations, VIP-HL not only facilitates variant interpretation but also provides a platform for users to share classifications with others.
Data Availability Statement: Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Conflict of Interest: Jiguang Peng, Jiale Xiang, Xiangqian Jin, Junhua Meng, Lisha Chen, Nana Song, and Zhiyu Peng were employed at BGI Genomics at the time of submission. No other conflicts relevant to this study should be reported.