Bioinformatic prediction of BP site abrogation
BP prediction tools (Table 1) have shown poor specificity because the BP motif is degenerate and experimental data for training algorithms are scarce (Corvelo, Hallegger, Smith, & Eyras, 2010). BP characterization has lagged far behind that of 5’ and 3’ splice sites because BPs are experimentally difficult to detect (Paggi & Bejerano, 2018). A large genome-wide dataset of experimentally confirmed BPs (Mercer et al., 2015) has been used to develop the BP prediction tools Branchpointer and LaBranchoR. Even so, only ~18% of human 3’ splice sites have high-confidence experimental BP annotations in this dataset (Mercer et al., 2015; Paggi & Bejerano, 2018).
Branchpointer BP annotations were used to link hundreds of clinically associated variants to changes in BP architecture, but the effects of these variants on splicing remained largely uncharacterized (Signal et al., 2018). Other tools (SVM-BPfinder, BPP, and RNABPS) are also available but, like Branchpointer and LaBranchoR, are designed mainly to predict the presence of a BP site. That is, they were not designed to identify spliceogenic variants automatically: wild-type and variant intronic sequences must be submitted separately, and their scores compared manually. Branchpointer additionally accepts single nucleotide variants as rsIDs and evaluates the effects of the reference and alternative alleles on BPs separately (Signal et al., 2018). Because Branchpointer is implemented in R, and LaBranchoR and BPP are run as Python scripts, these tools are also less accessible to users without bioinformatics expertise (Leman et al., 2020). HSF, an older, easy-to-use online splicing tool, can directly analyze an intronic variant to predict BP site abrogation; however, recent evaluations have shown that it performs poorly at detecting experimentally verified BPs (Leman et al., 2020; Signal et al., 2018; Q. Zhang et al., 2017).
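To illustrate what automating the wild-type versus variant comparison involves, the Python sketch below builds the variant sequence from a reference intronic window and scores both alleles in a single call. The scoring function is a deliberately simple, hypothetical stand-in that counts matches to an assumed yUnAy-style consensus (written as DNA, Y = C/T, N = any nucleotide); it is not the model used by Branchpointer, LaBranchoR, BPP, or any other tool, and all function names and the example sequence are hypothetical.

```python
# Toy, hypothetical sketch: score wild-type and variant intronic windows in one
# step instead of submitting them separately. The consensus-counting score is an
# ASSUMPTION for illustration only, not the model of any published BP tool.

CONSENSUS = "YTNAY"  # assumed yUnAy-style BP consensus written as DNA
ALLOWED = {"Y": set("CT"), "T": {"T"}, "N": set("ACGT"), "A": {"A"}}


def motif_score(kmer: str) -> int:
    """Count positions in a k-mer that match the assumed consensus."""
    return sum(base in ALLOWED[c] for base, c in zip(kmer, CONSENSUS))


def best_bp_score(seq: str):
    """Return (best score, 0-based start) of the best-matching window."""
    k = len(CONSENSUS)
    if len(seq) < k:
        raise ValueError("sequence shorter than the motif")
    starts = range(len(seq) - k + 1)
    best = max(starts, key=lambda i: motif_score(seq[i:i + k]))  # first maximum
    return motif_score(seq[best:best + k]), best


def apply_snv(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single nucleotide at 0-based pos after checking the ref allele."""
    if ref_seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {ref_seq[pos]} != {ref}")
    return ref_seq[:pos] + alt + ref_seq[pos + 1:]


def compare_alleles(ref_seq: str, pos: int, ref: str, alt: str) -> dict:
    """Score both alleles and report the change in the best motif score."""
    wt_score, _ = best_bp_score(ref_seq)
    var_score, _ = best_bp_score(apply_snv(ref_seq, pos, ref, alt))
    return {"wt": wt_score, "variant": var_score, "delta": var_score - wt_score}


if __name__ == "__main__":
    window = "GGGGGCTCACGGGGG"  # hypothetical intronic window
    print(compare_alleles(window, 8, "A", "G"))  # variant at the motif's A
    print(compare_alleles(window, 0, "G", "A"))  # variant outside the motif
```

A real pipeline would substitute a trained model for `motif_score` and read variants from a VCF; the point of the sketch is only that both alleles are scored and compared without manual steps.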
Variants predicted to disrupt a BP do not necessarily induce aberrant splicing, because introns can have multiple functional BPs (Mercer et al., 2015); this further complicates predicting the spliceogenicity of a single variant in the BP window. Moreover, Leman et al. (2020) found that using the score change to predict BP disruption was not the best strategy for identifying spliceogenic variants. Instead, they recommended considering a variant potentially spliceogenic if it is located within a BP motif, regardless of score change. When they evaluated BPP, Branchpointer, HSF, LaBranchoR, RNABPS, and SVM-BPfinder by checking whether confirmed spliceogenic variants co-located with predicted BP motifs, BPP achieved the highest accuracy (89.17%) (Leman et al., 2020). Of the 38 spliceogenic variants in their positive control set, 32 fell within BP motifs predicted by BPP, which predicted 39 BP motifs in total (Leman et al., 2020).
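The co-location criterion described by Leman et al. (2020) reduces to an interval test: a variant is flagged as potentially spliceogenic if its position falls inside any predicted BP motif, whatever the score change. The Python sketch below assumes that predicted motifs are available as hypothetical (chromosome, start, end) records with 1-based inclusive coordinates; it does not parse the output of any particular tool, and the coordinates in the example are invented. The `fraction_captured` helper gives the proportion of a positive control set that falls inside predicted motifs (for BPP's 32 of 38 variants this would be about 0.84), which is a different measure from the accuracy reported by Leman et al. (2020).

```python
# Hypothetical sketch of the "variant lies in a predicted BP motif" criterion.
# Input format (1-based inclusive intervals) is an ASSUMPTION, not any tool's output.
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A predicted BP motif with assumed 1-based inclusive coordinates."""
    chrom: str
    start: int
    end: int


def in_bp_motif(chrom: str, pos: int, motifs) -> bool:
    """True if a variant at chrom:pos lies inside any predicted BP motif.

    Score change is deliberately ignored, following the co-location criterion.
    """
    return any(m.chrom == chrom and m.start <= pos <= m.end for m in motifs)


def fraction_captured(variants, motifs) -> float:
    """Fraction of (chrom, pos) variants that fall inside a predicted motif."""
    if not variants:
        raise ValueError("no variants supplied")
    hits = sum(in_bp_motif(chrom, pos, motifs) for chrom, pos in variants)
    return hits / len(variants)


if __name__ == "__main__":
    motifs = [Interval("chr17", 1000, 1004), Interval("chr17", 2000, 2004)]
    variants = [("chr17", 1002), ("chr17", 1500), ("chr13", 2001)]
    for chrom, pos in variants:
        print(chrom, pos, in_bp_motif(chrom, pos, motifs))
    print(fraction_captured(variants, motifs))
```

The linear scan keeps the example short; for genome-scale variant sets, sorting the motifs or using an interval index would avoid comparing every variant with every motif.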
Overall, current BP prediction tools are useful for prioritizing candidate spliceogenic variants for downstream analysis by predicting whether they lie within putative BP sites. However, these tools predict BP location rather than splicing outcome, and although variants reported to alter a BP site sequence generally lead to exon skipping, other types of splicing aberrations have been observed (Crotti et al., 2009; M. Li & Pritchard, 2000). Hence, current BP prediction tools are not suitable for predicting a specific splicing effect.