Bioinformatic analysis of exonic SREs
SRE predictions have poor specificity. There are several factors that
contribute to the complexity of SRE prediction, including the diverse
range of splicing regulatory motifs (Ke et al., 2011; X. H.-F. Zhang &
Chasin, 2004) and the context-dependence of their activity (Fu & Ares
Jr, 2014; Z. Wang & Burge, 2008). The sequences surrounding an SRE and
its location in the gene relative to the consensus splice sites
significantly affect its activity and usage. For instance, some ESS
motifs, including G runs, can promote splicing when located in an
intron (Z. Wang & Burge, 2008). Moreover, RNA secondary structure and
chromatin state may also influence SRE accessibility, and thereby SRE
usage (reviewed by Fu & Ares Jr, 2014; Hnilicová & Staněk, 2011).
There are already several datasets and prediction algorithms (Table 1)
that have been used to identify SREs or test if a variant can
potentially create or abolish SREs (reviewed by Grodecká, Buratti, and
Freiberger (2017)). However, experimental studies have shown that these
bioinformatic prediction tools have high false positive rates. For
example, one of the largest studies to date (Houdayer et al., 2012)
reported that predictions were confirmed for only 14% (15/108) of BRCA1 and BRCA2 variants predicted to alter ESEs using a
combination of ESEfinder, RESCUE-ESE, PESE octamer, and HSF algorithms.
More recently, two studies have assessed both positive and negative predictive values of selected bioinformatic tools to determine
variant effects on SREs. ΔtESRseq (using hexamer scores from Ke et al.
(2011)) and ΔHZEI were reported to perform better than
ΔΨ and EX-SKIP in analysis of 154 variants (including 50 spliceogenic)
from select exons from five genes (Soukarieh et al., 2016). The data
from this study led the authors to postulate that the predictive
performance of SRE-dedicated tools varies for different genes and exons
(Soukarieh et al., 2016). For example, sensitivity of ΔtESRseq ranged
from 67-100% and specificity from 66-97% depending on the gene and
exon (Soukarieh et al., 2016). In another evaluation of ΔtESRseq,
ΔHZEI, and EX-SKIP (Grodecká et al., 2017), analysis of
only 20 variants (10 spliceogenic) from four genes found that ΔtESRseq
had higher sensitivity (80%) but lower specificity (60%) compared to
ΔHZEI and EX-SKIP (both 70% sensitivity, 70%
specificity). However, given the limited sample sizes of these two studies
(Grodecká et al., 2017; Soukarieh et al., 2016), it is difficult to draw
firm conclusions about the comparative performance of these
bioinformatic tools.
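
To illustrate why such small samples limit confidence, the following minimal sketch recomputes the sensitivity and specificity reported by Grodecká et al. (2017) and attaches approximate 95% Wilson score intervals. The true/false positive and negative counts are not given in the text above; they are inferred here from the reported percentages and the stated 10 spliceogenic variants out of 20, on the assumption that the remaining 10 variants were non-spliceogenic. The function name `wilson_interval` and the dictionary layout are illustrative choices only, not part of any of the cited tools.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion successes/n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: counts are back-calculated from the reported percentages
# (20 variants, 10 spliceogenic, hence 10 assumed non-spliceogenic).
tools = {
    "ΔtESRseq": {"tp": 8, "fn": 2, "tn": 6, "fp": 4},
    "ΔHZEI": {"tp": 7, "fn": 3, "tn": 7, "fp": 3},
    "EX-SKIP": {"tp": 7, "fn": 3, "tn": 7, "fp": 3},
}

results = {}
for name, c in tools.items():
    n_pos = c["tp"] + c["fn"]  # spliceogenic variants
    n_neg = c["tn"] + c["fp"]  # non-spliceogenic variants
    results[name] = {
        "sensitivity": c["tp"] / n_pos,
        "specificity": c["tn"] / n_neg,
        "sensitivity_ci": wilson_interval(c["tp"], n_pos),
        "specificity_ci": wilson_interval(c["tn"], n_neg),
    }

for name, r in results.items():
    s_lo, s_hi = r["sensitivity_ci"]
    p_lo, p_hi = r["specificity_ci"]
    print(
        f"{name}: sensitivity {r['sensitivity']:.0%} "
        f"(95% CI {s_lo:.0%} to {s_hi:.0%}), "
        f"specificity {r['specificity']:.0%} "
        f"(95% CI {p_lo:.0%} to {p_hi:.0%})"
    )
```

With only ten variants per class, each interval spans well over 40 percentage points, and the intervals for the three tools overlap, which is consistent with the caution expressed above about ranking the tools on these data.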