Score distributions for currently used tools
The distributions of values of the retrieved features, many of which are
currently used as deleteriousness/pathogenicity prediction scores, were
plotted for Benign and Pathogenic variants, as well as for Variants of
Uncertain Significance. Figure 3 shows the distribution of values for
three of the most commonly used tools, namely SIFT, PolyPhen and Revel.
Because SIFT assigns values near 0 to deleterious variants, whereas
prediction scores typically assign values near 1 to
deleterious/pathogenic variants, its histogram was plotted using the
1-SIFT value to allow for easier comparison with the other tools. As
seen in Figure 3, a large proportion of Benign variants have 1-SIFT
scores ≈ 1, suggesting an overestimation of deleteriousness.
PolyPhen scores also include values ≈ 1 for Benign variants and
values ≈ 0 for Pathogenic variants. For VUS, however, SIFT and PolyPhen
show pronounced peaks at the extreme values, while the Revel scores
have a less markedly bimodal distribution. An ideal prediction score
for VUS would separate them into two clear clusters (in a similar way
to PolyPhen) while avoiding classification errors.
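
The 1-SIFT transformation and the per-class histograms described above can be sketched as follows. This is a minimal illustration and not the code used to produce Figure 3: the function names, the bin count and the toy values are assumptions introduced here, and the scores are assumed to lie in the [0, 1] range.

```python
import numpy as np

def to_common_direction(sift_scores):
    """Return 1 - SIFT so that values near 1 indicate deleteriousness."""
    return 1.0 - np.asarray(sift_scores, dtype=float)

def class_histograms(scores, labels, bins=10):
    """Per-class histogram of scores on [0, 1], normalised to proportions."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, bins + 1)
    result = {}
    for label in np.unique(labels):
        counts, _ = np.histogram(scores[labels == label], bins=edges)
        result[str(label)] = counts / counts.sum()
    return result, edges

# Toy values for illustration only (hypothetical, not data from the study).
sift = [0.00, 0.02, 0.90, 0.01, 0.55, 0.03]
labels = ["Pathogenic", "Pathogenic", "Benign", "Benign", "VUS", "VUS"]

flipped = to_common_direction(sift)
hists, edges = class_histograms(flipped, labels, bins=5)
for label, proportions in hists.items():
    print(label, np.round(proportions, 2))
```

Normalising each class to proportions rather than raw counts keeps the three variant classes comparable when their sizes differ, which is usually the case for Benign, Pathogenic and VUS sets.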