FIGURE LEGENDS
Figure 1. a) Pie chart showing the composition of the ex-VUS
set by variant consequence type. b) Receiver Operating Characteristic
(ROC) curves for our initial Random Forest, Support Vector Machine
(SVM), and Multilayer Perceptron (MLP) based models, and CADD, REVEL,
and PolyPhen. Our models are drawn with thicker lines. c) Bar graph
showing the change in the area under the ROC curve (ΔAUC) for each variant
consequence type in the ex-VUS set, comparing the models trained
without and with CADD as a feature. d) ROC curves for the Random Forest,
SVM, and MLP based models trained with the CADD phred score as a
feature, alongside CADD, REVEL, and PolyPhen. Our models are drawn with
thicker lines.
Figure 2. Receiver Operating Characteristic (ROC) curves for
the Random Forest (RF), Support Vector Machine (SVM), and Multilayer
Perceptron (MLP) models trained including CADD phred score as a feature.
Curves for our models are shown with thicker lines, alongside the
benchmarked scores, for a) missense, b) splice, c) synonymous,
d) non-coding mRNA, e) intron, f) coding INDEL, g) intergenic,
h) other variant types, and i) all variant consequence types.
Figure 3. Distribution of 1-SIFT, PolyPhen, and REVEL scores for
benign and pathogenic variants and for variants of uncertain
significance. For the variants of uncertain significance, the thresholds
for each of the ACMG categories are displayed.
Figure 4. Distribution of the predicted probability of pathogenicity
for variants currently classified as Variants of Uncertain Significance
(VUS) in ClinVar for a) the Random Forest (RF) based model, b) the Support
Vector Machine (SVM) based model, and c) the Multilayer Perceptron (MLP)
based model. d) Pie chart of the predictions of the RF based model for
the current VUS in ClinVar.