FIGURE LEGENDS
Figure 1. a) Pie chart showing the composition of the ex-VUS set by variant consequence type. b) Receiver Operating Characteristic (ROC) curves for our initial Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) models, alongside CADD, REVEL, and PolyPhen. Our models are drawn with thicker lines. c) Bar graph showing the change in the Area Under the Curve (ΔAUC) for each variant consequence type in the ex-VUS sample, comparing the models trained before and after including CADD as a feature. d) ROC curves for the RF, SVM, and MLP models trained with the CADD phred score as a feature, alongside CADD, REVEL, and PolyPhen. Our models are drawn with thicker lines.
Figure 2. Receiver Operating Characteristic (ROC) curves for the Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) models trained with the CADD phred score as a feature. Curves for our models are drawn with thicker lines and shown alongside benchmarked scores for a) missense, b) splice, c) synonymous, d) non-coding mRNA, e) intron, f) coding INDEL, g) intergenic, h) other variant types, and i) all variant consequence types.
Figure 3. Distribution of 1-SIFT, PolyPhen, and REVEL scores for Benign and Pathogenic variants and for Variants of Uncertain Significance (VUS). For the VUS, the thresholds for each of the ACMG categories are displayed.
Figure 4. Distribution of the predicted probability of pathogenicity for variants currently classified as Variants of Uncertain Significance (VUS) in ClinVar for a) the Random Forest (RF) model, b) the Support Vector Machine (SVM) model, and c) the Multilayer Perceptron (MLP) model. d) Pie chart of the RF model's predictions for the current VUS in ClinVar.