Improved performance through ensembling with CADD
To overcome the shortcomings of the developed models on certain variant types (namely non-coding mRNA variants, coding INDEL variants, intergenic variants, and other types), and considering that ensemble approaches have shown increased performance (Ghosh, Oak, & Plon, 2017), we retrained the models using the CADD score as an additional feature. Unlike other variant deleteriousness prediction tools, CADD does not use clinical variants for its training, so using it avoids the circularity bias that would arise from including tools such as REVEL or PolyPhen when training and testing on a ClinVar variant population. The models were trained in the same fashion as described above, tuning the hyperparameters by cross-validation with a grid search. The overall performance of the models improved, as seen in Figure 1d: the RF- and MLP-based models yielded an AUROC of 0.98, and the SVM model an AUROC of 0.97. As seen in Figure 1c, the largest improvement is on coding INDELs and intergenic variants, with a more modest increase in AUROC for splice, non-coding mRNA, and other variant types. Synonymous variants showed a decrease in AUROC. A profiling analysis showed that virtually all synonymous variants are labelled as Benign, while virtually all frameshift variants are labelled as Pathogenic, implying that the models assign the benign label to all synonymous variants and the pathogenic label to all frameshift variants. Among the variant consequence types analyzed, synonymous variants yield the lowest AUCs. Moreover, the current classification of non-VUS synonymous variants in ClinVar (99% are classified as Benign) is not matched by the CADD scores, which predict a much higher number to be pathogenic (Supplementary Figure S2).
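The retraining step described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the data are synthetic, the five annotation features, the CADD-like column, and the hyperparameter grid are all hypothetical, and scikit-learn is assumed as the library. It only shows the general pattern of adding one extra score as a feature and tuning a random forest by grid search with cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 600 variants, 5 hypothetical features.
n = 600
y = rng.integers(0, 2, size=n)  # 1 = pathogenic, 0 = benign
X_base = rng.normal(size=(n, 5)) + 0.4 * y[:, None]
# Hypothetical CADD-like score, drawn so that it correlates with the label.
cadd = rng.normal(loc=10 + 8 * y, scale=5.0)
X_with_cadd = np.column_stack([X_base, cadd])


def fit_and_score(X, y):
    """Tune a random forest by grid search with 5-fold CV; return test AUROC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    # Illustrative grid only; the grid used in the study is not given here.
    grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=0), grid, scoring="roc_auc", cv=5
    )
    search.fit(X_tr, y_tr)
    # The refitted best estimator scores the held-out variants.
    proba = search.predict_proba(X_te)[:, 1]
    return roc_auc_score(y_te, proba), search.best_params_


auc_base, _ = fit_and_score(X_base, y)
auc_cadd, best = fit_and_score(X_with_cadd, y)
print(f"AUROC without CADD-like feature: {auc_base:.3f}")
print(f"AUROC with CADD-like feature:    {auc_cadd:.3f} (best params: {best})")
```

Keeping the train/test split and the random seed identical for both feature sets means any difference in AUROC can be attributed to the added column rather than to a different partition of the data.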
As shown in Figure 2a, on the subset of missense variants, the RF (AUC of 0.97) outperforms the SVM (0.96), the MLP (0.96), and all the benchmarked tools. For splice variants (Figure 2b), our RF yielded an AUC of 0.99; the SVM, 0.97; and the MLP, 0.98. CADD yielded an AUC of 0.95; our MLP an AUC of 0.96; and the ada score an AUC of 0.95. For synonymous variants (Figure 2c), the RF yielded an AUC of 0.79. For non-coding exon variants (Figure 2d), the AUCs are slightly lower: the RF yielded an AUC of 0.98, and the SVM and the MLP yielded 0.96 and 0.97, respectively, all higher than CADD at 0.93. For intron variants (Figure 2e), the RF yielded an AUC of 0.92. In all cases except coding INDEL variants (Figure 2f), our models yield AUCs higher than CADD, and the RF-based model achieves the highest performance across the assessed variant types.
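A per-consequence comparison of this kind can be sketched as follows. The example is illustrative only: the records, the field names (`consequence`, `label`, `model`, `cadd`), and the scores are invented for demonstration and are not data from this study. The AUROC is computed directly as the fraction of pathogenic/benign pairs in which the pathogenic variant receives the higher score, with ties counted as one half.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (ties count one half). Returns None if a class is missing."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_type(records, score_key):
    """Group records by consequence type and compute one AUROC per group."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        labels, scores = groups[r["consequence"]]
        labels.append(r["label"])
        scores.append(r[score_key])
    return {c: auroc(l, s) for c, (l, s) in groups.items()}


# Hypothetical toy records: label 1 = pathogenic, 0 = benign.
records = [
    {"consequence": "missense", "label": 1, "model": 0.9, "cadd": 25.0},
    {"consequence": "missense", "label": 0, "model": 0.2, "cadd": 27.0},
    {"consequence": "missense", "label": 1, "model": 0.7, "cadd": 30.0},
    {"consequence": "missense", "label": 0, "model": 0.4, "cadd": 12.0},
    {"consequence": "synonymous", "label": 0, "model": 0.1, "cadd": 15.0},
    {"consequence": "synonymous", "label": 0, "model": 0.3, "cadd": 8.0},
    {"consequence": "synonymous", "label": 1, "model": 0.2, "cadd": 9.0},
]

model_auc = auroc_by_type(records, "model")
cadd_auc = auroc_by_type(records, "cadd")
for consequence in model_auc:
    print(consequence, model_auc[consequence], cadd_auc[consequence])
```

On a class with very few pathogenic examples, as in the toy synonymous group here, the per-type AUROC rests on a handful of pairs, which is one reason such estimates can be unstable for heavily imbalanced consequence types.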