Improved performance through ensembling with CADD
To overcome the shortcomings of the developed models on certain variant
types (namely non-coding mRNA variants, coding INDEL variants,
intergenic variants, and other types), and considering
that ensemble approaches have shown increased performance (Ghosh, Oak,
& Plon, 2017), we retrained the models using the CADD score as an
additional feature. Unlike other variant deleteriousness prediction
tools, CADD does not use clinical variants for its training, so using it
avoids the circularity bias that would arise from using tools such as
REVEL or PolyPhen when training and testing on a ClinVar variant
population. The models were trained in the same fashion as described
above, tuning the hyperparameters with cross-validation using a grid
search approach. The overall performance of the models improved, as shown
in Figure 1d. The RF- and MLP-based models yielded an AUROC of 0.98, and
the SVM model an AUROC of 0.97. As shown in Figure 1c, the largest
improvement is seen for coding INDELs and intergenic variants, with a
more modest increase in the AUROC for splice, non-coding mRNA, and other
variant types. Synonymous variants experienced a decrease in performance
as measured by the AUROC. A profiling analysis showed that virtually all
synonymous variants are labelled as Benign, while virtually all
frameshift variants are labelled as Pathogenic, implying that the models
assign the benign label to all synonymous variants and the pathogenic
label to all frameshift variants. Of the variant consequence types
analyzed, synonymous variants yield the lowest AUCs. Moreover, the
current classification of non-VUS synonymous variants in ClinVar (99%
are classified as Benign) is not matched by the CADD scores, which
predict a much higher number to be pathogenic (Supplementary Figure S2).
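
The profiling step described above can be illustrated with a minimal sketch. It tabulates, for each consequence type, the fraction of variants carrying each clinical label. The records below are hypothetical placeholders, not ClinVar data, and the helper name label_fractions is an assumption made for this example only.

```python
from collections import Counter, defaultdict

# Hypothetical (consequence, label) pairs standing in for non-VUS records.
# These are illustrative placeholders, not real ClinVar entries.
records = [
    ("synonymous", "Benign"), ("synonymous", "Benign"),
    ("synonymous", "Benign"), ("synonymous", "Pathogenic"),
    ("frameshift", "Pathogenic"), ("frameshift", "Pathogenic"),
    ("frameshift", "Pathogenic"), ("frameshift", "Benign"),
    ("missense", "Benign"), ("missense", "Pathogenic"),
]


def label_fractions(records):
    """Return {consequence: {label: fraction}} for (consequence, label) pairs."""
    counts = defaultdict(Counter)
    for consequence, label in records:
        counts[consequence][label] += 1
    return {
        consequence: {
            label: n / sum(label_counts.values())
            for label, n in label_counts.items()
        }
        for consequence, label_counts in counts.items()
    }


fractions = label_fractions(records)
for consequence, dist in fractions.items():
    print(consequence, dist)
```

When one label dominates a consequence type almost completely, a model can score well overall by predicting the majority label for that type, while the AUROC within the type rests on very few minority-class examples.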
As shown in Figure 2a, on the subset of missense variants, the AUC of
the RF (0.97) exceeds those of the SVM (0.96), the MLP (0.96), and all
the benchmarked tools. For the splice variants (Figure 2b), our RF
yielded an AUC of 0.99; the SVM, 0.97; and the MLP, 0.98. CADD
yielded an AUC of 0.95; our MLP an AUC of 0.96; and the ada score an
AUC of 0.95. For synonymous variants (Figure 2c), the RF yielded an AUC
of 0.79. For non-coding exon variants (Figure 2d), the AUCs are
slightly lower. The RF yielded an AUC of 0.98, and the SVM and the MLP
yielded 0.96 and 0.97, respectively, all higher than CADD with 0.93. For
the intron variants (Figure 3e), the RF yielded an AUC of 0.92. In all
cases except coding INDEL variants (Figure 2f), our models yield AUCs
higher than that of CADD, and the RF-based model achieves the highest
performance across the assessed variant types.
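
The workflow in this section (appending the CADD score as an extra feature, tuning hyperparameters by grid search with cross-validation, and comparing AUROCs per variant type against CADD alone) might be sketched as follows. This is only an illustration using scikit-learn. The data are synthetic, and the feature matrix, the CADD-like score, the parameter grid, and the consequence labels are all assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-ins (assumptions): y is the label (1 = pathogenic),
# X_base mimics the original features, cadd mimics a CADD-like score.
y = rng.integers(0, 2, size=n)
X_base = rng.normal(size=(n, 4)) + 0.5 * y[:, None]
cadd = rng.normal(loc=10 + 8 * y, scale=5.0)
consequence = rng.choice(["missense", "splice", "synonymous"], size=n)

# Append the CADD-like score as one additional feature column.
X = np.column_stack([X_base, cadd])

X_tr, X_te, y_tr, y_te, c_tr, c_te = train_test_split(
    X, y, consequence, test_size=0.3, stratify=y, random_state=0
)

# Grid search with cross-validation; this small grid is illustrative only.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_tr, y_tr)

model_scores = grid.predict_proba(X_te)[:, 1]
cadd_scores = X_te[:, -1]  # the raw CADD-like score used as a baseline

# AUROC per consequence type for the tuned model and for the score alone.
per_type = {}
for ctype in np.unique(c_te):
    mask = c_te == ctype
    per_type[str(ctype)] = (
        roc_auc_score(y_te[mask], model_scores[mask]),
        roc_auc_score(y_te[mask], cadd_scores[mask]),
    )

overall_auc = roc_auc_score(y_te, model_scores)
print("best params:", grid.best_params_)
for ctype, (model_auc, cadd_auc) in per_type.items():
    print(f"{ctype}: model={model_auc:.2f} cadd={cadd_auc:.2f}")
```

Computing the AUROC within each consequence type, rather than only over the pooled test set, is what exposes cases where the pooled figure hides weak performance on a single variant class.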