Variant feature selection
To assess candidate attributes for model training, we retrieved 24 variant features from the Ensembl Variant Effect Predictor (McLaren, et al., 2016; Zerbino, et al., 2018), including splice site predictors, conservation scores, deleteriousness/pathogenicity scores, allele frequency, and consequence type. Features with high Pearson correlation were removed. Additionally, features whose values come from models trained on clinical significance data were discarded to avoid circularity bias in our model estimation phase. The features initially used for training were: ada score, codon degeneracy score, integrated fitness conservation score, BLOSUM62 score, Eigen score, phyloP score, GERP score, SIFT score, the Loss of Function tool score, the allele frequencies from the 1000 Genomes Project global dataset, and the variant consequence type encoded as binary dummy variables. Clinical significance was used as the training label, encoded as 1 for pathogenic and 0 for benign. To correct for class imbalance (2/3 benign vs. 1/3 pathogenic variants), we randomly undersampled benign variants to match the number of pathogenic variants. After testing the models' performance on the ex-VUS set, the models were retrained following the same procedure, adding the CADD phred score (retrieved from the Ensembl Variant Effect Predictor) as a feature.
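The preprocessing steps described above (correlation filtering, dummy encoding of the consequence type, and random undersampling of benign variants) can be illustrated with a minimal sketch. It is not the code used in this study: the correlation threshold of 0.9, the greedy keep-first filtering strategy, the random seed, the helper names, and the toy values (including the hypothetical `phyloP_scaled` column) are all assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has zero variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    `columns` maps feature name -> list of values. Both the threshold and the
    greedy, order-dependent strategy are assumptions, not taken from the text.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def one_hot(values):
    """Encode a categorical column as binary dummy columns, one per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def undersample(labels, seed=0):
    """Return sorted row indices after randomly reducing benign rows (label 0)
    to at most the number of pathogenic rows (label 1)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    pathogenic = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    benign_kept = rng.sample(benign, min(len(benign), len(pathogenic)))
    return sorted(pathogenic + benign_kept)


# Toy data (hypothetical values, for illustration only).
columns = {
    "phyloP": [0.1, 0.5, 0.9, 0.3, 0.7, 0.2],
    "phyloP_scaled": [0.2, 1.0, 1.8, 0.6, 1.4, 0.4],  # redundant copy of phyloP
    "SIFT": [0.9, 0.1, 0.4, 0.8, 0.05, 0.6],
}
consequence = ["missense_variant", "synonymous_variant", "missense_variant",
               "missense_variant", "synonymous_variant", "missense_variant"]
labels = [0, 0, 0, 0, 1, 1]  # 1 = pathogenic, 0 = benign

kept_features = drop_correlated(columns)
dummies = one_hot(consequence)
balanced_rows = undersample(labels)
```

In this sketch the redundant `phyloP_scaled` column is dropped because it is perfectly correlated with `phyloP`, and the balanced index list contains as many benign rows as pathogenic ones. Under this scheme, retraining with an additional feature such as the CADD phred score would amount to adding one more entry to `columns` before filtering.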