Selection of variants for model training and testing
From ClinVar (version 08/03/2020), we selected 82,463 variants that had at least two quality stars (i.e., assertion criteria provided, multiple submitters, and no conflicts in interpretation) and were not classified as variants of uncertain significance (VUS). We then sampled the ClinVar releases from 06/15/2017, 12/03/2017, 06/03/2018, 12/02/2018, 06/03/2019, and 12/06/2019 to identify variants that were classified as VUS on those dates, had since been reclassified into one of the four remaining categories (pathogenic, likely pathogenic, likely benign, or benign), and were included in the set of 82,463 variants. This yielded 5,537 variants, which were reserved for further benchmarking as the ex-VUS set. To increase predictive power by including more variants for training, and to ease the interpretation of the results, Benign and Likely Benign variants were merged into a single Benign label, and Pathogenic and Likely Pathogenic variants were merged into a single Pathogenic label.
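The selection and labeling steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes each ClinVar release has already been parsed into a hypothetical dictionary mapping a variant ID to its number of quality stars and its clinical significance, and the names `Release`, `select_labeled`, `find_ex_vus`, and `LABEL_MAP` are hypothetical. It also assumes that the ex-VUS variants, being reserved for benchmarking, are excluded from the training set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical parsed release: variant ID -> (quality stars, clinical significance)
Release = Dict[str, Tuple[int, str]]

VUS = "uncertain significance"

# Merge the four definitive categories into two training labels
LABEL_MAP = {
    "benign": "Benign",
    "likely benign": "Benign",
    "likely pathogenic": "Pathogenic",
    "pathogenic": "Pathogenic",
}


def select_labeled(release: Release, min_stars: int = 2) -> Dict[str, str]:
    """Keep variants with >= min_stars and a definitive (non-VUS) class, with merged labels."""
    selected = {}
    for vid, (stars, significance) in release.items():
        sig = significance.lower()
        if stars >= min_stars and sig in LABEL_MAP:
            selected[vid] = LABEL_MAP[sig]
    return selected


def find_ex_vus(labeled: Dict[str, str], older_releases: List[Release]) -> Set[str]:
    """Variants that were VUS in any older release and now belong to the labeled set."""
    ex_vus = set()
    for release in older_releases:
        for vid, (_, significance) in release.items():
            if significance.lower() == VUS and vid in labeled:
                ex_vus.add(vid)
    return ex_vus


def training_set(labeled: Dict[str, str], ex_vus: Set[str]) -> Dict[str, str]:
    """Assumption: ex-VUS variants are held out from training for benchmarking."""
    return {vid: label for vid, label in labeled.items() if vid not in ex_vus}
```

For example, with a toy current release in which `v3` has only one star, `v3` is dropped by `select_labeled` even if it was a VUS in an older release, so it never enters the ex-VUS set.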