Selection of variants for model training and testing
From ClinVar, we selected 82,463 variants which had at least two quality
stars (i.e. assertion criteria available, multiple submitters, and no
conflicts in the interpretation) and were not classified as VUS in the
ClinVar database, version 08/03/2020. We then examined the earlier
ClinVar database versions from 06/15/2017, 12/03/2017, 06/03/2018,
12/02/2018, 06/03/2019, and 12/06/2019 to identify variants that were
classified as VUS on any of those dates but had since been reclassified
into one of the four remaining categories (pathogenic, likely
pathogenic, likely benign, or benign) and were included in the group of
82,463 variants. This yielded 5,537 variants, which were reserved for further
benchmarking as the ex-VUS set. To increase predictive power by
including more variants for training, and to ease the interpretability
of the results, Benign and Likely Benign variants were merged into a
single Benign label, and Pathogenic and Likely Pathogenic variants were
merged into a single Pathogenic label.
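
The selection logic described above can be illustrated with a minimal sketch. This is not the pipeline used in this work: the record layout, the field names (`stars`, `classification`), the classification strings, and the function names are all hypothetical, and the toy data are invented purely for illustration. The sketch also assumes that every selected variant carries one of the four non-VUS categories named above.

```python
# Illustrative sketch only: field names, label strings and data are assumptions.
VUS = "Uncertain significance"

# Map the four non-VUS categories onto the two merged training labels.
MERGED_LABEL = {
    "Benign": "Benign",
    "Likely benign": "Benign",
    "Pathogenic": "Pathogenic",
    "Likely pathogenic": "Pathogenic",
}


def select_variants(current, min_stars=2):
    """Keep variants with at least `min_stars` stars and a non-VUS category."""
    return {
        vid: rec["classification"]
        for vid, rec in current.items()
        if rec["stars"] >= min_stars and rec["classification"] in MERGED_LABEL
    }


def find_ex_vus(selected, snapshots):
    """Return IDs of selected variants that were VUS in any earlier snapshot."""
    ex_vus = set()
    for snapshot in snapshots:
        for vid, classification in snapshot.items():
            if classification == VUS and vid in selected:
                ex_vus.add(vid)
    return ex_vus


def merge_labels(selected):
    """Collapse the four categories into Benign / Pathogenic."""
    return {vid: MERGED_LABEL[c] for vid, c in selected.items()}


# Toy stand-in for the most recent database version (invented records).
current = {
    "v1": {"stars": 2, "classification": "Likely pathogenic"},
    "v2": {"stars": 3, "classification": "Benign"},
    "v3": {"stars": 1, "classification": "Pathogenic"},  # too few stars
    "v4": {"stars": 2, "classification": VUS},           # still VUS
    "v5": {"stars": 2, "classification": "Likely benign"},
}

# Toy stand-ins for two earlier database versions (variant ID -> category).
snapshots = [
    {"v1": VUS, "v3": VUS},
    {"v1": "Likely pathogenic", "v5": "Likely benign"},
]

selected = select_variants(current)
ex_vus = find_ex_vus(selected, snapshots)
labels = merge_labels(selected)
```

In this toy example, `v1` is the only variant that enters the ex-VUS set: `v3` was also a VUS in an earlier snapshot, but it fails the two-star requirement and is therefore not part of the selected group.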