Introduction

After genome-wide sequencing, the majority of patients with rare disease (RD) remain without an identified genetic cause for their condition (Lee et al., 2014; Wright et al., 2018; Yang et al., 2013). Oftentimes, finding even one other unrelated patient with a similar phenotype and available genomic data can lead to the identification of a shared genetic cause (Bamshad et al., 2011). Given that RDs are just that, rare, for many disorders no one clinician will see two patients with the same unsolved RD. Instead, the “matching” patients (having similar manifestations with the same genetic cause) may be located across the globe.
A significant barrier to identifying matching patients has been the storage of clinical and genomic data in isolated silos. Furthermore, much of the clinical data is collected in unstructured, non-standardized formats, which impedes computation and sharing across groups. Clinicians share data in numerous ways, including case discussions at conferences and publication of case reports. However, given the rarity of the conditions being discussed, these means of sharing are ineffective and inefficient, which translates to delays in gene discovery and in answers for patients.
In the field of genetics, data generation is occurring at unprecedented rates in both the clinical and research settings. The resulting abundance of data in different silos has led to an interpretation bottleneck, and data sharing plays a crucial role in addressing this issue. Broad data sharing will lead to better data quality and interpretation, faster answers for patients, and rapid advancements in the field of genetics. There is consensus among professional societies, patients, and experts that responsible data sharing is imperative (ACMG Board of Directors, 2017; Bush et al., 2018; Darquy et al., 2016; Rehm, 2017). Consideration must be given to the social, societal, privacy, and policy challenges of implementing international data sharing responsibly, and these are being tackled by groups such as the Global Alliance for Genomics and Health (Rahimzadeh, Dyke, & Knoppers, 2016). There is no question that data sharing will be essential to finding answers for patients with rare diseases who currently remain unsolved. Furthermore, it will support better understanding of complex, non-Mendelian conditions and facilitate the shift toward personalized medicine.
To begin to address the problem of responsible data sharing, in 2014 we launched PhenomeCentral, a web portal designed for the matchmaking of cases entered by clinicians and researchers working with rare diseases (Buske et al., 2015a). Since its entrance into the rare disease space, PhenomeCentral has been facilitating matches and the identification of second cases or case series, leading to gene discovery and answers for families often experiencing a diagnostic odyssey. PhenomeCentral is a founding member of the Matchmaker Exchange (MME; Philippakis et al., 2015), a collaborative effort to solve genetic disorders by building an international network of rare disease databases connected by a common application programming interface (API; Buske et al., 2015b). In this manuscript we describe the growth of PhenomeCentral over the past 7 years, improvements we have made to the portal, and goals for our future work.