
Analyzing the Adoption of Database Management Systems Throughout the Life Cycle of Open Source Projects
  • Raquel Maximino,
  • Camila A. Paiva,
  • Frederico Paiva,
  • João Felipe Pimentel,
  • Igor Wiese,
  • Marco Aurélio Gerosa,
  • Igor Steinmacher,
  • Leonardo Murta,
  • Vanessa Braganholo
Raquel Maximino
Universidade Federal Fluminense Instituto de Computacao
Camila A. Paiva
Universidade Federal Fluminense Instituto de Computacao
Frederico Paiva
Universidade Federal Fluminense Instituto de Computacao
João Felipe Pimentel
Universidade Federal Fluminense Instituto de Computacao
Igor Wiese
Universidade Tecnologica Federal do Parana - Campus Campo Mourao
Marco Aurélio Gerosa
Northern Arizona University
Igor Steinmacher
Northern Arizona University
Leonardo Murta
Universidade Federal Fluminense Instituto de Computacao
Vanessa Braganholo
Universidade Federal Fluminense Instituto de Computacao

Corresponding Author: [email protected]


Abstract

Database Management Systems (DBMSs) are widely used to store, retrieve, and manage the vast amounts of data that modern applications handle, and many DBMSs are available in the industry. While a few studies have examined the co-evolution of DBMSs and application source code, there is a research gap regarding the adoption of DBMSs in real systems. Knowing the most commonly used DBMSs, how frequently they are used together, and their patterns of replacement can assist project managers in making informed decisions about DBMS adoption. Therefore, we conducted a historical investigation of 317 popular open source end-user applications developed in Java and hosted on GitHub. We determined whether these projects had, at any point, employed any of the top 50 DBMSs as ranked by DB-Engines. We observed that MySQL is the most used relational DBMS, followed by PostgreSQL and H2. Among non-relational DBMSs, Redis is the predominant choice, followed by Cassandra. Multi-model DBMSs are top-ranked in Infrastructure Management projects. Furthermore, we found different combinations of subsets of 11 DBMSs being used together at the beginning of the project life cycle (e.g., PostgreSQL and MySQL). Halfway through the project life cycle, we found combinations of 25 DBMSs being used together (e.g., MS SQL Server and Oracle). Finally, at the end of the life cycle, this number increases to 29 DBMSs (e.g., Redis and H2). We also investigated the replacement of DBMSs: we mined sequential patterns and discovered 20 situations where projects replaced DBMSs. For example, we observed 11 replacements of PostgreSQL in 8 projects in our corpus, with MySQL being a dominant replacement choice, having superseded PostgreSQL in four instances. Conversely, no project switched from MySQL to PostgreSQL. In summary, our study offers insights into the patterns of DBMS adoption, co-use, and replacement tendencies.
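
To make the notions of co-use and replacement in the abstract concrete, the following is a minimal Python sketch and not the authors' pipeline. It assumes each project's history has already been reduced to an ordered list of snapshots, each snapshot being the set of DBMSs detected at that point. It uses a simple diff between consecutive snapshots instead of the sequential pattern mining the study reports. The project names, the data, and the helper names `co_use_pairs` and `replacements` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_use_pairs(snapshot):
    """Return all unordered pairs of DBMSs present together in one snapshot."""
    return {tuple(sorted(pair)) for pair in combinations(snapshot, 2)}


def replacements(history):
    """Return (removed, added) pairs between consecutive snapshots.

    Assumption: a replacement is a DBMS that disappears in the same step
    in which another DBMS appears. The study's actual criterion may differ.
    """
    found = []
    for before, after in zip(history, history[1:]):
        removed = before - after
        added = after - before
        found.extend((old, new) for old in sorted(removed) for new in sorted(added))
    return found


# Hypothetical data: ordered snapshots of the DBMSs detected in each project.
projects = {
    "project-a": [{"PostgreSQL"}, {"PostgreSQL", "Redis"}, {"MySQL", "Redis"}],
    "project-b": [{"MySQL", "H2"}, {"MySQL", "H2"}],
}

replacement_counts = Counter()
co_use_counts = Counter()
for history in projects.values():
    replacement_counts.update(replacements(history))
    # Count each co-used pair at most once per project.
    pairs = set()
    for snapshot in history:
        pairs |= co_use_pairs(snapshot)
    co_use_counts.update(pairs)
```

Counting each pair once per project keeps long-lived projects from dominating the co-use totals; counting per snapshot would be an equally defensible choice under different assumptions.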
26 Oct 2023: Submitted to Software: Practice and Experience
26 Oct 2023: Assigned to Editor
26 Oct 2023: Submission Checks Completed
17 Nov 2023: Review(s) Completed, Editorial Evaluation Pending
29 Jan 2024: Reviewer(s) Assigned