DISCUSSION

The preceding analysis applies a novel ML-based method for predicting premature births from administrative birth data. The results show that a DNN classifier trained on infant and parental characteristics drawn from administrative data achieves a decent level of performance, exceeding that of an OLS classifier trained on the same dataset. Although some critical predictors of prematurity, such as genetic information, are missing from the data, the findings imply that combining ML with administrative birth data may help identify mothers with high risk and care needs who warrant closer attention to their birth outcomes, which in turn could improve the effectiveness of public programs providing PCHV services.
From a theoretical point of view, this study is one of the early attempts to confirm the utility of administrative data in predicting premature births. Studies that use administrative birth data to predict birth outcomes remain rare, despite the explosive growth in the amount of data available in the public sector and the scientific community. Moreover, as noted above, the findings imply that combining ML with administrative data may improve the effectiveness of PCHV programs by helping identify the mothers with the greatest need, although the overall predictive performance of the models must improve further before it can support a more proactive approach to promoting PCHV programs. Research to develop models with higher predictive precision is therefore necessary. Future research should also test more recent learning algorithms and look for ways to accumulate additional data that improve prediction.
While the findings are encouraging, it is important to recognize that a predictive approach carries the risk of systematically excluding some populations that actually need prenatal care services. The population-based classification and the predictive approach compared in this study both rest on a technique for drawing a boundary between potential beneficiaries and individuals who will not be considered for the program. They differ only in whether the model draws on a statistical or an ML technique and in how many variables enter the classification. Both are expected to help identify the individuals with the highest need. Their downside, however, is that certain individuals or populations may be left behind, because no predictive model is perfect and some relevant variables may be missing. An agency that uses a predictive approach to promote a PCHV program should acknowledge this limitation and mitigate it with ancillary measures and more traditional strategies, so that individuals not identified by the predictive models are also reached.
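The exclusion risk described above can be made concrete with a minimal sketch. It is purely illustrative and is not the model or the data used in this study: the risk scores, outcomes, thresholds, and all names (`Case`, `split_by_threshold`, `missed_cases`, `recall`, `CASES`) are hypothetical. The sketch shows how a single cutoff on a predicted risk score divides individuals into those selected for outreach and those excluded, and how many truly premature births fall on the excluded side.

```python
from dataclasses import dataclass


@dataclass
class Case:
    risk_score: float  # hypothetical model output in [0, 1]
    premature: bool    # hypothetical observed outcome


def split_by_threshold(cases, threshold):
    """Divide cases into (selected, excluded) using a score cutoff."""
    selected = [c for c in cases if c.risk_score >= threshold]
    excluded = [c for c in cases if c.risk_score < threshold]
    return selected, excluded


def missed_cases(cases, threshold):
    """Premature births that fall below the cutoff (false negatives)."""
    _, excluded = split_by_threshold(cases, threshold)
    return [c for c in excluded if c.premature]


def recall(cases, threshold):
    """Share of premature births that the cutoff captures."""
    positives = [c for c in cases if c.premature]
    if not positives:
        return 0.0
    caught = [c for c in positives if c.risk_score >= threshold]
    return len(caught) / len(positives)


# Synthetic, made-up data for illustration only.
CASES = [
    Case(0.91, True), Case(0.72, True), Case(0.35, True), Case(0.15, True),
    Case(0.80, False), Case(0.40, False), Case(0.10, False), Case(0.05, False),
]

if __name__ == "__main__":
    for threshold in (0.5, 0.3, 0.0):
        missed = len(missed_cases(CASES, threshold))
        print(f"threshold={threshold:.1f} "
              f"recall={recall(CASES, threshold):.2f} missed={missed}")
```

In this toy example, lowering the cutoff reduces the number of missed cases but enlarges the group selected for outreach, and only the trivial cutoff of zero, which selects everyone, misses no one. This is the trade-off that ancillary measures and traditional outreach strategies are meant to offset.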