DISCUSSION
The preceding analysis applied a novel ML-based method for predicting
premature births from administrative birth data. The results
demonstrated that a DNN classifier trained on various infant and
parental characteristics from administrative data can achieve a decent
level of performance, exceeding that of the OLS classifier trained on
the same dataset. Although some critical predictors of prematurity, such
as genetic information, are missing from the data, the findings imply
that the combination of ML and administrative birth data may have the
potential to identify mothers with high risk and care needs, who need to
pay close attention to their birth outcomes. This, in turn, can improve
the effectiveness of public programs providing PCHV services.
From a theoretical point of view, this study is one of the early
attempts to confirm the utility of administrative data in predicting
premature births. Studies that use administrative birth data to predict
birth outcomes have been rare in this area, despite the explosive growth
in the amount of data available in the public sector and the scientific
community. Moreover, as mentioned before, the
findings also imply that the combination of ML and administrative data
has the potential to improve the effectiveness of PCHV programs by
helping identify the mothers with the greatest need. The overall
predictive performance of the models must improve further, however,
before it can support a more proactive approach to promoting PCHV
programs. Research to develop models with higher predictive precision is
therefore necessary. Future research should also test more recent
learning algorithms and look for ways to accumulate additional data that
improve prediction.
While the findings are encouraging, it is also important to recognize
that the predictive approach risks systematically excluding some
populations that actually need prenatal care services.
The population-based classification and the predictive approaches
compared in this study all rest on the same technique: drawing a
boundary that divides potential beneficiaries from individuals who will
not be considered for the program. They differ only in whether the model
draws on a statistical or an ML technique and in how many variables the
classification includes. These efforts are expected to help identify the
individuals with the highest need. However, as mentioned before, a
downside of these approaches is that certain individuals or populations
may be left behind, because no predictive model is perfect and some
variables may be missing. An
agency that uses a predictive approach to promote a PCHV program should
acknowledge this limitation and mitigate it with ancillary measures and
more traditional strategies, so that individuals not identified by the
predictive models are also included.
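
The risk of exclusion described above can be made concrete with a small
sketch. The example is illustrative only and is not the model or data
used in this study: the records, the group labels "A" and "B", the
0.5 threshold, and the helper name missed_by_group are all hypothetical.
It shows how an agency might check, for each subgroup, what share of
truly high-need individuals falls on the wrong side of the
classification boundary.

```python
from collections import defaultdict


def missed_by_group(records, threshold):
    """Per group, the share of truly high-need individuals scored below the threshold.

    records: iterable of (group, risk_score, high_need) tuples (hypothetical format).
    """
    positives = defaultdict(int)  # high-need individuals per group
    missed = defaultdict(int)     # high-need individuals below the boundary
    for group, score, high_need in records:
        if not high_need:
            continue
        positives[group] += 1
        if score < threshold:
            missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}


# Hypothetical records: (group, predicted risk score, truly high-need?)
records = [
    ("A", 0.9, True), ("A", 0.7, True), ("A", 0.2, False), ("A", 0.6, True),
    ("B", 0.4, True), ("B", 0.8, True), ("B", 0.3, True), ("B", 0.1, False),
]

rates = missed_by_group(records, threshold=0.5)
for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} of high-need individuals fall outside the boundary")
```

A large gap between groups in this kind of check would suggest where
ancillary measures and traditional outreach are most needed.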