Implications for researchers and practitioners
Diagnostic analyses are notoriously difficult, and deriving accurate and representative diagnostic metrics is important both for research purposes and for an individual’s life (e.g., health, penal consequences, social standing/reputation, etc.). Bootstrapping is a useful tool for evaluating the precision of a diagnostic test without the cost of repeating the physical test multiple times per patient. We found that although the 100-iteration bootstrap test yielded statistically reliable classifications for the majority of our sample, it produced inconsistent diagnostic results in approximately a quarter of our participants (n = 19/81, or 23%), particularly when a participant’s score fell near the diagnostic cutpoint. For this reason, increasing the number of iterations performed (e.g., to 10,000) is advisable. However, we note that increasing the number of iterations neither guarantees reliable classification nor facilitates reporting of the intraindividual variability of the BSITER score itself, and thus of the stability or reliability of a participant’s classification. We believe that describing the intraindividual variability of the BSITER score (or any diagnostic metric) is important for understanding and justifying classification decisions. We therefore suggest that researchers and practitioners evaluate the variability of a patient’s BSITER score and integrate and describe this variability when providing a diagnosis. We discuss examples of how researchers and practitioners might do this in the following section.
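As a purely illustrative sketch (not the procedure used in this study), the Python snippet below shows one way the intraindividual variability of a bootstrapped score could be summarised and reported alongside a classification. The trial-level scores, the cutpoint value, and the function name are hypothetical; the key outputs are a 95% percentile interval for the bootstrapped score and the proportion of iterations falling at or above the cutpoint, which gives a direct index of how stable a near-cutpoint classification is.

```python
import numpy as np

rng = np.random.default_rng(2024)

def bootstrap_classification(trial_scores, cutpoint, n_iter=10_000):
    """Resample a participant's trial-level scores with replacement and
    summarise the distribution of the bootstrapped (mean) score.

    Returns the mean bootstrapped score, a 95% percentile interval, and
    the proportion of iterations at or above the cutpoint (a simple
    index of classification stability). Illustrative only."""
    trial_scores = np.asarray(trial_scores, dtype=float)
    boot_means = np.empty(n_iter)
    for i in range(n_iter):
        resample = rng.choice(trial_scores, size=trial_scores.size, replace=True)
        boot_means[i] = resample.mean()
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    return {
        "mean": boot_means.mean(),
        "ci95": (lower, upper),
        "prop_at_or_above_cutpoint": np.mean(boot_means >= cutpoint),
    }

# Hypothetical participant whose observed score sits near the cutpoint
example_trials = rng.normal(loc=0.52, scale=0.15, size=40)
summary = bootstrap_classification(example_trials, cutpoint=0.50)
print(f"Bootstrapped score: {summary['mean']:.3f} "
      f"(95% CI {summary['ci95'][0]:.3f} to {summary['ci95'][1]:.3f}); "
      f"{summary['prop_at_or_above_cutpoint']:.0%} of iterations at or above the cutpoint")
```

Reporting the interval and the proportion of iterations on each side of the cutpoint, rather than the point classification alone, makes explicit how much confidence a researcher or practitioner should place in a borderline diagnostic decision.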