Implications for researchers and practitioners
Diagnostic analyses are notoriously difficult, and deriving accurate and
representative diagnostic metrics is important both for research
purposes and for an individual's life (e.g., health, penal consequences,
social standing/reputation). Bootstrapping is a useful tool for
evaluating the precision of a diagnostic test without the cost
associated with repeating the physical test multiple times per patient.
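As a minimal sketch of this idea, the code below resamples a single participant's trial-level scores with replacement and recomputes the summary score on each resample, yielding an estimate of the score's precision from one administration. The data, the mean-based summary, and the names (bootstrap_score, trials) are illustrative assumptions for exposition, not the BSITER procedure itself.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    def bootstrap_score(trial_scores, n_iter=100, summary=np.mean):
        # Resample the participant's trial-level scores with replacement and
        # recompute the summary (diagnostic) score on each resample.
        trial_scores = np.asarray(trial_scores)
        boot = np.empty(n_iter)
        for i in range(n_iter):
            resample = rng.choice(trial_scores, size=trial_scores.size, replace=True)
            boot[i] = summary(resample)
        return boot

    # Hypothetical trial-level data from a single test administration.
    trials = rng.normal(loc=48.0, scale=6.0, size=40)
    boot_scores = bootstrap_score(trials, n_iter=100)

    print("bootstrap mean:", boot_scores.mean())
    print("bootstrap SD:", boot_scores.std(ddof=1))
    print("95% percentile interval:", np.percentile(boot_scores, [2.5, 97.5]))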
We have found that although the 100-iteration bootstrap test yielded
statistically reliable classifications for the majority of our sample,
it provided inconsistent diagnostic results in approximately a quarter
of our participants (19 of 81, or 23%), particularly when a
participant's score occurred near the diagnostic cutpoint. For this
reason, increasing the number of iterations performed (e.g., to 10,000)
is advisable. However, we note that increasing the number of iterations
does not guarantee reliable classification, nor does it facilitate
reporting of the intraindividual variability of the
BSITER score itself, and thus of the stability or reliability of a
participant’s classification. We believe that describing the
intraindividual variability of the BSITER score (or any diagnostic
metric) is important to understanding and justifying classification
decisions. Therefore, we suggest that researchers and practitioners
evaluate the variability of a patient's BSITER score and describe and
integrate this variability when providing a diagnosis. In the following
section, we discuss several examples of how researchers and
practitioners might do this.
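As a brief illustration of the kind of evaluation we have in mind, the sketch below runs a larger number of bootstrap iterations (e.g., 10,000), summarizes the intraindividual variability of the bootstrapped score, and reports how consistently the participant falls on one side of the diagnostic cutpoint. The trial-level data, the mean-based summary, the cutpoint value, and the names (classification_stability, cutpoint) are our own illustrative assumptions, not part of the BSITER procedure described above.

    import numpy as np

    rng = np.random.default_rng(seed=2)

    def classification_stability(trial_scores, cutpoint, n_iter=10_000):
        # Summarize the intraindividual variability of a bootstrapped score
        # and the consistency of the resulting classification: bootstrap SD,
        # a 95% percentile interval, and the proportion of iterations at or
        # above the cutpoint.
        trial_scores = np.asarray(trial_scores)
        boot = np.array([
            np.mean(rng.choice(trial_scores, size=trial_scores.size, replace=True))
            for _ in range(n_iter)
        ])
        lo, hi = np.percentile(boot, [2.5, 97.5])
        return {
            "boot_sd": boot.std(ddof=1),
            "interval_95": (lo, hi),
            "p_above_cutpoint": np.mean(boot >= cutpoint),
            "interval_spans_cutpoint": lo < cutpoint < hi,
        }

    # Hypothetical participant whose observed score sits near the cutpoint.
    trials = rng.normal(loc=49.5, scale=5.0, size=40)
    report = classification_stability(trials, cutpoint=50.0)
    print(report)

A report of this kind makes the stability of the classification explicit: when the percentile interval spans the cutpoint, or the proportion of iterations above the cutpoint is close to 0.5, the diagnosis can be flagged and described as uncertain rather than presented as a fixed label.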