Probability
Using rBS, one can estimate the probability that a participant belongs
to state A (e.g., guilty/knowledgeable) or state B (e.g.,
innocent/unknowledgeable), given the data. A simple and intuitive way to
do this is to count the number of repetitions that fall above or below a
given threshold and divide by the total number of repetitions conducted.
For example, 95/100 (or 95%) of Participant 19’s rBS repetitions exceeded
the 85% threshold, meaning that 95 times out of 100 Participant 19 would
have been diagnosed as “guilty” using the 100-iteration bootstrap test.
This is an
informative description of the consensus in the results of a diagnostic
assessment. Although it is expected that the rate of consensus or
agreement across repetitions be quite high, we note that some rate of
“disagreement” may be acceptable under specific circumstances. For
instance, a lower rate of agreement may be acceptable if the
consequences associated with a false positive decision are low. This
kind of risk analysis is often overlooked in the CIT literature, and we
hope to instigate discussion by drawing more attention to diagnostic
precision and consensus across re-sampling or re-testing.
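The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in this study: the function name, the variable names, and the example values are hypothetical, and "exceeding" the threshold is assumed to mean strictly greater than it. The example values are placeholders chosen only to reproduce the 95/100 proportion from the text; they are not Participant 19's actual repetition results.

```python
from typing import Sequence


def proportion_above_threshold(repetitions: Sequence[float], threshold: float) -> float:
    """Return the share of rBS repetitions whose result exceeds the threshold.

    Each element of `repetitions` is assumed to be the outcome of one
    bootstrap test, expressed as a proportion between 0 and 1.
    """
    if not repetitions:
        raise ValueError("at least one repetition is required")
    # Assumption: a repetition "exceeds" the threshold only if strictly greater.
    n_exceeding = sum(1 for result in repetitions if result > threshold)
    return n_exceeding / len(repetitions)


# Placeholder values (not real data): 95 repetitions above the 85% threshold
# and 5 below it, mirroring the 95/100 example in the text.
example_repetitions = [0.90] * 95 + [0.80] * 5
agreement = proportion_above_threshold(example_repetitions, threshold=0.85)
print(f"{agreement:.0%} of repetitions exceeded the threshold")
```

The returned proportion is the rate of agreement across repetitions; how high it must be before a classification is considered acceptable remains a judgment that depends on the cost of a false positive.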
Ultimately, when describing Participant 19 in the context of all
available results, we can interpret the data in the following way: “We
cannot conclude with 95% statistical confidence that Participant 19
recognized information relevant to the crime at hand. We therefore
advise that they be labeled “indeterminate” and re-tested using additional
crime-relevant information, if available. However, if re-testing is not
feasible and a decision must be rendered immediately, it should be noted
that the threshold for a “guilty/knowledgeable” decision was satisfied
in 95% of the conducted bootstrap repetitions. This may be sufficient
for a “guilty/knowledgeable” assessment, assuming the risk associated
with a false positive classification is reasonably low. However, if a
classification must be made and a judgment rendered “beyond the shadow
of a doubt”, then an “innocent/unknowledgeable” decision is
preferred.”
In this modified approach, we acknowledge that a diagnosis may change as
a function of both the available data and the context in which the
individual is being evaluated. We find this nuanced approach to be more
informative, and potentially more useful in application, than the single
summary statistic supplied by the typical 100-iteration (or even
10,000-iteration) bootstrap test. Applying this logic to the remaining
participants in Table 2 results in the following classification table
(Table 3):