Probability
Using rBS, one can calculate the probability that a participant belongs to state A (e.g., guilty/knowledgeable) or state B (e.g., innocent/unknowledgeable), given the data. This is simple and intuitive: count the number of repetitions that fall above (or below) a given threshold and divide by the total number of repetitions conducted. For example, 95/100 (or 95%) of Participant 19's rBS repetitions exceeded the 85% threshold. In other words, in 95 out of 100 repetitions, Participant 19 would have been diagnosed as "guilty" by the 100-iteration bootstrap test. This proportion is an informative description of the consensus among the results of a diagnostic assessment. Although the rate of consensus, or agreement, across repetitions is expected to be quite high, we note that some rate of "disagreement" may be acceptable under specific circumstances. For instance, a lower rate of agreement may be acceptable if the consequences of a false positive decision are minor. This kind of risk analysis is often overlooked in the CIT literature, and we hope to stimulate discussion by drawing more attention to diagnostic precision and to consensus across resampling or retesting.
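The counting procedure described above can be sketched in Python. The inner test is an assumption: this section does not specify how each bootstrap repetition is computed, so the sketch uses a hypothetical statistic, namely the percentage of resampling iterations in which the resampled mean probe response exceeds the resampled mean irrelevant response. The function names `bootstrap_test` and `rbs_probability` and the example data are illustrative only, not the authors' code or results.

```python
import random
from statistics import mean


def bootstrap_test(probe, irrelevant, n_iter=100, rng=None):
    """Hypothetical single bootstrap test: percentage of resampling iterations
    in which the resampled mean probe response exceeds the resampled mean
    irrelevant response."""
    if rng is None:
        rng = random.Random()
    wins = 0
    for _ in range(n_iter):
        # Resample each condition with replacement, keeping its original size.
        probe_star = rng.choices(probe, k=len(probe))
        irrelevant_star = rng.choices(irrelevant, k=len(irrelevant))
        if mean(probe_star) > mean(irrelevant_star):
            wins += 1
    return 100.0 * wins / n_iter


def rbs_probability(probe, irrelevant, threshold=85.0, n_reps=100,
                    n_iter=100, seed=None):
    """Repeat the bootstrap test n_reps times and return the fraction of
    repetitions whose percentage exceeds the threshold."""
    rng = random.Random(seed)
    above = sum(
        bootstrap_test(probe, irrelevant, n_iter, rng) > threshold
        for _ in range(n_reps)
    )
    return above / n_reps


# Made-up response data for one hypothetical participant.
probe = [1.8, 0.9, 1.4, 0.2, 1.1, -0.3]
irrelevant = [0.1, -0.4, 0.6, -0.2, 0.3, 0.0, -0.5, 0.4]
p_guilty = rbs_probability(probe, irrelevant, seed=19)
print(f"P(guilty | data) ~ {p_guilty:.2f}")
```

Each repetition reruns a full 100-iteration bootstrap test with fresh resamples, so the returned fraction directly mirrors the "95 out of 100 repetitions" description; the seed only makes the sketch reproducible.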
Ultimately, when describing Participant 19 in the context of all available results, we can interpret the data as follows: "We cannot conclude with 95% statistical confidence that Participant 19 recognized information relevant to the crime at hand. We therefore advise that they be labeled 'indeterminate' and re-tested using additional crime-relevant information, if available. However, if re-testing is not feasible and a decision must be rendered immediately, it should be noted that the threshold for a 'guilty/knowledgeable' decision was satisfied in 95% of the conducted bootstrap repetitions. This may be sufficient for a 'guilty/knowledgeable' assessment, provided the risk associated with a false positive classification is reasonably low. However, if a classification must be made and a judgment rendered 'beyond a shadow of a doubt', then an 'innocent/unknowledgeable' decision is preferred."
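The context-dependent interpretation in the statement above can be written as a small decision function. This is a sketch under stated assumptions, not the authors' procedure: the strict `>` comparison (chosen so that Participant 19's 95% yields "indeterminate", as in the statement), the symmetric rule for a firm innocent call, and the `lenient_criterion` of 0.90 applied when false positives are cheap are all hypothetical parameters.

```python
def classify(p_guilty, criterion=0.95, must_decide=False,
             low_fp_risk=False, lenient_criterion=0.90):
    """Hypothetical context-dependent decision rule based on the proportion
    of rBS repetitions that met the 'guilty' threshold."""
    p_innocent = 1.0 - p_guilty
    # Firm decisions require the agreement rate to strictly exceed the criterion
    # (assumed rule; the innocent side mirrors the guilty side).
    if p_guilty > criterion:
        return "guilty/knowledgeable"
    if p_innocent > criterion:
        return "innocent/unknowledgeable"
    # Otherwise the evidence is inconclusive and re-testing is preferred.
    if not must_decide:
        return "indeterminate"
    # Forced decision: tolerate more disagreement only when false positives
    # are cheap (lenient_criterion is a hypothetical tolerance).
    if low_fp_risk and p_guilty >= lenient_criterion:
        return "guilty/knowledgeable"
    # "Beyond a shadow of a doubt": default to the innocent classification.
    return "innocent/unknowledgeable"


# Participant 19: 95 of 100 repetitions exceeded the 85% threshold.
for context in [dict(),
                dict(must_decide=True, low_fp_risk=True),
                dict(must_decide=True, low_fp_risk=False)]:
    print(context, "->", classify(0.95, **context))
```

The loop prints "indeterminate", then "guilty/knowledgeable", then "innocent/unknowledgeable", matching the three readings of Participant 19 given in the statement.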
In this modified approach, we acknowledge that a diagnosis may change as a function of both the available data and the context in which the individual is being evaluated. We find this nuanced approach more informative, and potentially more useful in applied settings, than the single summary statistic supplied by the typical 100-iteration (or even 10,000-iteration) bootstrap test. Applying this logic to the remaining participants in Table 2 yields the following classification table (Table 3):