Semantic Memory
The results in Table 3 suggest that a knowledgeable/guilty individual tested on highly familiar, Semantic information with an estimated BSITER score of 95% may in fact score anywhere between 92.37% and 97.63% because of random sampling alone. In very rare cases, a subject’s BSITER score may vary by as much as ±8% (87% to 100%). Although this level of variability is unlikely to be concerning for participants with large BSITER scores (e.g., 95% to 100%), it could cause diagnostic complications for participants who score near the 90% threshold, which we examine next.
To be confident that a participant’s BSITER score is reliably above or below a diagnostic threshold, it is intuitive to require that the 95% confidence interval (CI) for their score not overlap the threshold. If the 95% CI overlaps the diagnostic threshold, it follows that we cannot have at least 95% confidence in that participant’s diagnosis. Therefore, we tallied how many participants in the Semantic group had mean BSITER scores with 95% CIs that overlapped the 90% diagnostic cutpoint.
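The overlap check described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example intervals are hypothetical, not values from our sample, and the sketch assumes each participant's 95% CI is already available as a (lower, upper) pair in percent.

```python
from typing import List, Tuple

THRESHOLD = 90.0  # diagnostic cutpoint, in percent


def ci_overlaps_threshold(ci_low: float, ci_high: float,
                          threshold: float = THRESHOLD) -> bool:
    """Return True if the interval [ci_low, ci_high] contains the threshold."""
    return ci_low <= threshold <= ci_high


def tally_overlaps(cis: List[Tuple[float, float]],
                   threshold: float = THRESHOLD) -> Tuple[int, int]:
    """Count intervals that do, and do not, overlap the threshold."""
    overlapping = sum(
        1 for low, high in cis if ci_overlaps_threshold(low, high, threshold)
    )
    return overlapping, len(cis) - overlapping


# Hypothetical 95% CIs (lower, upper) for four participants, in percent.
example_cis = [(92.37, 97.63), (88.10, 93.40), (96.00, 99.20), (85.00, 89.50)]

overlapping, clear = tally_overlaps(example_cis)
print(f"CIs overlapping the {THRESHOLD:.0f}% cutpoint: {overlapping}")
print(f"CIs clear of the cutpoint: {clear}")
```

Under this rule, a participant whose interval contains the cutpoint is flagged as lacking 95% confidence regardless of which side of the cutpoint the point estimate falls on.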
Following this analysis, we noted that at least 95% confidence was obtained in the majority of Semantic participants’ diagnoses (41/52 or 78.85%). However, we did not have 95% statistical confidence in nearly a quarter of the sample’s BSITER scores (11/52 or 21.15%). Notably, all 11 of these participants had scores <95%. Importantly, 7/11 of these participants also had scores >90%, meaning they were correctly classified as “guilty/knowledgeable” (true positives, since these subjects were knowledgeable of crime-relevant information), but the guilty/knowledgeable classification was not made with 95% statistical confidence.
To summarize, 100 iterations appear to yield consistent and reliable results for the majority of our Semantic sample; however, 100 iterations were insufficient to produce precise results in nearly a quarter of participants (21.15%). For these participants, conducting the 100-iteration test once (as is traditionally done) could result in a score either above or below the 90% threshold depending only upon chance resampling in the derivation of their BSITER scores. In other words, even where these classifications were technically correct, the classification decisions were made absent the level of statistical rigor we and others advocate for in diagnostic psychophysiology. Consequently, we recommend A) increasing the number of iterations (e.g., to 10,000) when deriving the BSITER score in order to reduce the variability of the score, and/or B) using the rBS test to calculate, report, and integrate the 95% CI for each participant’s BSITER score during individual classification (we will explore two detailed examples of this in the Discussion).
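To illustrate why recommendation A reduces variability, the following Python sketch uses a deliberately simplified model. It assumes that each bootstrap iteration is an independent draw that succeeds with a fixed probability p, so that the BSITER score behaves like a binomial proportion. This is an assumption made for illustration only: it is not the rBS procedure, the function names are hypothetical, and the sketch is not expected to reproduce the intervals in Table 3.

```python
import math
import random
import statistics


def monte_carlo_se(p: float, n_iterations: int) -> float:
    """Standard error of a proportion estimated from n independent draws
    (simplifying assumption: each iteration is a Bernoulli(p) trial)."""
    return math.sqrt(p * (1.0 - p) / n_iterations)


def simulate_scores(p: float, n_iterations: int, n_repeats: int,
                    seed: int = 0) -> list:
    """Repeat the whole test n_repeats times; return each score in percent."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        hits = sum(1 for _ in range(n_iterations) if rng.random() < p)
        scores.append(100.0 * hits / n_iterations)
    return scores


if __name__ == "__main__":
    p = 0.90  # hypothetical underlying rate, placed at the diagnostic cutpoint
    for n in (100, 10_000):
        scores = simulate_scores(p, n_iterations=n, n_repeats=50)
        print(
            f"{n:>6} iterations: "
            f"theoretical SE = {100 * monte_carlo_se(p, n):.2f}%, "
            f"simulated SD = {statistics.pstdev(scores):.2f}%"
        )
```

Under this simplified model, the standard error shrinks with the square root of the number of iterations, so moving from 100 to 10,000 iterations narrows the spread of repeated scores by a factor of about 10.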