Semantic Memory
The results in Table 3 suggest that a knowledgeable/guilty individual
tested on highly familiar, Semantic information with an estimated BSITER
score of 95% may in fact present anywhere between 92.37% and 97.63%
due to random sampling. In very rare cases, a subject’s BSITER score may
vary by as much as ±8 percentage points (87%–100%). Although this level of
variability is likely not concerning for participants with large BSITER
scores (e.g., 95-100%), it could result in diagnostic complications for
participants who scored near the 90% threshold, which we examine next.
To be confident that a participant’s BSITER score is reliably above or
below a diagnostic threshold, the 95% confidence interval (CI) for their
score must not overlap with that threshold. If the 95% CI does overlap
with the diagnostic threshold, we cannot have at least 95% confidence in
that participant’s diagnosis.
Therefore, we tallied how many participants in the Semantic group had
mean BSITER scores with 95% CIs that overlapped with the 90%
diagnostic cutpoint.
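
The overlap rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the interval bounds are assumed to be supplied in percent, and the strict `>` comparison against the 90% cutpoint is an assumption about how scores exactly at the threshold are handled.

```python
def ci_overlaps_threshold(ci_lower, ci_upper, threshold=90.0):
    """Return True when the threshold lies inside [ci_lower, ci_upper]."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return ci_lower <= threshold <= ci_upper


def classify_with_confidence(score, ci_lower, ci_upper, threshold=90.0):
    """Label a score against the threshold and flag whether its CI clears it.

    Assumption: scores strictly above the threshold are labelled
    'knowledgeable'; the handling of a score equal to the threshold
    is not specified in the text.
    """
    label = "knowledgeable" if score > threshold else "not knowledgeable"
    confident = not ci_overlaps_threshold(ci_lower, ci_upper, threshold)
    return label, confident


# Interval quoted in the text for an estimated score of 95%.
typical = classify_with_confidence(95.0, 92.37, 97.63)
# Rare-case interval quoted in the text (87% to 100%).
rare = classify_with_confidence(95.0, 87.0, 100.0)
```

Under these assumptions, the 92.37%–97.63% interval clears the 90% cutpoint, whereas the rare-case 87%–100% interval contains it, so the same label would be reported without 95% confidence.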
Following this analysis, we noted that at least 95% confidence was
obtained in the majority of Semantic participants’ diagnoses (41/52 or
78.85%). However, we did not have 95% statistical confidence in roughly
one fifth of the sample’s BSITER scores (11/52 or 21.15%). Notably, all
11 of these participants had scores < 95%. Importantly, 7 of the 11
also had scores > 90%, meaning they were
correctly classified as “guilty/knowledgeable” (true positives, since
these subjects were knowledgeable of crime-relevant information), but
the guilty/knowledgeable classification was not made with 95%
statistical confidence.
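
As a quick check of the arithmetic in this paragraph, the reported counts can be recomputed directly. The variable names are illustrative only; the counts themselves (52, 11, and 7) are taken from the text.

```python
n_total = 52          # Semantic participants
n_overlap = 11        # 95% CI overlapped the 90% cutpoint
n_overlap_above = 7   # of those 11, scores above 90%

n_clear = n_total - n_overlap                     # CI did not overlap the cutpoint
n_overlap_not_above = n_overlap - n_overlap_above # remaining overlap cases

pct_clear = round(100 * n_clear / n_total, 2)
pct_overlap = round(100 * n_overlap / n_total, 2)

print(f"{n_clear}/{n_total} = {pct_clear}% classified with 95% confidence")
print(f"{n_overlap}/{n_total} = {pct_overlap}% without 95% confidence")
print(f"{n_overlap_not_above} of the {n_overlap} overlap cases did not score above 90%")
```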
To summarize, 100 iterations appears to yield consistent and reliable
results for the majority of our Semantic sample; however, 100 iterations
was insufficient to produce precise results in roughly one fifth of
participants (21.15%). For these participants, conducting the
100-iteration test once (as is traditionally done) could result in a score
either above or below the 90% threshold depending only upon chance
resampling in the derivation of their BSITER scores. In other words,
even where these classifications were technically correct, the
classification decisions were made absent the level of statistical rigor
we and others advocate for in diagnostic psychophysiology. Consequently,
we recommend A) increasing the number of iterations (e.g., to 10,000)
when deriving the BSITER score in order to reduce the variability of the
score, and/or B) using the rBS test to calculate, report, and integrate
the 95% CI for each participant’s BSITER score during individual
classification (we will explore two detailed examples of this in the
Discussion).
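
The rationale for recommendation A can be illustrated with a toy simulation. This sketch is a deliberate simplification and not the BSITER or rBS procedure itself: it assumes each iteration independently counts as a "hit" with a fixed probability (`true_rate`, a hypothetical parameter), and it omits the resampling of single trials entirely. Under that assumption, the run-to-run spread of a percentage score shrinks as the number of iterations grows.

```python
import random
import statistics


def simulate_score(true_rate, n_iterations, rng):
    """One simulated score: percent of iterations counted as a hit.

    Assumption: iterations are independent Bernoulli draws, which is a
    simplification of the actual bootstrap over single trials.
    """
    hits = sum(rng.random() < true_rate for _ in range(n_iterations))
    return 100.0 * hits / n_iterations


def score_spread(true_rate, n_iterations, n_repeats, seed=0):
    """Standard deviation of the score across repeated runs of the test."""
    rng = random.Random(seed)
    scores = [
        simulate_score(true_rate, n_iterations, rng) for _ in range(n_repeats)
    ]
    return statistics.stdev(scores)


# Hypothetical participant whose underlying hit rate sits near the 90% cutpoint.
spread_100 = score_spread(0.92, 100, n_repeats=200)
spread_10000 = score_spread(0.92, 10_000, n_repeats=200)

print(f"SD of score with 100 iterations:    {spread_100:.2f} points")
print(f"SD of score with 10,000 iterations: {spread_10000:.2f} points")
```

In this simplified model the spread falls roughly with the square root of the iteration count, so a score near the cutpoint is far less likely to cross it by chance alone when more iterations are used.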