Summary of study concept and major results
This paper aimed to carefully evaluate the 100-iteration bootstrap
test, a statistical resampling technique commonly used in the P300
concealed information detection literature. Although the accuracy of
the 100-iteration test is debated, Rosenfeld et al. (2017b) argued that
the test is as reliable as its more rigorous 1,000- and
10,000-iteration peers. Although these results are compelling, they do
not rigorously describe the precision of the 100-iteration test, which
we aimed to do here. Thus, the current report is the first to describe
and evaluate the precision of the classification results of the
100-iteration bootstrap test.
We felt the most intuitive way to interrogate the precision of the
100-iteration bootstrap test was simply to repeat the test many times.
Therefore, we repeated the 100-iteration test 100 times using a
technique we call the repeated bootstrap (rBS), which is mathematically
equivalent to using 10,000 iterations. In this paper, we applied the
rBS technique to P300 ERP amplitude values from a concealed information
test (the CTP), critically, in the same knowledgeable/guilty
participants analyzed in Rosenfeld et al. (2017b).
To diagnose a participant as knowledgeable/guilty or
unknowledgeable/innocent, the bootstrap iteration (BSITER) score is
compared to a threshold. In our analyses,
we used data from two CTP studies: one using semantic, highly salient
information like birth dates, and another using less salient, episodic
information like knowledge acquired while committing a mock crime. In
the semantic group, we found that, in rare cases, an individual’s BSITER
score may vary by +/-8%, and the episodic group was nearly twice as
variable at +/-15%. This result was not surprising given that highly
rehearsed semantic information is more memorable than episodic
information, so P300s are generally smaller and more variable from trial
to trial in the episodic protocol (Olson et al., 2020). Fortunately, we
find it unlikely that the observed variability would affect the
diagnosis of individuals whose BSITER score is high (i.e., >= 95%).
However, for participants with lower scores (e.g., near the diagnostic
threshold), this variability could produce unreliable or inconsistent
diagnoses. To our knowledge, the intraindividual variability of BSITER
scores has never previously been analyzed or even reported. Based on
these results, we suggest that researchers use rBS (or a similar
technique) to report the within-subject variability of the BSITER
score, as we have done here.
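
As a rough illustration of the rBS procedure summarized above, the
following Python sketch repeats a 100-iteration bootstrap test 100 times
for one simulated participant and reports the spread of the resulting
BSITER scores. It is not the analysis code used in this study. The
function names, the synthetic amplitudes, the trial counts, and the 0.90
threshold are all assumptions made for illustration. The sketch also
resamples single-trial amplitudes directly, a simplification that may
differ from how P300 amplitude is measured in the actual protocol.

```python
import random
import statistics


def bsiter_score(probe, irrelevant, n_iter=100, rng=None):
    """Fraction of iterations in which the resampled probe mean exceeds
    the resampled irrelevant mean (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    wins = 0
    for _ in range(n_iter):
        # Resample each trial set with replacement, keeping its size.
        p = rng.choices(probe, k=len(probe))
        i = rng.choices(irrelevant, k=len(irrelevant))
        if statistics.fmean(p) > statistics.fmean(i):
            wins += 1
    return wins / n_iter


def repeated_bootstrap(probe, irrelevant, n_repeats=100, n_iter=100, seed=0):
    """Run the n_iter bootstrap test n_repeats times; return every score."""
    rng = random.Random(seed)
    return [bsiter_score(probe, irrelevant, n_iter, rng)
            for _ in range(n_repeats)]


def summarize(scores, threshold=0.90):
    """Describe within-subject variability of the BSITER score.
    The threshold value is an assumed placeholder."""
    return {
        "pooled": statistics.fmean(scores),
        "min": min(scores),
        "max": max(scores),
        "half_range": (max(scores) - min(scores)) / 2,
        "prop_above_threshold":
            sum(s >= threshold for s in scores) / len(scores),
    }


# Synthetic single-trial amplitudes (microvolts), for illustration only.
data_rng = random.Random(42)
probe = [data_rng.gauss(6.0, 5.0) for _ in range(40)]
irrelevant = [data_rng.gauss(3.0, 5.0) for _ in range(160)]

scores = repeated_bootstrap(probe, irrelevant)
summary = summarize(scores)
print(summary)
```

Because every repetition uses the same number of iterations, the
"pooled" value equals the proportion of successes across all 10,000
iterations, while "min", "max", and "half_range" show how far any single
100-iteration run can stray from it. The "prop_above_threshold" entry
gives the share of repetitions that would yield a knowledgeable/guilty
diagnosis under the assumed threshold.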