Summary of study concept and major results
This paper aimed to carefully evaluate the 100 iterations bootstrap test, a statistical resampling technique commonly used in the P300 concealed information detection literature. Although the accuracy of the 100 iterations test is debated, Rosenfeld et al. (2017b) argued that the test is as reliable as its more rigorous 1,000 and 10,000 iterations counterparts. These results are compelling, but they do not rigorously describe the precision of the 100 iterations test, which we aimed to do here. Thus, the current report is the first to describe and evaluate the precision of the classification results of the 100 iterations bootstrap test.
We felt the most intuitive way to interrogate the precision of the 100 iterations bootstrap test was simply to repeat the test many times. Therefore, we repeated the 100 iterations test 100 times using a technique we call the repeated bootstrap (rBS), which is mathematically equivalent to using 10,000 iterations. In this paper, we applied the rBS technique to amplitude values of the P300 ERP from a concealed information test (the CTP), critically using the same knowledgeable/guilty participants analyzed in Rosenfeld et al. (2017b). To diagnose a participant as knowledgeable/guilty or unknowledgeable/innocent, the bootstrap iteration (BSITER) score is compared to a threshold. In our analyses, we used data from two CTP studies: one using semantic, highly salient information such as birth dates, and another using less salient, episodic information such as knowledge acquired while committing a mock crime.
In the semantic group, we found that, in rare cases, an individual’s BSITER score may vary by +/-8%, and the episodic group was nearly twice as variable at +/-15%. This result was not surprising: highly rehearsed semantic information is more memorable than episodic information, so P300s are generally smaller and more variable from trial to trial in the episodic protocol (Olson et al., 2020). Fortunately, we find it unlikely that the observed variability would affect the diagnosis of individuals whose BSITER score is quite large (i.e., >= 95%). However, for participants with lower scores (e.g., near the diagnostic threshold), this variability could produce unreliable or inconsistent diagnoses.
To our knowledge, the intraindividual variability of BSITER scores has never previously been analyzed or even reported. Based on these results, we suggest that researchers use rBS (or a similar technique) to report the variability of the BSITER score within a subject, as we have done here.
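The repeated bootstrap described above can be illustrated with a minimal sketch. It is not the analysis code used in this report, and it rests on several assumptions: each participant is represented by two lists of single-trial P300 amplitudes (probe and irrelevant), one bootstrap iteration counts as a "hit" when the resampled probe mean exceeds the resampled irrelevant mean, and a participant is classified as knowledgeable when the score meets or exceeds a user-supplied threshold. The function names `bsiter_score`, `repeated_bootstrap`, and `diagnosis_is_stable` are hypothetical, and the demonstration data are synthetic.

```python
import random
from statistics import fmean


def bsiter_score(probe, irrelevant, n_iter=100, rng=None):
    """Percent of iterations in which the resampled probe mean exceeds
    the resampled irrelevant mean (assumed scoring rule)."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_iter):
        # Resample each set of single-trial amplitudes with replacement.
        p = rng.choices(probe, k=len(probe))
        i = rng.choices(irrelevant, k=len(irrelevant))
        if fmean(p) > fmean(i):
            hits += 1
    return 100.0 * hits / n_iter


def repeated_bootstrap(probe, irrelevant, n_repeats=100, n_iter=100, seed=0):
    """Run the n_iter-iteration test n_repeats times and summarize the
    spread of the resulting scores for one participant."""
    rng = random.Random(seed)
    scores = [bsiter_score(probe, irrelevant, n_iter, rng)
              for _ in range(n_repeats)]
    return {"scores": scores, "mean": fmean(scores),
            "min": min(scores), "max": max(scores)}


def diagnosis_is_stable(scores, threshold):
    """True if every repeat falls on the same side of the threshold
    (assumes 'score >= threshold' means knowledgeable)."""
    above = [s >= threshold for s in scores]
    return all(above) or not any(above)


if __name__ == "__main__":
    # Synthetic amplitudes for illustration only, not real P300 data.
    gen = random.Random(1)
    probe = [gen.gauss(6.0, 4.0) for _ in range(30)]
    irrelevant = [gen.gauss(4.0, 4.0) for _ in range(120)]
    result = repeated_bootstrap(probe, irrelevant)
    print(result["mean"], result["min"], result["max"])
    # The threshold of 90.0 is a hypothetical value for illustration.
    print(diagnosis_is_stable(result["scores"], threshold=90.0))
```

In this sketch, the mean of the 100 repeated scores is the proportion of hits pooled over all 10,000 iterations, while the minimum and maximum show how far any single 100 iterations test could stray from that pooled value.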