Disagreement about iterations
The number of iterations in a bootstrap test refers to the number (n) of
times we resample with replacement to calculate the bootstrap average
for each individual. The number of iterations can range from 10 to
10,000 or more, but, generally speaking, a larger number of iterations
produces more reliable results. Rosenfeld et al. (2017b) provided
evidence that a relatively small number of iterations (e.g., 100)
yields sufficiently accurate diagnoses when
bootstrapping the P300 ERP. The authors concluded that a smaller number
of iterations was effective because the P300 is a robust ERP with a
large effect size. However, they also noted that this may not be the
case in experiments using other ERPs with smaller effect sizes (e.g.,
N400). Thus, Rosenfeld et al. (2017b) suggest that the number of
iterations may vary across different applications and may not be “one
size fits all”. Understandably, concerns have been raised about
whether 100 iterations are truly sufficient for accurate diagnosis, even
when using the P300 (e.g., Zoumpalaki et al., 2015). Although the
evidence provided by Rosenfeld et al. (2017b) was suggestive, large
correlations between low- and high-iteration tests are not strong
evidence of adequate precision in diagnostic tests. In our view, the
concern with a low-iteration bootstrap test is the reliability and
repeatability of its classification results, and the most direct way to
evaluate this is to simply repeat the test many times, producing many
diagnostic results per individual. Fortunately, repeating a
bootstrap-based test is straightforward and requires only an investment
of time and computational resources. Moreover, we believe there are
several statistical, methodological, and diagnostic advantages to
repeating a bootstrap test many times, which is the focus of this paper.
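To make the procedure concrete, the following Python sketch illustrates one way a bootstrap test and its repetition could be implemented. The single-trial amplitude data, the 0.9 classification criterion, and all function names here are hypothetical illustrations, not the parameters or code of any cited study:

```python
import random

def bootstrap_test(probe, irrelevant, n_iter=100, rng=None):
    """One bootstrap test: resample each trial set with replacement
    n_iter times and return the proportion of iterations in which the
    probe mean exceeds the irrelevant mean."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_iter):
        p = [rng.choice(probe) for _ in probe]          # resample with replacement
        i = [rng.choice(irrelevant) for _ in irrelevant]
        if sum(p) / len(p) > sum(i) / len(i):
            hits += 1
    return hits / n_iter

def repeat_test(probe, irrelevant, n_iter=100, n_repeats=200,
                criterion=0.9, seed=0):
    """Repeat the whole bootstrap test many times and return the
    fraction of repeats that classify the individual as 'positive'
    (bootstrap proportion above the criterion) — a direct estimate of
    the repeatability of the diagnostic outcome."""
    rng = random.Random(seed)
    outcomes = [bootstrap_test(probe, irrelevant, n_iter, rng) > criterion
                for _ in range(n_repeats)]
    return sum(outcomes) / n_repeats
```

A `repeat_test` fraction near 1.0 (or 0.0) indicates that the low-iteration test is repeatable for that individual, whereas intermediate fractions flag cases where the classification flips across repeats.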