2.4.7 | Neural-network tolerance level and number of
neurons in the hidden layer
We empirically determined the NN tolerance level (i.e., the proportion
of simulations closest to the observed data that are retained for NN
training) and the number of neurons in the hidden layer. While the NN
needs a substantial number of simulations for training, too many
neurons in the hidden layer carry a risk of overfitting the posterior
parameter estimations. However, there are no absolute rules for
choosing either value (Csilléry et al., 2012;
Jay, Boitard, & Austerlitz, 2019).
Therefore, we tested four different tolerance levels for training the
NN for parameter estimation (0.01, 0.05, 0.1, and 0.2), and numbers of
neurons ranging from four to seven (the latter being the number of free
parameters in the winning scenarios; see Results). For each pair of
tolerance level and number of neurons, we conducted cross-validation
with the "cv4abc" function in the R package abc, using 1,000 randomly
chosen simulated datasets in turn as pseudo-observed data.
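As an illustration, a minimal R sketch of this cross-validation grid is
given below. The objects sim_params and sim_stats are hypothetical
stand-ins for our reference table of simulated parameter values and
summary statistics; the cv4abc arguments (tols, nval, method, sizenet,
statistic) follow the procedure described above.

```r
library(abc)

## Hypothetical inputs: 'sim_params' holds the parameter values used for the
## simulations (one row per simulation) and 'sim_stats' the corresponding
## summary statistics.
tolerances <- c(0.01, 0.05, 0.1, 0.2)  # tolerance levels tested
neurons    <- 4:7                      # hidden-layer sizes tested

## One cross-validation run per hidden-layer size; 'tols' accepts the whole
## vector of tolerance levels at once, and 'nval = 1000' uses 1,000
## simulations in turn as pseudo-observed datasets.
cv_runs <- lapply(neurons, function(s) {
  cv4abc(param = sim_params, sumstat = sim_stats,
         nval = 1000, tols = tolerances,
         method = "neuralnet", sizenet = s,
         statistic = "median")  # median posterior point estimate
})
```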
We considered the median point estimate of each posterior parameter
\(\left(\hat{\theta}_{i}\right)\), to be compared with the true
parameter value used for simulation \(\left(\theta_{i}\right)\). The
cross-validation prediction error was then calculated across the 1,000
separate posterior estimations for the pseudo-observed datasets, for
each pair of tolerance level and number of neurons and for each
parameter \(\theta_{i}\), as
\(\frac{\sum_{j=1}^{1000}\left(\hat{\theta}_{i,j}-\theta_{i,j}\right)^{2}}{1000\times\mathrm{Var}\left(\theta_{i}\right)}\),
where \(j\) indexes the pseudo-observed datasets, using the
"summary.cv4abc" function in abc (Csilléry et al., 2012).
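Continuing the sketch above, the prediction errors can be extracted by
calling summary() on each cv4abc object, which reports the scaled
squared error defined above for every tolerance level and parameter;
pred_errors is again a hypothetical name.

```r
## summary() on a 'cv4abc' object returns, for each tolerance level and each
## parameter, the prediction error: the squared difference between estimated
## and true values, summed over pseudo-observed datasets and scaled by the
## number of datasets times the parameter's variance.
pred_errors <- lapply(cv_runs, summary)
pred_errors[[1]]  # errors for the four-neuron network, one row per tolerance
```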
Results showed that all the numbers of neurons considered performed
very similarly for a given tolerance level (Supplementary Table S2).
Furthermore, retaining the 1% of simulations closest to the
pseudo-observed data reduced the average prediction error for each
tested number of neurons. We therefore opted for four neurons in the
hidden layer and a 1% tolerance level for training the NN in all
subsequent parameter inferences, in order to avoid overfitting.