2.4.7 | Neural-network tolerance level and number of neurons in the hidden layer
We determined empirically the NN tolerance level (i.e., the proportion of simulations closest to the observed data retained for NN training) and the number of neurons in the hidden layer. While the NN needs a substantial number of simulations for training, there is also a risk of overfitting posterior parameter estimates when too large a number of neurons is used in the hidden layer. However, there are no absolute rules for choosing either value (Csilléry et al., 2012; Jay, Boitard, & Austerlitz, 2019).
Therefore, we tested four different tolerance levels for training the NN for parameter estimation (0.01, 0.05, 0.1, and 0.2), and numbers of neurons ranging from four to seven (the number of free parameters in the winning scenarios, see Results). For each pair of tolerance level and number of neurons, we conducted cross-validation with the “cv4abc” function in the package abc, using 1,000 randomly chosen simulated datasets in turn as pseudo-observed data. We considered the median point estimate of each posterior parameter \(\hat{\theta}_{i}\), to be compared with the true parameter value used for simulation \(\theta_{i}\). The cross-validation parameter prediction error was then calculated across the 1,000 separate posterior estimations for the pseudo-observed datasets, for each pair of tolerance level and number of neurons and for each parameter \(\theta_{i}\), as \(\frac{\sum_{j=1}^{1000}\left(\hat{\theta}_{i,j}-\theta_{i,j}\right)^{2}}{1000\times\mathrm{Var}\left(\theta_{i}\right)}\), where \(j\) indexes the pseudo-observed datasets, using the “summary.cv4abc” function in abc (Csilléry et al., 2012). Results showed that, a priori, all numbers of neurons considered performed very similarly for a given tolerance level (Supplementary Table S2). Furthermore, retaining the 1% of simulations closest to the pseudo-observed ones reduced the average prediction error for each tested number of neurons. We therefore opted for four neurons in the hidden layer and a 1% tolerance level for training the NN in all subsequent parameter inferences, in order to avoid overfitting.
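As an illustration, the sketch below shows how such a cross-validation could be run in R with the abc package; the objects params and sumstats are hypothetical placeholders (dummy data standing in for the real reference table of simulations), and the script layout is ours, not the authors' original code.

```r
library(abc)

## Dummy reference table standing in for the real simulations: 'params'
## holds the simulated parameter values (one column per free parameter)
## and 'sumstats' the matching summary statistics. Both are placeholders.
set.seed(1)
n.sim <- 10000
params <- matrix(runif(n.sim * 4), ncol = 4,
                 dimnames = list(NULL, paste0("theta", 1:4)))
sumstats <- params + matrix(rnorm(n.sim * 4, sd = 0.1), ncol = 4)

## The four tolerance levels tested in the text.
tolerances <- c(0.01, 0.05, 0.1, 0.2)

## 1,000 simulations are used in turn as pseudo-observed datasets (nval);
## sizenet sets the number of neurons in the hidden layer (here, four),
## and statistic = "median" takes the median posterior point estimate.
cv.nn <- cv4abc(param = params, sumstat = sumstats,
                nval = 1000, tols = tolerances,
                method = "neuralnet", sizenet = 4,
                statistic = "median")

## For each parameter and tolerance level, summary() reports the prediction
## error sum((theta_hat - theta)^2) / (1000 * var(theta)).
summary(cv.nn)
```

Repeating the call with sizenet set to 5, 6, and 7 would cover the full grid of neuron numbers tested above.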