3.1 | Complex admixture scenarios cross-validation with RF-ABC
We trained the RF-ABC model-choice algorithm using 1,000 trees, which ensured the convergence of the model-choice prior error rates (Supplementary Figure S3). Based on this training, the complete out-of-bag cross-validation matrix showed that the nine competing scenarios of complex historical admixture could be reasonably well distinguished despite the high level of nestedness of the scenarios considered here (Figure 2). Indeed, considering each of the 90,000 simulations in turn as an out-of-bag pseudo-observed target dataset, we calculated an out-of-bag prior error rate of 32.41%, compared to a prior probability of 88.89% of erroneously selecting a scenario. Furthermore, the posterior probabilities of identifying the correct scenario ranged from 55.17% (prior probability = 11.11% for each competing scenario) for the two-pulse scenarios from both the African and European sources (Afr2P-Eur2P) to 77.71% for the scenarios with monotonically decreasing recurring admixture from both sources (AfrDE-EurDE).
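As a minimal sketch of how these quantities relate, the Python snippet below computes an overall error rate from a confusion matrix of counts and the error expected when choosing at random among equiprobable scenarios. The function names and the three-scenario toy matrix are hypothetical illustrations with made-up counts; they are not the study's code or results. Only the nine-scenario baseline (1 - 1/9 = 88.89%) corresponds to a figure reported above.

```python
def oob_error_rate(confusion):
    """Overall error rate from a square confusion matrix of counts.

    Rows are true scenarios, columns are predicted scenarios.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total


def random_choice_error(n_scenarios):
    """Error rate when picking one of n equiprobable scenarios at random."""
    return 1.0 - 1.0 / n_scenarios


# Toy 3-scenario matrix with made-up counts (NOT the study's results).
toy_confusion = [
    [80, 10, 10],
    [15, 70, 15],
    [5, 5, 90],
]

toy_error = oob_error_rate(toy_confusion)
baseline_error_9 = random_choice_error(9)   # nine competing scenarios
baseline_correct_9 = 1.0 - baseline_error_9  # prior probability per scenario

print(f"toy error rate: {toy_error:.2%}")
print(f"random-choice error, 9 scenarios: {baseline_error_9:.2%}")
print(f"prior probability per scenario: {baseline_correct_9:.2%}")
```

Comparing the observed error rate to this random-choice baseline is what gives the 32.41% figure its meaning: it is well below the 88.89% expected without any information.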
Importantly, for a given admixture scenario, the probability of choosing any one alternative (wrong) scenario was on average 4.05% across the eight alternative scenarios, ranging from 2.79% for the AfrDE-EurDE scenario to 5.60% for the Afr2P-Eur2P scenario (Figure 2). This shows that our approach did not systematically favor any one competing scenario when it failed to choose the true one. Furthermore, note that AfrDE-EurDE scenarios were rarely confused (3.8%) with other recurring admixture scenarios containing at least one recurring admixture increase (AfrIN-EurDE, AfrDE-EurIN, AfrIN-EurIN), which shows the strong a priori discriminatory power of RF-ABC model-choice, even among complex recurring admixture scenarios.
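The per-alternative percentages follow arithmetically from the correct-identification rates reported earlier: if a scenario is correctly identified with probability p, the remaining 1 - p is shared among the eight wrong scenarios, giving a mean of (1 - p) / 8 per alternative. The short check below is a sketch under that reading; the function and variable names are hypothetical, and the overall figure assumes that each scenario contributes the same number of simulations.

```python
N_SCENARIOS = 9


def mean_wrong_choice_rate(correct_rate, n_scenarios=N_SCENARIOS):
    """Mean probability of picking any ONE specific wrong scenario."""
    return (1.0 - correct_rate) / (n_scenarios - 1)


# Correct-identification rates reported in the text.
reported_correct = {
    "Afr2P-Eur2P": 0.5517,
    "AfrDE-EurDE": 0.7771,
}

per_alternative = {
    name: mean_wrong_choice_rate(p) for name, p in reported_correct.items()
}

# Overall out-of-bag prior error rate reported in the text, spread over
# the eight alternatives (assumes equal simulation counts per scenario).
overall_per_alternative = 0.3241 / (N_SCENARIOS - 1)

for name, rate in per_alternative.items():
    print(f"{name}: {rate:.2%} per wrong scenario")
print(f"all scenarios: {overall_per_alternative:.2%} per wrong scenario")
```

A mean alone does not show how evenly the errors are spread; the individual off-diagonal cells of the cross-validation matrix (Figure 2) are what indicate whether one wrong scenario is favored over the others.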
In cross-validation analyses of groups of scenarios (Estoup et al., 2018), monotonically recurring admixture scenarios (AfrDE-EurDE, AfrDE-EurIN, AfrIN-EurDE, AfrIN-EurIN) can be well distinguished from scenarios considering two possible pulses after the founding event (Afr2P-Eur2P, Afr2P-EurDE, Afr2P-EurIN, AfrDE-Eur2P, AfrIN-Eur2P). Indeed, we found an out-of-bag prior error rate of 13.85%, and posterior cross-validation probabilities of identifying the correct group of scenarios of 86.08% and 86.23%, respectively, for the two groups.
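To illustrate what a group-level comparison measures, the sketch below collapses a scenario-level confusion matrix into a two-group matrix, assigning every scenario whose name contains "2P" to the pulse group and the others to the recurring group, as in the grouping above. This is an assumption-laden illustration only: the grouped analysis may instead have trained the random forest directly on group labels, which can give different numbers than collapsing an existing matrix. The helper names and the four-scenario toy counts are hypothetical.

```python
def group_of(scenario):
    """Assign a scenario to a group based on its name (illustrative rule)."""
    return "pulse" if "2P" in scenario else "recurring"


def collapse_confusion(confusion, labels):
    """Sum scenario-level counts into group-level counts.

    Rows are true scenarios, columns are predicted scenarios.
    """
    groups = sorted({group_of(name) for name in labels})
    index = {g: k for k, g in enumerate(groups)}
    grouped = [[0] * len(groups) for _ in groups]
    for i, true_name in enumerate(labels):
        for j, pred_name in enumerate(labels):
            grouped[index[group_of(true_name)]][index[group_of(pred_name)]] += confusion[i][j]
    return groups, grouped


# Toy subset of scenarios with made-up counts (NOT the study's results).
labels = ["Afr2P-Eur2P", "Afr2P-EurDE", "AfrDE-EurDE", "AfrIN-EurIN"]
toy_confusion = [
    [60, 20, 10, 10],
    [15, 65, 15, 5],
    [10, 10, 70, 10],
    [5, 5, 20, 70],
]

groups, grouped = collapse_confusion(toy_confusion, labels)
total = sum(sum(row) for row in grouped)
group_error = 1.0 - sum(grouped[k][k] for k in range(len(groups))) / total

print(groups)
print(grouped)
print(f"group-level error rate: {group_error:.2%}")
```

Confusions between two scenarios of the same group no longer count as errors after collapsing, which is why a group-level error rate is expected to be lower than the scenario-level one.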
Detailed investigation of cross-validation results shows that inaccuracies of RF-ABC model-choice occur mainly in parts of the parameter space where scenarios are highly nested and, in fact, biologically close (Figure 2). As expected, model-choice increasingly mistakes the AfrDE-EurDE scenarios for scenarios containing two admixture pulses (Afr2P-Eur2P, Afr2P-EurIN, AfrIN-Eur2P) as the values of u_Afr and u_Eur get closer to 0, regardless of the introgression rate values (Supplementary Figure S5A). Intuitively, the closer these parameter values are to 0, the more peaked the decrease of recurring admixture is, which increases model-choice confusion with pulse-like scenarios. In contrast, u-values closer to 0.5 correspond to linearly decreasing admixture over time, which is hardly confounded with pulse-like scenarios. Furthermore, as expected and regardless of introgression values, model-choice increasingly confuses Afr2P-Eur2P scenarios with recurring increasing admixture scenarios (AfrIN-EurIN, AfrDE-EurIN, AfrIN-EurDE) as the times of the second admixture pulses from Europe or Africa become more recent (Supplementary Figure S5B).
Most importantly, the a priori power of RF-ABC model-choice to discriminate among complex admixture processes was not strongly affected by the number of markers considered. Indeed, we found out-of-bag prior error rates of 33.53% and 37.93% (instead of 32.41%) when considering 50,000 and 10,000 SNPs, respectively, instead of 100,000, together with a very similar distribution of correct and mistaken predictions among scenarios (Supplementary Figure S6A-B). Finally, dividing the sample sizes of population H and of each source population by five increased, as expected, the cross-validation error rate (48.39%). Nevertheless, all scenarios continued to be correctly identified three to six times more often than expected a priori, and the distribution of erroneous predictions remained similar to that obtained previously (Supplementary Figure S6C). Altogether, these results show that RF-ABC model-choice can be successfully used to distinguish highly complex admixture models even when substantially less genetic and sample data are considered.
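One way to put the four reported error rates on a common scale is to express the average rate of correct identification as a multiple of the 1/9 prior probability per scenario. The snippet below is a small arithmetic sketch using only the error rates quoted above; the dictionary keys and function name are hypothetical labels, and the results are averages over scenarios, not the per-scenario values shown in Supplementary Figure S6.

```python
N_SCENARIOS = 9
PRIOR_CORRECT = 1.0 / N_SCENARIOS  # prior probability of the true scenario

# Out-of-bag prior error rates reported in the text.
reported_error = {
    "100,000 SNPs": 0.3241,
    "50,000 SNPs": 0.3353,
    "10,000 SNPs": 0.3793,
    "sample sizes divided by five": 0.4839,
}


def fold_over_prior(error_rate, prior_correct=PRIOR_CORRECT):
    """Average correct-identification rate as a multiple of the prior."""
    return (1.0 - error_rate) / prior_correct


fold = {setting: fold_over_prior(err) for setting, err in reported_error.items()}

for setting, value in fold.items():
    print(f"{setting}: {value:.2f}x the prior probability")
```

Under this reading, even the reduced-sample design identifies the true scenario on average more than four times as often as a random choice would.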