4 | DISCUSSION
Our novel MetHis forward-in-time simulator and summary-statistics
calculator coupled with RF-ABC scenario-choice can distinguish among
highly complex admixture histories using genetic data. As expected,
scenario-choice errors occur mainly in regions of the parameter space
where models are highly nested (Robert, Mengersen, & Chen, 2010) and,
thus, biologically similar. Furthermore, we found that
NN-ABC provided accurate and reasonably conservative posterior parameter
estimation for numerous parameters of the winning scenario, using human
population data as a case-study. Finally, we empirically demonstrated
that the moments of the distribution of admixture fractions in the
admixed population were highly informative for ABC inference, as
expected theoretically (Gravel, 2012; Verdu & Rosenberg, 2011).
Altogether, our results for the two recently admixed human populations
illustrate how our MetHis-ABC framework can bring fundamental
new insights into the complex demographic history of admixed
populations. This framework can easily be adapted, using MetHis
(Supplementary Note S1), to investigate complex admixture
histories when maximum-likelihood methods are intractable.
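
As an illustration of the kind of summary statistics referred to above, the following minimal sketch computes the first four moments of a set of individual admixture fractions. It is not the MetHis summary-statistics calculator; the function name `admixture_moments` and the example values are hypothetical, and the individual admixture fractions are assumed to have been estimated beforehand.

```python
import numpy as np

def admixture_moments(fractions):
    """Return mean, variance, skewness and excess kurtosis of admixture fractions.

    `fractions` is assumed to hold one admixture fraction (between 0 and 1)
    per individual of the admixed population. Hypothetical helper for
    illustration only; it is not part of MetHis.
    """
    x = np.asarray(fractions, dtype=float)
    mean = x.mean()
    centered = x - mean
    var = np.mean(centered ** 2)  # population (biased) variance
    if var == 0.0:
        # Higher standardized moments are undefined when all values are equal.
        return {"mean": mean, "variance": 0.0,
                "skewness": float("nan"), "kurtosis": float("nan")}
    skew = np.mean(centered ** 3) / var ** 1.5
    kurt = np.mean(centered ** 4) / var ** 2 - 3.0  # excess kurtosis
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}

# Made-up admixture fractions for five individuals.
example = admixture_moments([0.1, 0.2, 0.3, 0.4, 0.5])
```

In an ABC setting, the same function would be applied to the observed data and to each simulated dataset, so that the resulting vectors can be compared.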
We considered nine competing scenarios all deriving from the general
mechanistic admixture model of Verdu and Rosenberg (2011). While the
two-source version of this model can readily be simulated with MetHis,
it considers 2g - 1 model parameters (with g the
duration of the admixture process), plus effective population size
parameters and mutation parameters. Jointly estimating all these
parameters is out of reach of ML methods, and likely also out of
reach of ABC posterior-parameter estimation procedures. However,
conducting ABC model-choice to disentangle major classes of
relatively simplified admixture processes, followed by ABC parameter
estimation under the winning model, is flexible enough to bring new
insights into the evolutionary history of admixed populations, far
beyond the admixture scenarios that can be explored with existing ML
methods (Gravel, 2012; Hellenthal et al., 2014).
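
The 2g - 1 count quoted above can be recovered with a short tally. The notation below is an assumption made for illustration and does not appear in this section: s_{1,t} and s_{2,t} denote the contributions of the two source populations at generation t, h_t the contribution of the admixed population to itself, and generations are indexed from the founding event (t = 0) to t = g - 1.

```latex
\begin{align*}
\text{founding } (t = 0):\quad
  & s_{1,0} + s_{2,0} = 1
  && \Rightarrow\ 1 \text{ free parameter} \\
\text{each later generation } (1 \le t \le g-1):\quad
  & s_{1,t} + s_{2,t} + h_t = 1
  && \Rightarrow\ 2 \text{ free parameters} \\
\text{total}:\quad
  & 1 + 2\,(g - 1) = 2g - 1
\end{align*}
```

Under these assumptions, an admixture process lasting g = 10 generations would already involve 19 admixture parameters, before counting effective population sizes and mutation parameters.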
Sample and SNP sets as large as those explored here are often out of reach in non-model
species. Nevertheless, our results considering vastly reduced SNP or
sample sets demonstrate that ABC can remain remarkably accurate in
disentangling highly complex admixture processes with much less genetic or
sample data. This is because ABC relies on the amount of
information carried by summary-statistics about model parameters, rather
than the absolute amount of genetic data investigated. Therefore, the
MetHis-ABC framework remains promising to reconstruct complex
admixture histories, provided that summary-statistics considered by the
user are, a priori, informative about model parameters, and that
summary-statistics are reasonably well estimated with the observed data.
Altogether, large parameter and summary-statistics spaces, lack of
information from summary statistics, and scenario nestedness are well
known to affect ABC performance and must therefore be evaluated
thoroughly, case by case (Csilléry, Blum, Gaggiotti, &
François, 2010; Robert et al., 2010; Sisson et al., 2018).
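
To make concrete how ABC inference rests on summary statistics rather than on the raw data, the following is a minimal sketch of a basic rejection-ABC sampler. It is deliberately simpler than the RF-ABC and NN-ABC procedures used in this study, and the toy simulator, function names, prior, and tolerance are all assumptions chosen for illustration; none of them comes from MetHis.

```python
import numpy as np

def simulate_summaries(theta, rng, n_ind=50, n_blocks=20):
    """Toy simulator (an assumption, not the MetHis model).

    Each individual's admixture fraction is the proportion of `n_blocks`
    ancestry blocks inherited from source 1, drawn with probability `theta`.
    Returns two summary statistics: mean and variance of these fractions.
    """
    fractions = rng.binomial(n_blocks, theta, size=n_ind) / n_blocks
    return np.array([fractions.mean(), fractions.var()])

def rejection_abc(observed, n_sims=5000, keep=0.01, seed=0):
    """Return the prior draws whose simulated summaries are closest to `observed`."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, 1.0, size=n_sims)      # uniform prior on theta
    sims = np.array([simulate_summaries(t, rng) for t in thetas])
    scale = sims.std(axis=0)                         # put statistics on a common scale
    dist = np.sqrt((((sims - observed) / scale) ** 2).sum(axis=1))
    n_keep = max(1, int(keep * n_sims))
    accepted = np.argsort(dist)[:n_keep]             # keep the closest simulations
    return thetas[accepted]

# Pseudo-observed data generated with a known value, theta = 0.3.
observed = simulate_summaries(0.3, np.random.default_rng(42))
posterior = rejection_abc(observed)
```

The accepted values in `posterior` approximate the posterior distribution of `theta` given the two summary statistics only; how well they concentrate around the true value depends on how informative those statistics are, which is why performance has to be checked for each application.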
To further increase the range of applicability of our MetHis-ABC
framework, our software readily implements microsatellite markers
together with a general stepwise mutation model (Estoup, Jarne, &
Cornuet, 2002), fully parameterizable by the user (Supplementary
Note S1). This will allow investigating numerous complex admixture
histories, much older than those explored here, and from non-model
species. Even if prior knowledge of the founding date is lacking, MetHis
users can simply set the founding of the population in a
remote past and implement a second founding event with variable date to
be estimated, together with later additional admixture events and other
parameters of interest, in the ABC inference. Nevertheless, it is not
trivial to predict how old an admixture process can be and still be
successfully investigated with ABC (Buzbas & Verdu, 2018). Indeed,
ancient admixture processes can leave scarcely identifiable signatures
in the observed data if they are obliterated by more recent admixture events.
This was theoretically expected (Buzbas & Verdu, 2018), and future
studies combining ancient and modern DNA samples may further
inform the reconstruction of ancient admixture histories.
Importantly, about two-thirds of the computational cost of our study comes from
summary-statistics calculation at the end of the admixture process, as
is often the case in ABC. Considering much longer admixture processes
than the ones investigated here will mechanically increase computation
time but will not increase summary-statistics calculation time.
Furthermore, note that the computational cost of simulating data with
MetHis does not depend
excessively on the number of generations considered (within reason), nor
on the absolute number of markers used, but rather on the effective
population size in the admixed population set by the user.
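
The role of the effective population size can be seen in a toy forward-in-time simulation. The sketch below is an assumption-laden illustration, not the MetHis algorithm: it tracks only expected individual admixture fractions rather than genotypes, the function name and arguments are hypothetical, and the per-generation contributions follow the two-source mechanistic model in spirit only. At each generation, the work done is proportional to `n_e`, since two parents are drawn for each of the `n_e` offspring.

```python
import numpy as np

def simulate_admixture_fractions(s1, s2, n_e, seed=0):
    """Toy forward-in-time simulation of individual admixture fractions.

    s1[t], s2[t]: contributions of source 1 and source 2 at generation t;
    the remainder, 1 - s1[t] - s2[t], comes from the admixed population.
    At founding (t = 0) only the sources contribute, so s1[0] and s2[0]
    are renormalised to sum to 1. Hypothetical helper, not part of MetHis.
    """
    rng = np.random.default_rng(seed)
    fractions = None
    for t, (a, b) in enumerate(zip(s1, s2)):
        h = 0.0 if t == 0 else max(0.0, 1.0 - a - b)
        p = np.array([a, b, h], dtype=float)
        p = p / p.sum()
        # Origin of the two parents of each offspring:
        # 0 = source 1, 1 = source 2, 2 = admixed population.
        origins = rng.choice(3, size=(n_e, 2), p=p)
        # Source-1 parents carry fraction 1, source-2 parents fraction 0.
        parent_frac = np.where(origins == 0, 1.0, 0.0)
        from_admixed = origins == 2
        if from_admixed.any():
            # Admixed parents are sampled from the previous generation.
            parent_frac[from_admixed] = rng.choice(
                fractions, size=int(from_admixed.sum()))
        # Offspring fraction is the average of its two parents' fractions.
        fractions = parent_frac.mean(axis=1)
    return fractions

# Single founding pulse (40% from source 1), then 9 generations without gene flow.
s1 = [0.4] + [0.0] * 9
s2 = [0.6] + [0.0] * 9
result = simulate_admixture_fractions(s1, s2, n_e=2000)
```

Because this sketch ignores markers altogether, it says nothing about how MetHis itself scales with the number of loci or generations; it only shows why individual-based forward simulation grows with the number of simulated individuals.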
Although MetHis readily allows considering changes of effective
population size in the admixed population at each generation as a
parameter of interest to ABC inference (Supplementary Note S1),
we did not, for simplicity, investigate here how such changes affected
our results. Future work using MetHis will specifically
investigate how effective size changes may influence genetic patterns in
admixed populations, a question of major interest as numerous admixed
populations have experienced founding events and/or bottlenecks during
their genetic history (e.g. Browning et al., 2018).
The current MetHis-ABC approach does not make use of admixture
linkage-disequilibrium (LD) patterns in the admixed population, and
relies only on independent SNP or microsatellite markers. Nevertheless,
admixture LD has consistently proven highly informative about
complex admixture histories in populations where large genomic datasets
are available (Gravel, 2012; Hellenthal et al., 2014; Malinsky et al.,
2018; Medina et al., 2018; Ni et al., 2019; Stryjewski & Sorenson,
2017). However, existing methods to calculate admixture LD patterns
remain computationally intensive and require both dense marker-sets and
accurate phasing, which is difficult under ABC where such statistics
have to be calculated for each one of the numerous simulated datasets.
In this context, methods such as RF-ABC (Pudlo et al., 2016; Raynal et al., 2019) or
AABC (Buzbas & Rosenberg, 2015) substantially
reduce the number of simulations required for satisfactory ABC
inference. This makes both approaches promising tools for using, in the
future, admixture-LD patterns to reconstruct complex admixture processes
from genomic data.
Finally, future developments of the MetHis-ABC framework will
focus on implementing sex-specific admixture models, as these processes
are known to leave specific signatures in genetic diversity patterns, and
are of interest in numerous case studies (Goldberg, Verdu, & Rosenberg,
2014). Furthermore, the MetHis forward-in-time simulator
represents an ideal tool to further investigate admixture-related
selection forces, and admixture-specific assortative mating processes,
as these processes can simply be modeled by specifically parameterizing
individual reproduction and survival in the simulations, unlike in most
coalescent-based simulators.