2.3.1 | Distribution of admixture fractions as a set of summary statistics
Most methods developed to estimate individual admixture fractions from
genetic data (e.g. Alexander et al., 2009) are computationally intensive,
which puts them out of reach when they must be iterated over large sets of
simulated genetic data. This explains why they are not routinely used in
ABC inferences, despite being theoretically highly informative
(Gravel, 2012; Verdu & Rosenberg, 2011).
Here, we propose, and implement in MetHis, an efficient way to use
estimated individual admixture fractions as summary statistics for ABC
inferences, based on allele-sharing dissimilarity (ASD)
(Bowcock et al., 1994) and multidimensional scaling (MDS). For each
simulated dataset, we first calculated a pairwise inter-individual ASD
matrix with the asd software (https://github.com/szpiech/asd), using all
pairs of sampled individuals and all markers. We then projected this
pairwise ASD matrix onto two dimensions with classical unsupervised metric
MDS, using the cmdscale function in R.
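The ASD and MDS steps can be sketched in Python as follows. This is a minimal illustration, not the asd software or the MetHis code: it assumes biallelic markers coded 0/1/2 with no missing data, computes per-marker ASD as |g_i - g_j| / 2 (one minus half the number of alleles shared by two diploid individuals) averaged over markers, and implements classical (Torgerson) metric MDS, the technique performed by R's cmdscale, through a NumPy eigendecomposition. The function names asd_matrix and classical_mds are hypothetical.

```python
import numpy as np


def asd_matrix(genotypes):
    """Pairwise allele-sharing dissimilarity (ASD) between individuals.

    genotypes: (n_individuals, n_markers) array of biallelic genotypes coded
    0/1/2 (assumption: no missing data). At one marker, two diploid genotypes
    share 2 - |g_i - g_j| alleles, so ASD = |g_i - g_j| / 2; each matrix entry
    is the average of this quantity over all markers.
    """
    g = np.asarray(genotypes, dtype=float)
    # Broadcast to an (n, n, n_markers) array of per-marker dissimilarities.
    per_marker = np.abs(g[:, None, :] - g[None, :, :]) / 2.0
    return per_marker.mean(axis=2)


def classical_mds(d, k=2):
    """Classical (Torgerson) metric MDS, the technique behind R's cmdscale."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue); negative
    # eigenvalues (non-Euclidean residue) are clipped to zero.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Usage on random genotypes: 10 individuals, 200 markers.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(10, 200))
coords = classical_mds(asd_matrix(geno))
print(coords.shape)  # (10, 2)
```

The broadcast in asd_matrix builds an n × n × markers array, which keeps the sketch short but is memory-hungry for large samples; a dedicated tool such as asd is the practical choice at that scale.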
We expect individuals in population H to be dispersed along the axis
joining the centroids of the proxy source populations on the
two-dimensional MDS plot. We projected each population H individual
orthogonally onto this axis and calculated its relative distance to each
centroid. We considered this measure as an estimate of the individual’s
average admixture level from either source. Note that, by doing so, some
individuals might show “admixture fractions” higher than one or lower
than zero: an individual genetically close to 100% ancestry from one
source population may be projected beyond that source’s centroid. Under an
ABC framework, this is not a difficulty, since this may a priori also
happen with the real data, and the goal of ABC is to use summary
statistics whose simulated values mimic the observed ones.
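A minimal sketch of the projection step, assuming the two-dimensional MDS coordinates are already available and that population membership is given as index lists (the function and argument names are hypothetical). The scalar t locates each admixed individual along the centroid axis: t = 1 at the source-1 centroid and t = 0 at the source-2 centroid, so t serves as the estimated fraction from source 1 and 1 - t as the fraction from source 2. Nothing constrains t to [0, 1], which reproduces the out-of-range values discussed above.

```python
import numpy as np


def admixture_from_mds(coords, idx_s1, idx_s2, idx_h):
    """Position of admixed individuals along the axis joining source centroids.

    coords: (n, 2) MDS coordinates of all sampled individuals.
    idx_s1, idx_s2, idx_h: index lists of the two proxy sources and of H
    (hypothetical argument names). Returns t for each H individual:
    t = 1 at the source-1 centroid and t = 0 at the source-2 centroid.
    """
    coords = np.asarray(coords, dtype=float)
    c1 = coords[idx_s1].mean(axis=0)
    c2 = coords[idx_s2].mean(axis=0)
    axis = c1 - c2
    # Orthogonal projection onto the line through c2 and c1, expressed as a
    # fraction of the centroid-to-centroid distance measured from c2.
    return (coords[idx_h] - c2) @ axis / (axis @ axis)


# Toy coordinates: sources 2 and 1 centred on (0, 0) and (2, 0).
toy = np.array([[0, 0.1], [0, -0.1], [2, 0.1], [2, -0.1],
                [1, 0.5], [0.5, -0.3], [2.5, 0.0]])
t = admixture_from_mds(toy, idx_s1=[2, 3], idx_s2=[0, 1], idx_h=[4, 5, 6])
# t is 0.5, 0.25 and 1.25; the last individual lies beyond the source-1 centroid.
```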
This individual admixture estimation method has been shown to be highly
concordant with cluster membership fractions as estimated with STRUCTURE
or ADMIXTURE (Falush, Stephens, & Pritchard, 2003;
Alexander, Novembre, & Lange, 2009) in real data analyses (e.g. Verdu et
al., 2017). We confirm these previous findings: the Spearman correlations
(calculated using the cor.test function in R) between admixture estimates
based on ASD-MDS and on ADMIXTURE are rho = 0.950 (p-value < 2 × 10⁻¹⁶)
and rho = 0.977 (p-value < 2 × 10⁻¹⁶) for the two case-study datasets
explored here (Supplementary Figure S2).
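For reference, the point estimate that R's cor.test(method = "spearman") reports is the Pearson correlation of the average ranks of the two variables. A minimal NumPy sketch of that computation follows; the function names are hypothetical, it returns only rho and not the p-value, and scipy.stats.spearmanr is a ready-made alternative.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank (as R's rank())."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]
```

Because only ranks enter the formula, rho measures monotone rather than linear agreement, so a constant offset or nonlinear rescaling between the ASD-MDS and ADMIXTURE scales would not lower it.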
We used the mean, mode, variance, skewness, kurtosis, minimum, maximum,
and the nine 10%-quantiles (10% to 90%) of the admixture distribution in
population H as 16 separate summary statistics for ABC inference.
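A sketch of the 16 statistics computed from one vector of ASD-MDS admixture estimates. The text does not say how the mode of this continuous distribution, the skewness, or the kurtosis are estimated, so the choices below are assumptions: a histogram-based mode (the hypothetical mode_bins parameter), moment-based skewness, non-excess kurtosis, and the sample variance as in R's var(). np.quantile's default linear interpolation corresponds to R's default quantile type 7.

```python
import numpy as np


def admixture_summary_stats(a, mode_bins=20):
    """Return the 16 summary statistics of an admixture-fraction vector.

    Order: mean, mode, variance, skewness, kurtosis, min, max, then the
    10%, 20%, ..., 90% quantiles. mode_bins is a hypothetical parameter.
    """
    a = np.asarray(a, dtype=float)
    mean = a.mean()
    var = a.var(ddof=1)                      # sample variance, as R's var()
    centred = a - mean
    m2 = (centred ** 2).mean()               # second central moment
    skew = (centred ** 3).mean() / m2 ** 1.5
    kurt = (centred ** 4).mean() / m2 ** 2   # non-excess kurtosis (assumption)
    # Histogram mode: centre of the most populated bin (assumption).
    counts, edges = np.histogram(a, bins=mode_bins)
    k = np.argmax(counts)
    mode = 0.5 * (edges[k] + edges[k + 1])
    deciles = np.quantile(a, np.arange(1, 10) / 10)  # nine interior deciles
    return np.concatenate(([mean, mode, var, skew, kurt, a.min(), a.max()],
                           deciles))
```

Returning a flat array in a fixed order keeps the statistics aligned between simulated and observed datasets, which is what an ABC distance computation requires.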
2.3.2 | Within-population summary statistics
We calculated marker-by-marker heterozygosities (Nei, 1978) and
considered the mean and variance of this quantity across markers in the
admixed population as two separate summary statistics for ABC inference.
In addition, we considered the mean and variance of ASD values across
pairs of individuals within population H.
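The within-population statistics can be sketched as below, under stated assumptions: biallelic 0/1/2 genotypes without missing data, Nei's (1978) unbiased expected heterozygosity 2n/(2n - 1) · (1 - Σ p²) computed from 2n sampled gene copies, sample variances (ddof=1), and an ASD matrix already restricted to the individuals of population H. Function and key names are hypothetical.

```python
import numpy as np


def nei_heterozygosity(genotypes):
    """Per-marker unbiased expected heterozygosity (Nei, 1978).

    genotypes: (n_individuals, n_markers) biallelic genotypes coded 0/1/2.
    """
    g = np.asarray(genotypes, dtype=float)
    n_copies = 2 * g.shape[0]                 # 2n sampled gene copies
    p = g.sum(axis=0) / n_copies              # allele frequency per marker
    return n_copies / (n_copies - 1) * (1 - p ** 2 - (1 - p) ** 2)


def within_pop_stats(genotypes_h, asd_h):
    """Mean/variance of heterozygosity across markers and of pairwise ASD in H."""
    het = nei_heterozygosity(genotypes_h)
    asd_h = np.asarray(asd_h, dtype=float)
    pair_asd = asd_h[np.triu_indices_from(asd_h, k=1)]  # each pair once
    return {
        "het_mean": het.mean(), "het_var": het.var(ddof=1),
        "asd_mean": pair_asd.mean(), "asd_var": pair_asd.var(ddof=1),
    }
```

Taking only the upper triangle of the ASD matrix avoids counting each pair twice and excludes the zero diagonal, which would otherwise bias the mean downward.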