2.3.1 | Distribution of admixture fractions as a set of summary statistics
Most methods developed to estimate individual admixture fractions from genetic data (e.g. Alexander et al., 2009) are computationally intensive, which puts them out of reach when they must be run on large sets of simulated genetic datasets. This explains why they are not routinely used in ABC inference, despite being theoretically highly informative (Gravel, 2012; Verdu & Rosenberg, 2011).
Here, we propose, and implement in MetHis, an efficient way to use estimated individual admixture fractions as summary statistics for ABC inference, based on allele-sharing dissimilarity (ASD) (Bowcock et al., 1994) and multidimensional scaling (MDS). For each simulated dataset, we first calculated a pairwise inter-individual ASD matrix with the asd software (https://github.com/szpiech/asd), using all pairs of sampled individuals and all markers. We then projected this pairwise ASD matrix in two dimensions with classical unsupervised metric MDS, using the cmdscale function in R. We expect individuals in population H to be dispersed along an axis joining the centroids of the proxy source populations on the two-dimensional MDS plot. We projected population H individuals orthogonally onto this axis, and calculated each individual’s relative distance to each centroid. We considered this measure as an estimate of the individual's average admixture level from either source. Note that, by doing so, some individuals might show “admixture fractions” higher than one or lower than zero, as they might be projected beyond a centroid when they are genetically close to 100% from one source population or the other. Under an ABC framework, this is not a difficulty, since the same may a priori happen with the real data, and the goal of ABC is to use summary statistics that mimic the observed ones.
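The projection step can be illustrated with a minimal Python sketch. It is not the MetHis implementation: the function names `centroid` and `admixture_fractions` and the toy coordinates are hypothetical, and the two-dimensional MDS coordinates are assumed to have been computed beforehand. The returned value t is 0 at the centroid of the first source and 1 at the centroid of the second, so it is read here as the estimated fraction from the second source.

```python
def centroid(points):
    """Return the arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def admixture_fractions(coords_h, coords_s1, coords_s2):
    """Orthogonally project each individual of H onto the axis joining the
    centroids of the two source populations.

    Returns one value t per individual: t = 0 at the centroid of source 1
    and t = 1 at the centroid of source 2, so t is read as the fraction
    from source 2 and 1 - t as the fraction from source 1. Values outside
    [0, 1] are deliberately not clipped.
    """
    c1 = centroid(coords_s1)
    c2 = centroid(coords_s2)
    axis = (c2[0] - c1[0], c2[1] - c1[1])
    norm_sq = axis[0] ** 2 + axis[1] ** 2
    if norm_sq == 0:
        raise ValueError("the two source centroids coincide")
    # Scalar projection of (point - c1) onto the axis, scaled by its length.
    return [
        ((x - c1[0]) * axis[0] + (y - c1[1]) * axis[1]) / norm_sq
        for x, y in coords_h
    ]


# Toy MDS coordinates (hypothetical values, for illustration only).
source_1 = [(0.0, 0.0), (0.0, 2.0)]   # centroid (0, 1)
source_2 = [(4.0, 0.0), (4.0, 2.0)]   # centroid (4, 1)
admixed = [(1.0, 5.0), (2.0, 1.0), (5.0, 0.0)]
fractions = admixture_fractions(admixed, source_1, source_2)
print(fractions)  # [0.25, 0.5, 1.25]
```

The third toy individual lies beyond the centroid of the second source and therefore receives a value above one, which is the behaviour described in the text.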
This individual admixture estimation method has been shown to be highly concordant with cluster membership fractions estimated with STRUCTURE or ADMIXTURE (Falush, Stephens, & Pritchard, 2003; Alexander, Novembre, & Lange, 2009) in real data analyses (e.g. Verdu et al., 2017). We confirm these previous findings: for the two case-study datasets explored here, we obtain Spearman correlations (calculated using the cor.test function in R) of rho = 0.950 (p-value < 2 × 10⁻¹⁶) and rho = 0.977 (p-value < 2 × 10⁻¹⁶) between admixture estimates based on ASD-MDS and on ADMIXTURE (Supplementary Figure S2).
We used the mean, mode, variance, skewness, kurtosis, minimum, maximum, and the nine 10%-quantiles (deciles) of the admixture distribution in population H as 16 separate summary statistics for ABC inference.
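As an illustration, the 16 statistics could be computed along the following lines. This is a sketch under stated assumptions, not the MetHis code: the function name `admixture_summary_statistics` is hypothetical, and the text does not specify the estimators, so the choices below (sample variance, moment-based skewness and excess kurtosis, histogram-based mode, and the interpolated "inclusive" quantile method) are assumptions.

```python
import statistics


def admixture_summary_statistics(fractions, n_bins=10):
    """Return 16 summary statistics of a list of admixture fractions."""
    n = len(fractions)
    mean = statistics.fmean(fractions)
    lo, hi = min(fractions), max(fractions)

    # Central moments (population form) for skewness and excess kurtosis.
    m2 = sum((x - mean) ** 2 for x in fractions) / n
    m3 = sum((x - mean) ** 3 for x in fractions) / n
    m4 = sum((x - mean) ** 4 for x in fractions) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else float("nan")
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan")

    # Mode of a continuous variable, taken here as the midpoint of the
    # fullest histogram bin (assumption; ties resolve to the lowest bin).
    width = (hi - lo) / n_bins
    if width == 0:
        mode = lo
    else:
        counts = [0] * n_bins
        for x in fractions:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        mode = lo + (counts.index(max(counts)) + 0.5) * width

    # Nine cut points at 10%, 20%, ..., 90%.
    deciles = statistics.quantiles(fractions, n=10, method="inclusive")

    stats = {
        "mean": mean,
        "mode": mode,
        "variance": statistics.variance(fractions),  # sample variance (n - 1)
        "skewness": skewness,
        "kurtosis": kurtosis,
        "min": lo,
        "max": hi,
    }
    for i, q in enumerate(deciles, start=1):
        stats[f"q{10 * i}"] = q
    return stats


# Toy input (hypothetical values, for illustration only).
example = admixture_summary_statistics([float(i) for i in range(11)])
print(len(example))
```

Whatever estimators are chosen, the same ones must be applied to the simulated and the observed data so that the statistics remain comparable.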
2.3.2 | Within-population summary statistics
We calculated marker-by-marker heterozygosities (Nei, 1978), and we considered the mean and variance of this quantity across markers in the admixed population as two separate summary statistics for ABC inference. In addition, we considered the mean and variance of ASD values across pairs of individuals within population H.
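These four within-population statistics can be sketched as follows. The sketch rests on several assumptions that the text does not state: markers are biallelic and genotypes are coded as 0, 1 or 2 copies of one allele, there are no missing data, heterozygosity uses the unbiased estimator 2n(1 - Σp²)/(2n - 1), ASD between two individuals is the mean over markers of |g1 - g2| / 2 (the asd software may use a different normalization), and variances are sample variances. All function names are hypothetical.

```python
import statistics
from itertools import combinations


def nei_heterozygosity(genotypes):
    """Per-marker unbiased expected heterozygosity for biallelic markers.

    genotypes: one list per individual, holding 0/1/2 allele counts.
    """
    n = len(genotypes)
    values = []
    for m in range(len(genotypes[0])):
        p = sum(ind[m] for ind in genotypes) / (2 * n)
        values.append((2 * n / (2 * n - 1)) * (1 - p ** 2 - (1 - p) ** 2))
    return values


def pairwise_asd(genotypes):
    """Allele-sharing dissimilarity for every pair of individuals."""
    n_markers = len(genotypes[0])
    return [
        sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * n_markers)
        for g1, g2 in combinations(genotypes, 2)
    ]


def within_population_statistics(genotypes):
    """Mean and variance of heterozygosity (across markers) and of ASD
    (across pairs of individuals). Needs at least two markers and three
    individuals so that both sample variances are defined."""
    het = nei_heterozygosity(genotypes)
    asd = pairwise_asd(genotypes)
    return {
        "het_mean": statistics.fmean(het),
        "het_variance": statistics.variance(het),
        "asd_mean": statistics.fmean(asd),
        "asd_variance": statistics.variance(asd),
    }


# Toy genotypes: 3 individuals x 3 markers (hypothetical values).
toy_genotypes = [[0, 1, 2], [2, 1, 2], [1, 1, 2]]
toy_stats = within_population_statistics(toy_genotypes)
print(toy_stats)
```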