Environmental Association Analysis
The PCA of all 19 Bioclim environmental variables yielded a single
principal component (eigenvalue = 0.679) that retained four variables
according to the scree plot criterion: annual mean temperature (BIO1),
maximum temperature of the warmest month (BIO5), mean temperature of
the warmest quarter (BIO10), and annual precipitation (BIO12). This
component therefore defined a bioclimatic gradient with a hot, dry
extreme in the area occupied by the Sahara desert (highest values) and a
temperate, wet extreme in northwestern Spain (lowest values; Fig.
1B).
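To illustrate how a first principal component and its loadings can be obtained from standardized climate variables, the following minimal Python sketch runs a power iteration on a correlation matrix. It rests on assumptions that are not part of the study: the six site values are invented toy data for only four of the 19 variables, the function names are hypothetical, and power iteration stands in for whatever PCA routine was actually used.

```python
import math

def standardize(column):
    """Centre a variable and scale it to unit sample standard deviation."""
    m = sum(column) / len(column)
    sd = math.sqrt(sum((v - m) ** 2 for v in column) / (len(column) - 1))
    return [(v - m) / sd for v in column]

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long variables."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit-length loadings by power iteration."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # Rayleigh quotient of the converged unit vector
    eigenvalue = sum(v * sum(m * w for m, w in zip(row, vec))
                     for row, v in zip(matrix, vec))
    return eigenvalue, vec

# Hypothetical toy values for six sites (NOT data from the study)
sites = {
    "BIO1": [10, 12, 15, 18, 22, 25],            # annual mean temperature
    "BIO5": [22, 25, 29, 33, 38, 42],            # max temperature, warmest month
    "BIO10": [17, 19, 22, 26, 30, 33],           # mean temperature, warmest quarter
    "BIO12": [1400, 1100, 800, 500, 200, 50],    # annual precipitation
}
names = list(sites)
eigenvalue, loadings = first_component(correlation_matrix(list(sites.values())))
explained = eigenvalue / len(names)  # share of total variance on PC1

for name, loading in zip(names, loadings):
    print(f"{name}: {loading:+.2f}")
print(f"variance explained by PC1: {explained:.2f}")
```

In this toy example the temperature variables load with one sign and precipitation with the opposite sign, which is the kind of pattern that makes a single component interpretable as a hot-dry versus temperate-wet gradient.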
The vast majority of inter-population randomizations and of randomizations
by neutral loci resulted in non-significant models (Fig. 2). Mean
adjusted R² for the random assignment of genotypes to
populations (inter-population randomization) was 0.002 (SD = 0.038,
range = 0 - 0.199), with a mean p-value of 0.487 (SD = 0.292, range =
0.0004 - 1); only 6% of the 1,000 randomized datasets yielded
significant results. Mean adjusted R² for
randomization by neutral loci (i.e. random within-population selection
of SNPs, either outliers or not) was 0.002 (SD = 0.105, range = 0 -
0.324), with a mean p-value of 0.494 (SD = 0.296, range = 0.001 - 1);
only 5.6% of the datasets yielded significant models. In the case of
the complete randomizations (i.e. fully permuted SNP datasets prior to
outlier analysis), mean adjusted R² was 0.047 (SD =
0.092, range = 0 - 0.788), with a mean p-value of 0.616 (SD = 0.451,
range = 10⁻⁷ - 1).
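The logic of a randomization of this kind can be sketched in a few lines of Python. The example below is a deliberate simplification under stated assumptions: it uses a single predictor instead of a multi-SNP model, the `freqs` and `env` values are invented toy data, the function names are hypothetical, and shuffling allele frequencies among populations only stands in for the random assignment of genotypes to populations. It is not the pipeline used in the study.

```python
import random
from statistics import mean

def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R² of a least-squares fit of y on a single predictor x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = (sxy * sxy) / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def randomized_r2(freqs, env, n_perm=1000, seed=0):
    """Adjusted R² values after shuffling frequencies among populations."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_perm):
        shuffled = freqs[:]
        rng.shuffle(shuffled)  # breaks the population-environment link
        values.append(adjusted_r2(shuffled, env))
    return values

# Hypothetical toy data for ten populations (NOT data from the study)
env = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.2, 2.9, 3.1, 3.6]              # PC1 scores
freqs = [0.05, 0.10, 0.20, 0.25, 0.40, 0.50, 0.55, 0.70, 0.80, 0.90]  # allele frequencies

observed = adjusted_r2(freqs, env)
null = randomized_r2(freqs, env)
exceed = sum(r >= observed for r in null) / len(null)

print(f"observed adjusted R²: {observed:.3f}")
print(f"mean adjusted R² of randomized datasets: {mean(null):.3f}")
print(f"share of randomized datasets reaching the observed value: {exceed:.3f}")
```

When the association is real, the adjusted R² of the randomized datasets clusters near zero and rarely reaches the observed value, which is the contrast the randomization strategies are designed to expose.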
Datasets built by intra-population randomization produced highly
significant models (mean adjusted R² = 0.646 ± 0.007,
range = 0.623 - 0.670; mean p-value ± SD = 3.5 × 10⁻¹⁷ ± 2.8 × 10⁻¹⁷, range = 2.02 × 10⁻¹⁸ - 2.93 × 10⁻¹⁶; Fig. 2). Only 2 out of the 500
complete randomizations yielded a model that explained more
environmental variability than those built with intra-population
randomized datasets. Thus, the environmental association models
including the SNPs under selection (intra-population randomization: mean
R² = 0.646) had on average between one and three
orders of magnitude more predictive power than those built with the
other three randomization strategies (mean R² = 0.002,
0.002, and 0.047, for inter-population, by neutral loci, and complete
randomizations, respectively). The final GEAM included four SNPs (all
but one detected by FLK; Table 1) with significant partial correlations;
their final coefficients and p-values were computed as means across all
intra-population randomized datasets.
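The comparison of predictive power in orders of magnitude can be checked directly from the mean R² values reported above. The short Python sketch below takes the base-10 logarithm of each ratio; the variable names are chosen for illustration only.

```python
import math

intra = 0.646  # mean R², intra-population randomization (reported above)
others = {
    "inter-population": 0.002,
    "by neutral loci": 0.002,
    "complete": 0.047,
}

# Orders of magnitude = log10 of the ratio between mean R² values
orders = {name: math.log10(intra / value) for name, value in others.items()}

for name, value in orders.items():
    print(f"{name}: {value:.2f} orders of magnitude")
```

The ratios correspond to roughly 2.5 orders of magnitude for the inter-population and neutral-loci randomizations and roughly 1.1 for the complete randomizations, consistent with the stated range of one to three.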