Microevolutionary analyses
For each ORF and UCE, we genotyped
48 diploid, dioecious individuals belonging to six populations from the
Malawi Basin. After filtering the data for the nucleotide diversity and
population differentiation analyses, we retained 1,097 ORFs with
high-quality genotypes for ~1,153,827 of 1,675,638 sites (68.9%), of
which 17,988 were polymorphic (1.1% of all sites; Ortiz-Sepulveda et
al., 2022). Using identical filtering criteria, we retained 309 UCEs
with high-quality genotypes for 111,492 of 254,694 sites (43.8%),
of which 3,148 were polymorphic (1.2% of all sites). Estimates of
π average 0.00167±0.00316 for ORFs and 0.00116±0.00206 for UCEs and
are very similar across the six studied populations (Fig. 6A, S5A),
corresponding to an average of ~1,915 and ~130 nucleotide differences,
respectively, between any two haplotypes. The average
πS per population varies between 0.00220 and 0.00329,
whereas πN varies between 0.00057 and 0.00092, resulting
in an average πN/πS of 0.259 (Fig. S6).
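The link between π and the number of differences between two haplotypes is that π is the mean number of pairwise differences per site, so multiplying it by the number of high-quality sites gives the expected differences per haplotype pair. The Python sketch below computes π for a toy alignment (hypothetical data, no missing genotypes, which is a simplifying assumption and not the study's procedure) and then applies the conversion to the rounded means reported above. With these rounded values it gives ~1,927 and ~129 differences, close to the reported ~1,915 and ~130; the small gap presumably reflects rounding of π.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of mismatching sites over all pairs of aligned haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(haplotypes):
    """pi: mean pairwise differences divided by the number of aligned sites."""
    return mean_pairwise_differences(haplotypes) / len(haplotypes[0])


# Toy alignment (hypothetical): 4 haplotypes x 10 sites, no missing data
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAACT", "AAAAAAAAAA"]
print(f"toy pi = {nucleotide_diversity(toy):.4f}")

# Expected differences between two haplotypes = pi x number of high-quality sites
expected_diffs = {
    label: round(pi * n_sites)
    for label, pi, n_sites in [("ORFs", 0.00167, 1_153_827), ("UCEs", 0.00116, 111_492)]
}
print(expected_diffs)  # {'ORFs': 1927, 'UCEs': 129}
```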
Including intronic/intergenic flanking regions of ORFs would further
increase the number of variant sites that can be analysed. Pairwise DXY
values average 0.00175±0.00008 for ORFs and 0.00127±0.00015 for UCEs,
only ~5% and ~9% higher, respectively, than mean π (Fig. 6A, S5A),
indicating limited overall net nucleotide divergence among populations.
FST values average 0.060±0.019 for ORFs and 0.054±0.015 for UCEs,
indicating moderate genetic differentiation.
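The comparison of DXY with π can be expressed as net nucleotide divergence in the sense of Nei and Li, Da = DXY − (πX + πY)/2, which approaches zero when between-population differences barely exceed within-population diversity. The sketch below plugs in the rounded means reported above and, as a simplifying assumption, uses the overall mean π for both populations of a pair; per-population π values would be used in practice. It gives Da ≈ 0.00008 for ORFs and ≈ 0.00011 for UCEs.

```python
def net_divergence(dxy, pi_x, pi_y):
    """Net nucleotide divergence Da = DXY - (pi_X + pi_Y) / 2."""
    return dxy - (pi_x + pi_y) / 2


def percent_excess(dxy, pi_within):
    """How much larger DXY is than mean within-population pi, in percent."""
    return 100 * (dxy / pi_within - 1)


# Rounded means from the text; the same pi is used for both populations (simplification)
summary = {"ORFs": (0.00175, 0.00167), "UCEs": (0.00127, 0.00116)}
for label, (dxy, pi) in summary.items():
    da = net_divergence(dxy, pi, pi)
    print(f"{label}: Da = {da:.5f}, DXY exceeds pi by {percent_excess(dxy, pi):.1f}%")
```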
Substantial variation exists in FST values among
ORFs (Fig. 6B) and UCEs (Fig. S5B). Depending on the pairwise population
comparison, between 60 and 230 ORFs and between 16 and 38 UCEs display
FST values >0.15, of which 42.0% and 59.3%, respectively, also display
elevated DXY values (i.e. DXY >0.002).
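The two thresholds above amount to a simple two-step screen applied to each population pair: first keep loci with FST > 0.15, then report the share of those that also have DXY > 0.002. The Python sketch below illustrates that screen on made-up locus names and values (hypothetical, not from the study); the `LocusStats` record and `high_fst_with_high_dxy` helper are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class LocusStats:
    """Per-locus statistics for one pairwise population comparison (hypothetical record)."""
    name: str
    fst: float
    dxy: float


def high_fst_with_high_dxy(loci, fst_cut=0.15, dxy_cut=0.002):
    """Return loci with FST > fst_cut, the subset also with DXY > dxy_cut, and that subset's share."""
    high_fst = [locus for locus in loci if locus.fst > fst_cut]
    both = [locus for locus in high_fst if locus.dxy > dxy_cut]
    share = len(both) / len(high_fst) if high_fst else 0.0
    return high_fst, both, share


# Hypothetical loci for a single population pair (not values from the study)
loci = [
    LocusStats("orf_001", 0.05, 0.0015),
    LocusStats("orf_002", 0.22, 0.0031),
    LocusStats("orf_003", 0.18, 0.0012),
    LocusStats("orf_004", 0.31, 0.0024),
    LocusStats("orf_005", 0.16, 0.0009),
]
high_fst, both, share = high_fst_with_high_dxy(loci)
print(len(high_fst), [locus.name for locus in both], f"{share:.1%}")  # 4 ['orf_002', 'orf_004'] 50.0%
```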
Filtering ORF data to examine genetic structure resulted in the removal
of two individuals (dna0469 and dna0416) and a final dataset of 2,161
SNPs (Ortiz-Sepulveda et al., 2022). In a PCA of this dataset, PC1, PC2
and PC3 explain 11.7%, 10.9% and 8.7% of the total variation,
respectively. The 95% convex hulls of populations overlap
substantially within the northern and southern regions, but not between
them (Fig. 7). The two regions are separated mainly along PC3. The Likoma
Island population falls closer to the northern populations in the PC1
vs. PC2 plot, but closer to the southern ones in the PC1 vs. PC3 plot. The population
of the Shire River overlaps with one population from the south, but
shows substantial differentiation from the other southern population.
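The PCA step can be sketched as follows, assuming genotypes are coded as 0/1/2 counts of one allele with no missing data, and that each SNP is centred and scaled by its binomial standard deviation before a singular value decomposition. This is a common normalisation for genotype PCA, not necessarily the one used in the study, and the random matrix (46 individuals, i.e. 48 minus the two removed, by 200 SNPs) is purely hypothetical. The returned proportions are the analogue of the 11.7%, 10.9% and 8.7% reported for PC1 to PC3.

```python
import numpy as np


def genotype_pca(genotypes, n_components=3):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts (no missing data assumed)."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                      # allele frequency per SNP
    keep = (p > 0) & (p < 1)                      # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))    # centre and scale by binomial SD
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # proportion of variance per PC
    pcs = u[:, :n_components] * s[:n_components]  # individual scores on the leading PCs
    return pcs, explained[:n_components]


rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(46, 200))  # hypothetical: 46 individuals x 200 SNPs
pcs, explained = genotype_pca(toy)
print(np.round(100 * explained, 1))       # % variance explained by PC1-PC3
```

With real data, the rows of `pcs` would be plotted by population (e.g. PC1 vs. PC2 and PC1 vs. PC3) and enclosed in convex hulls, as in Fig. 7.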
These results are highly congruent with those obtained with
fastSTRUCTURE on the same dataset, for which the ΔK method and most of
the estimators of Puechmaille (2016) identify K=4 as the best-supported
scenario. Some of the latter estimators suggest K=5, but with specimen
assignments almost identical to those of the K=4 solution (Fig. 7). Two
of the four clusters correspond to sampling locations, i.e. Likoma
Island and Shire River, whereas the other two reflect a north-south
separation in which one southern population (MLW8-014) displays mixed
assignments, including signatures of the northern and Shire River
clusters. Interestingly, although the Shire River lies in the far south,
its cluster groups with the northern cluster in the K=3 scenario.
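For reference, the ΔK statistic (Evanno's method) is the absolute second-order change of the mean log-likelihood across K, divided by the standard deviation of the log-likelihood among replicate runs at that K; it is only defined when each K has been run at least twice and for K values with both neighbours present. The sketch below follows the common implementation that uses replicate means for the second difference. The replicate log-likelihoods are made-up numbers, not the study's fastSTRUCTURE output, and with these values the peak happens to fall at K=4.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """
    log_likelihoods: dict mapping K -> list of log-likelihoods from replicate runs.
    Returns dict K -> Delta K for every K whose neighbours K-1 and K+1 are present.
    """
    m = {k: mean(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in m and k + 1 in m:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)| from mean L(K)
            result[k] = second_diff / stdev(log_likelihoods[k])  # scaled by SD among replicates
    return result


# Hypothetical replicate log-likelihoods (not the study's values)
runs = {
    1: [-5200.0, -5201.0, -5199.5],
    2: [-5000.0, -5002.0, -4999.0],
    3: [-4900.0, -4903.0, -4898.0],
    4: [-4700.0, -4701.0, -4699.0],
    5: [-4690.0, -4695.0, -4688.0],
    6: [-4685.0, -4691.0, -4683.0],
}
delta = delta_k(runs)
best_k = max(delta, key=delta.get)
print(best_k, {k: round(v, 1) for k, v in delta.items()})
```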