2.4 Filtering and quality control
We filtered our genetic marker set in three steps. First, we removed variants with missing data across the dataset and avoided selecting more than one marker on the same chromosome, so that the markers would not be in linkage disequilibrium. Second, we selected 800 bp loci containing a single restriction site (or, for a few loci, a maximum of two). Third, we excluded markers that had other variants in the flanking region (±5 bp). Based on the position of the restriction enzyme recognition sequence, we extracted the exact haplotypes for all sequenced genomes, estimated allele frequencies across the dataset, and constructed haplotype networks (Fig. S1 and S2). Two RFLP loci had one additional SNP in the recognition sequence; however, these SNPs did not affect the RFLP. In one case, the second SNP was fully linked to the target SNP (Fig. S2 L). In the other case, there were two haplotypes that were both left uncut by the enzyme and were both more common in the outgroup (Fig. S1 O).
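The filtering logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. All function names and parameters are hypothetical, and the recognition sequence GAATTC (EcoRI) is only a placeholder. The sketch assumes a palindromic recognition sequence, so only one strand is searched, and it assumes the ±5 bp flank is measured from the target variant.

```python
from collections import Counter


def count_sites(locus_seq, site):
    """Count occurrences of a recognition sequence, including overlapping ones."""
    locus_seq, site = locus_seq.upper(), site.upper()
    count = 0
    start = locus_seq.find(site)
    while start != -1:
        count += 1
        start = locus_seq.find(site, start + 1)
    return count


def has_flanking_variant(target_pos, variant_positions, flank=5):
    """True if any other variant lies within +/- flank bp of the target variant."""
    return any(
        pos != target_pos and abs(pos - target_pos) <= flank
        for pos in variant_positions
    )


def passes_filters(locus_seq, site, target_pos, variant_positions,
                   n_missing, max_sites=1, flank=5):
    """Apply the three filters: missing data, site count, flanking variants."""
    if n_missing > 0:  # variant has missing genotypes
        return False
    if not 1 <= count_sites(locus_seq, site) <= max_sites:
        return False
    return not has_flanking_variant(target_pos, variant_positions, flank)


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype observed at the recognition sequence."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Toy 800 bp locus with one placeholder site; positions are 0-based (assumption).
locus = "A" * 400 + "GAATTC" + "A" * 394
print(passes_filters(locus, "GAATTC", 402, [402, 420], n_missing=0))
print(haplotype_frequencies(["GAATTC", "GAATTC", "GAACTC", "GAATTC"]))
```

Setting `max_sites=2` corresponds to the relaxed criterion applied to a few loci. The same-chromosome filter is not shown; it would be applied across markers rather than within a single locus.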