2.4 Filtering and quality control
We filtered our genetic marker set in several ways. First, we removed
variants that had missing data across the dataset and avoided markers
located on the same chromosome as one another, to prevent linkage
disequilibrium among markers. Second, we selected 800 bp loci with a
single restriction site (or, for a few loci, a maximum of two). Third, we
excluded genetic markers that had other variants in the flanking region
(±5 bp).
Based on the position of the restriction enzyme recognition sequence, we
extracted the exact haplotypes for all sequenced genomes, estimated
allele frequencies across the dataset and constructed haplotype networks
(Fig. S1 and S2). Two RFLP loci had one additional SNP in the
recognition sequence; however, these SNPs did not affect the RFLP. In
one case, the second SNP was fully linked to the target SNP (Fig. S2 L).
In the other case, there were two haplotypes, neither of which was cut
by the enzyme and both of which were more common in the outgroup
(Fig. S1 O).
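
The site-count and flanking-variant criteria described above can be illustrated with a minimal sketch. This is not the pipeline used in this study: the function names, the example recognition sequence (GAATTC, the EcoRI site, chosen only for illustration) and the toy sequences are all hypothetical. The sketch also assumes a palindromic recognition sequence, so only one strand is searched, and it measures the ±5 bp flank from the target SNP position, which is one possible reading of the criterion.

```python
def count_sites(locus_seq: str, recognition: str) -> int:
    """Count occurrences (overlapping allowed) of a recognition sequence."""
    locus_seq, recognition = locus_seq.upper(), recognition.upper()
    return sum(
        locus_seq.startswith(recognition, i)
        for i in range(len(locus_seq) - len(recognition) + 1)
    )


def is_cut(haplotype: str, recognition: str) -> bool:
    """A haplotype is cut only if it carries an intact recognition sequence."""
    return count_sites(haplotype, recognition) > 0


def has_flanking_variant(target_pos: int, variant_positions, flank: int = 5) -> bool:
    """True if any variant other than the target lies within +/- flank bp."""
    return any(
        p != target_pos and abs(p - target_pos) <= flank
        for p in variant_positions
    )


# Hypothetical toy data: one allele keeps the site, the other disrupts it.
SITE = "GAATTC"                    # illustrative recognition sequence (assumption)
cut_allele = "ACGTGAATTCACGT"      # intact site
uncut_allele = "ACGTGAGTTCACGT"    # SNP inside the site (A>G) abolishes it

print(count_sites(cut_allele, SITE))          # number of sites in the locus
print(is_cut(cut_allele, SITE), is_cut(uncut_allele, SITE))
print(has_flanking_variant(100, [100, 103]))  # second variant 3 bp away
```

In such a scheme, a locus would be retained when the site count is one (or at most two) and no additional variant falls within the flank. An additional SNP inside the recognition sequence is harmless for genotyping when every haplotype that carries it gives the same cut or uncut result as expected from the target SNP.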