2.7 Estimation of RFLP marker quality
To estimate the reliability of individual markers, we calculated discovery and false discovery rates based on allele frequencies. We first calculated allele frequencies by lake (lake comparisons) or by sympatric species (species comparisons). From these allele frequencies, we calculated the expected genotype frequencies for each population individually, assuming Hardy–Weinberg equilibrium (Table S2). To evaluate the quality of the markers, we then used these genotype frequencies to calculate the probability that an individual with a particular genotype is from the focal population rather than from any other population (without specifying which), assuming a 50:50 chance that the individual is from the focal population or not (Table S2).
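The calculation described above can be illustrated with a minimal sketch. It assumes a biallelic marker and, as a further assumption not stated in the text, that the non-focal genotype frequency is the unweighted mean across the other populations. The function names `hwe_genotype_freqs` and `posterior_focal` are hypothetical and chosen for illustration only.

```python
def hwe_genotype_freqs(p):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    p is the frequency of allele A at a biallelic marker (assumption).
    """
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def posterior_focal(p_focal, p_others, prior=0.5):
    """Probability that a genotype comes from the focal population.

    Uses Bayes' rule with a 50:50 prior (focal vs. not focal).
    Assumption: the other populations are pooled with equal weight.
    """
    g_focal = hwe_genotype_freqs(p_focal)
    g_others = [hwe_genotype_freqs(p) for p in p_others]
    posterior = {}
    for genotype in ("AA", "AB", "BB"):
        g_pooled = sum(g[genotype] for g in g_others) / len(g_others)
        numerator = prior * g_focal[genotype]
        denominator = numerator + (1.0 - prior) * g_pooled
        # A genotype that is absent everywhere carries no information.
        posterior[genotype] = (
            numerator / denominator if denominator > 0 else float("nan")
        )
    return posterior


# Illustrative allele frequencies only, not values from Table S2.
example = posterior_focal(p_focal=0.9, p_others=[0.1])
```

With these illustrative frequencies, the heterozygote is equally likely in both groups and is therefore uninformative, whereas the two homozygotes are strongly diagnostic.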
We used a bootstrap approach in which we randomly drew one million genotypes (i.e. individuals) from the ingroup (target population) and one million from the outgroups (again with an equal chance for each population/sympatric species to be picked), based on their relative frequencies. We then calculated how often a particular population/sympatric species would have been assigned correctly (correctly assigned), how often an ingroup individual would have been assigned to an outgroup (false negative) and how often an outgroup individual would have been assigned to the ingroup (false positive) (Fig. 3, Table S3). The same approach was then applied to our PCR-RFLP data (Table S3): false negatives were ingroup individuals that were incorrectly assigned as outgroup individuals, and false positives were outgroup individuals that were incorrectly assigned as ingroup individuals. The proportion of correctly assigned individuals was calculated as the mean of the percentage of correctly assigned ingroup individuals and the percentage of correctly assigned outgroup individuals. This makes the estimates comparable to those based on the bootstrap dataset, because some analyses were imbalanced, with different numbers of ingroup and outgroup individuals.
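The resampling procedure and the balanced estimate of correct assignment can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the marker is taken to be biallelic, and a genotype is called "ingroup" when it is more frequent in the ingroup than in the equally weighted outgroup pool, which is a hypothetical decision rule. The names `simulate_assignment` and `balanced_correct` are also hypothetical, and the default number of draws is smaller than the one million used in the text.

```python
import random


def hwe(p):
    """Hardy-Weinberg genotype frequencies [AA, AB, BB] for allele-A frequency p."""
    q = 1.0 - p
    return [p * p, 2.0 * p * q, q * q]


def balanced_correct(n_in_correct, n_in, n_out_correct, n_out):
    """Mean of the ingroup and outgroup proportions correctly assigned."""
    return (n_in_correct / n_in + n_out_correct / n_out) / 2.0


def simulate_assignment(p_in, p_outs, n=100_000, seed=1):
    """Draw n ingroup and n outgroup genotypes and score the assignments."""
    rng = random.Random(seed)
    f_in = hwe(p_in)
    f_outs = [hwe(p) for p in p_outs]
    # Picking an outgroup population with equal chance and then a genotype
    # is equivalent to drawing from the unweighted mean of the frequencies.
    f_pool = [sum(f[i] for f in f_outs) / len(f_outs) for i in range(3)]
    # Hypothetical decision rule (assumption): call "ingroup" when the
    # genotype is more frequent in the ingroup than in the outgroup pool.
    call_in = [f_in[i] > f_pool[i] for i in range(3)]

    in_draws = rng.choices(range(3), weights=f_in, k=n)
    out_draws = rng.choices(range(3), weights=f_pool, k=n)

    n_false_neg = sum(1 for g in in_draws if not call_in[g])
    n_false_pos = sum(1 for g in out_draws if call_in[g])
    return {
        "false_negative": n_false_neg / n,
        "false_positive": n_false_pos / n,
        "correct": balanced_correct(n - n_false_neg, n, n - n_false_pos, n),
    }


# Illustrative allele frequencies only, not values from Table S3.
result = simulate_assignment(p_in=0.9, p_outs=[0.1, 0.2], n=20_000)
```

The same `balanced_correct` function applies to empirical genotype counts with unequal group sizes, for example `balanced_correct(18, 20, 45, 60)` for 18 of 20 ingroup and 45 of 60 outgroup individuals assigned correctly. Averaging the two proportions, rather than pooling the counts, prevents the larger group from dominating the estimate.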