4. Discussion
Despite the increase in large-scale genomic data, PCR-RFLPs are still widely used as diagnostic markers for the detection and species assignment of parasites (Pegg et al., 2016), disease-causing pathogens (Kato et al., 2019), microbiota (Baffoni et al., 2013), and toxic dinoflagellates (Lozano-Duque, Richlen, Smith, Anderson, & Erdner, 2018), as well as animals, using different tissue samples (Larraín, González, Pérez, & Araneda, 2019), scat samples (Mukherjee, Cn, Home, & Ramakrishnan, 2010), or environmental DNA (eDNA) (Clusa, Ardura, Fernández, Roca, & García-Vázquez, 2017). Once markers are identified, PCR-RFLP genotyping is fast, cheap, and reliable, but designing the markers is usually time-consuming, especially if many species and populations are being compared and/or highly differentiated markers are difficult to find. Here, we introduce a streamlined workflow to identify PCR-RFLPs from whole-genome re-sequencing data (GB-RFLPs). We note that the same approach could be applied to RAD-seq, exome sequencing, or other forms of targeted genomic data (Fig. 2).
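The core check behind any PCR-RFLP marker, namely whether a SNP creates or abolishes a restriction site, can be sketched in a few lines of Python. This is a minimal illustration and not the workflow used in this study: the function names, the flanking sequence, and the choice of EcoRI (recognition site GAATTC) are assumptions made purely for the example.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def site_positions(seq, site):
    """Sorted start positions of the recognition site on either strand."""
    hits = set()
    for motif in {site, revcomp(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def snp_changes_site(flank, snp_index, allele_a, allele_b, site):
    """True if the two alleles yield different restriction-site patterns.

    flank     -- sequence surrounding the SNP (hypothetical example input)
    snp_index -- 0-based position of the SNP within flank
    """
    seq_a = flank[:snp_index] + allele_a + flank[snp_index + 1:]
    seq_b = flank[:snp_index] + allele_b + flank[snp_index + 1:]
    return site_positions(seq_a, site) != site_positions(seq_b, site)


# Invented flanking sequence containing one EcoRI site (GAATTC) at index 4.
flank = "ACGTGAATTCTTAG"
print(snp_changes_site(flank, 6, "A", "G", "GAATTC"))   # SNP inside the site
print(snp_changes_site(flank, 11, "T", "C", "GAATTC"))  # SNP outside the site
```

A SNP that passes this check would still need suitable primer sites and fragment sizes that can be resolved on a gel, which the sketch does not address.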
Our study yielded promising results for diverged populations from different lakes without ongoing gene flow, as represented by all seven Nicaraguan crater lakes. While populations could be assigned with more than 90% accuracy to two crater lakes (Apoyeque and As. Managua), our markers even yielded 100% assignment accuracy for populations of the remaining five crater lakes (Table S4). Results were less clear for populations with ongoing gene flow and/or large population sizes, in particular the Great Lakes Nicaragua and Managua, for which population-specific markers performed poorly (between 62 and 86% assignment accuracy, Table S4). This was not unexpected, as we know that many alleles are shared between the great lakes, and chances are high that alleles found in one of the great lakes can also be found in at least one of the crater lakes that was colonized from this older source population. Therefore, although we could assign individual samples using whole-genome (Kautt et al., 2020) or RAD-seq data (Kautt et al., 2018), single- or double-marker approaches are not sufficient to unambiguously differentiate between Lake Managua and Lake Nicaragua Midas cichlid populations. A similar problem can be observed for the species-specific markers for the species of Crater Lakes Apoyo and Xiloá. Here too, species clearly form pronounced clusters using whole-genome (Kautt et al., 2020) or RAD-seq marker sets (Kautt et al., 2016). Yet, particularly in the sympatric scenario, where speciation occurred within the last 5,000 years (Kautt et al., 2020) and in at least one case gene flow persists (Kautt et al., 2020; Kautt et al., 2018), there might be a strong ascertainment bias when focusing on single SNPs, as has been intensively discussed for SNP datasets from humans (Clark, Hubisz, Bustamante, Williamson, & Nielsen, 2005). In line with this caveat, species-specific markers, with a few exceptions (A. chancho and A. viridis), indeed performed less reliably (12/14 markers have <90% correct assignments; Table S4). Interestingly, the genetic markers for the great lake species, which show extremely low genetic differentiation (FST ~0.02), perform quite well (87% and 81% correctly assigned), particularly when combined (100% correctly assigned) (Fig. S4). This can be explained by the different approach that was taken here. We designed these markers based on our prior knowledge of the genomic basis of the species-defining trait of A. labiatus: hypertrophied, thick lips. As the trait and the underlying associated SNPs (lip size variation links to only a single locus in most populations; Kautt et al., 2020) are almost alternatively fixed between these species, these markers appear to be the most powerful for reliably assigning species. While signals of gene flow between A. labiatus and A. citrinellus can be detected in most of the genome, this is not true for the lip locus on chromosome 8, where the genetic markers are also located.
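To illustrate why two imperfect markers can jointly outperform each marker alone, the following Python sketch scores assignment accuracy for single and combined markers. It is a hedged example only: the majority-of-alleles rule, the function names, the marker names `m1` and `m2`, and the toy genotypes are all assumptions invented for illustration and do not reproduce the data or the assignment procedure of this study.

```python
def assign(genotype, markers):
    """Assign an individual from counts (0, 1, 2) of species-A-type alleles.

    Assumed rule: species "A" if more than half of the scored alleles are
    A-type, species "B" if fewer than half, otherwise "ambiguous".
    """
    a_alleles = sum(genotype[m] for m in markers)
    scored = 2 * len(markers)  # two alleles per diploid marker
    if 2 * a_alleles > scored:
        return "A"
    if 2 * a_alleles < scored:
        return "B"
    return "ambiguous"


def accuracy(individuals, markers):
    """Fraction of individuals whose assignment matches the known species."""
    correct = sum(
        1 for species, genotype in individuals
        if assign(genotype, markers) == species
    )
    return correct / len(individuals)


# Invented toy data: (known species, {marker: count of A-type alleles}).
toy = [
    ("A", {"m1": 2, "m2": 2}),
    ("A", {"m1": 1, "m2": 2}),  # heterozygous at m1, resolved by m2
    ("B", {"m1": 0, "m2": 1}),  # heterozygous at m2, resolved by m1
    ("B", {"m1": 0, "m2": 0}),
]

for marker_set in (["m1"], ["m2"], ["m1", "m2"]):
    print(marker_set, accuracy(toy, marker_set))
```

In this toy case each marker alone leaves one heterozygous individual unresolved, whereas the combined score resolves both.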
Based on our results, we conclude that designing markers from whole-genome data is a powerful approach for distinguishing clearly differentiated species or populations, as well as for the rare cases in which loci with high local differentiation can be used as markers. For populations with ongoing gene flow or instances where the population constitutes the source population (both apply to Great Lakes Nicaragua and Managua), the single/double GB-RFLP marker approach performs poorly, likely because the genomic samples that we used to design the markers give only an estimate of the ‘true’ population allele frequencies (i.e., markers that seem perfect based on our limited genomic data cannot, in reality, unambiguously differentiate populations). The same is true for sympatric species (Crater Lakes Apoyo and Xiloá) without localized differentiation (as opposed to the differentiation found between A. labiatus and A. citrinellus). To make reliable species identification possible, multi-marker assays might be necessary in some instances. These would likely not require the complete set of markers found via RAD-seq or WGS analyses but could be applied with a selected set of markers. Here, one approach would be to use those SNPs that load most heavily on the first principal components for Crater Lakes Apoyo and Xiloá (based on Kautt et al., 2020; Kautt et al., 2016, 2018), thereby giving the most power to distinguish the sympatric species. Such very cost-effective targeted multi-SNP genotyping panels have been used, for example, with 217 SNPs to assign salmon to particular populations (Aykanat, Lindqvist, Pritchard, & Primmer, 2016) and might also be an excellent approach for the Midas cichlid system. Lastly, this set of RFLPs is now available as a resource for conservation purposes, for example to identify individual samples on fish markets, but also for cohort and mark-recapture studies.
This study therefore also presents a workflow for using genomic resources to generate applicable low-budget approaches for species assignment. It thereby introduces a new methodological approach for such efforts, as the implementation of approaches that can help with ‘real-world conservation issues’ often fails, as previously discussed (Shafer et al., 2015).
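The point above about limited genomic samples giving only an estimate of the true allele frequencies can be made concrete with a simple sampling calculation. The sketch below assumes random sampling of independent chromosomes from a large population; the function name and the example values are hypothetical and are not the sample sizes of this study.

```python
def prob_allele_unseen(freq, n_individuals):
    """Probability that an allele is absent from a sample of diploids.

    freq          -- true population frequency of the allele (0 to 1)
    n_individuals -- number of sequenced diploid individuals (2n chromosomes)

    Assumes chromosomes are sampled independently and at random.
    """
    return (1.0 - freq) ** (2 * n_individuals)


# Example values chosen only for illustration.
for n in (5, 10, 20):
    print(n, round(prob_allele_unseen(0.1, n), 4))
```

Under these assumptions, an allele that is actually present in a population at appreciable frequency can be missed entirely in a small sample, so a SNP may appear alternatively fixed in the sequenced individuals while failing as a diagnostic marker in the wider population.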
In summary, we tested a set of 36 PCR-RFLP loci that we designed based on whole-genome re-sequencing data to genetically assign Midas cichlid species and populations. While our analyses reveal limitations for the assignment of species and populations with ongoing gene flow and/or extremely recent divergence, genome-based PCR-RFLPs (GB-RFLPs) offer great benefits when populations with robust genome-wide differentiation (allopatric populations) or local differentiation (A. citrinellus and A. labiatus) have to be identified.