1. Introduction
Historically, morphological differences have been, and remain, the basis for species identification, taxonomic keys, and efforts in species delimitation. Yet, reliable classification of specimens can be complicated by many factors, for example, when species are morphologically extremely similar or when morphological characters are not expressed at a given life-history stage (e.g., in juveniles). In the last decade, the increasing affordability of reduced-representation sequencing (e.g., restriction-site-associated DNA sequencing or target enrichment) and whole-genome re-sequencing has provided new possibilities to assign species based not only on morphological or meristic characters, but also on genomic information. In some instances, this has even contributed greatly to the discovery and description of new (i.e., previously cryptic) species (Fennessy et al., 2016; Nater et al., 2017). Genetic species assignment approaches also promise to add novel tools to aid conservation efforts for endangered species, but practical implementations often fail (Campbell et al., 2019; Piertney, 2016; Shafer et al., 2015). A major disadvantage of high-throughput sequencing techniques is the cost and time needed to generate libraries, sequence them, and analyze the data. Importantly, however, genomic data also allow for the identification of a suite of informative, diagnostic genetic markers for species or population assignment that can be genotyped using cheaper and faster methods (Shafer et al., 2015).
Among all genetic variants, single-nucleotide polymorphisms (SNPs) are by far the most abundant (in the human population, for example, more than 95% of all genetic variants are SNPs; Auton et al., 2015) and are therefore powerful genetic markers for assigning individuals to populations or species. Over the past 30 years, many methods have been developed to genotype SNPs cost-effectively. One widely used and fast method is PCR restriction fragment length polymorphism (PCR-RFLP) analysis (McKeown, Robin, & Shaw, 2015; Ota, Fukushima, Kulski, & Inoko, 2007). In this approach, a particular DNA fragment is first amplified by PCR. The resulting amplicon is then digested with a restriction enzyme that cuts only one allele at a diagnostic SNP (resulting in two fragments) but not the other (one fragment), owing to an ideally species-specific polymorphism in the enzyme’s recognition site. Individuals homozygous for either allele, as well as heterozygous individuals (three fragments), can be easily distinguished from each other by gel electrophoresis (see the detailed description of the method in Ota et al., 2007). PCR-RFLP is therefore well suited to fast, cheap, and reliable genotyping of diagnostic markers.
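The expected banding patterns can be illustrated with a minimal sketch in Python. This is not the workflow developed in this study; the amplicon sequences are hypothetical, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme. The function names `digest` and `gel_pattern` are likewise illustrative.

```python
# Minimal sketch of PCR-RFLP genotype patterns (illustrative only).
# Assumptions: hypothetical 38-bp amplicons; EcoRI as example enzyme
# (recognition site GAATTC, top-strand cut after the first G).

SITE = "GAATTC"
CUT_OFFSET = 1  # cut position relative to the start of the site

FLANK_5 = "ACGTACGTACGT"
FLANK_3 = "TTGACCATGGCATCAGGTCA"
ALLELE_CUT = FLANK_5 + "GAATTC" + FLANK_3    # intact recognition site
ALLELE_UNCUT = FLANK_5 + "GAGTTC" + FLANK_3  # SNP (A>G) destroys the site


def digest(amplicon, site=SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after cutting at every occurrence of `site`."""
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def gel_pattern(allele_1, allele_2):
    """Distinct band sizes expected on a gel for a diploid genotype.

    Fragments of equal length co-migrate, so they are reported once.
    """
    bands = set(digest(allele_1)) | set(digest(allele_2))
    return sorted(bands, reverse=True)


print(gel_pattern(ALLELE_CUT, ALLELE_CUT))      # homozygous, cut allele
print(gel_pattern(ALLELE_UNCUT, ALLELE_UNCUT))  # homozygous, uncut allele
print(gel_pattern(ALLELE_CUT, ALLELE_UNCUT))    # heterozygous
```

In this toy example, the homozygote for the cut allele yields two bands, the homozygote for the uncut allele yields one, and the heterozygote yields all three, which is what makes the genotypes distinguishable on a gel.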
Recently, we sequenced 453 genomes of a very young species flock of Nicaraguan Midas cichlid fishes (Amphilophus cf. citrinellus) (Kautt et al., 2020). This species complex includes, so far, 13 described species (Torres-Dowdall & Meyer, in press). Two species (A. citrinellus and A. labiatus) can be found in both great lakes, Managua and Nicaragua (Barluenga, Stölting, Salzburger, Muschick, & Meyer, 2006). From there, seven crater lakes (Apoyeque, Apoyo, As. León, As. Managua, Masaya, Tiscapa, and Xiloá) have been colonized (Elmer et al., 2014; Elmer, Lehtonen, Fan, & Meyer, 2013; Elmer, Lehtonen, & Meyer, 2009). In two of the crater lakes, Apoyo and Xiloá, six and four endemic species have been described, respectively (Barlow & Munsey, 1976; Geiger, McCrary, & Stauffer Jr, 2010; Recknagel, Kusche, Elmer, & Meyer, 2013; Stauffer Jr, McCrary, & Black, 2008; Stauffer Jr & McKaye, 2002). In Crater Lake As. Managua, another endemic species, A. tolteca, has been formally described (Recknagel et al., 2013), while the species of the other crater lakes await formal description (which is why we refer to them here as ‘populations’).
Crater lake populations and the sympatric species therein clearly form separate clusters in both RAD-sequencing data (Kautt, Machado-Schiaffino, & Meyer, 2018) and whole-genome data (Kautt et al., 2020). While all crater lake populations and species differ morphologically (Elmer, Kusche, Lehtonen, & Meyer, 2010; Kautt et al., 2018), species assignment can be difficult, especially when specimens are young, and particularly for the sympatric species from crater lakes Apoyo and Xiloá. Methods to quickly genotype fish using genetic markers would therefore give additional confidence in species assignments and would also allow juvenile fish to be identified to species. This is important for certain research questions, including, for example, cohort analyses and unbiased frequency estimations. Moreover, several of these species are protected or live in protected environments where illegal fishing occurs. Cheap genotyping assays with a fast turnaround time might thus contribute to conservation monitoring.
The objectives of this study were therefore to (1) design a workflow to screen for suitable GB-RFLP markers for species and population assignment, (2) test in silico whether those markers would allow unambiguous assignment, and (3) perform GB-RFLP assays on independent samples (i.e., samples that were not used for the design of the markers in (1)) to test whether the markers are suitable for assigning species and populations (i.e., lakes of origin).