Molecular single-gene species delimitation
The sequences of each species alignment were collapsed into unique haplotypes using the online .fasta sequence toolbox FaBox (Villesen 2007). We discarded four species either because no sequences were available from any of the southern European peninsulas (Orthosia cerasi ) or elsewhere on the continent (Eupithecia massiliata ,Phycita torrenti and Dryobota labecula ). We removed them because those missing data precluded any pairwise comparison between populations from at least one southern Peninsula and the rest of the continent. Our final database included a total of 18 taxa, 17 species plus the subspecies Colotois pennaria ssp. carbonii , which was treated as distinct species in the analyses on intra-specific genetic divergence (Table 1). We did so to be conservative and avoid inflating intra-specific genetic divergence in Colotois pennaria . The ssp. carbonii is morphologically distinct (Sihvonen & Skou 2015), and so the intra-specific divergence would be expected to be higher than in those species in which no subspecies have been described so far. The alignments of the unique haplotypes of each of the 18 species were visually inspected one by one in MEGA 7 (Kumar 2016); we corroborated that there had been no errors and all sequences differed in at least one base. Then, the haplotypes of all species were pooled together and subsequently aligned again; this was the basic and final alignment used for species delimitation with the Generalized Mixed Yule Coalescent (GMYC) method.
The GMYC single-locus method identifies the number of independent operational taxonomic units (OTUs) or putative species present in a sample of sequences under the maximum likelihood solution (Pons et al . 2006; Fujisawa & Barraclough 2013). Some OTUs can correspond to cryptic species or isolated genetic lineages, thus, this sort of analyses may well illustrate the patterns of geographical genetic variability of these insects in Europe. GMYC requires an ultrametric gene tree as input, which was built using the software Bayesian Evolutionary Analysis Sampling Trees (BEAST 1.7.5; Drummond, Suchard, Xie & Rambaut 2012). To select appropriate partitioning scheme and substitution model for the three-codon positions of COI, we used PartitionFinder version 1.1.1 (Lanfear, Calcorr, Ho & Guindon. 2012). We tested between the available models in BEAST using the Bayesian Information Criteron (BIC) and between all possible partitioning schemes. PartitionFinder supported three separate partitions for the codon positions, and three different models were selected according to the BIC scores (equal-frequency Tamura-Nei model with Gamma correction (G), Hasegawa Kishino Yano model with invariable sites (I) and equal-frequency Tamura-Nei model with Gamma correction (G) for 1st, 2nd and 3rd codon positions respectively). We used a strict clock model with rate fixed to 1 and a constant size coalescent tree prior as this could be considered conservative towards the null model when testing against the GMYC model in a likelihood ratio test (Monaghan 2009). The effects of tree reconstruction method and model for the GMYC results have been investigated and, in general, a Bayesian estimation under a coalescent tree prior has performed well in comparisons (Monaghan 2009; Talavera, Dincă & Vila 2013; Tang, Obertegger, Fontaneto & Barraclough 2014). Talavera et al . (2013) found no difference in GMYC results between using a strict or relaxed clock model for inferring the input ultrametric tree. We ran two Markov Chain Monte Carlo (MCMC) runs each with 10 million generations, sampled every 1000 generations, which were merged using LogCombiner. Convergence and Effective Sample Size (ESS) values for sampled model parameters were monitored in Tracer version 1.6 (Rambaut, Suchard, Xie & Drummond 2014). TreeAnnotator was used to select the maximum clade credibility tree (MCC tree) from the sampled trees (burnin=0.25) with posterior median values used for node heights.
The maximum clade credibility (MCC) tree with branch lengths was imported in R version 3.3.1 (R Core Team 2016) and the GMYC analysis was conducted using the splits package version 1.0 (Ezard, Fujisawa & Barraclough 2009), assisted by the ‘APE’ package (Paradis, Claude & Strimmer 2004). The ‘splits’ package calculates the likelihood of the tree under a single coalescent (null model) and under a GMYC model, which may follow the single threshold or the multiple threshold assumptions. In the first, a single threshold is placed at every node in the tree, and the threshold at the maximum likelihood solution delimits the number of evolutionary units. This method has shown to display close correlation with the number of species in the tree (Fujisawa & Barraclough 2013). The multiple threshold GMYC relaxes the assumption that speciation events must be older than all coalescent events in the gene tree. The method iteratively splits and fuses existing species clusters starting from the single threshold solution, until no further improvement in the maximum likelihood occurs (Fujisawa & Barraclough 2013). We performed both and compared the results in terms of the number of OTUs delimited. Besides these tree-based methods, we used two distance-based approaches for delimiting the number of OTUs, namely ABGD (Automatic Barcode Gap Discovery) (Puillandre, Lambert, Brouillet & Achaz 2012), and jMOTU (Jones, Ghoorah & Blaxter 2011). ABGD uses genetic divergences between sequences and a coalescent model to identify a barcode gap between intra- and interspecific distances, and defines OTUs accordingly (Puillandre et al. 2012). jMOTU takes one, or a range of, user-supplied distance cut-off values and clusters sequences using single-linkage clustering so that no member of one OTU is closer than the cut-off to any member of any other cluster (Jones et al.2011). A detailed explanation of the basis and principles of each method are provided in the supplementary information (Table S3 and Fig. S1).