Statistical analyses
Our main goal was to assess whether the inclusion of sequences from
genetic diversity hotspots (in our case the Iberian and Italian
Peninsulas) increases intra-specific genetic divergence. We did not
consider the Balkan Peninsula, as barely any sequences were available in
BOLD and we had not sampled in that geographic region.
To test whether southern Europe was underrepresented in the Barcode of
Life Data System relative to its species richness, we checked the
geographical distribution of all the study species on the GBIF website
(GBIF.org 2017). Because the study species are common, their
distribution ranges could be assessed reliably from GBIF records. In
parallel, we consulted a second database (Lepidoptera Mundi,
lepidoptera.eu), which is based on records and bibliographical data, to
confirm each species' geographical distribution. We took the southernmost and
northernmost European records for each species of the study group and
assumed that these were the limits of its geographical distribution in
Europe; in between them, the species would be present. We then
quantified the extent to which the number of species recorded decreased
with increasing latitude, starting from the southern Iberian Peninsula.
At the same time, and
taking only the DNA barcodes available in BOLD (not including the
individuals sequenced in this project) we assessed the relationship
between latitude and the number of barcodes. Regression fitting was done
using STATISTICA (StatSoft Inc 2005).
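The range-through assumption and the latitudinal trend described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (they used STATISTICA, and the type of regression is not stated here). The species names, latitudinal limits, 5-degree band width and the choice of a simple linear least-squares fit are all hypothetical values chosen for illustration.

```python
# Hypothetical latitudinal limits (degrees N) per species:
# (southernmost European record, northernmost European record).
SPECIES_LIMITS = {
    "species_a": (36.0, 70.0),
    "species_b": (36.5, 55.0),
    "species_c": (37.0, 48.0),
    "species_d": (40.0, 62.0),
}


def richness_by_band(limits, band_starts, band_width=5.0):
    """Count species whose [south, north] range overlaps each latitude band.

    A species is assumed to be present everywhere between its two limits.
    Each band covers [start, start + band_width).
    """
    counts = {}
    for start in band_starts:
        end = start + band_width
        counts[start] = sum(
            1
            for south, north in limits.values()
            if south < end and north >= start
        )
    return counts


def ols_fit(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


bands = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
richness = richness_by_band(SPECIES_LIMITS, bands)
slope, intercept = ols_fit(bands, [richness[b] for b in bands])
print(richness)
print(f"species per degree of latitude: {slope:.4f}")
```

The same `ols_fit` helper could be applied to the number of barcodes per latitude band, so that the two slopes can be compared.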
To assess whether genetic divergence was higher in pairwise
inter-population comparisons when at least one of the populations was
Iberian or Italian, we fitted linear mixed models (LMMs) using the ‘nlme’
package (Pinheiro, Bates, DebRoy, Sarkar & R Core Team 2017) in R (R
Core Team 2016). We used LMMs because the regression models included
both fixed and random effects. Four types of pairwise contrasts
between populations were defined: i) between two European populations
excluding Iberian and Italian ones (contrasts abbreviated henceforth as
EUEU), ii) between one European population (not Italian) and one Iberian
(abbreviation EUIB), iii) between one European population (not Iberian)
and one Italian (EUIT) and iv) between two Iberian populations (IBIB).
The pairwise comparisons between only Italian populations were not
conducted due to low sample size.
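The four contrast types defined above can be expressed as a small lookup, sketched here in Python. The region codes ("EU", "IB", "IT") and the population names are hypothetical labels introduced for illustration. Pairs that fall outside the four definitions (Italian with Italian, or Iberian with Italian) are returned as None, on the assumption that they were not analysed.

```python
from itertools import combinations

# Hypothetical region codes: "IB" (Iberian), "IT" (Italian),
# "EU" (rest of Europe). Keys are alphabetically sorted region pairs.
CONTRAST_BY_REGIONS = {
    ("EU", "EU"): "EUEU",
    ("EU", "IB"): "EUIB",
    ("EU", "IT"): "EUIT",
    ("IB", "IB"): "IBIB",
}


def contrast_type(region_a, region_b):
    """Return the contrast label for a pair of regions.

    Returns None for pairs outside the four defined contrasts.
    """
    key = tuple(sorted((region_a, region_b)))
    return CONTRAST_BY_REGIONS.get(key)


# Hypothetical populations and their regions.
populations = {
    "pop_north": "EU",
    "pop_central": "EU",
    "pop_spain": "IB",
    "pop_portugal": "IB",
    "pop_italy": "IT",
}

# Label every unordered pair of populations.
pair_labels = {
    (a, b): contrast_type(populations[a], populations[b])
    for a, b in combinations(sorted(populations), 2)
}

for pair, label in pair_labels.items():
    print(pair, label)
```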
We performed three LMM tests: the first to assess whether genetic
divergence differed between EUEU and EUIB pairwise population contrasts;
the second to assess the same between EUEU and EUIT contrasts; and the
third to assess it within the same geographical area (EUEU vs IBIB
contrasts). In all the analyses, the genetic divergence
(measured as K2P% distance) was the dependent variable and the type of
population contrast the independent factor; the pairwise spatial
distance between populations was included as a covariate. Additionally,
the largest number of sequences in each pairwise comparison between
populations was included as a covariate to control for the potential
effect of sample size on genetic divergence. In the EUEU vs
IBIB analysis the spatial range was reduced to 1000 km, as the maximum
distance between any pair of Iberian populations was lower than that. In
all three tests, the Lepidoptera species was included as a random
factor.
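For readers unfamiliar with the dependent variable, the K2P% distance between two aligned sequences can be computed with the standard Kimura two-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. The Python sketch below is illustrative only. The software the authors used to obtain the distances is not stated in this section, and the treatment of gaps and ambiguous bases (skipped here) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def k2p_percent(seq_a, seq_b):
    """Kimura two-parameter distance (in %) between two aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    base are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    d = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return 100 * d


# Hypothetical 10-bp alignment with a single A<->G transition.
print(round(k2p_percent("AAAAAAAAAA", "GAAAAAAAAA"), 3))
```

Each row of the model data would then pair such a distance with the contrast type, the spatial distance between the two populations, the largest number of sequences in the comparison, and the species as grouping factor.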