Genetic diversity and demographic inferences
Nucleotide diversity (Ɵπ ),Ɵw (Watterson’s theta, based on segregating sites
(Watterson, 1975)) and Tajima’s D (TD, (Tajima, 1989)) were
computed from the SFS for each species with custom scripts. Significance
of TD was evaluated after 1,000 coalescent simulations of a
constant population model with scaled size Ɵπ . To
test whether sampled demes are isolated or belong to a structured
meta-population and to eventually estimate connectivity, we devised
three alternative evolutionary models for each species (Figure 1) within
an approximate bayesian computation framework (ABC) framework.
Model NS (non-structured) defined an isolated population characterized
by a modern effective population size (Nmod ) switching
instantaneously into an ancestral population size (Nanc ) atTc generations before present. Model FIM specifies a
non-equilibrium finite island model defined by 100 demes exchangingNm migrants each generation under a symmetric migration matrix.
The array of demes is instantaneously colonized Tcol generations
before present from a population with an ancestral size (Nanc ).
Model SST is similar to FIM but demes exchanging migrants only with
their four neighbours (or less, if they are at the border of the array),
in a steppingstone fashion. We performed 50,000 coalescent simulations
from prior distributions using fastsimcoal v.2.6.0.3
(Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013),reproducing the exact number of individuals and loci for each species
(Table 1). We first performed model selection through the random forest
(RF) classification method implemented in the abcRF R package (Pudlo et
al., 2016). We then performed 50,000 additional simulations under the
most supported model in order to estimate demographic parameters with
the abcRF regression method (Raynal et al., 2019). Both model selection
and parameter estimation were computed with the following set of summary
statistics: the SFS, Ɵπ ,Ɵw and TD. The first two axes of a Linear
Discriminate Analysis performed on the previous statistics were also
included for model selection in order to increase the accuracy of the
estimates (Pudlo et al., 2016). Even though Ɵπ ,Ɵw and TD are function of the SFS, they
convey additional information by the non-linear feature of the
functions. Information redundancy among the considered summary
statistics is accounted for by the RF algorithm. Model selection and
parameter estimation were performed twice to check the consistency of
the analyses, and cross validation (or confusion matrix for the model
selection) was performed on the first of the two runs. The number of
trees in each RF algorithm was chosen by monitoring the evolution of the
out-of-bag error (Pudlo et al., 2016).
We investigated the variation in the effective population size
(Ne ) through time by running the composite likelihood approach
implemented in the stairwayplot v.0.2 software (Liu & Fu,
2015). We set the generation time to seven years for C.
melanopterus (Maisano Delser et al., 2016) and to 10 years for the
other species (Cortés, 2002; Pirog et al., 2019) for all demographic
inferences. We applied a mutation rate per generation per site of
8.4×10-9 to the exon capture data of C.
melanopterus (Maisano Delser et al. 2016) and of
1.93×10-8 to the RADseq data for the remaining three
species. This mutation rate was determined by scaling genetic diversity
between ddRAD (obtained under the same protocol of this study) and Exon
Capture data from 12 C. melanopterus individuals from Moorea,
French Polynesia (Supplementary Material).