Case studies: phylogeny and nucleotide sequence variation of
mzl-USCOs
In all four case studies, most interspecific and many intraspecific
nodes of the inferred phylogenetic trees had high (i.e., >
90) branch support, and the trees showed few topological differences between
datasets obtained by different USCO extraction methods (Fig. 4; Figures
S10–13). We detected few topological differences between trees inferred
from concatenation supermatrices and trees inferred by using a
multispecies coalescent approach on gene trees.
In the Anopheles gambiae complex, the topology of
interspecific nodes in all USCO-based trees (Figure S10) was identical
to the published one inferred by applying the maximum likelihood
optimality criterion on aligned WGS data (Fontaine et al., 2015), except
that we found neither A. gambiae nor A. coluzzii to be
monophyletic. However, the topology of Fontaine et al. (2015) differed
from the species tree inferred by the same authors from the X chromosome
data only. According to Fontaine et al. (2015), the X chromosome-derived
tree more likely represents the true phylogeny of the group, because the
remainder of the genome exhibits extensive signatures of introgression.
The USCO-derived topology suggested the monophyly of all species except
A. gambiae and A. coluzzii. Monophyly of the latter was
also not recovered by Fontaine et al. (2015) when they analyzed
SNPs extracted from WGS data with the neighbor-joining tree
inference method. Only in the tree obtained from concatenated data
containing mzl-USCOs extracted with Orthograph/OrthoDB v. 9 were both
species found to be reciprocally monophyletic. NMDS plots that
visualized the similarity in SNPs showed nearly all species as clearly
distinct clusters irrespective of the applied USCO extraction method
(Figure S14). The only exceptions were A. gambiae and A.
coluzzii, which together formed a single cluster. Our model-based clustering
analyses using STRUCTURE also showed all species, with the exception of
A. coluzzii and A. gambiae, as separate clusters with some
levels of admixture (Figure S11; Supplementary Text).
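The notion of monophyly used above can be illustrated with a minimal sketch: a set of individuals is monophyletic on a rooted tree if some clade contains exactly those individuals and no others. The nested-tuple tree encoding, the function names, and the tip labels are assumptions made for illustration only; this is not the software used in the study.

```python
def leaf_sets(tree):
    """Return (leaves of this subtree, leaf sets of every clade within it).

    A tree is either a tip label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        tip = frozenset([tree])
        return tip, [tip]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)  # the clade rooted at this node
    return leaves, clades


def is_monophyletic(tree, tips):
    """True if some clade of the rooted tree contains exactly `tips`."""
    _, clades = leaf_sets(tree)
    return frozenset(tips) in clades


# Hypothetical toy tree: two species whose individuals are intermixed,
# plus one outgroup individual.
toy_tree = ((("gam1", "col1"), ("gam2", "col2")), "ara1")
```

On this toy tree, neither `{"gam1", "gam2"}` nor `{"col1", "col2"}` is monophyletic, whereas the four individuals taken together form a clade.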
In the Drosophila nasuta complex, our analyses inferred
most species to be monophyletic (Figure 4; S11). These findings are
largely consistent with those reported by Mai et al. (2019).
(Sub)species that were not inferred as monophyletic in our
phylogenetic analyses were also not resolved when applying NMDS or
STRUCTURE (Figure 4). Otherwise, all (sub)species were clearly
distinguishable from each other (Supplementary Text).
Regarding Heliconius butterflies, our phylogenies inferred from
analyzing mzl-USCOs largely agreed with the phylogeny published by
Martin et al. (2013) (Figure S12). We found only a few topological
differences between analyses that were based on different data
extraction approaches and/or phylogenetic reconstruction methods (see
Supplementary Text for details). STRUCTURE (Pritchard et al., 2000) and
NMDS revealed clusters that were largely consistent with the topology of
the phylogenetic trees, with few exceptions described in the
Supplementary Text. Analyses based on the datasets from the three USCO
extraction approaches gave very similar results (Fig. 4; Figures S14,
15). Even when we allowed STRUCTURE to find more clusters than there are known
(sub)species in the analyzed sample by specifying a K value higher than
5, the clustering never supported more than five clusters, and
individuals were always assigned to clusters with a probability of more
than 90%. A small amount of admixture was detected between sympatric
populations (e.g., those of Heliconius melpomene and H.
timareta in Peru).
In Darwin’s finches, the alignment completeness of extracted mzl-USCOs
was very low (Table 2). The incompleteness of the Darwin’s finches
datasets was likely caused by low sequence coverage (< 10x)
and, in consequence, poor assembly quality. Therefore, for the analysis
of sequence variation we included not only SNPs present in all
individuals (as in the other case studies), but also SNPs absent in fewer
than five individuals. Possibly due to the large amount of missing data in the
alignments, the inferred phylogenetic trees differed in many details
from each other and from the original maximum-likelihood tree based on
WGS data (Lamichhaney et al., 2015). Consequently, NMDS plots of
SNP similarity also did not allow us to visually distinguish between
species within the genus Camarhynchus or between most of the species within
Geospiza, except for the species G. difficilis and
G. septentrionalis. However, differentiation between genera was
clearly visible. SNP clustering with STRUCTURE likewise did not allow us to
distinguish species of Camarhynchus from each other or to
distinguish some species of Geospiza from each other
(Supplementary Text).
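The relaxed SNP inclusion rule described above for the Darwin’s finches data (keeping variable sites that are missing in fewer than five individuals, rather than only sites present in all individuals) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the alignment is assumed to be a dictionary of equal-length sequences, the characters N, - and ? are assumed to mark missing data, and the function name `snp_columns` is hypothetical.

```python
MISSING = set("N-?")  # assumed symbols for missing or gapped positions


def snp_columns(alignment, max_missing=0):
    """Return indices of variable columns with at most `max_missing` missing calls.

    `alignment` maps individual names to aligned sequences of equal length.
    A column counts as a SNP if at least two different non-missing bases occur.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    kept = []
    for i in range(length):
        column = [seq[i].upper() for seq in seqs]
        n_missing = sum(base in MISSING for base in column)
        alleles = {base for base in column if base not in MISSING}
        if n_missing <= max_missing and len(alleles) >= 2:
            kept.append(i)
    return kept


# Hypothetical toy alignment of four individuals.
toy_alignment = {
    "ind_a": "ACGT",
    "ind_b": "ACTT",
    "ind_c": "ANTA",
    "ind_d": "ACT-",
}
```

With `max_missing=0` the filter corresponds to the strict rule used in the other case studies; "absent in fewer than five individuals" corresponds to `max_missing=4`.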