Case studies: Species delimitation
Similar to observations reported in previous studies involving large
multi-gene datasets, we found species delimitation algorithms to exhibit
a tendency for over-splitting (Sukumaran & Knowles, 2017; Chambers &
Hillis, 2020; Dietz et al., 2023). This tendency is probably caused by
intraspecific population structure being mistaken for divergence between
species, an effect that is expected to correlate positively with
dataset size (Leaché et al., 2019). Over-splitting occurred more
frequently with the species delimitation software SODA than with
the software tr2, a trend already observed by Joshi
et al. (2023). This difference in the tendency to over-split may be
caused in part by the larger number of gene trees used by SODA compared
to tr2: SODA considers all user-provided gene trees, whereas tr2 can use
only those gene trees that contain all samples. The
significantly lower amount of over-splitting in Heliconius compared with
the other three taxonomic groups is probably caused by the smaller number
of individuals in the analysis, which may result in fewer intraspecific
clusters that the algorithms could mistake for species. The subspecies of
Heliconius lumped here (Heliconius melpomene aglaope / H. m. amaryllis;
Heliconius melpomene melpomene [from Panama] / H. m. rosina) had the
lowest Fst values of all involved taxon pairs (Martin et al., 2013), even
though they represent highly distinctive morphological forms. These taxa were found to be
differentiated almost exclusively at loci related to wing coloration,
while the rest of the genome showed very little differentiation (Martin
et al., 2013). The lumping in some parts of the Darwin’s finch datasets
seems to be caused by a high degree of discordance among gene trees,
which may be explained partly by the relative incompleteness of the
data and partly by frequent interspecific hybridization (Lamichhaney et
al., 2015; Grant & Grant, 2016).
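The contrast in gene-tree input between SODA and tr2 described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each gene tree is reduced to the set of sample labels it contains (topology is ignored); the function names and toy loci are hypothetical and do not reflect the interface or internals of either program. It shows how a single sample missing from one locus removes that locus under a complete-coverage rule, while an all-trees rule keeps it.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: a gene tree reduced to the set of sample (tip)
# labels it contains. Topology is ignored; only sample coverage matters here.
GeneTree = FrozenSet[str]


def trees_used_all(gene_trees: Dict[str, GeneTree]) -> List[str]:
    """Keep every user-provided gene tree (the behaviour described for SODA)."""
    return sorted(gene_trees)


def trees_used_complete(gene_trees: Dict[str, GeneTree],
                        samples: FrozenSet[str]) -> List[str]:
    """Keep only gene trees containing every sample (the behaviour described for tr2)."""
    return sorted(name for name, tips in gene_trees.items() if samples <= tips)


# Toy data (hypothetical): locus2 lacks sample B2, e.g. because of missing data.
samples = frozenset({"A1", "A2", "B1", "B2"})
gene_trees = {
    "locus1": frozenset({"A1", "A2", "B1", "B2"}),
    "locus2": frozenset({"A1", "A2", "B1"}),
    "locus3": frozenset({"A1", "A2", "B1", "B2"}),
}

print(trees_used_all(gene_trees))                # ['locus1', 'locus2', 'locus3']
print(trees_used_complete(gene_trees, samples))  # ['locus1', 'locus3']
```

With incomplete data, the all-trees rule therefore retains more loci than the complete-coverage rule, which is consistent with SODA drawing on a larger number of gene trees than tr2.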