Case studies: Species delimitation
Similar to observations reported in previous studies involving large multi-gene datasets, we found species delimitation algorithms to exhibit a tendency toward over-splitting (Sukumaran & Knowles, 2017; Chambers & Hillis, 2020; Dietz et al., 2023). This tendency is probably caused by intraspecific population structure being mistaken for divergence between species, an effect that is expected to increase with dataset size (Leaché et al., 2019). Over-splitting occurred more frequently with the species delimitation software SODA than with tr2, a trend already observed by Joshi et al. (2023). This difference may be caused in part by the larger number of gene trees used by SODA: SODA considers all user-provided gene trees, whereas tr2 can use only those gene trees that contain all samples. The significantly lower amount of over-splitting in Heliconius compared with the other three taxonomic groups is probably caused by the lower number of individuals in the analysis, which may result in fewer intraspecific clusters that the algorithms could mistake for species.

In contrast to this general tendency, some taxa were lumped. The Heliconius subspecies lumped here (Heliconius melpomene aglaope / H. m. amaryllis; Heliconius melpomene melpomene [from Panama] / H. m. rosina) had the lowest Fst values of all taxon pairs involved (Martin et al., 2013), even though they represent highly distinctive morphological forms. These taxa were found to be differentiated almost exclusively at loci related to wing coloration, while the rest of the genome showed very little differentiation (Martin et al., 2013), which would leave little genome-wide signal for the delimitation algorithms to separate them. The lumping in parts of the Darwin’s finch datasets seems to be caused by extensive discordance among gene trees, which may be explained in part by the relative incompleteness of the data, but also by frequent interspecific hybridization (Lamichhaney et al., 2015; Grant & Grant, 2016).
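The difference in input between SODA and tr2 described above can be illustrated with a small sketch. This is not the code of either program. It assumes that gene trees are plain Newick strings with unquoted tip labels, and the helper names (`leaf_labels`, `trees_used_all`, `trees_used_complete`) are hypothetical. It only shows how requiring every sample to be present in a tree can shrink the set of usable gene trees when some loci are missing for some individuals.

```python
import re
from typing import Iterable

# Hypothetical helper: in a simple Newick string, tip labels are the names that
# directly follow "(" or ","; names after ")" are internal-node labels and are skipped.
_LEAF = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)")


def leaf_labels(newick: str) -> set[str]:
    """Return the set of tip labels in a simple, unquoted Newick string."""
    return set(_LEAF.findall(newick))


def trees_used_all(trees: Iterable[str]) -> list[str]:
    """SODA-like input (assumption): every user-provided gene tree is retained."""
    return list(trees)


def trees_used_complete(trees: Iterable[str], samples: set[str]) -> list[str]:
    """tr2-like input (assumption): keep only gene trees that contain every sample."""
    return [t for t in trees if samples <= leaf_labels(t)]


# Toy example with four samples and three gene trees, one of which lacks sample D.
samples = {"A", "B", "C", "D"}
gene_trees = [
    "((A:0.1,B:0.1):0.2,(C:0.1,D:0.1):0.2);",  # all samples present
    "((A,B),C);",                              # sample D missing at this locus
    "((A,C),(B,D));",                          # all samples present
]

print(len(trees_used_all(gene_trees)))                # 3
print(len(trees_used_complete(gene_trees, samples)))  # 2
```

With incomplete data, the complete-sampling filter discards every locus missing even one individual, so the tree set available to a tr2-like analysis can be considerably smaller than the one available to a SODA-like analysis.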