Introduction
During the past two decades, DNA-based approaches have increased the
quality and reproducibility of species delimitation and identification
(Ahrens, 2023). Standardized and automated species recognition using DNA
has made it easy to link taxonomic information with diverse biological
questions and applied research aspects (e.g., rapid assessment of
biodiversity; Yu et al., 2012). Species delimitation and identification
of animals are often based on information from a single mitochondrial
gene, cytochrome oxidase I (COI) (Hebert et al., 2003; Fontaneto et al.,
2015). Such single-marker reliance can lead to errors due to
extrachromosomal inheritance, incomplete lineage sorting, sex-biased
dispersal, asymmetrical introgression, and Wolbachia-mediated
genetic sweeps of the marker (Funk & Omland, 2003; Ballard & Whitlock,
2004). At the same time, species delimitation approaches using
nuclear-encoded markers have considerably improved in accuracy, allowing
them to complement the currently established single-gene barcoding approach
(Dowton et al., 2014; Eberle et al., 2020; Gueuning et al., 2020;
Prebus, 2021; Erikson et al., 2021; Dietz et al., 2023).
Besides mitochondrial genes, a variety of conserved nuclear markers have
been used for species delimitation in different phylogenetic groups of
Metazoa, such as nuclear ribosomal RNA genes (Lebonah et al., 2014; Chen
et al., 2017; Krehenwinkel et al., 2019) and various housekeeping genes
(Joshi et al., 2022). Furthermore, restriction site-associated DNA
sequences (RADseq) (Baird et al., 2008; Pante et al., 2015; Herrera &
Shank, 2016) and ultra-conserved elements (UCE) linked to more rapidly
evolving flanking regions (Faircloth et al., 2012; Bejerano et al.,
2004; Ješovnik et al., 2017; Zarza et al., 2018; Gueuning et al., 2020;
Prebus, 2021) have been used. However, these nuclear marker systems can
hardly be applied universally across animals, either because they
insufficiently capture intraspecific variation or because they do not
provide orthologous loci across distantly related taxa (Pierce, 2019;
Eberle et al., 2020).
Recently, Metazoa-level Universal Single Copy Orthologs (USCOs) have
been proposed as a universal marker set for species-level DNA taxonomy
of animals as an extension and improvement of conventional DNA barcoding
(Eberle et al., 2020). USCOs are defined as protein-coding genes that
are present and single-copy in at least 90% of the species within the
available genomes of a given taxonomic group. They were originally
developed to benchmark the quality of genome assemblies (“BUSCO”,
Simão et al., 2015). However, they also proved to be highly informative
for addressing phylogenomic questions (Waterhouse et al., 2018;
Fernández et al., 2018; Zhang et al., 2019; Stolle et al., 2022). This
insight has led to the development of a recently published automated
software pipeline that extracts USCOs from genome assemblies and
generates phylogenies from the extracted sequence data (Sahbou et al.,
2022). Finally, Metazoa-level USCOs (mzl-USCOs) have been shown to
distinguish highly similar morphospecies (even when COI could not)
and to allow reliable estimation of their phylogenetic
relationships in several clades of arthropods and vertebrates (Dietz et
al., 2023).
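The single-copy criterion in the USCO definition above can be illustrated with a minimal sketch. It assumes a hypothetical table of per-species copy numbers for each candidate gene; the function name `select_uscos`, the input layout, and the gene and species labels are illustrative assumptions and are not part of the BUSCO software or of any pipeline cited here.

```python
from typing import Dict, List


def select_uscos(
    copy_numbers: Dict[str, Dict[str, int]],
    species: List[str],
    min_fraction: float = 0.9,
) -> List[str]:
    """Return genes that are single-copy in at least `min_fraction` of `species`.

    `copy_numbers` maps gene -> {species: copies found} (hypothetical layout).
    A species missing from the inner mapping counts as zero copies.
    """
    if not species:
        return []
    selected = []
    for gene, counts in copy_numbers.items():
        # Count species in which the gene is present exactly once.
        single_copy = sum(1 for s in species if counts.get(s, 0) == 1)
        if single_copy / len(species) >= min_fraction:
            selected.append(gene)
    return sorted(selected)


# Toy data: ten species, two candidate genes.
species = [f"sp{i}" for i in range(10)]
copy_numbers = {
    # Single-copy in nine species, absent in one: meets the 90% threshold.
    "geneA": {s: 1 for s in species[:9]},
    # Single-copy in eight species, duplicated in two: fails the threshold.
    "geneB": {**{s: 1 for s in species[:8]}, "sp8": 2, "sp9": 2},
}
uscos = select_uscos(copy_numbers, species)
```

In this toy example only `geneA` is retained, because a duplicated or missing gene both count against the single-copy fraction.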
What has remained unclear is whether mzl-USCOs can be considered a
genetically unlinked representative sample of a species’ genome, which
is a prerequisite for USCOs being reliable and useful in
coalescent-based phylogenetic analyses and applications. Knowledge of
the spatial distribution and physical linkage of mzl-USCOs is hence
fundamental to assess whether these markers are indeed as suitable for
delimiting species with coalescent-based approaches as currently
assumed. We here study the two parameters “spatial distribution” and
“physical linkage” by extracting USCOs from published whole genomes
assembled to chromosome level (WG) of various species of Metazoa and
analyzing the physical distances between USCOs and their distribution
across chromosomes. Furthermore, using unassembled reads from whole
genome sequencing (WGS) datasets of four metazoan lineages (i.e.,
Anopheles mosquitoes, Drosophila fruit flies, Heliconius butterflies,
and Darwin’s finches), we assess to what
extent phylogenetic analysis of the extracted mzl-USCOs provides results
consistent with those of previous studies that used more extensive sets
of markers from the same genomes.
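As a rough illustration of what measuring physical distances between USCOs on a chromosome-level assembly can involve, the following sketch computes the gaps between neighbouring loci on each chromosome. It is a simplified assumption about one possible approach, not the procedure used in this study; the coordinate layout `(chromosome, start, end)` and the function name `adjacent_distances` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical locus record: (chromosome, start, end) in base pairs.
Locus = Tuple[str, int, int]


def adjacent_distances(loci: List[Locus]) -> Dict[str, List[int]]:
    """Return, per chromosome, the gaps between neighbouring loci.

    The gap is measured from the end of one locus to the start of the next;
    overlapping loci are assigned a gap of zero.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in loci:
        by_chrom[chrom].append((start, end))

    gaps: Dict[str, List[int]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()  # order loci by start coordinate
        gaps[chrom] = [
            max(0, next_start - prev_end)
            for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])
        ]
    return gaps


# Toy coordinates for four loci on two chromosomes.
loci = [
    ("chr1", 100, 200),
    ("chr1", 1000, 1500),
    ("chr2", 50, 80),
    ("chr1", 400, 450),
]
distances = adjacent_distances(loci)
```

The number of loci per chromosome, obtained from the same grouping, gives a simple view of how evenly the markers are spread across chromosomes.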