Phylogenomic and syntenic data demonstrate complex evolutionary processes in early radiation of the rosids
Luxian Liu1,2†, Mengzhen Chen1†, Ryan A. Folk3†, Meizhen Wang2, Tao Zhao4, Fude Shang1,5, Douglas E. Soltis6,7, and Pan Li2*
1Laboratory of Plant Germplasm and Genetic Engineering, School of Life Sciences, Henan University, Kaifeng, Henan, 475001, China
2Key Laboratory of Biosystems Homeostasis and Protection (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310058, China
3Department of Biological Sciences, Mississippi State University, Starkville, MS, United States
4State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi, 712100, China
5Henan Engineering Research Center for Osmanthus Germplasm Innovation and Resource Utilization, Henan Agricultural University, Zhengzhou, Henan, 450002, China
6Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611 United States
7Department of Biology, University of Florida, Gainesville, FL, 32611 United States
†These authors contributed equally to this work
*Corresponding author:
Pan Li (Email: panli_zju@126.com, Phone: +8613757152017)
Abstract
Some of the most vexing problems of deep-level relationships in angiosperms involve superrosids. The superrosid clade contains a quarter of all angiosperm species, with 18 orders in three subclades (Vitales, Saxifragales, and core rosids) exhibiting remarkable morphological and ecological diversity. To help resolve deep-level relationships, we constructed a high-quality chromosome-level genome assembly for Tiarella polyphylla (Saxifragaceae), thereby providing a broader genomic representation of Saxifragales. Whole-genome microsynteny analysis of superrosids showed that Saxifragales shared more synteny clusters with core rosids than Vitales did, further supporting Saxifragales as being more closely related to core rosids. To resolve the ordinal phylogeny of superrosids, we screened 122 single-copy nuclear genes from the genomes of 36 species representing all 18 superrosid orders. Vitales were recovered as sister to all other superrosids (Saxifragales + core rosids). Within core rosids, our data suggest relationships that differ dramatically from those reported in earlier studies. Fabids should be restricted to the nitrogen-fixing clade, whereas Picramniales, the Celastrales-Malpighiales (CM) clade, Huerteales, Oxalidales, Sapindales, Malvales, and Brassicales formed an “expanded” malvid clade. The Celastrales-Oxalidales-Malpighiales (COM) clade (sensu APG IV) was not monophyletic. Crossosomatales, Geraniales, Myrtales, and Zygophyllales did not belong to either our well-supported malvids or fabids.
There is a strong discordance between nuclear and plastid phylogenetic hypotheses for superrosid relationships, which can be best explained by a combination of incomplete lineage sorting and ancient reticulation.
Key words: genome assembly, Tiarella polyphylla, Angiosperm-mega 353, phylogeny, superrosids, ancient reticulation.
Introduction
The core eudicots consist of Gunnerales, Dilleniales, superrosids, and superasterids, with the latter two containing the vast majority of flowering plant diversity (Drinnan et al., 1994; Soltis et al., 2018). Superrosids, comprising core rosids (eurosids), Saxifragales, and Vitales, contain more than 90,000 species and thus represent more than a quarter of all angiosperms (Wang et al., 2009; Sun et al., 2020). Superrosid species exhibit remarkable morphological and ecological diversity and include herbs, shrubs, trees, vines, aquatics, succulents, and parasites (Zhao et al., 2016). Many important crops, as well as forest trees, are superrosids (Wang et al., 2009), including members of Rosales (e.g., apple, jujube, and mulberry), Vitales (grape), Cucurbitales (watermelon, cucumber), Fabales (peanut, soybean), Fagales (walnut, waxberry, oak), and Brassicales (radish, mustard, and cabbage). Several superrosid orders, such as Malvales, Myrtales, Cucurbitales, Fabales, Rosales, and Saxifragales, exhibit exceptionally high diversification rates among angiosperms (Magallon & Sanderson, 2001; Folk et al., 2019; Sun et al., 2021). The enormous diversity and the ecological and economic importance of superrosid species highlight the need for greater resolution of superrosid phylogeny.
The monophyly of superrosids has been recovered repeatedly in previous studies, with both organellar (Moore et al., 2010; Sun et al., 2015; Li et al., 2019a) and nuclear genes (Zhang et al., 2012; One Thousand Plant Transcriptomes Initiative, 2019; Sun et al., 2021), as well as combined datasets (Wang et al., 2009; Sun et al., 2020). However, relationships within superrosids have proven more problematic. In APG IV (2016), Saxifragales were sister to Vitales plus core rosids, a topology found in multiple phylogenetic studies of mostly plastid genes (e.g., Wang et al., 2009; Soltis et al., 2011; Li et al., 2019a). The core rosid clade, in turn, consisted of fabid and malvid subclades. The fabids contained the COM clade (Celastrales, Oxalidales, and Malpighiales), the nitrogen-fixing clade (Fabales, Rosales, Cucurbitales, and Fagales), and Zygophyllales, whereas the malvids included Geraniales, Myrtales, Crossosomatales, Picramniales, Sapindales, Huerteales, Malvales, and Brassicales.
Although superrosids have long been the focus of phylogenetic research (Wang et al., 2009; Soltis et al., 2011; Zhang et al., 2012; Li et al., 2019a; Sun et al., 2020), relationships remain problematic, in part because of rapid radiation (Wang et al., 2009) combined with substantial recent evidence of incongruence between nuclear and plastid topologies (Zhang et al., 2012; Li et al., 2019a; Sun et al., 2020). Key problems in our understanding of relationships in superrosids remain: 1) Are Saxifragales or Vitales the sister lineage of core rosids? 2) What are the major subclades within core rosids, and what orders should be included in fabids vs. malvids? 3) What are the relationships among COM clade members, and do they actually form a monophyletic group? An improved nuclear-based phylogeny of superrosids and core rosids would help provide a better understanding of the evolutionary history of this enormous clade.
Previous phylogenetic studies of superrosids were primarily based on plastid and mitochondrial genes or relied on a small number of nuclear genes (Wang et al., 2009; Moore et al., 2010; Zhang et al., 2012; Sun et al., 2016; Li et al., 2019a; Sun et al., 2020), with a recent exception that includes numerous nuclear genes derived from transcriptomes (One Thousand Plant Transcriptomes Initiative, 2019). Organellar genomes (mitochondrial genomes and plastomes) are generally inherited uniparentally, and the mitochondrial genome is slowly evolving and sometimes affected by horizontal gene transfer, which introduces biases and errors in phylogenetic reconstruction (Birky, 2001; Davis et al., 2014); likewise, the plastome is frequently transferred horizontally through introgression (Okuyama et al., 2005; Stegemann et al., 2012). In contrast, nuclear genes are inherited biparentally and show higher substitution rates than organellar genes, thereby overcoming many of these issues (Springer et al., 2001; Davis et al., 2014). In particular, low- or single-copy nuclear genes provide a crucial line of evidence for resolving angiosperm phylogeny (Zeng et al., 2014; Zhang et al., 2020), and the importance of using these genes for phylogenetic reconstruction has long been recognized (Strand et al., 1997; Duarte et al., 2010; Zhang et al., 2012). Therefore, the use of a sufficient number of single- or low-copy nuclear genes coupled with broad taxon sampling is a promising approach to elucidate angiosperm phylogeny (Duarte et al., 2010; Soltis et al., 2018; One Thousand Plant Transcriptomes Initiative, 2019). In green plants, however, identifying orthologous loci has proven difficult because of frequent whole-genome duplication events, especially in angiosperms (Blanc & Wolfe, 2004; Barker et al., 2009).
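The idea of screening for single-copy loci can be illustrated with a minimal sketch. It is not the pipeline used in this study; the function name `single_copy_orthogroups`, the `min_occupancy` parameter, and the toy gene families are all hypothetical and serve only to show why duplicated genes (e.g., paralogs retained after whole-genome duplication) are excluded.

```python
from typing import Dict, List

# Hypothetical toy input: orthogroup ID -> {species name: [gene IDs]}.
Orthogroups = Dict[str, Dict[str, List[str]]]


def single_copy_orthogroups(
    groups: Orthogroups, species: List[str], min_occupancy: float = 1.0
) -> List[str]:
    """Return IDs of orthogroups with at most one gene copy per species
    that are present in at least `min_occupancy` (fraction) of the species."""
    kept = []
    for og_id, members in groups.items():
        counts = [len(members.get(sp, [])) for sp in species]
        # Any species with more than one copy signals a duplication; skip it.
        if any(c > 1 for c in counts):
            continue
        occupancy = sum(1 for c in counts if c == 1) / len(species)
        if occupancy >= min_occupancy:
            kept.append(og_id)
    return kept


# Illustrative data only (gene IDs are invented).
species = ["Vitis", "Tiarella", "Arabidopsis"]
groups = {
    "OG1": {"Vitis": ["v1"], "Tiarella": ["t1"], "Arabidopsis": ["a1"]},
    "OG2": {"Vitis": ["v2"], "Tiarella": ["t2a", "t2b"], "Arabidopsis": ["a2"]},
    "OG3": {"Vitis": ["v3"], "Arabidopsis": ["a3"]},
}

strict = single_copy_orthogroups(groups, species)
relaxed = single_copy_orthogroups(groups, species, min_occupancy=0.6)
print(strict, relaxed)
```

In this toy case, the strict setting keeps only the gene family present exactly once in every species, whereas relaxing the occupancy threshold also admits a family that is missing from one species; the family with a duplicated copy is rejected under both settings.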
The increasing availability of genomic resources held in public repositories and the availability of many newly developed bioinformatic pipelines to identify low- or single-copy genes have enabled bait kit design for orthologous genes from a wide range of flowering plant groups (Campana, 2018; Vatanparast et al., 2018; McLay et al., 2021). Universal bait kits, such as the Angiosperms353 loci used in this study, aim to capture the same set of loci from samples representing significant phylogenetic breadth and evolutionary timescales (Bossert & Danforth, 2018; Johnson et al., 2019; Breinholt et al., 2021). The Angiosperms353 probe set has now been widely used to study relationships in different groups (Maurin et al., 2021; Thomas et al., 2021; Zuntini et al., 2021; Acha & Majure, 2022).
Increasing amounts of genomic data have been applied to resolve rapid radiations in both green plant (Carlsen et al., 2018; Rouard et al., 2018) and animal (Malinsky et al., 2018; Jensen et al., 2021) lineages. Much of this work has used large numbers of coding regions extracted from genomes; however, chromosome-level genomes offer an additional path to assessing phylogenetic relationships via microsynteny, which is particularly valuable for resolving recalcitrant phylogenetic nodes (Zhao et al., 2021). A number of genome assemblies have been published for Vitales (Massonnet et al., 2020; Minio et al., 2022), as well as for diverse families and orders of the core rosids (Wang et al., 2021b; Wang et al., 2022a), including Rosales (Jiao et al., 2020; Cao et al., 2022), but few high-quality genomic resources have been obtained for Saxifragales, preventing the use of this information to resolve phylogeny or understand genome evolution in the earliest radiation of the superrosids. Although small, Saxifragales are an ancient and morphologically diverse group (Jian et al., 2008; Soltis et al., 2018) with an early and rapid radiation (~89.5 to 110 Ma) that has made resolving phylogenetic relationships challenging (Fishbein et al., 2001; Wang et al., 2009; Jian et al., 2008; Dong et al., 2018; Folk et al., 2019). Across the 15 families of Saxifragales, seven whole-genome assemblies from four families are available: Paeonia ostii T. Hong and J. X. Zhang (Yuan et al., 2022), Paeonia suffruticosa Andrews (Paeoniaceae, Lv et al., 2020), Hamamelis virginiana L. (Hamamelidaceae, Korgaonkar et al., 2021), Cercidiphyllum japonicum Siebold et Zucc. (Cercidiphyllaceae, Zhu et al., 2020), and three Crassulaceae species (Kalanchoe fedtschenkoi Raym.-Hamet et H. Perrier, Yang et al., 2017; Rhodiola crenulata (Hook. f. et Thoms.) H. Ohba, Fu et al., 2017; Sedum album L., Wai et al., 2019). However, of these assembled genomes, only those of C. japonicum and P. ostii are assembled at the chromosomal level. To improve the genome resources for Saxifragales and provide the genome-scale data needed for our analyses of relationships, we produced a chromosome-level genome assembly for Tiarella polyphylla D. Don (Saxifragaceae) (Fig. 1-A). This species has a wide distribution (Wu & Raven, 2003); it is an ideal model for use in future biogeographic studies as well as to investigate the features of Saxifragaceae (e.g., it is used in traditional medicine; Lee et al., 2012; Kim et al., 2021).
In this study, we (1) used gene sequence data from numerous nuclear loci representing all orders of superrosids to resolve relationships and evolutionary history; (2) constructed a high-quality chromosome-level reference genome assembly for T. polyphylla to help elucidate evolutionary history; and (3) combined our newly generated genome with published complete nuclear genome sequences to conduct microsynteny analyses of superrosids to further resolve relationships.