Comparative analysis of chloroplast genomes of endangered heterostylous species Primula wilsonii and its closely related species
Yan-Ping Xie 1, Gang-Gang Yang 2, Chan Zhang 2, Xing-Wang Zhang 1,* and Xian-Feng Jiang 3,*
1 School of Life Sciences, Huaibei Normal University, Huaibei 253000, China; xieyanping0001@163.com;
2 School of Life Sciences, Henan Normal University, Xinxiang 453007, China; yangganggang@htu.deu.cn; zhangchan0705@163.com
3 College of Agriculture and Bioscience, Dali University, Dali 671003, China
* Correspondence: zhangxingwang79@126.com and jiangxianfeng@dali.edu.cn
Abstract: Primula , well known for its heterostyly, is the largest genus in the family Primulaceae with more than 500 species. The considerable species number has introduced a huge challenge for taxonomy. Although several phylogenetic constructions have been carried out thoroughly, the relationships between Primula species were remained incompletely understood, especially for the relationship among sections within Chinese species. P. wilsonii Dunn is a PSESP (plant species with extremely small populations) with very limited genetic information to explore its endangered mechanism and conservation. In this study, we sequenced and assembled the complete chloroplast genomes of P. wilsonii using Illumina sequencing and compared its genomic sequences with those of four related Primula species. The chloroplast genomes of Primula species were similar in the basic structure, gene order and GC content. The detected 38 SSRs loci and 17 hyper-variable regions had many similarities in P. wilsonii ,P. anisodora , P. miyabeana and P. poissonii , but showed a significant difference compared with those in P. secundiflora . Slight variations were observed among Primulachloroplast genomes, in consideration of the relatively stable patterns of IR contraction and expansion. Phylogenetic analysis based on chloroplast genomes confirmed three major clades in ChinesePrimula , but the infrageneric sections were not in accordance with morphological traits. The P. poissonii complex was confirmed here and P. anisodora was the species that was most closely related to P. wilsonii . Overall, the chloroplast genome sequences provided useful genetic and evolutionary information for phylogeny, population genetics and conservation studies on Chinese Primulaspecies.
Keywords: plastid genome; Chinese Primula ; phylogeny; sequence divergences; plant species with extremely small populations
1. Introduction
As a genetically controlled floral polymorphism, heterostyly has evolved numerous times in at least 30 families of angiosperm plants [1-3], and this number is increasing with the emergence of new reports [4-7]. Heterostylous plants have a reciprocal arrangement of stigma and anther heights, which are referred to as reciprocal herkogamy and are believed to promote outcrossing [8, 9]. Reciprocal herkogamy is often linked with a self-incompatibility system that prevents or strongly limits self- and intramorph fertilization [10]. Heterostylous plants may comprise distylous (long style and short style) or tristylous (long style, middle style, and short style) morphs [11]. Since the heterostyle was described by Darwin in his classic book The different forms of flowers on plants of the same species , it has intrigued numerous researchers for centuries [9, 12-14].
Primula is famous for its mating with the heterostyly system and high ornamental value. This group comprises about 500 species, with more than 300 species distributed in China, particularly clustering in the Himalaya–Hengduan Mountains [15]. The exceptionally high species diversity of Primula was presumed to be triggered by the extensive uplifts of the Qinghai–Tibet Plateau and climate oscillations during the Quaternary [16]. The taxonomic study of Primulahas been revised many times according to morphological characters. The infrageneric system with a total of 31 sections of Primula was initially proposed by Smith and Fletcher [17]. Later, a revised system with seven subgenera was proposed that considered some putative reticulate evolutionary relationships [18]. Then, a six-subgenera system was amended in Primula by Richards [15], whereas, the subgenus concept was not adopted in the delimitation of the ChinesePrimula ; a total of 24 sections were delimited [19]. Moreover, many key diagnostic traits, such as the shape of calyx, are slightly different and nonquantitative. The frequently happened natural hybridization and gene introgression also confuse species boundaries in this genus [20-24]. Although an increasing number of phylogenetic constructions have been carried out previously and have greatly advanced our understanding of the evolution of Primula , the phylogenetic relationships within the genus Primula species have remained incompletely resolved.
Primula wilsonii Dunn is a perennial herb in Sect.Proliferae Pax of Primula (Primulaceae). The most closely related species of P. wilsonii are P. miyabeana , P. poissonii and P. anisodora , these species clustered in a well-supported complex in Sect. Proliferae based on phylogeny construction using rbcL + matK + ITS markers, which is considered to be the first choice to barcode Primula plants at present [25]. P. wilsonii only sporadically scatters in limited areas of Sichuan and Yunnan provinces, unlike the other widespread Primulas that are common in the fields and gardens. Considering the limited distribution areas and small size of populations in the fields, P. wilsonii was identified as a PSESP (plant species with extremely small populations) and in need of protection eagerly [26]. However, the mechanisms leading to an endangered situation and measures to promote the conservation of P. wilsoniiare very scarce.
Chloroplasts are the photosynthetic organelles in plant cells, with highly conserved genomes that are inherited maternally in major angiosperms [27-29]. The quadripartite structure of the angiosperm chloroplast genome consists of a large single copy (LSC) region and a small single copy (SSC) region, separated by two inverted repeats (IRs) [30]. The lack of recombination and slow evolutionary rate as compared with mitochondrial and nuclear genomes make it suitable for phylogeny at genus and family level, species barcoding, population genetics and the conservation of endangered species [28, 30-35]. Additionally, up till now, taking advantage of the development of high-throughput sequencing technology, more than 7,000 chloroplast genomes of land plants have been publicly known [36] since the publication of the first plastid genome sequences for Nicotiana tabacum and Marchantia polymorpha [37,38].
Unfortunately, few chloroplast genomes have been reported inPrimula species. In this study, we determine the complete chloroplast genomes of a PSESP species in this genus [39]. In addition, we explore simple sequence repeats (SSRs) loci and identify highly variable regions by comparing the genome contents and structures between the PSESP specie P. wilsonii and its widely spread closely related species. Here, we aim to: (1) investigate the molecular structures of the chloroplast genomes of P. wilsonii , (2) examine the variations of SSRs and highly divergent regions that could be used as specific DNA markers for P. willsonii and its close relatives , (3) evaluate the evolution of several Primula chloroplast genomes, and (4) facilitate the conservation and systematics of the Chinese Primula species.
2. Materials and Methods
2.1. Plant materials, DNA extraction and sequencing
The plant materials used in this study were collected from Wuxuhai, Sichuan Province, China. The voucher specimen (voucher accession number XYP202007016) was deposited in the Key Laboratory of Plant Resource and Biology at Huaibei Normal University. Total genomic DNA was extracted from silica-dried leaves with a modified cetyltrimethylammonium bromide (CTAB) method [36]. The quality and quantity of the DNA samples were determined using agarose gel electrophoresis and the Qubit dsDNA BR assay (Life Technologies, Carlsbad, CA, USA). The qualified PCR-amplified library was sequenced with the Illumina NovaSeq Tenplatform (Nanjing Genepioneer Biotechnologies Inc., Nanjing, China).
2.2. Genome assembly, annotation and analysis
The low quality reads were assessed and filtered using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Trimmomatic v 0.36 to obtain high-quality clean data [41]. The chloroplast genome sequence of P. wilsonii was assembled by a de nova method using SPAdes Genome Assembler v3.10.1 [42]. The k-mers were set to 55, 87 and 121 to achieve optimal assembly. The whole chloroplast genomes were assembled using high-coverage and long-sequenced contigs. Then, the SSPACE v2.0 was used to construct the scaffold of the chloroplast genomes [43] and Gapfiller v2.1.1 was used to fill the gaps [44]. Finally, Bowtie2 v2.2.4 was used to validate the genome assembly by mapping the initial reads to the assembled genome sequence [45]. The chloroplast genomes were annotated in two ways. Prodigal v2.6.3 [46], Hmmer v3.1b2 [47] and ARAGORN v1.2.38 [48] were used to detect the CDS, rRNA and tRNA, respectively. Furthermore, BLAST (NCBI) was used to check the annotation, followed by manual correction through comparison with the other closely related chloroplast genomes. The circular chloroplast genome map of P. wilsonii was then generated using OGDraw [49]. The repeating sequences including forward, reverse, complement and palindrome repeats were identified using the online REPuter program [50], with three for Hamming distance and 30 for minimal repeat size. Simple sequence repeats (SSRs) were detected using MISA software (https://webblast.ipk-gatersleben.de/misa/) [51]. Thresholds for a minimum number of repeat units were set as follows: 10 for mono-nucleotide repeat units; 5 for di-; 4 for tri-; and 3 for tetra-, penta-, or hexa-nucleotide repeat units.
2.3. Comparative plastome analysis
Gene distribution and the percentage of sequence identity were compared and visualized using mVISTA software [52] with the LAGAN mode [53] in chloroplast genomes of five Primula species, withP. secundiflora , P. poissonii , P. anisodora , andP. miyabeana selected as close relatives ofP. wilsonii. All the chloroplast genomes were obtained from NCBI, except for P. wilsonii. The annotation of P. wilsonii served as a reference. Nucleotide variability values (Pi ) were calculated using the same alignment. Nucleotide diversity was detected by sliding window analysis conducted in DnaSP v.6.11.01 with a step size of 200 bp and window length of 600 bp [54]. The expansion or contraction of the inverted repeat (IR) regions in the chloroplast genomes of the five related Primula species weas investigated and visualized using IRscope program [55].
2.4. Phylogenetic analyses
To investigate the phylogenetic relationship in the sections of ChinesePrimula and the resolution ability of chloroplast genomes, phylogenetic analysis was performed for the 60 species representing 20/24 sections of Chinese Primula based on the whole chloroplast genomes downloaded from NCBI. Two outgroup species (Andorsace bulleyana and Ardisia solanacea ) were sampled from a closely related genus Andorsace and a more distantly related genusArdisia of the Primulaceae in the sense of phylogenetic relationships [56]. The complete chloroplast sequences were firstly aligned by MAFFT v7.307 [57]. jModelTest 2.0 was used to find the best nucleotide-substitution model before phylogenetic construction [58].Trees were then constructed using the maximum likelihood (ML) method by online RaxML BlackBox software [59]. ML was implemented starting from random trees, using 1,000 rapid bootstrap inferences with a General Time Reversible (GTR) nucleotide-substitution model as suggested. The final phylogenetic results were viewed by using FigTree v.1.6.1.
3. Results
3.1. Chloroplast genomes features
The assembled chloroplast genome of P. wilsonii was 151,677 bp in length, exhibiting a typical circular chloroplast structure like most angiosperms. As shown in Figure 1, it contains a large single-copy (LSC) region of 83,510 bp, a small single-copy (SSC) region of 17,765 bp, and a pair of inverted repeats (IRa and IRb) of 25,201 bp. The average coverage depth of the assembled chloroplast genome was 1249 ×. Overall, the GC content was 36.99%. We found an uneven distribution of GC content across the regions of the genomes, which was 34.89%, 30.18% and 42.87% for the LSC, SSC and IR regions, respectively. The chloroplast genome structures, the size of each region, and the GC contents of the five chloroplast genomes of Primula species were compared and are shown in Table 1.