Comparative analysis of chloroplast genomes of endangered heterostylous
species Primula wilsonii and its closely related species
Yan-Ping Xie 1, Gang-Gang Yang 2,
Chan Zhang 2, Xing-Wang Zhang 1,*
and Xian-Feng Jiang 3,*
1 School of Life Sciences, Huaibei Normal University,
Huaibei 253000, China; xieyanping0001@163.com;
2 School of Life Sciences, Henan Normal University,
Xinxiang 453007, China;
yangganggang@htu.deu.cn;
zhangchan0705@163.com
3 College of Agriculture and Bioscience, Dali
University, Dali 671003, China
* Correspondence: zhangxingwang79@126.com
and
jiangxianfeng@dali.edu.cn
Abstract: Primula ,
well known for its heterostyly, is the largest genus in the family
Primulaceae with more than 500 species. The considerable species number
has introduced a huge challenge for taxonomy. Although several
phylogenetic constructions have been carried out thoroughly, the
relationships between Primula species were remained incompletely
understood, especially for the relationship among sections within
Chinese species. P. wilsonii Dunn is a PSESP (plant species with
extremely small populations) with very limited genetic information to
explore its endangered mechanism and conservation. In this study, we
sequenced and assembled the complete chloroplast genomes of P.
wilsonii using Illumina sequencing and compared its genomic sequences
with those of four related Primula species. The chloroplast
genomes of Primula species were similar in the basic structure,
gene order and GC content. The detected 38 SSRs loci and 17
hyper-variable regions had many similarities in P. wilsonii ,P. anisodora , P. miyabeana and P. poissonii , but
showed a significant difference compared with those in P.
secundiflora . Slight variations were observed among Primulachloroplast genomes, in consideration of the relatively stable patterns
of IR contraction and expansion. Phylogenetic analysis based on
chloroplast genomes confirmed three major clades in ChinesePrimula , but the infrageneric sections were not in accordance
with morphological traits. The P. poissonii complex was confirmed
here and P. anisodora was the species that was most closely
related to P. wilsonii . Overall, the chloroplast genome sequences
provided useful genetic and evolutionary information for phylogeny,
population genetics and conservation studies on Chinese Primulaspecies.
Keywords: plastid genome; Chinese Primula ; phylogeny;
sequence divergences; plant species with extremely small populations
1. Introduction
As a genetically controlled floral polymorphism, heterostyly has evolved
numerous times in at least 30 families of angiosperm plants [1-3],
and this number is increasing with the emergence of new reports
[4-7]. Heterostylous plants have a reciprocal arrangement of stigma
and anther heights, which are referred to as reciprocal herkogamy and
are believed to promote outcrossing [8, 9]. Reciprocal herkogamy is
often linked with a self-incompatibility system that prevents or
strongly limits self- and intramorph fertilization [10].
Heterostylous plants may comprise distylous (long style and short style)
or tristylous (long style, middle style, and short style) morphs
[11]. Since the heterostyle was described by Darwin in his classic
book The different forms of flowers on plants of the same
species , it has intrigued numerous researchers for centuries [9,
12-14].
Primula is famous for its mating with the heterostyly system and
high ornamental value. This group comprises about 500 species, with more
than 300 species distributed in China, particularly clustering in the
Himalaya–Hengduan Mountains [15]. The exceptionally high species
diversity of Primula was presumed to be triggered by the
extensive uplifts of the Qinghai–Tibet Plateau and climate oscillations
during the Quaternary [16]. The taxonomic study of Primulahas been revised many times according to morphological characters. The
infrageneric system with a total of 31 sections of Primula was
initially proposed by Smith and Fletcher [17]. Later, a revised
system with seven subgenera was proposed that considered some putative
reticulate evolutionary relationships [18]. Then, a six-subgenera
system was amended in Primula by Richards [15], whereas, the
subgenus concept was not adopted in the delimitation of the ChinesePrimula ; a total of 24 sections were delimited [19].
Moreover, many key diagnostic traits, such as the shape of calyx, are
slightly different and nonquantitative. The frequently happened natural
hybridization and gene introgression also confuse species boundaries in
this genus [20-24]. Although an increasing number of phylogenetic
constructions have been carried out previously and have greatly advanced
our understanding of the evolution of Primula , the phylogenetic
relationships within the genus Primula species have remained
incompletely resolved.
Primula wilsonii Dunn is a perennial herb in Sect.Proliferae Pax of Primula (Primulaceae). The most closely
related species of P. wilsonii are P. miyabeana , P.
poissonii and P. anisodora , these species clustered in a
well-supported complex in Sect. Proliferae based on phylogeny
construction using rbcL + matK + ITS markers, which
is considered to be the first choice to barcode Primula plants at
present [25]. P. wilsonii only sporadically scatters in
limited areas of Sichuan and Yunnan provinces, unlike the other
widespread Primulas that are common in the fields and gardens.
Considering the limited distribution areas and small size of populations
in the fields, P. wilsonii was identified as a PSESP (plant
species with extremely small populations) and in need of protection
eagerly [26]. However, the mechanisms leading to an endangered
situation and measures to promote the conservation of P. wilsoniiare very scarce.
Chloroplasts are the photosynthetic organelles in plant cells, with
highly conserved genomes that are inherited maternally in major
angiosperms [27-29]. The quadripartite structure of the angiosperm
chloroplast genome consists of a large single copy (LSC) region and a
small single copy (SSC) region, separated by two inverted repeats (IRs)
[30]. The lack of recombination and slow evolutionary rate as
compared with mitochondrial and nuclear genomes make it suitable for
phylogeny at genus and family level, species barcoding, population
genetics and the conservation of endangered species [28, 30-35].
Additionally, up till now, taking advantage of the development of
high-throughput sequencing technology, more than 7,000 chloroplast
genomes of land plants have been publicly known [36] since the
publication of the first plastid genome sequences for Nicotiana
tabacum and Marchantia polymorpha [37,38].
Unfortunately, few chloroplast genomes have been reported inPrimula species. In this study, we determine the complete
chloroplast genomes of a PSESP species in this genus [39]. In
addition, we explore simple sequence repeats (SSRs) loci and identify
highly variable regions by comparing the genome contents and structures
between the PSESP specie P. wilsonii and its widely spread
closely related species. Here, we aim to: (1) investigate the molecular
structures of the chloroplast genomes of P. wilsonii , (2) examine
the variations of SSRs and highly divergent regions that could be used
as specific DNA markers for P. willsonii and its close relatives ,
(3) evaluate the evolution of several Primula chloroplast
genomes, and (4) facilitate the conservation and systematics of the
Chinese Primula species.
2. Materials and Methods
2.1. Plant materials, DNA extraction and sequencing
The plant materials used in this study were collected from Wuxuhai,
Sichuan Province, China. The voucher specimen (voucher accession number
XYP202007016) was deposited in the Key Laboratory of Plant Resource and
Biology at Huaibei Normal University. Total genomic DNA was extracted
from silica-dried leaves with a modified cetyltrimethylammonium bromide
(CTAB) method [36]. The quality and quantity of the DNA samples were
determined using agarose gel electrophoresis and the Qubit dsDNA BR
assay (Life Technologies, Carlsbad, CA, USA). The qualified
PCR-amplified library was sequenced with the Illumina NovaSeq
Tenplatform (Nanjing Genepioneer Biotechnologies Inc., Nanjing, China).
2.2. Genome assembly, annotation and analysis
The low quality reads were assessed and filtered using FastQC
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and
Trimmomatic v 0.36 to obtain high-quality clean data [41]. The
chloroplast genome sequence of P. wilsonii was assembled by a de
nova method using SPAdes Genome Assembler v3.10.1 [42]. The k-mers
were set to 55, 87 and 121 to achieve optimal assembly. The whole
chloroplast genomes were assembled using high-coverage and
long-sequenced contigs. Then, the SSPACE v2.0 was used to construct the
scaffold of the chloroplast genomes [43] and Gapfiller v2.1.1 was
used to fill the gaps [44]. Finally, Bowtie2 v2.2.4 was used to
validate the genome assembly by mapping the initial reads to the
assembled genome sequence [45]. The chloroplast genomes were
annotated in two ways. Prodigal v2.6.3 [46], Hmmer v3.1b2 [47]
and ARAGORN v1.2.38 [48] were used to detect the CDS, rRNA and tRNA,
respectively. Furthermore, BLAST (NCBI) was used to check the
annotation, followed by manual correction through comparison with the
other closely related chloroplast genomes. The circular chloroplast
genome map of P. wilsonii was then generated using OGDraw
[49]. The repeating sequences including forward, reverse, complement
and palindrome repeats were identified using the online REPuter program
[50], with three for Hamming distance and 30 for minimal repeat
size. Simple sequence repeats (SSRs) were detected using MISA software
(https://webblast.ipk-gatersleben.de/misa/) [51]. Thresholds for a
minimum number of repeat units were set as follows: 10 for
mono-nucleotide repeat units; 5 for di-; 4 for tri-; and 3 for tetra-,
penta-, or hexa-nucleotide repeat units.
2.3. Comparative plastome analysis
Gene distribution and the percentage of sequence identity were compared
and visualized using mVISTA software [52] with the LAGAN mode
[53] in chloroplast genomes of five Primula species, withP. secundiflora , P. poissonii , P. anisodora , andP. miyabeana selected as close relatives ofP.
wilsonii. All the chloroplast genomes were obtained from NCBI, except
for P. wilsonii. The annotation of P. wilsonii served as a
reference. Nucleotide variability values (Pi )
were calculated using the same alignment. Nucleotide diversity was
detected by sliding window analysis conducted in DnaSP v.6.11.01 with a
step size of 200 bp and window length of 600 bp [54]. The expansion
or contraction of the inverted repeat (IR) regions in the chloroplast
genomes of the five related Primula species weas investigated and
visualized using IRscope program [55].
2.4. Phylogenetic analyses
To investigate the phylogenetic relationship in the sections of ChinesePrimula and the resolution ability of chloroplast genomes,
phylogenetic analysis was performed for the 60 species representing
20/24 sections of Chinese Primula based on the whole chloroplast
genomes downloaded from NCBI. Two outgroup species (Andorsace
bulleyana and Ardisia solanacea ) were sampled from a closely
related genus Andorsace and a more distantly related genusArdisia of the Primulaceae in the sense of phylogenetic
relationships [56]. The complete chloroplast sequences were firstly
aligned by MAFFT v7.307 [57]. jModelTest 2.0 was used to find the
best nucleotide-substitution model before phylogenetic construction
[58].Trees were then constructed using the maximum likelihood (ML)
method by online RaxML BlackBox software [59]. ML was implemented
starting from random trees, using 1,000 rapid bootstrap inferences with
a General Time Reversible (GTR) nucleotide-substitution model as
suggested. The final phylogenetic results were viewed by using FigTree
v.1.6.1.
3. Results
3.1. Chloroplast genomes features
The assembled chloroplast genome of P. wilsonii was 151,677 bp in
length, exhibiting a typical circular chloroplast structure like most
angiosperms. As shown in Figure 1, it contains a large single-copy (LSC)
region of 83,510 bp, a small single-copy (SSC) region of 17,765 bp, and
a pair of inverted repeats (IRa and IRb) of 25,201 bp. The average
coverage depth of the assembled chloroplast genome was 1249 ×. Overall,
the GC content was 36.99%. We found an uneven distribution of GC
content across the regions of the genomes, which was 34.89%, 30.18%
and 42.87% for the LSC, SSC and IR regions, respectively. The
chloroplast genome structures, the size of each region, and the GC
contents of the five chloroplast genomes of Primula species were
compared and are shown in Table 1.