Genome Assembly and Annotation
We generated a total of 17.2 Gb of data and 13.6 Gb of clean data
(Table S1). All Nanopore subreads were corrected using
canu-correct and trimmed by canu-trim for low-quality bases. The reads
≥500 bp were used to generate an initial assembly with WTDBG. We used
Pilon to polish the genome assembly twice, finally obtaining a 215.4-Mb
contig-scale assembly with a contig N50 of 1.81 Mb. The assembly
contained 447 contigs, with the longest contig being 11.13 Mb in
length. We then anchored these contigs into six
chromosomes with Hi-C reads by 3D-DNA (Dudchenko et al., 2017). This
assembled chromosome-scale genome is 215.2 Mb in length, with a
chromosome N50 of 34.8 Mb (Table 1, Fig. 1B).
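For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not the actual contig lengths or the tooling used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy lengths (in Mb); NOT the contigs of this assembly.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 40
```

Applied to the lengths of all contigs or of all anchored chromosomes, the same calculation would give the contig N50 and chromosome N50, respectively.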
The 215.2-Mb draft M. pygmaea genome represents a
high-quality, near-complete genome assembly. A total of 1,395 of 1,440
plant-specific orthologs were present, indicating an estimated
completeness of 96.9% (Table S2). The assembly size fell below the
estimates from k-mer analysis (259 Mb) and flow cytometry (219 Mb)
(Figs. S1 and S2).
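The completeness figure and the comparison with the genome size estimates follow from simple ratios, sketched below using only the values quoted in the text. The helper name `percent` and the variable names are illustrative assumptions.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Ortholog counts quoted in the text.
orthologs_found = 1395
orthologs_total = 1440
completeness = percent(orthologs_found, orthologs_total)
print(f"Estimated completeness: {completeness:.1f}%")  # 96.9%

# Assembly size relative to the two independent genome size estimates (Mb).
assembly_mb = 215.2
estimates_mb = {"k-mer analysis": 259.0, "flow cytometry": 219.0}
coverage = {name: percent(assembly_mb, est) for name, est in estimates_mb.items()}
for name, value in coverage.items():
    print(f"Assembly covers {value:.1f}% of the {name} estimate")
```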
In total, 25,607 genes were predicted, with an average gene length of
2,628 base pairs (bp), an average coding sequence length of 234 bp and
an average of 5.4 exons per gene (Table 1). The vast
majority of gene models were supported by complementary DNA/expressed
sequence tag evidence. In our assembly, 97.03% of the genes (24,846 of
25,607) were annotated on six chromosomes, and only 2.97% (761 of
25,607) remained on scaffolds (Table S3). A total of 91.79 Mb
(42.66%) of the assembled M. pygmaea genome is composed of
repetitive sequences (Table 1). Among these repetitive
elements, LTR retrotransposons (LTR-RTs) are the most abundant, spanning
25.21% of the assembled genome, including 23.93% of intact LTR
retrotransposons, followed by DNA transposons (7.03%) and LINEs (2.90%)
(Table S4). LTR-RT insertions in M. pygmaea occurred earlier than those
in A. lyrata (Fig. 1C). The M. pygmaea genome contains a number of
transcription factors (TFs; 1,571) similar to that of the Brassicaceae
species compared (Table S5; http://www.transcriptionfactor.org).
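As a quick consistency check on the annotation statistics above, the gene anchoring and repeat fractions can be recomputed from the quoted counts. This is a sketch using only numbers stated in the text; the variable names are illustrative assumptions. Because the assembly length is rounded to 215.2 Mb here, the recomputed repeat fraction may differ from the reported 42.66% in the last digit.

```python
# Gene counts quoted in the text.
total_genes = 25607
anchored_genes = 24846
unanchored_genes = total_genes - anchored_genes  # genes left on scaffolds

anchored_pct = 100.0 * anchored_genes / total_genes
unanchored_pct = 100.0 * unanchored_genes / total_genes
print(f"Anchored: {anchored_pct:.2f}%  Unanchored: {unanchored_pct:.2f}%")

# Repeat content relative to the (rounded) chromosome-scale assembly length.
repeat_mb = 91.79
assembly_mb = 215.2
repeat_pct = 100.0 * repeat_mb / assembly_mb
print(f"Repetitive sequences: ~{repeat_pct:.1f}% of the assembly")
```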