Genome Assembly and Annotation
We generated a total of 17.2 Gb of data and 13.6 Gb of clean data (Table S1). All Nanopore subreads were corrected using canu-correct and trimmed of low-quality bases using canu-trim. Reads ≥500 bp were used to generate an initial assembly with WTDBG. We then polished the assembly twice with Pilon, obtaining a 215.4-Mb contig-level assembly with a contig N50 of 1.81 Mb. This assembly comprised 447 contigs, the longest of which was 11.13 Mb. We anchored these contigs onto six chromosomes using Hi-C reads and 3D-DNA (Dudchenko et al., 2017). The resulting chromosome-scale assembly is 215.2 Mb in length, with a chromosome N50 of 34.8 Mb (Table 1, Fig. 1B).
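For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the script used to produce the values in Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical toy lengths (arbitrary units), not real contig sizes.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_lengths))
```

The same function applies to contigs and to chromosome-scale scaffolds; only the input list of lengths differs.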
The 215.2-Mb draft M. pygmaea genome represents a high-quality, near-complete assembly. Of 1,440 plant-specific orthologs, 1,395 were present, indicating an estimated completeness of 96.9% (Table S2). The assembly size fell only slightly below the estimates from k-mer analysis and flow cytometry (259 Mb and 219 Mb, respectively; Figs. S1 and S2). In total, 25,607 genes were predicted, with an average gene length of 2,628 base pairs (bp), an average coding sequence length of 234 bp and an average of 5.4 exons per gene (Table 1). The vast majority of gene models were supported by complementary DNA/expressed sequence tag evidence. In our assembly, 97.03% of the genes (24,846 of 25,607) were annotated on the six chromosomes, and only 2.97% (761 of 25,607) remained on scaffolds (Table S3). A total of 91.79 Mb (42.66%) of the assembled M. pygmaea genome is composed of repetitive sequences (Table 1). Most of these repetitive elements are LTR retrotransposons (LTR-RTs), which span 25.21% of the assembled genome, including 23.93% of intact LTR retrotransposons, followed by DNA transposons (7.03%) and LINEs (2.90%) (Table S4). The LTR-RT insertions in M. pygmaea occurred earlier than those in A. lyrata (Fig. 1C). The M. pygmaea genome contains a number of transcription factors (TFs) (1,571) similar to that of the other Brassicaceae species compared (Table S5; http://www.transcriptionfactor.org).
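The percentages quoted above for ortholog completeness and for genes anchored to chromosomes follow directly from the reported counts. The sketch below reproduces that arithmetic; the helper name `percent` is an illustrative assumption and is not part of any tool used in this study.

```python
def percent(part, whole, digits=2):
    """Return part / whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


if __name__ == "__main__":
    # Counts taken from the text above.
    total_genes = 25607
    print(percent(1395, 1440, digits=1))   # orthologs present
    print(percent(24846, total_genes))     # genes on the six chromosomes
    print(percent(761, total_genes))       # genes remaining on scaffolds
```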