3.4 Gene structure prediction and functional annotation
A total of 30,900 coding genes were predicted by combining the three strategies. The average gene length, average coding gene length, average exon length, and average intron length were 11,027, 1,386, 341.12, and 2,255 bp, respectively, and each gene contained 5.12 exons on average (Table 4). We also functionally annotated all coding genes against 10 publicly available protein databases. In total, 21,979, 16,540, 21,220, 11,310, 15,903, 23,894, 1,458, 20,810, 23,010, and 16,956 coding genes were successfully mapped to the InterPro, GO, KEGG_ALL, KEGG_KO, Swissprot, TrEMBL, TF, Pfam, NR, and KOG databases, respectively. Of all the coding genes, 25,325 were annotated in at least one database, and the remaining 5,575 were unannotated.
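The annotation counts above can be expressed as proportions of the 30,900 predicted coding genes. The following is a minimal sketch of that arithmetic, not part of the annotation pipeline itself; the counts are copied from the text, while the variable and function names (`annotated_per_db`, `percent`) are illustrative assumptions.

```python
# Counts copied from the text; all names here are illustrative only.
TOTAL_GENES = 30_900
ANNOTATED_ANY = 25_325   # genes annotated in at least one database
UNANNOTATED = 5_575      # genes with no annotation in any database

annotated_per_db = {
    "InterPro": 21_979,
    "GO": 16_540,
    "KEGG_ALL": 21_220,
    "KEGG_KO": 11_310,
    "Swissprot": 15_903,
    "TrEMBL": 23_894,
    "TF": 1_458,
    "Pfam": 20_810,
    "NR": 23_010,
    "KOG": 16_956,
}


def percent(count, total=TOTAL_GENES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Sanity check: annotated and unannotated genes partition the gene set.
assert ANNOTATED_ANY + UNANNOTATED == TOTAL_GENES

# Per-database annotation rate, printed from highest to lowest.
rates = {db: percent(n) for db, n in annotated_per_db.items()}
for db, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db:10s} {rate:6.2f}%")
print(f"{'Any':10s} {percent(ANNOTATED_ANY):6.2f}%")
```

Under these counts, roughly 82% of the predicted coding genes carry at least one annotation, with TrEMBL giving the highest single-database coverage and TF the lowest.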
In the present study, 474 miRNAs, 15,570 tRNAs, 309 rRNAs, and 157 snRNAs were identified in the C. japonica genome. The total lengths of the miRNAs, tRNAs, rRNAs, and snRNAs were 57,561 bp (0.004027% of the genome), 1,135,923 bp (0.079470% of the genome), 49,795 bp (0.003484% of the genome), and 28,110 bp (0.001967% of the genome), respectively (Table 5).
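As a rough consistency check on these figures, the mean length per non-coding RNA and the genome size implied by each reported percentage can be derived from the values in the text. This is a sketch only; the dictionary layout and the helper names (`mean_length`, `implied_genome_size`) are assumptions made for illustration, and the genome size is back-calculated here rather than taken from the assembly statistics.

```python
# (copy number, total length in bp, percent of genome), copied from the text.
ncrna = {
    "miRNA": (474, 57_561, 0.004027),
    "tRNA": (15_570, 1_135_923, 0.079470),
    "rRNA": (309, 49_795, 0.003484),
    "snRNA": (157, 28_110, 0.001967),
}


def mean_length(copies, total_bp):
    """Average length in bp of one annotated element."""
    return total_bp / copies


def implied_genome_size(total_bp, percent_of_genome):
    """Genome size in bp implied by a total length and its reported percentage."""
    return total_bp / (percent_of_genome / 100)


# Back-calculated genome size for each RNA class.
sizes = {
    name: implied_genome_size(total_bp, pct)
    for name, (_, total_bp, pct) in ncrna.items()
}

for name, (copies, total_bp, _) in ncrna.items():
    print(
        f"{name:6s} mean length {mean_length(copies, total_bp):7.1f} bp, "
        f"implied genome size {sizes[name] / 1e9:.4f} Gb"
    )

# Relative spread between the largest and smallest implied genome size.
spread = max(sizes.values()) / min(sizes.values()) - 1
print(f"relative spread: {spread:.5f}")
```

If the four percentages are mutually consistent, the implied genome sizes should agree to within rounding error of the reported percentages.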