3.4 Gene structure prediction and function annotation
A total of 30,900 coding genes were ultimately predicted using the three
strategies. The average gene length, average coding gene length, average
number of exons per gene, average exon length, and average intron length
were 11,027 bp, 1,386 bp, 5.12, 341.12 bp, and 2,255 bp, respectively
(Table 4). We
also functionally annotated all coding genes based on 10 publicly
available protein databases. The results showed that 21,979, 16,540,
21,220, 11,310, 15,903, 23,894, 1,458, 20,810, 23,010, and 16,956 coding
genes were successfully mapped to the InterPro, GO, KEGG_ALL, KEGG_KO,
Swissprot, TrEMBL, TF, Pfam, NR, and KOG databases, respectively. Among
all the coding genes, 25,325 were annotated in at least one database,
and 5,575 were unannotated.
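The annotation counts above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it copies the reported counts from the text, converts each to a percentage of the 30,900 predicted genes, and confirms that the annotated and unannotated totals add up. The variable and function names (`annotated_by_db`, `percent`) are illustrative assumptions.

```python
# Counts copied from the text above; names are illustrative, not the authors'.
TOTAL_GENES = 30_900
ANNOTATED_ANY = 25_325   # genes annotated in at least one database
UNANNOTATED = 5_575

annotated_by_db = {
    "InterPro": 21_979,
    "GO": 16_540,
    "KEGG_ALL": 21_220,
    "KEGG_KO": 11_310,
    "Swissprot": 15_903,
    "TrEMBL": 23_894,
    "TF": 1_458,
    "Pfam": 20_810,
    "NR": 23_010,
    "KOG": 16_956,
}


def percent(count, total=TOTAL_GENES):
    """Return count as a percentage of the total number of predicted genes."""
    return 100.0 * count / total


# The annotated and unannotated genes should partition the full gene set.
assert ANNOTATED_ANY + UNANNOTATED == TOTAL_GENES

# Report each database, largest coverage first.
for db, n in sorted(annotated_by_db.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db:<10}{n:>8,}{percent(n):>8.2f}%")

print(f"Annotated in at least one database: {percent(ANNOTATED_ANY):.2f}%")
print(f"Unannotated: {percent(UNANNOTATED):.2f}%")
```

No single database can exceed the "at least one database" total, so the same dictionary also gives a quick sanity check on the reported figures.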
In the present study, 474 miRNAs, 15,570 tRNAs, 309 rRNAs, and 157
snRNAs were identified in the C. japonica genome. The total lengths of
the miRNAs, tRNAs, rRNAs, and
snRNAs were 57,561 bp (0.004027% of the genome), 1,135,923 bp (0.079470%
of the genome), 49,795 bp (0.003484% of the genome), and 28,110 bp
(0.001967% of the genome), respectively (Table 5).
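Each reported length and its genome percentage together imply a genome size, and the four implied sizes should agree if the figures are internally consistent. The sketch below, which is an illustration and not the authors' pipeline, performs that check and also derives the mean length per non-coding RNA class. The dictionary layout and the function names (`implied_genome_size`, `mean_length`) are assumptions made for this example.

```python
# Values copied from the text: (count, total length in bp, percent of genome).
# The data layout and function names are illustrative assumptions.
NCRNA = {
    "miRNA": (474, 57_561, 0.004027),
    "tRNA": (15_570, 1_135_923, 0.079470),
    "rRNA": (309, 49_795, 0.003484),
    "snRNA": (157, 28_110, 0.001967),
}


def implied_genome_size(total_bp, percent_of_genome):
    """Genome size (bp) implied by a total length and its genome percentage."""
    return total_bp / (percent_of_genome / 100.0)


def mean_length(count, total_bp):
    """Mean length (bp) of one annotated element in a class."""
    return total_bp / count


implied_sizes = {
    name: implied_genome_size(bp, pct) for name, (_, bp, pct) in NCRNA.items()
}
mean_lengths = {
    name: mean_length(n, bp) for name, (n, bp, _) in NCRNA.items()
}

for name in NCRNA:
    print(
        f"{name:<6} mean length {mean_lengths[name]:7.1f} bp, "
        f"implied genome size {implied_sizes[name] / 1e9:.3f} Gb"
    )

# Relative spread between the largest and smallest implied genome size;
# a small value suggests the reported percentages are mutually consistent.
spread = (max(implied_sizes.values()) - min(implied_sizes.values())) / min(
    implied_sizes.values()
)
print(f"Relative spread of implied genome sizes: {spread:.4%}")
```

Small differences between the implied sizes are expected because the percentages are rounded to six decimal places.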