Figure 4. The average nucleotide composition of the Hsp60 genes from 17 phyla: a – the average total GC content at each codon position of Hsp60 sequences and the corresponding genomes; b – the average GC1, GC2, and GC3 content of Hsp60 genes. Phyla are sorted by the average total GC content of the Hsp60 sequences. Student’s t-test was used to compare the average GC content of the Hsp60 sequences with the average GC content of the corresponding genomes. The difference between two independent samples of GC values was considered statistically significant at a p-value below 0.05. Statistically indistinguishable average GC values are marked with “ns” (non-significant).
The average GC content of the Hsp60 genes ranges from 0.41±0.13 (Apicomplexa) to 0.67±0.04 (Actinobacteria) (Figure 4a). In almost all phyla, the average GC content of the Hsp60 genes is comparable to or exceeds the average genomic background; Euryarchaeota is the exception, with an Hsp60 GC content below the genomic average. The upward shift of GC content in the Hsp60 genes may be associated with recombination (GC-biased gene conversion)46,47, repair48, and environmental changes49,50, under which the frequency of AT→GC substitutions increases. It can therefore be assumed that the Hsp60 gene is tightly controlled by DNA repair systems that protect the genetic material from mutations.
To determine the contribution of each of the three codon positions to the total GC content of the Hsp60 genes from 17 phyla, the GC1, GC2, and GC3 contents were calculated. The average GC1 values vary from 0.49±0.06 (Apicomplexa) to 0.69±0.02 (Actinobacteria), and their contribution to the total GC content of the Hsp60 genes is moderate (Figure 4b). The GC2 values, which range from 0.36±0.03 (Apicomplexa) to 0.44±0.02 (Basidiomycota), are the least variable and have practically no effect on the GC composition of the Hsp60 genes. This is expected, since the second codon position is the most conserved. The GC3 values vary from 0.25±0.15 (Firmicutes) to 0.90±0.11 (Actinobacteria), and this wide range indicates a mutation bias51. It should be noted that, in the ordering by total GC content, the average GC3 values of the Hsp60 genes increase sharply starting from Euryarchaeota (Figure 4b), and the average GC content exceeds 0.5 (Figure 4a).
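The position-specific values described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy sequence, and the decision to count every codon (including the start codon and any stop codon) are assumptions made only for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else float("nan")


def gc_by_codon_position(cds):
    """Return total GC and GC1, GC2, GC3 for an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; trailing bases
    that do not complete a codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    cds = cds[: len(cds) - len(cds) % 3]  # keep whole codons only
    result = {"GC": gc_fraction(cds)}
    for offset in range(3):
        # Every third base, starting at the given codon position
        result[f"GC{offset + 1}"] = gc_fraction(cds[offset::3])
    return result


# Hypothetical toy sequence (ATG GCC AAG GAC GTT), not a real Hsp60 gene
toy = "ATGGCCAAGGACGTT"
values = gc_by_codon_position(toy)
print(values)
```

For the toy sequence the third position is the most GC-rich (4 of 5 codons end in G or C), while the second position is the most AT-rich (1 of 5).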
A nucleotide substitution at the third codon position, caused by point mutations or repair processes, in most cases does not change the amino acid and therefore mainly reflects the mutational pressure on codon usage. According to theory35, mutational pressure tends to push the GC content of a gene/genome towards equilibrium (neutrality of codon usage), reducing the heterogeneity caused by natural selection34. Equilibrium of the nucleotide composition of a gene/genome, at which selective constraints (factors that reduce the evolutionary divergence of a functional sequence) do not affect the GC content, is reached when the frequencies of the AT→GC and GC→AT mutations are equal34. These mutations can be fixed in or removed from the population by natural selection and random genetic drift35. The frequencies of the AT→GC and GC→AT mutations at the third codon position also reflect the direction of the mutational pressure. In general, the GC3 value is less than 0.5 when the gene is under AT pressure, and vice versa52. Thus, we can initially identify two groups of Hsp60 genes that differ in the direction of mutational pressure (Figure 4b). The AT-group includes Apicomplexa, Chlamydiae, Firmicutes, Streptophyta, Nematoda, Bacteroidetes, Mollusca, Cyanobacteria, and Chordata, which have an average total GC content of less than 0.5 in the Hsp60 genes. The phyla Euryarchaeota, Arthropoda, Ascomycota, Proteobacteria, Euglenozoa, Basidiomycota, Chlorophyta, and Actinobacteria form a GC-group with an average total GC content of more than 0.5. However, the threshold of 0.5 is nominal because of the imbalance between the rates of mutation and repair processes53. Therefore, a neutrality analysis was carried out to clarify the direction of the mutational pressure, to reveal the degree of its influence on codon usage, and to determine the equilibrium point (Figure 5).
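The reasoning in this paragraph can be sketched in a few lines of Python. The sketch assumes a simple two-state model in which GC pairs mutate to AT at a rate `gc_to_at_rate` and AT pairs mutate to GC at a rate `at_to_gc_rate`; the two mutation flows are then equal when the GC content equals `at_to_gc_rate / (at_to_gc_rate + gc_to_at_rate)`. The function names and the example rates are hypothetical, and the fixed 0.5 threshold is the nominal one discussed above, not the equilibrium point obtained from the neutrality analysis.

```python
def equilibrium_gc(at_to_gc_rate, gc_to_at_rate):
    """GC content at which AT->GC and GC->AT mutation flows balance.

    Two-state model (assumption): the flow GC->AT is gc_to_at_rate * p and
    the flow AT->GC is at_to_gc_rate * (1 - p); setting them equal gives p.
    """
    return at_to_gc_rate / (at_to_gc_rate + gc_to_at_rate)


def pressure_group(gc_value, threshold=0.5):
    """Assign the nominal direction of mutational pressure."""
    if gc_value < threshold:
        return "AT"
    if gc_value > threshold:
        return "GC"
    return "undetermined"


# Equal rates give the nominal 0.5 threshold; a threefold AT->GC bias
# (hypothetical value) shifts the equilibrium to 0.75.
print(equilibrium_gc(1.0, 1.0))
print(equilibrium_gc(3.0, 1.0))

# Average GC3 values reported in the text for two phyla
reported_gc3 = {"Firmicutes": 0.25, "Actinobacteria": 0.9}
groups = {phylum: pressure_group(v) for phylum, v in reported_gc3.items()}
print(groups)
```

Under this model an unequal mutation rate moves the equilibrium away from 0.5, which is one way to see why a fixed threshold can only be nominal.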