Evolutionary findings relating to the Novel Coronavirus
To clarify its relationship with SARS-CoV and MERS-CoV, Lu’s research group reported that SARS-CoV-2 has about 79% sequence identity with SARS-CoV but only about 50% with MERS-CoV.(R. Lu et al., 2020; Wu et al., 2020) Classification by Zhou et al. using seven conserved replicase domains in ORF1ab also indicated that SARS-CoV-2 and SARS-CoV belong to the same species, with 94·6% amino acid identity.(P. Zhou et al., 2020a)
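The identity percentages quoted throughout this section are calculated from aligned sequences. The sketch below is a minimal illustration of one common definition (matching positions divided by ungapped aligned columns). It is not the pipeline used in the cited studies; the function name `percent_identity` and the toy sequences are assumptions made for illustration. Published values also depend on the alignment tool and on how gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Both inputs are assumed to be rows of the same alignment, so they must
    have equal length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns under this definition
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: 9 of 10 aligned positions match.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Counting gapped columns as mismatches instead would lower the value, which is one reason different groups can report slightly different figures for the same pair of genomes.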
Meanwhile, a study by Zhu et al. indicated that SARS-CoV-2 forms an isolated clade together with two other bat-derived virus strains, ZC45 and ZXC21.(Zhu et al., 2020) This result was supported by Lu et al., who showed that, in a phylogenetic tree, SARS-CoV-2 forms a branch of considerable length together with bat-SL-CoVZC45 and bat-SL-CoVZXC21 that is distinct from SARS-CoV.(R. Lu et al., 2020) In Lu’s study, samples from Wuhan patients were examined and showed 88% sequence identity with both bat-SL-CoVZC45 and bat-SL-CoVZXC21.(R. Lu et al., 2020) A study by Chan et al. reached the same finding using samples from patients in Hong Kong for the sequence comparison.(Chan et al., 2020) The same study reported that the virus genome is about 29.8 kilobases long, with a GC content of 38%.(Chan et al., 2020) Notably, two research groups separately reported that the virus genome has about 89% (Chan et al., 2020) and 86·9% (Zhu et al., 2020) nucleotide sequence identity with the bat SARS-like CoV (bat-SL-CoVZC45) genome. However, Zhou et al. reported that a bat-derived coronavirus termed RaTG13 is the closest relative of the virus from Wuhan, based on their analysis of the RdRp and S gene sequences.(P. Zhou et al., 2020a)
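The GC content reported above is the share of G and C bases among the nucleotides of the genome. The sketch below shows how such a figure can be computed. The function name `gc_content` is an assumption for illustration, and the choice to ignore ambiguous bases such as N is a design decision of this example, not something stated in the cited study.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T, U)."""
    # Ambiguity codes such as N are ignored here (an assumption of this sketch).
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy example: 4 of 8 bases are G or C.
print(gc_content("ATGCGCAT"))
```

Applied to a full genome sequence read from a FASTA file, the same function would give a whole-genome value comparable to the reported figure.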
Lu’s further analysis of the phylogenetic tree showed three clades: clade 1 is composed of SARS-related strains from Rhinolophus sp. from Bulgaria (accession number GU190215) and Kenya (KY352407); clade 2 is formed by 10 samples from Wuhan and the bat-derived strains bat-SL-CoVZC45 and bat-SL-CoVZXC21, as mentioned above; and clade 3 is formed by the human-derived SARS-CoV and several bat-derived coronaviruses.(R. Lu et al., 2020) Phylogenetic analysis using maximum likelihood (ML) methods by Benvenuto et al. gave a consistent outcome: the SARS virus, SARS-like bat-derived viruses and the novel CoV together formed one clade (termed clade II), while the MERS viruses formed another (termed clade I).(Benvenuto et al., 2020) Within clade II, bat-derived SARS-like coronaviruses and SARS-CoV-2 formed cluster IIa, and bat-derived SARS-like coronaviruses and the SARS virus formed cluster IIb.
As mentioned above, SARS-CoV-2 showed 88% identity with the bat-derived CoVs ZC45 and ZXC21. In more detail, five gene regions (E, M, 7, N and 14) were each reported to have over 90% sequence identity, with the E gene the highest (98·7%).(R. Lu et al., 2020) Meanwhile, prediction of 12 coding regions showed that the genomic organization and the lengths of the encoded proteins are similar, with only minor variations, to their counterparts in bat-SL-CoVZC45 and bat-SL-CoVZXC21.(R. Lu et al., 2020)
In contrast, the S gene and protein 13 have the lowest identity, at only about 75% and 73% respectively, whether compared with ZC45 or ZXC21.(R. Lu et al., 2020) In the study by Chan et al., however, the SARS-CoV-2 spike protein was reported to have higher nucleotide identity: about 84% with bat-SL-CoVZC45 and about 78% with human SARS coronavirus.(Chan et al., 2020) Zhou et al. found that the S and RdRp genes of SARS-CoV-2 do vary considerably from those of other CoVs (less than 75% identity) but show high sequence identity with RaTG13 (93·1%), and that the S genes of RaTG13 and SARS-CoV-2 are much longer than those of other SARSr-CoVs.(P. Zhou et al., 2020a) Since the spike protein of SARS-like coronaviruses is the most variable in the viral genome, this high degree of divergence is not surprising.
As mentioned in the overview, the spike protein has two domains, both of which are responsible for vital functions. Detailed comparison of SARS-CoV-2 with the two bat-derived strains bat-SL-CoVZC45 and bat-SL-CoVZXC21 showed that its S1 domain differs considerably, with only 68% sequence identity, whereas the S2 domain is far more similar, with 93% identity.(R. Lu et al., 2020) At the amino acid level, about 50 amino acids in S1 remain unchanged in comparison with SARS-CoV, while some positions in the C-terminal region suggest that mutations and deletions occurred in the bat-derived strains.(R. Lu et al., 2020) Although the whole genome of SARS-CoV-2 is reported to be closer to bat-SL-CoVZC45 and bat-SL-CoVZXC21, its receptor-binding domain is closer to that of SARS-CoV. This result was further supported by subsequent three-dimensional modelling. As in other betacoronaviruses, the receptor-binding domain is composed of a core domain and an external subdomain, but the external subdomain is more similar to that of SARS-CoV.(R. Lu et al., 2020) This similarity suggests that SARS-CoV-2 might also use ACE2 for cell entry. Protein modelling and structural analysis of the receptor-binding domain (RBD) indicate that it retains sufficient affinity for the angiotensin-converting enzyme 2 (ACE2) receptor to use it as a mechanism of cell entry. This may in turn facilitate human-to-human transmission(Ge et al., 2013; Hu et al., 2017; X. L. Yang et al., 2015), despite some differences in the constituent amino acids identified by homology modelling against the SARS-CoV protein.
Interestingly, Chan et al.(Chan et al., 2020) reported that the amino acid sequence identity of the N-terminal region of subunit 1 is around 66% and that of the core domain is about 68%, but that the protein sequence has only about 39% identity with SARS-CoV. Based on this result, they suggested that the entry mechanism may differ from the conventional one.(Chan et al., 2020)
The conserved replicase domains (ORF1ab) of the novel coronavirus were reported to have less than 90% identity with other betacoronaviruses, with 1b (about 86%) slightly lower than 1a (about 90%).(Zhu et al., 2020) A supplementary analysis of the main coding regions of representative members of the subgenus Sarbecovirus indicates that recombination may have occurred in gene 1b: the novel virus clusters with its two closest bat-derived relatives (bat-SL-CoVZC45 and bat-SL-CoVZXC21) in the trees for 1a and the S gene, as the phylogenetic analysis suggests, but not in the tree for 1b. Wu et al. also detected a change in topological position(Wu et al., 2020), which points to the same conclusion of recombination within the subgenus Sarbecovirus from bats.
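The reasoning above rests on a simple observation: if a virus has different closest relatives in different genome regions, recombination is one possible explanation. The sketch below illustrates that idea with a crude proxy, taking the most similar reference per region from a multiple alignment. It is not the method of the cited studies, which compared phylogenetic trees built per region. The names `closest_by_region` and `identity`, the region coordinates and the toy sequences are all hypothetical.

```python
from typing import Dict, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions over ungapped aligned columns."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def closest_by_region(
    alignment: Dict[str, str],
    query: str,
    regions: Dict[str, Tuple[int, int]],
) -> Dict[str, str]:
    """For each region (start, end in alignment columns, end exclusive),
    return the name of the reference most similar to the query."""
    result = {}
    for name, (start, end) in regions.items():
        query_slice = alignment[query][start:end]
        result[name] = max(
            (ref for ref in alignment if ref != query),
            key=lambda ref: identity(query_slice, alignment[ref][start:end]),
        )
    return result


# Toy alignment (hypothetical): the query resembles ref1 in the first
# region and ref2 in the second, the pattern a recombinant would show.
toy = {
    "query": "AAAAACCCCC",
    "ref1": "AAAAAGGGGG",
    "ref2": "TTTTTCCCCC",
}
print(closest_by_region(toy, "query", {"region_1": (0, 5), "region_2": (5, 10)}))
```

A change of closest relative between regions is only a hint. Differences in evolutionary rate or sparse sampling can produce the same pattern, so dedicated recombination tests and per-region trees are still needed to support the conclusion.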
Lu’s study revealed that SARS-CoV-2 was distinct from SARS-CoV in a phylogeny of the complete RNA-dependent RNA polymerase (RdRp) gene.(R. Lu et al., 2020) Nevertheless, according to Zhou et al., the RdRp region of SARS-CoV-2 showed high sequence similarity to the bat-derived coronavirus BatCoV RaTG13 (96·2% whole-genome identity).(P. Zhou et al., 2020a) Using a set of aligned sequences, they found no evidence that recombination had taken place in the genome of SARS-CoV-2(P. Zhou et al., 2020a), which further supports the conclusion of Lu et al. that the recombination occurred in the bat-derived virus rather than in the virus from the Wuhan patients.