Evolutionary findings relating to the Novel Coronavirus
To clarify its relationship with SARS-CoV and MERS-CoV, Lu's research
group reported that SARS-CoV-2 has about 79% sequence identity with
SARS-CoV but only about 50% identity with MERS-CoV.(R. Lu et al., 2020;
Wu et al., 2020)
Classification by Zhou et al. using seven conserved replicase domains in
ORF1ab also indicated that SARS-CoV-2 and SARS-CoV belong to the same
species, with 94·6% amino acid identity.(P. Zhou et al., 2020a)
Meanwhile, a study by Zhu et al. indicated that the virus forms an
isolated clade together with two other strains that originated from
bats, ZC45 and ZXC21.(Zhu et al., 2020) This result was supported by the
study of Lu et al., which showed that SARS-CoV-2 forms a considerably
long branch together with bat-SL-CoVZC45 and bat-SL-CoVZXC21 in a
phylogenetic tree, distinct from SARS-CoV.(R. Lu et al., 2020) In Lu's
study, samples from Wuhan patients were examined and showed 88% sequence
identity with both bat-SL-CoVZC45 and bat-SL-CoVZXC21.(R. Lu et al.,
2020) A study by Chan et al. reached the same finding using samples from
patients in Hong Kong for the sequence comparison.(Chan et al., 2020)
The same study reported that the virus genome is about 29.8 kilobases
long, with a GC content of 38%.(Chan et al., 2020) Notably, two research
groups also reported that the virus genome has about 89% (Chan et al.,
2020) and 86·9% (Zhu et al., 2020) nucleotide sequence identity,
respectively, to the genome of the bat SARS-like CoV (bat-SL-CoVZC45).
However, Zhou et al. reported that a bat-derived coronavirus termed
RaTG13 is the closest relative of the virus from Wuhan, based on their
analysis of the RdRp and S gene sequences.(P. Zhou et al., 2020a)
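The identity and GC figures quoted above are simple ratios over sequence
positions. The following is a minimal sketch of how such figures can be
computed, assuming the two sequences have already been aligned (gaps
written as "-") and that gapped columns are excluded from the comparison.
The function names and the toy sequences are hypothetical illustrations,
not the pipeline or data of any cited study, and published tools may
treat gaps differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence (gaps removed)."""
    bases = sequence.upper().replace("-", "")
    if not bases:
        raise ValueError("empty sequence")
    return 100.0 * (bases.count("G") + bases.count("C")) / len(bases)


# Toy sequences (hypothetical, not real viral data)
toy_a = "ACGTAC"
toy_b = "ACGTTT"
print(round(percent_identity(toy_a, toy_b), 1))
print(gc_content("GGCCAATT"))
```

Differences in alignment method and gap handling are one plausible
reason why separate groups can report slightly different identity
values for the same pair of genomes.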
Further phylogenetic analysis by Lu et al. showed three clades: clade 1
is composed of SARS-related strains from Rhinolophus sp. from Bulgaria
(accession number GU190215) and Kenya (KY352407); clade 2 is formed by
10 samples from Wuhan and the bat-derived strains bat-SL-CoVZC45 and
bat-SL-CoVZXC21, as mentioned above; and clade 3 is formed by the
human-derived SARS-CoV and several bat-derived coronaviruses.(R. Lu et
al., 2020) Phylogenetic analysis using the maximum likelihood (ML)
method by Benvenuto et al. gave a consistent outcome: SARS virus,
SARS-like bat-derived viruses and the novel CoV together formed one
clade (termed clade II), while the MERS viruses formed another (termed
clade I).(Benvenuto et al., 2020) Within clade II, bat-derived SARS-like
coronaviruses and SARS-CoV-2 formed cluster IIa, and bat-derived
SARS-like coronaviruses and the SARS virus formed cluster IIb.
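To illustrate how clades emerge from sequence differences, the sketch
below groups sequences by UPGMA clustering on pairwise p-distances. This
is a deliberately simpler, distance-based technique swapped in for
illustration; it is not the maximum likelihood method used in the cited
studies. The strain labels and sequences are hypothetical toy data, and
the function names are assumptions made for this example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among ungapped aligned columns."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in cols) / len(cols)


def upgma(seqs: dict) -> tuple:
    """Build a UPGMA tree as nested tuples from name -> aligned sequence."""
    names = list(seqs)
    # Each cluster id maps to (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[names[i]], seqs[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=dist.__getitem__,
        )
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average
        for k in clusters:
            d_ik = dist[frozenset((i, k))]
            d_jk = dist[frozenset((j, k))]
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical toy alignment (not real viral sequences)
toy_alignment = {
    "strain_A": "AAAAAAAAAA",
    "strain_B": "AAAAAAAAAT",
    "strain_C": "TTTTTAAAAA",
    "strain_D": "TTTTTTTAAA",
}
tree = upgma(toy_alignment)
print(tree)
```

In this toy case strain_A pairs with strain_B and strain_C with
strain_D, which is the same kind of grouping that the clades and
clusters described above express for real genomes.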
As mentioned above, SARS-CoV-2 showed 88% sequence identity with the
bat-derived CoVs ZC45 and ZXC21. In more detail, five gene regions (E,
M, 7, N and 14) were each reported to have over 90% sequence identity,
with the E gene the highest of all (98·7%).(R. Lu et al., 2020)
Meanwhile, prediction of 12 coding regions showed that the genomic
organization, as well as the lengths of the encoded proteins, are
similar to their counterparts in bat-SL-CoVZC45 and bat-SL-CoVZXC21,
with only minor variations.(R. Lu et al., 2020)
In contrast, the S gene and protein 13 had the lowest identity of all,
at only about 75% and 73% respectively, whether compared with ZC45 or
ZXC21.(R. Lu et al., 2020) However, Chan et al. reported a higher
nucleotide identity for the Spike-encoding sequence of SARS-CoV-2, at
about 84% and 78% with that of bat-SL-CoVZC45 and human SARS
coronavirus, respectively.(Chan et al., 2020) Zhou et al. found that the
S gene and RdRp gene of SARS-CoV-2 indeed diverge considerably from
those of other CoVs (less than 75% identity) but share a high sequence
identity of 93·1% with RaTG13, and that the S genes of RaTG13 and
SARS-CoV-2 are much longer than those of other SARSr-CoVs.(P. Zhou et
al., 2020a) It is worth noting that, since the spike protein is the most
variable part of the SARS-like coronavirus genome, this high degree of
divergence is not surprising.
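The gene-by-gene comparisons above amount to computing identity over
sub-ranges of one alignment. A minimal sketch follows, assuming a
pairwise alignment and a table of region coordinates given as alignment
columns. The region names, coordinates and sequences are hypothetical
placeholders, not real gene annotations.

```python
def region_identity(aligned_a: str, aligned_b: str, regions: dict) -> dict:
    """Percent identity per region.

    regions maps a region name to (start, end) alignment columns,
    0-based with end exclusive. Gapped columns are ignored (assumption).
    A region with no ungapped columns yields None.
    """
    result = {}
    for name, (start, end) in regions.items():
        cols = [
            (x, y)
            for x, y in zip(aligned_a[start:end], aligned_b[start:end])
            if x != "-" and y != "-"
        ]
        if not cols:
            result[name] = None
            continue
        result[name] = 100.0 * sum(x == y for x, y in cols) / len(cols)
    return result


# Hypothetical toy alignment: a conserved region followed by a variable one
toy_a = "ACGTACGTAC" + "GGGGGGGGGG"
toy_b = "ACGTACGTAC" + "GGGGGTTTTT"
toy_regions = {
    "region_1": (0, 10),
    "region_2": (10, 20),
    "whole": (0, 20),
}
identities = region_identity(toy_a, toy_b, toy_regions)
print(identities)
```

The toy output shows how a single whole-sequence figure can mask a
highly conserved region next to a much more variable one, which is the
pattern reported above for the E gene versus the S gene.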
As mentioned in the overview, the Spike protein has two domains, both of
which are responsible for vital functions. Detailed comparison of
SARS-CoV-2 with the two bat-derived strains bat-SL-CoVZC45 and
bat-SL-CoVZXC21 showed that its S1 domain differs more, with only 68%
sequence identity, whereas the S2 domain is far more similar, with 93%
identity.(R. Lu et al., 2020) At the amino acid level, about 50 amino
acids in S1 remain unchanged when compared with SARS-CoV. At some
positions in the C-terminal region, however, the comparison suggests
that mutations and deletions occurred in the bat-derived strains.(R. Lu
et al., 2020) Although the whole genome of SARS-CoV-2 is reported to be
closer to bat-SL-CoVZC45 and bat-SL-CoVZXC21, its receptor-binding
domain is closer to that of SARS-CoV. This result was further supported
by subsequent three-dimensional modelling. As in other
betacoronaviruses, the receptor-binding domain is composed of a core
domain and an external subdomain, but the external subdomain is more
similar to that of SARS-CoV.(R. Lu et al., 2020) Because the
receptor-binding domain resembles that of SARS-CoV, ACE2 might still be
the receptor used for cell entry. Protein modelling and structural
analysis of the receptor-binding domain (RBD) indicate that it retains
sufficient affinity for the angiotensin-converting enzyme 2 (ACE2)
receptor and can still use it for cell entry. This may in turn
facilitate human-to-human transmission(Ge et al., 2013; Hu et al., 2017;
X. L. Yang et al., 2015), despite some differences in the constituent
amino acids that were identified by homology modelling against the
corresponding SARS-CoV protein.
Interestingly, Chan et al.(Chan et al., 2020) reported that the amino
acid sequence identity of the N-terminal region of subunit 1 is around
66% and that of the core domain is about 68%, but that the protein
sequence has only about 39% identity with SARS-CoV. Based on this
result, they suggested that the entry mechanism might have changed from
the traditional one.(Chan et al., 2020)
The conserved replicase domains (ORF1ab) of the novel coronavirus were
reported to have less than 90% identity with other betacoronaviruses,
with 1b (about 86%) slightly lower than 1a (about 90%).(Zhu et al.,
2020) A supplementary analysis of the main coding regions of
representative members of the subgenus Sarbecovirus indicates that
recombination might have occurred in gene 1b: the novel virus clusters
with its two closest bat-derived strains (bat-SL-CoVZC45 and
bat-SL-CoVZXC21) in the trees of 1a and the S gene, as the phylogenetic
analysis suggested, but not in the tree of 1b. A change in topological
position was also reported by Wu et al.(Wu et al., 2020), which points
to the same conclusion of recombination among bat-derived members of the
subgenus Sarbecovirus.
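The reasoning above, that a virus grouping with different relatives in
different genes hints at recombination, can be illustrated with a
sliding-window comparison. The sketch below is a similarity-plot-style
heuristic chosen for illustration only; it is not the tree-based
analysis used in the cited studies, and a change in the closest
reference is merely suggestive, not proof of recombination. The
sequences, reference names and function names are hypothetical.

```python
def window_identity(a: str, b: str, start: int, size: int):
    """Percent identity in one window, ignoring gapped columns.

    Returns None when the window has no ungapped columns.
    """
    cols = [
        (x, y)
        for x, y in zip(a[start:start + size], b[start:start + size])
        if x != "-" and y != "-"
    ]
    if not cols:
        return None
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def closest_reference_by_window(query: str, refs: dict, size: int, step: int) -> list:
    """For each window, report which reference is most similar to the query."""
    results = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, seq, start, size)
            for name, seq in refs.items()
        }
        if any(score is None for score in scores.values()):
            continue  # skip windows that cannot be compared
        best = max(scores, key=scores.get)
        results.append((start, best))
    return results


# Hypothetical toy alignment: the query resembles ref_1 in the first half
# and ref_2 in the second half
toy_query = "AAAAAAAACCCCCCCC"
toy_refs = {
    "ref_1": "AAAAAAAAGGGGGGGG",
    "ref_2": "TTTTTTTTCCCCCCCC",
}
windows = closest_reference_by_window(toy_query, toy_refs, size=8, step=8)
print(windows)
```

A switch in the closest reference between windows is the
sequence-level counterpart of a strain changing position between gene
trees, as described for 1a, 1b and the S gene.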
Lu's study found that SARS-CoV-2 was distinct from SARS-CoV in a
phylogeny of the complete RNA-dependent RNA polymerase (RdRp) gene.(R.
Lu et al., 2020) Nevertheless, according to Zhou et al., the RdRp region
of SARS-CoV-2 showed high sequence similarity to the bat-derived
coronavirus BatCoV RaTG13, which shares 96·2% whole-genome identity with
SARS-CoV-2.(P. Zhou et al., 2020a) Using a set of aligned sequences,
they found no evidence that recombination had taken place in the genome
of SARS-CoV-2(P. Zhou et al., 2020a), which further supports the view of
Lu et al. that recombination occurred in the bat-derived viruses rather
than in the virus from the Wuhan patients.