Logo motifs
Logo motifs were analyzed for each study group (grouped by the number
of structural domains) using alignments of the S1 protein sequences. An
example of logo motif profiles for S1 proteins containing two domains is
shown in Figure 3 (Logo analysis motif, http://oka.protres.ru:4200).
In addition, users can create logo profiles for full-length sequences
using the S1 server.
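As background for reading such profiles, the height of a logo column is commonly taken as its information content in bits. The following is a minimal sketch of that calculation, not the implementation used by the S1 server. It assumes a 20-letter protein alphabet, ignores gaps, and applies no small-sample correction; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column; gaps are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # Shannon entropy of the observed residue frequencies in this column.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Maximum possible entropy minus observed entropy (no small-sample correction).
    return math.log2(alphabet_size) - entropy


def logo_profile(alignment, alphabet_size=20):
    """Per-column information content for equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_information(col, alphabet_size) for col in zip(*alignment)]


# Hypothetical toy alignment, not taken from the S1 data set.
toy_alignment = ["FKVGD", "FRVGE", "FKIGD", "FKV-D"]
profile = logo_profile(toy_alignment)
```

A fully conserved column reaches the maximum of log2(20) bits, and any variation in a column lowers its value.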
Several patterns emerged from the analysis of the S1 protein profiles.
The most conserved sequence regions correspond to β-strands, which
agrees with our earlier data on the high conservation of secondary
structure in such proteins [38]. In addition, an increase in the number
of structural S1 domains correlates with an increase in conservation
within each individual domain. Proteins containing five domains are an
exception, possibly because few sequences represent this group.
As shown in [9], single-domain S1 proteins have a relatively low
percentage of identity with each other (27%). Analysis of the logo
motif of this group
(http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/logo)
did not reveal strict conservation of residues F19, F22, H34, N64, and
R68 [39], which form the RNA-binding site in other bacterial, archaeal,
and eukaryotic proteins containing the S1 domain [7].
For this group, residues F19, F22, and R68 are conserved only in some
bacteria. At the same time, single-domain S1 proteins of parasitic
bacteria of the class Mollicutes (phylum Tenericutes) are known to
perform the main RNA-binding function effectively [40].
It is possible that in these bacteria the RNA-binding site is formed by
specific amino acid residues, or that the RNA-binding mechanism differs
from that of other proteins containing the S1 domain.
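The kind of check described above, whether given binding-site positions are conserved within a group, can be sketched as follows. This is an illustration only. It assumes that positions are 1-based alignment columns and that a residue counts as conserved when its frequency among non-gap residues reaches a hypothetical threshold of 0.8; the source does not state the criterion actually used. The function name and toy sequences are also hypothetical.

```python
from collections import Counter


def site_conservation(alignment, positions, threshold=0.8):
    """Map each 1-based column to (consensus residue, frequency, is_conserved).

    The frequency is computed over non-gap residues only; the threshold is
    an assumed cutoff, not a value taken from the paper.
    """
    report = {}
    for pos in positions:
        column = [seq[pos - 1] for seq in alignment if seq[pos - 1] != "-"]
        if not column:
            report[pos] = (None, 0.0, False)
            continue
        residue, count = Counter(column).most_common(1)[0]
        freq = count / len(column)
        report[pos] = (residue, freq, freq >= threshold)
    return report


# Hypothetical toy domain alignment; the columns checked are illustrative only.
toy_domains = ["AFGHR", "AFGHR", "SFGRR", "AYGRR", "AFGHK"]
report = site_conservation(toy_domains, positions=[2, 4, 5])
```

In this toy case columns 2 and 5 pass the cutoff (F and R in four of five sequences), while column 4 does not (H in three of five).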
In S1 proteins containing two structural domains, the first and second
domains also show a low percentage of identity within each domain: 27%
and 30%, respectively. The first and second domains have 38% identity
with each other, while for the remaining groups, domain pairs with the
maximum and minimum identity values have been identified [9]. For the
first domain in this group, residues F19, F22, and R68 of the
RNA-binding site are conserved. F19 and H34 are the conserved residues
for the second domain in this group (Figure 3a).
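For orientation, identity percentages of the kind quoted in this section can be computed from aligned sequences as sketched below. Definitions of percent identity differ in their denominator. This sketch assumes that columns with a gap in either sequence are skipped, which may not match the procedure used in [9]. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences over columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumed convention: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average percent identity over all sequence pairs in an alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy sequences, not taken from the S1 data set.
pair_identity = percent_identity("FKVGDH-R", "FRVGEHAR")
group_identity = mean_pairwise_identity(["FKVG", "FRVG", "FKIG"])
```

Applied to all sequences of one domain, the mean gives a within-domain value; applied to sequences of two different domains, the pairwise function gives a between-domain value.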
For S1 proteins containing three structural domains, the maximum
identity was found between the first and third domains (53%) and the
minimum between the first and second domains (42%). Moreover, the third
domain has the highest percentage of identity (57%) among the domains
of this group [9]. For the first domain in this group, the N64 residue
of the RNA-binding site is conserved. N64, R68, and R34 (at the
position of the conserved residue H34) seem to form the RNA-binding
site of the second domain in three-domain bacterial S1 proteins.
Residues F19, H34, and R68 are conserved in the third domain. It can be
assumed that, in this group, the first domain binds RNA less
efficiently.
For S1 proteins containing four structural domains, the maximum identity
was found between the third and fourth domains (78%) and the minimum
between the second and third domains. The third domain also has the
highest percentage of identity (66%) among the domains of this group.
Residues F19, F22, H34, and R68 are highly conserved in this domain.
These residues are also conserved in the fourth S1 domain of this group.
For the second domain, residues F22, N64, and R68 form the RNA-binding
site. For the first domain, only the R34 residue (at the position of
the conserved H34 residue) is retained.
In the group of S1 proteins containing five structural domains, the
third and fourth domains have the maximum percentage of identity (66%),
while the second and fifth domains have the lowest (43%). In this group,
the fourth domain has the highest percentage of identity among the
domains (49%) [9]. The first domain has no specific conserved motif
residues; for the second domain, only the R68 residue of the
RNA-binding site is retained. Although few sequences represent this
group, residues F19, F22, H34, and R68 apparently form the RNA-binding
site in the remaining three domains.
As for S1 proteins with four and five domains, the maximum identity in
the most abundant group, S1 proteins containing six structural domains,
is found between the third and fourth domains (71%); the minimum is
between the first and second domains (39%). The third domain has the
highest percentage of identity among the domains of this group (68%)
[9]. For this domain, the RNA-binding site is formed by five residues:
F19, L22 (at the position of the conserved F22 residue), H34, N64, and
R68 (Figure 3b). The first and sixth domains have no specific conserved
residues that could form an RNA-binding site. For the second domain,
F22, N64, and R68 are retained. Four residues, F19, L22 (at the
position of the conserved F22 residue), H34, and R68, are specific for
the fourth domain in this group (Figure 3b). The obtained data are in
good agreement with experimental data showing that removing one S1
domain from the C-terminus or two S1 domains from the N-terminus of the
protein reduces only the efficiency of the protein's functions and does
not abolish its functional capabilities [14,41].