3. Results and discussion
3.1 Sequence-based analysis of putative GH57 GBE sequences
Using the keywords “DUF1957 domain-containing protein” or “Glycoside
hydrolase family 57 protein”, 2,497 amino acid sequences were retrieved
from the NCBI database. These sequences varied in length between 418 and
1,184 amino acids. All but 50 sequences had the nucleophile and
acid/base catalyst and contained the five conserved sequence regions
(CSRs) typical of GH57 members (Fig. 1). The 50 exceptions lacked one or
both catalytic residues and showed large variation in four of the five
conserved sequence regions. These sequences were excluded from further
analysis because they were assumed to be inactive. The first four
conserved sequence regions are located within the A-domain, which
contains the (β/α)7 barrel. Conserved region 5 is located in the
C-domain, on the second α helix.
When the sequence logo for all the GH57 GBEs of this study is compared
with the logos recently published for 1,602 GH57 sequences [25], two
GBE-specific fingerprints become clear. The first is a quintet of amino
acids, HxHLP, with x being A, S or T, found in CSRI of almost all GBE
sequences; in a small number of GBE sequences the L at position 4 is
replaced by an I or M. In all other GH57 enzymes a Q is present instead
of an L at position 4, whereas in α-galactosidases and -related proteins
there is an L at position 4, but it is followed by a Q or A/M and not by
a P as in GH57 GBEs. The second GBE fingerprint is the sextet ELF(Y)GHW
present in CSRIV. The first position of this fingerprint, the E, is
conserved among all proteins assigned to a functional GH57 enzyme
subfamily. This E is not conserved in the proteins that are categorized
as -like proteins. These proteins lack one or both of the catalytic
residues and are very likely not active. In the 4-α-glucanotransferase
of Thermococcus litoralis the E is only 5.1 Å from the acid-base
catalyst D (Fig. 2A) and is involved in binding the -1 subsite residue
through a water molecule [31]. In the other GH57 crystal structures, the
conserved E is 4 Å (T. maritima AmyC) to 7 Å (T. thermophilus GBE) from
the acid-base catalyst (Fig. 2B and 2C). In GH13 enzymes, a catalytic
triad consisting of a catalytic nucleophile D, a general acid-base
catalyst E, and a transition-state stabilizer D plays a key role in
catalysis [32-34]. As the CSRIV E is completely conserved in all GH57
proteins assigned to a functional subfamily and is positioned close to
the acid-base catalyst in all available GH57 crystal structures, it is
reasonable to assume that this E plays a role similar to that of the
transition-state stabilizer D in GH13.
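The two fingerprints described above, HxHLP and ELF(Y)GHW, can be written as simple patterns. The following is only a minimal sketch for screening sequences and is not the procedure used in this study: the function names and the toy sequences are hypothetical, and a plain pattern search ignores the position of a match within CSRI or CSRIV, which would require an alignment.

```python
import re

# Fingerprint 1 (CSRI): H-x-H-L-P with x = A, S or T; the L at position 4
# is occasionally replaced by I or M, as described in the text.
CSR1_FINGERPRINT = re.compile(r"H[AST]H[LIM]P")

# Fingerprint 2 (CSRIV): E-L-F-G-H-W, with F replaced by Y in one case.
CSR4_FINGERPRINT = re.compile(r"EL[FY]GHW")


def find_gbe_fingerprints(sequence):
    """Return the 0-based start positions of both fingerprints."""
    seq = sequence.upper()
    return {
        "CSR1": [m.start() for m in CSR1_FINGERPRINT.finditer(seq)],
        "CSR4": [m.start() for m in CSR4_FINGERPRINT.finditer(seq)],
    }


def has_both_fingerprints(sequence):
    """True if the sequence contains at least one match for each pattern."""
    hits = find_gbe_fingerprints(sequence)
    return bool(hits["CSR1"]) and bool(hits["CSR4"])


# Hypothetical toy sequences, not taken from any database.
toy_gbe = "MKTAHAHLPFGGSAELFGHWKK"      # HAHLP and ELFGHW present
toy_non_gbe = "MKTAHAHQPFGGSAELFGHWKK"  # Q instead of L at position 4

print(find_gbe_fingerprints(toy_gbe))
print(has_both_fingerprints(toy_gbe), has_both_fingerprints(toy_non_gbe))
```

Because the patterns are short, chance matches elsewhere in a long sequence are possible; such a screen is best regarded as a first filter before inspection of the aligned conserved sequence regions.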
The other positions of the sextet are completely conserved in all GBE
proteins analyzed in this study, with the exception of the third
position: in the K. pacifica GBE the F is replaced by a Y, another amino
acid with a hydrophobic side chain. Although it was previously reported
that the C at position 16 (CSRIII) is conserved among GH57 GBEs [6],
this position is not absolutely invariant: 72 of the 2,497 sequences
(2.9%) have a different amino acid at this position, a feature also
noted by Martinovičová and Janeček [25]. The majority of these have an M
(56; 2.2%), nine have an S, five an L and two an F. This almost fully
conserved C can still be regarded as a fingerprint, as none of the other
GH57 enzymes and -like proteins have a C at this position.
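The counts reported for position 16 of CSRIII can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
TOTAL_SEQUENCES = 2497

# Residues found instead of C at position 16 (CSRIII), as reported above.
substitutions = {"M": 56, "S": 9, "L": 5, "F": 2}


def percent(count, total=TOTAL_SEQUENCES):
    """Percentage of the data set, rounded to one decimal place."""
    return round(100 * count / total, 1)


non_cys = sum(substitutions.values())   # sequences without C at position 16
with_cys = TOTAL_SEQUENCES - non_cys    # sequences that retain the C

print(non_cys, percent(non_cys))        # 72 sequences, 2.9 %
print(percent(substitutions["M"]))      # 2.2 % carry an M
print(with_cys, percent(with_cys))      # 2425 sequences, 97.1 %
```

The four substitution counts add up to the 72 sequences mentioned, so the C is retained in 2,425 sequences.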
In addition, five residues in the vicinity of the active sites were
identified as fully conserved in all sequences: three tryptophans
(W274, W404 and W413), one histidine (H146), and one arginine (R265)
(T. thermophilus numbering). In T. kodakarensis, these three
tryptophans together with the one at position 28 have been defined as
the aromatic gatekeepers [8]. This group of four aromatic gatekeeper
tryptophans is highly conserved in all GH57 GBEs, except for W28, which
is a threonine in the GBEs of Thermus and Meiothermus species.
In the T. thermophilus GBE, W274 is positioned to the side and W404 at
the bottom of the positive subsites [6]. Both are involved in substrate
binding, W274 by aromatic stacking and W404 by hydrogen bonding. In the
T. maritima GBE, the W274 equivalent (W246) is buried such that aromatic
stacking is very unlikely to occur, while the position of the W413
equivalent (W411) is difficult to predict [6]. The roles of H146 and
R265 are not clear.
In the P. horikoshii GBE, a tryptophan (W22) at the bottom of the
active site groove is involved in substrate recognition. Changing this W
into an A resulted in almost complete loss of activity [19]. This W is
also found at the same position in the crystal structures of the
T. thermophilus and T. kodakarensis GBEs. In the GH57 GBEs from
T. maritima, P. mexicana, P. mobilis and K. pacifica, this W is replaced
by D, E or P. Besides the bottom W, four other aromatic amino acids are
found in close vicinity of the active sites of the T. kodakarensis,
T. thermophilus, P. horikoshii, or T. maritima GBEs: F23, F289, W360
and F461 (T. thermophilus numbering). In all the other sequences of this
study, three of these four aromatic amino acids are functionally
conserved, while F23 is not conserved. Zhang et al. [8] reported another
three important amino acids near the active site, H11, S462, and D463
(T. thermophilus numbering). These are all conserved at the respective
positions in all the 2,497 sequences.
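The distinction made above between fully conserved and functionally conserved positions can be illustrated on a multiple sequence alignment. The sketch below is a generic illustration and not the method used in this study: the function names and the toy alignment are hypothetical, and treating F, W and Y as the set of interchangeable aromatic residues is an assumption.

```python
from collections import Counter

# Assumption: F, W and Y count as functionally equivalent aromatic residues.
AROMATIC = frozenset("FWY")


def column_counts(aligned, col):
    """Count the residues found at one alignment column (0-based)."""
    return Counter(seq[col] for seq in aligned)


def is_fully_conserved(aligned, col):
    """True if every sequence has the same residue (no gap) at the column."""
    counts = column_counts(aligned, col)
    return len(counts) == 1 and "-" not in counts


def is_functionally_conserved(aligned, col, allowed=AROMATIC):
    """True if every sequence has a residue from the allowed set."""
    return all(seq[col] in allowed for seq in aligned)


# Hypothetical toy alignment of equal-length rows, not real GH57 data.
alignment = ["HWFA", "HWYA", "HWWG"]

print(is_fully_conserved(alignment, 0))         # column of H only
print(is_fully_conserved(alignment, 2))         # F, Y and W differ
print(is_functionally_conserved(alignment, 2))  # but all are aromatic
print(column_counts(alignment, 3))              # A twice, G once
```

With a real alignment, the column index for a residue such as W274 would first have to be mapped from the T. thermophilus numbering to the alignment coordinates.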