2.7 Simulated pepsin digestion of epitopes
The online tool PeptideCutter
(https://web.expasy.org/peptide_cutter/) was used to predict the
pepsin digestion sites of the T cell epitopes of soybean allergens
[27].
3. Results
3.1 T cell epitope prediction
To compare the binding affinity of different peptides for HLA class II
molecules on a unified scale, the IEDB prediction results report a
percentile ranking value, where a lower percentile ranking value
indicates higher affinity [28]. In addition, the half-maximal
inhibitory concentration (IC50) was used to estimate peptide binding
affinity, where a lower IC50 value indicates stronger affinity [24].
Peptides with a strong binding ability to MHC class II molecules were
therefore screened using the cut-offs IC50 ≤ 250 nM and percentile
ranking value ≤ 4. Specifically, peptides obtained by the consensus
method were retained if they had a percentile ranking value ≤ 4, stable
matrix method IC50 ≤ 250 nM, and neural network method IC50 ≤ 250 nM,
and peptides obtained by the NetMHCIIpan method were retained if they
had a percentile ranking value ≤ 4 and NetMHCIIpan IC50 ≤ 250 nM. The
retained peptides were then analyzed for IL-4 inducing ability, and the
AllerTOP v. 2.0 tool was finally used to confirm the allergenicity
potential.
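The screening rule described above can be sketched as a simple filter.
This is a minimal illustration only: the record layout and field names
(`method`, `percentile_rank`, `smm_ic50`, `nn_ic50`,
`netmhciipan_ic50`) are assumptions and do not reproduce the IEDB
output format, and the example rows are invented placeholders rather
than results from this study.

```python
# Cut-offs taken from the text; field names and rows are hypothetical.
MAX_IC50_NM = 250.0
MAX_PERCENTILE_RANK = 4.0


def passes_screen(row):
    """Return True if a prediction row meets the binding cut-offs."""
    if row["percentile_rank"] > MAX_PERCENTILE_RANK:
        return False
    if row["method"] == "consensus":
        # Consensus rows must pass both the SMM and the NN IC50 cut-off.
        return (row["smm_ic50"] <= MAX_IC50_NM
                and row["nn_ic50"] <= MAX_IC50_NM)
    if row["method"] == "netmhciipan":
        return row["netmhciipan_ic50"] <= MAX_IC50_NM
    return False  # unknown method: do not select


# Invented example rows (not data from the study).
rows = [
    {"peptide": "PEPTIDE_A", "method": "consensus",
     "percentile_rank": 1.2, "smm_ic50": 80.0, "nn_ic50": 120.0},
    {"peptide": "PEPTIDE_B", "method": "consensus",
     "percentile_rank": 3.5, "smm_ic50": 300.0, "nn_ic50": 90.0},
    {"peptide": "PEPTIDE_C", "method": "netmhciipan",
     "percentile_rank": 2.0, "netmhciipan_ic50": 250.0},
    {"peptide": "PEPTIDE_D", "method": "netmhciipan",
     "percentile_rank": 6.0, "netmhciipan_ic50": 40.0},
]

selected = [r["peptide"] for r in rows if passes_screen(r)]
print(selected)
```

In this toy input, PEPTIDE_B fails because one of its two consensus
IC50 values exceeds 250 nM, and PEPTIDE_D fails on the percentile
ranking value even though its IC50 is low, which illustrates that both
criteria must hold together.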
3.1.1 P01070
Soybean trypsin inhibitor (Uniprot ID: P01070) is an anti-nutritional
factor in soybean [29], consisting of 216 amino acids, with a
molecular weight of 24005 Da. Forty-five peptides with high binding
ability to MHC class II molecules were obtained by setting IC50
≤ 250 nM and percentile ranking value ≤ 4, and 22 of these peptides were
predicted by the IL4pred tool to induce IL-4 secretion by Th2 cells.
Furthermore, a total of 14 peptides were confirmed as the T cell
epitopes via the allergenicity analysis using the AllerTOP v. 2.0 tool
(Table 2), which were mainly located in three regions of the protein: aa
44-60, aa 86-111, and aa 179-199. The region aa 44-60 is located inside
the protein, whereas the segments aa 86-111 and aa 179-199 are exposed
on the protein surface (Table 2). Among them, the epitopes
”YRIRFIAEGHPLSLK” (aa 86-100) and ”RIRFIAEGHPLSLKF” (aa 87-101) had
higher IL-4pred scores and might therefore have a higher ability to
induce Th2 cells to produce IL-4. Furthermore, as shown in Table 2, the
epitopes ”PLSLKFDSFAVIMLC” (aa 96-110) and ”LSLKFDSFAVIMLCV” (aa 97-111)
were predicted to bind the most diverse set of HLA class II alleles,
suggesting that these two epitopes could be recognized in a broader
range of individuals than the other epitopes.
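One way to see how overlapping 15-mer epitopes relate to the protein
regions reported above is to merge their positions into contiguous
spans. The sketch below is an illustration under that assumption; the
text does not state how the regions were actually derived, and the
function name `merge_regions` is hypothetical. The four spans used are
the P01070 epitope positions quoted in the text.

```python
def merge_regions(spans):
    """Merge overlapping or adjacent 1-based inclusive (start, end) spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            # Extend the current region when the next span touches it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Positions of the four P01070 epitopes named in the text.
epitope_spans = [(86, 100), (87, 101), (96, 110), (97, 111)]
regions = merge_regions(epitope_spans)
print(regions)  # [(86, 111)]
```

The merged span matches the region aa 86-111 given for this protein.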
3.1.2 P04347
Gly m 6.0101, Gly m 6.0201, Gly m 6.0301, Gly m 6.0401, and Gly m 6.0501
are the five subunits (G1-G5) of glycinin, which belongs to the 11S
plant seed storage proteins [30]. Gly m 6.0501 (Uniprot ID: P04347),
which contains 516 amino acids and has a molecular weight of 57956 Da,
was analyzed in this study, and a total of 22 T cell epitopes were
screened (Table 3). These were mainly located in five protein regions:
aa 183-199, aa 231-246, aa 260-275, aa 390-407, and aa 461-495. Among
them, the protein fragments aa 231-246, aa 260-275, and aa 471-494 are
located on the surface of the protein (Table 3). Compared with other
epitopes, the four
epitopes of ”GLEYVVFKTHHNAVS” (aa 461-475), ”LEYVVFKTHHNAVSS” (aa
462-476), ”YVVFKTHHNAVSSYI” (aa 464-478), and ”VVFKTHHNAVSSYIK” (aa
465-479) had higher IL-4pred scores.
3.1.3 P04776
Gly m 6.0101 (Uniprot ID: P04776) is the G1 subunit of glycinin, which
contains 495 amino acids and has a molecular weight of 55706 Da. As
shown in Table 4, a total of 34 T cell epitopes that could induce Th2
cells to produce IL-4 and have allergenicity potential were identified,
mainly concentrated in seven regions: aa 158-172, aa 217-236, aa
319-333, aa 347-362, aa 366-387, aa 412-442, and aa 468-491. Among them,
the regions aa 217-236, aa 319-333, and aa 468-491 are exposed on the
surface (Table 4). T cell epitopes in the aa 217-236 region could bind
the largest number of HLA class II alleles, suggesting that this region
may be relevant to allergic susceptibility,
and the epitopes ”ILSGFTLEFLEHAFS” (aa 220-234) and ”LSGFTLEFLEHAFSV”
(aa 221-235) had higher IL-4pred scores.
3.1.4 P05046
Soybean lectin (Uniprot ID: P05046) is an anti-nutritional factor with
285 amino acids and a molecular weight of approximately 120 kDa
[31]. As shown in Table 5, a total of 25 peptides were confirmed as
T cell epitopes that can induce Th2 cells to produce IL-4 and have
allergenicity potential. All T cell epitopes are located on the surface
of the protein except for the region aa 234-250 (Table 5). Furthermore,
the epitopes ”LVLLTSKANSAETVS” (aa 23-37), ”EWVRIGFSAATGLDI” (aa
234-248), ”VRIGFSAATGLDIPG” (aa 236-250), ”HDVLSWSFASNLPHA” (aa
253-267), ”DVLSWSFASNLPHAS” (aa 254-268) and ”VLSWSFASNLPHASS” (aa
255-269) exhibited higher IL-4pred scores. In particular, the epitope
”ASFAASFNFTFYAPD” (aa 100-114) was predicted by three methods
simultaneously and was predicted to bind more HLA class II alleles than
the other epitopes.
3.1.5 P11827
Soybean β-conglycinin is a 7S seed storage protein containing three
subunits (α, α’, and β) [32]. Gly m 5.0201 (Uniprot ID: P11827) is the
α’ subunit of β-conglycinin and has high amino acid sequence homology
with the α and β subunits. It contains 621 amino acids and has a
molecular weight of 72228 Da [33]. Through IEDB tools, IL-4pred, and
AllerTOP v. 2.0 prediction, a total of 17 T cell epitopes were screened
(Table 6). The epitopes ”PFHFNSKRFQTLFKN” (aa 211-225) and
”FHFNSKRFQTLFKNQ” (aa 212-226) bound the most types of alleles and
also had strong potential for inducing Th2 cells to produce IL-4.
These two epitopes are also exposed on the surface of the protein
(Table 6).
3.1.6 P25974
Soybean β-conglycinin β subunit (Uniprot code: P25974) is composed of
439 amino acids and has a molecular weight of 50476 Da. A total of 15 T
cell epitopes with allergenic potential were obtained (Table 7). Compared
with other epitopes, the epitopes ”EEQRQQEGVIVELSK” (aa 201-215),
”EQRQQEGVIVELSKE” (aa 202-216), and ”RNPIYSNNFGKFFEI” (aa 245-259) had
higher IL-4pred scores and were located on the surface of the protein.
3.1.7 P26987
Gly m 4 (Uniprot ID: P26987) is a pathogenesis-related (PR) protein. It
has cross-reactivity with the apple allergen Mal d 1, the birch pollen
allergen Bet v 1, and other allergens [34]. Gly m 4 has a length of
158 amino acids and a molecular weight of 16772 Da, and a total of 12 T
cell epitopes were screened by the prediction method (Table 8). These
12 T cell epitopes are all located on the surface of the protein.
Compared with other epitopes, the epitopes ”AKADALFKAIEAYLL” (aa
138-152) and ”ADALFKAIEAYLLAH” (aa 140-154) bound the most types of HLA
class II alleles, 13 and 15 different HLA alleles, respectively.
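The allele coverage quoted for each epitope amounts to counting the
distinct HLA class II alleles that pass the binding screen for that
peptide. A minimal sketch of that count follows; the function name
`alleles_per_peptide` and the (peptide, allele) pair layout are
assumptions, and the pairs are invented placeholders rather than
predictions from this study.

```python
from collections import defaultdict


def alleles_per_peptide(hits):
    """Count distinct alleles for each peptide from (peptide, allele) pairs."""
    coverage = defaultdict(set)
    for peptide, allele in hits:
        coverage[peptide].add(allele)  # a set ignores repeated pairs
    return {peptide: len(alleles) for peptide, alleles in coverage.items()}


# Invented pairs; the repeated pair for PEPTIDE_A mimics the same
# allele being reported by more than one prediction method.
hits = [
    ("PEPTIDE_A", "HLA-DRB1*01:01"),
    ("PEPTIDE_A", "HLA-DRB1*04:01"),
    ("PEPTIDE_A", "HLA-DRB1*01:01"),
    ("PEPTIDE_B", "HLA-DRB1*07:01"),
]

counts = alleles_per_peptide(hits)
print(counts)
```

Using a set per peptide means an allele reported by several methods is
counted once, which is one reasonable reading of "types of alleles".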
3.2 Amino acid composition of T cell epitopes
The proportion of each amino acid in the entire protein and in the T
cell epitope fragments was calculated (Fig. 1). All 20 types of amino
acids were observed in four of the seven proteins (P01070, P04347,
P04776, and P11827). The P05046 and P25974 proteins each contained 19
types of amino acids, lacking cysteine (C) and tryptophan (W),
respectively, while the P26987 protein contained 17 types, lacking
arginine (R), tryptophan (W), and cysteine (C). Leucine (L), serine (S),
valine (V), glutamate (E), and asparagine (N) were abundant in the seven
protein allergens, whereas tryptophan (W), methionine (M), and cysteine
(C) were scarce.
Comparing the frequency of each amino acid in the protein sequence with
that in the T cell epitopes showed that the frequency of glutamine (Q)
was clearly reduced in the T cell epitope regions of six proteins (all
except P25974). The frequency of glutamate (E) was also decreased in the
T cell epitope regions of P01070, P04347, P04776, P05046, and P11827,
and that of proline (P) was decreased in P04347, P05046, P11827, P25974,
and P26987. The frequency of glycine (G) in T cell epitope regions was
reduced in P04347, P04776, P11827, and P26987, and that of aspartate (D)
was decreased in P01070, P04347, P05046, and P25974. On the other hand,
the frequency of phenylalanine (F) in T cell epitope regions was
increased in six proteins (all except P01070). In addition, the
frequencies of isoleucine (I), asparagine (N), valine (V), lysine (K),
and histidine (H) in T cell epitope regions were increased in more than
half of the proteins (4/7).
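The comparison above can be illustrated by computing residue
frequencies for the whole sequence and for the positions covered by
epitopes, then taking the difference. This is a sketch under stated
assumptions: the text does not say whether a position shared by
overlapping epitopes was counted once or once per epitope (here it is
counted once), the function names are hypothetical, and the sequence
and spans are invented toy values.

```python
from collections import Counter


def residue_frequencies(sequence):
    """Return the fraction of each amino acid (one-letter code)."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def epitope_residues(sequence, spans):
    """Collect residues covered by any 1-based inclusive (start, end) span.

    Assumption: a position shared by overlapping epitopes counts once.
    """
    covered = set()
    for start, end in spans:
        covered.update(range(start - 1, end))
    return "".join(sequence[i] for i in sorted(covered))


def frequency_shift(sequence, spans):
    """Epitope-region frequency minus whole-protein frequency."""
    whole = residue_frequencies(sequence)
    epi = residue_frequencies(epitope_residues(sequence, spans))
    return {aa: epi.get(aa, 0.0) - whole[aa] for aa in whole}


toy_sequence = "MKFQQEFLV"    # invented sequence, not a soybean protein
toy_spans = [(3, 6), (5, 8)]  # invented overlapping epitope positions
shift = frequency_shift(toy_sequence, toy_spans)
enriched = sorted(aa for aa, d in shift.items() if d > 0)
print(epitope_residues(toy_sequence, toy_spans), enriched)
```

A positive shift marks a residue that is more frequent in the epitope
regions than in the whole protein, and a negative shift marks one that
is less frequent.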