3.3 Significance analysis of epitope amino acids
A discriminant analysis model was used to analyze the relationship
between the allergenicity of the epitope peptides (yes or no; the
Y-variable) and the physicochemical properties of their amino acids
(the X-variables).
The present study used three descriptors (z1, z2, and z3) to describe
the physicochemical properties of the amino acids in the T cell epitopes
and built random forest models with variable importance analysis. The
percentages of correctly classified validation samples, taken from the
confusion matrices of the seven soybean allergens, were 40% (P01070),
57.143% (P04347), 41.667% (P04776), 55.556% (P05406), 72.727% (P11827),
60% (P25974), and 75% (P26987). The importance of each X-variable was
determined from the mean decrease in accuracy obtained in the random
forest analysis of the quantitative X-variables and the qualitative
Y-variable (Fig. 2).
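The idea behind a mean decrease in accuracy can be illustrated with a
small permutation sketch: shuffle one X-variable at a time and record
how much the classification accuracy drops. This is a simplified
illustration, not the procedure used in the study. A random forest
normally computes this quantity tree by tree on out-of-bag samples,
whereas the sketch below uses one fixed classifier and one data set. The
function names, the toy rows, and the rule in `toy_predict` are
hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each variable in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns intact.
            shuffled = [row[name] for row in rows]
            rng.shuffle(shuffled)
            permuted = [{**row, name: value}
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importance[name] = sum(drops) / n_repeats
    return importance


# Hypothetical toy data: two X-variables and a yes/no Y-variable.
rows = [
    {"p1z1": -2.0, "p6z2": 0.3},
    {"p1z1": -1.5, "p6z2": -0.8},
    {"p1z1": 1.2, "p6z2": 0.5},
    {"p1z1": 2.1, "p6z2": -0.4},
]
labels = ["yes", "yes", "no", "no"]


def toy_predict(row):
    # Hypothetical stand-in for a trained model; it only looks at p1z1.
    return "yes" if row["p1z1"] < 0 else "no"


scores = mean_decrease_accuracy(toy_predict, rows, labels)
percent_correct = 100 * accuracy(toy_predict, rows, labels)
```

In this toy case shuffling p6z2 leaves the accuracy unchanged, because
the stand-in classifier never reads it, while shuffling p1z1 lowers the
accuracy. A variable with a larger mean decrease is therefore ranked as
more important.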
As shown in Fig. 2, the variable p1z1 contributed significantly to
allergenicity (yes) in five soybean allergens (P01070, P04776, P05406,
P11827, and P25974). According to the mean decrease accuracy values, the
variables p2z1, p6z2, and p13z3 each contributed to allergenicity (yes)
in four soybean allergens. Counting how often each position contributed
to allergenicity (yes), p1 and p6 were the most important positions,
followed by p2, p4, p5, and p13. For the allergens P01070, P11827, and
P25974, the contributing property at the p1 position was the bulk of the
amino acid, whereas for P04776 and P05046 it was the electronic
property. The amino acid at the p6 position contributed to allergenicity
(yes) in six of the seven allergens, the exception being P04776. For the
allergen P26987 in particular, the hydrophobicity, bulk, and electronic
property of the amino acid at position p6 all promoted allergenicity
(yes).
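The position ranking described above amounts to a tally of how many
allergens have at least one important variable at each position. A
minimal sketch of such a tally follows, assuming that variable names
have the form p<position>z<descriptor>. The allergen labels and variable
lists are hypothetical placeholders, not values read from Fig. 2.

```python
import re
from collections import Counter

VARIABLE_PATTERN = re.compile(r"p(\d+)z(\d+)")


def split_variable(name):
    """Split a name such as 'p13z3' into (position, descriptor) integers."""
    match = VARIABLE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected variable name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def count_positions(important_by_allergen):
    """Count, per position, the allergens with an important variable there."""
    counts = Counter()
    for variables in important_by_allergen.values():
        # A set ensures each allergen counts a position at most once.
        positions = {split_variable(name)[0] for name in variables}
        counts.update(positions)
    return counts


# Hypothetical input: important variables per allergen (placeholders).
important = {
    "allergen_A": ["p1z1", "p6z2", "p2z1"],
    "allergen_B": ["p1z2", "p6z1", "p6z3"],
    "allergen_C": ["p1z3", "p13z3"],
}

counts = count_positions(important)
# Most frequent positions first; ties are broken by position number.
ranking = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

With the placeholder input, position 1 is counted for three allergens
and position 6 for two, so they head the ranking.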
In the soybean allergens P04347, P11827, P25974, and P26987, the most
important amino acid property for allergenicity (yes) was z1
(hydrophobicity), followed by z2 (bulk) and z3 (electronic property).
The electronic property was the most important amino acid property for
the allergen P05046, whereas bulk was the most important amino acid
property for the allergens P01070 and P04776.
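The variable names used in this section pair a peptide position with one
of the three descriptors, so that p6z2 refers to the bulk of the amino
acid at position 6. A minimal sketch of how a peptide could be turned
into such variables follows, assuming this naming scheme. The descriptor
values in `PLACEHOLDER_Z` are invented for illustration and are not real
descriptor scores. The function name and the example peptide are also
hypothetical.

```python
# Descriptor meanings as given in the text.
DESCRIPTOR_NAMES = {1: "hydrophobicity", 2: "bulk", 3: "electronic property"}

# Placeholder values only, NOT real descriptor scores.
PLACEHOLDER_Z = {
    "A": (0.1, -1.0, 0.2),
    "K": (2.0, 1.0, -2.0),
    "L": (-3.0, -0.5, -1.0),
}


def encode_peptide(peptide, z_table):
    """Map each residue to variables named p<position>z<descriptor>."""
    features = {}
    for position, residue in enumerate(peptide, start=1):
        for index, value in enumerate(z_table[residue], start=1):
            features[f"p{position}z{index}"] = value
    return features


# Hypothetical three-residue peptide, chosen to keep the output small.
features = encode_peptide("LAK", PLACEHOLDER_Z)
```

Each residue yields three X-variables, so a peptide of n residues is
described by 3n quantitative variables in this scheme.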
In addition to the Y-variable allergenicity (yes), the random forest
models also related the X-variables to the Y-variable allergenicity
(total) to identify important variables. From Fig. 2, we found that the
variable p1z1 affected both allergenicity (no) and allergenicity (yes)
in most allergens (P01070, P04776, P05406, and P25974), and that the
variable p6z1 contributed to both allergenicity (no) and allergenicity
(yes) in three allergens (P04347, P11827, and P26987).