Figure 5. Protein 3D structures predicted from PFVM-01. The first row
shows the 3D structure predicted from the SUMO1_HUMAN PFVM-01; the
second row shows the 3D structure predicted from the P53_HUMAN PFVM-01.
The left side compares the known 3D structure with the predicted
structure (brown). The predicted 3D structures for K4GSD6_9SAUR,
C4IXC1_9TELE, A0A851ZE52_9AVES and EP3B_HUMAN are also displayed,
respectively.
DISCUSSION
Computation and databases for protein folding.
Many computational methodologies and databases for protein folding
have been developed [1: Compiani M, Capriotti E, "Computational and
theoretical methods for protein folding". Biochemistry. 52 (48):
8601–24, (2013)], and the efforts may be divided into two aspects: one
is to predict the protein structure with thermodynamic stability, and
the other is to investigate the variability of protein conformations.
In the first aspect, the prediction of protein structure from a sequence
seeks a native folding conformation with thermodynamic stability; the
stable structure is mainly controlled by hydrophobic interactions,
hydrogen bonds, van der Waals forces, and conformational entropy. In
general, the methods for protein structure prediction fall into two main
categories: template-free modeling and template-based modeling
[2: Guo JT, Ellrott K, Xu Y. A historical perspective of template-based
protein structure prediction. Methods Mol Biol; 413:3–42, (2008)],
[3: Dorn M, E Silva MB, Buriol LS, Lamb LC. Three-dimensional protein
structure prediction: methods and computational strategies. Comput Biol
Chem; 53PB:251–76, (2014)], [4: Brylinski M. Is the growth rate of
Protein Data Bank sufficient to solve the protein structure prediction
problem using template-based modeling? Bio-Algorithms and Med-Systems,
11(1):1-7, (2015)].
The template-free methods, i.e., ab initio or de novo approaches, are
based on energy functions that are evaluated through molecular dynamics
(MD) simulations under various force fields, which describe atom-atom
interactions or use empirical parameters for interactions between groups
of atoms [5: Honig B. Protein folding: from the Levinthal paradox to
structure prediction. J Mol Biol; 293:283–93, (1999)], [6: Onuchic JN,
Wolynes PG. Theory of protein folding. Curr Opin Struct Biol; 14:70–5,
(2004)], [7: Zhang J, Li W, Wang J, Qin M, Wu L, Yan Z, et al. Protein
folding simulations: from coarse-grained model to all-atom model. IUBMB
Life; 61:627–43, (2009)]. The protein with stable conformation is
finally obtained by
iterative convergence toward lower thermodynamic free energy under
defined force fields, such as the AMBER [8: Yang, L., Tan, C. H.,
Hsieh, M. J., Wang, J., Duan, Y., Cieplak, P., Caldwell, J., Kollman,
P. A., and Luo, R. New-generation amber united-atom force field. J.
Phys. Chem. B 110, 13166-13176, (2006)], CHARMM [9: Brooks, B. et al.,
CHARMM: The biomolecular simulation program. J. Comput. Chem. 30,
1545-1614, (2009)] and GROMOS [10: Riniker, S., Christ, C. D., Hansen,
H. S., Hunenberger, P. H., Oostenbrink, C., Steiner, D., and van
Gunsteren, W. F. Calculation of relative free energies for
ligand-protein binding, solvation, and conformational transitions using
the GROMOS software. J. Phys. Chem. B 115, 13570-13577, (2011)] force
fields. The software from Chemistry at Harvard Macromolecular
Mechanics (CHARMM) [33] [11: Brooks BR, Bruccoleri RE, Olafson BD,
States DJ, Swaminathan S, Karplus M, "CHARMM: A program for
macromolecular energy, minimization, and dynamics calculations". J Comp
Chem. 4 (2): 187–217, (1983)] is one of the most mature programs for
molecular dynamics; it minimizes the energy of a protein structure and
collects molecular dynamics trajectories, and its force fields include
united-atom, all-atom, dihedral-potential-corrected and polarizable
variants. The Rosetta software [12: Leaver-Fay, A., Tyka, M., Lewis,
S. M., Lange, O. F., Thompson, J., Jacak, R., Kaufman, K., Renfrew,
P. D., Smith, C. A., Sheffler, W., Davis, I. W., Cooper, S., Treuille,
A., Mandell, D. J., Richter, F., Ban, Y. E., Fleishman, S. J., Corn,
J. E., Kim, D. E., Lyskov, S., Berrondo, M., Mentzer, S., Popovic, Z.,
Havranek, J. J., Karanicolas, J., Das, R., Meiler, J., Kortemme, T.,
Gray, J. J., Kuhlman, B., Baker, D., and Bradley, P., ROSETTA3: An
object-oriented software suite for the simulation and design of
macromolecules. Methods Enzymol. 487, 545-574, (2011)], which
originated in the Baker laboratory at the University of Washington and
whose Rosetta@home project runs on the Berkeley Open Infrastructure for
Network Computing (BOINC) platform, is one of the de novo tools for
predicting protein structure; it assembles structures by a Monte Carlo
simulated annealing procedure that relies on a library of residue
fragments [13: Kroese, D. P.; Brereton, T.; Taimre, T.; Botev, Z. I.,
"Why the Monte Carlo method is so important today". WIREs Comput Stat.
6: 386–392, (2014)].
In practice, this kind of protein structure prediction is efficient for
smaller proteins but requires vast computational resources for larger
proteins. The template-based methods, such as homology modeling or
comparative modeling, align the query sequence to multiple similar
templates from the PDB and then apply energy optimization to predict the
protein 3D structure. They rest on the assumption that homologous
sequences have similar folding conformations. Based on homology
modeling, the I-TASSER [14: Roy, A., Kucukural, A., and Zhang, Y.,
I-TASSER: A unified platform for automated protein structure and
function prediction. Nat. Protoc. 5, 725-738, (2010)], Robetta
[15: Kim, D. E., Chivian, D., and Baker, D., Protein structure
prediction and analysis using the Robetta server. Nucleic Acids Res.
32, W526-W531, (2004)] and MODELLER [16: Eswar, N., Webb, B.,
Marti-Renom, M. A., Madhusudhan, M. S., Eramian, D., Shen, M. Y.,
Pieper, U., and Sali, A., Comparative protein structure modeling using
MODELLER. Current Protocols in Protein Science, Chapter 2, Unit 2.9,
(2007), Wiley, New York], [17: Liu, T., Tang, G. W., and Capriotti, E.,
Comparative Modeling: The state of the art and protein drug target
structure prediction. Comb. Chem. High Throughput Screening 14,
532-537, (2011)] software packages build models for proteins of
unknown 3D structure. If there is not a distinguishably similar sequence
matched in the PDB database, the template-free approaches supplement
the prediction with thermodynamic calculations. Recently, with deep
learning in artificial intelligence (AI), the AlphaFold approach was
particularly successful at predicting the most accurate structures, as
demonstrated in CASP13 and CASP14 [18: "DeepMind’s protein-folding AI
has solved a 50-year-old grand challenge of biology". MIT Technology
Review. Retrieved (2020)], [19: Sample, Ian (2 December 2018).
"Google’s DeepMind predicts 3D shapes of proteins". The Guardian.
Retrieved 30 November (2020)], [20: "DeepMind’s protein-folding AI has
solved a 50-year-old grand challenge of biology". MIT Technology
Review. (2020)]. AlphaFold
first treats the protein structure as a spatial graph, with the residues
as nodes and the connections between residues as edges. The system was
then trained on all available protein 3D structures from the PDB
together with databases containing protein sequences of unknown
structure. To capture physical interactions within proteins, it uses an
attention-based neural network system that is trained on
residue-residue and atom-atom relationships together with an internal
confidence measure. The protein structure is refined using a multiple
sequence alignment (MSA) of evolutionarily related sequences and a
representation of amino acid residue pairs. Through an iterative
process, AlphaFold predicts the underlying physical structure of the
protein and is able to determine highly accurate structures.
In the second aspect, the objective is to investigate variations of
conformations, because proteins in essence are not static structures but
rather conformational ensembles with multiple states. It is generally
known that a protein adjusts its folding conformation in different
environments or upon interaction with a ligand or another protein. Also,
intrinsically disordered proteins and regions (IDPs/IDRs), which are
associated with many biological processes and diseases, are widely
distributed among natural proteins [21: Chen J, Guo M, Wang X, et al.,
A comprehensive review and comparison of different computational methods
for protein remote homology detection. Briefings in
Bioinformatics(2):2. 1–17, (2017)]. The IDPs/IDRs in protein 3D
structures can be identified by many experimental techniques
[22: van der Lee R, et al. Classification of intrinsically disordered
regions and proteins. Chemical Reviews, 114(13):6589, (2014)]. DisProt
[23: Piovesan D, Tabaro F, Micetic I, et al. DisProt 7.0: a major update
of the database of disordered proteins. Nucleic Acids Res, 45:D1123–4,
(2017)], IDEAL [24: Fukuchi S, Sakamoto S, Nobe Y, et al. IDEAL:
intrinsically disordered proteins with extensive annotations and
literature. Nucleic Acids Res; 40:D507–11, (2012)] and MobiDB
[25: Potenza E, Di Domenico T, Walsh I, et al. MobiDB 2.0: an improved
database of intrinsically disordered and mobile proteins. Nucleic Acids
Res; 43:D315–20, (2015)] are useful databases for IDPs/IDRs, and the
PDB also provides illustrations. Moreover, under physiological
conditions, a native protein is essentially able to undergo a reversible
transition between disordered and ordered folding conformations.
Anfinsen’s Nobel Prize-winning experiments, summarized in his 1973
article [26: Anfinsen CB, Principles that govern the folding of protein
chains. Science 181: 223–230, (1973)], showed that the protein
ribonuclease can be reversibly denatured and renatured in a test tube,
and since then thousands of other proteins have been shown to fold
reversibly as conditions change. A protein folds reversibly because of
the small energy barriers (5 to 15 kcal/mol) between the folded and
unfolded populations [27: Kumar MD, Bava KA, Gromiha MM, Prabakaran P,
Kitajima K, Uedaira H, Sarai A, Nucleic Acids Res 34:D204–D206,
(2006)]. Different computational approaches have been developed that
focus on the variability of protein folding. In the late 1970s, Karplus
and
Weaver developed the diffusion-collision (DC) model [28: Karplus, M.,
and Weaver, D. L., Protein-folding dynamics. Nature 260, 404-406,
(1976)], [29: Karplus, M., and Weaver, D. L., Diffusion-collision model
for protein folding. Biopolymers 18, 1421-1437, (1979)], [30: Islam,
S. A., Karplus, M., and Weaver, D. L., Application of the
diffusion-collision model to the folding of three-helix bundle proteins.
J. Mol. Biol. 318, 199-215, (2002)], [31: Myers, J. K., and Oas, T. G.,
Preorganized secondary structure as an important determinant of fast
protein folding. Nat. Struct. Biol. 8, 552-558, (2001)], which explored
the long-time evolution of the protein and allowed large-amplitude
changes in the folding dynamics. Later it was modified into the foldon
diffusion-collision (FDC) model [32: Fuxreiter, M., Simon, I.,
Friedrich, P., and Tompa, P., Preformed structural elements feature in
partner recognition by intrinsically unstructured proteins. J. Mol.
Biol. 338, 1015-1026, (2004)], [33: Compiani, M., Capriotti, E., and
Casadio, R., Dynamics of the minimally frustrated helices determine the
hierarchical folding of small helical proteins. Phys. Rev. E: Stat.,
Nonlinear, Soft Matter Phys. 69, 051905, (2004)], [34: Stizza, A.,
Capriotti, E., and Compiani, M., A minimal model of three-state folding
dynamics of helical proteins. J. Phys. Chem. B 109, 4215-4226, (2005)],
which provided a more refined description of folding transitions,
including prediction of the native secondary structure and
specification of the stability of the foldons themselves. In 1977, the
hydrophobic
collapse (HC) mechanism [35: Dill, K. A., Theory for the folding and
stability of globular proteins. Biochemistry 24, 1501-1509, (1985)],
[36: Haran, G., How, when and why proteins collapse: The relation to
folding. Curr. Opin. Struct. Biol. 22, 14-20, (2012)] was proposed,
which predicts that hydrophobic forces and backbone forces result in
chain collapse prior to the formation of elements of secondary
structure. Besides hydrophobic forces, hydrogen bonds and van der Waals
forces also steer the unfolded protein toward a collapsed configuration
[37: Barbosa, M. A., Garcia, L. G., and Pereira de Araujo, A. F.,
Entropy reduction effect imposed by hydrogen bond formation on protein
folding cooperativity: Evidence from a hydrophobic minimalist model.
Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys. 72, 051903, (2005)].
In 2000, the Folding@home project was
developed at Stanford University to simulate protein folding using
widely contributed (distributed) computing resources. Because of the
huge number of folding conformations, molecular dynamics (MD) simulation
is a time-demanding process that relies on parallel supercomputing
architectures or on clusters of personal computers [38: Zagrovic, B.,
Snow, C. D., Shirts, M. R., and Pande, V. S., Simulation of folding of a
small α-helical protein in atomistic detail using worldwide-distributed
computing. J. Mol. Biol. 323, 927-937, (2002)], [39: Adcock, S. A., and
McCammon, J. A., Molecular dynamics: survey of methods for simulating
the activity of proteins. Chem. Rev. 106, 1589-1615, (2006)],
[40: Rizzuti, B., and Daggett, V., Using simulations to provide the
framework for experimental protein folding studies. Arch. Biochem.
Biophys. 531, 128-135, (2013)], [41: Daggett, V., Protein
folding-simulation. Chem. Rev. 106, 1898-1916, (2006)]. Nevertheless,
the computational
approaches that explore all possible conformations to thoroughly resolve
the protein folding problem have so far been far less successful than
was expected in the early days, and this remains one of the challenging
subjects in the field of protein physical science. Recently, Google’s
DeepMind applied artificial intelligence (AI) and successfully developed
the AlphaFold approach, which can regularly predict protein structures
with atomic accuracy competitive with experimental structures. It
trained a neural network to accurately predict the distances between
pairs of residues in a protein, and the structure was then realized by
optimizing the protein with a simple gradient descent algorithm.
Following the achievement of AlphaFold, more scientific resources and
attention are being focused on resolving the protein folding problem.
The protein folding information can be extracted from protein structure
databases. The PDB is the most inclusive repository of protein 3D
structures. So far, nearly 190,000 protein 3D structures are available
in the PDB, of which approximately 90% were obtained by X-ray
crystallography and the remainder by NMR, cryo-EM and other techniques.
X-ray crystallography can determine accurate atomic coordinates for a 3D
structure, but it represents only one specific static folding state.
NMR and cryo-EM reveal protein flexibility, but only as structural
fluctuation around an equilibrium state under certain conditions. The
Structural Classification of Proteins
(SCOP) [42: http://scop.mrc-lmb.cam.ac.uk/scop] database classifies
protein structural domains into a hierarchy of Species, Protein, Family,
Superfamily, Fold and Class. It defines 1,232 folds, 2,026 superfamilies
and 4,919 families. If two protein domains have similar secondary
structures with similar topological connections, they belong to the same
fold. The Class, Architecture, Topological fold and Homologous
superfamily (CATH) database [43: http://www.cathdb.info] classifies
95 million protein domains into 1,391 topological folds and 6,119
superfamilies. If two proteins have a similar topological fold and
sequence in conjunction with similar functions, they are assigned to the
same category in CATH. The ProTherm [44: http://www.abren.net/protherm/]
database is a resource for understanding protein folding stability; it
provides thermodynamic parameters for 25,830 structures, including
numerical data on changes in Gibbs free energy, enthalpy, heat capacity
and transition temperature. Nevertheless, the crucial question is
whether the protein databases can be used directly to investigate
protein folding. The first question is whether current and future
protein structural data are sufficient for fold recognition, and the
answer is negative [28]. The second question is whether the defined
topological folding patterns (about 1,200 folds in SCOP and nearly 1,400
in CATH) are enough to correlate protein folding with the order of amino
acids in the sequence, and the answer is that they are insufficient.
However, the large body of structural data from experimental and
computational approaches should help, to some degree, in understanding
protein folding. On the whole, folding patterns are hard to investigate
thoroughly with longer fragments, because the larger the folding
prototype, the less universal the folding pattern. Therefore, defining a
small universal folden as a prototype, such as the backbone of 5 amino
acid residues, may overcome these obstacles in probing the folding
patterns in protein structure databases.
Here, the protein structure fingerprint approach is demonstrated as a
useful means to describe complete protein folding conformations and to
construct an explicit database for protein folding. In mathematical
space, the backbone connecting 5 points is adopted as a universal
folden, and the complete folding space is described by 27 PFSC
alphabetic letters. In biological space, the possible folds of 5 amino
acid residues are limited by constraints, so different combinations of
5 amino acid residues have different numbers and patterns of folds.
Thus, a database (5AAPFSC) was created to collect all folding shapes for
all combinations of 5 amino acid residues. For a protein, one PFSC
string represents a complete folding description, and one PFVM matrix
represents the comprehensive folding variations. Based on the PFVM, not
only are all possible folding conformations, astronomical in number,
obtained, but the most probable conformations are obtained as well.
Therefore, the protein structure fingerprint approach covers both
aspects: it can predict a stable folding conformation as well as
discover the massive number of variations of folding conformations.
Furthermore, the digital alphabetic PFSC provides a simplified way to
approach the protein folding problem. As a result, the astronomical
number of folding conformations can be easily stored in a database for
protein folding. Thus, the protein structure fingerprint approach lays a
significant foundation for solving the protein folding problem.
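As a rough illustration of why the number of conformations implied by a
PFVM becomes astronomical, the following Python sketch counts the PFSC
strings that can be assembled from a matrix. The toy matrix, its
letters, the function names, the reading of the first row as the
preferred conformation, and the assumption that one letter can be chosen
independently in every column are hypothetical simplifications for
illustration; they are not the 5AAPFSC data or the authors'
implementation.

```python
from math import prod

# Hypothetical PFVM: one column per 5-residue window, each column listing
# candidate PFSC letters, assumed to be ordered from most to least preferred.
# The letters are placeholders, not values from the 5AAPFSC database.
toy_pfvm = [
    ["A", "B", "V"],
    ["A", "J"],
    ["H", "A", "B", "W"],
    ["B"],
]


def count_conformations(pfvm):
    """Number of PFSC strings obtained by picking one letter per column.

    Assumes columns can be combined freely (no constraint between
    neighbouring, overlapping windows).
    """
    return prod(len(column) for column in pfvm)


def first_row_string(pfvm):
    """PFSC string built from the first letter of every column."""
    return "".join(column[0] for column in pfvm)


def window_count(sequence_length, window=5):
    """Number of overlapping windows of `window` residues along a sequence."""
    return max(sequence_length - window + 1, 0)


print(count_conformations(toy_pfvm))  # 3 * 2 * 4 * 1 = 24
print(first_row_string(toy_pfvm))     # AAHB
print(window_count(101))              # 97
```

Under the same independence assumption, a sequence of 101 residues has
97 windows, and only 3 candidate shapes per window would already give
3**97 strings, on the order of 10^46, which is why a one-dimensional
alphabetic encoding is convenient for storage.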
Image visualization vs. alphabetic description.
Due to the complexity of protein structure, the protein structure
fingerprint provides the PFSC alphabetic description to probe a huge
number of protein data; it is especially suitable for studying the
astronomical number of protein folding conformations. Protein 3D
structure data are originally obtained by experimental measurements or
computational approaches, which aim to display a 3D image of the protein
structure. For a single protein, the 3D structural image is rendered
from the thousands of lines of atomic coordinates in the protein data
file. Although a protein 3D structure can be perceived directly to
understand the folding orientation in space, it is not easy to describe
the features of protein folding from it. For comparison of proteins, the
similarity after structural superposition is quantified by the
root-mean-square deviation (RMSD) as a score. Nevertheless, the RMSD
does not provide any detail about where and how the proteins are similar
or dissimilar, and manual choices in the procedure can severely affect
the outcome. It is therefore hard to explain the similarity and
dissimilarity between proteins with 3D image visualization
[45: Fitzkee NC, Fleming PJ, Gong H, Panasik N, Street TO, Rose GD. Are
proteins made from a limited parts list? Trends Biochem Sci; 30:73–80,
(2005)], [46: Irving JA, Whisstock JC, Lesk AM. Protein structural
alignments and functional genomics. Proteins; 42:378–382, (2001)],
[47: Sam V, Tai CH, Garnier J, Gibrat JF, Lee B, Munson PJ. ROC and
confusion analysis of structure comparison methods identify the main
causes of divergence from manual protein classification. BMC Bioinform;
7:206, (2006)], [48: Yang J. Complete description of protein folding
shapes for structural comparison. Proteomics Research Journal,
3(1):1-22, (2012)], [49: Sarah A. Middleton, Joseph Illuminati &
Junhyong Kim, Complete fold annotation of the human proteome using a
novel structural feature space, Scientific Reports volume 7, Article
number: 46321 (2017)]. Furthermore, it is almost unimaginable to
construct an astronomical number of 3D conformations for a protein in
order to probe the protein folding problem, and doing so for billions of
protein sequences is even less feasible. However, the one-dimensional
PFSC alphabetic string provides a useful protocol to overcome these
obstacles because it makes it easy to store and study a massive number
of protein conformations. The PFSC alphabetic representation not only
simplifies the description of protein conformation, but also allows a
large number of folding conformations to be aligned for comparison. As
an advantage, the PFSC alphabetic string covers the regular secondary
fragments as well as the tertiary fragments, so it is a valuable
approach for studying the astronomical number of protein conformations.
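The contrast drawn above between a single RMSD score and a
position-by-position alphabetic comparison can be sketched in Python as
follows. This is a minimal illustration under stated assumptions: the
coordinates are taken to be already superposed (no fitting is
performed), the two strings are taken to be of equal length with no gap
handling, and the function names, coordinates and letters are invented
toy values rather than data from any real protein.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def pfsc_differences(string_a, string_b):
    """Positions and letters at which two equal-length PFSC strings differ."""
    if len(string_a) != len(string_b):
        raise ValueError("strings must have equal length (no gap handling)")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(string_a, string_b))
        if a != b
    ]


# Toy data for illustration only.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0)]

print(round(rmsd(structure_a, structure_b), 3))  # 1.155
print(pfsc_differences("AAHBV", "AAJBV"))        # [(2, 'H', 'J')]
```

The RMSD collapses the comparison into one number, whereas the string
comparison reports where the two descriptions differ and which letters
are involved.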
The alphabetic description has been adopted along with the development
of protein structure studies. Beyond labeling the regular secondary
motifs of alpha helices and beta strands, many methods have been
developed to label protein conformation in more detail with alphabetic
descriptions. Some methods adopted more letters to distinguish secondary
structure motifs in detail, specifying patterns of hydrogen bonds and
geometric criteria such as Cα distances, Cα angles, dihedral angles
between Cα atoms, or pairs of ψ and φ dihedral angles around a Cα atom
[50: Kabsch W, Sander C. Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers; 22:2577–2637, (1983)], [51: Richards FM, Kundrot CE.
Identification of structural motifs from protein coordinate data:
secondary structure and first-level supersecondary structure. Proteins;
3:71–84, (1988)], [52: Frishman D, Argos P. Knowledge-based protein
secondary structure. Proteins; 23:566–579, (1995)], [53: Sklenar H,
Etchebest C, Lavery R. Describing protein structure: a general algorithm
yielding complete helicoidal parameters and a unique overall axis.
Proteins; 6:46–60, (1989)], [54: Labesse G, Colloc’h N, Pothier J,
Mornon JP. P-SEA: a new efficient assignment of secondary structure from
C alpha trace of proteins. Comput Appl Biosci; 13:3:291–295, (1997)],
[55: Martin J, Letellier G, Marin A, Taly JF, de Brevern AG, Gibrat JF.
Protein secondary structure assignment revisited: a detailed analysis of
different assignment methods. BMC Struct Biol; 5:17–34, (2005)].
Other methods identified patterns of structural segments from
observations of a large number of structures in a training database,
extracted certain motifs as folding prototypes by statistical fitting,
and then labeled them with alphabetic letters [56: Fetrow JS, Palumbo
MJ, Berg G. Patterns, structures, and amino acid frequencies in
structural building blocks, a protein secondary structure classification
scheme. Proteins; 27:249–271, (1997)], [57: Zhang X, Fetrow JS, Berg G.
Design of an auto-associative neural network with hidden layer
activations that were used to reclassify local protein structures. In:
Crabb VJ, editor. Advances in Protein Chemistry. San Diego, CA: Academic
Press; pp 397–404 (1994)], [58: Brevern AG, Etchebest C, Hazout S.
Bayesian probabilistic approach for predicting backbone structures in
terms of protein blocks. Proteins; 41:271–287, (2000)], [59: de Brevern
AG, Valadié H, Hazout S, Etchebest C. Extension of a local backbone
description using a structural alphabet: a new approach to the
sequence-structure relationship. Prot Sci; 11:2871–2886, (2002)],
[60: Fourrier L, Benros C, Brevern AG. Use of a structural alphabet for
analysis of short loops connecting repetitive structures. BMC Bioinform;
5:58, (2004)], [61: Joseph A P, Srinivasan N, Brevern A G D. Improvement
of protein structure comparison using a structural alphabet. Biochimie,
93(9):1434, (2011)]. So far, most of
the alphabetic methods adopted 9-16 letters to describe various folding
prototypes with fragments of different lengths. Nevertheless, none of
these methods is guaranteed to provide complete coverage of all possible
folding patterns, because they ignore some fragment motifs, such as
irregular loops and coils or uncommon folding shapes that rarely appear
in structures. In contrast, the PFSC overcomes these shortcomings: it
provides a set of 27 alphabetic letters to cover all possible folds for
5 successive amino acid residues, and a PFSC string describes the
complete folding conformation without gaps from the N-terminus to the
C-terminus, including regular secondary fragments and irregular
tertiary fragments.
The protein structure fingerprint can describe the folding conformations
with an alphabetic description, whether the protein 3D structure is
known or unknown. For a protein with known 3D structure, the folding
shape of each set of 5 successive amino acid residues is assigned one
PFSC letter according to the atomic coordinates, and the conformation of
the entire protein is then expressed by a PFSC string. For a protein
without known 3D structure, the comprehensive folding variations can be
observed simultaneously, at a glance, through the PFSC letters in the
PFVM. Also, an astronomical number of folding conformations for a
protein can be assembled from the various PFSC letters in the PFVM.
Furthermore, any PFSC string represents one folding conformation, and it
can in turn be converted into a 3D structure.
Alphabetic letters provide a brief description of biological structure
in macromolecular systems. The DNA polymer uses four letters (C, G, A
and T) to describe the backbone strand composed of four
deoxyribonucleotides in the genetic code. The protein polymer uses
single letters for the 20 amino acids to describe the one-dimensional
sequence. Biological structure is embedded in assembly processes, from
one-dimensional DNA and mRNA to the protein sequence and finally to
protein folding. In the first step, the genetic information stored in
the DNA sequence is transmitted through transcription and translated
into a one-dimensional protein sequence. In the second step, the protein
folds from the one-dimensional sequence into a 3D structure to carry out
the functions of life. To date, however, the knowledge and understanding
of protein folding lag far behind those of DNA and protein sequences.
The protein structure fingerprint makes significant progress by applying
a set of 27 PFSC letters to describe protein folding. Thus, the PFSC
matches the alphabetic descriptions of DNA and protein sequences, making
it possible to integrate the huge data of protein folding conformations
with DNA or mRNA sequences and protein sequences.
Protein folding vs. the order of amino acids in sequence
It is well known that protein folding in principle depends on the order
of amino acids in the sequence. Although researchers have confirmed this
principle with many biological experiments, it lacks a systematic
depiction from the bioinformatics perspective. Also, it is not easy to
illustrate clearly how the order of amino acids in the sequence affects
folding changes in a protein. However, with a universal procedure, the
PFVM displays as a whole the correlation between protein folding changes
and sequence variations. Generally, different protein sequences have
different folding patterns in the PFVM. The difference in folding
patterns appears in several aspects of the PFVM even if only one amino
acid is substituted. The differences include changes in the types of
folding shapes as well as in the number of possible folds. Also, if one
amino acid is substituted, it changes the PFSC letters not only in one
column but in a band of 5 columns in the PFVM. These changes in the PFVM
demonstrate that protein folding depends on the order of amino acids in
the sequence.
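The band of 5 columns follows from the overlap of successive 5-residue
windows: a single residue belongs to up to five of them. A minimal
Python sketch of this bookkeeping is given below. The 0-based indexing,
the convention that window k covers residues k to k+4, and the function
name are assumptions made for illustration, not the authors' code.

```python
def affected_windows(position, sequence_length, window=5):
    """Start indices (0-based) of the windows that contain `position`.

    Window k is assumed to cover residues k .. k + window - 1.
    """
    if not 0 <= position < sequence_length:
        raise ValueError("position must lie inside the sequence")
    first = max(position - window + 1, 0)
    last = min(position, sequence_length - window)
    return list(range(first, last + 1))


# A substitution in the interior of a 50-residue sequence touches 5 windows.
print(affected_windows(10, 50))  # [6, 7, 8, 9, 10]
# Near the termini fewer windows contain the residue.
print(affected_windows(0, 50))   # [0]
print(affected_windows(49, 50))  # [45]
```

Only near the N-terminus or C-terminus does the band shrink below five
columns, because fewer windows include the substituted residue.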
The PFVM characteristically displays the local folding variations along
the sequence. The changes in the number of local folding shapes resemble
a fluctuation spectrum, indicating that some portions of the protein are
more flexible while others are less flexible. The fluctuation curves of
the numbers of local folding shapes for the proteins PDCD1_MOUSE and
PDCD1_HUMAN are shown in Figure 6. First, each curve shows how the
folding flexibility follows the order of amino acids in the sequence.
Second, the two curves differ because of the differences between the
amino acid sequences. At least 4 regions of the curves (35-43, 78-91,
139-151 and 170-179 in sequence) show opposite tendencies in the
fluctuation of the numbers of local folding variations. Thus, a
fluctuation curve from the PFVM concretely indicates how protein folding
relates to the order of amino acids in the sequence, and the PFVM is a
useful tool to probe protein mutation, protein differentiation, protein
design, protein prediction and protein misfolding.
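One way such fluctuation curves could be derived and compared is
sketched below in Python. It is an illustration under assumptions, not
the procedure used for Figure 6: each PFVM is represented as a list of
columns of candidate letters, the two curves are assumed to be aligned
position by position, "opposite tendency" is taken to mean that one
curve rises while the other falls between neighbouring positions, and
all names and numbers are hypothetical.

```python
def shape_counts(pfvm):
    """Fluctuation curve: number of candidate folding shapes per column."""
    return [len(column) for column in pfvm]


def opposite_trend_positions(counts_a, counts_b):
    """Indices i where the curves move in opposite directions from i to i + 1."""
    positions = []
    for i in range(min(len(counts_a), len(counts_b)) - 1):
        step_a = counts_a[i + 1] - counts_a[i]
        step_b = counts_b[i + 1] - counts_b[i]
        if step_a * step_b < 0:  # one rises while the other falls
            positions.append(i)
    return positions


# Toy curves for illustration only, not PDCD1_MOUSE or PDCD1_HUMAN data.
curve_a = [3, 5, 4, 4, 6]
curve_b = [3, 4, 6, 5, 2]

print(shape_counts([["A", "B"], ["H"]]))            # [2, 1]
print(opposite_trend_positions(curve_a, curve_b))   # [1, 3]
```

Runs of consecutive indices returned by such a comparison would
correspond to regions like those listed above, although a real analysis
would first need a sequence alignment between the two proteins.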