Figure 1. Protein folding shape code (PFSC) and protein folding variation matrix (PFVM). The green arrows indicate the process used to construct the 5AAPFSC database, which contains all folding shapes, in PFSC letters, for the 3,200,000 permutations of 5 amino acids. The blue arrows indicate how the PFSC string is obtained from a protein with a known 3D structure. The red arrows indicate how the PFVM is obtained from a protein sequence.
Protein Folding Variation Matrix (PFVM).
With the protein folding fingerprint, the PFVM assembles the local folding variations along a sequence; it can generate an astronomical number of conformations while also identifying the most probable one. First, a database that collects all possible folding shapes, expressed in PFSC letters, is created. With 20 amino acids, there are 20^5 = 3,200,000 permutations of 5 amino acids. The folding shapes for each permutation are collected from structural data or from calculation. For most permutations, the folding shapes were first collected from 3D structural data in the PDB. For permutations that do not occur in the PDB, the folding shapes were computed by molecular dynamics simulation and stored in a complementary database. All folding shapes for the 3,200,000 permutations were then converted into PFSC alphabetic letters and stored in a database named 5AAPFSC. This procedure is indicated by the green arrows in Figure 1. Most permutations of 5 amino acids have more than one folding shape, but the number does not exceed 27. The folding shapes for a given permutation carry different weights, according to their frequency of appearance in the PDB or to the free energy (thermodynamic stability) obtained from computational simulation. Thus, for each set of 5 amino acids, the folding shapes with higher frequency and lower free energy are considered the most probable in a folding conformation and are placed at the top rank in the 5AAPFSC database.
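The counting and ranking described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the input letters are placeholders rather than real PFSC assignments, and ranking uses observation frequency only (the free-energy weighting applied to simulated fragments is not modelled).

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_fragments(alphabet=AMINO_ACIDS, length=5):
    """Number of ordered fragments of the given length (repeats allowed)."""
    return len(alphabet) ** length


def rank_shapes(observed_letters):
    """Order folding-shape letters from most to least frequently observed.

    Hypothetical helper: ties are broken alphabetically so that the
    result is deterministic.
    """
    counts = Counter(observed_letters)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [letter for letter, _ in ordered]


if __name__ == "__main__":
    print(count_fragments())            # 20**5
    print(rank_shapes(list("AABAC")))   # placeholder observations
```

In this sketch, one entry of a 5AAPFSC-like table would be the ranked list returned by `rank_shapes` for a single 5-residue fragment, with the most probable shape first.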
For a given protein sequence, the PFVM is constructed by extracting local folding variations from the 5AAPFSC database. The local folding variations, each represented by a set of PFSC letters for 5 successive amino acids, are extracted from the 5AAPFSC database and displayed as a vertical column. Repeating this along the sequence from the N-terminus to the C-terminus forms the PFVM. The procedure from sequence to PFVM is indicated by the red arrows in Figure 1. The diagrammatic sketch in Figure 2 illustrates in detail how the local folding variations are assigned according to a protein sequence. The local folding shapes for each set of 5 consecutive amino acids along the sequence are acquired directly from the 5AAPFSC database. Starting from the first 5 amino acid residues at the N-terminus, each step moves forward by one residue, sharing 4 amino acids with the prior step, until the C-terminus is reached. The resulting PFVM reveals the local folding variations for the entire protein: the PFSC letters in each column show all possible local folding shapes for one set of 5 amino acids, as extracted from the 5AAPFSC database. With the PFVM, an astronomical number of PFSC strings can be constructed by taking any one PFSC letter from each column. The most probable folding conformation is predicted by the PFSC string that consists of the PFSC letter at the top of each column in the PFVM. Furthermore, from this PFSC string, the 3D structure of the most probable conformation can be constructed as the predicted result. The PFVM therefore provides significant information for studying the protein folding problem.
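The sliding-window construction can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published code: `TOY_DB`, `build_pfvm`, `top_pfsc_string` and `count_pfsc_strings` are hypothetical names, and the letters in the toy table are placeholders rather than real 5AAPFSC entries. A sequence of N residues yields N - 4 columns, and the number of PFSC strings is the product of the column heights, which is at most 27^(N-4).

```python
import math

# Hypothetical stand-in for the 5AAPFSC database: each 5-residue fragment
# maps to its folding-shape letters, most probable first.
# The letters below are placeholders, not real PFSC assignments.
TOY_DB = {
    "ACDEF": ["A", "B"],
    "CDEFG": ["H", "A", "D"],
    "DEFGH": ["B"],
}


def build_pfvm(sequence, database, window=5):
    """Return one column of ranked shape letters per 5-residue window.

    The window starts at the N-terminus and advances one residue per
    step, so consecutive windows share 4 residues.
    """
    columns = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        # Assumes every fragment is present, as in a complete database.
        columns.append(list(database[fragment]))
    return columns


def top_pfsc_string(pfvm):
    """Concatenate the top-ranked letter of every column."""
    return "".join(column[0] for column in pfvm)


def count_pfsc_strings(pfvm):
    """Number of strings formed by picking one letter from each column."""
    return math.prod(len(column) for column in pfvm)


if __name__ == "__main__":
    pfvm = build_pfvm("ACDEFGH", TOY_DB)
    print(len(pfvm), top_pfsc_string(pfvm), count_pfsc_strings(pfvm))
```

Storing each column as a ranked list keeps the most probable shape at index 0, which mirrors reading the top row of the matrix.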