The goal of CASP experiments is to monitor progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template-based modeling and free docking. Structural templates were identified using sensitive sequence-based searches. If sequence-based searches failed, we performed structure-based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of the protein structure quality assessment method VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated, with the exception of an antibody-antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best predicted interfaces according to several protein-protein interface accuracy measures. Since we did not use co-evolution-based prediction of inter-chain contacts, we attribute our relative success in predicting interfaces primarily to the emphasis on the interaction interface when modeling and scoring.
The novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), still has serious negative effects on health, social life, and economics. Recently, vaccines from various companies have been urgently approved to control SARS-CoV-2 infections. However, no specific antiviral drug has been confirmed so far for regular treatment. An important target is the main protease (Mpro), which plays a major role in replication of the virus. In this study, Gaussian and residue network models are employed to reveal two distinct potential allosteric sites on Mpro that can be evaluated as drug targets besides the active site. Then, FDA-approved drugs are docked to the three distinct sites with flexible docking using AutoDock Vina to identify potential drug candidates. The 14 best molecule hits for the active site of Mpro are determined; 6 of these also exhibit high docking scores for the potential allosteric regions. Full-atom molecular dynamics simulations with the MM-GBSA method indicate that compounds docked to the active and potential allosteric sites form stable interactions with high binding free energy (∆Gbind) values. ∆Gbind values reach -52.06 kcal/mol for the active site, -51.08 kcal/mol for potential allosteric site 1, and -42.93 kcal/mol for potential allosteric site 2. Per-residue energy decomposition calculations elucidate key binding residues stabilizing the ligands, which can further serve to design pharmacophores. This systematic and efficient computational analysis successfully identifies ivermectin, diosmin, and selinexor, which are currently in clinical trials, and further proposes bromocriptine and elbasvir as Mpro inhibitor candidates to be evaluated against SARS-CoV-2 infection.
Recently, a bacterial strain of Ideonella sakaiensis was identified with the uncommon ability to degrade poly(ethylene terephthalate) (PET). The PETase from I. sakaiensis strain 201-F6 catalyzes the hydrolysis of PET, converting it to mono(2-hydroxyethyl) terephthalic acid (MHET), bis(2-hydroxyethyl)-TPA (BHET), and terephthalic acid (TPA). Despite the potential of this enzyme for mitigation or elimination of environmental contaminants, one limitation of using PETase for PET degradation is that it acts only at moderate temperature due to its low thermal stability. In addition, the molecular details of the main interaction of PET in the active site of PETase remain unclear. Herein, molecular docking and molecular dynamics (MD) simulations were applied to analyze structural changes of PETase induced by PET binding. Results from the essential dynamics revealed that the β1-β2 connecting loop is very flexible. This loop is located far from the active site of PETase, and we suggest that it can be considered for mutagenesis in order to increase the thermal stability of PETase. The free energy landscape (FEL) demonstrates that the main change in the transition from the unbound to the bound state is associated with the β7-α5 connecting loop, where the catalytic residue Asp206 is located. Overall, the present study provides insights into the molecular binding mechanism of PET to the PETase structure and a computational strategy for mapping flexible regions of this enzyme, which can be useful for engineering more efficient enzymes for recycling plastic polymers using biological systems.
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has caused substantially more infections, deaths, and economic disruptions than the 2002-2003 SARS-CoV. The key to understanding SARS-CoV-2’s higher infectivity lies partly in its host receptor recognition mechanism. Experiments show that the human ACE2 protein, which serves as the primary receptor for both CoVs, binds to the receptor binding domain (RBD) of CoV-2’s spike protein more strongly than to SARS-CoV’s spike RBD. The molecular basis for this difference in binding affinity, however, remains unexplained from X-ray structures. To go beyond insights gained from X-ray structures and investigate the role of thermal fluctuations in structure, we employ all-atom molecular dynamics simulations. Microseconds-long simulations reveal that while CoV and CoV-2 spike-ACE2 interfaces have similar conformational binding modes, CoV-2 spike interacts with ACE2 via a larger combinatorics of polar contacts, and on average, makes 45% more polar contacts. Correlation analysis and thermodynamic calculations indicate that these differences in the density and dynamics of polar contacts arise from differences in spatial arrangements of interfacial residues, and dynamical coupling between interfacial and non-interfacial residues. These results suggest that ongoing efforts to design spike-ACE2 peptide blockers will benefit from incorporating dynamical information as well as allosteric coupling effects.
Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this paper, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.
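The perturb-and-descend loop described above can be illustrated with a minimal sketch of a classic iterated local search (the baseline, not PILS itself: partition crossover is deliberately left out). It assumes a pairwise-decomposable energy stored in hypothetical `unary` and `pairwise` tables; all function names, parameters, and the toy energy values are illustrative assumptions, not taken from the paper.

```python
import random

def energy(seq, unary, pairwise):
    """Total energy: per-position terms plus pairwise terms on graph edges."""
    total = sum(unary[i][a] for i, a in enumerate(seq))
    total += sum(table[seq[i]][seq[j]] for (i, j), table in pairwise.items())
    return total

def local_search(seq, unary, pairwise, n_states):
    """First-improvement descent over single-position substitutions."""
    seq = list(seq)
    best = energy(seq, unary, pairwise)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq)):
            for a in range(n_states):
                if a == seq[i]:
                    continue
                old = seq[i]
                seq[i] = a
                e = energy(seq, unary, pairwise)
                if e < best:
                    best, improved = e, True   # keep the improving move
                else:
                    seq[i] = old               # undo the move
    return seq, best

def iterated_local_search(unary, pairwise, n_states, n_iter=50, n_kicks=2, seed=0):
    """Alternate random perturbation ("kick") and local descent."""
    rng = random.Random(seed)
    n = len(unary)
    start = [rng.randrange(n_states) for _ in range(n)]
    current, e_cur = local_search(start, unary, pairwise, n_states)
    best, e_best = list(current), e_cur
    for _ in range(n_iter):
        candidate = list(current)
        for i in rng.sample(range(n), n_kicks):      # perturb a few positions
            candidate[i] = rng.randrange(n_states)
        candidate, e_cand = local_search(candidate, unary, pairwise, n_states)
        if e_cand <= e_cur:                          # accept equal or better
            current, e_cur = candidate, e_cand
        if e_cand < e_best:
            best, e_best = list(candidate), e_cand
    return best, e_best

# Toy instance (made-up numbers): 3 positions, 2 states each.
unary = [[0, 2], [1, 0], [0, 1]]
pairwise = {(0, 1): [[0, 3], [1, 0]], (1, 2): [[2, 0], [0, 1]]}
best_seq, best_e = iterated_local_search(unary, pairwise, n_states=2)
```

The keys of `pairwise` play the role of edges in a residue interaction graph; a crossover operator that exploits that graph structure would act on pairs of local minima produced by `local_search`.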
Protein structure networks (PSNs) have long been used to provide a coarse yet meaningful representation of protein structure, dynamics, and internal communication pathways. An important question is what criteria should be applied to construct the network so as to include relevant interresidue contacts while avoiding unnecessary connections. To address this issue, we systematically varied the residue distance cutoff length and the probability threshold for contact formation when constructing PSNs from atomistic molecular dynamics, and assessed the amount of mutual information within the resulting representations. We found that the minimum in mutual information is universally achieved at the cutoff length of 5 Å, irrespective of the applied contact formation probability threshold, in all of the distinct proteins considered. Assuming that the optimal PSNs should be characterised by the least amount of redundancy, which corresponds to the minimum in mutual information, this finding suggests an objective criterion for the cutoff distance and supports the existing preference for its customary selection around 5 Å, which to date has typically been based on heuristic criteria.
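The two construction parameters discussed above, a distance cutoff and a contact-formation probability threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: each residue is reduced to a single representative point per frame (the study may define inter-residue distance differently), and the names `contact_probabilities`, `build_psn`, and `p_min` are hypothetical.

```python
from itertools import combinations
from math import dist

def contact_probabilities(frames, cutoff):
    """Fraction of frames in which each residue pair lies within `cutoff`.

    frames: list of frames, each a list of (x, y, z) points, one per residue.
    """
    n_res = len(frames[0])
    counts = {pair: 0 for pair in combinations(range(n_res), 2)}
    for frame in frames:
        for i, j in counts:
            if dist(frame[i], frame[j]) <= cutoff:
                counts[(i, j)] += 1
    return {pair: c / len(frames) for pair, c in counts.items()}

def build_psn(frames, cutoff=5.0, p_min=0.5):
    """Edges of the network: pairs in contact in at least `p_min` of the frames."""
    probs = contact_probabilities(frames, cutoff)
    return {pair for pair, p in probs.items() if p >= p_min}

# Toy trajectory (made-up coordinates in Å): 3 residues, 2 frames.
frames = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (4.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.0, 2.0, 0.0)],
]
edges_loose = build_psn(frames, cutoff=5.0, p_min=0.5)
edges_strict = build_psn(frames, cutoff=5.0, p_min=0.75)
```

Scanning `cutoff` and `p_min` over a grid and scoring each resulting edge set is the kind of sweep the abstract describes; the mutual-information scoring itself is not reproduced here.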
The multi-domain bacterial S1 protein is the largest and most functionally important ribosomal protein of the 30S subunit, which interacts with both mRNA and proteins. The family of ribosomal S1 proteins differs in the classical sense from a protein with tandem repeats and has a “bead-on-string” organization, where each repeat is folded into a globular domain. Based on our recent data, the study of evolutionary relationships for the bacterial phyla will provide evidence for one of the proposed theories of the evolutionary development of proteins with structural repeats: from multiple repeats of assemblies to single repeats, or vice versa. In this comparative analysis of 1333 S1 sequences identified in 24 different phyla, we demonstrate how such phyla can form independently or dependently during evolution. To our knowledge, this work is the first study of the evolutionary history of bacterial ribosomal S1 proteins. The collected and structured data can be useful to computational biologists as a resource for determining percent identity, amino acid composition and logo motifs, as well as the dN/dS ratio in bacterial S1 proteins. The obtained data suggest that bacterial ribosomal S1 proteins evolved from multiple assemblies to a single repeat. The presented data are integrated into the server, which can be accessed at http://oka.protres.ru:4200.
SARS-CoV-2 is neutralized by proteins that block receptor-binding sites on spikes that project from the viral envelope. In particular, substantial research investment has advanced monoclonal antibody therapies to the clinic where there are signs of partial efficacy in reducing viral burden and hospitalization. An alternative is to use the host entry receptor, ACE2, as a soluble decoy that broadly blocks SARS-associated coronaviruses with limited potential for viral escape. Here, we summarize efforts to engineer higher affinity variants of soluble ACE2 that rival the potency of affinity-matured antibodies. Strategies have also been used to increase the valency of ACE2 decoys for avid spike interactions and to improve pharmacokinetics via IgG fusions. Finally, the intrinsic catalytic activity of ACE2 for the turnover of the vasoconstrictor angiotensin II may directly address COVID-19 symptoms and protect against lung and cardiovascular injury, conferring dual mechanisms of action unachievable by monoclonal antibodies. Soluble ACE2 derivatives therefore have the potential to be next generation therapeutics for addressing the immediate needs of the current pandemic and possible future outbreaks.
Multi-domain proteins are not only formed through natural evolution but can also be generated by recombinant DNA technology. Because many fusion proteins can enhance the selectivity of cell targeting, these artificially produced molecules, called multi-specific biologics, are promising drug candidates, especially for immunotherapy. Moreover, the rational design of domain linkers in fusion proteins is becoming an essential step toward a quantitative understanding of the dynamics in these biopharmaceutics. We developed a computational framework to characterize the impacts of peptide linkers on the dynamics of multi-specific biologics. We constructed a benchmark containing six types of linkers that represent various lengths and degrees of flexibility and used them to connect two natural proteins as a test system. The microsecond dynamics of these proteins generated from Anton were projected onto a coarse-grained conformational space. The similarity of dynamics among different proteins in this low-dimensional space was further analyzed by a neural network model. Finally, hierarchical clustering was applied to place linkers into different subgroups based on the neural network classification results. The clustering results suggest that the length of linkers used to spatially separate different functional modules plays the most important role in regulating the dynamics of this fusion protein. Given the same number of amino acids, linker flexibility functions as a regulator of protein dynamics. In summary, we illustrated that a new computational strategy can be used to study the dynamics of multi-domain fusion proteins by a combination of long timescale molecular dynamics simulation, coarse-grained modeling, and artificial intelligence.
Cysteine (Cys) is the most reactive amino acid and participates in a wide range of biological functions. In-silico predictions complement experiments to meet the need for functional characterization. Algorithms that predict multiple Cys functions are scarce, in contrast to algorithms that predict a specific function. Here we present a deep neural network-based method for multiple Cys function prediction, available as a web-server (DeepCys) (https://deepcys.herokuapp.com/). The DeepCys model was trained and tested on two independent datasets curated from protein crystal structures. This prediction method requires three inputs, namely the PDB identifier (ID), chain ID, and residue ID for a given Cys. It outputs the probabilities of four cysteine functions, namely disulphide, metal-binding, thioether, and sulphenylation, and predicts the most probable Cys function. The algorithm exploits local and global protein properties such as sequence and secondary structure motifs, buried fractions, microenvironments, and protein/enzyme class. DeepCys outperformed most of the multiple and specific Cys function algorithms. This method can predict the maximum number of cysteine functions. Moreover, for the first time, it explicitly predicts the thioether function. This tool was used to elucidate the cysteine functions in domains of unknown function (DUFs) belonging to cytochrome C oxidase subunit-II (COX2)-like transmembrane domains. Apart from the web-server, a standalone program is also available on GitHub (https://github.com/vam-sin/deepcys).
The assignment of protein secondary structure elements (SSEs) underpins structural analysis and prediction. The backbone of a protein can be adequately represented using a pc-polyline that passes through the centers of its peptide planes. One salient feature of the pc-polyline representation is that the secondary structure of a protein becomes recognizable in a matrix whose elements are the pairwise distances between two peptide plane centers. Thus a pc-polyline can in turn be used to assign SSEs. Using a convolutional neural network (CNN), we confirm here that a pc-polyline indeed contains enough information to be used for the accurate assignment of six types of secondary structure elements: α-helix, β-sheet, β-bulge, 3_10-helix, turn and loop. Applications to three large data sets show that the assignments made by our CNN-based P2PSSE program agree very well with those by DSSP and STRIDE, and quite well with those by five other programs. The analyses of the assignments by P2PSSE and those by other programs raise some general questions about the characterization of protein secondary structure. In particular, the analyses illustrate the difficulty of giving a quantitative and consistent definition for each of the six SSE types, especially for 3_10-helix, β-bulge, turn or loop, in terms of either backbone H-bond patterns, backbone dihedral angles, Cα-polylines or pc-polylines. The difficulty suggests that the SSE space, though dominated by the regions for the six SSE types, is to a certain degree continuous.
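The pairwise-distance matrix that serves as the network input above can be sketched in a few lines. As a loudly labeled assumption, each peptide plane center is approximated here by the midpoint of two consecutive Cα atoms; the definition used by P2PSSE may differ, and the function names are hypothetical.

```python
from math import dist

def peptide_plane_centers(ca_coords):
    """Approximate each peptide plane center as the midpoint of consecutive
    C-alpha atoms (an assumption made for illustration only)."""
    return [
        tuple((a + b) / 2 for a, b in zip(p, q))
        for p, q in zip(ca_coords, ca_coords[1:])
    ]

def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between all pairs of points."""
    return [[dist(p, q) for q in points] for p in points]

# Toy backbone (made-up coordinates in Å): three collinear C-alpha atoms.
ca = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
centers = peptide_plane_centers(ca)
matrix = distance_matrix(centers)
```

A chain of N residues yields N-1 centers and an (N-1)×(N-1) matrix, which can be treated as a single-channel image by a convolutional network.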
The mitochondrial F1FO-ATPase in the presence of the natural cofactor Mg2+ acts as the enzyme of life by synthesizing ATP, but it can also hydrolyze ATP to pump H+. Interestingly, Mg2+ can be replaced by Ca2+, but only to sustain ATP hydrolysis and not ATP synthesis. When Ca2+ inserts in F1, the torque generation built by the chemomechanical coupling between F1 and the rotating central stalk was reported as unable to drive the transmembrane H+ flux within FO. However, the failed H+ translocation is not consistent with the oligomycin-sensitivity of the Ca2+-dependent F1FO-ATP(hydrol)ase. New enzyme roles in mitochondrial energy transduction are suggested by recent advances. Accordingly, the structural F1FO-ATPase distortion driven by ATP hydrolysis sustained by Ca2+ is consistent with the permeability transition pore signal propagation pathway. The Ca2+-activated F1FO-ATPase, by forming the pore, may contribute to dissipate the transmembrane H+ gradient created by the same enzyme complex.
Interaction networks of human histone H1 subtypes were constructed to show the spectrum of their activities realized through protein-protein interactions. Histone H1 subtypes participate in over half a thousand interactions with nuclear and cytosolic proteins engaged in enzymatic activity and in the binding of nucleic acids and proteins. Small-scale networks created by H1 subtypes are similar in their topological parameters (p > 0.05), but hub proteins of the networks formed with subtypes H1.1 and H1.4 differ from those of subtypes H1.3 and H1.5 in closeness centrality, clustering coefficient and neighborhood connectivity (p < 0.05). The molecular functions and biological processes of the network hubs are related to RNA binding and ribosome biogenesis (subtypes H1.1 and H1.4), cell cycle and cell division (subtypes H1.3 and H1.5), and protein ubiquitination and degradation (subtype H1.2). Such a disparity between H1 subtypes is also manifested by the enriched GO terms of their interacting proteins. The residue propensity and secondary structures of interacting surfaces, as well as the value of the equilibrium dissociation constant, indicate that the interactions of H1 subtypes are transient in terms of stability and medium-strong in terms of binding strength. Histone H1 subtypes bind interacting partners in the intrinsic disorder–dependent mode, according to the coupled folding and binding and mutual synergistic folding mechanisms. These results provide evidence that multifunctional H1 subtypes operate via protein interactions in the networks of crucial cellular processes and, therefore, support a new histone H1 paradigm relating to its functioning in protein-protein interaction networks.
Crystallographic B-factors provide direct dynamical information on the internal mobility of proteins that is closely linked to function, and are also widely used as a benchmark in assessing elastic network models. A significant question in the field is: what is the exact amount of thermal vibrations in protein crystallographic B-factors? This work sets out to answer this question. First, we carry out a thorough, statistically sound analysis of crystallographic B-factors of over 10,000 structures. Second, by employing a highly accurate all-atom model with the well-known CHARMM force field, we obtain computationally the magnitudes of thermal vibrations of nearly 1,000 structures. Our key findings are: (i) the magnitude of thermal vibrations, surprisingly, is nearly protein-independent, as a corollary to the universality in vibrational spectra of globular proteins established earlier; (ii) the magnitude of thermal vibrations is small, less than 0.1 Å² at 100 K; (iii) the percentage of thermal vibrations in B-factors is the lowest at low resolution and low temperature (<10%) but increases to as high as 60% for structures determined at high resolution and at room temperature. The significance of this work is that it provides for the first time, using an extremely large dataset, a thorough analysis of B-factors and their thermal and static disorder components. The results clearly demonstrate that structures determined at high resolution and at room temperature have the richest dynamics information. Since such structures are relatively rare in the PDB database, the work naturally calls for more such structures to be determined experimentally.
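The percentage in finding (iii) is a ratio of a computed thermal contribution to an observed B-factor, which the short sketch below makes explicit. It uses the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction; the function names and the numeric inputs are hypothetical and are not values from the study.

```python
from math import pi

def b_factor_from_msd(u2):
    """Isotropic B-factor (Å²) from the mean-square displacement <u²> (Å²)
    along one direction: B = 8 * pi**2 * <u²>."""
    return 8 * pi ** 2 * u2

def thermal_fraction(b_thermal, b_observed):
    """Share of an observed B-factor attributable to thermal vibrations."""
    return b_thermal / b_observed

# Made-up example values, for illustration only.
b_example = b_factor_from_msd(0.1)
fraction_example = thermal_fraction(6.0, 10.0)
```

Whatever remains of the observed B-factor after subtracting the thermal part is then attributed to static disorder and other non-thermal contributions.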
Normal Mode Analysis is a fast and inexpensive approach that is largely used to gain insight into functional protein motions, and more recently to create conformations for further computational studies. However, when the protein structure is unknown, the use of computational models is necessary. Here, we analyze the capacity of normal mode analysis in internal coordinate space to predict protein motion, its intrinsic flexibility and atomic displacements, using protein models instead of native structures, and the possibility to use it for model refinement. Our results show that normal mode analysis is quite insensitive to modelling errors, but that calculations are strictly reliable only for very accurate models. Our study also suggests that internal normal mode analysis is a more suitable tool for the improvement of structural models, and for integrating them with experimental data or in other computational techniques, such as protein docking or more refined molecular dynamics simulations.
Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling (FM) targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to predictive improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.
NADPH:protochlorophyllide (Pchlide) oxidoreductase (POR) is a key enzyme of chlorophyll biosynthesis in angiosperms. It is one of few known photoenzymes, which catalyzes the light-activated trans-reduction of the C17-C18 double bond of Pchlide’s porphyrin ring. Due to the light requirement, dark-grown angiosperms cannot synthesize chlorophyll. No crystal structure of POR is available, so to improve understanding of the protein’s three-dimensional structure, its dimerization, and binding of ligands (both the cofactor NADPH and substrate Pchlide), we computationally investigated the sequence and structural relationships among homologous proteins identified through database searches. The results indicate that α4 and α7 helices of monomers form the interface of POR dimers. On the basis of conserved residues, we predicted 11 functionally important amino acids that play important roles in POR binding to NADPH. Structural comparison of available crystal structures revealed that they participate in formation of binding pockets that accommodate the Pchlide ligand, and that five atoms of the closed tetrapyrrole are involved in non-bonding interactions. However, we detected no clear pattern in the physico-chemical characteristics of the amino acids they interact with. Thus, we hypothesize that interactions of these atoms in the Pchlide porphyrin ring are important to hold the ligand within the POR binding site. Analysis of Pchlide binding in POR by molecular docking and PELE simulations revealed that the orientation of the nicotinamide group is important for Pchlide binding. These findings highlight the complexity of interactions of porphyrin-containing ligands with proteins, and we suggest that fit-inducing processes play important roles in POR-Pchlide interactions.
One way in which trichocyte keratin intermediate filament proteins (keratins) and keratin associated proteins (KAPs) differ from their epithelial equivalents is in their higher levels of cysteine residues. Interactions between these cysteine residues within a mammalian fiber, and the putative regular organization of interactions (i.e., types of disulfide bond) are likely important for defining fiber mechanical properties, and thus biological functionality of hairs. Here we extend a previous study of cysteine accessibility under different levels of exposure to reducing compounds to explore a finer set of levels associated with interactions between keratins and KAPs. We found that most of the cysteines in the KAPs were close to either the N- or C- terminal domains of these proteins. The most accessible cysteines in keratins were present in the head or tail domains indicating their function in readily forming intermolecular bonds with KAPs. Some of the more buried cysteines in keratins were discovered either close to or within the rod region in positions previously identified in human epithelial keratins as being involved in crosslinking between the heterodimers of the tetramer. Our present study therefore provides a deeper understanding of the accessibility of disulfides especially in keratins and thus proves that there is some specificity to the disulfide bond interactions leading to these intermolecular bonds stabilizing the fiber structure.