Discussion
The cell is a busy environment in which thousands of molecules
constantly interact and, through information exchange, define the
cellular metabolic state. Among all contributors to cellular
homeostasis, proteins are both the most active and the most abundant
[50]; therefore, understanding their interactions and delineating their
information-sharing mechanisms is essential for a detailed comprehension
of cellular function. Such understanding is also a first step towards
the rational development of therapeutic agents against many
incapacitating or deadly diseases [51]. Despite the advances in
structure determination through
experimental methods, most of the known protein-protein interactions
still have no atomic structure. NMR spectroscopy and X-ray
crystallography, both of which are high-resolution techniques, struggle
to meet high-throughput demand, while low-resolution methods such as
small-angle X-ray scattering and cryo-electron microscopy provide
excessively coarse data. Molecular docking, or computational structure
prediction, was first developed to complement experimental results but
has since grown into a lively and independent research field [52].
Elucidating the organization and structural architecture of the CCAN is
crucial for understanding the function and assembly of the kinetochore.
CENP-H, -I, -K and -M, among other subunits of the CCAN, have previously
been reported to form a stable complex based on reconstitution
experiments and proteomic analyses [37, 53, 54, 55, 56, 57]. Our study
presents, for the first time, a computationally modeled high-quality
structure of the human CENP-HIKM complex (Figure 6) alongside a detailed
report of the inter- and intra-residue interactions. A previously
reported computational model of hs CENP-I suggests that it adopts an
α-solenoid fold resembling that of importin-β [37, 58, 59]. The hs
CENP-I N-terminal domain (residues 57-281) was also reported to be
sufficient for binding hs CENP-H and hs CENP-K, while hs CENP-M binds to
the C-terminal domain. Contiguity between CENP-H,
-I, and -K was hypothesized on the basis of proteomic analysis of
precipitates from cell lysates, 2-hybrid interaction data, and
phenotypic similarities resulting from depletion of individual subunits
[13, 60]. Additional analyses suggest that the revealed interactions
represent an evolutionarily conserved assembly mechanism of the CENP-HIK
complex [14].
Structures of biologically essential proteins are consistently in high
demand, especially those of large proteins and of members of complex
systems. It is, however, not always feasible, for numerous reasons, to
generate high-resolution structures experimentally using NMR,
cryo-electron microscopy or X-ray crystallography. Among the numerous
challenges are poor crystal diffraction, high aggregation and low
protein stability [61]. In such situations, in silico molecular modeling
can provide a high-quality alternative to experimental research. De novo
prediction of protein structure from the amino acid sequence alone has
been shown to be one of the most challenging problems in computational
biology [32]. Recent advances in the field have revealed that a few
accurately predicted long-range contacts may permit correct
topology-level structural modeling [62], and that direct evolutionary
coupling analysis (DCA) of most multiple sequence alignments may yield
an appreciable number of native long-range contacts for proteins and
protein-protein interactions with a large number of homologous sequences
[63, 64]. We have therefore employed contact prediction and
contact-assisted protein folding to model the 3D structure of each
subunit of the hs CENP-HIK complex (Figure 1, Supplementary Figures S1
and S2).
Significant progress has been made towards generating potential
protein-protein interaction networks using mass spectrometry, yeast
two-hybrid assays [65] and high-throughput proteomics studies [66, 67].
Atomic-level details obtained by X-ray crystallography are frequently
required for the mechanistic interpretation of observed interactions.
However, most biologically relevant interactions occur in transient
protein complexes, which makes the experimental determination of their
structures difficult, even when the structures of the interacting
partners are known. Computational docking approaches have therefore been
designed to predict the structures of protein complexes with an accuracy
similar to that provided by X-ray crystallography [68, 69].
Protein-protein docking protocols usually generate a substantial number
of models with well-defined atomic positions, but currently available
scoring functions lack the predictive accuracy needed to discriminate
reliably between models, and the models closest to the native structure
are often not easily detected through computational tools alone [69]. In
this study, however, near-native model selection was guided by the
architectural similarity of each generated model to the fungal and yeast
orthologs of the protein complex, previously reported to be
evolutionarily conserved (Figure 5).
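As an illustration of similarity-guided model selection, the sketch
below ranks candidate docking models by their RMSD to a reference
structure after optimal superposition (the Kabsch algorithm). It is a
minimal sketch and not the selection procedure used in this study: the
function names kabsch_rmsd and rank_models are hypothetical, and the
code assumes that each model and the reference have already been reduced
to the same number of matched Cα coordinates, which for orthologs would
require a prior sequence or structure alignment.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes row i of P corresponds to row i of Q (pre-matched atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def rank_models(models, reference):
    """Sort a {name: coords} dict of models by RMSD to the reference."""
    scores = [(name, kabsch_rmsd(xyz, reference)) for name, xyz in models.items()]
    return sorted(scores, key=lambda item: item[1])
```

Calling rank_models with a dictionary of candidate coordinate arrays
returns (name, RMSD) pairs ordered from most to least similar to the
reference. RMSD is only one possible similarity measure; an
architectural comparison could equally rely on contact maps or subunit
orientations.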
The main cellular functions, such as DNA replication, transcription,
translation, protein folding and turnover, are directed by large
macromolecular complexes such as proteasomes, chaperonins, ribosomes and
polymerases. The mechanisms of action of these macromolecules are often
dynamic and require large, collective conformational changes [70].
Normal mode analysis describes the flexible states accessible to a
protein around an equilibrium position, based on the physics of small
oscillations. When a macromolecule in a minimum-energy conformation is
perturbed slightly, a restoring force acts to return the system to its
state of equilibrium [71]. Vibrational energy is divided equally among
the modes, so that all vibrational modes have equal energy and the
average amplitude of oscillation of any given mode scales as the inverse
of its frequency. Thus, higher-frequency modes, whose displacements are
energetically more costly, typically describe fast, small-amplitude
local motions involving relatively few atoms, while lower-frequency
modes describe slow, large-scale conformational changes involving a
larger number of atoms [72]. Coarse-grained models combined with normal
mode analysis have proven to be a popular and powerful alternative for
simulating the collective motions of macromolecular complexes at
extended timescales. In addition to conformational sampling and
visualization of motion dynamics (Supplementary Figures S4 and S5), the
normal mode analysis results also suggest that the hypothetical protein
model assumes a stable conformation (Figure 7).
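The relationship between mode frequency and amplitude described above
can be illustrated with a coarse-grained elastic network. The sketch
below is a minimal anisotropic network model, assuming one node per
residue, a uniform spring constant gamma and a 15 Å cutoff; these
parameter values, the function names and the toy helical coordinates are
illustrative assumptions and do not reproduce the tool or settings used
in this study.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # nodes too far apart: no spring
            block = -gamma * np.outer(d, d) / r2
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks.
            H[3*i:3*i+3, 3*i:3*i+3] -= block
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H

def normal_modes(coords, cutoff=15.0, gamma=1.0, tol=1e-6):
    """Return frequencies (ascending) and mode vectors, without rigid-body modes."""
    H = anm_hessian(coords, cutoff, gamma)
    evals, evecs = np.linalg.eigh(H)
    keep = evals > tol  # drop the six zero-frequency translations/rotations
    return np.sqrt(evals[keep]), evecs[:, keep]

def relative_amplitudes(freqs):
    """Under equipartition, the mean amplitude of a mode scales as 1/frequency."""
    return 1.0 / np.asarray(freqs, dtype=float)

# Toy input: 20 pseudo-residues placed on an idealized helix (illustrative only).
idx = np.arange(20)
toy_coords = np.column_stack([
    2.3 * np.cos(np.radians(100.0) * idx),
    2.3 * np.sin(np.radians(100.0) * idx),
    1.5 * idx,
])
freqs, modes = normal_modes(toy_coords)
amps = relative_amplitudes(freqs)
```

In this sketch the lowest-frequency modes receive the largest relative
amplitudes, consistent with the view that slow modes carry the
large-scale collective motions. Frequencies are in arbitrary units
because the spring constant and masses are set to one.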
An essential prerequisite for normal biological function is the ability
of a protein to establish highly selective interactions with its
macromolecular partners. Sequence mutations that alter protein
interactions may completely abolish function or cause significant
perturbation [73]. A feasible way to evaluate the effect of a mutation
on the binding affinity of proteins is to quantify it experimentally.
However, while site-directed mutagenesis methodologies are fast and
inexpensive, FRET, isothermal titration calorimetry, surface plasmon
resonance and other methods used for binding affinity measurements can
be costly and time-consuming [74]. We have therefore used computational
approaches to predict binding affinity changes upon mutation (Tables
1-6, Supplementary Tables 1-6), and the predictions show great
consistency with results from previously reported experimental
mutagenesis studies. Our interatomic interaction visualization study
also provided insight into the molecular nature of the studied
interactions and into the functional and structural impact of each
mutation (Tables 7 and 8, Supplementary Figures S8-S10).
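For reference, the change in binding affinity upon mutation is commonly
expressed as a difference in binding free energy. The relations below
are the standard thermodynamic definitions, written here with
dissociation constants; the sign convention (positive values indicating
weakened binding) is an assumption, since prediction tools differ on
this point and the convention should be checked against Tables 1-6.

```latex
\Delta G_{\mathrm{bind}} = RT \ln K_d ,
\qquad
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
  = RT \ln \frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}}
```

Under this convention, a mutation that raises the dissociation constant
tenfold at 298 K corresponds to a change of RT ln 10, or approximately
+1.36 kcal/mol.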