Introduction
In nature, host-pathogen protein-protein interactions (HP-PPIs) are
highly complex and ubiquitous, and understanding them is essential for
elucidating infectious diseases (1). During infection, there is continuous
crosstalk between pathogens and their hosts that is mediated by a
variety of effectors including proteins, small molecules, metabolites,
and regulatory RNAs (2, 3). Pathogenesis involves interactions between
the signalling networks of the host and pathogen. Recent studies
regarding HP-PPIs focus on the mechanisms employed by pathogens to
hijack and exploit the host immune system for their own survival.
Processes for molecular mimicry have evolved to enable the proteins of
pathogens to imitate the host proteins in order to disrupt their
interactions and disturb the signalling pathways (4). Thus, the
interacting pathways and proteins of the pathogen may be conceived to be
in a continuum with those of the host.
Mimicry of host antigenic determinants as a survival mechanism was
described early in parasites (5). A pathogen’s ability to mimic the host
components may be achieved by two distinct mechanisms. The first one is
where the host genes are acquired by the pathogen through horizontal
gene transfer. An example of this is the acquisition of complement
escape regulators by parasitic helminths such as Echinococcus
granulosus (6) and Onchocerca volvulus (7). The second mechanism
is where host and pathogen genes evolved independently and ended up
encoding similar structures with different functions, i.e., underwent
convergent evolution (8). A well-known example of this is the Yersinia
pseudotuberculosis effector protein, invasin, which
structurally mimics the integrin-binding surface of the protein
fibronectin (9). While horizontal gene transfer leads to detectable
homology between the pathogen and host proteins (10, 11), convergent
evolution is more likely to produce local similarity between the proteins of
pathogen and host, as reflected in shared motifs (12). The local
similarities between epitopes from the pathogen/infectious agents and
antigens present in the host can also lead to autoimmune diseases
(13-17).
Molecular mimicry can operate at four distinct levels: (i) similarity in
both sequence and structure of a full-length protein or a functional
domain, as displayed by the SET-domain-containing proteins of Legionella
pneumophila, Chlamydia trachomatis and Burkholderia
thailandensis, which mimic host proteins (18);
(ii) structural similarity without apparent sequence
similarity, as detected in several bacterial and viral pathogens
that evolved to structurally mimic host ligands although the
sequence similarity between the pathogen molecules and the mimicked host
ligands was low (19); (iii) similarity in the sequence of a short linear
motif, as displayed by the WxxxE motif in
many bacterial guanine nucleotide exchange factors, such as EspM2 and Map in E.
coli and also SifA of Salmonella (20, 21); motifs can
tolerate mutations and can evolve rapidly to alter interactions with the
host (22); and (iv) similarity of only the binding-site architecture
(interface mimicry) without sequence homology, as displayed by human
fibronectin and Y. pseudotuberculosis invasin binding to human
integrin (9, 11). These proteins display similarity in the chemical
properties at the binding site in the absence of sequence and structural
homology.
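As an illustration of motif-level mimicry, the WxxxE pattern mentioned above (tryptophan, any three residues, glutamate) can be located in a protein sequence with a regular expression. The following is a minimal sketch only; the function name find_wxxxe and the toy sequence are hypothetical and are not taken from any of the cited studies, and a sequence match alone does not establish functional mimicry.

```python
import re

# WxxxE: tryptophan (W), any three residues, glutamate (E), in one-letter codes.
# The lookahead lets overlapping occurrences be reported as separate hits.
WXXXE = re.compile(r"(?=(W[A-Z]{3}E))")


def find_wxxxe(sequence):
    """Return (1-based start position, matched 5-mer) for each WxxxE occurrence."""
    return [(m.start() + 1, m.group(1)) for m in WXXXE.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration; not a real effector protein.
toy_sequence = "MKTWAIVELLSWQQPEAG"
print(find_wxxxe(toy_sequence))  # [(4, 'WAIVE'), (12, 'WQQPE')]
```

In practice, short motifs of this kind occur frequently by chance, which is one reason why additional evidence such as a shared interaction partner is useful.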
The existing methods of detection of mimicry are simply based on
identifying sequence or structure similarity. A previously available
database, mimicDB (8), provides information about molecular
mimicry proteins or epitopes involved in a limited number of human
parasites. Another database, miPepBase (23), lists experimentally
verified mimicry peptides involved in autoimmune diseases. However, a
wide range of domains and motifs are recruited by pathogens to mimic and
hijack the host cell machinery for their survival (20, 21, 24-26). A
computational pipeline using protein BLAST searches against the human proteome has also
been implemented for the prediction of molecular mimicry candidates
in bacterial pathogens (27). However, sequence-based methods for
discovery of protein mimics may not be adequate as they are dependent on
the level of recognizable homology between the host and pathogen
proteins. Structure-based methods are more suitable for recognizing
remote similarity while motif-based methods are suitable for recognizing
localized regions of similarity between proteins. Pathogenic bacteria
are likely to target the host proteins by imperfectly mimicking the host
interface (28). An interface mimicry-based method, the HMI-PRED server
(29), carries out structural prediction of given HP-PPIs. However, it is
limited by its requirement for the structure of the microbial protein
involved in mimicry.
Similarity between motifs and domains of the host and pathogen proteins
does not necessarily indicate their actual interaction. Interaction further
depends on the proteins being expressed simultaneously and being
present in the same cellular compartment. Notably, analysis of the PPIs
in yeast and human has shown that a large majority of interactions
occur between proteins in the same subcellular compartment (30, 31).
Studies have also shown that functionally related or interacting
proteins from the same pathways share Gene Ontology annotations and usually
have higher co-expression scores (32, 33). Moreover, imitation of
host proteins by the pathogen essentially works by
competing with endogenous (host–host) interactions (34, 35). We
therefore hypothesize that resemblance between the experimentally
validated host and pathogen interactors of the same host protein
increases the confidence in the identification of molecular mimicry
candidates due to colocalization and co-expression of the interacting
protein pairs. This is shown schematically in Figure 1a for global
structural similarity (domain linear pair or DLP) and Figure 1b for
local sequence similarity (motif linear pair or MLP). Delineating the
DLPs and MLPs also provides information about the host interactions that
are likely to be disrupted by pathogen protein mimicry.
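The idea behind these linear pairs can be sketched in code: for each host protein, its pathogen interactor is compared with its host interactors, and a pair is reported when the two share a domain (DLP) or a motif (MLP). The sketch below is an illustrative assumption and not the ImitateDB pipeline itself; the function name linear_pairs, the input layout, the protein identifiers and the feature labels are all hypothetical, and the actual annotation sources and similarity criteria are not specified in this section.

```python
def linear_pairs(hp_ppis, hh_ppis, features):
    """Pair pathogen and host interactors of the same host protein that share a feature.

    hp_ppis:  iterable of (host_protein, pathogen_protein) interactions
    hh_ppis:  iterable of (host_protein, host_interactor) interactions
    features: dict mapping protein id -> set of domain (or motif) identifiers
    Returns a list of (host, pathogen, host_interactor, shared_feature) tuples.
    """
    # Host-host interactions are undirected, so record both directions.
    host_partners = {}
    for a, b in hh_ppis:
        host_partners.setdefault(a, set()).add(b)
        host_partners.setdefault(b, set()).add(a)

    pairs = []
    for host, pathogen in hp_ppis:
        pathogen_features = features.get(pathogen, set())
        for interactor in sorted(host_partners.get(host, set())):
            shared = pathogen_features & features.get(interactor, set())
            for feature in sorted(shared):
                pairs.append((host, pathogen, interactor, feature))
    return pairs


# Hypothetical toy data: identifiers and feature labels are placeholders.
hp = [("H1", "P1"), ("H2", "P2")]
hh = [("H1", "H3"), ("H1", "H4"), ("H2", "H3")]
annotations = {
    "P1": {"SH3"},
    "P2": {"Kinase"},
    "H3": {"SH3", "PDZ"},
    "H4": {"PDZ"},
}
print(linear_pairs(hp, hh, annotations))  # [('H1', 'P1', 'H3', 'SH3')]
```

Here the pathogen protein P1 and the host protein H3 both interact with H1 and share a feature, so the H1-H3 interaction is the one flagged as potentially disrupted by mimicry. Passing domain annotations would give DLP-like pairs and passing motif annotations would give MLP-like pairs.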
In this work, we collated the entire set of experimental HP-PPIs from
interaction databases in order to compute their DLPs and MLPs, which
were organized into a publicly available database, ImitateDB,
accessible at http://imitatedb.sblab-nsit.net. The ImitateDB
resource can help researchers search for organism-wise mimicry
patterns prominent in the host-pathogen interactome. It houses 206,449
DLPs and 3,845,643 MLPs. Of the 61,215 HP-PPIs collated, 1,549
and 49,266 were found to be characterized by imitated domains and
motifs, respectively. Novel potential domain mimics include the SANT (Swi3, Ada2,
N-Cor, and TFIIIB) DNA-binding domain, the Tudor domain and the PhoX homology domain,
while novel motif mimics include the microbodies
C-terminal targeting signal, the ubiquitin-interacting motif and the lipocalin
signature. Specific domains or motifs imitated by a large
number of pathogens are likely to contribute to microbial virulence
and may be suitable targets for drugs or vaccines. Thus, ImitateDB constitutes a
source of information for molecular imitation in HP-PPIs for researchers
in the field of infectious diseases and microbiology.