Domain mimicry
Of the 5,569 pathogen proteins from 630 pathogens, 607 proteins
from 146 pathogens formed DLPs with the host interactor proteins, as
indicated in the schematic in Figure 1. Because each mimicked domain
occurred in multiple instances, we identified the unique domain types.
In total, 3,040 unique CDD domain types were shared by pathogens and
host. The largest number of DLPs was found for the Serine/Threonine
Protein Kinase US3 (UniProt ID: P04413) from Human Herpesvirus 1
Strain 17 (HHV-1), with 61,609 DLPs. The top 10 pathogens involved in
molecular mimicry, along with their numbers of DLPs, are shown in Table 2.
The two viral pathogens with the most DLPs were HHV-1 and Rous sarcoma
virus strain Schmidt-Ruppin A. Among
bacteria, Legionella pneumophila subsp. pneumophila (strain
Philadelphia 1 / ATCC 33152 / DSM 7513) was found to have the largest
number and widest diversity of host-like domains (Table 2). This
opportunistic human bacterial pathogen has previously been reported to
be highly involved in molecular mimicry of host proteins (24, 55).
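The tallies reported above (unique domain types, DLPs per pathogen, and the most frequent domains) can be illustrated with a minimal Python sketch. It assumes each DLP is stored as a tuple of pathogen, pathogen protein, host protein and CDD domain; this record layout, the function names and the placeholder values are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter

# Hypothetical DLP records: (pathogen, pathogen_protein, host_protein, cdd_domain).
# All values are illustrative placeholders, not data from this study.
dlps = [
    ("pathogen_A", "protA1", "hostP1", "domX"),
    ("pathogen_A", "protA1", "hostP2", "domX"),
    ("pathogen_A", "protA2", "hostP1", "domY"),
    ("pathogen_B", "protB1", "hostP3", "domX"),
]


def unique_domain_types(records):
    """Collapse repeated domain instances into the set of distinct domain IDs."""
    return {domain for _, _, _, domain in records}


def dlps_per_pathogen(records):
    """Count how many DLP records each pathogen contributes."""
    return Counter(pathogen for pathogen, _, _, _ in records)


def top_domains(records, n=10):
    """Return the n most frequently observed domains with their counts."""
    return Counter(domain for _, _, _, domain in records).most_common(n)


print(len(unique_domain_types(dlps)))           # number of unique domain types
print(dlps_per_pathogen(dlps).most_common(10))  # pathogens ranked by DLP count
print(top_domains(dlps))                        # domains ranked by frequency
```

With real data, the same three calls would yield the kind of counts summarized in the text and in the ranked tables, provided the DLP records are first exported in the assumed tuple form.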
The top 10 most frequently observed mimicked domains are shown in Figure
4a. PHA03247 (large tegument protein UL36) was the most frequent among
DLPs. UL36 is an important tegument protein domain family of Herpes
Simplex Virus (HSV) that is crucial for virus-host interaction and host
immune evasion (56). UL36 colocalizes with host and viral
membrane proteins and aids in the assembly and cell entry of HSV (57). The
top 10 most frequently occurring mimicked domains in the different organism
categories are shown in Table 3. One conserved domain family
potentially mimicked by viruses was the DEAD-like helicase domain
superfamily. DEAD-box helicases share a common D-E-A-D motif and are
an emerging class of host proteins mimicked by viruses during
infection (58). The conserved domains found most frequently in
bacterial, viral and fungal DLPs were Rad50 ATPase and SbcC. Both are
involved in DNA repair pathways and are highly conserved
among eukaryotes (humans and fungi), bacteria and viruses (59,
60). Pathogens thus seem to have captured DNA repair proteins
from their hosts to aid their own replication and survival by disrupting
the host DNA repair pathways (61, 62).
Another important mimicked domain found in our data is the Glycogen Synthase
Kinase-3 (GSK-3) domain. Bacterial pathogens such as Helicobacter
pylori have been found to divert host signalling pathways, such as
WNT signalling, by targeting host GSK-3 (61).
The fungal pathogen that mimicked the largest number of host-like
domains was Saccharomyces cerevisiae S288C. In the Others category,
Dictyostelium discoideum was the predominant pathogen, imitating the
largest number of domains. The pathogens with the highest numbers of
DLPs and MLPs in the different pathogen categories (viruses, bacteria,
fungi, and others) are listed in Supplementary Tables S1, S2, S3 and S4,
respectively.