Motif mimicry
Of the 5,569 pathogen proteins from 630 pathogens, 5,255 proteins from 610 pathogens formed MLPs with host interactor proteins, as indicated in the schematic in Figure 1. However, only 239 unique motifs were found to be mimicked by pathogens. Since each pathogen can mimic motifs from multiple interactors, MLP counts can be large: the highest count for a single protein, 35,385 MLPs, was found for polymerase basic protein 2 from Influenza A virus strain A/Wilson-Smith/1933 H1N1 (A/Wilson-Smith/33/H1N1). The average number of MLPs per protein is 732. Amongst viral pathogens, A/Wilson-Smith/33/H1N1 had the most MLPs, whereas in the bacterial interactome Yersinia pestis had the most. The top 10 pathogens by MLP count are listed in Table 4.
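The summary statistics above (proteins and pathogens with MLPs, unique motifs, the per-protein maximum and mean, and the top pathogens) can be illustrated with a minimal sketch. It assumes each MLP is stored as a (pathogen, pathogen protein, host interactor, motif) record; this record layout, the function name `summarize`, and the toy rows are hypothetical and are not taken from the database.

```python
from collections import Counter

# Hypothetical MLP records: (pathogen, pathogen_protein, host_interactor, motif).
# Toy rows for illustration only; they are not real database entries.
mlps = [
    ("pathogen_A", "protein_1", "host_X", "motif_1"),
    ("pathogen_A", "protein_1", "host_Y", "motif_1"),
    ("pathogen_A", "protein_1", "host_Y", "motif_2"),
    ("pathogen_A", "protein_2", "host_X", "motif_3"),
    ("pathogen_B", "protein_3", "host_Z", "motif_2"),
]


def summarize(records, top_n=10):
    """Aggregate MLP records into the kind of counts reported in the text."""
    # One pathogen protein is identified by (pathogen, protein).
    per_protein = Counter((pathogen, protein) for pathogen, protein, _, _ in records)
    per_pathogen = Counter(pathogen for pathogen, _, _, _ in records)
    unique_motifs = {motif for _, _, _, motif in records}
    return {
        "proteins_with_mlps": len(per_protein),
        "pathogens_with_mlps": len(per_pathogen),
        "unique_motifs": len(unique_motifs),
        "max_protein": per_protein.most_common(1)[0],
        "mean_mlps_per_protein": len(records) / len(per_protein),
        "top_pathogens": per_pathogen.most_common(top_n),
    }


summary = summarize(mlps)
```

Here the mean is taken over proteins that have at least one MLP, which is an assumption about how the reported average was computed.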
Tables 2 and 4 show that S. cerevisiae S288c had the highest counts of DLPs and MLPs even though the total number of reported HP-PPIs was very low in comparison with viruses or bacteria. This can be attributed to the fact that yeasts, being eukaryotes, are quite similar to humans in terms of genes and cellular pathways. The genes that regulate cellular processes in humans have equivalents that control cell division in yeasts, which makes it easy for pathogenic yeast species to alter the host cellular machinery (63). This study has therefore uncovered potential mimicry candidates in fungal pathogens, which were not well established until now.
The total counts for the 10 most frequently occurring motifs in the database are shown in Figure 4b. The protein kinase C (PKC) and casein kinase II (CK2) phosphorylation sites predominate. The PKC and CK2 families of serine/threonine kinases play essential roles in multiple human signalling pathways that are hijacked during many viral infections (64). Tyrosine phosphorylation has been shown to be important for pathogenesis as well as immune responses, following the discovery of a bacterial tyrosine phosphatase (65). There have been instances where both extracellular and intracellular bacteria secreted proteins that mimicked the function of their analogous eukaryotic-like proteins and hijacked the tyrosine phosphorylation pathway (66). Additionally, N-myristoylation, amidation, and N-glycosylation sites were seen in all organism categories. Several studies have shown the contribution of post-translational modification (PTM) sites to microbial infection and cellular processes (67, 68).
The 10 most frequent motifs in each pathogen category are listed in Table 5. N-glycosylation was a frequently occurring motif and is an important modification used by several pathogen proteins (specifically viral glycoproteins) to evade the human immune system (69, 70). The envelope proteins of viruses such as HIV-1 are heavily glycosylated, which can provide camouflage and alter immune recognition (71, 72). The protein N-myristoylation site is another conserved PTM site of proteins involved in a variety of physiological processes such as cell proliferation and differentiation, cell survival, and cell death (73). Several myristoylated proteins also have prominent roles in cellular signalling pathways (74), and the myristoylation motif has been found to be mimicked by viral and bacterial proteins (25, 75).
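Short PTM site motifs of the kind discussed above can be located in a sequence with simple pattern matching. The sketch below is illustrative only: the text does not state which tool was used to detect motifs, and the regular expressions are transcriptions of the commonly cited PROSITE patterns for these four sites, which should be verified against the current PROSITE entries. The function names and toy sequences are hypothetical.

```python
import re

# Regex transcriptions of commonly cited PROSITE-style patterns (assumed here;
# verify against the current PROSITE entries before relying on them).
PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",                   # N-{P}-[ST]-{P}
    "PKC phosphorylation": r"[ST].[RK]",                   # [ST]-x-[RK]
    "CK2 phosphorylation": r"[ST].{2}[DE]",                # [ST]-x(2)-[DE]
    "N-myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(sequence):
    """Return (motif name, 1-based start, matched text) for every site found.

    A zero-width lookahead is used so that overlapping sites are all reported.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def shared_motifs(pathogen_seq, host_seq):
    """Motif types present in both sequences: a crude stand-in for mimicry."""
    pathogen_types = {name for name, _, _ in scan(pathogen_seq)}
    host_types = {name for name, _, _ in scan(host_seq)}
    return sorted(pathogen_types & host_types)


# Toy sequences, not real proteins.
toy_pathogen = "MANGTAKSLRDE"
toy_host = "AAASVKAAA"
pathogen_hits = scan(toy_pathogen)
common = shared_motifs(toy_pathogen, toy_host)
```

Because these patterns are short and degenerate, they match frequently by chance, which is consistent with phosphorylation and other PTM sites dominating raw motif counts.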
Other commonly mimicked motifs in our data were the ABC transporters family signature motif, Q motif, ATP/GTP-binding site motif A (P-loop), arginine-rich motif, ubiquitination site, and prenyl group binding site. The ABC transporters family signature motif is a conserved sequence (LSGGQ) present in the nucleotide-binding domain (NBD) of all ABC transporters and is primarily required for substrate transport (76). Pathogens may mimic this motif to disturb the transport pathways of the host. The Q motif is part of conserved helicases (involved in DNA dynamics) (77) and might help pathogens hijack the host machinery associated with DNA replication, recombination, transcription, and repair. The highlight table depicting the number of MLPs characterized by the top 20 mimicked motifs for the top 20 pathogens is provided as Supplementary Table S5.
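A pathogen-by-motif table of this kind can be built by cross-tabulating MLP records. The following is a minimal sketch, assuming each MLP has been reduced to a (pathogen, motif) pair; the function name `highlight_table` and the toy records are hypothetical, and the actual procedure used to build Supplementary Table S5 is not described here.

```python
from collections import Counter


def highlight_table(records, top_n=20):
    """Cross-tabulate MLP counts for the top_n pathogens and top_n motifs.

    records: iterable of (pathogen, motif) pairs, one pair per MLP.
    Returns (row labels, column labels, matrix of counts).
    """
    records = list(records)
    pathogen_totals = Counter(pathogen for pathogen, _ in records)
    motif_totals = Counter(motif for _, motif in records)
    rows = [pathogen for pathogen, _ in pathogen_totals.most_common(top_n)]
    cols = [motif for motif, _ in motif_totals.most_common(top_n)]
    cell = Counter(records)  # count per (pathogen, motif) pair; missing pairs give 0
    return rows, cols, [[cell[(p, m)] for m in cols] for p in rows]


# Toy records for illustration only.
records = [
    ("pathogen_A", "motif_1"), ("pathogen_A", "motif_1"), ("pathogen_A", "motif_2"),
    ("pathogen_B", "motif_1"), ("pathogen_B", "motif_2"),
    ("pathogen_C", "motif_3"),
]
rows, cols, table = highlight_table(records, top_n=2)
```

Ranking rows and columns by their overall totals before cross-tabulating keeps the table restricted to the most frequent pathogens and motifs, so rarer pairs such as `pathogen_C` and `motif_3` drop out of this toy example.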