Abstract
Significant populations in tropical and subtropical regions worldwide are severely affected by leishmaniasis, a group of neglected tropical diseases caused by roughly 20 species of protozoan parasites of the genus Leishmania. Disease control strategies that include early detection, vector control, treatment of affected individuals, and vaccination are all essential. Diagnosis is critical for selecting therapy, preventing transmission of the disease, and minimizing symptoms so that the affected individual can have a better quality of life. Nevertheless, current diagnostic methods have limitations, and there is no established gold standard. Disadvantages include cross-reactions with other species and limited sensitivity and specificity, which are mostly determined by the type of antigen used in the tests. A viable alternative for a more precise diagnosis is the use of recombinant antigens, which have been generated using bioinformatics approaches and have shown increased diagnostic accuracy. Identifying potential new antigens using bioinformatics resources is therefore an effective technique, since it may result in an earlier and more accurate diagnosis. The purpose of this review is to evaluate the efficacy of in silico approaches for selecting recombinant antigens for leishmaniasis diagnosis.
INTRODUCTION
The leishmaniases are a class of parasitic, non-contagious infections that belong to the diverse group of conditions classified as Neglected Tropical Diseases. These diseases occur worldwide, although most cases are reported in Africa, Asia, and the Americas1. According to the World Health Organization (WHO), more than 1 billion people currently live in areas where the disease is prevalent, which represents a serious public health issue.
Leishmaniasis has three main clinical forms2. The most lethal form, visceral leishmaniasis (VL, or kala-azar), is characterized by systemic infection that can affect the liver and spleen, among other organs. Approximately 30,000 cases are expected to occur per year1. Cutaneous leishmaniasis (CL), the most prevalent form, is recognized by the presence of skin lesions and has been estimated to affect approximately one million individuals annually2,3. If not appropriately treated, it can develop into a third and more severe form, mucocutaneous leishmaniasis (MCL), characterized by nasal ulceration and mucosal infiltration4. These clinical manifestations vary depending on multiple factors, such as the host’s immune system, nutritional state, and genetic background, the environment, and the parasite species associated with the infection5.
Approximately 20 species with the potential to infect humans have been described. As a setback in the field, there is a lack of immunological data due to the number of clinical manifestations and infective species, making it challenging to comprehend the different types of immune responses6. However, it has been reported that upon the parasite’s entry, innate immune cells are recruited, and different regulatory, susceptibility, and/or resistance mechanisms are triggered, leading to a complex immune response that is characterized by both cell-mediated responses and the production of antibodies7.
Early diagnosis, vector control, treatment of infected individuals, and vaccination are important disease control strategies8. Diagnosis is crucial for assigning specific treatment schemes, preventing disease progression, and alleviating symptoms, allowing the affected individual to have a better quality of life9. Limitations of serological diagnostic techniques include cross-reactions with other species and low sensitivity and specificity, which are primarily determined by the type of antigen used in the assays10.
Considering the challenge of selecting the most suitable antigen, techniques that support the identification and selection of immunogenic molecules have proven to be potential alternatives. The principle of reverse vaccinology, proposed by Pizza et al.11, established the concept of employing computational methods to anticipate the selection of potential molecules for use in immunological applications such as vaccines and diagnostic tests. As a result, recombinant antigens, which are designed using bioinformatics tools and have demonstrated improved diagnostic accuracy, are a promising alternative for a more accurate diagnosis of the disease12.
Several studies on leishmaniasis diagnosis have made extensive use of bioinformatics tools to search for recombinant proteins and synthetic peptides13,14. These in silico approaches are based on the prediction of potential antigenic/immunogenic epitopes. In addition to being considered straightforward, this strategy reduces the cost of culture maintenance and decreases the variations in sensitivity and specificity found in conventional serological methodologies14. As a result, identifying potential new recombinant antigens using bioinformatics resources becomes a reliable strategy, as it can lead to an earlier and more accurate diagnosis. This review aims to determine the efficiency of in silico methods in selecting recombinant antigens for the diagnosis of leishmaniasis.
The complexity of anti-Leishmania immune responses
The immune response begins when the vector (female Phlebotomus and Lutzomyia spp., in the Old World and New World, respectively) introduces the promastigote form of the parasite into the host’s bloodstream during its blood meal2. Neutrophils and macrophages, key players of the innate immune response in pathogen defense, are recruited immediately. These cells play a dual role, as they can be associated with both parasite elimination and pathogenesis15. Macrophages have important phagocytic and antimicrobial functions against Leishmania. These cells can either directly destroy the parasite or act as a site for Leishmania replication16. However, the parasite can modulate the complement system via virulence factors, allowing it to enter other phagocytic cells17.
Although the disease’s immunological mechanisms are complex and variable, the protective response to infection has been observed to be mediated mainly by T cells18. The activation of naive T cells can be explored for modeling the immune environment derived from the antigens presented. The participation of cytokines such as IFN-γ and TNF-α in the response mounted by CD4+ T cells favors the development of a type 1 response (Th1), which is directed toward resolution of the infection, whereas anti-inflammatory cytokines such as IL-4, IL-13, and TGF-β are produced during the type 2 response (Th2), which favors parasite growth. However, the Th1/Th2 paradigm is not well established in humans19. Establishing an immunological pattern for this population is challenging due to the complex immune response that humans develop as well as the disease’s wide clinical spectrum11. CD8+ T cells appear to respond differently depending on the form of leishmaniasis20. These cells can modulate immunopathology and promote the development of lesions in CL caused by L. braziliensis20. In contrast, in VL caused by L. donovani and L. infantum, CD8+ T cells were found to play a protective role through the formation of effective granulomas, which are crucial for parasite eradication in both murine and human models21.
While some studies suggest that B cells contribute to the aggravation of the disease, others argue that they support the healing of the infection. Additionally, research has shown a correlation between the parasite load, the chronicity of the infection, and the intensity of the humoral response. High levels of antibodies were found to be associated with increased disease severity in studies examining the humoral response in mice infected with three different species of Leishmania related to the cutaneous form of the disease22,23. In contrast, antibodies appear to play a protective role in VL endemic areas, as demonstrated by the high prevalence of healthy seropositive individuals24.
The implications of early detection in Leishmaniasis diagnosis
Early disease detection, chemotherapy, vector control, and a potential vaccine represent the most effective strategies for controlling Leishmaniasis in its forms and clinical manifestations8. Diagnosing a person in early stages in a simple, fast, and effective manner is critical in determining a better prognosis. Currently, Leishmaniasis is diagnosed by combining several factors, such as clinical characteristics presented by the patient, epidemiological and laboratory data10.
Different laboratory tests, including serological, parasitological, and molecular methods, have been developed to diagnose leishmaniasis25. Despite this multitude of tests, defining one as ideal for diagnosing this disease remains difficult. One of the major challenges is the wide clinical spectrum of cutaneous lesions, which can easily be confused with those of similar diseases during clinical evaluation9. The accuracy of the tests is also affected by the variety of Leishmania species. Other factors are the occurrence of asymptomatic cases and co-infection with the human immunodeficiency virus (HIV)7.
In this scenario, serological tests are widely used in routine practice to identify parasite antigens and/or anti-Leishmania antibodies in samples from the infected individual26. However, the effectiveness of these techniques varies and is directly associated with the type of antigen used, the species present during the infection, and the variety of clinical manifestations in affected individuals27.
Recent studies have demonstrated the variability of results obtained using soluble Leishmania antigen (SLA) from various species of Leishmania, demonstrating that the nature of the antigen influences the results. When the SLA of Leishmania infantum was tested, the sensitivity and specificity ranged from 0 to 96.7% and 63 to 100%, respectively28. Lower performance was observed with antigens from Leishmania major or Leishmania braziliensis, with sensitivity ranging from 1 to 87.5% and specificity ranging from 21.3 to 100%29. Furthermore, these tests continue to fail to detect asymptomatic patients, as well as patients in the early stages of infection and with low antibody titers28.
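For reference, sensitivity and specificity are simple ratios over the counts of true and false positives and negatives obtained against a reference diagnosis. The sketch below is illustrative only: the function names and the patient counts are hypothetical and are not drawn from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of infected individuals that the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-infected individuals that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration; not data from any cited study.
tp, fn = 45, 5    # infected: test positive / test negative
tn, fp = 80, 20   # non-infected: test negative / test positive

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

Cross-reactivity with other species shows up as false positives and lowers specificity, whereas low antibody titers in early or asymptomatic infection show up as false negatives and lower sensitivity.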
On the other hand, the combination of molecular biology methodologies and in silico approaches has resulted in a powerful new strategy for improving the performance of conventional diagnostic methods. The use of these new methodologies has enabled the investigation and selection of a new class of molecularly defined antigens, resulting in the discovery of new molecules of various types (recombinant proteins, chimeric proteins, and peptides) as potential candidates for serological diagnosis29,30. Several studies have attempted to improve conventional methods, such as ELISA, by incorporating these new molecules, with satisfactory sensitivity and specificity results (TABLE 1).
The prospect of using an antigen capable of monitoring antibody titers in VL is critical due to the long-term persistence of anti-Leishmania antibodies even after treatment30. Therefore, the potential for choosing more suitable antigens for the clinical form and the species involved is a potent strategy, as they are essential elements for determining the disease in patients and, as a result, its treatment.
Immunoinformatics in silico approaches for the selection of antigens for diagnosis
One of the challenges in the discovery of new immunodiagnostic reagents is the identification of the antigenic region capable of activating the immune system. In this sense, computational methods for antigenic epitope prediction may provide crucial means to serve this purpose.31 B-cell antigenic epitopes are classified as either continuous (linear B-cell epitopes), consisting of a consecutive fragment of amino acids from the protein sequence, or discontinuous (conformational B-cell epitopes), which consist of atoms from surface residues of the protein that are brought together by the folding of the polypeptide chain.
Hopp and Woods (1981, 1983) 32,33 developed the first linear epitope prediction method. The authors assigned a hydrophilicity value to each amino acid in a sequence, on the assumption that hydrophilic regions are predominantly located at the protein surface and are potentially antigenic. This approach belongs to the propensity scale methods, which use the physicochemical properties of amino acids and known antigenic determinants in protein sequences to locate linear B-cell epitopes in the query protein sequence. 34
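A propensity scale method of this kind can be sketched as a sliding-window average over per-residue hydrophilicity values. The code below is a minimal illustration, not the authors' implementation: the scale values are those commonly tabulated for the Hopp-Woods scale and should be checked against the original publications, and the function names, the default window of six residues, and the toy sequence are assumptions made for this example.

```python
# Hydrophilicity values as commonly tabulated for the Hopp-Woods scale;
# verify against the original publications before relying on them.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> list[float]:
    """Mean hydrophilicity of every window-length stretch of the sequence."""
    scores = [HOPP_WOODS[residue] for residue in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 6) -> tuple[int, str, float]:
    """Return (start index, peptide, mean score) of the top-scoring window."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


if __name__ == "__main__":
    toy_sequence = "MLLVAGKDERKSWFIL"  # made-up sequence, illustration only
    print(most_hydrophilic_window(toy_sequence))
```

Peaks in the profile mark stretches that the method would propose as candidate linear epitopes; the window length controls how strongly the profile is smoothed.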
Instead of relying on individual attributes for propensity measurements, the BEPITOPE tool 35 uses combinations of physical and chemical parameters to predict linear B-cell epitopes. BEPITOPE was designed to predict continuous protein epitopes and to look for patterns in either a single protein or an entire translated genome. In addition to computing, combining, displaying, and printing prediction profiles, the tool also offers a list of potential linear peptides that could be synthesized and tested. The BcePred36 developers stated that BEPITOPE and BcePred work similarly. BcePred was evaluated on a dataset containing 1029 unique experimentally proven epitopes and 1029 random peptides, yielding an accuracy that varies from 52.92% to 57.53% depending on the property used, and reaching its highest accuracy of 58.70% when four amino acid properties (hydrophilicity, flexibility, polarity, and exposed surface) are combined.36
Beyond the propensity scale methods, the amino acid scale method represented an innovation in the field of epitope prediction, although its performance can be low. Because of this low performance, machine learning (ML) was introduced into these methods. The first server based on a recurrent neural network was ABCpred. 37 This server predicts B-cell epitopes in an antigen sequence using an artificial neural network. Users can select a window length of 10, 12, 14, 16, or 20 (the upper limit) as the predicted epitope length. When an epitope is shorter than the chosen window, the program completes the “missing” amino acids using the original antigenic sequence. On a training and testing dataset of 700 B-cell epitopes and 700 non-B-cell epitopes (random peptides), ABCpred achieved an accuracy of 65.93% using a recurrent neural network.
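The idea of completing a short epitope to a fixed window length with residues from the parent antigen can be sketched as follows. This is a hypothetical helper written for this review, not ABCpred code: the function name, the 0-based half-open coordinates, and the choice to split the added residues roughly evenly between the two flanks (shifting the window at the ends of the sequence) are all assumptions.

```python
def extend_to_window(antigen: str, start: int, end: int, window: int = 20) -> str:
    """Pad the epitope antigen[start:end] to `window` residues.

    Flanking residues are taken from the antigen itself. How the extra
    residues are split between the two flanks is an assumption of this sketch.
    """
    if not 0 <= start < end <= len(antigen):
        raise ValueError("epitope coordinates fall outside the antigen")
    length = end - start
    if length > window:
        raise ValueError("epitope is longer than the requested window")
    if window > len(antigen):
        raise ValueError("antigen is shorter than the requested window")

    missing = window - length
    left = start - missing // 2
    # Shift the window back inside the sequence if it overruns either end.
    left = max(0, min(left, len(antigen) - window))
    return antigen[left:left + window]


if __name__ == "__main__":
    antigen = "ACDEFGHIKLMNPQRSTVWYACDEFG"  # made-up sequence
    print(extend_to_window(antigen, start=10, end=16, window=10))
```

Fixed-length inputs of this kind are what allow a neural network with a fixed input layer to be trained on epitopes of varying natural length.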
Other tools combine ML algorithms. LBtope38, for example, was developed using Support Vector Machine (SVM) and IBk on a large dataset of 12,063 B-cell epitopes and 20,589 non-epitopes, both obtained from the IEDB database (https://www.iedb.org/). It is important to emphasize that this was the first time experimentally validated non-B-cell epitopes were used to develop a prediction tool, which achieved accuracies ranging from approximately 54% to 86% using diverse features such as binary profile, dipeptide composition, and amino acid pair (AAP) profile. ABCpred and LBtope were trained on similar positive data (B-cell epitopes) but differ in their negative data (non-B-cell epitopes). The negative data for ABCpred consist of random peptides, which may contain unvalidated B-cell epitopes, whereas the negative data for LBtope consist of experimentally validated, and thus confirmed, non-B-cell epitopes from IEDB. The scores are scaled from 20% to 100%; the default score is 60%, with an accuracy of approximately 80%.
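Of the features mentioned, dipeptide composition is the simplest to illustrate: a peptide is turned into a fixed-length vector with one entry per ordered pair of the 20 standard amino acids (400 entries). The sketch below is a generic version of this encoding, not LBtope code; the function name and the normalization to fractions that sum to one are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(peptide: str) -> list[float]:
    """Fraction of each of the 400 dipeptides among the peptide's adjacent pairs."""
    peptide = peptide.upper()
    total = len(peptide) - 1  # number of overlapping adjacent pairs
    if total < 1:
        raise ValueError("peptide must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair not in counts:
            raise ValueError(f"non-standard residue in pair {pair!r}")
        counts[pair] += 1
    return [counts[dipeptide] / total for dipeptide in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("AAAC")  # toy peptide
    print(len(vector), vector[DIPEPTIDES.index("AA")])
```

Because the vector length does not depend on peptide length, epitopes and non-epitopes of different sizes can be fed to the same classifier.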
Like LBtope, SVMTriP 31 uses the SVM machine learning approach, but it combines tripeptide similarity and propensity scores to predict linear B-cell epitopes, and it is available both as standalone software and as a web server. SVMTriP achieves a sensitivity of 80.1% and a precision of 55.2%, depending on the epitope size, with a length of 20 amino acids being the optimal and default setting. In ROC analysis, SVMTriP (AUC = 0.702) achieved a significantly higher true positive rate. Combining the similarity and propensity of tripeptide subsequences can thus improve prediction performance for linear B-cell epitopes. Similarly, BepiPred 2.0 39 offers both standalone software and a web server for linear B-cell epitope prediction and is based on a Random Forest algorithm trained with epitopes annotated from Antigen-Antibody (Ag-Ab) protein structures. A dataset of 649 Antigen-Antibody crystal structures was used, considering all non-antibody protein chains having atoms within a 4 Å radius of their respective antibody’s Complementarity-Determining Region (CDR). After removing complexes with similar antigen sequences (> 70% identical), the total number of structures was reduced to 160, of which 5 randomly selected structures formed the final evaluation set, while the remaining 155 structures were distributed into five groups for cross-validation and training of the algorithm. When compared to other prediction tools, BepiPred 2.0 presented the highest AUC value (0.62), followed by BepiPred 1.0 (0.57) and LBtope (0.54).39
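The structure-based annotation described for BepiPred 2.0, in which antigen residues are labeled as epitope residues when they have atoms within 4 Å of the antibody CDR, amounts to a distance filter over atomic coordinates. The sketch below illustrates that filter on toy coordinates; the data layout, residue labels, and function name are hypothetical, and parsing of real PDB files is deliberately left out.

```python
from math import dist

# (residue label, (x, y, z) in angstroms); layout chosen for this sketch only.
Atom = tuple[str, tuple[float, float, float]]


def epitope_residues(
    antigen_atoms: list[Atom],
    cdr_atoms: list[tuple[float, float, float]],
    cutoff: float = 4.0,
) -> set[str]:
    """Antigen residues with at least one atom within `cutoff` of any CDR atom."""
    return {
        residue
        for residue, xyz in antigen_atoms
        if any(dist(xyz, cdr_xyz) <= cutoff for cdr_xyz in cdr_atoms)
    }


if __name__ == "__main__":
    # Toy coordinates, not taken from a real structure.
    antigen = [
        ("A10", (0.0, 0.0, 0.0)),
        ("A10", (1.0, 0.0, 0.0)),
        ("A11", (10.0, 0.0, 0.0)),
        ("A12", (3.5, 0.0, 0.0)),
    ]
    cdr = [(0.0, 3.0, 0.0)]
    print(epitope_residues(antigen, cdr))
```

The residue-level labels produced this way serve as the positive class when training a sequence-based predictor on structural data.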
Because predicting conformational B-cell epitopes requires the three-dimensional (3D) structures of antigenic proteins, the development of reliable discontinuous epitope prediction methods has lagged behind that of linear B-cell epitope methods. Additionally, conformational B-cell epitopes are harder than linear B-cell epitopes to isolate from their protein context for selective antibody production. 40 The Conformational Epitope Prediction (CEP) server 41 uses a prediction method that, when tested on X-ray crystal structures of Ag-Ab complexes available in the Protein Data Bank (PDB), predicts conformational epitopes, antigenic determinants, and sequential epitopes with an accuracy of 75%. This tool is a step toward the new paradigm of “binding determines function”, which will aid the development of assays to map the residues implicated in the Ag-Ab contact.
DiscoTope 42 is a tool capable of detecting 15.5% of residues located in discontinuous epitopes at a specificity of 95%. It was the first tool to combine propensity scale matrices, spatial proximity, and surface exposure. It uses information such as amino acid statistics, spatial information, and surface accessibility, gathered from a data set of discontinuous epitopes determined by X-ray crystallography of Ag-Ab protein complexes.
The SEPPA (Spatial Epitope Prediction of Protein Antigens) 43 server combines single physicochemical properties of amino acids with geometrical structural properties, and introduced the novel concept of the “unit patch of residue triangle”. SEPPA is now at version 3.0, which supports glycoprotein antigens. When tested with independent glycoprotein antigens only, SEPPA 3.0 gave an AUC of 0.749 and a balanced accuracy (BA) of 0.665, the top performance among its peers.
The EPITOPIA server 44 implements a machine-learning-based algorithm that can handle both 3D structures and sequence inputs to predict immunogenic regions as candidate B-cell epitopes. This approach uses a naive Bayesian classifier on forty-four physicochemical and structural–geometrical attributes, including secondary structure, propensity, conservation, solvent accessible surface, and hydrophilicity. When compared with ABCpred 37, which also uses a machine-learning algorithm and was trained on a very similar data set, EPITOPIA 44 performed better, yielding a success rate of 80.4% (mean AUC of 0.59), while ABCpred yielded a success rate of 67% (mean AUC of 0.55). When compared with structure-based methods such as DiscoTope 42, EPITOPIA 44 presented a success rate of 89.4%, against 81.8% for DiscoTope. Although CEP does not score amino acids individually, this server achieved a mean AUC of 0.53, the lowest among the compared servers, versus EPITOPIA (0.6) and DiscoTope (0.62).
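Since AUC is the metric most often used to compare these predictors, it may help to recall what it measures: the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue, with 0.5 corresponding to random guessing. The sketch below computes it directly from that definition; the function name and the toy labels and scores are hypothetical and unrelated to the benchmarks cited above.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is quadratic
    in the number of samples and is meant for illustration, not large data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy per-residue labels (1 = epitope) and predictor scores.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

Read this way, the AUC values of roughly 0.53 to 0.62 reported for conformational epitope predictors indicate rankings that are only modestly better than chance.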
In a study by Arab-Mazar et al. (2022) 45, immunogenic B-cell epitopes were identified based on the amino acid sequences of the GP63, LACK, and TSA proteins of L. major, using ABCpred and BepiPred Linear Epitope Prediction. The results showed that integrated recombinant GP63, LACK, and TSA multiepitope antigens of L. major could be important components for constructing a viable sandwich ELISA diagnostic test for cutaneous leishmaniasis antigen detection. Menezes-Souza et al. (2015) 12 demonstrated that rLbMAPK3 and rLbMAPK4.1 might be target molecules for human and canine leishmaniasis immunodiagnostics, using immunoinformatics tools including the BepiPred program, which was used to identify linear B-cell epitopes. Assis et al. (2014) 46 identified 148 linear epitopes using BepiPred and BcePred from the calpain-like cysteine peptidase (CP), thiol-dependent reductase 1 (TDR1), and HSP70 proteins of L. infantum. It was the first study to use a combination of several in silico epitope prediction approaches, as well as an assessment of secondary structures, for the discovery of Leishmania epitopes.
Despite efforts to develop new epitope prediction algorithms, this area of bioinformatics still lacks software and servers that can exploit, during prediction, properties that are universally observed in antigenic epitopes but not in other protein surfaces.
Immunoinformatics in the selection of antibodies for diagnosis
The Structural Antibody Database (SAbDab) is a web tool database of antibody structures which has over 6,000 antibody structures.47 The annotations include experimental information, gene details, accurate heavy and light chain pairings, antigen details, and in some cases, also include antibody-antigen binding affinity.
IMGT/mAb-DB is a monoclonal antibody database, part of IMGT®, the international ImMunoGeneTics information system®, which is the standard reference for immunogenetics and immunoinformatics. IMGT/mAb-DB is a one-of-a-kind specialist resource for immunoglobulins (IG) or monoclonal antibodies (mAb) with therapeutic indications, as well as fusion proteins for immunological applications (FPIA).48 The database contains 1,261 entries, of which 1,091 are immunoglobulin structures.
Immunoinformatics and docking analyses for diagnostic tools
Tools that use antibody-specific decoy generation and scoring methods perform better than general protein-protein docking methods. 49 ClusPro, 50 FRODOCK, 51 PatchDock 52 and ZDOCK 53 are examples of tools that include specific algorithms to perform antibody–antigen global docking with rigid-body approaches.
The ClusPro server 50, a widely used tool for protein–protein docking, does not consider possible conformational changes upon binding (rigid-body docking) and uses an algorithm based on the Fast Fourier Transform (FFT). FRODOCK 51 also uses FFT correlation algorithms, but with a spherical harmonic (SH) based rotational search, which has been proven to be a faster alternative in protein–protein docking. PatchDock 52 is a geometry-based molecular docking algorithm that combines geometric hashing and pose clustering to find interactions between antibody–antigen complexes. Its high efficiency can be explained by its fast transformational search based on local feature matching, which avoids an exhaustive orientation search. ZDOCK 53 is a rigid protein docking program that performs a thorough search for probable binding modes of two component proteins using FFT. This tool searches through every possible pose in the translational and rotational spaces of the two proteins. Its energy-based scoring function accounts for potential energy, spatial complementarity, and electric field force.
SnugDock 54 and HADDOCK 55 are tools that can perform flexible docking. SnugDock is a Rosetta protocol (some of these protocols are fully automated via the ROSIE web server, rosie.rosettacommons.org) tailored to antibody-antigen docking. SnugDock’s local search algorithm models the CDR loops and the VH-VL orientation in the context of the antibody-antigen contact. This tool can predict high-resolution antibody-antigen complex structures, which is particularly helpful when the crystal structure of the antibody is unavailable. The relative orientation of the antibody light and heavy chains, the conformations of the six complementarity-determining region loops, and the placements of the antibody and antigen rigid bodies can all be optimized simultaneously using this method. On the other hand, the HADDOCK (High Ambiguity Driven protein-protein DOCKing) server 56 allows its users to perform protein-protein docking with flexibility in the side chains and backbone, in order to account for conformational rearrangements at the interaction surface. This tool combines a global rigid-body search with ambiguous restraints, simulated annealing in torsion space, and minimization in Cartesian space.
Jeliazkov et al. (2021) 56 performed a comparative study of different docking tools with specific options for antibody-antigen modeling on sixteen target complexes. HADDOCK achieved a 75% success rate (according to the CAPRI quality criterion, having a model of acceptable quality or better in the top ten57), followed by ClusPro (67.8%) and ZDOCK (56.3%). In another recent assessment, with 67 target complexes, Guest et al. (2021)58 compared ClusPro and ZDOCK, showing that ClusPro achieved a success rate of 34% on the benchmark, although ZDOCK produced more models of medium accuracy or higher (22% success, versus 16% for ClusPro).
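The success criterion used in these comparisons (at least one model of acceptable quality or better among the top ten ranked models for a target) can be expressed as a small function. The sketch below is illustrative: the CAPRI quality categories are the standard ones, but the function name, the input layout, and the toy rankings are hypothetical and do not reproduce any benchmark result.

```python
# CAPRI model quality categories, ordered from worst to best.
QUALITY_ORDER = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def top_n_success_rate(
    ranked_models: dict[str, list[str]],
    n: int = 10,
    minimum: str = "acceptable",
) -> float:
    """Fraction of targets with a model of `minimum` quality or better in the top n.

    `ranked_models` maps each target to the quality labels of its models,
    listed in the order in which the docking tool ranked them.
    """
    if not ranked_models:
        raise ValueError("at least one target is required")
    threshold = QUALITY_ORDER[minimum]
    hits = sum(
        any(QUALITY_ORDER[quality] >= threshold for quality in models[:n])
        for models in ranked_models.values()
    )
    return hits / len(ranked_models)


if __name__ == "__main__":
    # Toy rankings for four made-up targets.
    toy = {
        "T1": ["incorrect", "medium"],
        "T2": ["incorrect"] * 12 + ["high"],
        "T3": ["acceptable"],
        "T4": ["incorrect"],
    }
    print(top_n_success_rate(toy))
    print(top_n_success_rate(toy, minimum="medium"))
```

Raising the quality threshold from acceptable to medium changes which tool looks better, which is consistent with the different rankings of ClusPro and ZDOCK reported above under the two criteria.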
Bioinformatics is a valuable tool for identifying new proteins and antigens that can be used as targets for the diagnosis and treatment of infectious diseases. However, most studies on leishmaniasis focus on identifying new drugs and vaccines. Although there are studies that use docking tools to identify potential Leishmania antigens, there is a lack of studies reporting the use of these approaches to develop new diagnostic methods. Studies in this area may therefore be more challenging and may require greater experimental effort to validate docking results. Our research group has been applying the aforementioned in silico tools, as evidenced by our publications in the field of immunoinformatics. 59