Abstract
Significant populations in tropical and subtropical regions all over
the world are severely affected by leishmaniasis, a group of neglected
tropical diseases caused by roughly 20 species of protozoan parasites of
the genus Leishmania.
Disease prevention strategies that include early detection, vector
control, treatment of affected individuals, and vaccination are all
essential. Diagnosis is critical for selecting therapy, preventing
transmission of the disease, and minimizing symptoms so that the
affected individual can have a better quality of life. Nevertheless,
current diagnostic methods have limitations, and there is no
established gold standard. Disadvantages include cross-reactions with
other species and limited sensitivity and specificity, which are mostly
determined by the type of antigen used to perform the tests. A viable
alternative for a more precise diagnosis is the use of recombinant
antigens generated with bioinformatics approaches, which have shown
increased diagnostic accuracy. Identifying new candidate antigens with
bioinformatics resources is therefore an effective technique, since it
may lead to an earlier and more accurate diagnosis. The purpose of this
review is to evaluate the efficacy of in silico approaches for
selecting recombinant antigens for leishmaniasis diagnosis.
INTRODUCTION
The leishmaniases are a class of parasitic, non-contagious infections
that belong to the diverse group of conditions classified as Neglected
Tropical Diseases. These diseases occur worldwide, although most cases
are reported in Africa, Asia, and the Americas1. According to the World
Health Organization (WHO), more than 1 billion people currently live in
areas where the disease is endemic, which represents a serious public
health issue.
Leishmaniasis has three main clinical forms2. The most
lethal form, visceral leishmaniasis (VL or kala-azar), is characterized
by systemic infection that can affect the liver and spleen, among other
organs; approximately 30,000 cases are expected to occur per year1.
Cutaneous leishmaniasis (CL), the most prevalent form, is recognized by
the presence of skin lesions and has been estimated to affect
approximately one million individuals annually2,3. If not appropriately
treated, CL can develop into a third and more severe form,
mucocutaneous leishmaniasis (MCL), characterized by nasal ulceration
and mucosal infiltration4. These clinical
manifestations vary depending on multiple factors, such as the host’s
immune system, nutritional state, genetic background, the environment,
and the parasite species associated with the
infection5.
Approximately 20 species with the potential to infect humans have been
described. A setback in the field is the lack of immunological data:
the large number of clinical manifestations and infective species makes
it challenging to understand the different types of immune
responses6. However, it has been reported that upon
the parasite’s entry, innate immune cells are recruited, and different
regulatory, susceptibility, and/or resistance mechanisms are triggered,
leading to a complex immune response that is characterized by both
cell-mediated responses and the production of
antibodies7.
Early diagnosis, vector control, treatment of infected individuals,
and vaccination are important disease control
strategies8. Diagnosis is crucial for defining specific treatment
regimens, preventing disease progression, and alleviating symptoms,
allowing the affected individual to have a better quality of
life9. Limitations of serological diagnostic techniques include
cross-reactions with other species and low sensitivity and specificity,
which are primarily determined by the type of antigen used in the
assays10.
Considering the challenge of selecting the most suitable antigen,
techniques that support the identification and selection of immunogenic
molecules have emerged as promising alternatives. The principle of
reverse vaccinology, proposed by Pizza11, established the concept of
using computational methods to anticipate the selection of candidate
molecules for immunological applications such as vaccines and
diagnostic tests. Accordingly, recombinant antigens designed with
bioinformatics tools, which have shown improved diagnostic accuracy,
are a promising alternative for a more accurate diagnosis of the
disease12.
Several studies on leishmaniasis diagnosis have made extensive use of
bioinformatics tools to search for recombinant proteins and synthetic
peptides13,14. These in silico approaches are based on the prediction
of potential antigenic/immunogenic epitopes. In addition to being
straightforward, this strategy reduces the costs of parasite culture
maintenance and decreases the variations in sensitivity and specificity
found in conventional serological methodologies14. Identifying new
recombinant antigens with bioinformatics resources therefore becomes a
reliable strategy, as it can lead to an earlier and more accurate
diagnosis. This review aims to determine the efficiency of in
silico methods in selecting recombinant antigens for the diagnosis of
leishmaniasis.
The complexity of anti-Leishmania immune responses
The immune response begins when the vector, female Phlebotomus spp. in
the Old World and Lutzomyia spp. in the New World, introduces the
promastigote form of the parasite into the host's skin during its blood
meal2. Immediately, neutrophils and macrophages, key players of the
innate immune response in pathogen defense, are recruited. These cells
play a dual role, as they can be associated with both parasite
elimination and pathogenesis15. Macrophages have important phagocytic
and antimicrobial functions against Leishmania: they can either
directly destroy the parasite or act as a site for Leishmania
replication16. However, the parasite can modulate the complement system
via virulence factors, allowing it to enter other phagocytic cells17.
Although the immunological mechanisms of the disease are complex and
variable, the protective response to infection has been observed to be
mainly mediated by T cells18. The activation of naive T cells can be
explored to model the immune environment derived from the presented
antigens. Cytokines such as IFN-γ and TNF-α, produced in the CD4+ T
cell response, favor the development of a type 1 (Th1) response that is
directed toward resolution of the infection, whereas anti-inflammatory
cytokines such as IL-4, IL-13, and TGF-β are produced during the type 2
(Th2) response, which favors parasite growth. However, the Th1/Th2
paradigm is not well established in humans19. Establishing an
immunological pattern for this population is challenging due to the
complex immune response that humans develop and the wide clinical
spectrum of the disease11. CD8+ T cells appear to respond differently
depending on the form of leishmaniasis20. These cells can modulate
immunopathology and promote the development of lesions in CL caused by
L. braziliensis20. In contrast, in VL caused by L. donovani and
L. infantum, CD8+ T cells were found to play a protective role through
the formation of effective granulomas, crucial for parasite eradication
in both murine and human models21.
While some studies suggest that B cells contribute to the aggravation
of the disease, others argue that they support healing of the
infection. Additionally, research has shown a correlation between
parasite load, chronicity of the infection, and intensity of the
humoral response. Studies of the humoral response in mice infected with
three different Leishmania species associated with the cutaneous form
of the disease found that high antibody levels were associated with
increased disease severity22,23. In contrast, antibodies appear to play
a protective role in VL-endemic areas, as demonstrated by the high
prevalence of healthy seropositive individuals24.
The implications of early detection in Leishmaniasis
diagnosis
Early disease detection, chemotherapy, vector control, and a potential
vaccine represent the most effective strategies for controlling
leishmaniasis in all its forms and clinical manifestations8. Diagnosing
a person at an early stage in a simple, fast, and effective manner is
critical for a better prognosis. Currently, leishmaniasis is diagnosed
by combining the clinical characteristics presented by the patient with
epidemiological and laboratory data10.
Different laboratory tests, including serological, parasitological, and
molecular methods, have been developed to diagnose
leishmaniasis25. Despite this multitude of tests, defining one as ideal
for diagnosing this disease remains difficult. One of the major
challenges is the wide clinical spectrum of cutaneous lesions, which
can easily be confused with other diseases during clinical
evaluation9. Test accuracy is also affected by the diversity of
Leishmania species, the occurrence of asymptomatic cases, and
co-infection with the human immunodeficiency virus (HIV)7.
In this scenario, serological tests are routinely used to identify
parasite antigens and/or anti-Leishmania antibodies in samples from
infected individuals26. However, the effectiveness of these techniques
varies and depends directly on the type of antigen used, the species
causing the infection, and the clinical manifestations of the affected
individual27.
Recent studies have demonstrated the variability of results obtained
using soluble Leishmania antigen (SLA) from various Leishmania species,
showing that the nature of the antigen influences the results. When the
SLA of Leishmania infantum was tested, sensitivity and specificity
ranged from 0 to 96.7% and from 63 to 100%, respectively28. Lower
performance was observed with antigens from Leishmania major or
Leishmania braziliensis, with sensitivity ranging from 1 to 87.5% and
specificity ranging from 21.3 to 100%29. Furthermore, these tests still
fail to detect asymptomatic patients, patients in the early stages of
infection, and patients with low antibody titers28.
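Sensitivity and specificity, the two metrics reported for the SLA-based tests above, come from comparing each test result with the true infection status of the sample. The sketch below computes both from hypothetical ELISA optical densities; the function name, the cutoff, and the sample values are illustrative assumptions, not data from the cited studies. Because a sample counts as positive only when its reading reaches the cutoff, the same antigen can give quite different sensitivity and specificity depending on the cutoff and the sample panel.

```python
def diagnostic_performance(od_values, infected, cutoff):
    """Return (sensitivity, specificity) for a serological test.

    od_values: assay readings (e.g. ELISA optical densities), one per sample.
    infected:  True/False reference status for each sample.
    cutoff:    hypothetical positivity threshold; readings >= cutoff are positive.
    """
    tp = fn = tn = fp = 0
    for od, is_infected in zip(od_values, infected):
        positive = od >= cutoff
        if is_infected and positive:
            tp += 1  # infected and detected
        elif is_infected:
            fn += 1  # infected but missed
        elif positive:
            fp += 1  # uninfected but flagged (e.g. a cross-reaction)
        else:
            tn += 1  # uninfected and negative
    sensitivity = tp / (tp + fn)  # fraction of infected samples detected
    specificity = tn / (tn + fp)  # fraction of uninfected samples correctly negative
    return sensitivity, specificity


# Illustrative panel: three infected and two uninfected samples (made-up values).
sens, spec = diagnostic_performance(
    [0.9, 0.7, 0.2, 0.6, 0.1], [True, True, True, False, False], cutoff=0.5
)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")  # sensitivity=66.7% specificity=50.0%
```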
On the other hand, the
combination of molecular biology methodologies and in silico approaches
has resulted in a powerful new strategy for improving the performance of
conventional diagnostic methods. The use of these new methodologies has
enabled the investigation and selection of a new class of molecularly
defined antigens, resulting in the discovery of new molecules of various
types (recombinant proteins, chimeric proteins, and peptides) as
potential candidates for serological diagnosis29,30.
Several studies have attempted to improve conventional methods, such as
ELISA, by incorporating these new molecules, with satisfactory
sensitivity and specificity results (TABLE 1).
The prospect of using an antigen capable of monitoring antibody titers
in VL is important because anti-Leishmania antibodies persist long
after treatment30. Therefore, choosing antigens better suited to the
clinical form and the species involved is a potent strategy, as
antigens are essential elements for diagnosing the disease in patients
and, consequently, for guiding its treatment.
Immunoinformatics in silico approaches for the selection of antigens for
diagnosis
One of the challenges in the discovery of new immuno-diagnostic reagents
is the identification of the antigenic region capable of activating the
immune system. In this sense, computational methods for antigenic
epitope prediction may provide crucial means to serve this
purpose.31 B-cell antigenic epitopes are classified as either
continuous (linear B-cell epitopes), consisting of a consecutive
stretch of amino acids from the protein sequence, or discontinuous
(conformational B-cell epitopes), consisting of atoms from surface
residues of the protein that are brought together by the folding of the
polypeptide chain.
Hopp and Wood (1981, 1983) 32,33 developed the first
linear epitope prediction method. The authors assigned to each amino
acid, in a sequence, the hydrophilicity scale on the assumption that
hydrophilic regions are predominantly located at the protein surface and
are potentially antigenic. This approach is part of the propensity scale
methods and, thus, based on the observation of physicochemical
properties of amino acids, and the antigenic determinants in protein
sequences to identify the location of the linear B-cell epitopes in the
query protein sequence. 34
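The propensity-scale idea can be sketched in a few lines: each residue is mapped to its Hopp-Woods hydrophilicity value, the values are averaged over a sliding window, and the window with the highest average is taken as the most likely antigenic site. This is a minimal sketch; the six-residue window reflects our reading of Hopp and Woods' hexapeptide averaging and should be treated as an assumption, and the example sequence is arbitrary.

```python
# Hopp-Woods hydrophilicity values for the 20 standard amino acids.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Mean Hopp-Woods value of every window; element i covers sequence[i:i + window]."""
    values = [HOPP_WOODS[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=6):
    """Return (start index, peptide, mean hydrophilicity) of the highest-scoring window."""
    profile = hydrophilicity_profile(sequence, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


start, peptide, score = most_hydrophilic_window("AILVFKDERKSAILVG")
print(start, peptide, round(score, 2))  # 5 KDERKS 2.55
```

A real predictor would report the whole profile rather than only the maximum, since a protein can carry several hydrophilic stretches.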
Instead of relying on a single attribute for propensity measurements,
the BEPITOPE tool 35 uses combinations of physical and chemical
parameters to predict linear B-cell epitopes. BEPITOPE was designed to
predict continuous protein epitopes and to search for patterns either
in a single protein or in an entire translated genome. In addition to
computing, combining, displaying, and printing prediction profiles, the
tool offers a list of candidate linear peptides that could be
synthesized and tested. The developers of BcePred36 stated that
BEPITOPE and BcePred work similarly. BcePred's accuracy was measured on
a dataset of 1029 unique experimentally proven epitopes and 1029 random
peptides, yielding an accuracy between 52.92% and 57.53% depending on
the property used, and reaching its highest accuracy, 58.70%, when four
amino acid properties (hydrophilicity, flexibility, polarity, and
exposed surface) were combined.36
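Combining several property scales, as BEPITOPE and BcePred do, can be illustrated by smoothing each scale along the sequence, standardizing each profile so that scales with different units become comparable, and averaging them. The sketch below is a hypothetical reconstruction of this general idea, not the published BcePred algorithm: the centred window, the z-score normalization, the threshold, and the two toy scales in the usage example are all assumptions.

```python
def smooth(values, window=7):
    """Centred moving average; windows are truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def zscore(values):
    """Standardize a profile to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [0.0] * len(values) if sd == 0 else [(v - mean) / sd for v in values]


def combined_profile(sequence, scales, window=7):
    """Average of the standardized, smoothed profiles of every property scale."""
    profiles = [zscore(smooth([scale[aa] for aa in sequence], window)) for scale in scales]
    return [sum(column) / len(column) for column in zip(*profiles)]


def predict_epitope_residues(sequence, scales, window=7, threshold=0.5):
    """Indices of residues whose combined score exceeds the (assumed) threshold."""
    profile = combined_profile(sequence, scales, window)
    return [i for i, score in enumerate(profile) if score > threshold]


# Two hypothetical property scales covering only the residues used below.
toy_hydrophilicity = {"A": 0.0, "K": 1.0}
toy_flexibility = {"A": 0.0, "K": 2.0}
print(predict_epitope_residues("AAAKKKAAA", [toy_hydrophilicity, toy_flexibility], window=3))
# [3, 4, 5]
```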
Because propensity (amino acid scale) methods generally show low
performance, machine learning (ML) techniques have been introduced into
epitope prediction. The first server based on a recurrent neural
network was ABCpred. 37 This server predicts B-cell epitopes in an
antigen sequence using an artificial neural network. Users can select a
window length of 10, 12, 14, 16, or 20 residues (the upper limit) as
the predicted epitope length; when an epitope is shorter than 20 amino
acids, the program completes the “missing” amino acids using the
original antigen sequence. Trained and tested on a dataset of 700
B-cell epitopes and 700 non-B-cell epitopes (random peptides), ABCpred
achieved an accuracy of 65.93% using a recurrent neural network.
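Because ABCpred works on fixed-length inputs, shorter epitopes have to be padded to the window length with residues from their own antigen. One way to do this is sketched below; splitting the missing residues evenly between the two flanks and shifting the window at the sequence ends are our assumptions, not documented ABCpred behaviour, and the antigen sequence is arbitrary.

```python
def extend_to_window(antigen, start, end, length):
    """Return exactly `length` residues of `antigen` containing antigen[start:end].

    Missing residues are taken from the flanking antigen sequence, split as evenly
    as possible between both sides (an assumed rule); near the ends of the antigen
    the window is shifted so that it stays inside the sequence.
    """
    if not 0 <= start < end <= len(antigen):
        raise ValueError("invalid epitope coordinates")
    if end - start > length or length > len(antigen):
        raise ValueError("window must fit the epitope and the antigen")
    missing = length - (end - start)
    left = start - missing // 2                      # tentative window start
    left = max(0, min(left, len(antigen) - length))  # keep the window inside the antigen
    return antigen[left:left + length]


antigen = "MKTAYIAKQRQISFVKSHFSRQ"
print(extend_to_window(antigen, 5, 10, 10))  # epitope IAKQR -> AYIAKQRQIS
print(extend_to_window(antigen, 0, 3, 8))    # epitope at the N-terminus -> MKTAYIAK
```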
Other tools combine ML algorithms. LBtope38, for example, was developed
using Support Vector Machine (SVM) and IBk classifiers on a large
dataset of 12,063 B-cell epitopes and 20,589 non-epitopes, both
obtained from the IEDB database (https://www.iedb.org/). Notably, this
was the first time experimentally validated non-B-cell epitopes were
used to develop a prediction tool, which achieved accuracies from
approximately 54% to 86% using features such as binary profile,
dipeptide composition, and AAP (amino acid pair) profile. ABCpred and
LBtope were trained on similar positive data (B-cell epitopes) but
differ in their negative data (non-B-cell epitopes): ABCpred uses
random peptides, which may contain unvalidated B-cell epitopes, whereas
LBtope uses experimentally validated, and thus confirmed, non-B-cell
epitopes from IEDB. LBtope scores are scaled from 20% to 100%; the
default threshold is 60%, with an accuracy of approximately 80%.
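Dipeptide composition, one of the LBtope feature types listed above, turns a peptide of any length into a fixed 400-dimensional vector that an SVM can use. A minimal sketch follows, assuming frequencies are expressed as percentages of all overlapping residue pairs; LBtope's exact normalization is not described here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every peptide maps to the same feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(peptide):
    """Percentage of each of the 400 dipeptides among the peptide's overlapping pairs.

    Pairs containing non-standard residues are not counted but still contribute
    to the denominator.
    """
    peptide = peptide.upper()
    total = len(peptide) - 1
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


features = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
print(len(features), round(features[DIPEPTIDES.index("AC")], 2))  # 400 66.67
```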
Like LBtope, SVMTriP 31 uses the SVM machine learning approach; in
contrast, SVMTriP combines tripeptide similarity and propensity scores
to predict linear B-cell epitopes, and it is available both as
standalone software and as a web server. Its performance depends on
epitope length: with 20 amino acids, the optimal and default setting,
SVMTriP achieves a sensitivity of 80.1% and a precision of 55.2%. In
ROC analysis, SVMTriP (AUC = 0.702) showed a significantly larger true
positive rate than the other methods evaluated, suggesting that
combining the similarity and propensity of tripeptide subsequences can
improve linear B-cell epitope prediction.
BepiPred 2.0 39 also offers both standalone software and a web server
for linear B-cell epitope prediction. It is based on a Random Forest
algorithm trained with epitopes annotated from antigen-antibody (Ag-Ab)
protein structures. A dataset of 649 Ag-Ab crystal structures was used,
considering all non-antibody protein chains with atoms within a 4 Å
radius of the respective antibody's complementarity-determining regions
(CDRs). After removing complexes with similar antigen sequences (> 70%
identity), 160 structures remained; 5 randomly selected structures
formed the final evaluation set, and the remaining 155 were distributed
into five groups for cross-validation and training. Compared with other
prediction tools, BepiPred 2.0 presented the highest AUC (0.62),
followed by BepiPred 1.0 (0.57) and LBtope (0.54).39
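The structure-based labelling used to build BepiPred 2.0's training data can be sketched as a distance test: an antigen residue is labelled as epitope if any of its atoms lies within 4 Å of an atom in the antibody CDRs. The input format (a mapping from hypothetical residue identifiers to atom coordinates in Å) is an assumption; in practice the coordinates would be parsed from a PDB file with a structure library.

```python
def label_epitope_residues(antigen_residues, cdr_atoms, cutoff=4.0):
    """Label each antigen residue True if any of its atoms is within `cutoff` Å of a CDR atom.

    antigen_residues: dict mapping a residue identifier to a list of (x, y, z) atom coordinates.
    cdr_atoms:        list of (x, y, z) coordinates for all atoms in the antibody CDRs.
    """
    cutoff_sq = cutoff ** 2  # compare squared distances to avoid square roots
    labels = {}
    for residue_id, atoms in antigen_residues.items():
        labels[residue_id] = any(
            (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2 <= cutoff_sq
            for ax, ay, az in atoms
            for bx, by, bz in cdr_atoms
        )
    return labels


# Toy coordinates: one residue touches the CDR atom (3.5 Å away), the other does not (5.5 Å).
antigen = {"A:10": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "A:11": [(10.0, 0.0, 0.0)]}
cdr = [(4.5, 0.0, 0.0)]
print(label_epitope_residues(antigen, cdr))  # {'A:10': True, 'A:11': False}
```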
Because three-dimensional (3D) structures of antigenic proteins are
required to predict conformational B-cell epitopes, the development of
reliable discontinuous epitope prediction methods has lagged behind
that of linear B-cell epitope methods. Additionally, isolating
conformational B-cell epitopes from their protein context for selective
antibody production is more difficult than isolating linear B-cell
epitopes. 40 The Conformational Epitope Prediction (CEP) server 41 uses
a method that, when tested on X-ray crystal structures of Ag-Ab
complexes from the Protein Data Bank (PDB), predicts conformational
epitopes, antigenic determinants, and sequential epitopes with an
accuracy of 75%. This tool is a step toward the new paradigm of
“binding determines function”, which will aid the development of assays
to map the residues involved in Ag-Ab contact.
DiscoTope 42 detects 15.5% of residues located in discontinuous
epitopes with a specificity of 95%. It was the first method to combine
propensity scale matrices, spatial proximity, and surface exposure. The
tool uses amino acid statistics, spatial information, and surface
accessibility derived from a dataset of discontinuous epitopes
determined by X-ray crystallography of Ag-Ab protein complexes.
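The idea of combining propensity, spatial proximity, and surface exposure can be illustrated with a simplified residue score: sum the propensities of all residues within a radius of the residue (including itself), then subtract a penalty proportional to the number of neighbours, since densely packed residues tend to be buried. This is a hypothetical sketch inspired by the description above, not DiscoTope's published scoring function; the 10 Å radius, the contact weight, the use of one representative coordinate per residue, and the propensity values are all assumptions.

```python
def spatial_epitope_scores(residues, propensity, radius=10.0, contact_weight=0.1):
    """Score residues by summed neighbourhood propensity minus a burial penalty.

    residues:   list of (one-letter amino acid, (x, y, z)) with one representative
                coordinate per residue (for example its C-alpha atom).
    propensity: dict of per-amino-acid epitope log-odds values (hypothetical here).
    """
    radius_sq = radius ** 2
    scores = []
    for _, (xi, yi, zi) in residues:
        neighbours = [
            aa
            for aa, (xj, yj, zj) in residues
            if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= radius_sq
        ]  # includes the residue itself
        propensity_sum = sum(propensity.get(aa, 0.0) for aa in neighbours)
        contact_number = len(neighbours) - 1  # crude proxy for burial
        scores.append(propensity_sum - contact_weight * contact_number)
    return scores


toy_propensity = {"K": 1.0, "D": 1.0, "L": -1.0}  # made-up values
residues = [("K", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("L", (20.0, 0.0, 0.0))]
print(spatial_epitope_scores(residues, toy_propensity))
```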
The SEPPA 43 server combines single physicochemical properties of amino
acids with geometrical structural properties. SEPPA (Spatial Epitope
Prediction of Protein Antigens) introduced the novel concept of a
‘unit patch of residue triangle’. SEPPA is now in version 3.0, which
supports glycoprotein antigens. When tested with independent
glycoprotein antigens only, SEPPA 3.0 gave an AUC of 0.749 and a
balanced accuracy (BA) of 0.665, the top performance among the compared
tools.
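AUC and balanced accuracy (BA), the two figures reported for SEPPA 3.0, can both be computed from per-residue scores and true epitope labels. The sketch below uses the rank-based (Mann-Whitney) definition of AUC, the probability that a randomly chosen epitope residue scores higher than a randomly chosen non-epitope residue, and defines BA as the mean of sensitivity and specificity at a chosen threshold; the scores, labels, and threshold are made-up illustrative values.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def balanced_accuracy(scores, labels, threshold):
    """Mean of sensitivity and specificity when scores >= threshold are called epitopes."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, False]
print(roc_auc(scores, labels), balanced_accuracy(scores, labels, threshold=0.5))  # 0.75 0.5
```

An AUC of 0.5 corresponds to random ranking, which is a useful reference point when reading the AUC values of roughly 0.55 to 0.75 reported for the tools in this section.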
The EPITOPIA server 44 implements a machine-learning algorithm that can
handle both 3D structures and sequence inputs to predict immunogenic
regions as candidate B-cell epitopes. This approach uses a naive
Bayesian classifier on forty-four physicochemical and
structural-geometrical attributes, including secondary structure,
propensity, conservation, solvent-accessible surface, and
hydrophilicity. Compared with ABCpred 37, which also uses
machine-learning algorithms and was trained on a very similar dataset,
EPITOPIA 44 performed better, with a success rate of 80.4% (mean AUC of
0.59) versus 67% (mean AUC of 0.55) for ABCpred. Compared with
DiscoTope 42, EPITOPIA 44 achieved a success rate of 89.4% versus
81.8%. CEP, which does not score individual amino acids, achieved a
mean AUC of 0.53, the lowest performance among the compared servers,
against EPITOPIA (0.6) and DiscoTope (0.62).
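EPITOPIA's core classifier, naive Bayes, treats each attribute as independent given the class and picks the class with the highest posterior probability. The sketch below is a generic Gaussian naive Bayes written from scratch; modelling each attribute with a normal distribution, the two example attributes (a hydrophilicity score and relative solvent accessibility), and the training values are assumptions for illustration, not EPITOPIA's actual attributes or parameters.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a class prior plus per-attribute mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)  # floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (attributes assumed independent)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: [hydrophilicity score, relative solvent accessibility].
X = [[2.0, 0.8], [2.5, 0.9], [1.8, 0.7], [-1.0, 0.2], [-1.5, 0.1], [-0.8, 0.3]]
y = ["epitope"] * 3 + ["non-epitope"] * 3
model = fit_gaussian_nb(X, y)
print(predict(model, [2.1, 0.85]), predict(model, [-1.2, 0.15]))  # epitope non-epitope
```

Working with log probabilities keeps the product of many small per-attribute likelihoods numerically stable, which matters when there are dozens of attributes, as in EPITOPIA.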
In a study by Arab-Mazar et al. (2022) 45, immunogenic B-cell epitopes
were identified from the amino acid sequences of the GP63, LACK, and
TSA proteins of L. major using ABCpred and BepiPred Linear Epitope
Prediction. The results showed that integrated recombinant GP63, LACK,
and TSA multiepitope antigens of L. major could be important components
for constructing a viable sandwich ELISA for cutaneous leishmaniasis
antigen detection.
Menezes-Souza et al. (2015) 12 used immunoinformatics tools, including
BepiPred to identify linear B-cell epitopes, to show that rLbMAPK3 and
rLbMAPK4.1 could be target molecules for human and canine leishmaniasis
immunodiagnosis. Assis et al. (2014) 46 identified 148 linear epitopes
using BepiPred and BcePred from the calpain-like cysteine peptidase
(CP), thiol-dependent reductase 1 (TDR1), and HSP70 proteins of
L. infantum. This was the first study to combine several in silico
epitope prediction approaches with an assessment of secondary
structures for the discovery of Leishmania epitopes.
Despite efforts to develop new epitope prediction algorithms, this
research area in bioinformatics still lacks software and servers that
exploit properties universally observed in antigenic epitopes but not
in other protein surfaces.
Immunoinformatics in the selection of antibodies for
diagnosis
The Structural Antibody Database (SAbDab) is an online database
containing over 6,000 antibody structures.47 Its annotations include
experimental information, gene details, accurate heavy and light chain
pairings, antigen details, and, in some cases, antibody-antigen binding
affinity.
IMGT/mAb-DB is a monoclonal antibody database, part of IMGT®, the
international ImMunoGeneTics information system®, which is the standard
reference for immunogenetics and immunoinformatics. IMGT/mAb-DB is a
one-of-a-kind specialist resource for immunoglobulins (IG) or monoclonal
antibodies (mAb) with therapeutic indications, as well as fusion
proteins for immunological applications (FPIA).48 The database
contains 1,261 entries, of which 1,091 are immunoglobulins.
Immunoinformatics and docking analyses for diagnostic tools
Tools that use antibody-specific decoy generation and scoring methods
perform better than general protein-protein docking methods. 49
ClusPro,50 FRODOCK,51 PatchDock52 and ZDOCK53 are examples of tools that
include specific algorithms for global, rigid-body antibody-antigen
docking.
The ClusPro server 50, a widely used tool for protein-protein docking,
does not consider possible conformational changes upon binding
(rigid-body docking) and uses an algorithm based on the Fast Fourier
Transform (FFT). FRODOCK 51 also uses FFT correlation, but with a
spherical harmonics (SH)-based rotational search, which has proven to
be a faster alternative in protein-protein docking. PatchDock 52 is a
geometry-based molecular docking algorithm that combines geometric
hashing and pose clustering to find interactions in antibody-antigen
complexes. Its high efficiency can be explained by its fast
transformational search based on local feature matching, which avoids
an exhaustive orientation search. ZDOCK 53 is a rigid-body protein
docking program that uses FFT to perform a thorough search for probable
binding modes of two component proteins, exploring every possible pose
in the translational and rotational spaces of the two proteins. Its
scoring function is energy-based, accounting for potential energy,
spatial complementarity, and electric field force.
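The FFT trick shared by ClusPro, FRODOCK, and ZDOCK can be shown in one dimension. Following the classic Katchalski-Katzir grid formulation (our choice for illustration; the tools' own scoring functions differ), the receptor grid holds +1 on surface cells and a large negative value in its core, the ligand grid holds 1 where the ligand is, and the score for every translation is their correlation. Computing all translations at once through FFTs gives the same numbers as a direct sum, but in O(N log N) rather than O(N²) operations. Real tools work on 3D grids and repeat this for many ligand rotations; the grid values here are made up.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two); the inverse is unscaled."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def translational_scores(receptor, ligand):
    """score[t] = sum_n receptor[n] * ligand[(n - t) % N] for every cyclic shift t."""
    n = len(receptor)
    r_hat = fft([complex(v) for v in receptor])
    l_hat = fft([complex(v) for v in ligand])
    # Correlation theorem: the spectrum of the scores is R(k) * conj(L(k)).
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [round(v.real / n, 9) for v in fft(product, inverse=True)]


CORE = -15.0  # penalty for the ligand entering the receptor interior
# Receptor: two blocks (surface cells = 1, core = CORE) with a 3-cell pocket between them.
receptor = [0, 0, 1, CORE, CORE, 1, 0, 0, 0, 1, CORE, CORE, 1, 0, 0, 0]
ligand = [1, 1, 1, 1, 1] + [0] * 11  # a 5-cell ligand starting at position 0

scores = translational_scores(receptor, ligand)
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(best_shift, scores[best_shift])  # 5 2.0
```

The best shift places the ligand in the pocket, touching both surface walls without penetrating either core.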
SnugDock 54 and HADDOCK 55 can perform flexible docking. SnugDock is a
Rosetta protocol (some of these protocols are fully automated via the
ROSIE web server, rosie.rosettacommons.org) tailored to antibody-antigen
docking. Its local search algorithm models the CDR loops and the VH-VL
orientation in the context of the antibody-antigen interface. It can
simultaneously optimize the relative orientation of the antibody light
and heavy chains, the conformations of the six
complementarity-determining region loops, and the placement of the
antibody and antigen rigid bodies, which makes it particularly helpful
for predicting high-resolution antibody-antigen complex structures when
the crystal structure of the antibody is unavailable. The HADDOCK (High
Ambiguity Driven protein-protein DOCKing) server 56, on the other hand,
allows users to perform protein-protein docking that accounts for
side-chain and backbone flexibility, in order to capture conformational
rearrangements at the interaction surface. This tool combines a global
rigid-body search with ambiguous restraints, simulated annealing in
torsion space, and minimization in Cartesian space.
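HADDOCK's ambiguous interaction restraints can be illustrated with an r^-6 summed effective distance, d_eff = (sum of r^-6)^(-1/6), taken over all atom pairs between an interface residue on one partner and the candidate residues on the other. The restraint is satisfied if any single pair is close, because the sum is dominated by the shortest distance. The flat-bottom harmonic penalty and the 2 Å upper bound below are simplifying assumptions; HADDOCK's actual restraint energy and defaults may differ.

```python
def effective_distance(distances):
    """r^-6 summed effective distance over all candidate atom pairs (all > 0, in Å)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def restraint_penalty(distances, upper=2.0, force_constant=1.0):
    """Flat-bottom harmonic penalty: zero while d_eff <= upper (assumed 2 Å bound)."""
    d_eff = effective_distance(distances)
    violation = max(0.0, d_eff - upper)
    return force_constant * violation ** 2


# One short contact is enough to satisfy the ambiguous restraint...
print(round(effective_distance([1.8, 6.0, 9.0]), 3), restraint_penalty([1.8, 6.0, 9.0]))
# ...whereas a pair that never comes closer than 5 Å is penalised.
print(restraint_penalty([5.0]))
```

Because d_eff is never larger than the shortest individual distance, the restraint does not need to know in advance which specific atom pair forms the contact, which is what makes it "ambiguous".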
Jeliazkov et al. (2021) 56 performed a comparative study of docking
tools with specific options for antibody-antigen modeling on sixteen
target complexes. HADDOCK achieved a 75% success rate (according to the
CAPRI quality criterion, i.e., having a model of acceptable quality or
better in the top ten57), followed by ClusPro (67.8%) and ZDOCK
(56.3%). In another recent assessment with 67 target complexes, Guest
et al. (2021) 58 compared ClusPro and ZDOCK, showing that ClusPro
achieved a success rate of 34% on the benchmark, although ZDOCK
produced more models of medium or higher accuracy (22% success versus
16% for ClusPro).
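The success rates quoted above follow a simple rule: a target counts as a success if at least one of its top-ranked models (top ten here) reaches a given CAPRI quality class. A minimal sketch follows, assuming each model's CAPRI class (incorrect, acceptable, medium, or high) has already been assigned; that assignment, based on interface and ligand RMSD and the fraction of native contacts, is not reproduced here, and the targets below are made up.

```python
QUALITY_RANK = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def success_rate(ranked_models, top_n=10, min_quality="acceptable"):
    """Fraction of targets with >= 1 model of at least `min_quality` in the top `top_n`.

    ranked_models: dict mapping a target name to its models' CAPRI classes, best-ranked first.
    """
    threshold = QUALITY_RANK[min_quality]
    hits = sum(
        1
        for models in ranked_models.values()
        if any(QUALITY_RANK[q] >= threshold for q in models[:top_n])
    )
    return hits / len(ranked_models)


targets = {
    "T1": ["incorrect", "acceptable"],
    "T2": ["incorrect"] * 10 + ["high"],  # a good model, but ranked 11th
    "T3": ["medium"],
    "T4": ["incorrect"],
}
print(success_rate(targets), success_rate(targets, min_quality="medium"))  # 0.5 0.25
```

The T2 case shows why ranking matters as much as sampling: a correct pose that the scoring function places outside the top ten does not count toward the success rate.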
Bioinformatics is a valuable tool for identifying new proteins and
antigens that can be used as targets for the diagnosis and treatment of
infectious diseases. However, most studies on leishmaniasis focus on
identifying new drugs and vaccines. Although some studies use docking
tools to identify potential Leishmania antigens, few report the use of
these approaches to develop new diagnostic methods. Studies in this
area may therefore be more challenging and require greater experimental
effort to validate docking results. Our research group has been
applying the in silico tools mentioned above, as evidenced by our
publications in the field of immunoinformatics. 59