Allergen datasets
We used two resources with information on allergenic substances: the
WHO/IUIS[14,15] and AllergenOnline[16]
databases. From WHO/IUIS, we
extracted the allergen exposure route, source (species scientific and
common names), and different fields used later as synonyms for the
literature searches: allergen name, GenBank nucleotide ID and UniProt
ID. Using the latter two IDs, we obtained additional “synonyms”
from the corresponding sequence databases, including the “Title” and
“Extra” fields of GenBank nucleotide and the “description” and “gene
name” fields of UniProt. From AllergenOnline, we retrieved the route
(“type” field), allergen source (scientific and common names),
allergen name and its GenBank ID, which was used to extract additional
synonyms as described above for WHO/IUIS.
This process yielded 2,967 allergen entries, each with a set of synonyms
(including gene/protein names), an exposure route and a source
species. Note that at this point the list is redundant, because the
same allergen can be annotated in both databases, possibly with
different sets of synonyms.
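
As an illustration only, the per-entry information described above could be held in a record such as the following. This is a minimal sketch and not the code used in this work: the class AllergenRecord, the helpers build_record and find_shared_names, the dictionary keys and the placeholder identifiers are all hypothetical, and real entries would be filled from the database exports rather than typed by hand.

```python
from dataclasses import dataclass, field


@dataclass
class AllergenRecord:
    """One allergen entry as extracted from a single source database."""
    database: str      # "WHO/IUIS" or "AllergenOnline"
    name: str          # allergen name
    route: str         # exposure route
    species: str       # scientific name of the source
    common_name: str   # common name of the source
    synonyms: set = field(default_factory=set)


def build_record(database, entry, extra_synonyms=()):
    """Assemble a record; the name, the IDs and any fields fetched
    through those IDs are all stored as search synonyms."""
    synonyms = {entry["name"]}
    synonyms.update(entry.get("ids", []))
    synonyms.update(extra_synonyms)
    return AllergenRecord(
        database=database,
        name=entry["name"],
        route=entry["route"],
        species=entry["species"],
        common_name=entry["common_name"],
        synonyms=synonyms,
    )


def find_shared_names(records):
    """Return (lower-cased) allergen names present in more than one database."""
    seen = {}
    for record in records:
        seen.setdefault(record.name.lower(), set()).add(record.database)
    return sorted(name for name, dbs in seen.items() if len(dbs) > 1)


# Placeholder values: the IDs and the description are NOT real identifiers.
who_entry = build_record(
    "WHO/IUIS",
    {"name": "Ara h 1", "route": "Food", "species": "Arachis hypogaea",
     "common_name": "peanut", "ids": ["GENBANK_ID", "UNIPROT_ID"]},
    extra_synonyms=["placeholder protein description"],
)
ao_entry = build_record(
    "AllergenOnline",
    {"name": "Ara h 1", "route": "Food", "species": "Arachis hypogaea",
     "common_name": "peanut", "ids": ["GENBANK_ID"]},
)

records = [who_entry, ao_entry]
shared = find_shared_names(records)
```

In this toy case the same allergen appears once per database with a different synonym set, which is the redundancy noted above; find_shared_names only flags exact name matches and is not meant as a full deduplication procedure.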