Allergen datasets
Two resources on allergenic substances were used: the WHO/IUIS[14,15] and AllergenOnline[16] databases. From WHO/IUIS, we extracted the allergen exposure route, the source (species scientific and common names), and several fields used later as synonyms for the literature searches: allergen name, GenBank nucleotide ID and UniProt ID. Using the latter two IDs, we obtained additional “synonyms” from the corresponding databases, including the “Title” and “Extra” fields of GenBank nucleotide and the “description” and “gene name” fields of UniProt. From AllergenOnline, we retrieved the route (“type” field), the allergen source (scientific and common names), the allergen name and its GenBank ID, which was used to extract additional synonyms as for WHO/IUIS.
This process yielded 2,967 allergens, each with a set of synonyms (including gene/protein names) as well as its exposure route and source species. Note that at this point the list is redundant, as the same allergen can be annotated in both databases, possibly with different sets of synonyms.
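The structure of the resulting list can be illustrated with a minimal Python sketch. It is not the pipeline used here: the record layout, the `build_record` helper and all values are hypothetical placeholders, chosen only to show how one allergen annotated in both databases produces two entries with different synonym sets.

```python
from dataclasses import dataclass, field


@dataclass
class AllergenRecord:
    """One database entry (hypothetical layout, for illustration only)."""
    database: str            # "WHO/IUIS" or "AllergenOnline"
    name: str                # allergen name
    route: str               # exposure route
    source_scientific: str   # species scientific name
    source_common: str       # species common name
    synonyms: set = field(default_factory=set)


def build_record(database, name, route, scientific, common, extra_synonyms=()):
    """Collect the allergen name plus any non-empty extra fields as synonyms."""
    synonyms = {name}
    synonyms.update(s.strip() for s in extra_synonyms if s and s.strip())
    return AllergenRecord(database, name, route, scientific, common, synonyms)


# Placeholder values only: these are not real identifiers or annotations.
who_iuis = [
    build_record(
        "WHO/IUIS", "Xxx y 1", "Food", "Genus species", "common name",
        ["GENBANK_ID_PLACEHOLDER", "UNIPROT_ID_PLACEHOLDER",
         "UniProt description text", "GENE1"],
    )
]
allergen_online = [
    build_record(
        "AllergenOnline", "Xxx y 1", "Food", "Genus species", "common name",
        ["GENBANK_ID_PLACEHOLDER", "GenBank title text"],
    )
]

# Simple concatenation keeps both entries, so the list is redundant.
combined = who_iuis + allergen_online
shared = combined[0].synonyms & combined[1].synonyms
```

In this toy case `combined` holds two records for the same allergen name, and `shared` contains only the synonyms both databases provide.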