2.6 | Constructing gene families
To construct the dataset for gene-family clustering, the protein
sequences from the genomes of Pse. libanotica and 13 other plants
(A. tauschii, B. distachyon, T. aestivum, T. durum, T. dicoccoides,
T. urartu, H. vulgare, O. sativa, S. cereale, Sorghum bicolor, Z. mays,
Dactylis glomerata and Arabidopsis thaliana) were
used. For each species, when a gene had multiple transcripts, only the
longest transcript of the coding region was retained for further
analysis. Additionally, genes encoding proteins with fewer than 50 amino
acids were removed. The protein sequences of all species were compared
by BLASTP, with hits filtered at an E-value threshold of 1e-5. Protein
sequences from all 14 species were clustered into paralogous and
orthologous groups using OrthoMCL (http://orthomcl.org/orthomcl/) with
an inflation parameter equal to 1.5.
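
The pre-clustering filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names, the `(gene_id, transcript_id, protein_sequence)` record layout, the stripping of a trailing `*` stop symbol, and the assumption that BLASTP results are in the default 12-column tabular format (E-value in the 11th column) are all illustrative choices. Only the 50-amino-acid minimum and the 1e-5 E-value threshold come from the text. The OrthoMCL clustering step itself is not reproduced.

```python
from typing import Dict, Iterable, List, Tuple

MIN_PROTEIN_LENGTH = 50  # amino acids; threshold stated in the text
EVALUE_CUTOFF = 1e-5     # BLASTP E-value threshold stated in the text


def select_representatives(
    records: Iterable[Tuple[str, str, str]],
    min_length: int = MIN_PROTEIN_LENGTH,
) -> Dict[str, Tuple[str, str]]:
    """Keep the longest protein per gene, then drop short proteins.

    `records` yields (gene_id, transcript_id, protein_sequence) tuples;
    this layout is an assumption made for the sketch.
    """
    longest: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, seq in records:
        seq = seq.rstrip("*")  # assumed: ignore a trailing stop symbol
        current = longest.get(gene_id)
        # On ties, the transcript seen first is kept.
        if current is None or len(seq) > len(current[1]):
            longest[gene_id] = (transcript_id, seq)
    # Remove genes whose representative has fewer than min_length residues.
    return {g: ts for g, ts in longest.items() if len(ts[1]) >= min_length}


def filter_blast_hits(
    lines: Iterable[str],
    evalue_cutoff: float = EVALUE_CUTOFF,
) -> List[List[str]]:
    """Keep tabular BLASTP hits whose E-value is at or below the cutoff.

    Assumes the default 12-column tabular layout, where the E-value is
    the 11th column (index 10).
    """
    kept: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= evalue_cutoff:
            kept.append(fields)
    return kept
```

For example, `select_representatives` applied to two transcripts of one gene with 60 and 80 residues keeps only the 80-residue protein, and a gene whose only protein has 49 residues is dropped.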