SNPs and classification rules related to sorafenib response
Figure 1 shows the classification tree generated by the RandomTree classifier on the sorafenib dataset. Converting the tree into classification rules (rules 1-13, Table 3), derived from the input genotype dataset, makes it easier to analyze and interpret the multiple relations between the SNPs and genotypes associated with a particular phenotype of sorafenib response.
We identified ten classification rules that discriminate patients in the non-responder group and three rules for the responder group, with an accuracy of 69.5652%. A subject satisfied a rule only if, for every SNP in the rule, the subject's genotype matched the pair of alleles that the rule specifies for that SNP. For instance, to match rule 6 in Table 3, a subject's genotypes at the SNPs rs171248, rs6811453, rs2010963, and rs12434438 had to be TT, CT, CC, and GG, respectively. Thus, only subjects who matched all the genotypes within this rule were classified as “non-responder”.
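The matching logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the rule content is taken from the description of rule 6, while the function names, the allele-order normalization, and the two example subjects are assumptions introduced here.

```python
# Rule 6 as described in the text: each SNP must show the listed genotype.
RULE_6 = {
    "rs171248": "TT",
    "rs6811453": "CT",
    "rs2010963": "CC",
    "rs12434438": "GG",
}


def normalize(genotype):
    """Sort the two alleles so that 'TC' and 'CT' compare equal (an assumption)."""
    return "".join(sorted(genotype.upper()))


def matches_rule(subject, rule):
    """Return True only if the subject matches the genotype of every SNP in the rule."""
    return all(
        snp in subject and normalize(subject[snp]) == normalize(required)
        for snp, required in rule.items()
    )


# Hypothetical subjects, invented for illustration only.
subject_a = {"rs171248": "TT", "rs6811453": "TC", "rs2010963": "CC", "rs12434438": "GG"}
subject_b = {"rs171248": "TT", "rs6811453": "CC", "rs2010963": "CC", "rs12434438": "GG"}

print(matches_rule(subject_a, RULE_6))  # all four genotypes match
print(matches_rule(subject_b, RULE_6))  # rs6811453 differs, so the rule is not satisfied
```

A subject with a missing SNP is treated here as not matching, which is one possible convention; the source does not state how missing genotypes were handled.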
Afterwards, we examined the cumulative effect of the SNPs obtained from the classification tree by developing a GRS, computed as the sum of the number of response alleles.22-24 Response alleles were assigned on the basis of their greater frequency in responders, according to literature data for angiogenesis-related genes8,10,23 and to data obtained in the present study for ADME-related genes. The rs7905939 SNP was excluded from the analysis because a clear response allele was not identified. For each SNP, a score of 0 was assigned to subjects homozygous for the non-response allele, 1 to heterozygous subjects carrying one response and one non-response allele, and 2 to subjects homozygous for the response allele. When the 5 scores for the rs2010963, rs4604006, rs12434438, rs183574, and rs6811453 variants were summed for each patient, the mean GRS was significantly higher in responders than in non-responders (p = 0.008) (Supporting Information Table S2). The mean gene count score was 6.00 ± 0.81 in the responder group and 4.37 ± 1.36 in the non-responder group.
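The 0/1/2 scoring scheme amounts to counting response alleles per SNP and summing over the five variants, so the score ranges from 0 to 10. The Python sketch below illustrates that arithmetic under stated assumptions: the response alleles in `PLACEHOLDER_RESPONSE_ALLELE` and the example subject are invented placeholders, not the alleles or data reported in this study, and the function names are hypothetical.

```python
# PLACEHOLDER response alleles: invented for illustration only.
# The actual response allele of each SNP is NOT given in this section.
PLACEHOLDER_RESPONSE_ALLELE = {
    "rs2010963": "C",
    "rs4604006": "C",
    "rs12434438": "G",
    "rs183574": "A",
    "rs6811453": "C",
}


def snp_score(genotype, response_allele):
    """Return 0, 1, or 2: the number of response alleles in a two-allele genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.upper().count(response_allele.upper())


def genetic_score(subject, response_alleles):
    """Sum the per-SNP scores over every SNP listed in response_alleles."""
    return sum(
        snp_score(subject[snp], allele) for snp, allele in response_alleles.items()
    )


# Hypothetical subject, invented for illustration only.
subject = {
    "rs2010963": "CC",   # two response alleles -> 2
    "rs4604006": "CT",   # one response allele  -> 1
    "rs12434438": "GG",  # two response alleles -> 2
    "rs183574": "AG",    # one response allele  -> 1
    "rs6811453": "TT",   # no response allele   -> 0
}

print(genetic_score(subject, PLACEHOLDER_RESPONSE_ALLELE))  # 2 + 1 + 2 + 1 + 0 = 6
```

Passing the response alleles as an argument keeps the scoring function independent of any particular allele assignment, so the same code applies once the real assignments are substituted.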
To explore whether the expression of the angiogenesis- and ADME-related genes identified in the decision tree (i.e., SLC22A4, ADH1A, VEGF-A, VEGF-C, HIF-1α, and CYP26A1) might play a role in HCC outcome in terms of response to sorafenib, we carried out a bioinformatic analysis of these genes using the public dataset GSE109211, downloaded from GEO, which reports data from a subset of HCC patients (n = 67) treated with sorafenib. As shown in Figure 2, the expression of VEGF-A, HIF-1α, and ADH1A was significantly lower in HCC tissues from sorafenib-responsive patients (n = 20), whereas SLC22A14 expression was significantly higher. No significant association was found between the expression of VEGF-C or CYP26A1 and sorafenib response.