SNPs and classification rules related to sorafenib response
Figure 1 shows the classification tree produced by the RandomTree
classifier on the sorafenib dataset. Converting the classification tree
into classification rules (rules 1-13, Table 3), derived from the input
genotype dataset, makes it more straightforward to analyze and interpret
the multiple relations between the SNPs and genotypes responsible for a
particular phenotype of sorafenib response.
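The conversion from tree to rules can be pictured as enumerating every root-to-leaf path. The following is a minimal sketch under stated assumptions: the nested-tuple tree encoding, the function name `tree_to_rules`, and the toy SNP identifiers `rsA` and `rsB` are all hypothetical and do not reproduce the tree in Figure 1 or the RandomTree output format.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is a class label (str); an internal node is
# (snp_id, {genotype: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Rule = Tuple[Dict[str, str], str]  # ({snp: genotype}, predicted class)


def tree_to_rules(tree: Tree, path: Optional[Dict[str, str]] = None) -> List[Rule]:
    """Enumerate every root-to-leaf path as one classification rule."""
    path = path or {}
    if isinstance(tree, str):
        # Leaf reached: the accumulated conditions form one rule.
        return [(dict(path), tree)]
    snp, branches = tree
    rules: List[Rule] = []
    for genotype, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, {**path, snp: genotype}))
    return rules


# Toy tree with invented splits (NOT the tree shown in Figure 1).
toy_tree: Tree = ("rsA", {
    "TT": ("rsB", {"CC": "non-responder", "CT": "responder"}),
    "CT": "non-responder",
})

for conditions, label in tree_to_rules(toy_tree):
    print(conditions, "->", label)
```

Each rule is the conjunction of the genotype conditions met along one path, so a tree with thirteen leaves yields thirteen rules.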
We identified ten classification rules that discriminate patients
belonging to the non-responder group, and three rules for the responder
group, with an accuracy of 69.57%. A subject could satisfy a rule only if
their own genotype matched the genotype specified in the rule at every
SNP the rule contains. For instance, to verify whether a subject matched
rule 6 in Table 3, it was necessary that the SNPs (rs171248, rs6811453,
rs2010963, rs12434438) assessed in the subject presented the genotypes
(TT, CT, CC, GG), respectively. Thus, only the subjects that matched all
the genotypes within a rule could be classified as “non-responder”
according to the matching rule.
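The matching criterion can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names and the two subjects are hypothetical, and treating heterozygous genotypes as order-insensitive (so that "TC" equals "CT") is an assumption. Only the SNPs and genotypes of rule 6 are taken from the text.

```python
def normalize(genotype: str) -> str:
    """Sort the two alleles so that 'TC' and 'CT' compare equal (assumption)."""
    return "".join(sorted(genotype.upper()))


def matches_rule(subject: dict, rule: dict) -> bool:
    """True only if the subject carries the required genotype at every SNP in the rule."""
    return all(
        snp in subject and normalize(subject[snp]) == normalize(required)
        for snp, required in rule.items()
    )


# Rule 6 as quoted in the text (SNP -> required genotype).
RULE_6 = {"rs171248": "TT", "rs6811453": "CT", "rs2010963": "CC", "rs12434438": "GG"}

# Hypothetical subjects, invented for illustration only.
subject_a = {"rs171248": "TT", "rs6811453": "TC", "rs2010963": "CC", "rs12434438": "GG"}
subject_b = {"rs171248": "TT", "rs6811453": "CC", "rs2010963": "CC", "rs12434438": "GG"}

print(matches_rule(subject_a, RULE_6))  # True: all four genotypes match
print(matches_rule(subject_b, RULE_6))  # False: rs6811453 is CC, not CT
```

A single mismatching SNP is enough to reject the rule, which is why a subject with a missing or different genotype at any position falls through to the other rules.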
Afterwards, we examined the cumulative effects of the SNPs obtained from
the classification tree, developing a GRS by summing the number of
response alleles [22-24]. The response-increasing alleles were assigned
based on their greater frequency in responders, according to the
literature data for angiogenesis-related genes [8,10,23] and data
obtained in the present study
for ADME-related genes. The rs7905939 SNP was excluded from the analysis
since a clear response allele was not identified. For each SNP, a score
of 0 was assigned to genotypes homozygous for the non-response allele, 1
to heterozygous genotypes (one response and one non-response allele), and
2 to genotypes homozygous for the response allele. When the five scores
for the rs2010963, rs4604006, rs12434438, rs183574, and rs6811453
variants were summed for each patient, the mean GRS was significantly
higher in responders than in non-responders (p = 0.008) (Supporting
Information Table S2). The mean GRS was 6.00 ± 0.81 in the responder
group, and 4.37 ± 1.36 in the non-responder group.
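The 0/1/2 scoring scheme amounts to counting response alleles per SNP and summing over the five variants. The sketch below assumes this reading. The response alleles in `RESPONSE_ALLELE` are placeholders invented for illustration, since the text does not list them, and the example subject is hypothetical.

```python
# The five SNPs summed in the GRS, as listed in the text.
GRS_SNPS = ["rs2010963", "rs4604006", "rs12434438", "rs183574", "rs6811453"]

# PLACEHOLDER response alleles: invented for this sketch, not taken from the study.
RESPONSE_ALLELE = {
    "rs2010963": "C",
    "rs4604006": "T",
    "rs12434438": "G",
    "rs183574": "A",
    "rs6811453": "T",
}


def snp_score(genotype: str, response_allele: str) -> int:
    """0, 1, or 2: the number of response alleles in a two-allele genotype."""
    return genotype.upper().count(response_allele)


def grs(subject: dict) -> int:
    """Sum the per-SNP scores over the five GRS variants (range 0 to 10)."""
    return sum(snp_score(subject[snp], RESPONSE_ALLELE[snp]) for snp in GRS_SNPS)


# Hypothetical subject, invented for illustration only.
subject = {
    "rs2010963": "CC",   # 2 response alleles
    "rs4604006": "CT",   # 1
    "rs12434438": "AG",  # 1
    "rs183574": "GG",    # 0
    "rs6811453": "TT",   # 2
}

print(grs(subject))  # 6
```

With five SNPs the score ranges from 0 to 10, which is the scale on which the reported group means should be read.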
To explore whether the expression of angiogenesis- and ADME-related
genes identified in the decision tree (i.e., SLC22A4, ADH1A, VEGF-A,
VEGF-C, HIF-1α, and CY26A1) might have a role in HCC disease outcome in
terms of response to sorafenib, we carried out a bioinformatic analysis
of these genes using the public dataset GSE109211, downloaded from GEO,
which reports data from a subset of HCC patients (n = 67) treated with
sorafenib. As shown in Figure 2, VEGF-A, HIF-1α, and ADH1A expression was
significantly lower in HCC tissues from sorafenib-responsive patients
(n = 20), whereas SLC22A14 expression was significantly higher. No
significant correlation was found between the expression of VEGF-C and
CY26A1 genes and sorafenib response.