1. Introduction
Homeobox genes encode a family of homeodomain-containing transcription
factors (TFs) that have been extensively studied for their roles in
development, physiology and tissue homeostasis1.
Even though some members of the homeodomain family, including HOXs,
Hepatocyte nuclear factors (HNFs) and NANOGs (NKX genes), are well
characterized for their role in various cancers, the mechanistic
function of Iroquois (IRX) proteins in tumorigenesis and their DNA
binding sequences are still not fully explored2-4. Recent studies on
some HOX genes have identified their roles in various cancers, but their
functional mechanisms remain to be explored5-7. For instance, HOXA9 has
been found to act as a tumour suppressor or an oncogene, depending on
context, in breast cancer and leukemia1.
Another HOX gene, HOXB13, has been well studied in prostate
development and tumorigenesis, with inherited mutations contributing to
prostate cancer risk1. These proteins usually function as complexes
(homo- or heterodimers) to exert their regulatory function, which alters
their binding preference8. Limited diversity has been observed across
eukaryotes in how homeodomains recognize and bind DNA9.
This could be due to constraints on the amino acids associated with the
homeodomain architecture and its preference for specific DNA recognition
sequences10,11.
The IRXs are among the more recently added members of the homeodomain TF
family and have been found to play an important role in developmental
processes12. IRX proteins contain the unique Iro-box motif, a conserved
motif of 13 amino acid residues in the carboxyl-terminal region. They
also have an atypical homeodomain with three extra amino acids between
the first and second alpha helices, which places them in the
three-amino-acid loop extension (TALE) family of TFs13. These homeobox
TFs play important roles in embryogenesis, cell specification and
differentiation, and organ development. The human IRX complex is
composed of six genes arranged in two clusters of three genes each, on
chromosome 5 (IRX1, IRX2 and IRX4) and chromosome 16 (IRX3, IRX5 and
IRX6)14-17.
Recently, IRX TFs have been studied in different cancers, with findings
suggesting that aberrant expression of these proteins contributes to
tumorigenesis. IRX5 has been reported to be regulated by vitamin D3 in
prostate cancer and to be involved in regulating the cell cycle and
apoptosis18. Knockdown of IRX5 was observed to reduce the viability of
androgen-sensitive LNCaP cells. IRX2 protein expression has been
correlated with breast tumour size, indicating an oncogenic function in
breast cancer19.
Genome-wide association studies (GWAS) identified IRX4 as a
causative gene in prostate cancer susceptibility20. Alternative
splicing of IRX4 has also recently been studied in prostate cancer,
highlighting its differential regulation in prostate tumorigenesis and
progression21. Epigenetic studies in pancreatic cancer found the IRX4
promoter region to be hypermethylated, a change linked to increased cell
growth22. IRX4 has also been described as a tumour suppressor in
prostate cancer via vitamin D interactions23. Other studies have
suggested potential oncogenic roles of IRX4 in breast cancer and
non-small cell lung cancer (NSCLC)24,25.
Other differential roles of IRXs have been reported, linking them to
multiple mechanisms associated with tumour progression26-28.
Although IRX gene clusters are now being identified as novel
therapeutic targets in carcinogenesis27, their protein structures,
which may help to explain their functions, have not been biophysically
characterized using techniques such as nuclear magnetic resonance (NMR)
spectroscopy and X-ray crystallography. Various studies have used
homology modelling and molecular dynamics (MD) simulations to
understand the molecular mechanisms of TF binding to DNA. A recent study
on HOXB13 used computationally modelled protein structures to predict
the effect of single nucleotide polymorphisms (SNPs) on the non-homeobox
region29.
The same approach was used to model the HOXB13 protein and predict the
functional role of SNPs in prostate cancer, demonstrating
genotype-phenotype effects and paving the way for further clinical
studies of its theranostic applications29. Furthermore, a study on the
transcription regulator SoxR (Sulphur Oxidation) predicted the
DNA-binding residues of these proteins using homology-modelled
structures30. Homology modelling of the E2F1 TF revealed its
dimerization partner domains and its efficiency in binding DNA31. The
structure of a protein is linked to its stability, function and
interactions. Although the Protein Data Bank (PDB) holds a large number
of crystallographic structures, structural information on the human
proteome remains incomplete. Computationally modelled structures
therefore offer a practical way to study the physical and chemical
properties of TFs. It is well established that missense mutations that
affect the core tertiary structure of a protein play an important role
in disease32,33.
One key benefit of modelling approaches is the ability to analyze the
effect of mutations on protein structure and binding capacity.
The interpretation of mutants and their association with disease can be
significantly informed by this technique34. A recent modelling study of
the LIM1-3 domains of the zyxin family of proteins has provided new
insights into protein-protein interactions and potential nucleic acid
binding platforms of these proteins, highlighting opportunities for
therapeutic development35.
In this work, we studied the sequence conservation of the amino acids
present in the homeodomain. We built a homology model of the IRX4
homeodomain and used MD simulations and free energy calculations to
provide insight into the mechanism of protein-DNA binding. We also
examined mutations in the DNA-binding domain and their effect on
homeodomain stability. A classical modelling approach was used in this
work rather than AlphaFold36, as the prediction of protein-DNA
interactions using the AlphaFold approach is still in its infancy.
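
To make the idea of per-residue sequence conservation concrete, the following is a minimal sketch and not the method used in this work. It scores each column of a multiple sequence alignment with a normalized Shannon entropy. The function name `column_conservation`, the normalization by log2(20) and the toy alignment are all illustrative assumptions; the toy sequences are hypothetical and are not real IRX homeodomain sequences.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy (in bits) of
    the residue frequencies in the column. Gap characters ('-') are ignored.
    The log2(20) normalization assumes the 20 standard amino acids.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy alignment, for illustration only (NOT real IRX sequences).
toy_alignment = ["RKNAT", "RKNAS", "RKQAT", "RRNAT"]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: conservation = {score:.3f}")
```

Columns in which every sequence carries the same residue score 1, and the score falls as residue variety increases. In practice the input would be a curated alignment of homeodomain sequences, and other conservation measures could be substituted for the entropy-based score.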