Replication patterns vary with gene function and expression
To address this point, we re-analysed our published TrAEL-seq datasets
for increased replication fork density indicative of fork stalling (Kara
et al., 2021; Whale et al., 2022), across gene sets categorised as
‘housekeeping’ or ‘environment dependent’, as we hypothesised that
environment-dependent genes might be configured to evolve more readily
than housekeeping genes. Although mutants of the transcriptional
activators SAGA and TFIID affect steady-state mRNA levels of all genes,
Huisinga and Pugh noted that RNA pol II genes were more responsive to
one or the other (Huisinga and Pugh, 2004). Donczew et al. then refined
this categorisation into TFIID-dependent (‘housekeeping’) and
Coactivator Redundant (CR, ‘environment dependent’) sets (Donczew et
al., 2020). We stratified genes into quartiles of transcription based on
NET-seq data (Churchman and Weissman, 2011), then subdivided each
quartile into TFIID or CR genes (Figure 3A). Few genes in the lowest
quartile are reliably designated as TFIID or CR dependent, so this
quartile was not subdivided.
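The stratification described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the analysis: the function name `stratify_genes`, the input dictionaries and the rank-based quartile assignment are all assumptions made for the example, with `expression` standing in for a per-gene NET-seq-derived transcription level.

```python
from collections import defaultdict


def stratify_genes(expression, gene_class):
    """Group genes by expression quartile, then by TFIID/CR class.

    expression: dict mapping gene -> transcription level (hypothetical
                stand-in for a NET-seq read density).
    gene_class: dict mapping gene -> "TFIID" or "CR"; genes absent from
                this dict are dropped from the upper three quartiles.
    """
    ranked = sorted(expression, key=expression.get)  # lowest expression first
    n = len(ranked)
    groups = defaultdict(list)
    for rank, gene in enumerate(ranked):
        quartile = rank * 4 // n + 1  # 1 = lowest quartile, 4 = highest
        if quartile == 1:
            # The lowest quartile is kept as a single, unsubdivided group.
            groups[("Q1", "all")].append(gene)
        elif gene in gene_class:
            groups[(f"Q{quartile}", gene_class[gene])].append(gene)
    return dict(groups)
```

Each key of the returned dictionary is a (quartile, class) pair, so a metaprofile could then be computed separately for every group.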
For TFIID genes, replication forks moving head-on to the direction of
transcription accumulate slightly across the entire transcribed region,
indicating that replisome progression is retarded (Figure 3B, top left),
whereas signal from replisomes moving co-directionally with RNA
polymerase II is reduced, indicating that replisome movement is
accelerated (Figure 3B, top right). Curiously, these effects are
equivalent across expression quartiles and therefore likely reflect a
sensitivity of the replisome to transcription units rather than
transcription-replisome conflicts.
In contrast, head-on CR genes show a transcription-dependent increase in
signal from the TES to the TSS that would be consistent with the
replisome being increasingly retarded by either direct encounters with
RNA polymerase II or indirect features associated with transcription
such as R-loops, while co-directional replisome progression is largely
unaffected (Figure 3B, middle). The TrAEL-seq signal also increases
dramatically in the 10 kb upstream of the TSS (Figure 3C), which would be
consistent with replication origins being more frequently located
upstream of highly expressed CR genes. We therefore measured distances
from each TSS to the nearest replication origin: this distance does not
differ from random for TFIID genes but is significantly shorter for highly
expressed CR genes, with the majority of these genes having a
replication origin within 10 kb (Figure 3D). This is notable
given our recent observation that CNV events triggered by expression of
the CUP1 gene depend on a closely adjacent replication origin
(Whale et al., 2022); note that the CUP1 locus was
excluded from the analysis presented here because of its high copy number.
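The TSS-to-origin distance measurement can be illustrated with a short sketch. It assumes, purely for the example, that origins on one chromosome are given as a sorted list of midpoint coordinates; the function names and the default 10 kb window parameter are hypothetical, and the comparison against a random expectation is not shown.

```python
from bisect import bisect_left


def nearest_origin_distance(tss, origins):
    """Distance in bp from a TSS to the closest origin on the same chromosome.

    origins: sorted list of origin midpoint coordinates (assumed input format).
    Returns None if no origins are annotated.
    """
    if not origins:
        return None
    i = bisect_left(origins, tss)
    # Only the origins flanking the insertion point can be the nearest.
    candidates = origins[max(0, i - 1): i + 1]
    return min(abs(tss - o) for o in candidates)


def fraction_within(tss_list, origins, window=10_000):
    """Fraction of TSSs with an origin no further than `window` bp away."""
    distances = [nearest_origin_distance(t, origins) for t in tss_list]
    distances = [d for d in distances if d is not None]
    if not distances:
        return None
    return sum(d <= window for d in distances) / len(distances)
```

Running `fraction_within` separately on the TSSs of each gene group would give the proportion of genes in that group with an origin inside the chosen window.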
The TFIID and SAGA gene sets were originally classified as
‘housekeeping’ and ‘environment dependent’ respectively, and GO analysis
of the TFIID and CR gene sets remains in accord with this, the former
being dominated by translation genes and the latter by metabolic genes
(glycolysis and metabolite biosynthesis), which are used in a more environmentally
dependent manner. This analysis indicates that the replisome tends to
interact with transcription in head-on CR genes but not TFIID genes,
skewing the potential for transcription-induced mutation and CNV towards
environmentally responsive genes. Furthermore, highly expressed CR genes
have evolved to lie close to replication origins, which have the
capacity to induce copy number variation mechanisms of the type we
observed at CUP1.