Replication patterns vary with gene function and expression
To address this point, we re-analysed our published TrAEL-seq datasets (Kara et al., 2021; Whale et al., 2022) for increased replication fork density indicative of fork stalling across gene sets categorised as ‘housekeeping’ or ‘environment dependent’, because we hypothesised that environment-dependent genes might be configured to evolve more readily than housekeeping genes. Although mutants of the transcriptional coactivators SAGA and TFIID affect steady-state mRNA levels of all genes, Huisinga and Pugh noted that individual RNA pol II genes are more responsive to one or the other (Huisinga and Pugh, 2004), and Donczew et al. subsequently refined this categorisation into TFIID-dependent (‘housekeeping’) and Coactivator Redundant (CR, ‘environment dependent’) sets (Donczew et al., 2020). We stratified genes into quartiles by transcription level based on NET-seq data (Churchman and Weissman, 2011), then subdivided each quartile into TFIID or CR genes (Figure 3A). Few genes in the lowest quartile are reliably designated as SAGA or CR dependent, so this quartile was not subdivided.
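The stratification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for Figure 3A: each gene is assumed to be supplied as a hypothetical `(name, netseq_signal, category)` tuple, where `category` is `"TFIID"`, `"CR"` or `None`; quartiles are assigned by rank of NET-seq signal; and, as an additional assumption, genes in the upper quartiles that carry neither label are simply left out.

```python
from collections import defaultdict


def stratify(genes):
    """Group genes by NET-seq quartile, splitting quartiles 2-4 into TFIID and CR.

    genes: list of (name, netseq_signal, category) tuples, with category
    'TFIID', 'CR' or None (hypothetical input format for illustration).
    """
    ranked = sorted(genes, key=lambda g: g[1])  # lowest transcription first
    n = len(ranked)
    groups = defaultdict(list)
    for i, (name, _signal, category) in enumerate(ranked):
        quartile = 1 + (4 * i) // n  # 1 = lowest, 4 = highest
        if quartile == 1:
            # The lowest quartile is kept as a single, unsubdivided group.
            groups["Q1"].append(name)
        elif category in ("TFIID", "CR"):
            groups[f"Q{quartile}_{category}"].append(name)
        # Unclassified genes in quartiles 2-4 are dropped (an assumption).
    return dict(groups)
```

Splitting by rank rather than by fixed signal thresholds keeps the four quartiles the same size whatever the dynamic range of the NET-seq data.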
For TFIID genes, replication forks moving head-on relative to transcription accumulate slightly across the entire transcribed region, indicating that replisome progression is retarded (Figure 3B, top left), whereas signal from replisomes moving co-directionally with RNA polymerase II is reduced, indicating that replisome movement is accelerated (Figure 3B, top right). Curiously, these effects are equivalent across expression quartiles and therefore likely reflect a sensitivity of the replisome to transcription units rather than to transcription-replisome conflicts.
In contrast, head-on CR genes show a transcription-dependent increase in signal from the TES to the TSS, consistent with the replisome being increasingly retarded either by direct encounters with RNA polymerase II or by indirect features associated with transcription such as R-loops, while co-directional replisome progression is largely unaffected (Figure 3B, middle). The TrAEL-seq signal also increases dramatically in the 10 kb upstream of the TSS (Figure 3C), which would be consistent with replication origins being more frequently located upstream of highly expressed CR genes. We therefore measured the distance from each TSS to the nearest replication origin: this does not differ from random for TFIID genes but is significantly shorter for highly expressed CR genes, the majority of which have a replication origin within 10 kb (Figure 3D). This is notable given our recent observation that CNV events triggered by expression of the CUP1 gene depend on a closely adjacent replication origin (Whale et al., 2022); note that the CUP1 locus was excluded from the analysis presented here because of its high copy number.
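The TSS-to-origin measurement can be illustrated with the sketch below. It is a hedged example only: the input dictionaries (`tss_by_chrom`, `origins_by_chrom`, `chrom_lengths`) and all function names are hypothetical, and the comparison with random uses an empirical permutation test on the median distance, which is our assumption and not necessarily the statistical test behind Figure 3D.

```python
import bisect
import random
import statistics


def nearest_origin_distance(tss, origins_sorted):
    """Distance (bp) from one TSS to the nearest origin in a sorted, non-empty list."""
    i = bisect.bisect_left(origins_sorted, tss)
    candidates = []
    if i < len(origins_sorted):
        candidates.append(origins_sorted[i] - tss)  # nearest origin at or to the right
    if i > 0:
        candidates.append(tss - origins_sorted[i - 1])  # nearest origin to the left
    return min(candidates)


def distances(tss_by_chrom, origins_by_chrom):
    """Nearest-origin distance for every TSS, matching TSSs and origins by chromosome."""
    out = []
    for chrom, positions in tss_by_chrom.items():
        origins = sorted(origins_by_chrom[chrom])
        out.extend(nearest_origin_distance(p, origins) for p in positions)
    return out


def fraction_within(dists, window=10_000):
    """Fraction of genes with an origin within `window` bp of the TSS."""
    return sum(d <= window for d in dists) / len(dists)


def permutation_p(tss_by_chrom, origins_by_chrom, chrom_lengths, n_perm=1000, seed=0):
    """Empirical one-sided p-value that TSSs lie closer to origins than random positions.

    Random positions are drawn uniformly along each chromosome, keeping the
    number of TSSs per chromosome fixed (an illustrative null model).
    """
    rng = random.Random(seed)
    observed = statistics.median(distances(tss_by_chrom, origins_by_chrom))
    hits = 0
    for _ in range(n_perm):
        shuffled = {
            chrom: [rng.randrange(chrom_lengths[chrom]) for _ in positions]
            for chrom, positions in tss_by_chrom.items()
        }
        if statistics.median(distances(shuffled, origins_by_chrom)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Running the same test separately on TFIID and CR genes within each expression quartile would mirror the comparison described in the text; the `+1` terms keep the empirical p-value from ever being reported as exactly zero.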
The TFIID and SAGA gene sets were originally classified as ‘housekeeping’ or ‘environment dependent’, and GO analysis of the TFIID and CR gene sets remains consistent with this: the former is dominated by translation genes and the latter by metabolic genes (glycolysis and metabolite biosynthesis), which are used in a more environment-dependent manner. This analysis indicates that the replisome tends to interact with transcription in head-on CR genes but not in TFIID genes, skewing the potential for transcription-induced mutation and CNV towards environmentally responsive genes. Furthermore, highly expressed CR genes have evolved to lie close to replication origins, which can promote copy number variation through mechanisms of the type we observed at CUP1.