Results
AAV production
AAV were produced by co-infecting Sf9 insect cells with rBAC-GFP and
rBAC-REP/CAP (MOI of 0.05 for each virus) in a 2 L stirred tank
bioreactor (Figure 1a ). Cell growth kinetics after infection
(Figure 1b ) followed a typical profile for a low MOI infection,
indicated by one cell population doubling until growth arrest at 24 hpi
comparable to previous studies (Correia et al., 2022; Pais et al., 2019;
Pais et al., 2020 and Virgolini et al, 2022). The production yielded
2.5 × 1010 VG.mL-1 intracellular
genome-containing AAV particles at 72 hpi (Figure 1c ). As one
bioreactor run generated enough cells for single-cell RNA-seq analysis
and the cell and AAV profiles were consistent with those of previous
runs from our group, no further replicates were conducted.
scRNA-seq data processing and quality
control
Samples for scRNA-seq were acquired at 0 hpi (prior to infection), 10
hpi and 24 hpi (Figure 1b ). Following Illumina sequencing, an
average of 309 ± 21 million reads were acquired for each of the three
samples (Table S1 ). The BD Rhapsody WTA analysis pipeline from
Seven Bridges Genomics was utilized to process sequencing data, perform
genome mapping, and generate RMCE-adjusted molecule UMI count matrices.
An average of 5,309 ± 234 barcodes per sample passed the initial quality
control step (e.g., read quality filtering, RSEC adjustment, cell label
filtering), yielding an average of 21,647 ± 1,397 UMIs and resulting in
the detection of 3,405 ± 299 genes per cell (Table S1 ).
Retained barcodes were imported into Seurat and further assessed using
established quality parameters, such as the proportion of mitochondrial
UMIs (Figure 1d ) and the detected UMIs per cell (Figure
S1a ). For quality control, only the proportion of UMIs originating from
the mitochondrial genes were considered and barcodes where this
proportion was below 5% were retained, resulting in a total of 5,288,
4,750 and 5,159 cells for 0, 10 and 24 hpi, respectively, to be
considered for further analysis (Figure 1e ). While the total
number of genes identified per cell showed a similar median value along
infection, a bimodal distribution can be observed at 24 hpi, with 21%
of cells having less than 2,000 identified genes, because of a high
percentage of baculovirus mRNA molecules in cells at this timepoint
(Figure 1f overall results of baculovirus UMIs per cell inFigure S1b ).
Sf9 cell heterogeneity
To assess variations in the transcriptome of Sf9 associated with
infection, each timepoint was combined into a single Seurat object for
analysis. Merged samples were log-normalized and scaled, regressing out
variations caused by different total UMIs per cell. For principal
component analysis, 2,000 variable features were considered, and 20
principal components were used to construct a UMAP and perform
clustering analysis. A total of 11 clusters were identified, of which 5
were present prior infection (Figure 2a ), indicating an
inherent degree of heterogeneity in this Sf9 cell line. One cluster
(Cluster 7) was clearly separated at 0 hpi. Using the FindMarkers
function with the MAST differential expression test revealed that cells
in Cluster 7 had significant overexpression of stress response genes
(e.g., heat shock protein 68 and protein lethal(2)essential
for life ) and ribosomal RNA, suggesting increased cell stress and/or
early signs of cell death (Table S2 ).
As cell cycle has been shown to contribute to heterogeneity in scRNA-seq
datasets (Tirosh et al., 2016; Tzani et al., 2021), classification of
cells to cell cycle phases was attempted using the Cell Cycle Scoring
method in Seurat. For this 25 G2/M and 30 S phase genes (Table
S3 ) were used to calculate a score representing the probability that a
certain cell is in a particular phase of cell cycle. At early stages of
infection (0 and 10 hpi), the predicted classification of cells seems to
corroborate observed data, as can be observed by comparing the
distribution of cells associated to each cell cycle phase
(Figure S2a-b ) with the relative expression of 3 selected cell
cycle-related genes (Figure S2c ). At later stages of infection
(24 hpi) this is no longer valid, with the increase in the amount of
baculovirus transcripts in infected cells hindering effective
association of cells to a correct cell cycle phase, as demonstrated by
the overall low expression of the G1 cell cycle-related gene
(Figure S2c ) but still association with G1 cell cycle phase
(Figure S2b ). As a result, cell cycle regression was not
considered for further analysis.
Following visualization of the 3 cell populations (Figure 2a ),
a clear difference in the position of each population can be observed
suggesting an increased impact on the cell transcriptome as infection
progresses. This impact can be associated with enhanced expression of
baculovirus genes, with the furthest clusters (Clusters 5 and 8) showing
an average of 92 ± 3% of baculovirus mRNA per cell (Figure 2b )
and the lowest number of detected genes, averaging 783 ± 331, while
total UMIs per cell stayed similar to other clusters (Figure
S3 ). To support this assumption, an established early gene
(proliferating cell nuclear antigen ) and a late baculovirus gene
(viral cathepsin ) were selected and their expression mapped to
the identified clusters (Figure 2c ). While theproliferating cell nuclear antigen (early gene) showed elevated
expression in clusters (0, 4, and 6) associated to early infected cells
(i.e., 10-24 hpi), the viral cathepsin (late gene) only shows
high expression in clusters (5, 8, and 10) associated with more advanced
stages of infection (i.e., 24 hpi).
Expression of AAV transgenes
Following the evaluation of cell heterogeneity, cells were categorized
throughout infection according to the expression of each target
transgene (rep , cap , and gfp - Figure 3a ).
Baculovirus and transgenes were not detected in cells prior to infection
(0 hpi); this number increased to 36.2 % at 10 hpi, and reached close
to 100 % at 24 hpi (all cells were infected) (Figure 3b ).
At 24 hpi, 29.4 % of cells expressed transgenes from both rBACs (i.e.,
expressed gfp and rep /cap ) and are therefore
capable of producing packaged AAV. Moreover, 59.0 % of cells expressedgfp only, 3.3 % of cells expressed rep /cap , and
8.3% of cells had still no AAV transgene expressed at this timepoint
(Figures 3a-b ). In addition, there is a poor correlation
between expression of gfp and rep (Figure 3c );
for rep and cap, their expression is similar, althoughrep seems to be expressed at higher levels than cap(Figure 3d ).
Identification of cell marker genes associated to
infection
Gene expression changes associated to baculovirus infection were
identified using only the 10 hpi sample. This timepoint was deemed
preferential over the 24 hpi one for two reasons: (1) it contains
infected, non-infected and presumably cells in intermediate states; (2)
the number of UMIs acquired from the host cell transcriptome at 24 hpi
timepoint was extremely low. Given the high number of non-infected cells
(63.8%) at 10 hpi, merging with the 0 hpi datasets was deemed
unnecessary.
The 2,000 most variable genes were retained for PCA, with the first 20
principal components to establish UMAPs. Overall, 9 clusters were
identified, separated into two major cluster groups (Figure
4a ). Clusters 3, 4, 6 ,8 were identified as clusters of infected cells,
with 4 and 8 being most advanced in the infection process according to
the percentage of baculovirus transcripts, and clusters 0, 1, 2 as
clusters of non-infected cells (Figure 4b ). Cluster 7 diverged
from the remaining clusters of non-infected cells, showing fewer total
UMIs and genes per cell compared to others (Figure S4a-b ) and
thus was not included in the differential expression analysis. To
identify marker genes of infected cells, the FindMarkers function was
used with the MAST test. Overall, 291 differentially expressed genes
between infected and non-infected cells were identified, of which 134
were upregulated and 157 downregulated in infected cells clusters
(Table S4 ). Genes significantly upregulated derived mostly from
the baculovirus, with only 9 identified as host cell genes; these 9
genes were associated to stress response (e.g., heat shock protein
68 , peroxiredoxin-6 ) and microtubule mobility (dynein
regulatory complex protein-8 ) (Figure 4c ). Genes downregulated
in infected cells were associated to metabolic processes (e.g.,
mitochondrial glutamate dehydrogenase and, L-lactate
dehydrogenase and aminomethyltransferase ), ion transport (e.g.,innexin-3 ) as well as apoptosis inhibitors (baculoviral IAP
repeat-containing protein 5 ) (Figure 4c ).
Assessing significantly enriched gene ontology terms along
infection
To further assess transcriptional changes along infection,
non-differential expression analysis was conducted. Monocle3 was used to
perform trajectory analysis and thus evaluate which genes vary along
pseudotime in cells at 10 hpi. The 10 hpi sample was again deemed
preferential due to the before mentioned reasons. A trajectory graph was
drawn, from which it becomes evident that clusters associated to
infection were furthest along pseudotime (Figure 5a ). Monocle3
identified 5,704 genes varying along pseudotime (q -value
< 0.01), of which 2616 were deemed significant (Table
S5 ). Gene ontology analysis of respective genes revealed biological
processes such as stress response to infection (e.g., cellular response
to virus), cell growth (e.g., cell division, DNA replication and
transcription), cell cycle and protein folding significantly vary along
infection (Figure 5b ).