1 Introduction
RNA sequencing (RNA-Seq) is increasingly common in ecological and
evolutionary studies focusing on variation in gene expression (Alvarez
et al., 2014; Conesa et al., 2016; Ekblom & Galindo, 2011). It has been
used in research on physiology and conservation, and to assess organismal
responses to environmental variables (Todd et al., 2016; Corlett, 2017;
Rey et al., 2020). RNA-Seq is highly accurate for quantifying expression
levels, requires less RNA than microarrays, does not
necessarily require a reference genome (e.g., Cahais et al., 2012), can
uncover sequence variation in transcribed regions, and shows high
reproducibility (Wang et al., 2009). However, gene expression data can
be strongly influenced by biological and non-biological factors such as
experimental and stochastic variation (Auer & Doerge, 2010; Qian et
al., 2014; Todd et al., 2016). Given the recent surge in RNA-based
studies, it is therefore critical to identify and quantify
non-biological sources of variation in gene expression estimates.
Tissue sampling methods can be an important experimental cause of
variation in estimated gene expression (Mutch et al., 2008; Passow et
al., 2019). Delays in sample preservation after collection, for example
storage in buffer at room temperature for more than 10 days, may
increase RNA degradation and introduce bias in
estimated gene expression (e.g., Gayral et al., 2011; Romero et al.,
2014). This is a consequence of mRNAs being produced in relatively short
or rapid bursts in response to internal or external stimuli and having
short half-lives (Ross, 1995; Staton et al., 2000). Similarly, the
choice of anesthetic, tissue preservation method, and RNA extraction
method, as well as the time between sample collection and RNA
isolation, can all affect RNA quality and gene expression (e.g., Debey et
al., 2004; Huitink et al., 2010; Jeffries et al., 2014; Mutter et al.,
2004; Olsvik et al., 2007; Passow et al., 2019).
Variation in gene expression due to stochastic variation in cellular and
molecular processes can result in random differences among individuals
of the same population for the same genes without necessarily being a
consequence of micro-environmental variation or other biological factors
(e.g., maternal effects and potentially heritable variation). For
studies with few biological replicates, this variation may be
misinterpreted as biologically relevant (Hansen et al., 2011; Kaern et
al., 2005). Detection of stochastic variation in gene expression may be
achieved through careful sampling design (e.g., individuals vary at only
one treatment) and by increasing the number of sampled individuals (Kim
et al., 2015; Liu et al., 2014) to gain statistical power (Ching et al.,
2014). However, RNA-Seq experiments are often limited in the number of
sampled individuals due to cost, with consequent loss of statistical
power and potentially misleading results (Bi & Liu, 2016; Li et al.,
2013).
High sequencing costs have led to the development of RNA library
construction protocols that allow a larger number of samples to be
processed and sequenced more cost-effectively (Meyer et al., 2011;
Morrissy et al., 2009; Wu et al., 2010). For example, 3’ RNA-Seq methods
prime only the 3’ poly-A tail, thus reducing the sequencing effort and cost (Lohman
et al., 2016; Ma et al., 2019). Independent of sample size, however, library
construction and RNA sequencing techniques may also produce
variability in the detection of transcripts, in the detection of differentially
expressed genes among treatments, and in estimated gene
expression, as observed between whole mRNA-Seq and 3’ RNA-Seq (e.g., Crow et al.,
2020; Jarvis et al., 2020; Ma et al., 2019; Tandonnet & Torres, 2017).
Furthermore, whole mRNA libraries and sequencing methods often result in
fragment length bias: longer transcripts are sheared into more
fragments, so more reads are assigned to them than to
shorter transcripts, causing an overrepresentation of longer transcripts
(Ma et al., 2019; Oshlack & Wakefield, 2009; Roberts et al., 2011). On
the other hand, 3’ RNA-Seq generates an essentially uniform distribution
of fragments with respect to original RNA length (Lohman et al., 2016;
Ma et al., 2019). Although there are methods to correct for the bias in
gene expression due to differences in transcript length, the detection
and sampling of transcripts are still higher, especially for longer
transcripts, when using classical mRNA-Seq approaches (Crow et al.,
2022; Ma et al., 2019; Mandelboum et al., 2019; Tandonnet & Torres,
2017). Finally, whole mRNA-Seq libraries permit identification of
alternative splicing at a single gene, as library construction and sequencing with
this method capture different fragments and transcripts for the same
locus (Crow et al., 2022).
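The fragment length bias described above can be illustrated with a minimal sketch. It assumes an idealised model that is not taken from this study: under whole mRNA-Seq the expected reads for a transcript are proportional to abundance times length, whereas under 3’ RNA-Seq they are proportional to abundance alone. The function names, transcript lengths, and abundances are hypothetical and chosen only for illustration. The length correction shown is a generic TPM-style normalization, not necessarily the correction used in any of the cited works.

```python
def expected_reads(abundance, length, protocol, total_reads=1_000_000):
    """Expected reads per transcript under an idealised (assumed) model."""
    if protocol == "whole":
        # Whole mRNA-Seq: fragments are sampled along the full transcript,
        # so longer transcripts yield proportionally more reads.
        weights = [a * l for a, l in zip(abundance, length)]
    elif protocol == "3prime":
        # 3' RNA-Seq: roughly one fragment per molecule, independent of length.
        weights = list(abundance)
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    total = sum(weights)
    return [total_reads * w / total for w in weights]


def tpm(reads, length):
    """TPM-style normalization: divide reads by length, rescale to 1e6."""
    rate = [r / l for r, l in zip(reads, length)]
    total = sum(rate)
    return [1e6 * x / total for x in rate]


# Two hypothetical transcripts with equal abundance but a 10-fold length difference.
abundance = [100, 100]
length = [500, 5000]

whole = expected_reads(abundance, length, "whole")
three_prime = expected_reads(abundance, length, "3prime")
normalized = tpm(whole, length)

print("whole mRNA-Seq reads:", whole)
print("3' RNA-Seq reads:    ", three_prime)
print("whole, TPM-corrected:", normalized)
```

In this toy model the longer transcript receives ten times as many reads under whole mRNA-Seq despite equal abundance, while 3’ RNA-Seq and the length-corrected values assign both transcripts the same expression. Real data also include sampling noise, which is why the greater number of reads for long transcripts still affects detection even after correction.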
In many species including fish, RNA-Seq data are commonly used to
investigate the effects of environmental variables (e.g., temperature,
hypoxia) on gene expression (e.g., Krishnan et al., 2020; Long et al.,
2015; Meyer et al., 2011; Smith et al., 2013; Wang et al., 2015;
Jeffries et al., 2021). However, little is known about how the
methods used to sample individuals under field conditions influence
gene expression. Field conditions may limit the use of
optimal sampling protocols or storage methods (Mutter et al., 2004;
Pérez‐Portela & Riesgo, 2013). Handling time of individuals before
tissue sampling may also be longer in the field than in the lab and may affect gene
expression differently depending on the field sampling technique and
tissue used.
The impacts of handling stress on fish physiology are well understood
(Sopinka et al., 2016). Although most studies focus on glucocorticoid
and blood chemistry responses following capture (Milla et al., 2010;
Wiseman et al., 2007; Wood et al., 1983; Milligan, 1996; Barton, 2002;
Ruane et al., 2001; see also Romero & Reed, 2005 for the influence of
handling time in non-fish species), gene expression responses to
handling stress indicate that the magnitude, intensity, and duration of
changes vary across genes, species, and tissue types (Krasnov et al.,
2005; Lopez et al., 2014). While there is evidence that blood cortisol
and glucose levels are affected by capture method (e.g.,
electrofishing; Barton & Dwyer, 1997; Barton &
Grosh, 1996; Bracewell et al., 2004), to our knowledge it remains unclear whether gene
expression is affected by capture method or handling time prior to
sample collection.
Here, we test whether sampling method (electrofishing vs. dip netting),
processing time, and RNA-Seq library type (3’ RNA-Seq, here called
QuantSeq, vs. whole mRNA-Seq, here called NEB) influence gene
expression data in multiple tissue types from westslope cutthroat trout
(Oncorhynchus clarkii lewisi), a species of conservation concern
native to western North America (Behnke, 2002; Allendorf & Leary,
1988; Shepard et al., 2003). Electrofishing, in which a
backpack-mounted unit applies an electrical current
to the water to momentarily stun the fish, is one of the most common
fisheries sampling methods. This method may cause the fish to express
genes in response to the electric current, and may affect individual
fish and tissue types differently, increasing variation among biological
replicates. An alternative to electrofishing is dip netting. While dip netting
may have a smaller effect on gene expression and a lower
risk of inadvertently killing both target and non-target organisms, it
is more laborious and time-consuming, and it is less effective in the field,
where circumstances may not allow for long sampling periods or aquatic
systems may have obstacles that prevent effective capture with nets
(e.g., fallen tree limbs and rocks). Capturing fish by dip netting may
still influence gene expression through stress, as the fish tries to
escape capture.
The results of this study will provide a foundation for improving future
RNA-based study designs for field sampling of wild-caught non-model fish
and other species.