1 Introduction
RNA sequencing (RNA-Seq) is increasingly common in ecological and evolutionary studies focusing on variation in gene expression (Alvarez et al., 2014; Conesa et al., 2016; Ekblom & Galindo, 2011). It has been used in research on physiology and conservation, and to assess organismal responses to environmental variables (Todd et al., 2016; Corlett, 2017; Rey et al., 2020). RNA-Seq is highly accurate for quantifying expression levels, requires less RNA than microarrays, does not necessarily require a reference genome (e.g., Cahais et al., 2012), can uncover sequence variation in transcribed regions, and shows high reproducibility (Wang et al., 2009). However, gene expression data can be strongly influenced by biological and non-biological factors, such as experimental and stochastic variation (Auer & Doerge, 2010; Qian et al., 2014; Todd et al., 2016). Given the recent surge in RNA-based studies, it is therefore critical to identify and quantify non-biological sources of variation in gene expression estimates.
Tissue sampling methods can be an important experimental cause of variation in estimated gene expression (Mutch et al., 2008; Passow et al., 2019). A delay in sample preservation after collection, for example storage in buffer at room temperature for more than 10 days, may increase RNA degradation and introduce bias in estimated gene expression (e.g., Gayral et al., 2011; Romero et al., 2014). This is a consequence of mRNAs being produced in relatively short or rapid bursts in response to internal or external stimuli and having short half-lives (Ross, 1995; Staton et al., 2000). Similarly, the choice of anesthetic, tissue preservation method, and RNA extraction method, as well as the time between sample collection and RNA isolation, can all affect RNA quality and gene expression (e.g., Debey et al., 2004; Huitink et al., 2010; Jeffries et al., 2014; Mutter et al., 2004; Olsvik et al., 2007; Passow et al., 2019).
Stochastic variation in cellular and molecular processes can produce random differences in the expression of the same genes among individuals of the same population, without necessarily being a consequence of micro-environmental variation or other biological factors (e.g., maternal effects and potentially heritable variation). For studies with few biological replicates, this variation may be misinterpreted as biologically relevant (Hansen et al., 2011; Kaern et al., 2005). Detection of stochastic variation in gene expression may be achieved through careful sampling design (e.g., individuals differ in only one treatment) and by increasing the number of sampled individuals (Kim et al., 2015; Liu et al., 2014) to gain statistical power (Ching et al., 2014). However, RNA-Seq experiments are often limited in the number of sampled individuals due to cost, with a consequent loss of statistical power and potentially misleading results (Bi & Liu, 2016; Li et al., 2013).
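As a rough illustration of how the number of biological replicates relates to statistical power, the sketch below uses a textbook normal approximation for a two-group comparison of log2 expression. It is not the method of any study cited above, and RNA-Seq analyses typically rely on count-based models instead. The function name and the values of the effect size, among-individual standard deviation, and significance level are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, log2_fold_change, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes normally distributed log2 expression with a known, equal
    standard deviation `sd` among individuals in both groups.
    """
    std_normal = NormalDist()
    # Standard error of the difference between two group means.
    se = sd * sqrt(2.0 / n_per_group)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(log2_fold_change) / se
    # Probability of exceeding either critical value under the alternative.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: a two-fold change (log2 = 1) with SD = 1.
power_by_n = {n: approx_power(n, log2_fold_change=1.0, sd=1.0) for n in (3, 6, 12)}
for n, power in power_by_n.items():
    print(f"n = {n:2d} per group: approximate power = {power:.2f}")
```

Under these assumptions, power rises with the number of individuals per group, which is the trade-off against sequencing cost described above.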
High sequencing costs have led to the development of RNA library construction protocols that allow a larger number of samples to be processed and sequenced more cost-effectively (Meyer et al., 2011; Morrissy et al., 2009; Wu et al., 2010). 3’ RNA-Seq methods prime only the 3’ poly-A tail, thus reducing the sequencing effort and cost (Lohman et al., 2016; Ma et al., 2019). Independent of sample size, however, library construction and RNA sequencing techniques may also introduce variability: whole mRNA-Seq and 3’ RNA-Seq can differ in the transcripts detected, in the genes identified as differentially expressed among treatments, and in estimated gene expression (e.g., Crow et al., 2020; Jarvis et al., 2020; Ma et al., 2019; Tandonnet & Torres, 2017). Furthermore, whole mRNA libraries and sequencing methods often result in fragment length bias because longer transcripts are sheared into more fragments, so that more reads are assigned to them than to shorter transcripts, causing an overrepresentation of longer transcripts (Ma et al., 2019; Oshlack & Wakefield, 2009; Roberts et al., 2011). On the other hand, 3’ RNA-Seq generates an essentially uniform distribution of fragments with respect to original RNA length (Lohman et al., 2016; Ma et al., 2019). Although there are methods to correct for the bias in gene expression due to differences in transcript length, the detection and sampling of transcripts are still higher, especially for longer transcripts, when using classical mRNA-Seq approaches (Crow et al., 2022; Ma et al., 2019; Mandelboum et al., 2019; Tandonnet & Torres, 2017). Finally, whole mRNA-Seq libraries permit identification of alternative splicing at a single gene, because library construction and sequencing with this method capture different fragments and transcripts for the same locus (Crow et al., 2022).
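The fragment length bias described above can be illustrated with a deliberately simplified model in which whole mRNA-Seq yields fragments in proportion to transcript length and 3’ RNA-Seq yields one fragment per molecule. The sketch below is not taken from the cited studies; the gene names, transcript lengths, and molecule counts are hypothetical, and the correction shown is a generic per-length rescaling in the spirit of TPM.

```python
def expected_fragments(molecules, length, protocol):
    """Expected relative fragment yield for one gene (simplified model)."""
    if protocol == "whole":
        # Shearing produces fragments roughly in proportion to length.
        return molecules * length
    if protocol == "three_prime":
        # One fragment per molecule, primed at the poly-A tail.
        return molecules
    raise ValueError(f"unknown protocol: {protocol}")


def proportions(values):
    """Rescale a dict of non-negative values so that they sum to 1."""
    total = sum(values.values())
    return {gene: value / total for gene, value in values.items()}


def length_normalized(counts, lengths):
    """Divide counts by transcript length, then rescale to proportions."""
    return proportions({gene: counts[gene] / lengths[gene] for gene in counts})


# Two hypothetical genes with equal numbers of mRNA molecules.
molecules = {"short_gene": 100, "long_gene": 100}
lengths = {"short_gene": 1000, "long_gene": 4000}

whole = {g: expected_fragments(molecules[g], lengths[g], "whole") for g in molecules}
three_prime = {g: expected_fragments(molecules[g], lengths[g], "three_prime") for g in molecules}

whole_share = proportions(whole)                       # long gene dominates
three_prime_share = proportions(three_prime)           # equal shares
whole_corrected = length_normalized(whole, lengths)    # equal shares restored

print(whole_share, three_prime_share, whole_corrected)
```

In this toy case the long gene receives four times as many whole mRNA-Seq fragments as the short gene despite equal abundance, while the 3’ RNA-Seq shares and the length-normalized shares are equal.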
In many species, including fish, RNA-Seq data are commonly used to investigate the effects of environmental variables (e.g., temperature, hypoxia) on gene expression (e.g., Krishnan et al., 2020; Long et al., 2015; Meyer et al., 2011; Smith et al., 2013; Wang et al., 2015; Jeffries et al., 2021). However, little is known about how the methods used to sample individuals under field conditions influence gene expression. Field conditions may limit the use of optimal sampling protocols or storage methods (Mutter et al., 2004; Pérez‐Portela & Riesgo, 2013). Handling time of individuals before tissue sampling may also be longer than in the lab and may affect gene expression differently depending on the field sampling technique and tissue used.
The impacts of handling stress on fish physiology are well understood (Sopinka et al., 2016). Although most studies focus on glucocorticoid and blood chemistry responses following capture (Milla et al., 2010; Wiseman et al., 2007; Wood et al., 1983; Milligan, 1996; Barton, 2002; Ruane et al., 2001; see also Romero & Reed, 2005 for the influence of handling time in non-fish species), gene expression responses to handling stress indicate that the magnitude, intensity, and duration of changes vary across genes, species, and tissue types (Krasnov et al., 2005; Lopez et al., 2014). While there is evidence that blood cortisol and glucose levels are affected by capture method (e.g., electrofishing; Barton & Dwyer, 1997; Barton & Grosh, 1996; Bracewell et al., 2004), it is, to our knowledge, unclear whether gene expression is affected by capture method or handling time prior to sample collection.
Here, we test whether sampling method (electrofishing vs. dip netting), processing time, and RNA-Seq library type (3’ RNA-Seq, here called QuantSeq, vs. whole mRNA-Seq, here called NEB) influence gene expression data in multiple tissue types from westslope cutthroat trout (Oncorhynchus clarkii lewisi), a species of conservation concern native to western North America (Behnke, 2002; Allendorf & Leary, 1988; Shepard et al., 2003). Electrofishing, in which a backpack-mounted unit applies an electrical current to the water to momentarily stun the fish, is one of the most common fisheries sampling methods. This method may cause the fish to express genes in response to the electric current, and may affect individual fish and tissue types differently, increasing variation among biological replicates. An alternative to electrofishing is dip netting. While dip netting may have a smaller effect on gene expression and a lower risk of inadvertently killing both target and non-target organisms, it is more laborious and time-consuming, and less effective in the field, where circumstances may not allow for long sampling periods or where obstacles in aquatic systems (e.g., fallen tree limbs and rocks) may prevent effective capture with nets. Capturing fish by dip netting may still influence gene expression through stress, as the fish tries to escape capture.
The results of this study will provide a foundation for improving future RNA-based study designs for field sampling of wild-caught non-model fish and other species.