2.5 RNA-Seq read quality check and genome coverage
Quality checks of the raw RNA-Seq reads were performed using FastQC (Andrews, 2014).
Reads were trimmed with Trimmomatic (version 0.38; Bolger et al., 2014) using the default parameters. Raw reads were mapped to an Oncorhynchus mykiss reference genome from NCBI (Omyk_1.0, https://www.ncbi.nlm.nih.gov/assembly/GCF_002163495.1/, Annotation release ID: 100) using STAR (version 2.7.1a; Dobin et al., 2013; Dobin and Gingeras, 2015) to obtain the number of genes detected by each technique, QuantSeq vs. NEB.
To perform bioinformatic analyses on samples with an equal number of uniquely mapped reads between the two library types (as previously done by others; see Ma et al., 2019), we randomly selected 11 million and 40 million reads per sample for all analyses performed on QuantSeq and NEB, respectively. Previous work has shown that using more than ~10M reads does not increase the amount of uniquely mapped reads, and that beyond this depth the ability to detect differentially expressed genes becomes independent of sequencing depth (Ramsköld, 2009; Liu et al., 2014; Ma et al., 2019; see also Crow et al., 2022 for an in-depth discussion of read redundancy).
Transcripts are randomly sheared into fragments with NEB but not with QuantSeq. Consequently, the number of reads with NEB is proportional to the number of fragments rather than transcripts, whereas the number of reads with QuantSeq is proportional to the number of transcripts. Because of this, more reads may be needed for NEB than for QuantSeq to reach a similar percentage of uniquely mapped reads. However, having many fragments corresponding to the same transcript is redundant and not useful for gene expression quantification (Crow et al., 2022). At the same time, especially for longer transcripts, the increased number of fragments per transcript obtained with the NEB library increases transcript detection (Crow et al., 2022; Ma et al., 2019). For whole mRNA-Seq, over-counting the same transcript because multiple fragments correspond to the same locus is especially relevant at higher sequencing depths (see Crow et al., 2022 for a detailed discussion), so randomly selecting a lower number of reads to obtain an equal number of uniquely mapped reads between the two library types overcomes this issue.
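The random selection of a fixed number of reads per sample can be illustrated with a minimal Python sketch. The tool actually used for this step is not stated here, so the code below is only one possible approach (reservoir sampling over FASTQ records). The function names `read_fastq` and `subsample_reads` and the fixed `seed` are assumptions introduced for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality string).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of FASTQ lines into four-line records."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping one shared iterator with itself yields consecutive groups of four lines.
    return zip(stripped, stripped, stripped, stripped)


def subsample_reads(
    records: Iterable[FastqRecord], n_reads: int, seed: int = 42
) -> List[FastqRecord]:
    """Return n_reads records chosen uniformly at random (reservoir sampling).

    The input is read once, so the total number of reads need not be known
    in advance. If fewer than n_reads records exist, all of them are returned.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n_reads:
            reservoir.append(record)
        else:
            # Keep the new record with probability n_reads / (i + 1).
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = record
    return reservoir
```

Under these assumptions, a call such as `subsample_reads(read_fastq(handle), 11_000_000)` would correspond to the QuantSeq depth and `40_000_000` to the NEB depth. The reservoir is held in memory, so a dedicated subsampling tool may be preferable at these read counts.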
Reads were mapped again to the Oncorhynchus mykiss reference genome. HTSeq (version 0.11.1; Anders et al., 2015) was then used to quantify the number of reads uniquely mapped to each gene of the O. mykiss reference genome. Finally, a Python script provided with StringTie (prepDE.py) was used to generate a gene count matrix (Pertea et al., 2016).
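To make the structure of a gene count matrix concrete, the following Python sketch combines per-sample count tables into a genes-by-samples matrix and counts the genes detected in each sample. It is not prepDE.py and does not reproduce its behaviour. It assumes two-column, tab-separated htseq-count output (gene identifier, read count) in which summary rows begin with a double underscore, and it assumes that a gene counts as detected when it has at least `min_reads` reads (a hypothetical threshold). The function and sample names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse two-column htseq-count output into a {gene_id: count} mapping."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and summary rows such as "__no_feature".
        if len(fields) < 2 or fields[0].startswith("__"):
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, samples, matrix) with one row per gene and one column per sample."""
    samples = list(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    # Genes absent from a sample's table are given a count of zero.
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


def genes_detected(
    samples: List[str], matrix: List[List[int]], min_reads: int = 1
) -> Dict[str, int]:
    """Count, per sample, the genes with at least min_reads reads."""
    return {
        sample: sum(1 for row in matrix if row[col] >= min_reads)
        for col, sample in enumerate(samples)
    }
```

With one parsed table per library, `genes_detected` gives a per-sample number that could be compared between QuantSeq and NEB, under the detection threshold assumed above.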