2.5 RNA-Seq read checks and genome coverage
Quality checks of the raw RNA-Seq reads were performed using FastQC
(Andrews, 2014).
Reads were trimmed with Trimmomatic using the default parameters
(version 0.38; Bolger et al., 2014). Raw reads were mapped to an
Oncorhynchus mykiss reference genome from NCBI (Omyk_1.0,
https://www.ncbi.nlm.nih.gov/assembly/GCF_002163495.1/, Annotation
release ID:100) using STAR (version 2.7.1a; Dobin et al., 2013; Dobin
and Gingeras, 2015) to obtain the number of genes detected by each
technique, QuantSeq vs. NEB.
To perform bioinformatic analyses on samples with an equal
number of uniquely mapped reads between the two library types (as
previously done by others, see Ma et al., 2019), we randomly selected 11
million and 40 million reads per sample for all analyses performed on
QuantSeq and NEB, respectively. Previous work has shown that sequencing
beyond ~10 million reads does not increase the amount
of uniquely mapped reads, and that beyond this depth the ability to detect
differentially expressed genes becomes independent of sequencing depth
(Ramsköld, 2009; Liu et al., 2014; Ma et al., 2019; see also Crow et al.,
2022 for an in-depth discussion of read redundancy). Transcripts are
randomly sheared into fragments with NEB but not QuantSeq. Consequently,
the number of reads with NEB is proportional to the number of fragments,
not transcripts, whereas the number of reads with QuantSeq is
proportional to the number of transcripts. Because of this, more reads
may be needed for NEB than for QuantSeq to have a similar percentage of
uniquely mapped reads. However, having many fragments corresponding to
the same transcript is redundant and not useful for gene expression
quantification (Crow et al., 2022). At the same time, especially for
longer transcripts, the increased number of fragments per transcript
obtained with the NEB library increases transcript detection (Crow et al.,
2022; Ma et al., 2019). For whole mRNA-Seq, over-counting of
the same transcript due to multiple fragments mapping to the same
locus is especially relevant at higher sequencing depths (see Crow et
al., 2022 for a detailed discussion). Randomly
selecting a lower number of reads, so as to obtain an equal number of
uniquely mapped reads between the two library types, overcomes this issue
(see also Crow et al., 2022).
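As an illustration only, randomly selecting a fixed number of reads per sample can be sketched in Python as below. The text does not state which tool was used for this step, so the function names (read_fastq, subsample_reads), the seed value and the use of reservoir sampling are all assumptions made for this example. It is a minimal sketch of the idea, not the procedure actually applied.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, qualities).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; an incomplete trailing record is dropped."""
    it = iter(lines)
    # zip(it, it, it, it) consumes the same iterator four lines at a time.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def subsample_reads(records: Iterable[FastqRecord], n: int,
                    seed: int = 0) -> List[FastqRecord]:
    """Draw a uniform random sample of up to n records in a single pass
    (reservoir sampling, Algorithm R). The sample is held in memory."""
    rng = random.Random(seed)  # fixed seed (hypothetical value) for reproducibility
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            # Replace a stored record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

With an uncompressed FASTQ file opened as a text handle, a call such as subsample_reads(read_fastq(handle), 11_000_000) would keep 11 million reads; the file name and any decompression step are left to the reader. A single-pass sampler is used here because it does not require knowing the total number of reads in advance.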
Reads were mapped again to the Oncorhynchus mykiss reference
genome. HTSeq (version 0.11.1; Anders et al., 2015) was then used to
quantify the number of reads uniquely mapped to each gene of the
O. mykiss reference genome. Finally, a Python script provided
with StringTie (prepDE.py) was used to generate a gene counts matrix
(Pertea et al., 2016).
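To make the structure of a gene counts matrix concrete, the sketch below shows one way per-sample count tables could be combined and the number of detected genes tallied. It is not prepDE.py and does not reproduce its behaviour. It assumes tab-separated "gene_id, count" lines as written by htseq-count, whose summary rows begin with "__". The function names and the detection threshold (min_count=1) are hypothetical choices for this example; the text does not state which threshold was used.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse tab-separated gene/count lines, skipping '__' summary rows."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        gene_id, value = fields[0], fields[-1]
        if gene_id.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        counts[gene_id] = int(value)
    return counts


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, samples, matrix) with one row per gene and one column
    per sample; genes missing from a sample are filled with 0."""
    samples = sorted(per_sample)
    genes = sorted({g for counts in per_sample.values() for g in counts})
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


def genes_detected(per_sample: Dict[str, Dict[str, int]],
                   min_count: int = 1) -> Dict[str, int]:
    """Count genes per sample with at least min_count reads
    (the threshold is an assumption of this sketch)."""
    return {
        sample: sum(1 for v in counts.values() if v >= min_count)
        for sample, counts in per_sample.items()
    }
```

Under these assumptions, comparing the output of genes_detected between QuantSeq and NEB samples gives the per-technique gene detection figures referred to in this section.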