Gene Expression Analysis
To examine differential gene expression and differential transcript
usage among life stage (egg, larvae, and adult), as well as female and
male adults, we mapped paired-end short-read RNAseq data to the
super-transcriptome using Hisat2 to generate input count matrix
for genes and exons within genes. As each RNAseq sample is unreplicated,
we chose statistical tools that have performed reliably in differential
expression and splicing analyses (Mehmood et al. 2019; Rapaportet al. 2013) and applied stringent significance thresholds based
on multiple testing criteria. First, we used edgeR
(Robinson et al. 2010) to conduct the differential expression
analysis and assumed a recommended dispersion value for within species
comparisons (0.16, based on human datasets). The count matrix was
filtered by requiring a minimum count of 1 read-per-million per gene in
at least three samples. Hierarchical clustering was used to measure the
overall sample to sample distance based on differential expression, with
read data transformed to log2 counts-per-million (logCPM), or moderated
log-counts-per-million, to avoid the effect of undefined values and to
shrink poorly expressed genes with low counts to zero. We then
identified significantly differentially expressed genes in pairwise
contrasts of each life-stage or sex, using Fisher’s Exact Test based on
the negative-binomial distribution. Statistical significance in each
contrast was determined based on a Benjamini-Hochberg adjustedp-value , or false discovery rate (FDR), of 0.01. Gene lists for
the top 25 genes (lowest p-value ) from each pairwise contrast
were combined to generate a heatmap using the pheatmap
package in R. For the egg vs. larvae, larvae vs. adult female, and adult
female vs. male contrasts, we examined the biological function for all
significantly differentially expressed genes (FDR < 0.01)
using a gene set enrichment analysis (GSEA) based on gene ontology (GO)
terms. The goseq package (Young et al. 2010) in R
was used to identify GO terms that were enriched by comparing the subset
of differentially expressed genes to all genes that were expressed in
either the female or male samples. The Wallenius approximation was then
used to calculate statistical significance based on an FDR of 0.05,
while taking gene length into account using an empirically estimated
probability weighting function.
We conducted another GSEA analysis based on genes exhibiting
differential exon usage in the female vs. male contrast. A generalized
log-linear model was fit to the exon count data, and log-fold
differences among exons in each multi-exon gene were assessed using thediffSplice function in the limma package in R
(Ritchie et al. 2015). Differences in expression among exons were
first assessed using t-tests, and then the Simes method was used to
provide a multiple testing correction for a gene-wise measure of
significance. We used the Simes p-value with a significance
threshold of 0.01 to generate a gene list for GSEA analysis, and the
background list included all multi-exon genes.