RNA-seq libraries and analysis.
Total RNA was extracted and qualified and sent to Berry Genomics
(Beijing, China) for transcriptome sequencing. After sequencing, obtain
a binary file in fastqc.gz format, then using Fastqc (version 0.11.7,https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
software for raw data quality assessment. Trim-galore (version 0.6.6,https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/,
parameters: paired, –quality 25 –length 36, stringency 3)
software was used to remove splices and low-quality reads to obtain
clean reads with quality scores greater than Q20. Then, the human genome
HG38
(https://hgdownload.soe.ucsc.edu/downloads.html#hg38sequence)
and its annotation file
(https://hgdownload.soe.ucsc.edu/downloads.html#hg38annotations)
were downloaded from UCSC website
(http://www.genome.ucsc.edu/),
which was used to construct human genome index by HISAT2 software
(hisat2, version 2.2.1,http://daehwankimlab.github.io/hisat2/
; parameters: default parameters) and then aligned to the reference
genome. Samtools software (version 1.9,https://sourceforge.net/projects/samtools/files/samtools/1.9/;
parameters : -b -S -h) was used to transform Sam files into BAM files
and sort them. After sorting, The featureCounts program in the subread
package (version 2.0.1,http://subread.sourceforge.net/
; parameters: -t exon -g gene_id,) was used for gene level
quantification. The results were expressed by FPKM. After obtaining the
count matrix, the R package DEseq2 (version 1.24.0,http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html)
based on the negative binomial distribution was used for the difference
analysis.
The fold change (FC) thresholds for identifying DEGs were |
Log2FC | >1, and a quality score greater than Q20,
i.e., false discovery rate (FDR) less than 0.05.and visualized the
results by R package ggplot2 [34].