2.4. Taxonomic and functional profiling
The sequencing reads were first quality filtered using
Trimmomatic [13] v0.36 and PRINSEQ [14] v0.20.4. Human reads were removed using KneadData v0.6.1
(https://bitbucket.org/biobakery/kneaddata). High-quality non-human
reads were mapped against a custom database using
Kraken2 [15] v2.0.9. A total of 29,943 complete microbial
genomes were downloaded on 3 May 2020, of which 19,362 were bacterial,
368 were archaeal, 9,346 were viral, and 867 were fungal. The complete
bacterial, archaeal, and viral genomes were downloaded from the RefSeq
database using the --download-library option of kraken2-build. The
complete fungal genomes were manually downloaded from the GenBank database.
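
As an illustration of the RefSeq download step, the following minimal sketch assembles one kraken2-build call per library. The database directory name is hypothetical (the text does not give a path), and the commands are only printed, not executed. The library names bacteria, archaea, and viral are those accepted by the --download-library option.

```python
import shlex

# Hypothetical database directory; the paper does not state a path.
DB_DIR = "custom_kraken2_db"

# RefSeq libraries named in the text, as accepted by --download-library.
REFSEQ_LIBRARIES = ["bacteria", "archaea", "viral"]


def build_download_commands(db_dir, libraries):
    """Return one kraken2-build argument list per RefSeq library."""
    return [
        ["kraken2-build", "--download-library", lib, "--db", db_dir]
        for lib in libraries
    ]


commands = build_download_commands(DB_DIR, REFSEQ_LIBRARIES)
for cmd in commands:
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Manually downloaded genomes, such as the fungal genomes here, would typically be added with the --add-to-library option of kraken2-build before the database is built. The exact calls used by the authors are not given in the text.
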
The results of taxonomic classification were filtered using a confidence
score threshold of 0.20. Only species with more than 10 reads in at least one
sample were retained. The species profiles were decontaminated (see
below) and the reads derived from non-contaminant species were used for
functional analysis. Functional profiling was performed using
HUMAnN2 [16] v0.11.1. The abundance profiles of gene
families (UniRef90) were summarized into the abundances of KEGG Orthology
(KO) groups, Enzyme Commission (EC) categories, eggNOG clusters of
orthologous groups, and Pfam protein families. In
addition, the non-contaminant reads were mapped against the protein
homolog sequences of the antimicrobial resistance genes in the CARD
database [17] (May 2020 release) using
DIAMOND [18] v0.9.22.123.
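
The species retention rule described above (more than 10 reads in at least one sample) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name, the nested-dictionary layout, and the example species and counts are all hypothetical.

```python
def filter_species(counts, min_reads=10):
    """Keep species with more than `min_reads` reads in at least one sample.

    `counts` maps species name -> {sample name: read count}.
    """
    return {
        species: per_sample
        for species, per_sample in counts.items()
        if any(n > min_reads for n in per_sample.values())
    }


# Hypothetical read counts for illustration only.
example_counts = {
    "Species A": {"sample1": 250, "sample2": 3},
    "Species B": {"sample1": 10, "sample2": 9},   # never exceeds 10 reads
    "Species C": {"sample1": 0, "sample2": 11},
}

retained = filter_species(example_counts)
print(sorted(retained))
```

The comparison is strict, so a species with exactly 10 reads in its best sample is dropped, matching the wording "more than 10 reads".
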