Bioinformatics
All shotgun metagenomic reads were adapter-trimmed, quality-filtered,
and filtered against the EquCab3 domestic horse reference genome, using
default parameters in kneaddata, a wrapper for trimmomatic and bowtie2.
Read pairs that passed quality filtering were used to estimate the
relative abundances of microbial taxa with Kaiju, run with default
settings. Unlike nucleotide-based profilers, Kaiju
assigns taxon identities to reads by first translating each read in
all six reading frames and then mapping the resulting amino acid
sequences to a protein reference database. Protein-coding regions are
more strongly conserved than non-coding regions, and amino acid
sequences are not affected by synonymous nucleotide substitutions.
This can increase classification rates for shotgun metagenomic reads
derived from microbes that are underrepresented in reference
databases. Using the microbial subset of the NCBI BLAST
non-redundant protein database, we first classified reads to ‘species’,
a label that in practice encompasses species, strains and co-abundant
gene groups. Reads that could not be assigned to a species were
assigned to progressively coarser taxonomic levels (genus, family,
order, class, phylum, kingdom). For example, reads that could be
classified to family, but not to genus or species, form a
‘familyX unclassified’ bin in our analyses.
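
The six-frame translation step described above can be illustrated with a minimal sketch. This is not Kaiju's implementation; it assumes the standard genetic code, and the function names are hypothetical. It shows only how one nucleotide read yields six candidate amino acid sequences, and why a synonymous substitution leaves the translated sequence unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; "*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Return translations for three forward and three reverse frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


# GCC and GCT both encode alanine, so these two reads translate identically
# in the first forward frame despite differing at the nucleotide level.
same_protein = translate("ATGGCC") == translate("ATGGCT")
```

A protein-level classifier would then search each of the six translations against the protein database; that matching step is outside the scope of this sketch.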
To estimate microbial gene content and metabolic potential, we used
HUMAnN3. HUMAnN3 maps quality-controlled and filtered shotgun
metagenomic reads to UniProt Reference Clusters (UniRef). In running
HUMAnN3, we concatenated the forward and reverse read files, bypassed
the taxonomic classifier option, reduced the subject sequence coverage
thresholds to 0, classified reads to UniRef50 gene families, and mapped
gene families to MetaCyc reactions and metabolic pathways. The methods
used for quality-control filtering and taxonomic classification of the
16S rRNA amplicon sequence data using DADA2 are described elsewhere.
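
The concatenation of forward and reverse read files mentioned above for HUMAnN3 can be sketched as follows. This is an illustrative example rather than the script used in the study; the function name and the file names in the usage comment are hypothetical. It assumes both inputs are in the same format (both plain FASTQ, or both gzip-compressed) and simply appends the reverse-read file to the forward-read file.

```python
import shutil


def concatenate_reads(forward_path, reverse_path, combined_path):
    """Write the forward reads followed by the reverse reads to one file.

    Files are copied byte for byte, so the output has the same format as
    the inputs. Concatenated gzip streams remain a valid gzip file.
    """
    with open(combined_path, "wb") as out:
        for path in (forward_path, reverse_path):
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


# Hypothetical usage, with placeholder file names:
# concatenate_reads("sample_R1.fastq", "sample_R2.fastq", "sample_cat.fastq")
```

The combined file is then treated as a single set of unpaired reads, so pairing information is not used downstream.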