Comparisons of 16S amplicon versus shallow shotgun metagenomic
sequencing
Despite the methodological biases inherent in amplicon and shotgun
metagenomic sequencing approaches, these methods yielded similar
biological patterns. In a 13-sample set of DNA extracts sequenced using
both amplicon and shallow shotgun metagenomic methods (both rarefied to
a depth of 35K bacterial read pairs), we observed a positive correlation
in Shannon diversity. Bray-Curtis dissimilarities were highly correlated
between the two sequencing datasets, but this correlation declined at
finer taxonomic resolutions and was absent at the ASV/species level. This is
unsurprising, since ASVs are binned based on sequence similarities,
while classifications of shotgun metagenomics reads are constrained to
the taxonomic demarcations present in reference databases. The modest
correlation in genus-level Bray-Curtis dissimilarities could arise from
an inability to classify amplicon reads to this taxonomic level. Points
of discrepancy in Bray-Curtis dissimilarities may also derive from
differences between taxonomies within reference databases, rather than
biases in sequencing methods. For example, an abundant genus in the
horse microbiome, Oscillibacter, is classified within the family
Ruminococcaceae in the 16S Silva database, but within the family
Oscillospiraceae in the NCBI non-redundant database.
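
The comparison described above (rarefy both datasets to a common depth, compute Shannon diversity and pairwise Bray-Curtis dissimilarities per dataset, then correlate the two) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the count tables are hypothetical toy data, the function names are our own, and the use of a Pearson correlation is an assumption, since the correlation statistic is not stated here.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample a vector of taxon counts to `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return rng.multivariate_hypergeometric(counts, depth)

def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count (or abundance) vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())

def pairwise_bray_curtis(table):
    """Bray-Curtis dissimilarity for every pair of samples (rows = samples)."""
    n = len(table)
    return np.array([bray_curtis(table[i], table[j])
                     for i in range(n) for j in range(i + 1, n)])

# Hypothetical toy data: 4 samples x 5 genera, same taxa in both tables.
rng = np.random.default_rng(0)
amplicon = rng.integers(50, 500, size=(4, 5))
shotgun = amplicon + rng.integers(0, 100, size=(4, 5))

depth = 200  # toy rarefaction depth (the study used 35K read pairs)
amp_r = np.array([rarefy(row, depth, rng) for row in amplicon])
shot_r = np.array([rarefy(row, depth, rng) for row in shotgun])

# Correlation of per-sample Shannon diversity between methods (Pearson, assumed).
r_shannon = np.corrcoef([shannon(r) for r in amp_r],
                        [shannon(r) for r in shot_r])[0, 1]

# Correlation of pairwise dissimilarities between methods (Mantel-style statistic).
r_bc = np.corrcoef(pairwise_bray_curtis(amp_r),
                   pairwise_bray_curtis(shot_r))[0, 1]
```

Because pairwise dissimilarities are not independent observations, assessing the significance of `r_bc` would require a permutation approach such as a Mantel test rather than a standard correlation test. Repeating the calculation on tables collapsed to family, genus, and ASV/species level would show how the correlation changes with taxonomic resolution.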
Bacterial family average relative abundance estimates were positively
correlated between sequencing methods. However, abundant families in the
Sable Island horse microbiome (Lachnospiraceae, Ruminococcaceae,
Prevotellaceae, Spirochaetaceae, Rikenellaceae) tended to be
over-represented in the amplicon
dataset compared to the shotgun dataset. Amplicon sequencing results are
biased by 16S rRNA gene copy number and primer design, while shotgun
metagenomic estimates are biased by genome size or, in the context of
Kaiju, the size of gene-coding regions. For example, the weakest
correlations in relative abundance estimates were observed in
Ruminococcaceae and Lachnospiraceae, families known to
possess large variation in 16S rRNA copy number. Conversely, the
strongest correlation was observed between estimates of
Fibrobacteraceae relative abundance, a narrow clade with low 16S
rRNA copy number variation.
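
To make the two sources of bias concrete, the sketch below computes the read fractions expected when each taxon contributes reads in proportion to its cell abundance multiplied by a per-genome weight: 16S copy number for amplicon data, and the amount of coding sequence for protein-level shotgun classification. The community, copy numbers, and coding-sequence sizes are hypothetical values chosen for illustration, and the model ignores primer bias and classification error.

```python
def expected_read_fractions(cell_abundance, weight):
    """Expected read fractions when reads scale with cells x per-genome weight."""
    total = sum(cell_abundance[t] * weight[t] for t in cell_abundance)
    return {t: cell_abundance[t] * weight[t] / total for t in cell_abundance}

# Hypothetical community with equal numbers of cells of two taxa.
cells = {"taxon_A": 0.5, "taxon_B": 0.5}
copies_16s = {"taxon_A": 6, "taxon_B": 2}       # assumed 16S copies per genome
coding_mbp = {"taxon_A": 3.0, "taxon_B": 5.0}   # assumed coding sequence (Mbp)

amplicon = expected_read_fractions(cells, copies_16s)
shotgun = expected_read_fractions(cells, coding_mbp)

print(amplicon)  # {'taxon_A': 0.75, 'taxon_B': 0.25}
print(shotgun)   # {'taxon_A': 0.375, 'taxon_B': 0.625}
```

In this toy case both methods misestimate a true 50:50 community, but in opposite directions, which is one way that families with variable 16S copy number could show weak agreement between methods.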
Many taxa of moderate relative abundance in the shotgun metagenomic
dataset were either absent, or present at lower-than-expected values in
the amplicon dataset. Additionally, we observed a clear bi-modal
distribution of prevalence in the amplicon dataset, wherein families
were either present in nearly all samples, or very few samples (Figure
S4). This could suggest that amplicon-based sequencing under-represents
some abundant bacterial clades, perhaps due to primer biases or 16S copy
number variation. The discrepancies we observed between shallow shotgun
metagenomic and 16S amplicon sequencing data are qualitatively similar
to those reported in previous evaluations of shallow shotgun sequencing,
but determining whether shotgun metagenomic or 16S amplicon data more
accurately estimate microbiome features requires communities of known
composition.
Kaiju-based classification of shotgun metagenomic reads is reported to
more accurately estimate taxon relative abundances than 16S amplicon
sequencing. Similarly, a previous benchmarking study found that shallow
shotgun sequencing more accurately recapitulated communities of known
composition than 16S amplicon data. These previous benchmarking studies,
together with the similarities in the biological patterns described by
our shotgun metagenomic and amplicon datasets, lead us to conclude that
shallow shotgun sequencing provides a suitable, if not superior,
substitute for 16S rRNA gene amplicon-based characterization of the
microbiome.
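
The prevalence comparison discussed in this section (the fraction of samples in which each family is detected) can be computed from a count table as sketched below. The table is hypothetical toy data and the function name is our own; a bi-modal pattern would appear as families piling up near prevalence 0 and 1 with few in between.

```python
import numpy as np

def prevalence(table):
    """Fraction of samples (rows) in which each taxon (column) has a non-zero count."""
    table = np.asarray(table)
    return (table > 0).mean(axis=0)

# Hypothetical 4-sample x 3-family count table.
counts = np.array([[12, 0, 3],
                   [ 8, 0, 0],
                   [15, 1, 0],
                   [ 9, 0, 0]])

prev = prevalence(counts)

# Histogram of prevalence values across families, in four equal-width bins.
hist, edges = np.histogram(prev, bins=4, range=(0.0, 1.0))
```

Applying the same function to the amplicon and shotgun tables separately and comparing the two histograms would show whether the bi-modality is specific to one method.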