Introduction
Our understanding of host-associated microbiomes has relied heavily upon 16S rRNA gene amplicon sequencing of bacterial communities. Amplicon-based profiling of the microbiome is affordable and supported by a suite of accessible bioinformatic and analytical tools. But despite their popularity, amplicon sequencing data: 1) are biased by 16S rRNA gene copy number variation, 2) often resolve only to the level of genus, and 3) do not provide direct estimates of microbiome functional potential. Hence, our ability to interpret the causes and consequences of microbiome variation using 16S data is too often relegated to the realm of speculation based on coarse taxonomic profiles.
In response to the limitations inherent in 16S approaches, there is a growing interest in shotgun metagenomic sequencing. By randomly sequencing the entire microbial genomic content of a sample, shotgun sequencing can reconstruct microbial communities at finer resolutions than amplicon sequencing, and provide direct estimates of microbiome functional potential from microbial gene contents. However, the library preparations and deep sequencing (10 to 100 million read pairs per sample) required to generate shotgun metagenomic data can be an order of magnitude more expensive than amplicon sequencing on a per-sample basis. These expenses make shotgun metagenomic sequencing infeasible for large sample sets.
Recent reductions in the per-base cost of DNA sequencing and the development of in-house and commercially available high-throughput library preparation techniques have made shotgun sequencing more affordable. Cost reductions notwithstanding, deep sequencing can still be prohibitively expensive for large sample sets. However, depending on the research questions of interest, the characterization of major patterns in microbial communities and functional profiles may not require deep sequencing.
Shotgun sequencing at lower depths than is conventional (shallow shotgun sequencing) has been proposed as a cost-effective method for characterizing microbial communities. Although not a like-for-like substitute for deep shotgun sequencing, shallow shotgun sequencing can outperform amplicon-based community characterization at comparable costs, with the additional benefit of capturing major variation in microbial gene content. In an early test of shallow shotgun sequencing capabilities in humans (retroauricular crease, stool, sub/supragingival plaque, and tongue dorsum microbiomes), accurate species-level differential abundances were observed among taxa which occurred within samples at relative abundances as low as 0.05% of reads, in datasets rarefied to 0.5 million reads/sample. Furthermore, at depths of only 1000 reads/sample, coarse biological patterns in alpha and beta diversity were still evident.
Despite its promise, shallow shotgun sequencing has limitations. For instance, the same low sequencing depth that makes shallow shotgun sequencing economical also renders de novo assembly approaches ineffective, which forces a reliance on read-based profilers. Therefore, shallow shotgun sequencing data cannot be used for novel gene discovery, identification of rare taxa, or to create metagenome-assembled genomes (MAGs) using single-sample assemblies. An inability to use de novo assembly approaches means that shallow shotgun reads can only be classified if they match references within genome databases. Publicly available microbial genomes are heavily biased towards isolates or MAGs from humans, and lab or production animals. Therefore, the utility of shallow shotgun sequencing needs to be assessed for host-associated microbial communities which are likely to be underrepresented within genome databases.
Here, we evaluate the ability of shallow shotgun sequencing to characterize the taxonomic composition and functional potential of the fecal microbiome of a free-ranging horse (Equus ferus caballus) population living on Sable Island, Nova Scotia, Canada. Although horses, including Sable Island horses, have been the subject of many 16S rRNA gene amplicon studies, they have not benefitted from deep shotgun metagenomic studies. Because horses are a prevalent domesticated mammal, the major bacterial clades observed in them might be similar to those observed in other domesticated species which also originate from a human agricultural environment, but which have been the subject of deeply sequenced metagenomic studies, for example: cows, pigs, sheep, and chickens. Therefore, bacterial species unique to Sable Island horses are likely to have close relatives in available microbial genome reference databases, and so may be suitable for shallow shotgun sequencing.
First, to determine the depth at which shallow shotgun sequencing remains viable, we analyzed a successively rarefied deeply sequenced dataset of 16 fecal microbiome samples. Second, to validate the efficacy of more affordable library preparation methods, we compared sequencing results generated using prevailing library preparation methods (Illumina Nextera XT) to those created using a new high-throughput technique (iGenomx Riptide, now Twist Biosciences Riptide). Third, we compared shallow shotgun sequencing data to 16S rRNA gene amplicon sequencing of the same DNA extracts, to quantify the concordance between amplicon-based and shallow shotgun metagenomic-based estimates of microbiota community structure. Fourth, using an expanded 83-sample dataset, we tested whether biological patterns in the microbiome which were first observed in a 16S rRNA gene amplicon dataset (e.g., diet effects and spatial structuring) could be replicated via shallow shotgun sequencing of the same samples. Fifth, we re-analyzed this 83-sample dataset using profiles of microbiome functional potential derived from shallow shotgun sequencing, to evaluate the purported advantage of a shallow shotgun sequencing approach.
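As an illustration of what successive rarefaction of a deeply sequenced sample involves, the following is a minimal sketch and not the pipeline used in this study. It assumes reads have already been assigned to taxa, and that each target depth is drawn independently and without replacement from the full sample. The function names, taxon labels, and depths are all hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a taxon -> read-count mapping."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot rarefy {total} reads to a depth of {depth}")
    # Expand the counts to one label per read so every read is equally likely
    # to be drawn. This is fine for a sketch but memory-hungry for real data.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def rarefy_series(counts, depths, seed=0):
    """Rarefy one sample to several depths (assumption: independent draws)."""
    return {depth: rarefy(counts, depth, seed=seed) for depth in depths}


if __name__ == "__main__":
    # Hypothetical read counts for a single sample.
    sample = {"taxon_a": 6000, "taxon_b": 3000, "taxon_c": 990, "taxon_d": 10}
    for depth, sub in sorted(rarefy_series(sample, [5000, 500, 50]).items()):
        # Observed richness is the number of taxa with at least one read.
        print(depth, "reads:", len(sub), "taxa observed")
```

Comparing a summary statistic such as observed richness across depths gives a rough sense of how much of a pattern survives as reads are removed; rare taxa such as the hypothetical `taxon_d` are the first to drop out.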