De-mulitplexing, Quality Filtering, and Phylogeny Reconstruction
Samples were demultiplexed using the MiSeq Reporter Generate FASTQ
workflow. Primer sequences were trimmed from forward and reverse
sequence reads using cutadapt v.1.4.2 (Martin, 2011). Following
de-multiplexing, samples were processed using the QIIME2 v.2018.2
pipeline (https://docs.qiime2.org/2018.2/). DADA2 was used to filter out
PhiX reads and chimeras, truncate the length of reads (forward reads cut
at 200bp, reverse reads cut at 180bp), and cluster reads into unique
amplicon sequence variants (ASVs) corrected for Illumina sequencing
errors (Callahan et al., 2016). Reads were aligned using the MAFFT
plugin in QIIME2 (FFT-NS-i; Katoh et al., 2002, 2005; Katoh & Toh,
2007). Default parameters were used to mask highly variable regions of
the alignment and reconstruct a phylogeny using the FastTree2 plugin
(Price et al., 2010), which was midpoint-rooted. The GreenGenes
Database, v.13.5, trimmed to only the 16S rRNA V4 region, was used as a
reference to train a naïve Bayes q2-feature-classifier, which was used
to taxonomically identify ASVs in our data (DeSantis et al., 2006).
Contamination is ubiquitous in microbiome studies and especially
problematic for low biomass samples (Eisenhofer et al., 2019; Salter et
al., 2014; S. Weiss et al., 2014). To reduce the impact of contaminants,
several filtering steps were performed. First, any bacterial taxon
detected in the negative controls was removed from all other samples,
with the exception of Arsenophonus . This genus of bacteria
contains known symbionts of insects (Nováková et al., 2009) and is
expected to be associated with bat flies (Wilkinson et al., 2016). AsArsenophonus is highly abundant in the samples sequenced for this
study, it may be that its detection in the extraction control (0.7% of
3,524 total reads) and PCR control (55% of 63 total reads) is due to
index bleed, a known issue when multiplexing samples (Eisenhofer et al.,
2019; Kircher et al., 2012; Mitra et al., 2015), and it is not treated
as a contaminant here. Next, bacterial genera were removed that are
known laboratory contaminants (Eisenhofer et al., 2019), as were reads
that were classified as being derived from mitochondria, chloroplast, or
Archaea, or those that could not be classified beyond phylum. Data were
exported from QIIME2 and reformatted for import into the R packagephyloseq v.1.26.1 (McMurdie & Holmes, 2012, 2013) for further
decontamination and all downstream analyses. We used the R packagedecontam to identify ASVs whose frequency is inversely correlated
with initial library concentration (Davis et al., 2018). Nine additional
ASVs were identified as potential contaminants and eliminated from the
dataset. Arsenophonus was not identified as a contaminant bydecontam . Finally, the data were filtered by two coverage depths:
1) all ASVs present in a sample at <0.01% relative abundance
were eliminated from that sample, and 2) all ASVs present at
<0.1% relative abundance were eliminated from that sample. At
a minimum relative abundance of 0.01%, spurious ASVs may remain in the
dataset, but at a minimum relative abundance of 0.1%, rare ASVs may be
incorrectly excluded (Alberdi et al., 2018; Bokulich et al., 2013).
Analyses were performed on both datasets.
As low concentration libraries were used to dilute high concentration
libraries prior to sequencing, sequencing effort across samples is not
even. Low concentration samples may be more prone to contamination due
to their low biomass or may not receive adequate sequencing depth to
describe the diversity of the bacterial community. To assess the
bacterial diversity captured by low concentration libraries compared to
high concentration libraries, the ggrare function was used from
the phyloseq-extended suite of tools, which wraps the functionrarefy from the package vegan v.2.5.4 while maintaining
the phyloseq data structure
(https://github.com/mahendra-mariadassou/phyloseq-extended/blob/master/R/graphical_methods.R ;
(Oksanen et al., 2010).