Bat flies were identified to species morphologically following [31–33]. Access to comparative morphological material was limited, so we barcoded all samples using cytochrome oxidase I [34, 35] (Supporting information; NCBI GenBank accession numbers OL847352-OL847639) and confirmed that individual flies identified morphologically as conspecifics belonged to the same genetic clade.
Environmental variables were measured using ArcGIS 10.1 and Fragstats 3.1 using forest cover maps from the Instituto Brasileiro de Geografia and SOS Mata Atlântica (www.sosmataatlantica.com.br; Table S1; [36, 37]. Habitat fragment area (hectares), isolation (shortest distance between a fragment and its nearest neighbor), distance from source (shortest straight line distance from focal point of a fragment to nearest point of REGUA), perimeter-area ratio, proximity index within a 500m and 1000m buffer [36], and forest cover within a 500m and 1000m buffer were calculated. REGUA was treated as the source because it is the largest, most biodiverse patch of forest in the study region. Perimeter-area ratio, proximity index, and forest cover were correlated with habitat patch area, isolation, and distance from source, so only these latter three landscape variables were used for downstream analyses. Area, isolation, and distance from source were log2 transformed to prevent extremely large or extremely isolated fragments from unduly impacting correlation analyses.
DNA extraction and 16S rRNA metabarcoding
DNA was extracted from 288 bat flies following a wash step and proteinase K digestion using the ZymoBIOMICS™ DNA Miniprep Kit (Zymo Research, Irvine, CA, USA) in a Biosafety Cabinet, Class 2 (Supporting information). One negative control was used for each extraction kit to control for laboratory and kit contamination. Negative controls were pooled for amplification. Extracted DNA was aliquoted into 96-well plates for amplification of the hypervariable region 4 (V4) of 16S rRNA following well-documented procedures outlined by the Earth Microbiome Project and the Illumina 16S Metagenomic Sequencing Library Preparation guidelines (Supporting information) [38–41]. Of 288 initial libraries, 77 libraries required an additional concentration step to reach the minimum 2nM concentration required for sequencing (Supporting information). Low concentration and high concentration libraries were pooled to reach a concentration of 2nM and sequenced using an Illumina MiSeq v3 Reagent Kit with 2x300bp reads and 18% PhiX spike-in on a MiSeq NGS platform (Illumina, San Diego, CA, USA) at the Bioinformatics and Computational Genomics Laboratory (Hunter College, City University of New York, New York, NY, USA).
De-mulitplexing, Quality Filtering, and Phylogeny Reconstruction
Samples were demultiplexed using the MiSeq Reporter Generate FASTQ workflow. Primer sequences were trimmed from forward and reverse sequence reads using cutadapt v.1.4.2 [42]. Following de-multiplexing, samples were processed using the QIIME2 v.2018.2 pipeline (https://docs.qiime2.org/2018.2) [43–47]. The GreenGenes Database, v.13.5, trimmed to only the 16S rRNA V4 region, was used as a reference to train a naïve Bayes q2-feature-classifier for taxonomic identification of amplicon sequence variants (ASVs) [48]. The GreenGenes Database is not able to discern between “Candidatus Aschnera,” the primary symbiont of some nycteribiid flies [49, 50], and Arsenophonus, possibly because the V4 fragment of 16S rRNA does not provide enough resolution. To confirm that ASVs of “Candidatus Aschnera chinzeii” were not misidentified, we mapped ASVs identified as Arsenophonus and “Candidatus Phlomobacter” against reference sequences from the Silva Ribosomal Database (Supporting information; [51–53]. Even though the GreenGenes and Silva Databases distinguish between “Candidatus Phlomobacter” and Arsenophonus, phylogenetic evidence indicates that “Candidatus Phlomobacter” is actually a clade nested within Arsenophonus [54, 55].
Contamination is ubiquitous in microbiome studies and especially problematic for low biomass samples [56–58]. To reduce the impact of contaminants, several filtering steps were performed (Supporting information). Briefly, we removed bacteria present in negative controls that were likely contaminants (i.e., low relative abundance and low prevalence in samples); known laboratory contaminants [58]; bacteria classified as mitochondria, chloroplast, or Archaea; bacteria unclassified beyond phylum; and contaminants identified by the R package decontam v1.14.0 [60, 61]. Finally, the data were filtered by two coverage depths: 1) all ASVs present at <0.01% relative abundance within a sample were eliminated from that sample, and 2) all ASVs present at <0.1% relative abundance were eliminated from that sample. At a minimum within-sample relative abundance of 0.01%, spurious ASVs may remain in the dataset, but at a minimum relative abundance of 0.1%, rare ASVs may be incorrectly excluded [62–64].
As low concentration libraries were used to dilute high concentration libraries prior to sequencing, sequencing effort across samples is not even. To assess the bacterial diversity captured by low concentration libraries compared to high concentration libraries, the ggrare function was used from the phyloseq-extended suite of tools, which wraps the function rarefy from the package vegan v.2.5.4 (https://github.com/mahendra-mariadassou/phyloseq-extended/blob/master/R/graphical_methods.R) [65].
Data visualization, Ordination, and PERMANOVA
We constructed bar plots of the relative abundance of bacterial genera in each bat fly species (ggplot2, v.3.1.0) [66, 67]. All genera with a relative abundance <1% of the total reads in a bat fly species were condensed into a “Low Abundance” group. Principal coordinates analysis (PCoA), implemented in phyloseq v.1.38.0, was used to visualize differences in microbial communities captured by Euclidean distance between phylogenetic isometric log-ratio-transformed relative abundances to account for the compositional nature of metabarcoding data (philr v.1.20.0, [66–71]; Supporting information).
To test whether landscape variables, parasite variables, or host bat variables were correlated with microbial community composition (i.e., ASV composition and relative abundance), we used PERMANOVA on individual variables and Sequential (Type I) sum of squares on pairs of variables, each with 9,999 permutations (adonis command in the R package vegan) [72, 73]. Homogeneity of dispersion of each group of microbiomes was confirmed using betadisper permuted 999 times with permutest (R package vegan).
When sampling is uneven, sequential sum of squares is sensitive to the order in which variables appear in the equation. To overcome this limitation, we examined the impact of landscape within the four most well-sampled bat fly species, excluding the southern sites and only considering data filtered using a threshold of 0.01% relative abundance per sample. We ordinated samples within these species separately from the rest of the data and estimated variation explained by landscape variables using PERMANOVA on individual variables.
To examine the impact of habitat patch area and isolation on taxon richness, we constructed boxplots of ASV richness by sampling site, mimicking standard island biogeography plots of richness by area, isolation, and distance to the source. We used a Kruskal-Wallis test to assess whether mean richness was significantly different among sampling sites. Spearman correlation was used to examine ASV richness across continuous ranges of ranked area, isolation, and distance from a source.
Network Reconstruction and Analysis
Bacterial association networks are a way to visualize correlated changes in relative abundance of bacteria across a given sample set. These networks examine microbiomes at the taxon scale, providing a valuable additional analysis to ordination and PERMANOVA, which compare microbiomes at the community scale. While networks provide a useful tool for further examination of microbiomes, the limitations of networks are still being explored and interpretation of networks should be mindful of known caveats (e.g., bias caused by differing sample sizes). Here, we used networks to explore whether the environment or parasite influence microbiome community structure, taking care to examine the role of sample size in network inference. If the environment drives microbiome composition, then we expect that networks will exhibit conserved changes in structure as habitat patches decrease in area or increase in isolation. If parasite species identity impacts microbiome composition, we expect that network structure will be specific to parasite species but will not vary with habitat patch area or isolation.
Networks were reconstructed using SPIEC-EASI v.1.1.2 using the Meinshausen and Bühlmann method [74]. Nodes indicate individual bacterial ASVs and edges (i.e., connections between nodes) indicate a linear relationship in the abundances of linked nodes. SPIEC-EASI transforms the relative abundance of each ASV using centered log-ratios and then estimates an inverse covariance matrix by solving a regularized linear regression for each node to determine its conditional independence within the graph [74]. Two nodes are conditionally independent when their abundances are statistically independent given the abundances of all other nodes in the network. The l tuning parameter is used to penalize linear regressions and controls the sparsity of the final network. To select the sparsity of the final network, SPIEC-EASI builds graphs from repeated subsamples of the data (i.e., Stability Approach to Regularization Selection, [75] and selects the l value that yields the greatest stability of edge incidences across subsampled graphs. The SPIEC-EASI method is distinct from network estimations based on correlation or covariance, which rely on pairwise comparisons and may incorrectly infer an edge between indirectly linked nodes by ignoring the influence of other nodes in the network.
Two types of networks were reconstructed: habitat patch networks estimated from all samples within a single sampling site (1 network per site), and species-specific networks estimated from well-represented parasite species clustered by their occurrence within or outside of REGUA (1 network of within-REGUA samples and 1 network of outside-REGUA samples for each parasite species; Supplemental information). The lambda.min.ratio parameter was adjusted until network stability was within 0.002 (habitat patch networks) or 0.003 (species-specific networks) of the target 0.05 threshold, then nlambda was set to either 20, 30, or 50 [74, 75]. To examine the impact of sample size on habitat patch networks, we leveraged the subsampling scheme within SpiecEasi to test whether the proportion of highly confident edges (i.e., edges present in at least 80% of sampled networks) was correlated with sample size using the getOptMerge function.
Characteristics of networks provide information about the robustness of a community and the function of members of a community [76]. Leading eigenvector modularity of each network and betweenness centrality of each node were estimated in the R package igraph, v.1.2.11 [77–80]. Modularity is a measure of the structure of a network, where higher modularity indicates nodes are grouped into tightly interacting neighborhoods with few interactions occurring outside of this neighborhood [81]. Leading eigenvector modularity is an optimization method of community detection with known limitations, despite being widely used [82]. It is unclear how this method performs on sparse, disconnected networks, but modularity estimates may be noisy or difficult to resolve [83]. Betweenness centrality measures the number of times the shortest path between all pairs of nodes in the network travel through a given node, giving an estimate of the influence of a node on the structure of the network [81]. Network connectance is the proportion of realized edges relative to the total possible number of edges [81]. We examined the correlation of modularity, betweenness centrality, and connectance with landscape variables and used Mann-Whitney U and Kruskal-Wallis tests to test for significant differentiation.
Modularity and betweenness centrality are impacted by network size (i.e., number of edges) and shape (degree distribution; [81], making comparisons of summary statistics between networks inaccurate. To account for variation in network size and shape in habitat patch networks, we used several standardization techniques to compare networks: 1) centered modularity compared to mean modularity of patch-specific null distribution; 2) Z-score modularity compared to a patch-specific null distribution; 3) Z-score modularity compared to the mean modularity of measured networks (see Supporting information for detail on null distribution).
As a size-independent method of examining variation between networks, we used the graphlet correlation distance from [84, 85] to ordinate the networks as individual points on a plot using the R packages pulsar v.0.3.5 and orca v.1.1-1 [86, 87]. The graphlet correlation distance breaks up a network into up-to 4-node graphlets (i.e., all the possible ways that up to 4 nodes can be wired) and counts the number of times each node plays a specific role within a graphlet (i.e., orbit). For example, within a 3-node graphlet where two “leaf” nodes interact with a central node but not each other (i.e., a line of 3 nodes), the “leaf” nodes belong to one orbit and the central node belongs to the second orbit in this graphlet. Using these orbit counts, we estimate the Spearman correlation of orbits across all nodes. Following [88, 89], the Euclidean distance between these correlations can be used to ordinate the networks and more clearly visualize their differentiation. We used k-means clustering to distinguish groups of networks in ordination space.
Results
Of 228 prepared DNA libraries 222 libraries were used for downstream analysis following quality filtering (NCBI SRX13352735-13352962; BioProject PRJNA786937). Filtered libraries ranged in sequencing depth from 2,983 to 66,164 reads. A total of 1,155 ASVs were detected when a 0.01% filtering threshold was applied, while 526 ASVs were found under a 0.1% filtering threshold. Rarefaction curves showed a plateaued asymptote for each library and low concentration libraries fell within the range of ASVs detected in high concentration libraries generated from the same parasite species (Figures S2 and S3).
Composition of sampled bat fly microbiomes
Plots of relative abundance of bacterial genera showed a stark difference between the microbiome communities in the parasite families Nycteribiidae and Streblidae (Figure 2A; Figure S4). While nycteribiid bat flies had high relative abundances of Wolbachia and Bartonella, streblid bat flies were dominated by Arsenophonus. We did not detect “Candidatus Aschnera chinzeii” and Arsenophonus (including “Candidatus Phlomobacter”) was detected in low relative abundance in nycteribiid bat flies. Almost no Wolbachia was detected in streblid bat flies. Bartonella was present in some streblid bat flies, but at much lower relative abundance than in nycteribiid bat flies. Mycoplasma was also detected at higher relative abundances in streblid bat flies compared to nycteribiid flies. The flies in southern fragments were dominated by Wolbachia and Bartonella, likely due to the abundance of nycteribiid flies in these fragments (Figure S5).