Joint species distribution models to quantify occupancy response to the different coastal areas
We used the SWARM pipeline to generate the MOTUs list in each site of the two coastal areas. From this MOTUs composition matrix, we compared the MOTUs occupancy in each area using a Hierarchical Modelling of Species Communities (HMSC; Ovaskeinen et al. 2017, Ovaskeinen and Abrego 2020): a joint species distribution model whereby latent variables help explain shared species responses to environmental variation (Warton et al. 2015). We further applied the HMSC to model the species responses from the underwater visual census (UVC) of fishes using SCUBA surveys. We applied Hierarchical Modelling of Species Communities (HMSC; Ovaskeinen et al. 2017, Ovaskeinen and Abrego 2020): a joint species distribution model whereby latent variables help explain shared species responses to environmental variation (Warton et al. 2015). The MOTU data set comprises the occurrence of 79 MOTUs in 19 samples. The UVC data set comprises the occurrence of 58 species in 32 samples. We excluded species that occurred in fewer than 5 sampling units and no more than n-2 sampling units to avoid spurious and unidentifiable environmental responses for species with few data (Ovaskeinen and Abrego 2020). For both the UVC and the MOTU, we also fitted a random effect associated with each sample to ensure latent variables (e.g., species’ associations) are fitted in HMSC (Ovaskeinen and Abrego 2020). To strictly compare with the eDNA data, we both fitted a UVC model with the same number of samples as eDNA and a second model with all 32 samples. In all models, we used the sampling unit by species matrix as the response variable (i.e., the n x ns ‘Y’ of HMSC; see Ovaskainen et al. 2017) propagated with species occurrence or absences (0 or 1). We used a probit regression in all analyses. We included a single fixed effect of the anthropic area as our species by covariate matrix (i.e., the n x nc ‘X’ of HMSC; see Ovaskainen et al. 2017). We estimated a species-specific regression parameter to contrast their occupancy in the two areas. For the MOTU data, we further fitted a transect-level random effect to control for unexplained variation amongst sampling units (e.g., 2x30L water filtrations per transect). We used the R-package ‘Hmsc’ (Tikhonov et al. 2020) to fit our model assuming default prior distributions (Ovaskeinen and Abrego 2020). We sampled the prior distribution with four Markov Chain Monte Carlo (MCMC) chains each run for 37,500 interactions of which the first 12,500 were removed as burn-in. The chains were thinned by 100 to obtain 1000 posterior samples in total. We ensured model convergence by evaluating the potential scale reduction factors (e.g., Gelman and Rubin 1992). We evaluated the explanatory power of our models for each species by comparing the observed and predicted occurrences using area under receiver-operator curve (AUC; Pearce and Ferrier 2000) and Tjur’s R2 (Tjur 2009) statistics. Due to the limited number of replicates in our study, we did not expect good predictive (out-of-sample) power and therefore only report model explanatory power (within-sample prediction).
We evaluated the proportion of MOTUs and species that exhibit positive or negative responses to anthropic areas with 95% credible intervals of coefficients non-overlapping 0, assessed the continuity of these responses across eDNA metabarcoding MOTU and UVC data sets, and we computed the phylogenetic signal of the estimated coefficients for both eDNA and UVC. We used 100 randomised phylogenetic trees previously extracted from the phylogeny of Rabosky et al. (2018) that was pruned by both taxa lists. As the taxa list extracted from the SWARM analysis is not always at the species level, we selected one species representing the genus/family detected in the eDNA table into phylogenetic trees. Then, we calculated the mean Pagel’s lambda (λ) statistic and the mean associated p value.