Testing the influence of ecoregion-scale variables, ecoregion identity, and spatial autocorrelation
We averaged species-level tip-based metrics across the species of each assemblage to obtain assemblage-level tip-based metrics (hereafter aTR, aST, and aLT), which we then used in hypothesis tests (Fig. 1).
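As an illustration of this averaging step, the following minimal sketch computes an assemblage-level value from species-level values. The function name, the species and site labels, and the numeric values are hypothetical and serve only to show the arithmetic; they are not taken from the study data.

```python
from statistics import mean


def assemblage_means(species_metric, assemblages):
    """Average a species-level metric over the species present in each assemblage.

    species_metric: dict mapping species name -> metric value (e.g., tip rate)
    assemblages:    dict mapping assemblage id -> list of species names
    """
    result = {}
    for site, species in assemblages.items():
        # Keep only species for which a metric value is available.
        values = [species_metric[s] for s in species if s in species_metric]
        result[site] = mean(values) if values else float("nan")
    return result


# Hypothetical species-level tip rates and assemblage compositions.
species_tr = {"sp_a": 0.10, "sp_b": 0.30, "sp_c": 0.20}
assemblages = {"site_1": ["sp_a", "sp_b"], "site_2": ["sp_a", "sp_b", "sp_c"]}

atr = assemblage_means(species_tr, assemblages)
```

The same function would apply unchanged to the other two species-level metrics to obtain aST and aLT.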
We estimated the effect of ecoregion-scale variables on each assemblage-level tip-based metric using linear mixed models (LMM; Pinheiro and Bates 2000). Linear mixed models allow the joint estimation of the effects of grouping factors that describe the study design (random effects), of spatial autocorrelation (in the error term), and of the ecological processes of interest (fixed effects; Table 1) when modeling variation in aTR, aST, and aLT. We treated ecoregion identity as a random effect because ecoregions were part of the sampling design and because differences among ecoregions in shape and convolutedness could mask differences between cores and ecotones.
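One way to write such a model, given here only as a sketch in generic notation, is shown below. The symbols y, x, β, b, and ε, the indices, and the residual variance are assumptions introduced for illustration; σ denotes the standard deviation of the ecoregion random effect.

```latex
% y_{ij}: tip-based metric (aTR, aST, or aLT) of assemblage j in ecoregion i
% x_{k,ij}: k-th standardized ecoregion-scale variable (fixed effect)
% b_i: random intercept for ecoregion i
% \mathbf{R}: spatial correlation matrix of the residuals
\begin{aligned}
y_{ij} &= \beta_0 + \sum_{k} \beta_k \, x_{k,ij} + b_i + \varepsilon_{ij}, \\
b_i &\sim \mathcal{N}(0, \sigma^2), \\
\boldsymbol{\varepsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma_{\varepsilon}^{2} \mathbf{R}).
\end{aligned}
```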
We identified high spatial autocorrelation (Moran’s I > 0.5, P < 0.001) for all tip-based metrics analyzed here. We then looked for spatial autocorrelation in the residuals of LMMs with aTR, aST, or aLT as the response variable, ecoregion-scale variables as fixed effects (Table 1), and ecoregion identity as a random effect. We incorporated spatial autocorrelation into the models through an exponential correlation structure with a nugget effect, based on the latitude and longitude of each point. We chose this structure because the variograms generally showed a steep decrease in spatial autocorrelation, mainly between very close points. Comparisons of models with and without a nugget effect generally supported the model with a nugget effect (Table S2).
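To make the role of the range and the nugget concrete, the sketch below builds an exponential correlation matrix with a nugget effect. It assumes the common parameterization in which the correlation between two points a distance d apart is (1 - n) exp(-d / r) for d > 0 and 1 for d = 0; the text does not state the exact formula used, so this form is an assumption. The function names, coordinates, and parameter values are hypothetical, and plain Euclidean distance on the coordinates is used as a simplification.

```python
import math


def exp_corr(distance, range_, nugget=0.0):
    """Exponential spatial correlation with an optional nugget effect.

    Assumed form: 1 at distance 0, otherwise (1 - nugget) * exp(-distance / range_).
    """
    if distance == 0:
        return 1.0
    return (1.0 - nugget) * math.exp(-distance / range_)


def corr_matrix(coords, range_, nugget=0.0):
    """Pairwise correlation matrix for a list of (x, y) coordinates."""
    n = len(coords)
    return [
        [exp_corr(math.dist(coords[i], coords[j]), range_, nugget) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coordinates and parameter values, for illustration only.
coords = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
R = corr_matrix(coords, range_=2.0, nugget=0.3)
```

With a nugget greater than zero, the correlation drops below 1 even between very close points, which matches a variogram that changes sharply at short distances.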
To account for phylogenetic uncertainty in the tip-based metrics, we ran one LMM per estimate of aTR, aST, and aLT. Because estimating fixed, random, and spatial parameters for the whole set of estimates was computationally prohibitive, we used a random subsample of 2,000 of the 10,000 estimated values. Thus, uncertainty in the random effect (standard deviation, σ), spatial autocorrelation (range, r, and nugget, n), and fixed effects (regression intercept and the regression coefficient of each variable) was represented by the standard deviation calculated across the estimates from the 2,000 models. The LMM intercept represents the average tip-based metric when quantitative variables are at their average (i.e., zero on the standardized scale) and factors are at their first level of contrast (Table 1). The regression coefficient of each variable represents the number of standard deviations from the intercept: the larger the coefficient in absolute value, the stronger the effect of a variable on the response variable (Schielzeth 2010). We used density plots to represent and infer the effect of ecoregion-scale variables because these plots show the most likely parameter value and effect size, that is, the one supported by most of the phylogenies. Boxplots in the margins represent the average and the first and third quartiles of the distribution of parameter estimates across the 2,000 models.
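The subsampling and summary steps can be sketched as follows. This is a simplified illustration, not the analysis code: `fit_coefficient` is a hypothetical stand-in that returns a simulated coefficient instead of fitting an LMM, and the seed and simulated values are arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so the sketch is reproducible

N_ESTIMATES = 10_000  # estimates of each tip-based metric (one per phylogeny)
N_MODELS = 2_000      # size of the random subsample actually modeled


def fit_coefficient(index):
    """Hypothetical stand-in for fitting one LMM to the estimate at `index`.

    A real analysis would fit the model here and return, e.g., one
    standardized regression coefficient; this version only simulates a value.
    """
    return rng.gauss(0.5, 0.1)


def summarize(estimates):
    """Mean, standard deviation, and quartiles across model estimates."""
    q1, median, q3 = statistics.quantiles(estimates, n=4)
    return {
        "mean": statistics.mean(estimates),
        "sd": statistics.stdev(estimates),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Draw 2,000 of the 10,000 estimates without replacement, then fit one model each.
indices = rng.sample(range(N_ESTIMATES), N_MODELS)
coefficients = [fit_coefficient(i) for i in indices]
summary = summarize(coefficients)
```

The `sd` entry corresponds to the standard deviation across models used to express uncertainty, and the mean and quartiles correspond to the quantities shown in the marginal boxplots.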