Modelling
For the modelling analyses, we used the normalized BGF frequency dataset
described above. Sampling sites at Brazil’s northernmost and southernmost
locations were excluded from the analysis because fewer studies were
available there and missing data were therefore more frequent. The
resulting spatial range included in the models extended from central
Pará (Belém: 01° 24’ S 48° 29’ W) to southern Rio Grande do Sul (Patos
lagoon inlet: 31° 22’ S). This encompassed 201 of the 211 sites included
in the original dataset. A general linear model (lm) was
initially used to test for significant non-random
differences in the frequency distribution of BGF concordances across
latitudes along the Brazilian coast. We used the glm function
available in the stats package implemented in R (R Core Team,
2020) with the default parameters (Gaussian family with identity link,
under which glm fits the same model as lm). Initial data exploration showed that
the response of the frequency distribution of BGF concordances to latitude
was non-linear. Generalized additive modelling (gam) was therefore
used, with latitude as a smooth effect. First, we modelled all data
combined, and then separately each of the taxa with the greatest number of
available data (fishes, crustaceans, mollusks and cnidarians). We used
the gam function available in the mgcv package (Wood,
2017) implemented in R. We fitted the gam models using Gaussian
distributions, including for logistic regression. Latitudinal variation
was represented by a smooth term based on a cubic regression spline or,
for the fish and crustacean data, a cyclic cubic regression spline.
Several gam models were tested, varying the parameters of the smooth
term: fx (whether the term is a fixed-degrees-of-freedom regression
spline), k (dimension of the basis used to represent the smooth term) and
bs (type of smoothing basis). The optimal model was
selected using the gam.check tool (Wood, 2017). Model validation
was assessed using the generalized cross-validation (GCV) score, the
Unbiased Risk Estimator (UBRE), the “starry-sky” pattern in the plot of
residuals versus the linear predictor, and the linear relationship
between the response and the fitted values. The effective degrees of
freedom (EDF) were used to check whether gam was a valid method for
analyzing our datasets (i.e. EDF > 2). Model quality was also
assessed by comparing adjusted R² and the explained
deviance.
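The fitting and checking steps described above can be sketched in R as follows. This is a minimal illustration on simulated data, not the study dataset: the data frame sites, its columns lat and freq, and the settings k = 10 and fx = FALSE are assumptions made for the example, since the values finally selected are not reported here.

```r
library(mgcv)  # gam(), s(), gam.check()

set.seed(1)
# Hypothetical data: one row per site, latitude in degrees south and a
# normalized frequency. Values are simulated, not taken from the study.
sites <- data.frame(lat = seq(1.4, 31.4, length.out = 201))
sites$freq <- 0.5 + 0.3 * sin(sites$lat / 5) + rnorm(nrow(sites), sd = 0.1)

# Linear baseline: glm() with default arguments (Gaussian family, identity link).
fit_lin <- glm(freq ~ lat, data = sites)

# GAM with latitude as a smooth effect on a cubic regression spline basis.
# k = 10 and fx = FALSE are illustrative; bs = "cc" would request a cyclic
# cubic regression spline instead.
fit_gam <- gam(freq ~ s(lat, bs = "cr", k = 10, fx = FALSE),
               family = gaussian(), data = sites, method = "GCV.Cp")

# Residual plots and basis-dimension check.
gam.check(fit_gam)

# Collect the quantities used to judge the model.
s_gam <- summary(fit_gam)
diagnostics <- c(edf      = unname(s_gam$edf[1]),       # EDF of s(lat)
                 gcv      = unname(fit_gam$gcv.ubre),   # minimized GCV score
                 r2_adj   = s_gam$r.sq,                 # adjusted R-squared
                 dev_expl = s_gam$dev.expl)             # proportion of deviance explained
print(round(diagnostics, 3))
```

Candidate models that differ in k, fx or bs can be fitted in the same way and compared on these diagnostics.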
Differences between average values of BGF frequencies among the
phylogeographic regions identified in the gam results were tested using
analysis of variance (ANOVA) with type-3 sums of squares. Normality and
homogeneity of variances were tested using the Shapiro-Wilk test and the
modified robust Brown-Forsythe Levene-type test, respectively (Zar,
1999). Tukey's test was used as the post hoc test. Analyses were done in R (R
Core Team, 2020) using the aov function and the car package,
adopting an alpha of 0.05.
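The sequence of tests in this paragraph can be sketched in base R as follows. The data are simulated, and the region labels, group sizes and column names are assumptions made for illustration. The Brown-Forsythe test is computed by hand here, as a one-way ANOVA on absolute deviations from each group median; the car package provides leveneTest and Anova(type = 3) as alternatives, but which calls were used in the study is not stated.

```r
set.seed(2)
# Hypothetical data: three regions with simulated BGF frequencies.
d <- data.frame(
  region = factor(rep(c("north", "central", "south"), each = 30)),
  freq   = c(rnorm(30, 0.3, 0.1), rnorm(30, 0.5, 0.1), rnorm(30, 0.7, 0.1))
)

# One-way ANOVA of frequency by region.
fit <- aov(freq ~ region, data = d)

# Normality of the residuals (Shapiro-Wilk).
sw <- shapiro.test(residuals(fit))

# Brown-Forsythe Levene-type test: ANOVA on absolute deviations
# from each group's median.
d$abs_dev <- abs(d$freq - ave(d$freq, d$region, FUN = median))
bf <- anova(lm(abs_dev ~ region, data = d))

# With a single factor, the type-3 F test coincides with the default
# aov() table, so no extra package is needed in this sketch.
anova_tab <- summary(fit)[[1]]

# Tukey's HSD post hoc comparisons at alpha = 0.05.
tukey <- TukeyHSD(fit, "region", conf.level = 0.95)

print(sw)
print(bf)
print(anova_tab)
print(tukey)
```

With more than one factor or an unbalanced design, the type-3 table would differ from the default sequential one and would need to be requested explicitly.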