Modelling
For the modelling analyses, we used the normalized BGF frequency dataset described above. Because fewer studies were available for sampling sites at Brazil’s northernmost and southernmost locations, and missing data were therefore more common there, those sites were excluded from the analysis. The resulting spatial range included in the models extended from central Pará (Belém: 01° 24’ S 48° 29’ W) to southern Rio Grande do Sul (Patos lagoon inlet: 31° 22’ S). This range encompassed 201 of the 211 sites included in the original dataset. A general linear model (lm) was initially used to test for significant non-random differences in the frequency distribution of BGF concordances across latitudes along the Brazilian coast. We used the glm function available in the stats package implemented in R (R Core Team, 2020) with the default parameters.
Initial data exploration showed that the response of the frequency distribution of BGF concordances to latitude was non-linear. Therefore, generalized additive modelling (gam) was used with latitude as a smooth effect. First, we modelled all data combined, and then each of the taxa with the most available data individually (fishes, crustaceans, mollusks and cnidarians). We used the gam function available in the mgcv package (Wood, 2017) implemented in R. We fitted the gam models using Gaussian distributions, including for logistic regression. We also used a smooth term with a cubic regression spline or a cyclic cubic regression spline (for fish and crustacean data) to represent latitudinal variation. Several gam models were tested by varying the smooth term parameters: fx (whether to fix the degrees of freedom, giving an unpenalized regression spline), k (dimension of the basis used to represent the smooth term) and bs (type of smoothing basis). The optimal model was selected using the gam.check tool (Wood, 2017).
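The model-fitting sequence described above can be sketched in R as follows. This is only an illustrative sketch, not the original analysis script: the data frame `sites`, its columns `lat` and `bgf_freq`, the simulated values, and the particular `k` values are all hypothetical stand-ins for the real BGF dataset and settings.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(1)
# Hypothetical stand-in data: 201 sites between about 31.4° S and 1.4° S
sites <- data.frame(lat = seq(-31.37, -1.40, length.out = 201))
sites$bgf_freq <- 0.5 + 0.3 * sin(sites$lat / 5) + rnorm(nrow(sites), sd = 0.1)

# Linear model: glm() with default parameters is Gaussian with identity link
fit_lin <- glm(bgf_freq ~ lat, data = sites)

# GAM with latitude as a smooth effect, cubic regression spline basis
fit_cr <- gam(bgf_freq ~ s(lat, bs = "cr", k = 10),
              family = gaussian(), data = sites)

# Same model with a cyclic cubic regression spline basis
fit_cc <- gam(bgf_freq ~ s(lat, bs = "cc", k = 10),
              family = gaussian(), data = sites)

# Fixed degrees of freedom (fx = TRUE): unpenalized regression spline
fit_fx <- gam(bgf_freq ~ s(lat, bs = "cr", k = 6, fx = TRUE),
              family = gaussian(), data = sites)

# Side-by-side comparison of the candidate fits
cmp <- AIC(fit_lin, fit_cr, fit_cc, fit_fx)
print(cmp)
```

Comparing candidates by AIC here is an assumption made for illustration; the text itself selects the optimal model with gam.check.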
Model validation was assessed using the generalized cross-validation (GCV) index, the Unbiased Risk Estimator (UBRE), the “starry-sky” pattern in the plot of residuals versus linear predictor, and the linear relationship in the plot of response versus fitted values. The effective degrees of freedom (EDF) were used to test whether gam was a valid method for analysing our datasets (i.e. EDF > 2). Model quality was also assessed by comparing the adjusted R² and the explained deviance.
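As a rough illustration of where these diagnostics can be read from a fitted mgcv model, consider the sketch below. The data are simulated and the names `sites`, `lat` and `bgf_freq` are hypothetical; the sketch assumes the default smoothing-parameter selection method of `gam` (GCV for a Gaussian model with unknown scale).

```r
library(mgcv)

set.seed(2)
# Hypothetical stand-in data with a non-linear latitudinal signal
sites <- data.frame(lat = seq(-31.37, -1.40, length.out = 201))
sites$bgf_freq <- 0.5 + 0.3 * sin(sites$lat / 5) + rnorm(nrow(sites), sd = 0.1)

fit <- gam(bgf_freq ~ s(lat, bs = "cr", k = 10), data = sites)
sm  <- summary(fit)

gcv_score <- fit$gcv.ubre  # minimized GCV (or UBRE) score
edf_lat   <- sm$edf        # effective degrees of freedom of s(lat)
adj_r2    <- sm$r.sq       # adjusted R-squared
dev_expl  <- sm$dev.expl   # proportion of deviance explained

# An EDF well above 1 indicates a non-linear effect; the text uses EDF > 2
cat("EDF:", round(edf_lat, 2),
    " adj. R2:", round(adj_r2, 3),
    " deviance explained:", round(dev_expl, 3), "\n")

# Prints basis-dimension checks and draws four residual plots, including
# residuals vs linear predictor and response vs fitted values
gam.check(fit)
```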
Differences in mean BGF frequencies among the phylogeographic regions identified in the gam results were tested using analysis of variance (ANOVA) with type-3 sums of squares. The assumptions of normality and homogeneity of variances were tested using the Shapiro-Wilk test and the modified robust Brown-Forsythe Levene-type test, respectively (Zar, 1999). Tukey’s test was used as the post hoc test. Analyses were done in R (R Core Team, 2020) using the aov function and the car package, adopting an alpha of 0.05.
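A minimal sketch of this testing sequence is given below, under stated assumptions. The three regions and their simulated frequencies are hypothetical. The Brown-Forsythe variant of Levene's test is computed by hand, as a one-way ANOVA on absolute deviations from the group medians, rather than with whichever routine was used originally. The type-3 table relies on `car::Anova` and is skipped if the car package is not installed.

```r
set.seed(3)
# Hypothetical regions with simulated BGF frequencies (30 sites each)
d <- data.frame(
  region   = factor(rep(c("north", "central", "south"), each = 30)),
  bgf_freq = c(rnorm(30, 0.3, 0.1), rnorm(30, 0.5, 0.1), rnorm(30, 0.7, 0.1))
)

fit <- aov(bgf_freq ~ region, data = d)

# Normality of the residuals (Shapiro-Wilk)
sw <- shapiro.test(residuals(fit))

# Brown-Forsythe form of Levene's test: ANOVA on absolute deviations
# from the group medians
d$abs_dev <- abs(d$bgf_freq - ave(d$bgf_freq, d$region, FUN = median))
bf <- anova(lm(abs_dev ~ region, data = d))

# Type-3 sums of squares via car::Anova, using sum-to-zero contrasts
if (requireNamespace("car", quietly = TRUE)) {
  op <- options(contrasts = c("contr.sum", "contr.poly"))
  print(car::Anova(lm(bgf_freq ~ region, data = d), type = 3))
  options(op)  # restore the previous contrast settings
}

# Tukey post hoc comparisons at alpha = 0.05
tuk <- TukeyHSD(fit, "region", conf.level = 0.95)
print(sw)
print(bf)
print(tuk)
```

In a one-way design such as this one, the type-3 test for the region effect coincides with the ordinary sequential F test, so the distinction matters mainly for unbalanced multi-factor models.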