Species Distribution Model
To characterize the realized spatial niche for the two groups of recruits, we used Maxent, a SDM that uses presence-background data to estimate species’ responses given a set of environmental variables based on maximum entropy algorithms (Phillips et al. 2006; Phillips & Dudík 2008). Maxent consistently out-competes other classical, presence-only and machine learning techniques (Elith et al. 2006, Gastón and García-Viñas 2011); it performs well for small sample sizes (Wiszet al. 2008; Shcheglovitova & Anderson 2013), and is one of the most widely used algorithms in species distribution and ecological niche modelling (Melo-Merino et al. 2020).
We selected the presence (environmental predictors of recruit locations) and background data (environmental predictors randomly sampled across the reefscape) using the dense point clouds and the polygon meshes derived from them. We calculated slope, roughness, TEI, substrate type and benthic cover in each dense point cloud. Then, we overlapped each point cloud to its respective mesh and assigned each attribute to each polygon of the mesh. In the case of invertebrate and substratum type descriptors, the value for each polygon corresponded to the percentage of points from the dense point cloud classified as each descriptor. Next, we used polygons analogous to pixels within a raster in Maxent. Ten recruits were surrounded by another invertebrate and the percentage of points classified as that particular invertebrate reached 100%. However, octocorals and scleractinian recruits are not able to settle on other live benthic invertebrates. Thus, to prevent the models from identifying the surface of invertebrates as suitable habitat for recruitment, we adjusted those percentages from 100% to 90% to account for the small areas adjacent to the benthic invertebrates where the recruit settled.
The use of our fine-grain models (sub-cm resolution) presented a challenge in that our target organisms were approximately the same size as the faces of the mesh, and thus could be ‘seen’ in the environmental data itself. To obtain the value for the predictors associated with each presence, we delimited a 1-cm area around each recruit, and calculated the median value of each predictor within that area. Next, within that area, we selected the polygon of the mesh with the closest value to that median. This essentially assigns the occurrence location to the most environmentally similar neighboring location. This step would be unnecessary when applying this procedure for larger organisms, at lower spatial resolution, and for mobile taxa. In total, we recorded the biological and geomorphological data associated with 210 octocoral and 113 scleractinian coral recruits. To minimize spatial autocorrelation, avoid the introduction of biases due to sampling at adjacent localities, and obtain a more homogeneous representation of the area of study, the background data were selected from groups of four quadrats within each of the eight transects. Within each group, we selected 30,000 random background polygons (N = 240,000) and all recruit presences.
To select the environmental predictors for model fitting, we ran a Maxent model with all the environmental variables, and measured their importance for each taxon using the permutation method included in the vip R package (Supplementary Methods Figure 2, Greenwell & Boehmke 2020). For model fitting, we included all the variables with a permutation importance > 0.01 for at least one taxon (Warren et al. 2019), and to avoid multicollinearity we performed a Spearman’s rank correlation analysis and discarded the least important variable, out of each pair of correlated variables (r > |0.6|; Supplementary Methods Figure 3, Graham 2003). The variables: % coral, % octocoral, % sponge, % calcareous rock, % igneous rock, % sand, slope, and both, roughness and TEI calculated at 10-mm and 100-mm scales were included in model fitting.
We partitioned 70% of the recruits for training the model and the remaining 30% for validating it (Hastie et al. 2001), and randomly selected 30,000 background polygons. To prevent model over-complexity and overfitting, we compared several models with different combinations of feature classes and regularization multipliers (Morales et al. 2017). In particular, we tested models with linear, product, hinge and quadratic features (Merow et al.2013); and 1, 1.5, 2, 2.5, 3, 3.5, 4, regularizations multipliers, resulting in a total of 35 combinations (Anderson & Gonzalez 2011; Shcheglovitova & Anderson 2013; Muscarella et al. 2014). The criteria for selecting optimal combinations of feature classes and regularization multipliers was based first, on a method for testing statistical significance of Ecological Niche Models (ENMs) predictions (Bohl et al. 2019) and second, on the average area under the Receiver Operating Characteristic curve (AUC).  We used random occurrences to train 100 models, and compared the AUCs from the null models to the AUC obtained from those built with the empirical occurrences. For the threshold-independent measure of overfitting we subtracted the AUC value of the training data from the AUC value of the test data (AUCDIFF), with values close to zero indicating low levels of model overfitting (Warren & Seifert 2011). After model selection, marginal response curves were used to illustrate the marginal estimate of habitat suitability, and the distribution of each environmental predictor in the occurrence data and background points. We quantified variable importance, correlation coefficients, fitted, selected and evaluated the performance of Maxent models, and generated responses curves using the R package ENMtools 1.0 (Warrenet al. 2021).