Model complexity
Overall, our results showed that 5xH was the most common RM-FC combination for fish, which agrees well with earlier findings that higher RM values combined with more complex FCs such as H are better model settings for species with few occurrence records (Shcheglovitova and Anderson 2013, Galante et al. 2018). For the odonates, the most common RM-FC combination was 5xL, which also agrees with the need to use RM values greater than the default setting (Galante et al. 2018). However, our results also showed that other optimal models had different RM-FC combinations for both fish and odonates (Figure 2a-b). Further, the five model selection approaches agreed poorly on the RM-FC combinations of the optimal models they selected, for both fish and odonates. The EXP approach had the highest agreement with ORTEST_PER for fish, while for odonates the EXP approach agreed the most with AUCDIFF_PER and ORTEST_BAL. Considered together, these findings suggest the need for taxon-specific model tuning, in line with the recognized need to tune models for the specific species in a given study (Galante et al. 2018).
Our results showed that all approaches chose models with percentile ORs equal to or greater than the theoretically expected 10%, suggesting generally overfit models (Galante et al. 2018). Other studies have also found percentile ORs greater than 10% when a small number of occurrences were used (see Muscarella et al. 2014, Radosavljevic and Anderson 2014, Galante et al. 2018). However, compared with the AUCDIFF approaches, the ORTEST and EXP approaches not only chose a larger number of optimal models with larger percentile ORs, but also chose models with larger AUCDIFF and a larger number of parameters for both fish and odonates. These findings suggest that the ORTEST and EXP approaches might have selected overfit and over-parameterized optimal models (Muscarella et al. 2014, Radosavljevic and Anderson 2014, Galante et al. 2018). That said, an earlier study found over-parameterization to be a lesser issue than under-parameterization (Warren and Seifert 2011).
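To make the two overfitting measures discussed above concrete, the following is a minimal Python sketch, not the code used in this study. It assumes the usual definitions: the percentile OR is the proportion of withheld test occurrences falling below the suitability value that omits 10% of training occurrences, and AUCDIFF is the training AUC minus the test AUC. All function names and suitability scores are hypothetical and chosen only for illustration.

```python
def auc(presence_scores, background_scores):
    """AUC as the share of presence/background pairs in which the presence
    site has the higher suitability score (ties count as half a win)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def percentile_threshold(train_scores, omit_fraction=0.10):
    """Suitability value below which `omit_fraction` of the training
    occurrences fall (assumed 10% training omission threshold)."""
    ordered = sorted(train_scores)
    k = int(omit_fraction * len(ordered))  # training records allowed below the threshold
    return ordered[k]


def omission_rate(test_scores, threshold):
    """Proportion of withheld test occurrences predicted below the threshold."""
    return sum(s < threshold for s in test_scores) / len(test_scores)


def auc_diff(train_scores, test_scores, background_scores):
    """AUCDIFF: training AUC minus test AUC (larger values suggest overfitting)."""
    return auc(train_scores, background_scores) - auc(test_scores, background_scores)


# Hypothetical suitability scores for one small-sample species.
train = [0.9, 0.8, 0.85, 0.7, 0.75, 0.6, 0.65, 0.5, 0.55, 0.3]
test = [0.8, 0.6, 0.45, 0.2, 0.7]
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1, 0.35, 0.25]

threshold = percentile_threshold(train)      # 0.5 for these scores
percentile_or = omission_rate(test, threshold)  # 0.4, well above the expected 0.10
difference = auc_diff(train, test, background)

print(threshold, percentile_or, difference)
```

In this toy case two of the five test occurrences fall below the training threshold, so the percentile OR is 40% rather than the expected 10%. This is the pattern interpreted above as overfitting, and it also shows how coarse the statistic becomes with very few test records.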
Our use of a relaxed balance threshold to first generate binary suitable/unsuitable habitat maps before choosing the EXP optimal model might have overcome model overfitting and underprediction for the EXP approach (Pearson et al. 2007, Radosavljevic and Anderson 2014). In contrast, selecting models solely on the basis of smaller percentile ORs, AUCDIFF and number of parameters may favor overly relaxed (overpredicting) models (Galante et al. 2019), a problem that can be aggravated by our use of a relaxed balance threshold to generate the binary habitat map. Therefore, although the AUCDIFF approaches chose optimal models with smaller AUCDIFF and fewer parameters than the ORTEST and EXP approaches, in our context these optimal models are not necessarily the best. Further, AUCTEST values were highest for optimal models chosen by the EXP approach, followed by the ORTEST approaches, and lowest for those chosen by the AUCDIFF approaches. The AUCDIFF approaches might therefore have chosen a greater number of overpredicting optimal models with lower discriminatory power (Warren and Seifert 2011).