Introduction
Species distribution models (SDMs) are increasingly used at different stages of conservation decision making (Guisan et al. 2013). Among the numerous SDM methods, techniques that use presence-only occurrence data have become more popular as georeferenced presence data (for example from GBIF) and environmental variables have become readily available (Gomez et al. 2018). Among the presence-only modelling methods, Maxent has become very popular (Phillips et al. 2017, Morales, Fernández, and Baca-González 2017). Maxent identifies the geographic areas suitable for a species, given a set of environmental variables and known occurrence records, by applying a maximum entropy model (Phillips and Dudík 2008). The increased use of Maxent has been ascribed to its better performance than other methods when few occurrence records are available (Elith et al. 2011, Coxen, Frey, Carleton, and Collins 2017) and to its ease of use through its graphical user interface (GUI) (Phillips et al. 2006, Morales et al. 2017, Kass et al. 2018).
Maxent output can be overfitted or underfitted to the occurrence localities, resulting in models that under-predict or over-predict the suitable area, respectively (Shcheglovitova and Anderson 2013). Maxent output quality depends on model complexity (Shcheglovitova and Anderson 2013, Morales et al. 2017, Phillips 2017), run type (Phillips 2017), bias in occurrence data (Phillips et al. 2009, Syfert, Smith, and Coomes 2013) and the methods used to correct it (Kramer-Schadt et al. 2013), background selection methods (Merow, Smith, and Silander Jr 2013; Vollering, Halvorsen, Auestad, and Rydgren 2019), and output types (Phillips 2017, Phillips et al. 2017). Once the other factors are taken into consideration, model complexity becomes the most important issue (Syfert, Smith, and Coomes 2013). Model complexity is controlled by the types of feature class (hereafter: ‘FC’) and the value of the regularization multiplier (hereafter: ‘RM’) used (Radosavljevic and Anderson 2014, Morales et al. 2017, Phillips 2017). The current version of Maxent (v 3.4.1) has linear (L), quadratic (Q), product (P), and hinge (H) features as default FCs and threshold (T) as an optional FC (Phillips et al. 2017), while the default RM is 1. Although models built using an RM smaller or larger than the default are expected to be over-complex or over-simplistic, respectively (Phillips et al. 2017), it was actually the models built using the default RM and FCs that were found to be either over-complex or over-simplistic (Shcheglovitova and Anderson 2013, Morales et al. 2017). Therefore, it is prudent to build multiple models using different combinations of RM values and FCs and then choose the optimal model for use in conservation decisions (Muscarella et al. 2014, Morales et al. 2017, Phillips et al. 2017, Galante et al. 2018).
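As an illustration of what building models over combinations of FCs and RM values involves, the candidate settings can be enumerated as a simple grid. The sketch below is hypothetical: the particular FC combinations and the RM range (0.5 to 4.0 in steps of 0.5) are assumptions chosen for the example and are not the settings used in this study.

```python
from itertools import product

# Hypothetical feature-class (FC) combinations, written with the letter codes
# used in the text: L = linear, Q = quadratic, H = hinge, P = product,
# T = threshold. This particular set is an assumption for illustration.
FEATURE_CLASSES = ["L", "LQ", "H", "LQH", "LQHP", "LQHPT"]

# Hypothetical regularization multiplier (RM) values: 0.5, 1.0, ..., 4.0.
REGULARIZATION_MULTIPLIERS = [0.5 * i for i in range(1, 9)]


def candidate_settings(feature_classes, multipliers):
    """Return every (FC, RM) pair; each pair defines one candidate model."""
    return list(product(feature_classes, multipliers))


candidates = candidate_settings(FEATURE_CLASSES, REGULARIZATION_MULTIPLIERS)
```

Each pair in `candidates` would then be passed to whatever tool fits the Maxent model, and the resulting models compared using the selection criteria of interest.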
Choosing the optimal model is another key concern for all Maxent users (Warren and Seifert 2011, Shcheglovitova and Anderson 2013, Muscarella et al. 2014, Radosavljevic and Anderson 2014, Galante et al. 2018). Maxent models have been selected using two approaches: (i) information criteria, specifically the Akaike information criterion corrected for small sample size (AICc), and (ii) performance in predicting withheld data (Galante et al. 2018). Of the two approaches, AICc was more robust when the occurrence data were affected by sampling bias, but both performed well when the bias was corrected (Galante et al. 2018). However, the suitability of AICc for model selection in Maxent has been questioned despite its better performance (Muscarella et al. 2014, Galante et al. 2018). The withheld data approach, in contrast, is open to the use of multiple selection criteria (Muscarella et al. 2014), which are used either independently (e.g. Warren and Seifert 2011) or in various combinations (e.g. Shcheglovitova and Anderson 2013, Radosavljevic and Anderson 2014, Galante et al. 2018) to select the optimal model.
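For readers unfamiliar with the information criterion approach, AICc adds a small-sample penalty to the usual AIC. A minimal sketch of the standard formula follows. The function name and arguments are hypothetical; how the log-likelihood and the parameter count are obtained from a Maxent model is not covered here and would need to follow the cited literature.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Akaike information criterion corrected for small sample size.

    log_likelihood: log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations (assumed here to be occurrence records)
    """
    if n - k - 1 <= 0:
        # The correction term is undefined when the model has too many
        # parameters relative to the sample size.
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = 2 * k - 2 * log_likelihood
    correction = (2 * k * (k + 1)) / (n - k - 1)
    return aic + correction
```

Among candidate models fitted to the same data, the one with the lowest AICc is preferred. The correction term grows quickly as the number of parameters approaches the number of records.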
The commonly used model selection criteria under the ‘withheld data’ approach include the area under the curve of the receiver operating characteristic plot for the training data (AUCTRAIN) and for the testing data (AUCTEST), the difference between the two (AUCDIFF), and the omission rate (OR) (Warren and Seifert 2011, Muscarella et al. 2014, Radosavljevic and Anderson 2014, Galante et al. 2018). When used independently, AUCTRAIN and AUCTEST can indicate a model's discriminatory power. However, the use of AUCTRAIN for model evaluation is criticized, and AUCTEST is preferred (Radosavljevic and Anderson 2014). While AUCDIFF and OR can evaluate overfitting (Warren and Seifert 2011, Radosavljevic and Anderson 2014), they may also select over-permissive models (Galante et al. 2018). Therefore, recent literature has used a sequential combination of either OR and AUCTEST (ORTEST approach) (Galante et al. 2018) or OR, AUCDIFF, and AUCTEST (AUCDIFF approach) (Radosavljevic and Anderson 2014), but the performances of these two approaches have not been compared directly. Since Maxent produces multiple ORs corresponding to multiple thresholding rules, there is also a need to assess the performance of different thresholding rules and their corresponding ORs for optimal model selection, and to use multiple taxonomic groups to help derive general patterns, if present (Galante et al. 2018).
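The two sequential approaches can be read as lexicographic sorts over the candidate models. The sketch below is one plausible interpretation, not the procedure of the cited studies: it assumes the lowest omission rate (OR) is preferred first, that the smallest AUCDIFF (training AUC minus testing AUC) breaks ties in the AUCDIFF approach, and that the highest AUCTEST breaks any remaining ties. The `Candidate` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Evaluation statistics for one candidate model (hypothetical container)."""
    name: str
    omission_rate: float  # OR on withheld data for one thresholding rule
    auc_train: float      # AUCTRAIN
    auc_test: float       # AUCTEST

    @property
    def auc_diff(self) -> float:
        # AUCDIFF: training AUC minus testing AUC.
        return self.auc_train - self.auc_test


def select_or_test(candidates):
    """ORTEST-style selection (assumed order): lowest OR, then highest AUCTEST."""
    return min(candidates, key=lambda c: (c.omission_rate, -c.auc_test))


def select_auc_diff(candidates):
    """AUCDIFF-style selection (assumed order): lowest OR, then lowest AUCDIFF,
    then highest AUCTEST."""
    return min(candidates, key=lambda c: (c.omission_rate, c.auc_diff, -c.auc_test))


# Made-up statistics for three candidate models, for illustration only.
models = [
    Candidate("A", omission_rate=0.10, auc_train=0.95, auc_test=0.80),
    Candidate("B", omission_rate=0.10, auc_train=0.85, auc_test=0.78),
    Candidate("C", omission_rate=0.20, auc_train=0.90, auc_test=0.88),
]
```

With these made-up values, models A and B tie on OR. The ORTEST-style rule then favours the one with the higher AUCTEST, whereas the AUCDIFF-style rule favours the one with the smaller gap between training and testing AUC, so the two rules can select different models from the same candidate set.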
In this study, we developed multiple SDMs with different FC and RM combinations for two groups of freshwater organisms with different life history traits, namely 10 fish species (which complete their whole life cycle in water) and 28 odonate species (whose nymphal stage is aquatic but whose adults are terrestrial (Bybee et al. 2016)) recorded from Bhutan. We then selected optimal models for each species using two ORTEST and two AUCDIFF approaches. We also selected optimal models through expert screening for ecologically plausible models using binary suitable habitat maps (hereafter ‘EXP approach’). Though it is sensible to tune models and then screen for ecologically plausible models in every study (Muscarella et al. 2014, Morales et al. 2017, Phillips et al. 2017, Galante et al. 2018), this option may be very time consuming or outright impractical if multiple species are involved in a time-bound project. Therefore, we aimed to assess which of the sequential approaches best matched the EXP approach in selecting the optimal models, and hence to help reduce the time required for optimal model selection. We did this by comparing (i) the model complexity and (ii) the predicted suitable habitats of the optimal models selected through the five selection approaches.