Introduction
Species distribution models (SDMs) are increasingly used at different
stages of conservation decision making (Guisan et al. 2013). Among the
numerous SDM methods, techniques that use presence-only occurrence data
have become more popular as georeferenced presence data (for example
from GBIF) and environmental variables have become readily available
(Gomez et al. 2018). Among the presence-only modelling methods, Maxent
has become very popular (Phillips et al. 2017, Morales, Fernández, and
Baca-González 2017). Maxent identifies the suitable geographic areas for
a species given
the set of environmental variables and known occurrence records by
applying a maximum entropy model (Phillips and Dudík 2008). The
increased use of Maxent has been ascribed to its better performance than
other methods when few occurrence records are available (Elith et al.
2011, Coxen, Frey, Carleton, and Collins 2017) and to its ease of use
through its graphical user interface (GUI) (Phillips et al. 2006,
Morales et al. 2017, Kass et al. 2018).
Maxent output can be overfit or underfit to the occurrence localities,
which results in models that under-predict or over-predict the suitable
area, respectively (Shcheglovitova and Anderson 2013). Maxent
output quality depends on model complexity (Shcheglovitova and Anderson
2013, Morales et al. 2017, Phillips 2017), run type (Phillips 2017),
bias in occurrence data (Phillips et al. 2009, Syfert, Smith, and Coomes
2013) and its correction methods (Kramer-Schadt et al. 2013), background
selection methods (Merow, Smith, and Silander Jr 2013; Vollering,
Halvorsen, Auestad, and Rydgren 2019), and output types (Phillips 2017,
Phillips et al. 2017). Once other factors are taken into consideration,
model complexity becomes the most important issue (Syfert, Smith, and
Coomes 2013). Model complexity is controlled by the types of feature
class (hereafter: ‘FC’) and the value of the regularization multiplier
(hereafter: ‘RM’) used (Radosavljevic and Anderson 2014, Morales et al.
2017, Phillips 2017). The current version of Maxent (v 3.4.1) uses
linear (L), quadratic (Q), product (P), and hinge (H) features as
default FCs, with threshold (T) as an optional FC (Phillips et al.
2017), and the default RM is 1. Although models built with an RM smaller
or larger than the default are expected to be over-complex or
over-simplistic, respectively (Phillips et al. 2017), models built with
the default RM and FCs have themselves been found to be either
over-complex or over-simplistic (Shcheglovitova and Anderson 2013,
Morales et al. 2017). Therefore, it is prudent to build multiple models
using different combinations of RM values and FCs and then choose the
optimal model for use in conservation decisions (Muscarella et al. 2014,
Morales et al. 2017, Phillips et al. 2017, Galante et al. 2018).
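
As an illustration of what building multiple models with different
combinations of RM values and FCs involves, the candidate settings can
be enumerated as a simple grid. The sketch below is not the tuning
procedure of any cited study: the particular FC combinations and the RM
range (0.5 to 4.0 in steps of 0.5) are assumptions chosen only for the
example, and each resulting setting would still have to be passed to
Maxent separately.

```python
from itertools import product

# Hypothetical feature-class combinations, written with the letter codes
# used in the text (L, Q, P, H, T). The choice of combinations is an
# assumption made for illustration only.
feature_classes = ["L", "LQ", "H", "LQH", "LQHP", "LQHPT"]

# Hypothetical regularization multipliers: 0.5, 1.0, ..., 4.0.
regularization_multipliers = [0.5 * i for i in range(1, 9)]

# One candidate model setting per (FC, RM) pair.
candidate_settings = [
    {"fc": fc, "rm": rm}
    for fc, rm in product(feature_classes, regularization_multipliers)
]

print(len(candidate_settings), "candidate settings")
```

With six FC combinations and eight RM values, the grid holds 48
candidate settings per species, which shows how quickly the number of
models to evaluate grows.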
Choosing the optimal model is another key concern for all Maxent users
(Warren and Seifert 2011, Shcheglovitova and Anderson 2013, Muscarella
et al. 2014, Radosavljevic and Anderson 2014, Galante et al. 2018).
Maxent models have been selected using two model selection approaches:
(i) information criteria, specifically the Akaike information criterion
corrected for small sample size (AICc), and (ii) performance in
predicting withheld data (Galante et al. 2018). Of the two approaches,
AICc was more robust when the occurrence data were affected by sampling
bias, but both performed well when the bias was corrected (Galante et
al. 2018). However, the suitability of AICc for model selection in
Maxent has been questioned despite its better performance (Muscarella et
al. 2014, Galante et al. 2018). In contrast, the withheld data approach
allows the use of multiple selection criteria (Muscarella et al. 2014),
which are used either independently (e.g. Warren and Seifert 2011) or in
various combinations (e.g. Shcheglovitova and Anderson 2013,
Radosavljevic and Anderson 2014, Galante et al. 2018) to select the
optimal model.
The commonly used model selection criteria under the ‘withheld data’
approach include the area under the curve of the receiver operating
characteristic plot for the training data (AUCTRAIN) and for the test
data (AUCTEST), the difference between the two (AUCDIFF), and the
omission rate (OR) (Warren and Seifert 2011, Muscarella et al. 2014,
Radosavljevic and Anderson 2014, Galante et al. 2018). When
used independently, AUCTRAIN and AUCTEST can indicate a model's
discriminatory power. However, the use of AUCTRAIN for model evaluation
has been criticized, and AUCTEST is preferred (Radosavljevic and
Anderson 2014). While AUCDIFF and OR can evaluate overfitting (Warren
and Seifert 2011, Radosavljevic and Anderson 2014), they may also select
over-permissive models (Galante et al. 2018). Therefore, recent studies
have used a sequential combination of either OR and AUCTEST (ORTEST
approach) (Galante et al. 2018) or OR, AUCDIFF and AUCTEST (AUCDIFF
approach) (Radosavljevic and Anderson 2014), but the performance of the
two approaches has not been compared directly. Since Maxent produces
multiple ORs corresponding to multiple thresholding rules, there is also
a need to assess the performance of different thresholding rules and
their corresponding ORs for optimal model selection, and to use multiple
taxonomic groups to help derive general patterns, if present (Galante et
al. 2018).
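
The sequential use of these criteria can be sketched in a few lines of
code. The example below is a minimal illustration under stated
assumptions, not the implementation used in the cited studies: the
`Candidate` class, the function names and the three candidate models are
hypothetical, OR is computed as the fraction of withheld presences
predicted below a threshold, and the ordering applied (lowest OR first,
then lowest AUCDIFF where used, then highest AUCTEST) is assumed for
the purpose of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Evaluation summary of one tuned model (hypothetical structure)."""
    fc: str
    rm: float
    auc_train: float
    auc_test: float
    omission_rate: float

    @property
    def auc_diff(self) -> float:
        # AUCDIFF: training AUC minus test AUC.
        return self.auc_train - self.auc_test


def omission_rate(test_scores, threshold):
    """Fraction of withheld presence records predicted below the threshold."""
    omitted = sum(1 for score in test_scores if score < threshold)
    return omitted / len(test_scores)


def select_or_test(candidates):
    # Assumed ordering: lowest OR first, ties broken by highest AUCTEST.
    return min(candidates, key=lambda c: (c.omission_rate, -c.auc_test))


def select_auc_diff(candidates):
    # Assumed ordering: lowest OR, then lowest AUCDIFF, then highest AUCTEST.
    return min(
        candidates,
        key=lambda c: (c.omission_rate, c.auc_diff, -c.auc_test),
    )


# Made-up evaluation values, for illustration only.
candidates = [
    Candidate("L", 1.0, 0.80, 0.78, 0.10),
    Candidate("LQ", 2.0, 0.90, 0.85, 0.10),
    Candidate("LQH", 1.0, 0.95, 0.84, 0.20),
]

print(select_or_test(candidates).fc)
print(select_auc_diff(candidates).fc)
```

With these made-up values the two sequences pick different models from
the same candidate set, which is why a direct comparison of the
approaches is informative. Ties here rely on exact equality of the
stored values; with real evaluation output, rounding to a fixed
precision before comparison would probably be needed.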
In this study we developed multiple SDMs with different FC and RM
combinations for two groups of freshwater organisms with different life
history traits, namely 10 fish species (which complete their whole life
cycle in water) and 28 odonate species (whose nymphal stage is aquatic
but whose adults are terrestrial; Bybee et al. 2016), all recorded from
Bhutan. We
then selected optimal models for each species using two
ORTEST and two AUCDIFF approaches. We
also selected optimal models through expert screening for ecologically
plausible models using binary suitable habitat maps (hereafter ‘EXP
approach’). Though it is sensible to tune models and then screen for
ecologically plausible models in every study (Muscarella et al. 2014,
Morales et al. 2017, Phillips et al. 2017, Galante et al. 2018), this
option may be very time-consuming or outright impractical when multiple
species are involved in a time-bound project. Therefore, we aimed to
assess which of the sequential approaches best matched the EXP approach
in selecting the optimal models, and hence to help reduce the time
required for optimal model selection. We did this by comparing (i)
model complexity and (ii) the predicted suitable habitats of the optimal
models selected through the five selection approaches.