Methods
Coding ecological niches for analysis
Coding full niche ranges represents an initial challenge for ancestral niche reconstructions. We first determine relevant analytical limits with respect to single environmental dimensions as the minimum and maximum values within the union of all accessible areas polygons for the species comprising the clade of interest. Next, we parse this range of values into equal-width bins; for each species, bins with values falling within the range of environmental conditions occupied by the species are coded as “suitable” (1). Bins with values represented within the species’ M but falling outside the range of environmental conditions occupied by the species are scored as “unsuitable” (0). In cases in which suitable niche conditions coincide with the limits of environmental conditions present in a species’ M, all more extreme values—that is, values more extreme than those manifested within M (e.g. Fig. 1)—are coded as unknown (?). This procedure allows explicit incorporation of uncertainty in our analyses. When suitability is unknown because the climatic values for the bin in question were not represented within the M of the species, but the bin in question was flanked by two suitable bins, it is also scored as “suitable,” under an assumption of a unimodal response to environmental conditions (Maguire 1973). These steps can be achieved using nichevol v.0.1.17 (Cobos et al. 2020), an R (v.3.6.1; R Core Team 2019) package we created to facilitate studies of niche evolution, including many of the analyses presented in this paper. Package documentation includes a tutorial demonstrating an analytical workflow implementing nichevol.
Binned ancestral range reconstruction demonstration
As a simple illustration of how this analysis works, we simulated data for a scenario of a shift from an ancestral warm niche to a cool niche, to demonstrate the ability of our new method to identify instances of niche evolution. First, we simulated distributions and accessible areas for 1000 species across South America—500 with a fundamental mean annual temperature niche of 24–28ºC (hereafter referred to as “cool niche species”) and 500 with a fundamental mean annual temperature niche of 25–29ºC (hereafter referred to as “warm niche species”). For each set of species, accessible areas were generated using an initial population of 10,000 random polygons using nichevol. We assumed that each species could occupy all suitable cells within its corresponding M (i.e. we ignored biotic factors). Suitable cells were identified based on a 2.5’-resolution raster of annual mean temperature data (Bio 1) from WorldClim v.1.4 (Hijmans et al. 2005) and the raster package v.3.0.0 (Hijmans 2019) in R; simulated Ms with no suitable conditions were removed (n = 1). The median suitable mean annual temperature value for each simulated species was calculated from suitable cell values within its M. We then determined the minimum and maximum mean annual temperature within the union of all simulated M polygons and parsed this range of values into equal-width, 1ºC bins using nichevol in R. Raw and annotated R code the niche simulation and coding steps, as well as input and output data, can be found in supplementary materials provided via Dryad (Code Supplement 1, Annotated Code Supplement 1, Data Package 1; DOI: 10.5061/dryad.c866t1g3j).
We generated a single, 15-taxon stochastic birth-death tree (birth rate = 1, death rate = 0) using the R package phytools v.0.6-99 (Revell, 2012) and assigned simulated species from the cool niche group to a monophyletic clade of 7 taxa. The remaining tips in the tree were assigned simulated species from the warm niche simulation group. We then used nichevol tools to perform BAR reconstructions using maximum parsimony (as implemented in castor v.1.4.3; Louca and Doebeli 2018) and maximum likelihood (as implemented in ape v.5.3; Paradis and Schliep 2018). For both algorithms, ancestral state reconstructions were performed for each bin separately, treating bin scores (including “uncertain”) as discrete characters under an equal transition rate model of evolution. Results were smoothed such that reconstructed suitable ancestral niche bins at each node were not interrupted by unsuitable bins, following the assumption of a unimodal response to environmental conditions (Maguire, 1973), and accounting for evolutionary non-independence of bins. Raw and annotated R code for these steps, as well as input and output, can be found in supplementary materials provided via Dryad (Code Supplement 2, Annotated Code Supplement 2, Data Package 1; DOI: https://doi.org/10.5061/dryad.c866t1g3j).
We note that we have kept this initial example simple for the purpose of illustration—many improvements could be made to this methodology, such as implementation of different character evolution models, Bayesian approaches in inferring ancestral character states, stochastic character mapping, and consideration of joint effects of environmental dimensions (e.g. temperature, precipitation) that are here considered independently. Furthermore, phylogenetic comparative methods are notoriously “data-hungry,” and BAR reconstructions will benefit from further detailed simulation-based examinations in the future. Our purpose here is to illustrate the crucial importance of incorporating uncertainty explicitly in the inference of abiotic ecological niche evolution patterns.
Oriole analyses
We next used BAR reconstructions to infer patterns of niche evolution in 34 species of New World orioles (genus Icterus). We used the single best ultrametric maximum likelihood phylogeny from Powell et al. (2014; their Figure 4) inferred from mitochondrial and nuclear DNA sequences. Distributional data for each species were drawn from the Global Biodiversity Information Facility (GBIF, 2018), a large portion of which were derived from eBird (Supplemental Table 1, Data Package 2; via Dryad, DOI: https://doi.org/10.5061/dryad.c866t1g3j). We removed all records lacking geographic coordinates, and inspected those remaining with respect to known ranges of species based on expert assessment by four ornithologists (authors Cooper, Hosner, and Peterson), removing records that reflected errors or outdated taxonomic arrangements. Species-specific hypotheses of areas accessible to the species (M) were developed based on the biotic attributes and biogeographic history of the clade (Barve et al. 2011; Elith et al. 2010). That is, the ornithologists inspected patterns of occurrences for each species and outlined accessible-area hypotheses based on known barriers to dispersal (i.e., oceans, high mountain ranges, the Amazon River, deserts). While this step remains subjective, it is crucial to a realistic representation of the environments that should be considered within the species’ potential distribution (Phillips et al. 2009; Barve et al. 2011).
We then used BR to score species’ niches, explicitly scoring the parts of these profiles that were not observable (i.e. at the periphery of M) as uncertain (see above). For mean annual temperature (Bio1 in WorldClim v.1.4; Hijmans et al. 2005), we used 32 equal-width, 1ºC bins (3-4º, 4-5º, … 34-35º) across the full range of temperature values represented in the union of all species’ M areas. For annual precipitation (Bio12 in WorldClim v1.4; Hijmans et al. 2005), we used 80, 10 mm-width bins to cover the range of precipitation values from 0 to 800 mm across all species’ M areas. For comparison to more traditional methods of coding species’ niches, we calculated median values for mean annual precipitation and temperature across species’ known occurrences. As with our simulated species, we characterized species’ niches using R; raw and annotated R code for analyses, as well as inputs and outputs, can be found in supplementary materials provided via Dryad (Code Supplement 3, Annotated Code Supplement 3, Data Package 2; DOI: https://doi.org/10.5061/dryad.c866t1g3j).
Finally, we inferred the evolutionary history of oriole temperature and precipitation niches using both BAR, as described above, and GLS reconstructions using the median temperature and precipitation values at species occurrences. For GLS reconstructions, we first examined the fits of Brownian motion, Ornstein-Uhlenbeck, early burst, and diffusion with linear trend models of evolution. We then performed ancestral state reconstructions using a continuous-value maximum likelihood algorithm (as implemented in reconstruct in ape) under the best-fit evolutionary model (Ornstein-Uhlenbeck) for both mean annual temperature and annual precipitation. Raw and annotated R code for analyses, as well as input and output can be found in supplementary materials provided via Dryad (Code Supplement 4, Annotated Code Supplement 4, Data Package 2; DOI: https://doi.org/10.5061/dryad.c866t1g3j).
Results
Binned ancestral range reconstruction demonstration
Our BAR reconstructions detected simulated niche shifts; maximum likelihood reconstructions performed more reliably than parsimony reconstructions. In our simulated example, using maximum likelihood, we were able to recover the expansion from a 25ºC ancestral lower fundamental niche limit to a 24ºC ancestral lower fundamental niche limit at the most recent common ancestor of the 7 cool-niche simulated species (Annotated Code Supplement 1). However, the parsimony-based reconstruction failed to recover this change fully, but did show an expansion to 24ºC for simulated species “t1”, a species with a higher maximum known suitability than the other warm niche species (Annotated Code Supplement 1). By comparison, the GLS reconstruction performed qualitatively worse, reconstructing shifts to a warmer niche for the simulated cool-niche species and their ancestors. This is likely due to biased estimates of species’ realized niches based on their existing niches—“t3”, a cool niche species, had a median suitable temperature of 26.6 ºC, tied for the highest temperature in the clade with “t5”, a simulated warm niche species. Interestingly, BAR using parsimony reconstruction tended to infer more uncertain character states at the cooler ends of ancestral niches, whereas BAR using ML reconstruction inferred more uncertain character states at the warmer ends of ancestral niches. See Annotated Code Supplement 1 for further detail.
Application to oriole niche evolution
Large numbers of occurrence points were available for this clade, thanks to recent advances in biodiversity informatics and community-science initatives regarding bird distributions (Supplemental Table 1, Data Package 2). Niche estimates for some oriole species were completely characterized with respect to M, including the temperature and precipitation dimensions for Icterus fuertesi, and the precipitation dimensions for I. graceannae and I. galbula (Supplementary Figures 2 and 3). That is, estimated limits of suitable conditions were contained completely within the environments available in M, and did not appear to be truncated. The majority of species, however, were estimated to have niche ranges flanked by unknown maxima and/or minima.
In general, BAR reconstructions of species’ ecological niches in Icterus did not recover reduction or gain in inferred suitable niche space, suggesting broad-scale evolutionary stability (Figs. 3 and 4; Supplementary Tables 2-5, Annotated Code Supplement 4). For temperature, both ML and parsimony BAR reconstructed a consistent range of mean annual temperature niche values across all ancestral nodes (hereafter referred to as a “core conserved niche”), although some individual nodes had lower minimum or higher maximum suitable temperatures; the estimated core conserved niche was much broader for maximum likelihood (21–26ºC) than parsimony (24–25ºC; Supplementary Tables 2 and 3). For precipitation, maximum likelihood reconstructed a core conserved annual precipitation niche range of 71–240 mm, whereas parsimony-based reconstructions recovered no clear core conserved niche for precipitation (Supplementary Tables 4 and 5).