Main Body:
The aggregation of large datasets from museum records and community science provides a valuable resource for macroscale ecological analyses. However, such data include spatiotemporal and taxonomic biases that must be addressed (Troudet et al. 2017; Ries et al. 2019). Given these biases, proper data curation and appropriate modelling strategies are necessary to ensure valid inferences.
Fric et al. 2020 used occurrence data for 100 species aggregated in the Global Biodiversity Information Facility (GBIF) in temperate regions of North America and Europe to track phenology across latitudes. Estimating phenology metrics and trends from large occurrence datasets is possible, but requires sufficient data density and appropriate statistical methods (Taylor and Guralnick 2017). The data from this study were frequently too sparse and insufficiently curated to estimate phenological patterns across latitudes. Further, using regression of residuals resulted in spurious patterns; after correcting for altitude and year, onset and termination phenology appeared the same at low and high latitudes for most species, contrary to previous findings (Karlsson 2014, Matechou et al. 2014). We show that applying appropriate data curation and methods, most species demonstrated later onset and shorter flight periods at higher latitudes.
Many species analyzed (in 105 datasets separated by continent) in Fric et al. (2020) had insufficient data for independent analysis. Data were analyzed with as few as 15 occurrence records across >20 degrees latitude and >100 years. Phenological “onset” and “termination” of flight periods were extracted simply as the first and last day-of-year (DOY) values within latitudinal bands, pooled across all years and altitudes. Pooling data increased spatiotemporal bias, lowered the resulting power to detect patterns, and resulted in only one observation date being used as both “onset” and “termination” of flight periods (resulting in one-day flight periods and “peak flight”) in an average of 20% of latitudinal bands per species (Figure 1).
Fric et al.’s data curation was inadequate regarding spatial precision and outlier detection. Altitudes were extracted using imprecise GIS coordinates, sometimes representing sea floor or vague place names (eg, ”Mt Shasta”) and skewed left, giving high altitude observations outsized leverage in regressions. Temporal outliers were problematic; one species’ onset at 68° N was in January, when the next occurrence across all latitudes was in June. No sources were cited for species traits, and we found evidence documenting additional generations in portions of their range for 22 species identified as obligate univoltine (Supplemental Table 1).
Finally, the analytical approach in Fric et al. (2020) produced biased results. Beyond regressing individual species’ phenometrics against latitude, altitude, and year separately, regression of residuals was used for corrected regressions. This resulted in biased parameter estimates due to collinearity among explanatory variables and reduced statistical significance (Freckleton 2002). Results suggested most species’ onset (67 datasets) and termination (71 datasets) were similar across latitudes (Figure 2). These results were surprising, considering well-documented delayed and/or shortened flight periods at high latitudes (Karlsson 2014).
We sought to validate those results with a more robust analysis, applying stricter data standards and curation. For 72 species (76 datasets) we confirmed as univoltine, we filtered data for altitude (0-500m) and timing (March-November). We calculated phenometrics for year-latitude combinations with at least 10 observations. Only 22 datasets, all European, met these requirements in at least three latitudinal bands (Supplemental Table 1). For these, onset and termination were estimated from a Weibull distribution using R package phest (Pearse 2017) and bounded by days (60,330). To estimate unbiased parameters, we modeled each species phenometric using multiple regression (DOY ~ latitude + year) using R version 3.6.2 (R Core Team 2019). We compared our results to those from Fric et al. Supplemental Table 2.
We were unable to validate most patterns reported in the original study. Our results varied substantially in both onset and termination across species (Figure 2). In contrast to Fric et al. (2020), we found significantly later and/or shorter flight periods at higher latitudes for most species. These new results were consistent with the latitudinal gradient in climate and growing season length (Kobayashi et al. 2016). This evidence of inaccurate phenological patterns also discredits Fric et al.’s downstream trait analyses.
Despite this critique, we recognize that occurrence data have great potential to address many ecological questions. New aggregations of large datasets provide valuable inputs for macroscale ecological research, and the sheer amount of data accumulated across time and space may provide statistical power. However, “with great power must come great responsibility” (Lee 1962); robust scientific inference requires careful data curation and robust analytical models. Other phenology metrics are less confounded by abundance and effort (Belitz et al. 2020); integrated community models with random species effects or informed priors better suit community phenological analyses (Ellwood et al. 2012). We enthusiastically support continued digitization and use of collection data in ecological analysis, but urge researchers to exercise caution when using these data.