4. Discussion
Over the last 20 years, CONAFOR has invested significant time and resources to produce forest inventory data that accurately represents all forest ecosystems in Mexico. To further expand the utility of this data, we developed an analytical framework to model, predict, and map forest structural attributes (tree density and height) across the country. By combining openly accessible remotely sensed data (e.g., mean land surface temperature, LAI, NPP, FPAR) (Gorelick et al., 2017), the ensemble machine learning method in the LANDMAP package v0.0.14 for R v4.1.0 (Hengl et al., 2021; RStudio Team, 2021), and the INFyS data (CONAFOR, 2017), we modeled and predicted tree height and tree density across Mexico. Results suggest that the ensemble ML algorithm performed better when predicting tree height than when predicting tree density (Table 1). In addition to providing numerical estimates, these maps are user-friendly tools that help users visualize forest structure across Mexico.
Mapping forest attributes along with associated uncertainties at a national scale requires substantial computational resources. We simplified our approach by modeling at a 1000-m resolution and reducing the number of model predictors, which lowered computing costs while still yielding valuable nationwide maps for biodiversity studies and ecological applications. Nevertheless, previous studies have shown that high-resolution satellite data (e.g., 30 m) can increase predictive ability (Hengl et al., 2021). For the project’s next stage, it will be important to acquire sufficient computational resources to perform predictions with high-resolution covariates. Both tree height and density had strong univariate correlations with remotely sensed predictors such as canopy cover, FPAR, and LAI. Previous studies have shown that using more than one vegetation trait as a model predictor can reduce prediction uncertainty when mapping forest attributes (Saarela et al., 2020). These results give a sense of the directionality of the relationships between the modeled attributes and their environment and strengthen the case for monitoring forest change through time.
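To make the computational trade-off concrete, the following minimal sketch estimates how many grid cells a national map needs at 1000 m versus 30 m. It is an illustration only: the land area (about 1.96 million km²) is an approximate value assumed here, the function name is hypothetical, and the count ignores map projection and masking of non-forest areas.

```python
import math

# Approximate land area of Mexico in km^2 (assumed round figure for illustration).
MEXICO_AREA_KM2 = 1_960_000


def cell_count(area_km2: float, resolution_m: float) -> int:
    """Number of square grid cells needed to cover an area at a given resolution."""
    cell_area_km2 = (resolution_m / 1000.0) ** 2
    return math.ceil(area_km2 / cell_area_km2)


coarse = cell_count(MEXICO_AREA_KM2, 1000)  # resolution used in this study
fine = cell_count(MEXICO_AREA_KM2, 30)      # example high-resolution grid
ratio = fine / coarse

print(f"1000 m grid: {coarse:,} cells")
print(f"30 m grid:   {fine:,} cells")
print(f"The 30 m grid has roughly {ratio:,.0f} times more cells per predictor layer")
```

Because every predictor layer and every output map (prediction and uncertainty) scales with the cell count, moving from 1000 m to 30 m multiplies storage and prediction time by roughly three orders of magnitude.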
The range of mean predicted values for tree height was consistent with forest inventory data (~5-10 m). These results suggest that predictions using the Super Learner model reflected the input data adequately. On average, cloud mountain forest is the forest type with the tallest trees in Mexico (Table 1). This forest type occurs in humid and temperate areas; it has the largest aboveground biomass density and the greatest timber volume of all Mexican forest types, but it accounts for only ~1% of the national forest area (Villaseñor & Gual, 2014). According to CONAFOR (2017), more than half of its vegetation is in early stages of succession, with high densities of young and smaller trees. Maps of tree height, therefore, can indicate areas that deserve more attention, such as those where cloud mountain forest goods are widely exploited. Estimates of tree height are also critical for evaluating forest structure (e.g., successional stages) and for projecting the growth trajectories of Mexican forests under different management scenarios.
Mean predicted tree density values were lower than the field-sampled inventory data (Table 1). Globally, 42.8% of the planet’s trees exist in tropical and subtropical regions (Crowther et al., 2015). Generally, optimal conditions for tree growth are warm temperatures and moisture availability (Leathwick & Austin, 2001). In accordance with this assumption, tropical forests, which develop in a warm and moist environment, have the highest tree density of all Mexican forest types (maximum values of ~1370 trees/ha). The highest forest densities can be observed in the Calakmul rainforest area, located within the Yucatán Peninsula in the southeast of Mexico (Fig 5a). The Calakmul rainforest is part of an important ecological gradient, the Mesoamerican Biological Corridor. The conservation of this ecologically important region has been a challenge due to continuous forest disturbances. Tree density has been used as an indicator of forest degradation in tropical ecosystems (Román-Dañobeytia et al., 2014); therefore, we encourage the long-term monitoring of tropical forest structure.
For both target variables, uncertainty in our predictions was below 50% in most forests. Our uncertainty maps also show areas where the model performs poorly, especially in northern areas that consist of arid and semi-arid ecosystems (>80% uncertainty). These ecosystems have fewer sampling plots, which leaves less training data for modeling over a considerably large area of Mexico. The diversity of Mexican forests and limited land access pose a logistical challenge for the forest inventory, which results in the under-representation of specific forest areas. One potential use of our uncertainty maps is to help the INFyS identify areas that require more sampling plots (e.g., arid and semi-arid ecosystems) and select new sampling locations in areas with poor modeling accuracy (e.g., areas with high uncertainty).
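The use of uncertainty maps to prioritize new sampling can be sketched as follows. This is a minimal illustration, not the workflow used in this study: the cell values are invented, the ecosystem labels are shorthand, and the function name is hypothetical. Only the 80% threshold comes from the text above.

```python
from collections import defaultdict


def share_above_threshold(cells, threshold=80.0):
    """Per ecosystem, return the fraction of cells whose uncertainty (%) exceeds the threshold."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for ecosystem, uncertainty_pct in cells:
        totals[ecosystem] += 1
        if uncertainty_pct > threshold:
            flagged[ecosystem] += 1
    return {eco: flagged[eco] / totals[eco] for eco in totals}


# Invented (ecosystem, uncertainty %) pairs standing in for map cells.
cells = [
    ("arid", 85.0), ("arid", 92.0), ("arid", 60.0), ("arid", 88.0),
    ("tropical", 35.0), ("tropical", 48.0), ("tropical", 81.0), ("tropical", 20.0),
]

shares = share_above_threshold(cells)
# Ecosystems with the largest share of high-uncertainty cells come first.
priority = sorted(shares, key=shares.get, reverse=True)

for eco in priority:
    print(f"{eco}: {shares[eco]:.0%} of cells above 80% uncertainty")
```

In practice the same ranking could be computed per state or per sampling stratum, so that candidate plot locations are drawn first from the units with the largest share of high-uncertainty cells.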
Data from this study was managed under the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management by setting up an open-access online data repository available at the Environmental Data Initiative (EDI): https://doi.org/10.6073/pasta/4620375aea631ab6a09cb573c7bf8aff. Well-documented methods, FAIR research protocols, and good documentation of forest inventory data for all users can help advance the science and policy relevant to forestry research and management.
We encourage continuous improvement of the study design presented here in order to improve the accuracy of predictions. For instance, we suggest acquiring remote sensing data at a higher resolution, increasing computational capacity, assessing new spatial prediction models, and locating new sampling sites in ecosystems with poor map quality indicators (e.g., R², RMSE) or uncertainties >80%. Finally, the results of this study can facilitate the understanding of Mexican forest ecosystems if this methodological framework is further applied to map other forest attributes, such as aboveground biomass (AGB), soil and vegetation carbon storage, and their associated functional traits. To achieve this, it is important to continue with active forest inventory campaigns that facilitate the estimation of forest structure patterns through time.
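For readers who want to reproduce the map quality indicators mentioned above, the sketch below computes RMSE and R² from paired observed and predicted values. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the study may have used a different definition, such as the squared correlation. The tree heights are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented tree heights (m) for a handful of hypothetical validation plots.
observed_height = [5.0, 7.0, 9.0, 11.0]
predicted_height = [6.0, 7.0, 8.0, 10.0]

height_rmse = rmse(observed_height, predicted_height)
height_r2 = r_squared(observed_height, predicted_height)

print(f"RMSE = {height_rmse:.2f} m, R2 = {height_r2:.2f}")
```

Computing both indicators separately for each forest ecosystem, rather than once for the whole country, makes it easier to see where additional sampling sites would help most.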
Here we developed a methodological framework for the spatial prediction of forest attributes, which assists the understanding of forest structure and expands institutional and technical capabilities for data analysis within the National Forestry Commission of Mexico. Out of ten forest ecosystems, our analyses show that the best predictive performance when mapping tree height was in tropical dry forest and broadleaf forest (the model explained ~50% of the variance). The best predictive performance when mapping tree density was in tropical forest (the model explained ~30% of the variance). For both target variables, uncertainties in our predictions were below 50% in most forests.
Our results suggest that an ensemble learning framework can be successfully used for the spatial prediction of forest attributes and can likely be improved with a larger number of field observations and sufficient model predictors that reflect the environment of each forest ecosystem. To ensure best practices for forest management in Mexico, it is important that governmental and academic institutions work together to develop such approaches. This collaboration can help improve the quality and transparency of forestry datasets.