Licheng LIU

and 12 more

Improving the estimation of CO2 exchange between the atmosphere and terrestrial ecosystems is critical to reducing the large uncertainty in the global carbon budget. Large amounts of the atmospheric CO2 assimilated by plants return to the atmosphere through ecosystem respiration (Reco), which comprises plant autotrophic respiration (Ra) and soil microbial heterotrophic respiration (Rh). However, Ra and Rh are challenging to estimate at large regional scales because of the limited understanding of the complex interactions among physical, chemical, and biological processes and the resulting high spatio-temporal dynamics. Traditional approaches for estimating Reco, including process-based (PB) models, are constrained by incomplete process knowledge, which limits their accuracy and efficiency. The accumulation of in situ observations of net ecosystem exchange (NEE), weather, and soil, together with satellite data on gross primary production (GPP), leaf area index (LAI), and soil moisture, makes it possible to apply data-driven machine learning (ML) approaches. Purely data-driven ML models, however, omit domain knowledge and lack interpretability. Here we propose a novel knowledge-guided machine learning (KGML) method for predicting daily Ra and Rh in US crop fields. Using the Gated Recurrent Unit (GRU) as the basis, we develop KGML models that arrange the ML components in a hierarchical structure subject to a mass balance constraint. The KGML models were pre-trained on synthetic data generated by an advanced agroecosystem model, ecosys, and re-trained with real-world FLUXNET observations. We extrapolate the best KGML model to crop fields across the US using satellite data, reanalysis climate forcings, and a soil database to reveal the spatio-temporal variations and key controlling factors. We believe this study advances the interpretable machine learning concept for carbon cycle estimation and will shed light on much other process-based biogeochemistry research.
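
The abstract does not give the exact form of the mass balance constraint. As a minimal sketch, assuming it ties the two predicted components to total ecosystem respiration through Reco = Ra + Rh (the relation implied by the definition of Reco), the residual could be penalized as below. The function name `mass_balance_penalty` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def mass_balance_penalty(ra_pred, rh_pred, reco_ref, weight=1.0):
    """Mean squared residual of the assumed balance Reco = Ra + Rh.

    ra_pred, rh_pred: predicted daily Ra and Rh (same shape).
    reco_ref: reference Reco for the same days.
    weight: hypothetical scaling applied to the penalty term.
    """
    ra_pred = np.asarray(ra_pred, dtype=float)
    rh_pred = np.asarray(rh_pred, dtype=float)
    reco_ref = np.asarray(reco_ref, dtype=float)
    # A residual of zero means the two components sum exactly to Reco.
    residual = ra_pred + rh_pred - reco_ref
    return weight * float(np.mean(residual ** 2))
```

In a training loop, a term like this would typically be added to the ordinary data-fit loss, so that predictions violating the balance are discouraged even where Ra and Rh are not observed separately.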

Xiang Li

and 11 more

Streamflow prediction is a long-standing hydrologic problem. Developing models for streamflow prediction often requires incorporating catchment physical descriptors to characterize the associated complex hydrological processes. Across catchments of different scales, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and with different choices of deep learning model. Furthermore, with random vectors, a high-dimensional characterization improves regionalization performance in gauged basin scenarios when physical descriptors are uncertain or insufficient.
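
The random vector idea can be illustrated with a short sketch: each catchment receives one fixed random vector, which is repeated at every time step and concatenated with the weather forcings in place of physical descriptors. The abstract does not specify the sampling distribution or the dimension, so the uniform range, `dim`, and the helper names `make_random_descriptors` and `build_inputs` are assumptions made for illustration.

```python
import numpy as np

def make_random_descriptors(catchment_ids, dim=8, seed=0):
    """Assign each catchment one fixed random vector (assumed uniform in [-1, 1])."""
    rng = np.random.default_rng(seed)
    return {cid: rng.uniform(-1.0, 1.0, size=dim) for cid in catchment_ids}

def build_inputs(forcing, descriptor):
    """Concatenate forcings of shape (T, F) with a static vector of shape (D,).

    Returns an array of shape (T, F + D) in which the descriptor is
    identical at every time step, as a static catchment attribute would be.
    """
    forcing = np.asarray(forcing, dtype=float)
    tiled = np.tile(descriptor, (forcing.shape[0], 1))
    return np.concatenate([forcing, tiled], axis=1)

# Usage: two hypothetical catchments, 5 days of 3 forcing variables each.
descriptors = make_random_descriptors(["basin_a", "basin_b"], dim=4, seed=1)
inputs_a = build_inputs(np.zeros((5, 3)), descriptors["basin_a"])
```

Because the vector stays fixed per catchment across training and prediction, it can act as an identifier the model learns to associate with that catchment's behavior, which is consistent with the gauged scenario described in the abstract.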

Wenpeng Xie

and 3 more

With the development of large-scale rice cultivation management initiatives in East Asia, there is concern that a reduction in the number of human cultivators per unit area may lead to poor water management, which could decrease land productivity through abnormal high- and low-temperature damage to crops. Accurate simulation of paddy field water temperature is important for studying its impact on crops and for providing timely information to support more efficient management decisions under limited resources. We propose a neural-network framework that accounts for heat transfer by the vegetation canopy and applies physical-theory constraints during training. We also propose a novel tuning method that handles the trade-off between water temperature accuracy and physical consistency during training, so that the calculated water temperature variations in a paddy field are both accurate and physically consistent. In the experiments, the proposed framework (RMSE 0.78°C) outperforms both physical process models (RMSE 1.06°C) and pure neural-network models (RMSE 0.9°C) while maintaining high accuracy on sparse datasets. Furthermore, an attention-mechanism input layer is integrated into the model to rank feature importance, providing a global interpretation of the proposed framework. We also perform sensitivity analysis on the physical process model and the proposed model to compare their feature-ranking strategies. The results show that the two methods are sensitive to different types of feature patterns but complement each other. In summary, the proposed model is credible and stable for practical applications and has the potential to guide more efficient paddy management.
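
The trade-off between accuracy and physical consistency is commonly expressed as a weighted sum of a data-fit term and a physics-residual term. The sketch below shows only that generic form; it does not reproduce the tuning method proposed in the paper, and the names `composite_loss`, `physics_residual`, and `lam` are hypothetical.

```python
import numpy as np

def composite_loss(t_pred, t_obs, physics_residual, lam=0.5):
    """Data-fit error plus a weighted physics-consistency penalty.

    t_pred, t_obs: predicted and observed water temperatures (deg C).
    physics_residual: how far each prediction departs from an assumed
        physical relation (zero means fully consistent).
    lam: hypothetical weight; larger values favor physical consistency
        over fit to the observations.
    """
    t_pred = np.asarray(t_pred, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    physics_residual = np.asarray(physics_residual, dtype=float)
    data_term = np.mean((t_pred - t_obs) ** 2)
    physics_term = np.mean(physics_residual ** 2)
    return float(data_term + lam * physics_term)
```

With a fixed `lam`, the balance between the two terms has to be chosen by hand, which is the kind of trade-off the abstract says its tuning method is designed to handle.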