Savinay Nagendra

and 9 more

In this article, we consider the scenario where remotely sensed images are collected sequentially in temporal batches, where each batch focuses on images from a particular ecoregion, but different batches can focus on different ecoregions with distinct landscape characteristics. For such a scenario, we study the following questions: (1) How well do DL models trained in homogeneous regions perform when they are transferred to different ecoregions? (2) Does increasing the spatial coverage of the data improve model performance in a given ecoregion, even when the extra data do not come from that ecoregion? (3) Can a landslide pixel-labelling model be incrementally updated with new data, but without access to the old data and without losing performance on the old data, so that researchers can share models trained on proprietary datasets? We address these questions with a framework called Task-Specific Model Updates (TSMU). The goal of this framework is to continually update a (landslide) semantic segmentation model with data from new ecoregions without having to revisit data from old ecoregions and without losing performance on them. We conduct extensive experiments on four ecoregions in the United States to address the above questions and establish that data from other ecoregions can help improve a model’s performance on the original ecoregion. In other words, if one has an ecoregion of interest, one can still collect data both inside and outside that region to improve model performance on it. Furthermore, if one has many ecoregions of interest, data from all of them are needed.
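The abstract does not detail TSMU's update rule, but its core requirement — updating on a new ecoregion without revisiting old data while preserving old-region performance — can be sketched with a common continual-learning device: an L2 penalty anchoring the new weights to the previous model. Everything below (the toy per-pixel logistic classifier, the function name, the hyperparameters) is illustrative and is not the authors' method.

```python
import numpy as np

def update_without_old_data(w_old, X_new, y_new, lam=1.0, lr=0.1, steps=200):
    """Update a toy per-pixel logistic classifier on new-ecoregion data while
    anchoring to the previous weights, so behavior learned on old ecoregions
    is approximately preserved without revisiting their (possibly proprietary)
    data. Illustrative sketch only; not the TSMU algorithm itself."""
    w = w_old.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X_new @ w))       # per-pixel landslide probability
        grad = X_new.T @ (p - y_new) / len(y_new)  # cross-entropy gradient on new data
        grad += lam * (w - w_old)                  # penalty: stay near the old model
        w -= lr * grad
    return w
```

A larger `lam` trades fit on the new ecoregion for stability on the old ones; how TSMU balances this trade-off is not specified in the abstract.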

Wen-Ping Tsai

and 4 more

Some machine learning (ML) methods, such as classification trees, are useful tools for generating hypotheses about how hydrologic systems function. However, data limitations often prevent ML alone from differentiating between causal and associative relationships. For example, previous ML analysis suggested that soil thickness is the key physiographic factor determining the storage-streamflow correlations in the eastern US. This conclusion is not robust, especially when the data are perturbed, and there were alternative, competing explanations, including soil texture and terrain slope. However, typical causal analysis based on process-based models (PBMs) is inefficient and susceptible to human bias. Here we demonstrate a more efficient and objective analysis procedure in which ML is first applied to generate data-consistent hypotheses, and a PBM is then invoked to verify them. We employed a surface-subsurface processes model and conducted perturbation experiments to implement these competing hypotheses and assess the impacts of the changes. The experimental results strongly support the soil thickness hypothesis over the terrain slope and soil texture ones, which are co-varying, coincidental factors. Thicker soil permits larger saturation excess and longer system memory, carrying wet-season water storage forward to influence dry-season baseflows. We further suggest that this analysis could be formalized into a novel, data-centric Bayesian framework. This study demonstrates that PBMs present indispensable value for problems that ML cannot solve alone, and it is meant to encourage more synergies between ML and PBMs in the future.

Farshid Rahmani

and 4 more

Stream water temperature is considered a “master variable” in environmental processes and human activities. Existing process-based models have difficulty determining true equation parameters, and simplifications such as assuming constant values can degrade the accuracy of results. Machine learning models are a highly successful tool for simulating stream temperature, but it is challenging to learn about processes and dynamics from their success. Here we integrate process-based modeling (the SNTEMP model) and machine learning by building on a recently developed framework for parameter learning. With this framework, we used a deep neural network to map raw information (such as catchment attributes and meteorological forcings) to parameters, and then inspected the results and fed them into the SNTEMP equations, which we implemented on a deep learning platform. We trained the deep neural network across many basins in the conterminous United States to maximize the capture of physical relationships and avoid overfitting. The framework can provide dynamic parameters based on the response of basins to meteorological conditions; its goal is to minimize the differences between stream temperature observations and SNTEMP outputs on the new platform. Parameter learning allows us to learn model parameters at large scales, providing benefits in efficiency, performance, and generalizability through applying global constraints, and it has also been shown to yield more physically sensible parameters for the same reason. This model improves our understanding of how to parameterize the physical processes related to water temperature.
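The parameter-learning idea — a network maps basin attributes to physical parameters, which are run through differentiable model equations so that everything trains end-to-end against observations — can be sketched with a toy setup. The "physics" below is a single invented mixing equation, not the SNTEMP equations, and the "network" is one linear layer with a sigmoid; all names, scales, and the synthetic data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basins, n_attr = 50, 3

# Synthetic basins: static attributes and a hidden "true" parameter k in (0, 1).
A = rng.normal(size=(n_basins, n_attr))                       # basin attributes
k_true = 1.0 / (1.0 + np.exp(-(A @ np.array([1.0, -0.5, 0.3]))))
T_air = rng.uniform(5.0, 25.0, n_basins)                      # air temperature (°C)
T_base = 8.0                                                  # toy baseflow temperature
T_obs = k_true * T_air + (1.0 - k_true) * T_base              # synthetic observations

# One-layer "parameter network": attributes -> physical parameter k.
w = np.zeros(n_attr)
for _ in range(5000):
    k = 1.0 / (1.0 + np.exp(-(A @ w)))                        # predicted parameter
    T_sim = k * T_air + (1.0 - k) * T_base                    # toy "physics" forward run
    resid = T_sim - T_obs
    # Chain rule through the physics and the sigmoid: end-to-end gradient.
    grad = A.T @ (resid * (T_air - T_base) * k * (1.0 - k)) / n_basins
    w -= 0.01 * grad

k = 1.0 / (1.0 + np.exp(-(A @ w)))
T_sim = k * T_air + (1.0 - k) * T_base
rmse = np.sqrt(np.mean((T_sim - T_obs) ** 2))
```

Because the parameter is produced from attributes shared across all basins, training on many basins at once imposes the kind of global constraint the abstract describes, rather than calibrating each basin's parameter independently.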