This work is distributed under the Creative Commons Attribution 4.0 License.
On the added value of sequential deep learning for upscaling evapotranspiration
Abstract. Estimating ecosystem-atmosphere fluxes such as evapotranspiration (ET) robustly and at global scale remains a challenge. Machine learning (ML) based methods have shown promising results for such upscaling, providing a complementary methodology that is independent of process-based and semi-empirical approaches. However, a systematic evaluation of the skill and robustness of different ML approaches is an active field of research that requires further investigation. In particular, deep learning approaches in the time domain have not been explored systematically for this task.
In this study, we compared instantaneous (i.e., non-sequential) models, namely extreme gradient boosting (XGBoost) and a fully connected neural network (FCN), with sequential models, namely a long short-term memory (LSTM) model and a temporal convolutional network (TCN), for the modeling and upscaling of ET. We compared different types of covariates (meteorological, remote sensing, and plant functional types) and their impact on model performance at the site level in a cross-validation setup. For the upscaling from site to global coverage, we drove the models with the best-performing combination of covariates, meteorological and remote sensing observations, using globally available gridded data. To evaluate and compare the robustness of the modeling approaches, we generated a cross-validation-based ensemble of upscaled ET, compared the ensemble mean and variance among models, and contrasted them with independent global ET data.
We found that the sequential models outperformed the instantaneous models (FCN and XGBoost) in cross-validation, although their advantage diminished with the inclusion of remote-sensing-based predictors. The generated patterns of global ET variability were overall highly consistent across all ML models. However, the sequential models yielded 6–9 % lower globally integrated ET than their non-sequential counterparts and than estimates from independent land surface models, likely due to their greater vulnerability to shifts in the predictor distributions from site-level training data to global prediction data. In terms of global integrals, the neural network ensembles showed a sizable spread across training data subsets, exceeding the differences among neural network variants. XGBoost showed a smaller ensemble spread than the neural networks, particularly under conditions poorly represented in the training data.
Our findings highlight non-linear model responses to biases in the training data and underscore the need for improved upscaling methodologies, which could be achieved by increasing the amount and quality of training data or by extracting more targeted features that represent spatial variability. Approaches such as knowledge-guided ML, which encourages physically consistent results while harnessing the efficiency of ML, or transfer learning should be investigated. Deep learning for flux upscaling holds great promise, but remedies for its vulnerability to shifts in the training data distribution, especially for sequential models, still need the community's attention.
Status: open (until 24 Dec 2024)
RC1: 'Comment on egusphere-2024-2896', Simon Besnard, 07 Nov 2024
Overall impression
Kraft et al. comprehensively review the current state of evapotranspiration upscaling methods. The paper is well-written and includes excellent figures and discussions, making it a valuable contribution to Biogeosciences' scope. The experimental setup is thoughtfully constructed, though I have some feedback regarding the comparative design of the model configurations. Given the quality of the study, it could proceed with either minor or major revisions depending on whether the authors choose to incorporate additional model experiments as suggested.

Specific comments
L15: Consider adding quantitative information when discussing model performance differences between the sequential and non-sequential models to make the comparative strengths more transparent for readers.

Lag variables in non-sequential models: Have you considered explicitly adding lagged variables to the non-sequential models? This would offer a more balanced comparison, as it would simulate past dynamics without the complexity of models like LSTM. For example, incorporating lagged climate variables into XGBoost could provide insights into whether sequential models are uniquely beneficial in capturing temporal patterns.
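The lagged-covariate suggestion above could be implemented as a simple feature-engineering step. The following is a minimal sketch (function name and lag choices are illustrative, not from the manuscript) of how lagged copies of the climate drivers could be stacked column-wise so that a non-sequential learner such as XGBoost sees a fixed window of past conditions:

```python
import numpy as np

def add_lagged_features(X, lags):
    """Append lagged copies of each covariate column.

    X: (n_timesteps, n_features) array of e.g. meteorological drivers.
    lags: positive integer lags in time steps.
    Rows without a complete history are dropped, so the output has
    n_timesteps - max(lags) rows and n_features * (1 + len(lags)) columns.
    """
    max_lag = max(lags)
    blocks = [X[max_lag:]]  # contemporaneous values
    for lag in lags:
        blocks.append(X[max_lag - lag : X.shape[0] - lag])
    return np.hstack(blocks)

# Toy example: 10 time steps, 2 covariates, lags of 1 and 7 steps
X = np.arange(20, dtype=float).reshape(10, 2)
X_lagged = add_lagged_features(X, lags=[1, 7])
print(X_lagged.shape)  # (3, 6)
```

The resulting matrix could be passed directly to a gradient-boosting regressor; how far back the lags should reach (days vs. weeks) would itself be a tuning choice.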
Model selection – self-attention models: The paper mentions TCN and self-attention as alternatives to LSTM, yet only TCN was tested. Could you elaborate on why the self-attention models were excluded from the experiments?
L127-129: Precipitation appears to be absent from the meteorological variables. Was there a reason for this omission? Precipitation likely impacts ET indirectly through soil moisture, which could be a significant predictor in capturing temporal dynamics.
L129: For clarity, it would help readers if you briefly explained the significance of the time derivative of potential shortwave irradiation and what processes it represents within the context of ET.
L175: The statement "The remote sensing and PFT covariates were repeated in time to obtain uniform inputs" could be clarified. Does this mean daily remote sensing data were kept constant at a sub-daily scale? If so, it’s worth discussing if the model accounts for the sub-daily variations, as metrics like LST and NDWI are not entirely invariant within a day.
L204-205: Were the same hyperparameters used for each fold in cross-validation? This clarification would help assess whether the variation within model ensembles arises from differences in training data subsets or distinct hyperparameter settings.
Fig 4 – PFT impact on TCN and LSTM models: In Figure 4, adding PFT as a predictor seems to penalize TCN performance, yet it enhances LSTM’s accuracy in capturing interannual variability. Could you discuss this divergence?
Fig 4 – Model sensitivity to hyperparameters: Are the displayed results limited to the best models, or could the performance of the other 19 models (with variation bars) be included to show the sensitivity to hyperparameter choices?
Fig 4 – Mean-site results: Presenting model performance metrics related to spatial variability, such as the mean site performance, could be informative.
L254-255: While PFT doesn’t enhance site-level predictions, could it mitigate extrapolation errors during upscaling? Including this consideration in the discussion of the scaling-up section may add valuable insight. Or would the spread for the sequential model change with or without PFT?
L286-288: Can you test the hypothesis about observation biases with synthetic data or a process-based model simulating extreme events and disturbances? This might strengthen the argument about model vulnerability to changes in predictor distributions.
L410-411: Jung et al. (2020) introduced an extrapolation index that might be useful here. Plotting the model spread against this index could demonstrate that model uncertainty correlates with areas requiring more extrapolation, supporting your discussion points.
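Jung et al. (2020) define their extrapolation index on dissimilarity in predictor space; a minimal nearest-neighbour variant (a simplified stand-in, not their exact formulation) that could be plotted against the ensemble spread might look like:

```python
import numpy as np

def extrapolation_index(X_train, X_grid):
    """Minimum Euclidean distance from each prediction point to the
    training set, after z-scoring with training statistics.
    Larger values indicate stronger extrapolation."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-12  # avoid division by zero
    Zt = (X_train - mu) / sd
    Zg = (X_grid - mu) / sd
    # pairwise distances, shape (n_grid, n_train)
    d = np.sqrt(((Zg[:, None, :] - Zt[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1)

# Toy check: a grid point matching a training sample gets index ~0,
# a point far outside the training cloud gets a large index
X_train = np.array([[0.0, 0.0], [2.0, 2.0]])
X_grid = np.array([[0.0, 0.0], [4.0, 4.0]])
ei = extrapolation_index(X_train, X_grid)
```

A scatter of this index per grid cell against the cross-validation ensemble spread would directly test whether model uncertainty grows where the predictors require more extrapolation.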
References:
Jung, Martin, et al.: "Scaling carbon fluxes from eddy covariance sites to globe: synthesis and evaluation of the FLUXCOM approach", Biogeosciences, 17(5), 1343–1365, 2020.

Citation: https://doi.org/10.5194/egusphere-2024-2896-RC1
Viewed
- HTML: 184
- PDF: 38
- XML: 9
- Total: 231
- BibTeX: 4
- EndNote: 5