A Novel Classifier-Guided Ensemble Framework for Global Terrestrial Evapotranspiration Estimates
Abstract. Evapotranspiration (ET) is a key hydrological and meteorological variable, serving as the critical nexus between water and energy exchanges. However, accurate estimation of global ET remains challenging: process-based ET algorithms often fail to capture the nonlinear relationships among environmental factors, and the application of data-driven ET algorithms is hindered by sparse and uncertain ET observations. In this study, we developed a novel ensemble framework that integrates three existing ET models (a process-based algorithm, a machine learning-based model, and a hybrid model) to provide high-precision terrestrial ET estimates. The framework is guided by an additional classifier that performs dynamic per-pixel model selection. This lets it exploit the spatiotemporally varying strengths of each model in mapping global ET and avoid the underestimation of high values that is typical of ensemble methods. The model was comprehensively validated using in-situ ET observations from the FLUXNET2015 dataset, a catchment-scale water-balance ET dataset, and six global-scale ET products, and was compared with the individual base models and an attention-based ensemble model. Quantitative comparisons across statistical metrics (RMSE, MAE, R², KGE) indicate that our ensemble model outperforms the other evaluated models, especially for extreme samples. Moreover, introducing the classifier not only significantly enhances algorithmic robustness and generalizability but also, through interpretability analysis, provides a basic understanding of the mechanisms behind model selection. This study demonstrates the effectiveness of the proposed framework in enhancing the robustness of ET estimation, thereby providing a valuable reference for estimating other similar variables.
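The core mechanism described in the abstract is a classifier that picks one base model per pixel instead of averaging all of them. A minimal NumPy sketch can illustrate it. Everything in the sketch is hypothetical: the synthetic "dry" and "wet" pixels, the three toy base models, and the nearest-centroid classifier, which is a stand-in chosen for brevity and not the manuscript's classifier or feature set. Each training pixel is labelled with the index of the base model whose prediction has the smallest absolute error against the observation. Simple averaging serves here as a generic ensemble baseline.

```python
import numpy as np


def best_model_labels(preds: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Label each pixel with the index of the base model closest to the observation."""
    return np.argmin(np.abs(preds - obs[:, None]), axis=1)


class NearestCentroidSelector:
    """Hypothetical stand-in classifier: assigns each pixel to the base model
    whose training-feature centroid is nearest in feature space."""

    def fit(self, X: np.ndarray, labels: np.ndarray) -> "NearestCentroidSelector":
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack([X[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every pixel to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d, axis=1)]


def select_per_pixel(preds: np.ndarray, choice: np.ndarray) -> np.ndarray:
    """Take, for each pixel, the prediction of the base model chosen by the classifier."""
    return preds[np.arange(len(choice)), choice]


def synthetic_pixels(X: np.ndarray):
    """Hypothetical toy data: low ET in 'dry' pixels (X < 0.5), high ET in 'wet' ones."""
    dry = X[:, 0] < 0.5
    obs = np.where(dry, 1.0, 5.0)
    preds = np.column_stack([
        np.where(dry, obs, 0.6 * obs),    # model 0: exact when dry, underestimates wet
        np.where(dry, obs + 0.5, obs),    # model 1: biased when dry, exact when wet
        0.8 * obs,                        # model 2: uniformly low
    ])
    return obs, preds


# Train the selector on synthetic pixels
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(200, 1))
obs_train, preds_train = synthetic_pixels(X_train)
selector = NearestCentroidSelector().fit(X_train, best_model_labels(preds_train, obs_train))

# Apply to new pixels and compare with a plain three-model mean
X_test = np.array([[0.1], [0.2], [0.8], [0.9]])
obs_test, preds_test = synthetic_pixels(X_test)
choice = selector.predict(X_test)
selected = select_per_pixel(preds_test, choice)
averaged = preds_test.mean(axis=1)
print("observed:", obs_test)
print("per-pixel selection:", selected)
print("simple average:", averaged)
```

In this toy setup, the selector returns the model that tracks high ET on the wet pixels. The three-model mean is pulled toward the models that underestimate, which mirrors the averaging bias on high values that the abstract refers to.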
Overall, I was very impressed with the written manuscript and the scientific rigor of the analysis. The introduction and methods clearly describe the complex ML, ensemble, and process-based ET estimates as well as the training and validation analyses. The results showed high accuracy, and the figures were well developed and easy to interpret. I have the following minor comments:
Line 100: Precipitation was not used as an input covariate. Please provide an explanation for why this was not included.
Line 110: You do a great job outlining the models available in AutoGluon. However, the list in line 110 ends with "etc.", which suggests that even more algorithms are available. In line 111, can you please list the specific ML algorithms you used?
Line 110: You state that "Autogluon can combine them". Please be more specific here, perhaps just by changing the phrasing. Did AutoGluon combine them in your research, or is AutoGluon merely capable of combining them without having done so in your research? I think simply stating "AutoGluon combined all the algorithms mentioned above" would suffice.
Figure 1: This is a great workflow figure that helps visualize the process.
Line 164: You state that you used six well-known ET products. Can you please cite them after the sentence in line 164, or perhaps add "refer to Section 3.3"?
Figure 5: In the paragraph at line 350, can you include the sites and land cover types with the three lowest RMSE values? This would provide more detail than "the majority of land covers". I think even referencing Table 3 in that paragraph would be helpful.
Table 2: This seems repetitive and unnecessary. It is unclear how it differs from Figure 4.
The overall importance of the paper is not clear. Are the global ET estimates available for public use? If so, is a link to the dataset available, and what are its spatial and temporal resolutions? If the estimates are available, I recommend stating this clearly and providing specific details and explanations about the use of the data.
Alternatively, is this simply a methodological paper meant to present a novel method for estimating global ET? Although you explain the research, it would be hard for another researcher to reproduce it for local or global ET estimates. If so, I recommend making it clear that the data are meant to remain proprietary and that the manuscript simply provides a novel methodology.