the Creative Commons Attribution 4.0 License.
Learning Evaporative Fraction with Memory
Abstract. Evaporative fraction (EF), defined as the ratio of latent heat flux to the sum of sensible and latent heat flux, is a key metric of surface energy partitioning and an indicator of plant water stress. Recognizing the role of vegetation memory effects, we developed an explainable machine learning (ML) model based on a Long Short-Term Memory (LSTM) architecture, which explicitly incorporates memory effects, to investigate the mechanisms underlying EF dynamics. The model was trained using data from 90 eddy-covariance sites across diverse plant functional types (PFTs), compiled from the ICOS, AmeriFlux, and FLUXNET2015 Tier 1 datasets. It accurately captures EF dynamics – particularly during post-rainfall pulses and soil moisture dry-down events – using only routinely available meteorological inputs (e.g., precipitation, radiation, air temperature, vapor pressure deficit) and static site attributes (e.g., PFT, soil properties). The ensemble mean predictions showed strong agreement with observations (R² = 0.82) across sites spanning broad climate and ecosystem gradients. Using explainable ML techniques, we identified precipitation and vapor pressure deficit as the primary drivers of EF in woody savanna, savanna, open shrubland, and grassland ecosystems, while air temperature emerged as the dominant factor in deciduous broadleaf, evergreen needleleaf, and mixed forests. Furthermore, expected gradients revealed variation in memory contributions across PFTs, with evergreen broadleaf forests and savannas exhibiting stronger influences from antecedent conditions compared to grasslands. These memory effects are strongly associated with rooting depth, soil water-holding capacity, and plant water use strategies, which collectively determine the time scales of drought response. Notably, the learned memory patterns could serve as proxies for inferring rooting depth and assessing plant water stress. 
Our findings underscore the critical role of meteorological memory effects in EF prediction and highlight their relevance for anticipating vegetation water stress under increasing drought frequency and intensity.
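The EF definition used throughout the abstract, EF = LE / (LE + H), is simple to compute, but the ratio becomes unstable when the turbulent fluxes are small. The following is a minimal sketch, assuming fluxes in W m⁻². The cutoff `min_available_energy` and the function name are hypothetical choices for illustration; they are not taken from the manuscript.

```python
import numpy as np

def evaporative_fraction(le, h, min_available_energy=10.0):
    """Return EF = LE / (LE + H), with NaN where LE + H is too small.

    le, h: latent and sensible heat fluxes (W m-2), array-like of equal shape.
    min_available_energy: hypothetical cutoff (W m-2) below which the ratio
    is treated as ill-defined (e.g. night-time or near-zero turbulent flux).
    """
    le = np.asarray(le, dtype=float)
    h = np.asarray(h, dtype=float)
    denom = le + h
    ef = np.full(denom.shape, np.nan)
    ok = denom > min_available_energy  # also excludes negative denominators
    ef[ok] = le[ok] / denom[ok]
    return ef

# A moist day, a dry day, and a day with negligible turbulent flux
ef = evaporative_fraction([150.0, 40.0, 2.0], [50.0, 160.0, 3.0])
```

Masking, rather than clipping, keeps ill-defined days out of any subsequent dry-down analysis.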
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-4082', Anonymous Referee #1, 28 Sep 2025
- RC2: 'Comment on egusphere-2025-4082', Benjamin Stocker, 31 Oct 2025
Review for Zhao et al.
This study presents an analysis of the evaporative fraction (EF, here defined as the latent heat flux divided by the sum of the latent and the sensible heat flux) across a large set of sites with eddy covariance-based flux measurements, with a focus on how EF evolves during dry-downs (consecutive dry days that lead to a progressive drying of the rooting zone of vegetation). It fits a recurrent neural network model (and two simpler, non-recurrent alternative models as benchmarks) and performs a targeted model diagnostic analysis (“explainable machine learning”) to investigate EF decay rates. These are then related to site characteristics, in particular the average rooting depth of the vegetation type per site, soil texture (sand fraction), etc.
Overall, I found this a very interesting analysis that yields informative insights into key properties of plant responses to water stress, inferred from ecosystem flux measurements. While the overall logic (inferring belowground properties of the vegetation from aboveground time series measurements) has been employed before, also with a focus on rooting zone water storage capacity, the present study provides a clear added value: the application of a suitably targeted model diagnostic analysis to a recurrent neural network. This appears to yield clearly interpretable insights that comply with generally expected patterns. Although the result itself is to be expected (slow decay in tree-dominated ecosystems), the demonstration of how this method can be applied in this context is valuable and opens the door to similar applications. I would, however, like to encourage the authors to revise the presentation of the manuscript for a clearer separation of results and interpretation in general, more detail on methods, a clarification of how inferences about rooting depths were obtained, and to make code and data publicly available for reproducibility of the results. I also have some general (major) and more specific points that I think should be addressed before publication.
Major
The model performance appears excessively strong, and the analysis is not reproducible. The performance appears unconvincingly strong in view of other published studies that employed similar methods for ecosystem flux modelling, mostly with a focus on GPP, NEE, or ET (Nakagawa et al., 2023; Besnard et al., 2019; Kraft et al., 2024; Montero et al., 2024; Biegel et al., 2025). Here, EF is the prediction target. I expect EF to be even harder to model than GPP or ET in view of the known first-order control of solar radiation on GPP and of net radiation on ET. These radiation components are reliably measurable and drive strong variations in GPP and ET, respectively. Hence, a “null model” formulated as GPP being a linear function of solar radiation already explains a large part of the variation in the data, and ML models improve on this only to a limited extent (Stocker et al., 2020). Net radiation is factored out in EF, so its variations are smaller and should be harder to model than GPP and ET. Yet, the paper suggests that the model employed here performs even better than GPP and ET models. One explanation is that data from a given site are used for both training and testing. Hence, the model is not generalisable. I agree that this is permissible in the present case, where no spatial upscaling is performed. However, the authors also report an R-squared of 0.72 for their evaluation on unseen sites, which is still extremely strong. It would be necessary to test their implementation of the model fitting and evaluation, but code and data are not available. Therefore, in view of the unconvincingly strong model performance, I consider the lack of reproducibility a roadblock at the moment. Even with code provided, the model performance should be discussed in view of the published literature.
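The benchmark logic in this comment can be made concrete with a one-predictor linear "null model" and a coefficient of determination. The sketch below assumes ordinary least squares and the common definition R² = 1 − SS_res/SS_tot. Unlike a squared Pearson correlation, this R² becomes negative when predictions are worse than the observed mean. The function names are illustrative and do not come from the manuscript.

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination 1 - SS_res / SS_tot (can be negative)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def linear_null_model(x_train, y_train, x_test):
    """Fit y = a * x + b by least squares and predict at x_test."""
    a, b = np.polyfit(x_train, y_train, deg=1)
    return a * np.asarray(x_test, dtype=float) + b
```

Fitting, for example, GPP against shortwave radiation with `linear_null_model` and scoring both it and the LSTM with the same `r_squared` on the same held-out data would show how much the sequence model adds beyond such a reference.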
This paper would be much clearer in presenting what is a result from their analysis vs. what is an interpretation if Section 4 (now “Results and Discussions”) were separated into two sections, for results and discussion, as is commonly done. This way, it can be made clear, e.g., how the relation between the model diagnostic analysis and rooting depth is established. I was confused for a long time when reading the paper. At first, I simply didn't understand where this part was coming from. Only later did I realise that it actually comes from an analysis (which brings me to the third point…).
Fig. 8 is very powerful. This could be made more central and the association with rooting depth (not even explained in the caption!) made more explicit. Could Fig. 8 be provided per site in an aggregated fashion, e.g., a line for each site in a common plot, line color distinguished by vegetation type? The relation to RD is hard to decipher. It would be more convincing if some average decay time scale measure is correlated with vegetation type-average rooting depth.
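One possible form of the suggested summary measure is an e-folding time scale τ per dry-down, in the spirit of the time-scale analyses cited in this review (e.g. Teuling et al., 2006). τ would be obtained by fitting EF(t) = EF₀ exp(−t/τ), averaged per vegetation type, and rank-correlated with vegetation-type rooting depth. The sketch assumes a log-linear least-squares fit and a Spearman correlation via `scipy.stats.spearmanr`. The function names and the dictionary inputs are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def efolding_timescale(ef):
    """Fit EF(t) = EF0 * exp(-t / tau) to one dry-down by a log-linear fit.

    ef: EF values on consecutive days since the start of the dry-down
    (at least two positive, finite values are assumed).
    Returns tau in days, or NaN if EF does not decay (slope >= 0).
    """
    ef = np.asarray(ef, dtype=float)
    t = np.arange(ef.size)
    valid = np.isfinite(ef) & (ef > 0)
    slope, _ = np.polyfit(t[valid], np.log(ef[valid]), deg=1)
    return -1.0 / slope if slope < 0 else np.nan

def timescale_vs_rooting_depth(tau_by_group, rd_by_group):
    """Spearman correlation between group-mean tau and group rooting depth.

    tau_by_group: dict mapping group (e.g. PFT) -> list of tau values.
    rd_by_group: dict mapping group -> representative rooting depth.
    """
    groups = sorted(set(tau_by_group) & set(rd_by_group))
    mean_tau = [np.nanmean(tau_by_group[g]) for g in groups]
    rd = [rd_by_group[g] for g in groups]
    return spearmanr(mean_tau, rd)
```

A rank correlation is used here because only a handful of vegetation types are available and the relation need not be linear.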
Specific
- Abstract: “R² = 0.82”: it is not clear what exactly is measured here. EF? Only during dry-downs?
- Abstract: “expected gradients” - capitalise throughout to clarify that it’s a method name.
- l. 36: I wouldn’t subscribe to such a definition of memory effects.
- l. 38 (“vegetation dynamics”): revise term. This commonly refers to changes in vegetation community composition.
- l. 43: I find the connection of rooting depth with resistance and resilience not very convincing. The relationships illustrated in Fig. 1 make a connection to the sensitivity of EF to dry-downs. Would you define resistance as the inverse of sensitivity? In any case, this model doesn't link to the ability to recover (= resilience) after re-wetting.
- l. 46: I guess one can debate about what ‘limited’ means, but some work that relied on ET or EF decay and its time scales should not go unmentioned here: Teuling et al., 2006 (this is really, as far as I am aware, the starting point of many similar analyses that followed); Giardina et al., 2023.
- l. 56 (“In the context of vegetation,…”): Appears a bit out of context in this paragraph, which deals with ML and LSTM.
- l. 69 and Intro in general: Apparent rooting zone water storage capacity, not rooting depth, is more directly inferred from EF (rooting depth needs additional information about soil texture and groundwater table depth). Also, the logic is not complete. You can only infer that if you can regress EF variations against cumulative water deficits during dry-downs (Giardina et al., 2023). I guess this could be addressed by generally revising the formulations and toning it down: you can establish a relationship between EF dry-down patterns and rooting depth, but without knowing the amount of water consumed during that time and soil texture and groundwater, the association remains correlative, and not a quantitative estimate in a length unit.
- Time scales (or cumulative water deficits) are key to making a connection to water storage capacities or rooting zone depth (as also described in your Fig. 1). The introduction should explain how the functionalities of the applied ML techniques provide such insights. Traditionally, explainable ML has been used to diagnose learned functional relationships or variable importances. I think what's missing here is an explanation of the Expected Gradients method and the logic of how it can be applied in the context of this study's objectives. Can you refer to other applications of such methods?
- Please cite Tumber-Dávila et al., 2022 instead of Stocker et al., 2023 for the rooting depth data.
- The methods section lacks critical detail: Which LE version was used (energy-balance corrected or not)? What was the temporal resolution of the data? Was any data filtering (quality control-based cleaning) applied? Where are the soil data from?
- Related to above: The analysis of outputs from the Expected Gradients analysis is entirely missing from the methods section. This is a glaring gap.
- l. 168: unclear formulation.
- Fig. 8: Hard to decipher the legend. Please indicate what x axis is. The color scale tick labels are not readable. What is RD - explain in caption?
- l. 421: needs a reference
- l. 436: too strong of a statement.
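Since this review asks for an explanation of Expected Gradients, a minimal sketch of the attribution may help readers. Expected Gradients estimates φᵢ(x) = E over x′ ~ background and α ~ U(0, 1) of (xᵢ − x′ᵢ) · ∂f/∂xᵢ evaluated at x′ + α(x − x′). The Monte Carlo estimator below assumes a user-supplied gradient function (in practice this would come from the deep learning framework's automatic differentiation). It also assumes, as a hypothetical convention, that a sequence input has shape (n_days, n_features) with the last row being the prediction day. `memory_contribution` is one hypothetical way to summarise attribution from days at least 7 days in the past. The manuscript's actual aggregation and normalisation are not described in this discussion.

```python
import numpy as np

def expected_gradients(grad_fn, x, background, n_samples=500, seed=0):
    """Monte Carlo estimate of Expected Gradients for a single input.

    phi = E_{x' ~ background, alpha ~ U(0, 1)}[(x - x') * grad f(x' + alpha (x - x'))]

    grad_fn: callable returning df/dx at a point, same shape as x.
    x: input of any shape, e.g. (n_days, n_features) for a sequence model.
    background: array of shape (n_background, *x.shape) of reference inputs.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    phi = np.zeros_like(x)
    for _ in range(n_samples):
        ref = background[rng.integers(len(background))]  # random reference
        alpha = rng.uniform()                             # random path position
        phi += (x - ref) * grad_fn(ref + alpha * (x - ref))
    return phi / n_samples

def memory_contribution(phi, min_lag=7):
    """Hypothetical summary: summed attribution per feature from days with
    lag >= min_lag, assuming the last row of phi is the prediction day."""
    return phi[: phi.shape[0] - min_lag].sum(axis=0)
```

For a linear model with a single baseline, each sample returns exactly (x − x′) times the weights, which is a convenient sanity check of any implementation.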
References
- Teuling, A. J., Seneviratne, S. I., Williams, C., and Troch, P. A.: Observed timescales of evapotranspiration response to soil moisture, Geophys. Res. Lett., 33, L23403, https://doi.org/10.1029/2006GL028178, 2006.
- Giardina, F., Gentine, P., Konings, A. G., Seneviratne, S. I., and Stocker, B. D.: Diagnosing evapotranspiration responses to water deficit across biomes using deep learning, New Phytologist, 240, 968–983, https://doi.org/10.1111/nph.19197, 2023.
- Tumber-Dávila, S. J., Schenk, H. J., Du, E., and Jackson, R. B.: Plant sizes and shapes above- and belowground and their interactions with climate, New Phytologist, n/a, https://doi.org/10.1111/nph.18031, 2022.
- Biegel et al.: EGUsphere [preprint], https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1617/egusphere-2025-1617.pdf, 2025.
- Nakagawa, R., Chau, M., Calzaretta, J., Keenan, T., Vahabi, P., Todeschini, A., ..., and Kang, Y.: Upscaling global hourly GPP with Temporal Fusion Transformer (TFT), arXiv [preprint], arXiv:2306.13815, 2023.
- Besnard, S., Carvalhais, N., Arain, M. A., Black, A., Brede, B., Buchmann, N., ..., and Reichstein, M.: Memory effects of climate and vegetation affecting net ecosystem CO2 fluxes in global forests, PLoS ONE, 14, e0211510, 2019.
- Kraft, B., Nelson, J. A., Walther, S., Gans, F., Weber, U., Duveiller, G., ..., and Jung, M.: On the added value of sequential deep learning for upscaling evapotranspiration, EGUsphere, 2024, 1–30, 2024.
- Stocker, B. D., Wang, H., Smith, N. G., Harrison, S. P., Keenan, T. F., Sandoval, D., Davis, T., and Prentice, I. C.: P-model v1.0: an optimality-based light use efficiency model for simulating ecosystem gross primary production, Geoscientific Model Development, 13, 1545–1581, https://doi.org/10.5194/gmd-13-1545-2020, 2020.
Citation: https://doi.org/10.5194/egusphere-2025-4082-RC2
- RC3: 'Comment on egusphere-2025-4082', Anonymous Referee #3, 10 Nov 2025
This study investigated the role of environmental variables in the prediction of evaporative fraction (EF). Vegetation memory effects on evaporative fraction were studied through the use of a temporal deep neural network, which explicitly incorporates effects from historical conditions. The experiments explored how the network predictions behave during soil moisture dry-down periods, how different features contribute to the predictions at different points in history, and how average contributions from past feature values (at least 7 days in the past) vary across plant functional types and other ecosystem features (e.g. seasonality, rooting depth, etc.). Notably, variations in memory length are detected across different ecosystem types with variations in rooting depth.
Overall, the study brings very interesting insights. The analysis with a temporal network is an interesting approach that leads to substantial new insights into the behaviour of EF over time and across ecosystem types. I would recommend improving the manuscript in a few areas before publication.
First, the connection between EF and its learned memory effects and dynamic rooting depth could be explained further. It is a large component of the motivation (Figure 1), which describes how during a dry-down period, EF decays at different rates depending on the plants' water use strategies. Figure 9 then shows that there is a link between the learned memory effects and rooting depth. While it appears that an increased rooting depth is associated with stronger memory effects, the distributions of the memory effects overlap significantly between rooting depth "bins". As a major goal of the study is to show the potential of using the memory effect estimates for inferring rooting depth and even using them as a proxy, the work should more clearly state or show how this could be done. As the (previously estimated) rooting depth used here is a static estimate, one suggestion would be to further compare the memory effects during dry-down periods and non-dry-down periods, a distinction that was also used in the overall model evaluation. In particular, variations in memory effects during dry-down periods may reveal additional insights on water use strategies.
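The suggested comparison between dry-down and non-dry-down periods could be implemented by flagging dry-down days from precipitation and then averaging memory contributions separately inside and outside those periods. The sketch assumes that a dry day has precipitation below 1 mm d⁻¹ and that a dry-down requires at least 10 consecutive dry days. Both thresholds and both function names are hypothetical, not the manuscript's definition.

```python
import numpy as np

def dry_down_mask(precip, dry_threshold=1.0, min_length=10):
    """Boolean mask marking days inside runs of >= min_length dry days."""
    dry = np.asarray(precip, dtype=float) < dry_threshold
    mask = np.zeros(dry.size, dtype=bool)
    start = None
    # Append a wet sentinel so that a run reaching the end is also closed
    for i, is_dry in enumerate(np.append(dry, False)):
        if is_dry and start is None:
            start = i
        elif not is_dry and start is not None:
            if i - start >= min_length:
                mask[start:i] = True
            start = None
    return mask

def split_by_dry_down(memory, mask):
    """Mean memory contribution inside vs outside dry-down periods."""
    memory = np.asarray(memory, dtype=float)
    return memory[mask].mean(), memory[~mask].mean()
```

Comparing the two means per site, or their full distributions, would show whether memory contributions strengthen as the rooting zone dries.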
Additionally, the model training setup brings some limitations that should be addressed. As each site appears in the training set with 2 site-years left out for testing, there is potential for overfitting on the patterns in the training data. While I think the motivation given for this setup is reasonable, it would be good to discuss these limitations. In particular, a good predictive performance does not automatically mean that the model represents physical processes. The use of interpretability methods, as done in this work, is suitable for tackling this. However, L221-223 should be revised or discussed later on in the manuscript as this conclusion should not be made based on overall performance.
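The two evaluation designs at stake here (and in RC2's comment on generalisability) differ only in what is held out. The sketch below contrasts holding out 2 site-years per site, as described in this comment, with holding out whole sites. The random choice of held-out years and the function names are assumptions for illustration.

```python
import random
from collections import defaultdict

def site_year_split(site_years, n_test_years=2, seed=0):
    """Hold out n_test_years per site for testing; every site stays in training.

    site_years: iterable of (site_id, year) tuples; each site is assumed to
    have more than n_test_years distinct years.
    """
    rng = random.Random(seed)
    years_by_site = defaultdict(list)
    for site, year in site_years:
        years_by_site[site].append(year)
    train, test = [], []
    for site, years in years_by_site.items():
        held_out = set(rng.sample(sorted(years), n_test_years))
        for year in years:
            (test if year in held_out else train).append((site, year))
    return train, test

def leave_sites_out_split(site_years, test_sites):
    """Hold out whole sites, so test sites are never seen during training."""
    test_sites = set(test_sites)
    train = [sy for sy in site_years if sy[0] not in test_sites]
    test = [sy for sy in site_years if sy[0] in test_sites]
    return train, test
```

In the first design, site-specific patterns learned during training are also present at test time. This is why high scores there say little about transfer to new sites.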
Other points:
- Some details on the data and model are missing. Is the input data standardised? Which activation function is used for the model (L137)? Other model training details such as optimiser (settings), number of training iterations, data batch size, learning rate, etc. should also be included.
- L160-161: this description is confusing as it suggests the 10-model ensemble consists of models with different sequence lengths, whereas it becomes clear later on that the final model is an ensemble of models each trained with 365-day sequences.
- L186: clarify if separate years are used for testing and validation.
- L183-192: several remarks are repeated and should only be part of the methods section (on the ensembles in particular).
- L194: improved compared to what? What was the difference? If an ensemble is recommended, it would be useful to show more details of the comparison.
- Figure 3 is not mentioned in the text.
- L249: typo in the dates
- Figure 5: the caption appears to contain some interpretations. It would be more suitable to discuss these in the text.
- Figure 10: there is a typo in the top left. The seasonality index is not defined.
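On the first point (whether inputs are standardised), a common convention is to z-score each input feature with statistics computed on the training data only, so that held-out site-years do not influence the scaling. This is an assumption here, not something the manuscript states. A minimal sketch with hypothetical function names:

```python
import numpy as np

def fit_standardizer(x_train):
    """Per-feature mean and std computed on training data only.

    x_train: array (n_samples, ..., n_features); statistics are taken over
    all axes except the last (feature) axis, ignoring NaNs.
    """
    x_train = np.asarray(x_train, dtype=float)
    axes = tuple(range(x_train.ndim - 1))
    mean = np.nanmean(x_train, axis=axes)
    std = np.nanstd(x_train, axis=axes)
    std = np.where(std > 0, std, 1.0)  # guard against constant features
    return mean, std

def standardize(x, mean, std):
    """Apply training-set statistics to any split (train, validation, test)."""
    return (np.asarray(x, dtype=float) - mean) / std
```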
Citation: https://doi.org/10.5194/egusphere-2025-4082-RC3
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 2,238 | 205 | 16 | 2,459 | 44 | 26 | 24 |
This manuscript applies a relatively new machine learning model to effectively capture the temporal variability of Evaporative Fraction (EF). The model shows strong agreement with observations, demonstrating its capability to represent the dynamics of EF across different PFTs and climate zones. The authors also quantitatively assess the influence of surface hydrometeorological drivers on vegetation memory, providing valuable insights into soil-plant-atmosphere interactions.
Overall, I find this study to be of interest and with potential for publication. Several aspects of the methodological description and the presentation of the results require further clarification to ensure that readers can fully understand and evaluate the work.
Major comments:
Specific comments:
9: What are vegetation memory effects? Please explain.
10: I'm new to ML methods; what is the difference between explainable ML and regular ML?
11: Should be "vegetation memory effects"
14: What is the advantage of this study compared to SFE (Surface Flux Equilibrium), which also requires only routine weather station data for EF calculation?
17-19: Which corresponds to "water-limited" and "energy-limited" regimes?
24: Previously it says "vegetation memory effects", please be consistent.
31: I think you don't have to emphasize the root zone here, since the SM–EF relationship at the surface soil layer should be stronger.
36: I feel the cause and effect in this paragraph should be rephrased as how vegetation memory influences EF, rather than the other way around. The goal of this study is to predict EF, and vegetation memory is one of the key drivers. Alternatively, this content could be placed after the description of EF prediction.
42: Please explain the terminology the first time it appears.
74: Again, what is the difference between "explainable ML" and regular ML methods?
115: Did you also mask those with energy imbalance larger than a threshold (e.g., (Rn-Gs)-(LH+SH)>30W/m2)? Please explicitly indicate it in the main context.
121: What is "corrected" LH? Please explain.
143: Section 3.2: I suggest the authors explicitly describe how each baseline model differs from the LSTM. To non-experts in ML, the settings of the FNN look similar to those of the LSTM, so why does the FNN perform worse? Additionally, why did the authors add the SPI-based model, and why does the SPI model perform so poorly (R² < 0.1)? Please add relevant discussion.
145: In figure 3 it says FNN, please be consistent.
176: I suggest the authors also add a brief description of EG in the main context, since it is an important component of this study.
212: -1.21: is this a typo? How can R² be negative, and larger than 1 in magnitude?
216: “Figure 4” should be “Figure 3”
293: Lines 293–299 can be moved to the methods section.
302: Please indicate that this is shortwave radiation (RAD).
307-309: Isn't air temperature correlated with radiation?
351: Figure 8: Can you rearrange this figure with the x-axis ranging from shallow to deep rooting depth? Also, how do you normalize the contributions? Why are contributions from all variables even lower than from precipitation alone? Does this indicate there could be negative feedbacks between variables?
356: Should be “temporal EF machine learning model”.
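The closure filter raised in the comment on manuscript line 115 can be expressed compactly. The sketch follows the reviewer's example: a record is rejected when (Rn − Gs) − (LH + SH) exceeds 30 W m⁻². Whether the authors applied such a filter, and with which threshold, is not stated in this discussion. A two-sided test on the absolute residual is a common variant and is an assumption here, not the reviewer's formulation. The function name is hypothetical.

```python
import numpy as np

def energy_balance_mask(rn, g, le, h, threshold=30.0):
    """True where a record passes the closure check suggested in the review.

    Residual = (Rn - G) - (LE + H), all in W m-2. Records whose residual
    exceeds `threshold` (or is NaN) are flagged as failing (False). The
    one-sided test follows the comment's expression.
    """
    rn, g, le, h = (np.asarray(v, dtype=float) for v in (rn, g, le, h))
    residual = (rn - g) - (le + h)
    return residual <= threshold
```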