the Creative Commons Attribution 4.0 License.
Technical note: How well do evapotranspiration partitioning approaches perform in moss-covered wetlands?
Abstract. Evapotranspiration (ET) is the dominant hydrologic flux in wetlands, and partitioning it into transpiration (T) and evaporation (E) is essential for understanding water and carbon dynamics, guiding sustainable water management practices, and predicting responses to climate change in these systems. However, the presence of moss layers in many wetlands challenges the assumptions of commonly used partitioning methods. This study evaluates the performance of nine eddy covariance (EC)-based ET partitioning approaches across multiple moss-covered wetland sites located in boreal regions and the Canadian Rocky Mountains. The partitioning results from each approach were compared against independent measurement-based estimates, which were obtained using flux chambers, micro-lysimeters, sap flow sensors, and EC systems. Results showed that none of the evaluated methods provided both accurate and precise estimates of ET partitioning (T:ET), and no single method emerged as the most suitable for the studied ecosystems. Despite this, the general agreement between modelled and measured T:ET values indicates that many of these approaches still provide valuable insights. Applying multiple methods concurrently is recommended, where possible, to enhance confidence in partitioning results. For researchers with access to high-frequency EC data, priority should be given to high-frequency EC-based methods due to their more consistent performance across sites. The findings also highlight the limitations of current partitioning approaches under evaporation-dominated conditions, and underscore the need to examine the mechanistic role of mosses, as well as to improve how optimal stomatal conductance theory is conceptualized and implemented in model formulations.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4252', Anonymous Referee #1, 31 Oct 2025
Review of "How well do evapotranspiration partitioning approaches perform in moss-covered wetlands?"
This paper delves into an important topic of evapotranspiration (ET) partitioning in moss-covered wetlands, comparing different methods against "direct" measurements. The topic is relevant and the manuscript is generally well written and easy to follow. To my knowledge, this is the first manuscript that compares such a wide range of ET partitioning methods, including high-frequency EC and GPP-informed methods. While the study sites are not ideal given the highly heterogeneous conditions, I believe it can still provide useful insights on the performance of the different methods. My only concern about the study sites is the presence of flooded areas, which might complicate the interpretation of the results or even invalidate some of the methods. Overall, I think the manuscript has merit and can be published after addressing the comments below.
Line 106: Given that Eichelmann et al. (2022)'s method was developed specifically for flooded ecosystems, it is almost required to be included in this analysis. I would strongly encourage the authors to contact the group by Eichelmann et al. (2022) to get help using their code.
Line 128: study sites
One important aspect of the high-frequency partitioning methods is that the ground is a source of CO2 (respiration) and source of water vapor (evaporation). Moss photosynthesis at the ground level might not necessarily invalidate the methods if we consider the net effect (surface_level_respiration - moss_P), as long as the CO2 signal from the surface is positive and well mixed before it reaches the canopy top and sensor (i.e., different from being fully mixed with the plant canopy signal). What is unclear to me is the role of the water in flooded parts. If water is a sink of CO2, then the net CO2 flux from surface level is negative, and none of these methods would make sense given the decoupling between water vapor and CO2.
Line 225: Results and discussion
I agree with the authors when they say that direct measurements cannot be directly compared to any of the approaches since they all use EC fluxes, covering a much larger and heterogeneous footprint. I would also add that even under ideal homogeneous conditions, direct measurements (sap flow, chambers, etc) still suffer from other limitations and cannot be considered as the "truth", but as proxies.
Fig 1 and 2: Are "measurement-based" data an average including all "direct measurements"? Is there any measure of variance, say, across chambers/plants (for sap flow) or maybe across methods?
Line 272: Is it possible that data quality was worse when T/ET < 0.5?
Line 300: Conclusion
I would refrain from talking about more or less "accurate" results. One of the main challenges of flux partitioning is that we really do not know the true values of any of the components, and while direct measurements are good proxies of trends, they have limitations and might even require some level of parameterization such as sap flow measurements.
Citation: https://doi.org/10.5194/egusphere-2025-4252-RC1
AC1: 'Reply on RC1', Yi Wang, 26 Jan 2026
Reviewer #1:
Review of "How well do evapotranspiration partitioning approaches perform in moss-covered wetlands?"
This paper delves into an important topic of evapotranspiration (ET) partitioning in moss-covered wetlands, comparing different methods against "direct" measurements. The topic is relevant and the manuscript is generally well written and easy to follow. To my knowledge, this is the first manuscript that compares such a wide range of ET partitioning methods, including high-frequency EC and GPP-informed methods. While the study sites are not ideal given the highly heterogeneous conditions, I believe it can still provide useful insights on the performance of the different methods. My only concern about the study sites is the presence of flooded areas, which might complicate the interpretation of the results or even invalidate some of the methods. Overall, I think the manuscript has merit and can be published after addressing the comments below.
Line 106: Given that Eichelmann et al. (2022)'s method was developed specifically for flooded ecosystems, it is almost required to be included in this analysis. I would strongly encourage the authors to contact the group by Eichelmann et al. (2022) to get help using their code.
Response: Thank you very much for the overall positive comments. We appreciate the reviewer’s recognition of the novelty of the multi-method comparison and agree that flooding can complicate the interpretation of ET partitioning results.
We would like to clarify that flooded conditions were not present at all sites. Only two sites experienced periods when the water table was at or above the peat surface, though not necessarily the moss surface. At the Burstall site, flooding occurred after 17 August 2021 and represents less than one quarter of the study period. This interval also coincides with the largest discrepancies between model-based and measurement-based ET partitioning. In contrast, the Poplar site was consistently flooded throughout the study period, yet most methods showed general agreement with the measurement-based results. Together, these contrasting responses suggest that flooding does not necessarily invalidate ET partitioning methods, but that its effects may be site-specific.
Overall, considering the performance at these two sites and the consistency across the broader site set, we believe the main conclusions of the manuscript remain robust. That said, we agree that flooding effects require more rigorous analysis. We will include more analysis and revise the Results and Discussion to directly address the influence of flooded conditions on model performance.
We will also include an ET partitioning method proposed by Stapleton et al. (2022), as suggested by Dr. Eichelmann. This method shares the same underlying concept as that of Eichelmann et al. (2022) and was developed specifically for flooded ecosystems. It assumes negligible nighttime transpiration, an assumption we do not expect to be violated by the presence of mosses. While the primary aim of this study is to evaluate methods that may not be directly applicable to moss-covered wetlands, we believe that including this approach strengthens the comparison and provides useful guidance for future applications.
Reference:
Stapleton, A., Eichelmann, E., Roantree, M., 2022. A framework for constructing machine learning models with feature set optimisation for evapotranspiration partitioning. Applied Computing and Geosciences, 100105. https://doi.org/10.1016/j.acags.2022.100105.
Line 128: study sites
One important aspect of the high-frequency partitioning methods is that the ground is a source of CO2 (respiration) and source of water vapor (evaporation). Moss photosynthesis at the ground level might not necessarily invalidate the methods if we consider the net effect (surface_level_respiration - moss_P), as long as the CO2 signal from the surface is positive and well mixed before it reaches the canopy top and sensor (i.e., different from being fully mixed with the plant canopy signal). What is unclear to me is the role of the water in flooded parts. If water is a sink of CO2, then the net CO2 flux from surface level is negative, and none of these methods would make sense given the decoupling between water vapor and CO2.
Response: We thank the reviewer for raising this important point regarding the role of flooding in high-frequency eddy covariance (EC) based ET partitioning methods. As mentioned in our previous response, the effects of flooded conditions will be addressed explicitly in the revised manuscript.
We would like to clarify, however, that the assumption of a positive surface CO₂ signal may not always hold in moss-covered wetlands. Strong moss photosynthesis can result in net CO₂ uptake at the surface, which means that surface-level respiration minus moss photosynthesis can be negative. Evidence for this behavior is shown in Walker et al. (2017), where chamber measurements on Sphagnum–peat columns (their Fig. 4) reveal clear transitions between positive and negative hourly CO₂ fluxes, as well as sustained daily net CO₂ uptake from mid-June to mid-July. These observations indicate that the presence of a moss layer can challenge the applicability of high-frequency EC based ET partitioning methods that assume a net positive surface CO₂ signal.
For our study sites, direct CO₂ measurements during the study periods were unfortunately not available. However, existing evidence suggests that negative surface CO₂ flux signals are plausible, particularly at the Poplar site. Following a wildfire in 2016, a chamber-based ground CO₂ flux study conducted in 2017 at the Poplar site measured moss-dominated plots and showed that the unburned ground surface acted as a net CO₂ sink until early August (van Beest, 2019, Fig. 3-1). This further supports the possibility that surface CO₂ uptake may occur at some of our sites.
These considerations help clarify the motivation of our study. Our goal is to evaluate the performance of ET partitioning methods in moss-covered wetlands, where surface CO₂ signals may violate key assumptions of some methods. Given the widespread presence of mosses in many peatlands, we hope that our results provide useful guidance for researchers working in these ecosystems.
References:
Walker, A. P., K. R. Carter, L. Gu, P. J. Hanson, A. Malhotra, R. J. Norby, S. D. Sebestyen, S. D. Wullschleger, and D. J. Weston (2017), Biophysical drivers of seasonal variability in Sphagnum gross primary production in a northern temperate bog, J. Geophys. Res. Biogeosci., 122, 1078–1097, https://doi.org/10.1002/2016JG003711.
Van Beest, C. Deeper Burning Increases Available Phosphorus, Promotes Moss Growth, and Carbon Dioxide Uptake in a Fen Peatland One-Year Post-Wildfire in Fort McMurray, AB. Master’s thesis, University of Waterloo, 2019.
Line 225: Results and discussion
I agree with the authors when they say that direct measurements cannot be directly compared to any of the approaches since they all use EC fluxes, covering a much larger and heterogeneous footprint. I would also add that even under ideal homogeneous conditions, direct measurements (sap flow, chambers, etc) still suffer from other limitations and cannot be considered as the "truth", but as proxies.
Response: Thank you for this important clarification, which we fully agree with. In the manuscript, we already note several limitations of direct measurements, including flux chambers, porometers, and lysimeters (Lines 202–205). We will further emphasize in the revised text that even under ideal and homogeneous conditions, these approaches cannot be considered “truth,” but rather proxies with their own inherent uncertainties.
Fig 1 and 2: Are "measurement-based" data an average including all "direct measurements"? Is there any measure of variance, say, across chambers/plants (for sap flow) or maybe across methods?
Response: Thank you for these thoughtful questions. We think that this information should be clarified and expanded in the revised manuscript.
To answer the first question briefly, the measurement-based data shown in Figs. 1 and 2 are not simple averages of all direct measurements.
For the Sibbald, Burstall, and Bonsai sites, the measurement-based T:ET time series are derived from a site-calibrated and validated Shuttleworth–Wallace (S–W) model rather than direct averaging of ground evaporation measurements. This approach was necessary because micro-lysimeters provide averaged daily evaporation over multi-day periods and because eddy-covariance ET measurements at these sites include intermittent data gaps. To address this, we developed a surface resistance formulation that explicitly accounts for moss and litter effects on ground evaporation (Wang et al., 2023) and incorporated it into the S–W model.
The S–W model performance was evaluated against eddy-covariance measurements, with linear regressions of modelled versus observed latent heat flux showing slopes ranging from 0.87 to 1.16, intercepts between −18.5 and 3.8 W m⁻², and R² values greater than 0.78 across the three sites. Modelled ground evaporation was further compared with microlysimeter-based measurements conducted over peat-only surfaces, peat surfaces covered with moss, and peat surfaces covered with both moss and litter, under both open conditions and a willow canopy. Modelled evaporation was generally slightly lower than the measurements, which is expected given that lysimeter experiments were conducted under controlled conditions with sedge removal and therefore reduced shading. Importantly, modelled evaporation fell within the observed measurement range and reproduced similar temporal variability. For conditions consistent with the S–W model framework, namely litter- and moss-covered surfaces under canopy, the agreement between modelled evaporation and measurements was particularly strong, with a linear regression given by λE_modelled = 0.91·λE_measured − 4.46 (W m⁻²) and R² = 0.85. These results support the use of the S–W model to construct T:ET time series at the three sites. Additional validation details are provided in Wang (2025, pp. 110–112), and key procedures and results will be summarized in the revised manuscript.
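As an illustration of the validation statistics quoted above, the slope, intercept, and R² of a modelled-versus-measured regression can be computed as in the following minimal sketch. The function name `linear_fit` and the flux values are hypothetical and are not taken from the study. Ordinary least squares is assumed here, since the response does not state which regression variant was used.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2.

    x: measured values, y: modelled values (same units, same length).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Hypothetical latent heat fluxes in W m^-2 (illustrative only, not study data)
le_measured = [40.0, 80.0, 120.0, 160.0, 200.0]
le_modelled = [34.0, 66.0, 108.0, 139.0, 178.0]

slope, intercept, r2 = linear_fit(le_measured, le_modelled)
print(f"slope={slope:.2f}, intercept={intercept:.2f} W m^-2, R2={r2:.3f}")
```

A slope below one combined with a negative intercept, as in the regression reported above, would correspond to modelled evaporation sitting slightly below the measurements across the observed range.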
At the Poplar site, the measurement-based T:ET time series was constructed differently from the other three sites, as this site includes direct eddy-covariance ET measurements and upscaled sap flow measurements for dominant tree and understory species, informed by forest inventory data and LiDAR-based vegetation classification, as well as ground-based chamber flux measurements (Gabrielli, 2016). Ground evaporation was calculated as the residual (ET minus upscaled site transpiration (T)) and evaluated against chamber measurements dominated by moss. This comparison yielded a regression slope of 0.96, an intercept of 0.08 mm h⁻¹, and an R² of 0.26. Because both T and ET were measured at the same timestep, whereas chamber measurements are instantaneous and inherently variable, we used the sap-flow-derived transpiration together with eddy-covariance ET to construct the Poplar T:ET time series. We acknowledge the relatively low R², which reflects the scatter in the chamber measurements, but the near-unity slope and small intercept indicate good overall agreement. On this basis, we consider the resulting T:ET time series to be a reasonable representation for this site.
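The residual calculation described for the Poplar site can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `partition_residual`, the screening of non-positive ET, and all values are hypothetical and do not reproduce the authors' processing.

```python
def partition_residual(et, t):
    """Return (E, T:ET) for one timestep, with E taken as the residual ET - T.

    et: eddy-covariance ET, t: upscaled sap-flow transpiration (same units).
    Returns (None, None) when either value is missing or ET is not positive,
    because the ratio is undefined there. This screening rule is an assumption.
    """
    if et is None or t is None or et <= 0:
        return None, None
    return et - t, t / et


# Hypothetical hourly values in mm h^-1 (illustrative only, not study data)
et_ec = [0.20, 0.35, 0.00, 0.28]
t_sap = [0.12, 0.21, 0.00, 0.30]

for et, t in zip(et_ec, t_sap):
    e, ratio = partition_residual(et, t)
    if e is None:
        print(f"ET={et:.2f}: ratio undefined, timestep skipped")
    else:
        print(f"ET={et:.2f}, T={t:.2f}, E={e:.2f}, T:ET={ratio:.2f}")
```

In the last hypothetical timestep T exceeds ET, so the residual E is negative and T:ET exceeds one. Such cases indicate a mismatch between the two measurement systems, and how to treat them (clip, flag, or discard) is a choice the sketch deliberately leaves open.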
Overall, we recognize that constructing measurement-based T:ET estimates is non-trivial and that these data should not be considered “truth,” but rather proxies with inherent uncertainty. We therefore appreciate the reviewer’s comment regarding the interpretation of “truth” data. In response, we will provide more technical details on the measurements and the construction of the measurement-based T:ET time series. This suggestion was also raised by another reviewer, and we will address it explicitly in the revised manuscript.
Regarding the second question, the table below summarizes errors and uncertainties associated with flux measurements relevant to this study, which will be incorporated into the revised manuscript.
Table 1. Examples of Errors and Uncertainties of Flux Measurements
Flux contribution | Method | Source of errors | Normalized Root Mean Square Error (%) | Reference
ET | EC | One-point uncertainty | 8.3 | Kroon et al., 2010
T | Sap flow system (stem heat balance method) | System measurement resolution, steady-state assumption, heat-storage error | 15.0 | Groot and King, 1992; Shackel et al., 1992; Grime et al., 1995; Perämäki et al., 2001
ET (ground) | Chamber | Lower Q*, altered microclimate | 6.6 | Reicosky et al., 1983; McLeod et al., 2004; Hamel et al., 2015
E | Micro-lysimeter | Soil disturbances, handling and weighing | 17.8 | Calculated using micro-lysimeter measurements from this study
(Note: Table adapted from Gabrielli 2016).
References:
Gabrielli, E. C.: Partitioning Evapotranspiration in Forested Peatlands within the Western Boreal Plain, Fort McMurray, Alberta, Canada, Master’s thesis, Wilfrid Laurier University, 2016.
Grime, V.L., Morrison, J.I.L., Simmonds, L.P. 1995. Including heat storage term in sap flow measurements with the heat storage method. Agricultural and Forest Meteorology, 74: 1-25. https://doi.org/10.1016/0168-1923(94)02187-O.
Groot, A., King, K.M. 1992. Measurement of sap flow by heat balance method: numerical analysis and application to coniferous seedlings. Agricultural and Forest Meteorology, 59: 289-308. https://doi.org/10.1016/0168-1923(92)90098-O.
Hamel, P., Mchugh, I., Coutts, A., Daly, E., Beringer, J., Fletcher, T.D. 2015. Automated Chamber System to Measure Field Evapotranspiration Rates. Journal of Hydrologic Engineering, 20(2): 04014037. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001006.
Kroon, P.S., Hensen, A., Jonker, H.J.J., Ouwersloot, H.G., Vermeulen, A.T., Bosveld, F.C. 2010. Uncertainties in eddy covariance flux measurements assessed from CH4 and N2O observations. Agricultural and Forest Meteorology, 150: 806-816. https://doi.org/10.1016/j.agrformet.2009.08.008.
McLeod, M.K., Daniel, H., Faulkner, R., Murison, R. 2004. Evaluation of an enclosed portable chamber to measure crop and pasture actual evapotranspiration at small scale. Agricultural Water Management, 67: 15-34. https://doi.org/10.1016/j.agwat.2003.12.006.
Perämäki, M., Vesala, T., Nikinmaa, E. 2001. Analyzing the applicability of the heat balance method for estimating sap flow in boreal forest conditions. Boreal Environment Research, 6: 29-43. https://doi.org/10.60910/4tn6-73en.
Reicosky, D.C., Sharratt, B.S., Ljungkull, J.E., Baker, D.G. 1983. Comparison of Alfalfa Evapotranspiration Measured By A Weighing Lysimeter And A Portable Chamber. Agricultural Meteorology, 28: 205-211. https://doi.org/10.1016/0002-1571(83)90026-2.
Wang, Y., Petrone, R. M., and Van Huizen, B.: The dependence of evaporative efficiency of vegetated surfaces on ground cover mass fractions in vegetated soils in mesic ecosystems, Hydrological Processes, 37, e15036, https://doi.org/10.1002/hyp.15036, 2023.
Wang, Y.: Uncovering the understudied role of microtopography and ground cover on evapotranspiration partitioning in high-elevation wetlands in the Canadian Rocky Mountains, Doctoral dissertation, University of Waterloo, 2025.
Line 272: Is it possible that data quality was worse when T/ET < 0.5?
Response: Thank you for this insightful question. Yes, it is possible that data quality was poorer during periods when T/ET < 0.5. We found that these low T/ET values generally coincide with days characterized by low net radiation, low air temperature, and high soil moisture. This does not necessarily imply flooded conditions, as the Poplar site was consistently flooded throughout the study period while maintaining T/ET values above 0.5.
Under such conditions, several data quality issues may arise. For eddy-covariance measurements, high moisture availability combined with weak vertical mixing can lead to surface evaporation that is not efficiently transported upward. As a result, latent heat flux measurements can become noisy and are often underestimated. This suppressed vertical mixing may also introduce bias in high-frequency EC-based ET partitioning methods.
Similarly, low net radiation and temperature generally reduce transpiration rates, leading to low sap flow magnitudes. When sap flow signals are weak, instrument noise and relative measurement uncertainty become more influential, potentially affecting transpiration estimates.
We will explicitly address these data quality considerations in the revised manuscript. In addition, another reviewer raised a related point regarding the influence of meteorological conditions on model performance. We will therefore include a more detailed analysis of how environmental conditions affect both data quality and ET partitioning performance in the revision.
Line 300: Conclusion
I would refrain from talking about more or less "accurate" results. One of the main challenges of flux partitioning is that we really do not know the true values of any of the components, and while direct measurements are good proxies of trends, they have limitations and might even require some level of parameterization such as sap flow measurements.
Response: Thank you for this helpful suggestion. We agree that referring to results as more or less “accurate” is not appropriate given the inherent uncertainties in flux partitioning and the proxy nature of both model- and measurement-based estimates. We will revise the Conclusions to remove references to “accuracy” and instead frame the discussion in terms of relative performance, consistency, and agreement among methods.
Citation: https://doi.org/10.5194/egusphere-2025-4252-AC1
RC2: 'Comment on egusphere-2025-4252', Elke Eichelmann, 06 Jan 2026
Review: egusphere-2025-4252
Wang et al., Technical note: How well do evapotranspiration partitioning approaches perform in moss-covered wetlands?
In the interest of transparency, I would like to disclose that some of the work discussed in this manuscript and my review relates to some of my own publications (e.g. Eichelmann et al., 2020; Stapleton et al., 2022).
General remarks:
In this manuscript, Wang et al. conduct a comparative study to evaluate nine different evapotranspiration partitioning tools/models for eddy covariance data at four moss-covered wetland sites over one growing season at each site (total of four site-years).
They conclude that none of the tools captures all the temporal and spatial variability, but that in combination they can provide useful information on T:ET estimates for these sites.
Overall the manuscript investigates an interesting and relevant subject. Cross-site and cross-tool comparisons of eddy covariance based ET partitioning methods are still very rare and are much needed. This is especially true for some of the more unique and challenging ecosystems such as those investigated in this manuscript, where models aren’t routinely evaluated. The manuscript is well written and the science is generally sound.
That all being said, I think that the manuscript at the moment, unfortunately, falls short of being actually useful for other researchers in a concrete way. The overall conclusions are quite broad and vague and don’t go much beyond what existing knowledge already told us. I think there are two main reasons for this:
1) My main concern is the lack of quantitative assessment/comparison of the ET partitioning performance. I acknowledge that it is extremely difficult if not impossible to get ‘ground truth’ data to validate model performance against, and commend the authors on their efforts to obtain some validation data for their sites. While I agree with the authors that the micro-lysimeter and Shuttleworth-Wallace model validation data is not a true ground truth and any comparison between them has to be taken with a grain of salt, I still think that there should be a quantitative evaluation of the model performance, rather than just a qualitative assessment. At the end of the day, there is a set of independent data, which can be used as a baseline for comparison across the models. While we might not expect a perfect fit with these baseline data, we can still get quantifiable differences across models for a more robust discussion. See my specific comments further below for more detail on this.
2) I also would have liked to see a bit more detail on the environmental/meteorological conditions at these sites across the study periods and how they could be influencing some of the observed patterns (or lack thereof). There are no high temporal resolution data presented (only an overall range or average for each site across the measurement period) on water table depth, temperature (soil/water/air), VPD, or precipitation. As the authors state, some of these parameters exert strong influences on biosphere-atmosphere exchange of water, and some of the tools/models are better or worse at incorporating these impacts. Providing some of these meteorological data alongside the partitioning results and/or evaluating under what conditions certain tools perform better/worse would improve the manuscript and make it more useful for other researchers to help decide which tools might be best in their context. But I recognize that the limited data availability might restrict how much can be learned from this.
Overall this study is a very worthwhile effort, but requires a bit more quantitative analysis to realize its full potential.
Specific comments:
Introduction: Well written, concise introduction with a clear justification for the study and sufficient background information provided. Relevant literature is cited, but it might be interesting to have a look at Speranskaya et al. (2024) also.
Methods
Line 106:
While the original ANN based partitioning code from our 2020 paper unfortunately cannot be made publicly available, a subsequent publication using different machine learning algorithms, but based on the same underlying concept, has since been published (Stapleton et al., 2022). The python code for this is publicly accessible and linked in that publication. It might be an interesting comparison to include, given that it works on evaporation prediction and doesn’t require ecosystem carbon-water coupling. As far as I can see all other non-high-frequency based methods in this manuscript rely on carbon-water coupling, so it could be interesting to compare with the machine learning based method and see if different patterns emerge.
Line 149, Table 2: Please add information on the measurement period for each site (start and end date)
Line 161: Maybe add the Pastorello et al. (2020) reference for the Fluxnet data processing protocol
Line 173-175: I understand that the detailed upscaling procedures have been described elsewhere, but I still think the authors should provide a short (couple of sentences) summary of the main steps to help the reader get a sense of what is involved here without having to dig up the other publications and reading up on the detail.
Results and Discussion
Figure 1 and 2: For Bonsai, it looks as though there are more ‘Measurement-based’ black circles in Figure 2 compared to Figure 1, especially in the late growing season after Aug 15, but there are also some of the higher values missing earlier in the growing season (e.g. a measurement based value close to 1 around August 1). Why are some of those measurements omitted from Figure 1?
There doesn’t seem to be a lot of temporal variability in the measured T:ET values and most of the models don’t capture the temporal variability well. I am wondering, are any of these models actually better than just assuming a constant T:ET ratio? I think it would be good to add that in for comparison.
I would also like to see a bit more of a quantitative analysis on the performance of the models. At the moment, everything is based on reading off approximate performance from the graphs and is quite qualitative, e.g. lines 290-292: “Both CP and CWSC [...] produced a narrower spread of T:ET estimates than uWUE ...” I understand that the ‘ground truth’ data based on micro-lysimeters and the Shuttleworth-Wallace model isn’t really a ground truth and fully comparable 1:1, but I still think some level of quantitative analysis would be beneficial. While we maybe wouldn’t expect the graphs in Fig 3 to fall on the 1:1 line, even just providing R² values (within and across sites) for how well the models track temporal and spatial patterns would be interesting and will make the resulting discussion more robust. The authors may also want to explore other measures of goodness of fit, which might be more appropriate in this particular situation (e.g. Nash–Sutcliffe coefficient).
Line 296-298: I think it is a bit of a stretch to say these models still offer valuable insights. Just from looking at the graphs in Fig 3, it seems to me that assuming a constant T:ET ratio based on an approximate global average of 0.6 (e.g. Wei et al., 2017; Liu et al., 2022) would perform just as well across sites and in some cases also within sites (especially at Burstall). So this statement really needs to be backed up with some quantitative analysis.
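The quantitative comparison suggested in the comments above, a partitioning model scored against a constant T:ET baseline, can be sketched as follows. This is a minimal illustration only: the function names and all T:ET values are hypothetical, and the 0.6 baseline is the approximate global average mentioned in the comment. The Nash–Sutcliffe efficiency (NSE) and RMSE are used rather than R² because R² is undefined for a predictor with no variance.

```python
from math import sqrt
from statistics import mean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs-sim)^2) / sum((obs-mean(obs))^2)."""
    obs_bar = mean(obs)
    denom = sum((o - obs_bar) ** 2 for o in obs)
    if denom == 0:
        raise ValueError("observations have no variance; NSE is undefined")
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / denom


def rmse(obs, sim):
    """Root mean square error between observations and simulations."""
    return sqrt(mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


# Hypothetical daily T:ET values (illustrative only, not study data)
measured = [0.55, 0.60, 0.70, 0.65, 0.50]
modelled = [0.50, 0.62, 0.66, 0.70, 0.58]
baseline = [0.6] * len(measured)  # constant T:ET assumption

for name, sim in (("model", modelled), ("constant 0.6", baseline)):
    print(f"{name}: NSE={nse(measured, sim):.2f}, RMSE={rmse(measured, sim):.3f}")
```

By construction, a constant prediction equal to the observed mean gives NSE = 0 and any other constant gives a negative value, so a model adds information over a constant-ratio assumption only where its NSE is clearly positive.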
Summary and closing thoughts
Line 305: Again, here it says that the models captured similar temporal patterns of T:ET. I’m not sure what this statement is really based on given that most sites don’t exhibit much temporal variation at all, and at the one site that does (Burstall) only one out of nine models manages to capture that temporal trend.
References:
Liu, Y., Zhang, Y., Shan, N., Zhang, Z., & Wei, Z., 2022. Global assessment of partitioning transpiration from evapotranspiration based on satellite solar-induced chlorophyll fluorescence data. Journal of Hydrology, 612. https://doi.org/10.1016/j.jhydrol.2022.128044
Pastorello, G., Trotta, C., Canfora, E. et al., 2020. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci Data 7, 225. https://doi.org/10.1038/s41597-020-0534-3
Speranskaya, L., Campbell, D. I., Lafleur, P. M., and Humphreys, E. R., 2024. Peatland evaporation across hemispheres: contrasting controls and sensitivity to climate warming driven by plant functional types. Biogeosciences, 21, 1173–1190. https://doi.org/10.5194/bg-21-1173-2024
Stapleton, A., Eichelmann, E., Roantree, M., 2022. A framework for constructing machine learning models with feature set optimisation for evapotranspiration partitioning. Applied Computing and Geosciences, 100105. https://doi.org/10.1016/j.acags.2022.100105
Wei, Z., Yoshimura, K., Wang, L., Miralles, D. G., Jasechko, S., & Lee, X., 2017. Revisiting the contribution of transpiration to global terrestrial evapotranspiration. Geophysical Research Letters, 44(6), 2792–2801. https://doi.org/10.1002/2016GL072235
Citation: https://doi.org/10.5194/egusphere-2025-4252-RC2
AC2: 'Reply on RC2', Yi Wang, 26 Jan 2026
Wang et al., Technical note: How well do evapotranspiration partitioning approaches perform in moss-covered wetlands?
In the interest of transparency, I would like to disclose that some of the work discussed in this manuscript and my review relates to some of my own publications (e.g. Eichelmann et al., 2020; Stapleton et al., 2022).
General remarks:
In this manuscript, Wang et al. conduct a comparative study to evaluate nine different evapotranspiration partitioning tools/models for eddy covariance data at four moss-covered wetland sites over one growing season at each site (total of four site-years).
They conclude that none of the tools captures all the temporal and spatial variability, but that in combination they can provide useful information on T:ET estimates for these sites.
Overall the manuscript investigates an interesting and relevant subject. Cross-site and cross-tool comparisons of eddy covariance based ET partitioning methods are still very rare and are much needed. This is especially true for some of the more unique and challenging ecosystems such as those investigated in this manuscript, where models aren’t routinely evaluated. The manuscript is well written and the science is generally sound.
That all being said, I think that the manuscript at the moment, unfortunately, falls short of being actually useful for other researchers in a concrete way. The overall conclusions are quite broad and vague and don’t go much beyond what existing knowledge already told us.
Response: Thank you for your thoughtful assessment of our work. We appreciate your recognition of the relevance of cross-site and cross-method evaluations of ET partitioning approaches, particularly for moss-covered wetlands where such comparisons are still rare.
We agree that, in its current form, the conclusions may appear broad and could be more directly actionable for other researchers. The core motivation of this study is that ET partitioning methods are seldom evaluated in wetland ecosystems, and that the presence of mosses may fundamentally challenge key assumptions underlying many commonly used approaches. Our study explicitly tests this uncertainty by evaluating a wide range of EC-based ET partitioning methods against measurement-informed estimates in moss-covered wetlands, which, to our knowledge, has not been done previously.
That said, we fully agree that the manuscript should provide clearer, more concrete guidance. In response to this comment, we will revise the Results and Discussion to more explicitly identify the conditions under which specific methods perform well or poorly, highlight common failure modes, and clarify the strengths and limitations of each approach in wetland settings. We will also refine the Conclusions to move beyond general statements and provide more targeted recommendations for researchers selecting ET partitioning methods in moss-covered and hydrologically complex ecosystems.
We believe these revisions address your concern and substantially improve the practical usefulness of the manuscript. Detailed responses to your specific comments and suggestions are provided below.
I think there are two main reasons for this:
- My main concern is the lack of quantitative assessment/comparison of the ET partitioning performance. I acknowledge that it is extremely difficult if not impossible to get ‘ground truth’ data to validate model performance against, and commend the authors on their efforts to obtain some validation data for their sites. While I agree with the authors that the micro-lysimeter and Shuttleworth-Wallace model validation data is not a true ground truth and any comparison between them has to be taken with a grain of salt, I still think that there should be a quantitative evaluation of the model performance, rather than just a qualitative assessment. At the end of the day, there is a set of independent data, which can be used as a baseline for comparison across the models. While we might not expect a perfect fit with these baseline data, we can still get quantifiable differences across models for a more robust discussion. See my specific comments further below for more detail on this.
Response: Thank you for this constructive and important suggestion. We agree that a more quantitative evaluation would strengthen the manuscript and improve its usefulness for other researchers.
Our initial hesitation in conducting a quantitative comparison stemmed from the fact that the measurement-based data are not true “ground truth,” and we wanted to avoid implying a level of certainty or accuracy that these data cannot provide. This concern was also raised by Reviewer #1, who agreed that such comparisons should be interpreted with caution. That said, we agree with your point that, even in the absence of true ground truth, the measurement-based estimates can still serve as an independent baseline for quantifying relative differences among models.
In response to this comment, we will therefore treat the measurement-based data as a reference baseline and conduct a quantitative evaluation of model performance. We will focus on relative performance metrics across models rather than absolute accuracy, and we will clearly emphasize the limitations and uncertainties associated with the baseline data in both the Methods and Discussion. We believe this approach balances rigor with appropriate caution and will provide more concrete, comparable information to support model selection and interpretation.
- I also would have liked to see a bit more detail on the environmental/meteorological conditions at these sites across the study periods and how they could be influencing some of the observed patterns (or lack thereof). There are no high temporal resolution data presented (only an overall range or average for each site across the measurement period) on water table depth, temperature (soil/water/air), VPD, or precipitation. As the authors state, some of these parameters exert strong influences on biosphere-atmosphere exchange of water, and some of the tools/models are better or worse at incorporating these impacts. Providing some of these meteorological data alongside the partitioning results and/or evaluating under what conditions certain tools perform better/worse would improve the manuscript and make it more useful for other researchers to help decide which tools might be best in their context. But I recognize that the limited data availability might restrict how much can be learned from this.
Response: Thank you for this constructive suggestion. We agree that providing more detail on environmental and meteorological conditions, and explicitly linking them to model performance, would improve the manuscript’s usefulness.
We note that Reviewer #1 raised related questions regarding model behavior under flooded and evaporation-dominated conditions. Fortunately, we have continuous meteorological measurements and soil moisture data for all sites, as well as manual water table measurements for three of the four sites. While the water table data are not continuous, most measurements coincide with periods when lysimeter or chamber measurements were conducted, which allows for targeted analysis.
In response to this comment, we will expand the Results and Discussion to more explicitly examine how key environmental drivers, including radiation, temperature, soil moisture, water table depth, and atmospheric demand, influence ET partitioning performance. Where possible, we will evaluate the conditions under which specific tools perform better or worse.
We acknowledge that data availability still limits how comprehensively environmental controls can be diagnosed, and we will clearly state these limitations. Nonetheless, we believe that this additional analysis will provide more actionable guidance for researchers selecting ET partitioning methods under similar environmental conditions.
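As an illustration of how such a condition-based evaluation could be organized, the Python sketch below groups daily errors in modelled T:ET by a class of one environmental driver and reports the mean absolute error per class. The function names, the flooded/not-flooded threshold, the sign convention for water table depth, and all numbers are hypothetical placeholders, not data or code from the study.

```python
from collections import defaultdict

def mae_by_condition(modelled, reference, driver, classify):
    """Mean absolute error of T:ET, grouped by a class of an environmental driver."""
    errors = defaultdict(list)
    for m, r, d in zip(modelled, reference, driver):
        if m is None or r is None or d is None:
            continue  # skip days with a missing model, reference, or driver value
        errors[classify(d)].append(abs(m - r))
    return {label: sum(vals) / len(vals) for label, vals in errors.items()}

def flooded_or_not(water_table_cm):
    # Assumed convention: positive values mean the water table is above the surface.
    return "flooded" if water_table_cm > 0 else "not flooded"

# Hypothetical daily values (placeholders only).
modelled_t_et = [0.60, 0.55, 0.70, 0.65, 0.50]
reference_t_et = [0.30, 0.35, 0.65, 0.60, None]  # no reference value on the last day
water_table_cm = [5.0, 2.0, -10.0, -15.0, -12.0]

result = mae_by_condition(modelled_t_et, reference_t_et, water_table_cm, flooded_or_not)
print(result)
```

The same grouping could be repeated with other drivers (for example classes of atmospheric demand or soil moisture) by swapping the driver series and the classifier.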
Overall this study is a very worthwhile effort, but requires a bit more quantitative analysis to realize its full potential.
Response: Thank you for this encouraging assessment. We agree and will include additional quantitative analyses to strengthen the manuscript and better realize its full potential.
Specific comments:
Introduction: Well written, concise introduction with a clear justification for the study and sufficient background information provided. Relevant literature is cited, but it might be interesting to have a look at Speranskaya et al. (2024) also.
Response: Thank you for the positive feedback and for the suggested reference. We will review Speranskaya et al. (2024) and incorporate it where relevant. We agree that including additional recent literature will help better situate our study within the current state of the field, and we will strengthen the Introduction accordingly in the revised manuscript.
Methods
Line 106: While the original ANN-based partitioning code from our 2020 paper unfortunately cannot be made publicly available, a subsequent publication using different machine learning algorithms, but based on the same underlying concept, has since been published (Stapleton et al., 2022). The Python code for this is publicly accessible and linked in that publication. It might be an interesting comparison to include, given that it works on evaporation prediction and doesn’t require ecosystem carbon-water coupling. As far as I can see, all other non-high-frequency methods in this manuscript rely on carbon-water coupling, so it could be interesting to compare with the machine-learning-based method and see if different patterns emerge.
Response: Thank you for this constructive and helpful suggestion. Reviewer #1 also noted that including the method developed by your group would substantially strengthen the manuscript. Given that two of our sites experienced flooded conditions, this approach, which is based on the same underlying concept as the earlier ANN-based method, is particularly relevant and well suited for these environments. We therefore agree that it provides an important and informative comparison with the other methods evaluated in this study, and we will include this method in the revised manuscript.
Line 149, Table 2: Please add information on the measurement period for each site (start and end date)
Response: Thank you for this helpful suggestion. We will add the start and end dates of the measurement period for each site to Table 2 in the revised manuscript.
Line 161: Maybe add the Pastorello et al. (2020) reference for the Fluxnet data processing protocol
Response: Thank you for pointing out this oversight. We agree that this is an important reference and will add Pastorello et al. (2020) to the revised manuscript.
Line 173-175: I understand that the detailed upscaling procedures have been described elsewhere, but I still think the authors should provide a short (couple of sentences) summary of the main steps to help the reader get a sense of what is involved here without having to dig up the other publications and reading up on the detail.
Response: Thank you for this helpful suggestion. Reviewer #1 raised similar questions regarding the construction of the measurement-based T:ET time series, and we agree that providing a brief summary of the upscaling procedure would improve clarity. In the revised manuscript, we will add a short description outlining the main steps involved, while still referring readers to the original publications for full methodological details.
Results and Discussion
Figures 1 and 2: For Bonsai, it looks as though there are more ‘Measurement-based’ black circles in Figure 2 than in Figure 1, especially in the late growing season after Aug 15, but some of the higher values earlier in the growing season are also missing (e.g. a measurement-based value close to 1 around August 1). Why are some of those measurements omitted from Figure 1?
Response: Thank you for pointing out this inconsistency. You are correct that the measurement-based data points should be identical between Figures 1 and 2 for the same site. Upon carefully reviewing our data processing workflow, we identified that this discrepancy was caused by an issue during dataset merging in R. Specifically, some ET partitioning methods only produce outputs for dates with valid model results. When left joins were used in R to align datasets by date, the dataframes were merged in a different order for the Group 1 and Group 2 plots, so some dates were inadvertently excluded. In simple terms, a left join keeps every date in the first (left) dataframe and attaches values from the second dataframe only where the dates match. As a result, any date that is absent from the first dataframe does not appear in the merged dataset.
We found that this issue affected not only the Bonsai site but also a small number of data points at other sites. We will correct the data merging procedure to ensure that the measurement-based time series are consistently represented across all figures, and we will update Figures 1 and 2 accordingly in the revised manuscript.
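The order dependence of a left join can be illustrated with a minimal Python sketch. It is not the R code used in the study; the `left_join` helper, the dates, and the values are hypothetical and serve only to show why merging in a different order drops different dates.

```python
def left_join(left, right):
    """Keep every date in `left`; attach the matching value from `right`, or None."""
    return {date: (value, right.get(date)) for date, value in left.items()}

# Hypothetical T:ET values keyed by date (placeholders only).
measured = {"08-01": 0.95, "08-16": 0.60, "08-20": 0.55}
model_a = {"08-01": 0.70, "08-16": 0.65}  # no valid model output on 08-20

model_first = left_join(model_a, measured)     # model output is the left table
measured_first = left_join(measured, model_a)  # measurements are the left table

print(sorted(model_first))     # ['08-01', '08-16']: the 08-20 measurement is lost
print(sorted(measured_first))  # ['08-01', '08-16', '08-20']: all measurements kept
```

Using the measurement-based series as the left table, or using a full (outer) join, keeps every measurement date regardless of which models produced output on that day.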
There doesn’t seem to be a lot of temporal variability in the measured T:ET values and most of the models don’t capture the temporal variability well. I am wondering, are any of these models actually better than just assuming a constant T:ET ratio? I think it would be good to add that in for comparison.
Response: Thank you for this very good point. We agree that comparing model performance against a constant T:ET assumption would be informative, and we will include this comparison in the revised manuscript.
We would like to clarify, however, that the measurement-based data do exhibit temporal variability at all sites. While the magnitude of variability in T:ET may appear modest, this is partly because T:ET is a ratio bounded between 0 and 1. When expressed in terms of transpiration and evaporation separately, the temporal variability becomes more pronounced. Capturing these variations is therefore still important, both for model evaluation and for assessing whether process-based models are able to reproduce observed dynamics.
We also find that most models do capture some degree of the temporal variability in the measurement-based data, particularly the Group 2 methods, which tend to show stronger temporal fluctuations than the observations. At the same time, we acknowledge that T:ET remains relatively stable over much of the measurement period at several sites.
As a result, whether a constant T:ET assumption is sufficient likely depends on the temporal scale and objectives of a given study. We agree that this is an important point to discuss, and we will explicitly address it in the revised manuscript alongside the added constant T:ET comparison.
I would also like to see a bit more of a quantitative analysis on the performance of the models. At the moment, everything is based on reading off approximate performance from the graphs and is quite qualitative, e.g. lines 290-292: “Both CP and CWSC [...] produced a narrower spread of T:ET estimates than uWUE ...” I understand that the ‘ground truth’ data based on micro-lysimeters and the Shuttleworth-Wallace model is not a true ground truth and is not fully comparable 1:1, but I still think some level of quantitative analysis would be beneficial. While we maybe wouldn’t expect the graphs in Fig 3 to fall on the 1:1 line, even just providing R² values (within and across sites) for how well the models track temporal and spatial patterns would be interesting and will make the resulting discussion more robust. The authors may also want to explore other measures of goodness of fit, which might be more appropriate in this particular situation (e.g. Nash–Sutcliffe coefficient).
Response: Thank you for these constructive and helpful suggestions. We agree that relying primarily on qualitative interpretation limits the robustness of the comparison, and that incorporating quantitative performance metrics would strengthen the analysis.
In the revised manuscript, we will add a quantitative evaluation of model performance using the measurement-based data as a reference baseline, while clearly emphasizing the associated uncertainties. We will explore goodness-of-fit measures that are more appropriate for this application, including the Nash–Sutcliffe efficiency coefficient, as you suggested.
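For reference, the two metrics named above can be written out in a short Python sketch. It uses the standard definitions of the Nash–Sutcliffe efficiency (NSE) and of R² as the squared Pearson correlation; the function names and the example values are hypothetical, and this is not the evaluation code of the study.

```python
def nash_sutcliffe(modelled, reference):
    """NSE = 1 - sum((m - o)^2) / sum((o - mean(o))^2); 1 is a perfect fit."""
    mean_ref = sum(reference) / len(reference)
    sse = sum((m - o) ** 2 for m, o in zip(modelled, reference))
    sst = sum((o - mean_ref) ** 2 for o in reference)
    return 1.0 - sse / sst  # undefined if the reference series is constant

def r_squared(modelled, reference):
    """Squared Pearson correlation between modelled and reference values."""
    n = len(reference)
    mean_m = sum(modelled) / n
    mean_o = sum(reference) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modelled, reference))
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    var_o = sum((o - mean_o) ** 2 for o in reference)
    return cov ** 2 / (var_m * var_o)

# Hypothetical T:ET values: the model follows the temporal pattern exactly
# but is biased high by 0.2 on every day.
reference = [0.4, 0.5, 0.6]
modelled = [0.6, 0.7, 0.8]

r2 = r_squared(modelled, reference)        # close to 1: the pattern is matched
nse = nash_sutcliffe(modelled, reference)  # close to -5: the bias is penalized
print(r2, nse)
```

The example shows why the two metrics are complementary: R² rewards a model for tracking the temporal pattern even when it is strongly biased, whereas NSE falls below zero whenever the model performs worse than the mean of the reference data.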
Line 296-298: I think it is a bit of a stretch to say these models still offer valuable insights. Just from looking at the graphs in Fig 3, it seems to me that assuming a constant T:ET ratio based on an approximate global average of 0.6 (e.g. Wei et al., 2017; Liu et al., 2022) would perform just as well across sites and in some cases also within sites (especially at Brustall). So this statement really needs to be backed up with some quantitative analysis.
Response: Thank you for this important comment. We agree that, as currently presented, the statement that these models offer valuable insights requires stronger quantitative support. We also agree that assuming a constant T:ET ratio, such as a global average value around 0.6, may perform similarly in some cases.
In response to this comment, we will revise the manuscript to explicitly test this hypothesis by including a constant T:ET benchmark in the analysis. We will also revise the text to ensure that any claims regarding the value of these models are supported by quantitative evidence and are framed more cautiously where appropriate.
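One possible form of such a benchmark test is sketched below in Python: a skill score that compares the squared error of a model with that of a constant T:ET of 0.6, the approximate global average mentioned in the comment. The function name and the example values are hypothetical, and the score is only one of several reasonable ways to set up the comparison.

```python
def skill_vs_constant(modelled, reference, constant=0.6):
    """1 - SSE_model / SSE_constant.

    Positive: the model is closer to the reference than the constant benchmark.
    Zero or negative: the constant T:ET performs as well as or better than the model.
    """
    sse_model = sum((m - o) ** 2 for m, o in zip(modelled, reference))
    sse_const = sum((constant - o) ** 2 for o in reference)
    return 1.0 - sse_model / sse_const  # undefined if the reference equals the constant everywhere

# Hypothetical T:ET values (placeholders only).
reference = [0.5, 0.6, 0.7, 0.8]
modelled = [0.55, 0.60, 0.70, 0.75]

skill = skill_vs_constant(modelled, reference)
print(round(skill, 3))
```

A score computed per site and per method would make it explicit where a partitioning method adds information beyond the constant assumption.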
Summary and closing thoughts
Line 305: Again, here it says that the models captured similar temporal patterns of T:ET. I’m not sure what this statement is really based on given that most sites don’t exhibit much temporal variation at all, and at the one site that does (Brustall) only one out of 9 models manages to capture that temporal trend.
Response: Thank you for this clarification. In response, we will revise this section to be more cautious and will reassess temporal performance using quantitative metrics. We will also incorporate a more detailed analysis of the role of meteorological conditions and include the additional machine-learning-based method discussed earlier. Based on these analyses, we will refine the conclusions to more accurately reflect the extent to which temporal variability is captured by the evaluated methods, and under what conditions this occurs.
References:
Liu, Y., Zhang, Y., Shan, N., Zhang, Z., & Wei, Z., 2022. Global assessment of partitioning transpiration from evapotranspiration based on satellite solar-induced chlorophyll fluorescence data. Journal of Hydrology, 612. https://doi.org/10.1016/j.jhydrol.2022.128044
Pastorello, G., Trotta, C., Canfora, E. et al., 2020. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci Data 7, 225. https://doi.org/10.1038/s41597-020-0534-3
Speranskaya, L., Campbell, D. I., Lafleur, P. M., and Humphreys, E. R., 2024. Peatland evaporation across hemispheres: contrasting controls and sensitivity to climate warming driven by plant functional types. Biogeosciences, 21, 1173–1190. https://doi.org/10.5194/bg-21-1173-2024
Stapleton, A., Eichelmann, E., Roantree, M., 2022. A framework for constructing machine learning models with feature set optimisation for evapotraspiration partitioning. Applied Computing and Geoscience, 100105. https://doi.org/10.1016/j.acags.2022.100105
Wei, Z., Yoshimura, K., Wang, L., Miralles, D. G., Jasechko, S., & Lee, X., 2017. Revisiting the contribution of transpiration to global terrestrial evapotranspiration. Geophysical Research Letters, 44(6), 2792–2801. https://doi.org/10.1002/2016GL072235
Citation: https://doi.org/10.5194/egusphere-2025-4252-AC2