Uncertainty Assessment of Satellite Remote Sensing-based Evapotranspiration Estimates: A Systematic Review of Methods and Gaps

Tran, Bich Ngoc; van der Kwast, Johannes; Seyoum, Solomon; Uijlenhoet, Remko; Jewitt, Graham; Mul, Marloes

doi:https://doi.org/10.5194/egusphere-2023-725

Preprints

25 Apr 2023

Abstract. Satellite remote sensing (RS) data are increasingly being used to estimate total evaporation or evapotranspiration (ET) over large regions. Since RS-based ET (RS-ET) estimation inherits uncertainties from several sources, many available studies have assessed these uncertainties using different methods and reference data. However, the suitability of methods and reference data subsequently affects the validity of these evaluations. This study summarizes the status of the various methods applied for uncertainty assessment of RS-ET estimates, discusses the advances and caveats of these methods, identifies assessment gaps, and provides recommendations for future studies. We systematically reviewed 601 research papers published from 2011 to 2021 that assessed the uncertainty or accuracy of RS-ET estimates. We categorized and classified them based on (i) the methods used to assess uncertainties, (ii) the context where uncertainties were evaluated, and (iii) the metrics used to report uncertainties. Our quantitative synthesis shows that the uncertainty assessments of RS-ET estimates are not consistent and comparable in terms of methodology, reference data, geographical distribution, and uncertainty presentation. Most studies used validation methods using Eddy Covariance (EC) based ET estimates as reference. However, in many regions such as Africa and the Middle East, other references are often used due to the lack of EC stations. The accuracy and uncertainty of RS-ET estimates are most often described by Root-Mean-Squared Error (RMSE). When validating against EC-based estimates, the RMSE of daily RS-ET varies greatly among different locations and levels of temporal support, ranging from 0.01 to 6.65 mm/day with a mean of 1.12 mm/day. 
We conclude that future studies need to report the context of validation, the uncertainty of the reference datasets, the mismatch in temporal and spatial scales of reference datasets to that of the RS-ET estimates, and multiple performance metrics with their variation in different conditions and statistical significance to provide a comprehensive interpretation to assist potential users. We provide specific recommendations in this regard. Furthermore, extending the application of RS-ET to regions that lack validation will require obtaining additional ground-based data and combining different methods for uncertainty assessment.

Received: 12 Apr 2023 – Discussion started: 25 Apr 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

20 Dec 2023 | Highlight paper

Uncertainty assessment of satellite remote-sensing-based evapotranspiration estimates: a systematic review of methods and gaps

Bich Ngoc Tran, Johannes van der Kwast, Solomon Seyoum, Remko Uijlenhoet, Graham Jewitt, and Marloes Mul

Hydrol. Earth Syst. Sci., 27, 4505–4528, https://doi.org/10.5194/hess-27-4505-2023, 2023

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-725', Joshua Fisher, 17 May 2023

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-725/egusphere-2023-725-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2023-725-RC1
- AC3: 'Reply on RC1', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your extensive and constructive suggestions. Our responses are in bold below.
  
  This is a great paper giving an overview of remotely sensed ET evaluation approaches in the literature. It’s well-written and interesting. Such an undertaking is certainly a large task so it’s understandable that the authors would miss some literature here and there; I’ve given a few pointers to uncover large missing areas in the literature. That said, I don’t know which of the 601 (plus more coming in revision) papers the authors should cite explicitly in the main text versus refer to implicitly within category, but maybe err on the side of adding more in-text references unless EGUsphere pushes back with a limit? Overall, the paper doesn’t really have a main result other than that different things are different, but the paper will be a great go-to source for those interested in RS-ET. If scientists follow the recommendations, this could help understand results in a relative context.
  
  We appreciate your pointers to some interesting articles. We will review their relevance and eligibility carefully and add references where we find appropriate. Regarding in-text citations, we cite a reference in a sentence where it provides ideas or information that is neither our own nor common knowledge. For some statements, several citations can be used but including all of them could impact the readability of the text. The 601 articles were used to systematically quantify the categories and not all of them directly provided ideas or information to our text. Therefore, we did not cite all of them in the text.
  
  There is some discussion on different time scales of analysis, but perhaps some more extensive commentary on instantaneous vs. temporally upscaled validation would be helpful given that most RS-ET is based on polar orbiting instantaneous measurements.
  
  We also find this a very important point. We will add more commentary on upscaled validation in sections 2.2 and 5.3. Remote sensing data are acquired at the time the satellite passes over the region of interest and thus RS-ET estimates are essentially instantaneous. For many applications, ET estimates at longer intervals (e.g., day, 10 days, month) are required. Therefore, many methods have been developed to upscale instantaneous RS-ET estimates to daily totals. According to many studies that have compared these methods in different settings, the accuracy and applicability of different upscaling methods are affected by several factors related to location (Jiang et al., 2021). These factors include vegetation cover and soil moisture (Gentine et al., 2007; Hoedjes et al., 2008), cloud coverage (Van Niel et al., 2012), cloud frequency (Xu et al., 2015), air pollution (Zhang et al., 2013), the return interval of the satellite (Alfieri et al., 2017), the time of the instantaneous values (Jiang et al., 2021), and the number of instantaneous values used for upscaling (Liu, 2021). Although many common upscaling methods showed adequate performance at most FLUXNET sites, their performance is inadequate at tropical rainforest and monsoon sites (Liu, 2021). Therefore, applying any single temporal upscaling method globally will result in spatially varying uncertainties in RS-ET estimates, as no method has been found to perform equally well on a global scale.
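
As an illustration of one family of the upscaling methods discussed in this reply, the following minimal Python sketch applies the constant evaporative fraction assumption (the quantity analysed by Gentine et al., 2007, and Hoedjes et al., 2008). It is not the method of any particular reviewed study. The function names, the example flux values, and the fixed latent heat of vaporization (2.45 MJ/kg) are assumptions chosen for illustration.

```python
"""Sketch: upscale instantaneous latent heat flux to daily ET with a
constant evaporative fraction (EF). All names and values are illustrative."""

LATENT_HEAT_J_PER_KG = 2.45e6   # assumed latent heat of vaporization
SECONDS_PER_DAY = 86_400


def evaporative_fraction(le_inst, rn_inst, g_inst):
    """EF = LE / (Rn - G), all fluxes in W m-2 at satellite overpass."""
    available_energy = rn_inst - g_inst
    if available_energy <= 0:
        raise ValueError("available energy must be positive to define EF")
    return le_inst / available_energy


def daily_et_mm(le_inst, rn_inst, g_inst, rn_daily_mean, g_daily_mean=0.0):
    """Daily ET in mm/day, assuming EF at overpass holds for the whole day.

    rn_daily_mean and g_daily_mean are 24-hour mean fluxes in W m-2.
    """
    ef = evaporative_fraction(le_inst, rn_inst, g_inst)
    le_daily_mean = ef * (rn_daily_mean - g_daily_mean)  # W m-2
    # W m-2 * s / (J kg-1) = kg m-2, and 1 kg m-2 of water equals 1 mm depth
    return le_daily_mean * SECONDS_PER_DAY / LATENT_HEAT_J_PER_KG


if __name__ == "__main__":
    et = daily_et_mm(le_inst=300.0, rn_inst=550.0, g_inst=50.0,
                     rn_daily_mean=150.0)
    print(f"EF-upscaled daily ET: {et:.2f} mm/day")
```

Because EF is treated as constant over the day, a sketch like this inherits the sensitivities listed in the reply: cloud cover and the time of overpass both affect how representative the instantaneous EF is of the daily value.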
  L31. May want to cite [Fisher et al., 2017].
  
  This seems like common knowledge to us, but we will review the reference and add citations if it provides evidence to this sentence.
  L35. May want to cite [Monteith, 1965; Shuttleworth and Wallace, 1985].
  
  Citations will be added.
  L39. [Fisher et al., 2017].
  
  The citation will be added. The suggested paper is an interesting commentary for readers to refer to.
  L49. Include ECOSTRESS [Fisher et al., 2020].
  
  ECOSTRESS data product will be included.
  Fig 1. This figure seems to be missing a lot of literature, including reviews cited in the text (e.g., Vinukollu; Jimenez; Melton; etc.).
  
  Figure 1 consists of literature review articles only. The purpose is to direct readers to previous literature reviews and distinguish the topics of those literature reviews from our review. We have cross-checked the suggested articles by Vinukollu, Jimenez, and Melton. These are indeed important original research articles that compared different ET products and explored the merging of some products. However, these articles are not literature reviews; we have referred to them in other sections of our review, but not in Figure 1. If we missed some literature review articles, kindly let us know.
  L130. “ET is not measured directly by sensors, but is the result from models or reanalyses, and thus…”
  
  The sentence will be corrected as suggested.
  Section 2.3. We used Gaussian Error Propagation in [Fisher et al., 2005] and Method of Moments in [Fisher et al., 2008].
  
  For the period that we reviewed (2011-2021), these methods were not used. We will mention that these methods have been used before our study period.
  L185. Period.
  
  The sentence will be corrected.
  How do you draw the line between diagnostic models, machine learning models, land surface models, etc.? It’s sometimes a blurry distinction.
  
  There have been many literature reviews that categorized diagnostic ET models, which often differ from each other (Courault et al., 2005; Kalma et al., 2008; Wang and Dickinson, 2012; Zhang et al., 2016; Chen and Liu, 2020). The distinction can be blurry when models fit in more than one category. We can distinguish these types:
  
  • Diagnostic vs. prognostic: Diagnostic models estimate ET at the time of overpass and upscale it to longer periods. Prognostic models use data assimilation to predict temporally continuous ET (Wang and Dickinson, 2012).
  
  • Machine learning models use data-driven algorithms to estimate ET without explicitly representing physical processes; they are trained with ground data.
  
  • Land surface models simulate various processes that occur at the Earth’s land surface, including ET. ET is not the main output of these models and is constrained by initial states and other modelled variables (not only input data).
  
  We consider RS-ET estimates from models that meet two criteria: (1) they aim to estimate ET as the main output (diagnostic), and (2) they use satellite data as input (satellite remote sensing-based). These models fit in the categories reviewed by Courault et al. (2005), Zhang et al. (2016), and Chen and Liu (2020). We will clarify this in section 3.2.
  Figs 5 & 9. I’m not 100% clear on how to read this. It’s not obvious what the top bars correspond to. The figure does not label what are the bottom numbers. It’s not clear what gray vs. black circles are, and what the connecting lines mean. Maybe define TCH/TH in the caption.
  
  We will add more explanation and define TCH/TH in the figure captions. As mentioned in L233, these figures are UpSet plots, described in Lex et al. (2014). Since many articles could be placed in more than one category, UpSet plots are used to show not only the number in each category but also the number of articles in each intersection of the categories (the top bar chart). A combination of more than one category identifies an intersection, which is visualized by the black circles connected with a line. The gray circles indicate that the category is not in the intersection. The bar on top shows the number of articles in each intersection. For example, Figure 5 shows that there are 115 articles that used both intercomparison and validation (intersection of validation and intercomparison).
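
To make the counting behind such a plot concrete, here is a minimal Python sketch of how the top bars (one count per combination of categories) and the per-category totals could be derived. The article IDs and category assignments are hypothetical toy data, not the reviewed articles, and the sketch assumes exclusive intersections (each article counted in exactly one combination); whether Figures 5 and 9 use that mode is an assumption.

```python
"""Sketch: count the exclusive category intersections that an UpSet plot
displays. The article IDs and their categories are made up."""
from collections import Counter


def exclusive_intersections(article_methods):
    """Map each exact combination of categories to its number of articles."""
    return Counter(frozenset(methods) for methods in article_methods.values())


def category_totals(article_methods):
    """Total number of articles per category, regardless of combination."""
    return Counter(m for methods in article_methods.values()
                   for m in set(methods))


# Hypothetical toy data: article ID -> assessment methods used
articles = {
    "A1": {"validation"},
    "A2": {"validation", "intercomparison"},
    "A3": {"validation", "intercomparison"},
    "A4": {"intercomparison"},
    "A5": {"validation", "sensitivity analysis"},
}

if __name__ == "__main__":
    for combo, n in exclusive_intersections(articles).most_common():
        print(sorted(combo), n)
    print(dict(category_totals(articles)))
```

In this toy example, two articles fall in the validation-and-intercomparison combination, which corresponds to a pair of connected black circles with a bar of height two above them.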
  L243. Curious what are those other approaches?
  
  We recorded those approaches in https://doi.org/10.4121/797dcaff-56e3-45ae-a931-f6f4a3135d26.v1
  
  - Validation of sub-modules in ET models (De la Fuente-Sáiz et al., 2017).
  
  - Comparison of the ET partitioning (not total ET) to evaluate uncertainty due to model parameterization (Miralles et al., 2016).
  
  - Deduction of the analytical relationship between latent heat flux and AOI size in SEBAL to assess uncertainty due to change of spatial support (Tang et al., 2013).
  
  - Using Analysis of Variance (ANOVA) to compare the mean total evaporation estimates for the different land cover types between Landsat 8 and MODIS to assess uncertainty due to input data (Shoko et al., 2015).
  
  - Using temporal patterns of ET per crop type to evaluate compound uncertainty (Sun et al., 2017).
  
  - Using spatial pattern metric and empirical Copula densities to evaluate relative uncertainty (Stisen et al., 2021).
  
  Explicitly listing other approaches seems beneficial, so we will mention them briefly in section 4. However, we will not discuss them in as much detail as other approaches since they are less used and often in combination with validation or intercomparison.
  Fig 6. Maybe include a secondary y-axis that is the total #.
  
  We will add the time series of the total number of reviewed articles in Figure 6.
  Fig 7. I’m not seeing the water balance residual papers here?
  
  Figure 7 shows the papers in 4.1.1 (using in-situ measurements). The water balance residual papers are in 4.1.2. We will report the number of papers using water balance residual in 4.1 (L253-254).
  L274. Even smaller with sap flow?
  
  Here, we meant in-situ measurement of ET (sum of soil evaporation, transpiration, and interception), while sap flow only measures transpiration. We will add a few sentences on the in-situ measurement of ET components.
  L308. Slightly misleading because then there was the GRACE-FO mission, which should be mentioned.
  
  The sentence will be rewritten to be more accurate:
  
  “However, the TWSA products only cover the period from 2002 with a gap of 11 months from 2017 to 2018 between the GRACE and GRACE-FO missions.”
  Section 4.1.2. I think you’re missing quite a lot of papers here, so you’ll have to re-search and update.
  
  There are quite a significant number of studies that we reviewed that used WB residual as a reference for validation (N=78). We will include the number of papers with the water balance method in the text to signify this. However, we did not cite all papers because the text is about the caveats and potential improvements of the WB method, and not all of them provide insights on these topics.
  
  Of course, we do not claim that our list is exhaustive. Papers may be missing because their title, abstract, or year of publication did not meet the criteria of our systematic literature search.
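
For readers unfamiliar with the water balance (WB) residual used as a validation reference, the following minimal Python sketch shows the simplified basin-scale form, ET = P − Q − ΔS, together with a first-order (Gaussian) propagation of the uncertainties of its terms. The function names and the numbers are illustrative assumptions, the error terms are assumed to be independent, and the sketch does not reproduce the procedure of any specific reviewed study.

```python
"""Sketch: basin ET as a water balance residual with first-order
(Gaussian) error propagation. Values are illustrative, not from the paper."""
import math


def et_water_balance(precip, runoff, storage_change):
    """ET = P - Q - dS, all terms in mm over the same basin and period."""
    return precip - runoff - storage_change


def et_uncertainty(sigma_precip, sigma_runoff, sigma_storage):
    """Standard uncertainty of the residual, assuming independent errors."""
    return math.sqrt(sigma_precip**2 + sigma_runoff**2 + sigma_storage**2)


if __name__ == "__main__":
    et = et_water_balance(precip=100.0, runoff=30.0, storage_change=10.0)
    sigma = et_uncertainty(sigma_precip=10.0, sigma_runoff=3.0,
                           sigma_storage=5.0)
    print(f"ET = {et:.1f} +/- {sigma:.1f} mm/month")
```

The sketch makes one caveat of the method visible: the uncertainty of the residual is at least as large as the largest uncertainty among precipitation, runoff, and storage change, so the reference itself can be quite uncertain.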
  4.3 out of order.
  
  The ‘uncertainty propagation’ paragraphs will be moved to section 4.5 to be consistent with the order of Figure 5.
  Section 4.7. Yunjun Yao and others have been forging forward with many papers in this realm.
  
  Thank you for pointing out the work by Yao. We will review the papers by this author. We want to note that this section discusses the use of ensembles to assess uncertainties in RS-ET estimates, not the advancements of methods to generate these ensembles. Therefore, papers that aimed to improve ensemble methods but not use them to evaluate uncertainty in RS-ET estimates were not included. We will also change the heading of this section to “Using ensemble of RS-ET estimates” to reflect our objective.
  L556. I think it would also depend on the site. If you’re using a site with low ET, then your RMSE is likely to be low, and vice versa.
  
  We also thought that RMSE depends on the site. In our meta-analysis, we recorded the average of in-situ ET (https://doi.org/10.4121/e6e1713a-0c2b-4775-a7f4-9e6e0b2cf40f.v1). Unfortunately, too many studies did not report this value, so we don’t have sufficient data to compare RMSE with mean ET. Otherwise, it would be an interesting result to test this argument. We made a recommendation to report mean ET in validation studies (L603). We will add this explanation to Section 6.2.
  L581. “in a”
  
  The sentence will be corrected.
  Section 7. One of the major approaches many of us in the community are working towards is improved spatiotemporal resolution of RS-ET. Moving from ECOSTRESS to SBG, multiple Landsats, TRISHNA, LSTM, and Hydrosat. Would that be worth commenting on here?
  
  Thank you for your suggestion. We think that it is best to mention this development in section 5.3.
  L606. Period.
  
  The sentence will be corrected.
  L754. Reference repeated.
  
  Duplication will be removed.
  Here’s a list of more papers to cross-check:
  
  [McCabe and Wood, 2006; Fisher et al., 2009; Glenn et al., 2010; Liang et al., 2010; Blyth and Harding, 2011; Fisher et al., 2011; Jiménez et al., 2011; Mueller et al., 2011; Sahoo et al., 2011; Vinukollu et al., 2011b; Vinukollu et al., 2011a; Polhamus et al., 2012; McCabe et al., 2013; Mueller et al., 2013; Polhamus et al., 2013; Armanios and Fisher, 2014; Chen et al., 2014; Ershadi et al., 2014; Yao et al., 2014; Chen et al., 2015; Feng et al., 2016; McCabe et al., 2016; Michel et al., 2016a; Michel et al., 2016b; Miralles et al., 2016a; Miralles et al., 2016b; Zhang et al., 2016; Yao et al., 2017a; Yao et al., 2017b; Chang et al., 2018; Jiménez et al., 2018; Xu et al., 2018; Gomis-Cebolla et al., 2019; Guillevic et al., 2019; McCabe et al., 2019; Stoy et al., 2019; Pascolini-Campbell et al., 2020; Sadeghi et al., 2020; Wu et al., 2020; Anderson et al., 2021; Bai et al., 2021; Cawse-Nicholson et al., 2021; Melo et al., 2021; Pascolini-Campbell et al., 2021; Pascolini-Campbell et al., 2021; Shang et al., 2021; Tang et al., 2021; Shi et al., 2022; Xie et al., 2022; Yang et al., 2022; Volk et al., 2023]
  
  Thank you for the extensive list of references. We will consider them after reviewing their relevance and eligibility carefully.
  References
  
  Alfieri, J.G., Anderson, M.C., Kustas, W.P. and Cammalleri, C., 2017. Effect of the revisit interval and temporal upscaling methods on the accuracy of remotely sensed evapotranspiration estimates. Hydrology and Earth System Sciences, 21(1), pp.83-98. doi:10.5194/hess-21-83-2017
  
  Courault, D., Seguin, B., Olioso, A.: Review on estimation of evapotranspiration from remote sensing data: From empirical to numerical modeling approaches. Irrig Drainage Syst 19, 223–249. https://doi.org/10.1007/s10795-005-5186-0, 2005.
  
  De la Fuente-Sáiz, D., Ortega-Farías, S., Fonseca, D., Ortega-Salazar, S., Kilic, A., & Allen, R. (2017). Calibration of METRIC Model to Estimate Energy Balance over a Drip-Irrigated Apple Orchard. Remote Sensing, 9(7), 670. doi:10.3390/rs9070670
  
  Gentine, P., Entekhabi, D., Chehbouni, A., Boulet, G. and Duchemin, B., 2007. Analysis of evaporative fraction diurnal behaviour. Agricultural and forest meteorology, 143(1-2), pp.13-29. https://doi.org/10.1016/j.agrformet.2006.11.002
  
  Hoedjes, J.C.B., Chehbouni, A., Jacob, F., Ezzahar, J. and Boulet, G., 2008. Deriving daily evapotranspiration from remotely sensed instantaneous evaporative fraction over olive orchard in semi-arid Morocco. Journal of Hydrology, 354(1-4), pp.53-64. https://doi.org/10.1016/j.jhydrol.2008.02.016
  
  Jiang, L., Zhang, B., Han, S., Chen, H. and Wei, Z., 2021. Upscaling evapotranspiration from the instantaneous to the daily time scale: Assessing six methods including an optimized coefficient based on worldwide eddy covariance flux network. Journal of Hydrology, 596, p.126135. https://doi.org/10.1016/j.jhydrol.2021.126135
  
  Kalma, J.D., McVicar, T.R., McCabe, M.F.: Estimating Land Surface Evaporation: A Review of Methods Using Remotely Sensed Surface Temperature Data. Surv. Geophys. 29, 421–469. https://doi.org/10.1007/s10712-008-9037-z, 2008.
  
  Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R., Pfister, H.: UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics (InfoVis), 20(12), 1983–1992, https://doi.org/10.1109/TVCG.2014.2346248, 2014.
  
  Liu, Z., 2021. The accuracy of temporal upscaling of instantaneous evapotranspiration to daily values with seven upscaling methods. Hydrology and Earth System Sciences, 25(8), pp.4417-4433. https://doi.org/10.5194/hess-25-4417-2021
  
  Miralles, D. G., Jiménez, C., Jung, M., Michel, D., Ershadi, A., McCabe, M. F., … Fernández-Prieto, D. (2016). The WACMOS-ET project – Part 2: Evaluation of global terrestrial evaporation data sets. Hydrology and Earth System Sciences, 20(2), 823–842. doi:10.5194/hess-20-823-2016
  
  Shoko, C., Clark, D., Mengistu, M., Dube, T., & Bulcock, H. (2015). Effect of spatial resolution on remote sensing estimation of total evaporation in the uMngeni catchment, South Africa. Journal of Applied Remote Sensing, 9(1), 095997. doi:10.1117/1.jrs.9.095997
  
  Stisen, S., Soltani, M., Mendiguren, G., Langkilde, H., Garcia, M., & Koch, J. (2021). Spatial Patterns in Actual Evapotranspiration Climatologies for Europe. Remote Sensing, 13(12), 2410. doi:10.3390/rs13122410
  
  Sun, L., Anderson, M. C., Gao, F., Hain, C., Alfieri, J. G., Sharifi, A., … McKee, L. (2017). Investigating water use over the Choptank River Watershed using a multisatellite data fusion approach. Water Resources Research, 53(7), 5298–5319. doi:10.1002/2017wr020700
  
  Tang, R., Li, Z.L., Chen, K.S., Jia, Y., Li, C. and Sun, X., 2013. Spatial-scale effect on the SEBAL model for evapotranspiration estimation using remote sensing data. Agricultural and forest meteorology, 174, pp.28-42.
  
  Van Niel, T.G., McVicar, T.R., Roderick, M.L., van Dijk, A.I., Beringer, J., Hutley, L.B. and Van Gorsel, E., 2012. Upscaling latent heat flux for thermal remote sensing studies: Comparison of alternative approaches and correction of bias. Journal of Hydrology, 468, pp.35-46. https://doi.org/10.1016/j.jhydrol.2012.08.005
  
  Wang, K., Dickinson, R.E.: A review of global terrestrial evapotranspiration: Observation, modeling, climatology, and climatic variability. Rev. Geophys. 50. https://doi.org/10.1029/2011RG000373, 2012.
  
  Xu, T., Liu, S., Xu, L., Chen, Y., Jia, Z., Xu, Z. and Nielson, J., 2015. Temporal upscaling and reconstruction of thermal remotely sensed instantaneous evapotranspiration. Remote Sensing, 7(3), pp.3400-3425. https://doi.org/10.3390/rs70303400
  
  Zhang, K., Kimball, J.S., Running, S.W.: A review of remote sensing based actual evapotranspiration estimation. Wiley Interdisciplinary Reviews: Water 3, 834–853. https://doi.org/10.1002/wat2.1168, 2016.
  
  Zhang, X., Wu, J., Wu, H., Chen, H. and Zhang, T., 2013. Improving temporal extrapolation for daily evapotranspiration using radiation measurements. Journal of Applied Remote Sensing, 7(1), pp.073538-073538. https://doi.org/10.1117/1.JRS.7.073538
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC3
RC2: 'Comment on egusphere-2023-725', Anonymous Referee #2, 22 May 2023

First of all, I would like to extend my congratulations to the authors for their valuable research presented in the article titled "Uncertainty Assessment of Satellite Remote Sensing-based Evapotranspiration Estimates: A Systematic Review of Methods and Gaps." The authors have demonstrated a significant research effort and I acknowledge the extensive work invested in this study. However, I believe it would be beneficial for the ET community if the discussion also focused on the performance and uncertainties of the analysed products/models, as well as the underlying reasons for their uncertainty/performance. In light of this, I have several comments and suggestions that I believe can contribute to enhancing the manuscript.

1. Due to the nature of a systematic review, it is difficult to differentiate between articles that evaluate the performance of existing ET products and ET-based models. It would be very beneficial to clarify the distinction between evaporation products and the models used to estimate ET. Currently, it is challenging for readers to differentiate between them, making it difficult to follow certain ideas. For instance, in Line 231, the authors discuss eight topics for assessing uncertainty in RS-ET, where some points relate to the evaluation of ET products while others relate to the models. It would be beneficial to clearly indicate what is defined as RS-ET in the manuscript and which results are from models or open-access gridded products.

2. The article is lengthy, and it would be beneficial to condense the sections "Theoretical frameworks" and "Systematic quantitative literature review method" for brevity.

3. The manuscript could benefit from discussing which methods and products perform better in specific contexts. It would be helpful to provide insights on the performance of models and products in relation to specific regions, climates, and relevant factors. For example, i) identifying the errors associated with each method/product; ii) the reported advantages and disadvantages of different models/products; iii) important parameters that drive the estimation of ET in existing models; iv) lessons learned from previous evaluations; and v) which models/products have demonstrated higher physical consistency.

4. In the section "Review of methods for RS-ET uncertainty assessment", the authors could focus on the performance of models/products and relate their findings to specific regions and climates when reported. Addressing questions such as which models performed better in certain areas and why, the sources of uncertainty, the relevance of spatio-temporal resolution in operational applications, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance would greatly enhance this section.

5. Consider reducing the use of acronyms that are infrequently mentioned in the manuscript, as it can improve readability and comprehension.

6. The authors should clarify the timeframe of their study. While they mention focusing on the period from 2011, the end date or year is only specified on L187 stating that the databases were last accessed on 21.09.2021. It would be very valuable to update the research up to a more recent date to provide a comprehensive evaluation :)

7. The authors mention using keywords like "accuracy," "bias," and "precision" to assess uncertainty in products, although these terms differ from the proper definition of uncertainty. It would be important to include the term "performance" in the evaluation, as many studies summarize their findings in terms of model or product performance.

8. Section 6, "Results of RS-ET uncertainty assessment," primarily evaluates articles based on RMSE. However, comparing articles solely on RMSE is not very meaningful, as this goodness-of-fit metric does not allow for comparisons across areas with different climates and ET patterns. Therefore, the metrics presented in Table 4 (median, mean, quartiles, standard deviations) and Figure 14, which are grouped by evaluated temporal scale, may be misleading. A valuable recommendation that the authors could include in their article is that researchers should report uncertainty/performance metrics using indices that are comparable across studies and not influenced by regional climate or specific ET patterns. It would also be valuable to discuss which metric is reported to be a better proxy of model/product performance and which models/products performed better.

9. The summary of the manuscript could address relevant questions for researchers and practitioners, such as recommended evaluations for assessing the performance of ET products when data is and is not available.
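
The concern raised in comment 8 can be illustrated with a minimal Python sketch that compares an absolute metric (RMSE) with a relative one (RMSE divided by the mean reference ET) for two sites. The ET values are made up, the function names are hypothetical, and relative RMSE is used here only as one possible example of a scale-independent index, not as the metric recommended by the referee or the authors.

```python
"""Sketch: absolute versus relative error metrics for two hypothetical
sites. The ET values are made up for illustration."""
import math


def bias(est, obs):
    """Mean error (estimate minus reference), in the units of the data."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)


def rmse(est, obs):
    """Root-mean-squared error, in the units of the data."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def relative_rmse(est, obs):
    """RMSE divided by the mean of the reference values (dimensionless)."""
    return rmse(est, obs) / (sum(obs) / len(obs))


# Daily ET in mm/day; both sites get the same 10 % overestimate
low_obs = [1.0, 2.0, 1.0, 2.0]
high_obs = [5.0, 10.0, 5.0, 10.0]
low_est = [1.1 * o for o in low_obs]
high_est = [1.1 * o for o in high_obs]

if __name__ == "__main__":
    for name, est, obs in [("low-ET site", low_est, low_obs),
                           ("high-ET site", high_est, high_obs)]:
        print(name, round(rmse(est, obs), 3),
              round(relative_rmse(est, obs), 3))
```

With identical proportional errors, the high-ET site shows a much larger RMSE than the low-ET site, while the relative RMSE is the same for both. This is why reporting the mean reference ET alongside RMSE helps readers compare results across sites.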
___________________________________________________________________________________________________________________
Furthermore, I would like to provide a few additional minor suggestions for improvement:

L10: The authors can emphasise here that evapotranspiration is often referred to as evaporation. As it is currently written, it seems that the authors are referring to both evaporation and evapotranspiration.

L39-42: Here, the authors mention some methods, but the list is not exhaustive. They can add GLEAM to this list for example, which is a well-known method that drives an ET product with the same name.

L44-45: This sentence can be rewritten for better clarity.

L46: The authors mention that retrieving ET estimates from some models requires expertise about the models. However, this is true for every model, so this sentence can be deleted.

L50: This sentence is a bit convoluted. It would be helpful if the authors could clarify their intended meaning.

L51-53: Here, the authors mention that uncertainty assessment helps data users determine the level of confidence they can have in ET estimates and inferred information about water resources. Since readers of this article may be researchers exploring ET products and models for the first time, it would be a good idea to mention that the use of the products is also limited by their spatio-temporal resolution, specific applications, and latency.

L55: "foci" should be changed to "focus." The focus of multiple articles is explained in Table S2.

L59: What do the authors mean by "spatial data production"?

L60: What do the authors mean by "a good practice protocol for operational validation"?

L59: What do the authors mean by "complete documentation"?

Figure 1: This figure is very good and helpful. It will surely assist readers in accessing previous literature review articles. Could the authors complete the list of existing manuscripts related to the review of RS-ET estimation, uncertainty, and validation of products (and models)?

L130: "reanalyzes" should be "reanalyses." Additionally, could the authors rewrite this sentence to better explain what is considered a high level of processing?

L143: What about replacing "true" with "more accurately representing the ET values"?

Figure 3: Please replace "support" with "resolution." Why does the model calculation not have a number? There is uncertainty regarding whether the model is able to resemble physical processes or not. Finally, the authors can mention in the figure that compound uncertainty is the sum of all other uncertainties.

L153: Why specifically refer to Monte Carlo when there are more advanced techniques to assess uncertainty propagation?

L165: Can the authors add a sentence on how the definition of validation has changed over time?

L170: This sentence is not very clear to me. What do the authors mean by "model validation and data" in this context? Maybe the parentheses are misplaced and disrupt the flow of the sentence?

L171-172: Can this sentence be deleted? I think the idea is clearly explained in the following sentences.

L176: This sentence could be rewritten for clarity. Something like: "Validating a model used to derive ET estimates does not necessarily imply that it can be used with different forcing data and provide accurate results. Therefore, when a model is applied to derive ET estimates with different forcings or in different settings, its performance must be evaluated." In the current version, it is difficult to disentangle what is a model, an ET product, and a product based on running the model with different forcings :)

L183: "by" instead of "tby"

L189-190 and Table 1: It would be interesting to know how these terms were chosen. What about other terms like "performance," "quality" (alone), and "error"?

L200: "process" instead of "system."

Figure 4: What does "not using the same method to report uncertainty" mean?

Figure 5 and 9: Why are there 38 articles without any link to a topic? Could the authors provide an explanation in the caption?

Figure 6: It is difficult to see the low values on the graph. Maybe consider using a barplot to visualise this more straightforwardly.

L256: "Estimate ET" instead of "observe ET." Remember that ET cannot be directly observed ;)

Figure 7: I really liked this figure! In the caption, the authors can add an explanation of the "others" category. Are irrigation and water balance articles combined in this category?

L303: Here, the authors could briefly mention the assumptions of the simplified water balance.

L309: Still less known compared to what? Maybe rephrase the sentence to clarify.

L342: Some acronyms are introduced more than once, e.g., SA.

L447: Maybe consider renaming this subsection to something other than "Research Objectives." For example: "Assessment based on the objectives of the analysed manuscripts."

L580: There is a missing space between "in" and "a."

L580-581: I completely agree that further research should combine local and global evaluation efforts, but including a reason for this in the text could be very beneficial for the readers.

L593-593: I do not completely agree with this argument. The RMSE ranges could serve as a baseline, but we have to keep in mind that they are not directly comparable.

L601: What do the authors mean by "matched as much as possible"?

L602: I do not completely agree with this statement. We should report metrics that enable a fair comparison between regions with different climates/patterns.

L605: What do the authors mean by this statement? Please provide further clarification.

L611-613: I did not understand this sentence. Could the authors please provide additional clarification or rephrase the sentence for clarity?

I hope these comments and suggestions are helpful in improving the manuscript. Once again, congratulations to the authors on their research, and I look forward to reading a revised version of the manuscript.

Citation: https://doi.org/10.5194/egusphere-2023-725-RC2
- AC2: 'Reply on RC2', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your extensive and constructive suggestions. Our responses are in bold below.
  
  *Major comments*
  
  1. Due to the nature of a systematic review, it is difficult to differentiate between articles that evaluate the performance of existing ET products and ET-based models. It would be very beneficial to clarify the distinction between evaporation products and the models used to estimate ET. Currently, it is challenging for readers to differentiate between them, making it difficult to follow certain ideas. For instance, in Line 231, the authors discuss eight topics for assessing uncertainty in RS-ET, where some points relate to the evaluation of ET products while others to the models. It would be beneficial to clearly indicate what is defined as RS-ET in the manuscript and which results are from models or open-access gridded products.
  
  We will clarify in section 3.2 that we focus on RS-ET estimates, which are defined as ET values or maps obtained from an RS-based data product or by implementing an RS-based model. 17% of the articles assessed open-access gridded products (Figure 8). There are also studies that compare both data products and model outputs.
  2. The article is lengthy, and it would be beneficial to condense the sections "Theoretical frameworks" and "Systematic quantitative literature review method" for brevity.
  
  We will review these sections and reduce the text where it is unnecessarily wordy.
  
  3. The manuscript could benefit from discussing which methods and products perform better in specific contexts. It would be helpful to provide insights on the performance of models and products in relation to specific regions, climates, and relevant factors. For example, i) identifying the errors associated with each method/product; ii) the reported advantages and disadvantages of different models/products; iii) important parameters that drive the estimation of ET in existing models; iv) lessons learned from previous evaluations; and v) which models/products have demonstrated higher physical consistency.
  The suggested topics are important. However, they were not the objectives of this manuscript. Our goal in this study is to investigate the status of the various methods applied for uncertainty assessment of RS-ET estimates, discuss the advances and caveats of these methods, identify assessment gaps, and provide recommendations for future assessment. Our argument is that because these models and products are evaluated using different assessment methods and reference data, it is not reliable to rank their performance and generalize the conclusion to all contexts.
  
  Furthermore, many literature reviews (Figure 1) have discussed some of these topics repetitively:
  
  i) identifying the errors associated with each method/product
  
  ii) the reported advantages and disadvantages of different models/products
  
  iii) important parameters that drive the estimation of ET in existing models
  
  Regarding the performance of models and products in relation to specific regions, climates, and relevant factors and v) which models/products have demonstrated higher physical consistency, we prefer not to draw conclusions from the reviewed literature because not all models have been compared simultaneously. We do think that this could be the focus of a different paper.
  
  However, we do think that “iv) lessons learned from previous evaluations” could be relevant to our manuscript. We have discussed some in section 7. We will extend our discussion to emphasize the role of developing uncertainty assessment methods to investigate the other topics that you mentioned.
  4. In the section "Review of methods for RS-ET uncertainty assessment", the authors could focus on the performance of models/products and relate their findings to specific regions and climates when reported. Addressing questions such as which models performed better in certain areas and why, the sources of uncertainty, the relevance of spatio-temporal resolution in operational applications, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance would greatly enhance this section.
  Section 4, “Review of methods for RS-ET uncertainty assessment”, focuses on the methods of uncertainty assessment (how each method was applied in the reviewed literature), not the results of those assessments per se, which are discussed in Section 6. Therefore, we do not think this section should focus on the performance of models/products and their relation to specific regions and climates. As we point out in our response above, this is clearly an important topic that could be addressed by another article or even a special issue.
  
  We want to emphasize that other literature reviews (Figure 1) focused on the performance of RS-ET models/products, while our review discusses the methods to assess them, as we have outlined in the research questions. The research questions suggested by the reviewers are important to investigate, and we will add them to our recommendation for future assessments. However, we found that our methods did not aim to answer these questions (i.e., Which models performed better in certain areas and why, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance) and consequently our results do not address them. There are gaps in uncertainty assessment in terms of geographical regions and models, and also inconsistency in methods (Sections 5 and 6). We believe that it is unfair to conclude which models performed better based on the current literature. Furthermore, as reviewer 1 also mentioned, the RMSE depends on the value of ET.
  
  We have discussed the sources of uncertainty in sections 2.2 and 5.2.
  
  We will elaborate on the discussion on the relevance of spatio-temporal resolution in operational applications in section 5.3.
  5. Consider reducing the use of acronyms that are infrequently mentioned in the manuscript, as it can improve readability and comprehension
  
  We agree that some acronyms are used infrequently and are not necessary. We will not use the following acronyms in the revision:
  
  • Essential Climate Variables (ECVs)
  
  • Monte Carlo method (MCM)
  
  • Sensitivity Analysis (SA)
  
  • Systematic Quantitative Literature Review (SQLR)
  
  • Web of Science (WoS)
  6. The authors should clarify the timeframe of their study. While they mention focusing on the period from 2011, the end date or year is only specified on L187 stating that the databases were last accessed on 21.09.2021. It would be very valuable to update the research up to a more recent date to provide a comprehensive evaluation :)
  
  We agree that we should clarify the period of our study. 21.09.2021 is the date we started the literature analysis. L187 was rewritten as follows:
  
  “The search result was limited to a publication date from 2011 until 21.09.2021 and then refined using the available filters of Scopus and Web of Science.”
  
  As the body of literature is huge and growing faster than ever (Annex 2), there will always be a gap between the last date of articles accessed and the most recent literature by the time the analysis is complete. At this stage, we consider more than 600 articles and one decade of recent literature is extensive and comprehensive enough to provide conclusions to our research objectives.
  
  7. The authors mention using keywords like "accuracy," "bias," and "precision" to assess uncertainty in products, although these terms differ from the proper definition of uncertainty. It would be important to include the term "performance" in the evaluation, as many studies summarize their findings in terms of model or product performance.
  
  We are not sure if this comment relates to the search terms in Table 1. The definitions of these terms are indeed different from ‘uncertainty’ but, as discussed in Section 2.1, they are used by various authors to describe uncertainty. We acknowledge that the term “performance” is also often used. Studies that use the term “performance” usually also mention “uncertainty”, “accuracy”, “data quality”, “variability”, “reliability”, “evaluat*”, and “validat*” in their titles or abstracts. These variants of the ‘uncertainty’ keyword were selected by iterating our search several times until the results included all the articles in Supplementary Information Annex 1. Since we combined these terms with “OR” in our search query, we have included all the articles that use one or more of these terms. We will do a new search with “performance” to double-check.
  
  8. Section 6, "Results of RS-ET uncertainty assessment," primarily evaluates articles based on RMSE. However, comparing articles solely on RMSE is not very meaningful, as this goodness-of-fit metric does not allow for comparisons across areas with different climates and ET patterns. Therefore, the metrics presented in Table 4 (median, mean, quartiles, standard deviations) and Figure 14, which are grouped by evaluated temporal scale, may be misleading. A valuable recommendation that the authors could include in their article is that researchers should report uncertainty/performance metrics using indices that are comparable across studies and not influenced by regional climate or specific ET patterns. It would also be valuable to discuss which metric is reported to be a better proxy of model/product performance and which models/products performed better.
  
  We did not aim to compare articles or models/products based on RMSE. We agree that it is not fair to compare models/products across areas with different climates and ET patterns using solely RMSE. The purpose of Figure 14 and Table 4 is to identify the typical range of reported uncertainty in RS-ET estimates globally (our third research question). The spread of the RMSE is partially due to the effect of different climates and site-specific conditions. This is why we did not use our results to conclude which models/products perform better or worse than others. We will clarify this important issue in Section 6.2.
  
  We find that grouping the reported RMSE by temporal scale is valuable to show the effect of temporal upscaling across hundreds of studies. We will include a discussion on this aspect, as also suggested by Reviewer 1.
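  The effect of temporal upscaling on RMSE can be sketched with synthetic data. The example below is only an illustration, not an analysis from the reviewed articles: the seasonal reference series, the 1 mm/day error level, and the 30-day "months" are all assumptions, and the errors are assumed to be independent from day to day.

```python
import numpy as np

# Hypothetical setup: 24 "months" of 30 days each (not real data).
rng = np.random.default_rng(42)
n_months, days_per_month = 24, 30
t = np.arange(n_months * days_per_month)

# Synthetic reference daily ET (mm/day) with a simple seasonal cycle.
reference = 3.0 + 1.5 * np.sin(2 * np.pi * t / 360.0)
# Synthetic RS-ET estimate: reference plus independent random error (sd = 1 mm/day).
estimate = reference + rng.normal(0.0, 1.0, t.size)


def rmse(est, ref):
    """Root-mean-squared error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((est - ref) ** 2)))


# RMSE at daily scale.
rmse_daily = rmse(estimate, reference)

# Aggregate both series to monthly means (still in mm/day), then recompute RMSE.
ref_monthly = reference.reshape(n_months, days_per_month).mean(axis=1)
est_monthly = estimate.reshape(n_months, days_per_month).mean(axis=1)
rmse_monthly = rmse(est_monthly, ref_monthly)

print(f"daily RMSE:   {rmse_daily:.2f} mm/day")
print(f"monthly RMSE: {rmse_monthly:.2f} mm/day")
```

  With independent errors, averaging over a month reduces the random component of the RMSE. A systematic bias would not average out in this way, so the reduction seen in real validation studies may be smaller.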
  
  It is indeed valuable to report uncertainty using metrics that are comparable across studies, in order to assess which models/products perform better in different contexts. We will add this to our recommendations in Section 7.
  *Minor comments*
  
  L10: The authors can emphasise here that evapotranspiration is often referred to as evaporation. As it is currently written, it seems that the authors are referring to both evaporation and evapotranspiration.
  
  The sentence will be rewritten as “Satellite remote sensing (RS) data are increasingly being used to estimate total evaporation, often referred to as evapotranspiration (ET), over large regions.”
  L39-42: Here, the authors mention some methods, but the list is not exhaustive. They can add GLEAM to this list for example, which is a well-known method that drives a ET product with the same name.
  
  The list is definitely not exhaustive. We will add GLEAM and PT-JPL.
  
  L44-45: This sentence can be rewritten for better clarity.
  
  L46: The authors mention that retrieving ET estimates from some models requires expertise about the models. However, this is true for every model, so this sentence can be deleted.
  L44-46 will be rewritten as “Furthermore, retrieving ET estimates requires access to the data, software or source code, and expertise in these models. The limited accessibility of RS-ET models leads to significant challenges to operational applications of RS-ET estimates (e.g., irrigation scheduling and drought monitoring).”
  L50: This sentence is a bit convoluted. It would be helpful if the authors could clarify their intended meaning.
  
  L50 will be rewritten as “Given that more RS-ET data products are becoming available, information about the uncertainties in RS-ET estimates is important for data users (i.e., water managers and policymakers) to apply them properly.”
  L51-53: Here, the authors mention that uncertainty assessment helps data users determine the level of confidence they can have in ET estimates and inferred information about water resources. Since readers of this article may be researchers exploring ET products and models for the first time, it would be a good idea to mention that the use of the products is also limited by their spatio-temporal resolution, specific applications, and latency.
  
  Good point. We will add this sentence after L51-53: “Inferences based on RS-ET data products are also limited by their spatio-temporal resolution, latency, and specifications.”
  
  L55: "foci" should be changed to "focus." The focus of multiple articles is explained in Table S2.
  
  Since we mean to say that each of these reviews has a different focus, we want to keep the plural form of the word. But we understand that this collocation of words might sound odd to some readers. We will change “foci” to “main topics”.
  L59: What do the authors mean by "spatial data production"?
  
  We mean the generation of spatial data, which also covers methods other than remote sensing.
  
  L60: What do the authors mean by "a good practice protocol for operational validation"?
  
  An operational validation workflow as defined by Bayat et al. (2021) has four components, one of which is based on a good practice protocol for validation agreed upon by the community. A good practice protocol for validation is a set of guidelines that are known to produce reliable validation results. For example, the authors have pointed to good practice protocol for validation of Land Surface Temperature (Guillevic et al., 2018), Surface Albedo (Wang et al., 2019), Leaf Area Index (Fernandes et al., 2014), Soil Moisture (Gruber et al., 2020).
  L59: What do the authors mean by "complete documentation"?
  
  Documentation of the ET estimation that provides sufficient information for data users to judge the accuracy and representativeness of the estimates. Allen et al. (2011) have recommended which information to be included in such documentation.
  
  Figure 1: This figure is very good and helpful. It will surely assist readers in accessing previous literature review articles. Could the authors complete the list of existing manuscripts related to the review of RS-ET estimation, uncertainty, and validation of products (and models)?
  
  Figure 1 consists only of literature review articles. The purpose is to direct readers to previous literature reviews and distinguish the topics of those literature reviews from our review. We will search for other relevant review articles to extend the list. It would be very helpful if you could point to the review articles that you find missing.
  L130: "reanalyzes" should be "reanalyses."
  
  We will change to “reanalyses”.
  
  Additionally, could the authors rewrite this sentence to better explain what is considered a high level of processing?
  
  By ‘level of processing’, we meant that they are model outputs or results from analyses of less processed data, and we referred to data user guides by ESA and NASA. The sentence is rewritten as follows:
  
  “ET is not measured directly by sensors but results from models or reanalyses; thus, RS-ET data products are considered by data providers to have a high level of processing (ESA, 2021; NASA, 2021).”
  
  L143: What about replacing "true" with "more accurately representing the ET values"?
  
  Yes. It sounds clearer.
  
  Figure 3: Please replace "support" with "resolution."
  
  We understand why “resolution” is suggested because it is linked to the resampling of RS data. ‘Resolution’ is how detailed RS data are, measured by the size of the pixel, while ‘support’ is the volume, shape, size, and orientation that a measurement represents. In RS data, these two are similar because the support of the ET value in a pixel is also the size of that pixel. However, we wanted to use “support” here because when ET estimates are derived from RS data or validated with reference data, uncertainty also arises from a ‘change of spatial support’ from the pixel size to the footprint size of the measurement. We will use “scale” because this term is more general and includes both “resolution” and “support” (also “extent” and “spacing”) (Blöschl and Sivapalan, 1995). We will also add a footnote to clarify the terminology in the text of Section 2.2.
  Why does the model calculation not have a number? There is uncertainty regarding whether the model is able to resemble physical processes or not.
  
  In remote sensing literature, “uncertainty regarding whether the model is able to resemble physical processes or not” is less often acknowledged (Povey and Grainger, 2015; Foody and Atkinson, 2003) than in hydrological modeling (Liu and Gupta, 2007; Nearing et al., 2014). This is because RS retrieval models usually share common concepts or formulas, especially for low-level data products (e.g., Surface Radiance, NDVI). Since we have argued before that high-level RS data such as ET are outputs of models that often have different concepts and assumptions (e.g., SEB vs. PM), we should indeed include uncertainty from the ‘model conceptualization’, especially for the RS-ET processing chain. We will add “model conceptualization” linked with “model calculation” in the figure. We will add this explanation to Section 2.2 as well.
  
  Finally, the authors can mention in the figure that compound uncertainty is the sum of all other uncertainties.
  
  We will mention that compound uncertainty is the aggregation of all other uncertainties in the figure caption.
  
  L153: Why specifically refer to Monte Carlo when there are more advanced techniques to assess uncertainty propagation?
  
  It is the method we observed most frequently when reviewing the literature. We will mention more advanced techniques for readers.
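  For readers unfamiliar with the approach, Monte Carlo propagation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the model is a toy energy-balance residual (LE = Rn - G - H), the function name `monte_carlo_le` is hypothetical, the input values and standard deviations are invented, and input errors are assumed to be Gaussian and independent.

```python
import numpy as np


def monte_carlo_le(rn, g, h, sd_rn, sd_g, sd_h, n=100_000, seed=0):
    """Propagate input uncertainty through LE = Rn - G - H (all in W m-2).

    Hypothetical helper: draws n Gaussian samples of each input,
    evaluates the toy model for every draw, and summarizes the output.
    """
    rng = np.random.default_rng(seed)
    rn_samples = rng.normal(rn, sd_rn, n)  # net radiation
    g_samples = rng.normal(g, sd_g, n)     # soil heat flux
    h_samples = rng.normal(h, sd_h, n)     # sensible heat flux
    le = rn_samples - g_samples - h_samples  # latent heat flux per draw
    return float(le.mean()), float(le.std(ddof=1))


# Invented input values and uncertainties, for illustration only.
mean_le, sd_le = monte_carlo_le(
    rn=500.0, g=50.0, h=150.0, sd_rn=25.0, sd_g=10.0, sd_h=30.0
)
print(f"LE = {mean_le:.1f} +/- {sd_le:.1f} W m-2")
```

  For this linear toy model the sampled spread should approach the analytical value, the square root of the summed input variances. The sampling approach becomes more useful when the model is nonlinear or the inputs are correlated, in which case the sampling step would need to reflect those correlations.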
  
  L165: Can the authors add a sentence on how the definition of validation has changed over time?
  
  The sentence will be rewritten as “However, the definition of validation in modeling has become more well-defined over time and is context-dependent (Bellocchi et al., 2011)”. This reference also summarizes different definitions in Table 1.
  
  L170: This sentence is not very clear to me. What do the authors mean by "model validation and data" in this context? Maybe the parentheses are misplaced and disrupt the flow of the sentence?
  
  We understand the confusion. The sentence will be rewritten as “Since RS-ET retrieval models can be used with different sets of satellite data, validation of model and validation of data (i.e., model result or output) should be distinguished.” The paragraph continues to explain what we mean by “model validation/validation of model” and “validation of data/model results”.
  L171-172: Can this sentence be deleted? I think the idea is clearly explained in the following sentences.
  
  We will remove this sentence.
  
  L176: This sentence could be rewritten for clarity. Something like: "Validating a model used to derive ET estimates does not necessarily imply that it can be used with different forcing data and provide accurate results. Therefore, when a model is applied to derive ET estimates with different forcings or in different settings, its performance must be evaluated." In the current version, it is difficult to disentangle what is a model, an ET product, and a product based on running the model with different forcings :)
  
  Thank you for your suggestion. We will rewrite the sentence as “Validating an RS-ET model does not imply that the model can be applied with any forcing data and produce accurate outputs. Therefore, when a model is applied to derive ET estimates with different forcing data or settings, the model output must be evaluated.” Also, in the introduction, we will clarify what we mean by “data product”.
  
  L183: "by" instead of "tby"
  
  This will be corrected.
  
  L189-190 and Table 1: It would be interesting to know how these terms were chosen. What about other terms like "performance," "quality" (alone), and "error"?
  
  We explained this in L194-195. We will move these lines to the previous paragraph.
  
  L200: "process" instead of "system."
  
  We will change that.
  
  Figure 4: What does "not using the same method to report uncertainty" mean?
  
  For the meta-analysis, we wanted to include studies that assess uncertainty using the same approach (validation), reference data (Eddy Covariance), and metrics. We will add this explanation to the caption.
  
  Figure 5 and 9: Why are there 38 articles without any link to a topic? Could the authors provide an explanation in the caption?
  
  Thank you very much for pointing this out. We realized that these are the articles excluded after scanning the full text, which is why they are not linked with any topic. We made the mistake of not excluding them when visualizing the dataset. We will correct this as well as Figure 4 (the number n=35 was supposed to be 38).
  Figure 6: It is difficult to see the low values on the graph. Maybe consider using a barplot to visualise this more straightforwardly.
  
  We will update the graph to make the low values more visible.
  
  L256: "Estimate ET" instead of "observe ET." Remember that ET cannot be directly observed ;)
  
  Indeed. We will change that.
  
  Figure 7: I really liked this figure! In the caption, the authors can add an explanation of the "others" category. Are irrigation and water balance articles combined in this category?
  
  Thank you. We will add an explanation of the “others” category. The irrigation water balance is different from the catchment water balance (Section 4.1.2). These papers used measurements of rainfall, irrigation, and drainage of agricultural plots to derive ET and did not use a lysimeter, so we put them in a different category. Agricultural plots are at a similar spatial scale to Scintillometer and Eddy Covariance measurements, so we consider these in-situ references. We will clarify this in the figure caption and in the text of section 4.1.
  
  L303: Here, the authors could briefly mention the assumptions of the simplified water balance.
  
  We will add that to the text.
  
  L309: Still less known compared to what? Maybe rephrase the sentence to clarify.
  
  We consider that it is more challenging to estimate uncertainty in the gap-filled dS/dt than in the original dS/dt. We will rewrite the sentence as follows: “Some techniques have been developed to reconstruct this gap in the GRACE time series (e.g., Yang et al., 2021). However, the uncertainties in gap-filled dS/dt estimates are still less well known than those in the initial estimates from GRACE and GRACE-FO (Boergens et al., 2022).”
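  For context, the role of dS/dt can be seen from the catchment water balance written below. This is the standard textbook form, not an equation taken from the manuscript. The symbols P (precipitation), Q (runoff), and the σ terms are introduced here for illustration, and the variance expression additionally assumes that the errors in the three terms are independent.

```latex
\begin{align}
ET &= P - Q - \frac{dS}{dt} \\
\sigma_{ET}^{2} &\approx \sigma_{P}^{2} + \sigma_{Q}^{2} + \sigma_{dS/dt}^{2}
\end{align}
```

  Under that independence assumption, any additional uncertainty in gap-filled dS/dt adds directly to the variance of the water-balance ET that serves as reference.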
  
  L342: Some acronyms are introduced more than once, e.g., SA.
  
  We will double-check the use of acronyms and avoid introducing them more than once.
  L447: Maybe consider renaming this subsection to something other than "Research Objectives." For example: "Assessment based on the objectives of the analysed manuscripts."
  
  Indeed, the subsection heading does sound a little confusing. We will change it to “Objectives of the reviewed articles”
  
  L580: There is a missing space between "in" and "a."
  
  We will correct that.
  
  L580-581: I completely agree that further research should combine local and global evaluation efforts, but including a reason for this in the text could be very beneficial for the readers.
  
  We will include more justifications for this in the text.
  
  L593-593: I do not completely agree with this argument. The RMSE ranges could serve as a baseline, but we have to keep in mind that they are not directly comparable.
  
  The sentence will be rewritten as “The RMSE range reported in our study can be used as a baseline for future studies that validate RS-ET estimates using Eddy Covariance.”
  L601: What do the authors mean by "matched as much as possible"?
  
  We will rewrite this sentence to improve clarity as follows “Upscaling methods should be applied to RS-ET data to derive estimates at the temporal and spatial scale of reference datasets.”
  
  L602: I do not completely agree with this statement. We should report metrics that enable a fair comparison between regions with different climates/patterns.
  
  We agree that to compare uncertainties of ET between regions with different climates (thus, different ranges of ET), we need to use scale-independent metrics. We will rewrite the recommendations as follows:
  
  “• The four common metrics (RMSE, bias/mean error, correlation coefficient, coefficient of determination), mean ET, the number of data points, and statistical significance test should be reported.
  
  • In addition, uncertainties in RS-ET estimates should be characterized using multiple metrics that are scale-independent to enable comparison between regions with different ranges of ET.”
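  As an illustration of such reporting, the sketch below computes the common metrics together with two scale-independent ones. It is a minimal example, not a prescribed set: the function name `summarize_agreement` is hypothetical, the coefficient of determination is taken here as the squared Pearson correlation, and relative RMSE and percent bias are just two possible choices of scale-independent metrics.

```python
import numpy as np


def summarize_agreement(estimate, reference):
    """Summarize agreement between RS-ET estimates and reference ET.

    Hypothetical helper; both inputs must have the same length and units.
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    diff = est - ref
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    r = float(np.corrcoef(est, ref)[0, 1])  # Pearson correlation coefficient
    mean_ref = float(ref.mean())
    return {
        "n": int(ref.size),            # number of data points
        "mean_ref": mean_ref,          # mean reference ET
        "bias": float(diff.mean()),    # mean error (estimate minus reference)
        "rmse": rmse,
        "r": r,
        "r2": r ** 2,                  # assumed here: squared Pearson r
        "rel_rmse": rmse / mean_ref,   # RMSE relative to mean reference ET
        "pbias": float(100.0 * diff.sum() / ref.sum()),  # percent bias
    }


# Invented values in mm/day, for illustration only.
stats = summarize_agreement([1.5, 2.5, 3.5, 4.5], [1.0, 2.0, 3.0, 4.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

  Reporting the mean reference ET and the number of data points alongside RMSE lets readers derive relative metrics themselves when comparing sites with different ET ranges.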
  
  L605: What do the authors mean by this statement? Please provide further clarification.
  
  We will rewrite the statement as “Validation of RS-ET models and data products should be reported at different levels of spatial and temporal scales, covering multiple locations.”
  L611-613: I did not understand this sentence. Could the authors please provide additional clarification or rephrase the sentence for clarity?
  
  We will rewrite L610-L615 as follows:
  
  “Several studies have aimed to offer spatially explicit uncertainty in thematic classification, such as land cover and soil type. These studies, like the ones mentioned by Woodcock (2002), have primarily focused on qualitative mapping techniques. However, for quantitative remote sensing, which involves mapping continuous variables like ET, there is a need for methods that can effectively characterize spatially explicit uncertainty. Therefore, we strongly recommend the development and application of methods to evaluate spatiotemporal uncertainty in RS-ET datasets.”
  References
  
  Allen, R.G., Pereira, L.S., Howell, T.A. and Jensen, M.E., 2011. Evapotranspiration information reporting: II. Recommended documentation. Agricultural Water Management, 98(6), pp.921-929.
  
  Blöschl, G. and Sivapalan, M., 1995. Scale issues in hydrological modelling: a review. Hydrological Processes, 9(3‐4), pp.251-290. https://doi.org/10.1002/hyp.3360090305
  
  Boergens, E., Kvas, A., Eicker, A., Dobslaw, H., Schawohl, L., Dahle, C., Murböck, M. and Flechtner, F., 2022. Uncertainties of GRACE‐Based Terrestrial Water Storage Anomalies for Arbitrary Averaging Regions. Journal of Geophysical Research: Solid Earth, 127(2), p.e2021JB022081.
  
  Gruber, A., De Lannoy, G., Albergel, C., Al-Yaari, A., Brocca, L., Calvet, J.C., Colliander, A., Cosh, M., Crow, W., Dorigo, W. and Draper, C., 2020. Validation practices for satellite soil moisture retrievals: What are (the) errors? Remote Sensing of Environment, 244, p.111806.
  
  Liu, Y.Q. and Gupta, H.V., 2007. Uncertainty in hydrologic modeling: toward an integrated data assimilation framework. Water Resources Research, 43(7), W07401. https://doi.org/10.1029/2006WR005756
  
  Guillevic, P., Göttsche, F., Nickeson, J. and Román, M. (Eds.), 2018. Best Practice for Satellite-Derived Land Product Validation. Land Product Validation Subgroup (WGCV/CEOS), p.58. https://doi.org/10.5067/doc/ceoswgcv/lpv/lst.001
  
  Fernandes, R.A., Plummer, S.E., Nightingale, J., Baret, F., Camacho, F., Fang, H., Garrigues, S., Gobron, N., Lang, M., Lacaze, R., Leblanc, S.G., Meroni, M., Martinez, B., Nilson, T., Pinty, B., Pisek, J., Sonnentag, O., Verger, A., Welles, J.M., Weiss, M., Widlowski, J.-L., Schaepman‐Strub, G., Román, M.O. and Nickeson, J., 2014. Global Leaf Area Index Product Validation Good Practices. CEOS Working Group on Calibration and Validation, Land Product Validation Sub-Group. https://doi.org/10.5067/doc/ceoswgcv/lpv/lai.002
  
  Wang, Z., Nickeson, J. and Román, M. (Eds.), 2019. Best Practice for Satellite Derived Land Product Validation. Land Product Validation Subgroup (WGCV/CEOS), p.45. https://doi.org/10.5067/DOC/CEOSWGCV/LPV/ALBEDO.001
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC2
RC3:
'Comment on egusphere-2023-725', Anonymous Referee #3, 28 May 2023

‘Uncertainty Assessment of Satellite Remote Sensing-based Evapotranspiration Estimates: A Systematic Review of Methods and Gaps’ by Tran et al., HESS-2023-725
The manuscript surveyed and reviewed the status of the various methods used for uncertainty assessment of remote sensing based estimation of evapotranspiration. It discussed the advances and caveats of the different methods, identified assessment gaps, and provided recommendations for future studies.
This reviewer considers such an assessment very useful for the community in using the various RS-ET estimates in hydrological studies. It feels however that some important aspects are missing which concern the model physics and dynamics and the considered physical processes in estimating ET using remote sensing data as input. The urgent challenge to the hydrological remote sensing community is therefore investigating the physics and dynamics of the processes involved in evapotranspiration and devising adequate methods to represent such processes in generating the RS-ET estimates. Once a chosen model is able to adequately represent such physics and dynamics for a few quality controlled reference in-situ sites, the uncertainty in their application to other sites and the globe is considerably reduced, because we can confidently expect that the physics is the same everywhere and the dynamics can be attributed to the temporal resolution of the model and the input data.
Fig. 14 needs some more explanation for the different symbols (this is obviously a box plot, but it is not clear to the reader by itself what the different statistics are compared to Table 4).

Citation: https://doi.org/10.5194/egusphere-2023-725-RC3
- AC1: 'Reply on RC3', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your comments and suggestions. Our responses are in bold below.
  The manuscript surveyed and reviewed the status of the various methods used for uncertainty assessment of remote sensing based estimation of evapotranspiration. It discussed the advances and caveats of the different methods, identified assessment gaps, and provided recommendations for future studies.
  This reviewer considers such an assessment very useful for the community in using the various RS-ET estimates in hydrological studies. It feels however that some important aspects are missing which concern the model physics and dynamics and the considered physical processes in estimating ET using remote sensing data as input. The urgent challenge to the hydrological remote sensing community is therefore investigating the physics and dynamics of the processes involved in evapotranspiration and devising adequate methods to represent such processes in generating the RS-ET estimates. Once a chosen model is able to adequately represent such physics and dynamics for a few quality controlled reference in-situ sites, the uncertainty in their application to other sites and the globe is considerably reduced, because we can confidently expect that the physics is the same everywhere and the dynamics can be attributed to the temporal resolution of the model and the input data.
  Indeed, it is very important to investigate the physics and dynamics of the processes involved in ET. However, that is not the intention of this paper. Our premise is that given the availability of satellite data, we have the opportunity to estimate ET based on its relationship with variables that are observable from satellites. There have been many models developed to represent processes (physically-based) or to derive ET from data (empirical or semi-empirical), as reviewed by many authors (Figure 1 in this paper). However, the methods to evaluate the uncertainty of these models are not consistent (this paper).
  Regardless of the model physics, end-users of RS-ET estimates need an assessment of the uncertainty in those estimates. Here, we consider the uncertainty in RS-ET estimates, which depends not only on the model physics but also on the input data. As mentioned in L170-175, if a model is validated at a few sites, the uncertainty in RS-ET outputs at other sites with different characteristics can be different.
  It is challenging to assess uncertainty everywhere with only a few in-situ sites. The physics is expected to be the same everywhere, but the dominant processes and factors are not (Zhang et al., 2016). The quality of RS observations varies spatially with atmospheric conditions, and the quality of meteorological input data also varies from place to place. Therefore, we recommend using multiple assessment methods, which will help to clarify whether the uncertainty can be attributed to the input data or to the model.
  Fig. 14 needs some more explanation for the different symbols (this is obviously a box plot, but it is not clear to the reader by itself what the different statistics are compared to Table 4).
  Thank you for pointing this out. We will add a legend for the boxplot and probability density curve in Figure 14 and explain their relations with Table 4.
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC1

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-725', Joshua Fisher, 17 May 2023

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-725/egusphere-2023-725-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2023-725-RC1
- AC3: 'Reply on RC1', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your extensive and constructive suggestions. Our responses are in bold below.
  
  This is a great paper giving an overview of remotely sensed ET evaluation approaches in the literature. It’s well-written and interesting. Such an undertaking is certainly a large task so it’s understandable that the authors would miss some literature here and there; I’ve given a few pointers to uncover large missing areas in the literature. That said, I don’t know which of the 601 (plus more coming in revision) papers the authors should cite explicitly in the main text versus refer to implicitly within category, but maybe err on the side of adding more in-text references unless EGUsphere pushes back with a limit? Overall, the paper doesn’t really have a main result other than that different things are different, but the paper will be a great go-to source for those interested in RS-ET. If scientists follow the recommendations, this could help understand results in a relative context.
  
  We appreciate your pointers to some interesting articles. We will review their relevance and eligibility carefully and add references where we find appropriate. Regarding in-text citations, we cite a reference in a sentence where it provides ideas or information that is neither our own nor common knowledge. For some statements, several citations can be used but including all of them could impact the readability of the text. The 601 articles were used to systematically quantify the categories and not all of them directly provided ideas or information to our text. Therefore, we did not cite all of them in the text.
  
  There is some discussion on different time scales of analysis, but perhaps some more extensive commentary on instantaneous vs. temporally upscaled validation would be helpful given that most RS-ET is based on polar orbiting instantaneous measurements.
  
  We also find this a very important point. We will add more commentary on upscaled validation in sections 2.2 and 5.3. Remote sensing data are acquired at the time the satellite passes over the region of interest; thus, RS-ET estimates are essentially instantaneous. Many applications require ET estimates over longer intervals (e.g., day, 10 days, month), so many methods have been developed to upscale instantaneous RS-ET estimates to daily totals. Studies that have compared these methods in different settings show that the accuracy and applicability of upscaling methods are affected by several location-related factors (Jiang et al., 2021). These factors include vegetation cover and soil moisture (Gentine et al., 2007; Hoedjes et al., 2008), cloud coverage (Van Niel et al., 2012), cloud frequency (Xu et al., 2015), air pollution (Zhang et al., 2013), the satellite revisit interval (Alfieri et al., 2017), the time of the instantaneous values (Jiang et al., 2021), and the number of instantaneous values used for upscaling (Liu, 2021). Although many common upscaling methods performed adequately at most FLUXNET sites, their performance was inadequate at tropical rainforest and monsoon sites (Liu, 2021). Therefore, applying any single temporal upscaling method globally will produce spatially varying uncertainties in RS-ET estimates, as no method has been found to perform equally well everywhere.
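As an illustration only, the sketch below shows one widely used upscaling approach, the constant evaporative fraction (EF) method discussed in studies such as Gentine et al. (2007) and Hoedjes et al. (2008). It is not taken from the manuscript. The function names, the input values, and the fixed latent heat of vaporization are assumptions made for this example, and the method assumes that EF at overpass time holds for the whole day.

```python
# Approximate latent heat of vaporization (MJ per kg of water); an assumed constant.
LATENT_HEAT_MJ_PER_KG = 2.45


def evaporative_fraction(le_inst, rn_inst, g_inst):
    """Evaporative fraction at overpass time: LE / (Rn - G), fluxes in W m-2."""
    available = rn_inst - g_inst
    if available <= 0:
        raise ValueError("available energy must be positive at overpass time")
    return le_inst / available


def upscale_to_daily_et(le_inst, rn_inst, g_inst, rn_daily_mj, g_daily_mj=0.0):
    """Daily ET in mm/day, assuming EF stays constant over the day.

    rn_daily_mj and g_daily_mj are daily totals in MJ m-2 day-1.
    One kg of water per m2 corresponds to a depth of 1 mm.
    """
    ef = evaporative_fraction(le_inst, rn_inst, g_inst)
    daily_le_mj = ef * (rn_daily_mj - g_daily_mj)
    return daily_le_mj / LATENT_HEAT_MJ_PER_KG


# Hypothetical overpass fluxes (W m-2) and daily net radiation (MJ m-2 day-1).
et = upscale_to_daily_et(le_inst=300.0, rn_inst=500.0, g_inst=50.0, rn_daily_mj=14.7)
print(round(et, 2))  # 4.0 mm/day for these made-up inputs
```

Because the result scales directly with EF at the overpass time, cloud cover or an unrepresentative overpass time feeds straight into the daily total, which is one way the location-related factors listed above can enter the uncertainty.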
  L31. May want to cite [Fisher et al., 2017].
  
  This seems like common knowledge to us, but we will review the reference and add a citation if it provides evidence for this sentence.
  L35. May want to cite [Monteith, 1965; Shuttleworth and Wallace, 1985].
  
  Citations will be added.
  L39. [Fisher et al., 2017].
  
  The citation will be added. The suggested paper is an interesting commentary for readers to refer to.
  L49. Include ECOSTRESS [Fisher et al., 2020].
  
  ECOSTRESS data product will be included.
  Fig 1. This figure seems to be missing a lot of literature, including reviews cited in the text (e.g., Vinukollu; Jimenez; Melton; etc.).
  
  Figure 1 consists of literature review articles only. The purpose is to direct readers to previous literature reviews and distinguish the topics of those literature reviews from our review. We have cross-checked the suggested articles by Vinukollu, Jimenez, and Melton. These are indeed important original research articles that compared different ET products and explored the merging of some products. However, because these articles are not literature reviews, we have referred to them in other sections of our review but not in Figure 1. If we missed some literature review articles, kindly let us know.
  L130. “ET is not measured directly by sensors, but is the result from models or reanalyses, and thus…”
  
  The sentence will be corrected as suggested.
  Section 2.3. We used Gaussian Error Propagation in [Fisher et al., 2005] and Method of Moments in [Fisher et al., 2008].
  
  For the period that we reviewed (2011-2021), these methods were not used. We will mention that these methods have been used before our study period.
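For readers unfamiliar with Gaussian error propagation, the sketch below shows its generic first-order form for independent inputs. It is not the implementation used in Fisher et al. (2005). The helper `propagate_gaussian`, the toy model `latent_heat_flux`, and all numbers are hypothetical, and input covariances are assumed to be zero.

```python
import math


def propagate_gaussian(func, values, sigmas, rel_step=1e-6):
    """First-order (Gaussian) error propagation for independent inputs.

    Returns (f(values), sigma_f), where
    sigma_f**2 = sum_i (df/dx_i)**2 * sigma_i**2
    and the partial derivatives come from central finite differences.
    """
    variance = 0.0
    for i, (x, s) in enumerate(zip(values, sigmas)):
        h = rel_step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        dfdx = (func(*up) - func(*down)) / (2.0 * h)
        variance += (dfdx * s) ** 2
    return func(*values), math.sqrt(variance)


def latent_heat_flux(rn, g, ef):
    """Toy model for illustration: LE = EF * (Rn - G), fluxes in W m-2."""
    return ef * (rn - g)


# Hypothetical inputs: Rn, G (W m-2) and EF (dimensionless) with 1-sigma errors.
le, sigma_le = propagate_gaussian(
    latent_heat_flux, [500.0, 50.0, 0.6], [25.0, 10.0, 0.05]
)
print(round(le, 1), round(sigma_le, 1))
```

The per-input terms `(dfdx * s) ** 2` can also be inspected individually to see which input contributes most to the output variance.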
  L185. Period.
  
  The sentence will be corrected.
  How do you draw the line between diagnostic models, machine learning models, land surface models, etc.? It’s sometimes a blurry distinction.
  
  Many literature reviews have categorized diagnostic ET models, and their categorizations often differ from each other (Courault et al., 2005; Kalma et al., 2008; Wang and Dickinson, 2012; Zhang et al., 2016; Chen and Liu, 2020). The distinction can be blurry when a model fits in more than one category. We distinguish these types:
  
  • Diagnostic vs. prognostic: Diagnostic models estimate ET at the time of overpass and upscale it to longer periods. Prognostic models use data assimilation to predict temporally continuous ET (Wang and Dickinson, 2012).
  
  • Machine learning models use data-driven algorithms to estimate ET without explicitly representing physical processes; they are trained with ground data.
  
  • Land surface models simulate various processes that occur at the Earth’s land surface, including ET. ET is not the main output of these models and is constrained by initial states and other modelled variables (not only input data).
  
  We consider RS-ET estimates from models that meet two criteria: (1) they aim to estimate ET as the main output (diagnostic), and (2) they use satellite data as input (satellite remote sensing-based). These models fit in the categories reviewed by Courault et al. (2005), Zhang et al. (2016), and Chen and Liu (2020). We will clarify this in section 3.2.
  Figs 5 & 9. I’m not 100% clear on how to read this. It’s not obvious what the top bars correspond to. The figure does not label what are the bottom numbers. It’s not clear what gray vs. black circles are, and what the connecting lines mean. Maybe define TCH/TH in the caption.
  
  We will add more explanation and define TCH/TH in the figure captions. As mentioned in L233, these figures are UpSet plots, described in Lex et al. (2014). Since many articles could be placed in more than one category, UpSet plots show not only the number of articles in each category but also the number of articles in each intersection of categories (the top bar chart). A combination of more than one category identifies an intersection, which is visualized by black circles connected with a line. Gray circles indicate that the category is not in the intersection. For example, Figure 5 shows that 115 articles used both intercomparison and validation (the intersection of validation and intercomparison).
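The counting behind such a plot can be sketched in a few lines. The example below is illustrative only: the article labels and category assignments are hypothetical, not the review's data. It assumes the exclusive-intersection convention, in which each article is counted once under its exact combination of categories.

```python
from collections import Counter

# Hypothetical articles and the assessment methods each one used.
articles = {
    "article_A": {"validation"},
    "article_B": {"validation", "intercomparison"},
    "article_C": {"validation", "intercomparison"},
    "article_D": {"intercomparison"},
    "article_E": {"validation", "sensitivity analysis"},
}

# Top bar chart: each article is counted once, under its exact combination.
intersection_sizes = Counter(frozenset(methods) for methods in articles.values())

# Per-category totals: an article in several categories counts in each of them.
category_totals = Counter(m for methods in articles.values() for m in methods)

for combo, n in intersection_sizes.most_common():
    print(sorted(combo), n)
print(dict(category_totals))
```

Under this convention the intersection counts sum to the number of articles, whereas the per-category totals can sum to more because articles in several categories are counted repeatedly.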
  L243. Curious what are those other approaches?
  
  We recorded those approaches in https://doi.org/10.4121/797dcaff-56e3-45ae-a931-f6f4a3135d26.v1
  
  - Validation of sub-modules in ET models (De la Fuente-Sáiz et al., 2017).
  
  - Comparison of the ET partitioning (not total ET) to evaluate uncertainty due to model parameterization (Miralles et al., 2016).
  
  - Deduction of the analytical relationship between latent heat flux and AOI size in SEBAL to assess uncertainty due to change of spatial support (Tang et al., 2013).
  
  - Using Analysis of Variance (ANOVA) to compare the mean total evaporation estimates for the different land cover types between Landsat 8 and MODIS to assess uncertainty due to input data (Shoko et al., 2015).
  
  - Using temporal patterns of ET per crop type to evaluate compound uncertainty (Sun et al., 2017).
  
  - Using spatial pattern metrics and empirical copula densities to evaluate relative uncertainty (Stisen et al., 2021).
  
  Explicitly listing other approaches seems beneficial, so we will mention them briefly in section 4. However, we will not discuss them in as much detail as other approaches since they are less used and often applied in combination with validation or intercomparison.
  Fig 6. Maybe include a secondary y-axis that is the total #.
  
  We will add the time series of the total number of reviewed articles in Figure 6.
  Fig 7. I’m not seeing the water balance residual papers here?
  
  Figure 7 shows the papers in 4.1.1 (using in-situ measurements). The water balance residual papers are in 4.1.2. We will report the number of papers using water balance residual in 4.1 (L253-254).
  L274. Even smaller with sap flow?
  
  Here, we meant in-situ measurement of ET (sum of soil evaporation, transpiration, and interception), while sap flow only measures transpiration. We will add a few sentences on the in-situ measurement of ET components.
  L308. Slightly misleading because then there was the GRACE-FO mission, which should be mentioned.
  
  The sentence will be rewritten to be more accurate:
  
  “However, the TWSA products only cover the period from 2002 with a gap of 11 months from 2017 to 2018 between the GRACE and GRACE-FO missions.”
  Section 4.1.2. I think you’re missing quite a lot of papers here, so you’ll have to re-search and update.
  
  A significant number of the studies that we reviewed (N=78) used the water balance (WB) residual as a reference for validation. We will include the number of papers using the water balance method in the text to signify this. However, we did not cite all of these papers because the text is about the caveats and potential improvements of the WB method, and not all of them provide insights on these topics.
  
  Of course, we do not claim that our list is exhaustive. Papers may be missing because their title, abstract, or year of publication did not meet the criteria of our systematic literature search.
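For context, the WB residual used as a validation reference can be sketched as ET = P - Q - dS over a basin and period. The example below is a minimal illustration with hypothetical values. The function name is an assumption, and the storage change term dS is assumed to be available (for instance derived from TWSA products) or set to zero in the simplified long-term form.

```python
def water_balance_et(precip_mm, runoff_mm, storage_change_mm=0.0):
    """ET as the water balance residual, ET = P - Q - dS (all terms in mm)."""
    return precip_mm - runoff_mm - storage_change_mm


# Hypothetical monthly basin totals in mm; dS is the change in water storage.
monthly = [
    {"P": 120.0, "Q": 35.0, "dS": 15.0},
    {"P": 40.0, "Q": 20.0, "dS": -25.0},
]
monthly_et = [water_balance_et(m["P"], m["Q"], m["dS"]) for m in monthly]
print(monthly_et)  # [70.0, 45.0]

# Simplified long-term balance: storage change assumed negligible.
annual_et = water_balance_et(precip_mm=900.0, runoff_mm=300.0)
print(annual_et)  # 600.0
```

Since ET is computed as a residual, errors in precipitation, runoff, and storage change all accumulate in the reference value.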
  4.3 out of order.
  
  The ‘uncertainty propagation’ paragraphs will be moved to section 4.5 to be consistent with the order of Figure 5.
  Section 4.7. Yunjun Yao and others have been forging forward with many papers in this realm.
  
  Thank you for pointing out the work by Yao. We will review the papers by this author. We want to note that this section discusses the use of ensembles to assess uncertainties in RS-ET estimates, not the advancement of methods to generate these ensembles. Therefore, papers that aimed to improve ensemble methods but did not use them to evaluate uncertainty in RS-ET estimates were not included. We will also change the heading of this section to “Using ensemble of RS-ET estimates” to reflect our objective.
  L556. I think it would also depend on the site. If you’re using a site with low ET, then your RMSE is likely to be low, and vice versa.
  
  We also thought that RMSE depends on the site. In our meta-analysis, we recorded the average of in-situ ET (https://doi.org/10.4121/e6e1713a-0c2b-4775-a7f4-9e6e0b2cf40f.v1). Unfortunately, too many studies did not report this value, so we do not have sufficient data to compare RMSE with mean ET; otherwise, testing this argument would have been interesting. We made a recommendation to report mean ET in validation studies (L603). We will add this explanation to Section 6.2.
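The dependence of RMSE on site ET magnitude can be illustrated with a small sketch. The values below are hypothetical and are not drawn from the meta-analysis. Dividing RMSE by the mean reference ET is shown only as one possible use of a reported mean ET, not as a metric the manuscript prescribes.

```python
import math


def rmse(estimates, reference):
    """Root-mean-squared error between paired estimates and reference values."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(estimates, reference):
    """RMSE divided by the mean of the reference values (dimensionless)."""
    mean_ref = sum(reference) / len(reference)
    return rmse(estimates, reference) / mean_ref


# Hypothetical daily ET (mm/day); every estimate is 20% too high at both sites.
dry_site_ref = [0.5, 1.0, 1.5]
wet_site_ref = [2.0, 4.0, 6.0]
dry_site_est = [1.2 * v for v in dry_site_ref]
wet_site_est = [1.2 * v for v in wet_site_ref]

print(rmse(dry_site_est, dry_site_ref), rmse(wet_site_est, wet_site_ref))
print(relative_rmse(dry_site_est, dry_site_ref), relative_rmse(wet_site_est, wet_site_ref))
```

With the same proportional error, the low-ET site shows a smaller RMSE in mm/day than the high-ET site, while the two relative values coincide, which is why the mean ET is needed to put a reported RMSE in context.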
  L581. “in a”
  
  The sentence will be corrected.
  Section 7. One of the major approaches many of us in the community are working towards is improved spatiotemporal resolution of RS-ET. Moving from ECOSTRESS to SBG, multiple Landsats, TRISHNA, LSTM, and Hydrosat. Would that be worth commenting on here?
  
  Thank you for your suggestion. We think that it is best to mention this development in section 5.3.
  L606. Period.
  
  The sentence will be corrected.
  L754. Reference repeated.
  
  Duplication will be removed.
  Here’s a list of more papers to cross-check:
  
  [McCabe and Wood, 2006; Fisher et al., 2009; Glenn et al., 2010; Liang et al., 2010; Blyth and Harding, 2011; Fisher et al., 2011; Jiménez et al., 2011; Mueller et al., 2011; Sahoo et al., 2011; Vinukollu et al., 2011b; Vinukollu et al., 2011a; Polhamus et al., 2012; McCabe et al., 2013; Mueller et al., 2013; Polhamus et al., 2013; Armanios and Fisher, 2014; Chen et al., 2014; Ershadi et al., 2014; Yao et al., 2014; Chen et al., 2015; Feng et al., 2016; McCabe et al., 2016; Michel et al., 2016a; Michel et al., 2016b; Miralles et al., 2016a; Miralles et al., 2016b; Zhang et al., 2016; Yao et al., 2017a; Yao et al., 2017b; Chang et al., 2018; Jiménez et al., 2018; Xu et al., 2018; Gomis-Cebolla et al., 2019; Guillevic et al., 2019; McCabe et al., 2019; Stoy et al., 2019; Pascolini-Campbell et al., 2020; Sadeghi et al., 2020; Wu et al., 2020; Anderson et al., 2021; Bai et al., 2021; Cawse-Nicholson et al., 2021; Melo et al., 2021; Pascolini-Campbell et al., 2021; Pascolini-Campbell et al., 2021; Shang et al., 2021; Tang et al., 2021; Shi et al., 2022; Xie et al., 2022; Yang et al., 2022; Volk et al., 2023]
  
  Thank you for the extensive list of references. We will consider them after reviewing their relevance and eligibility carefully.
  References
  
  Alfieri, J.G., Anderson, M.C., Kustas, W.P. and Cammalleri, C., 2017. Effect of the revisit interval and temporal upscaling methods on the accuracy of remotely sensed evapotranspiration estimates. Hydrology and Earth System Sciences, 21(1), pp.83-98. doi:10.5194/hess-21-83-2017
  
  Courault, D., Seguin, B., Olioso, A.: Review on estimation of evapotranspiration from remote sensing data: From empirical to numerical modeling approaches. Irrig Drainage Syst 19, 223–249. https://doi.org/10.1007/s10795-005-5186-0, 2005.
  
  De la Fuente-Sáiz, D., Ortega-Farías, S., Fonseca, D., Ortega-Salazar, S., Kilic, A., & Allen, R. (2017). Calibration of METRIC Model to Estimate Energy Balance over a Drip-Irrigated Apple Orchard. Remote Sensing, 9(7), 670. doi:10.3390/rs9070670
  
  Gentine, P., Entekhabi, D., Chehbouni, A., Boulet, G. and Duchemin, B., 2007. Analysis of evaporative fraction diurnal behaviour. Agricultural and forest meteorology, 143(1-2), pp.13-29. https://doi.org/10.1016/j.agrformet.2006.11.002
  
  Hoedjes, J.C.B., Chehbouni, A., Jacob, F., Ezzahar, J. and Boulet, G., 2008. Deriving daily evapotranspiration from remotely sensed instantaneous evaporative fraction over olive orchard in semi-arid Morocco. Journal of Hydrology, 354(1-4), pp.53-64. https://doi.org/10.1016/j.jhydrol.2008.02.016
  
  Jiang, L., Zhang, B., Han, S., Chen, H. and Wei, Z., 2021. Upscaling evapotranspiration from the instantaneous to the daily time scale: Assessing six methods including an optimized coefficient based on worldwide eddy covariance flux network. Journal of Hydrology, 596, p.126135. https://doi.org/10.1016/j.jhydrol.2021.126135
  
  Kalma, J.D., McVicar, T.R., McCabe, M.F.: Estimating Land Surface Evaporation: A Review of Methods Using Remotely Sensed Surface Temperature Data. Surv. Geophys. 29, 421–469. https://doi.org/10.1007/s10712-008-9037-z, 2008.
  
  Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R., Pfister, H.: UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics (InfoVis), 20(12), 1983–1992. https://doi.org/10.1109/TVCG.2014.2346248, 2014.
  
  Liu, Z., 2021. The accuracy of temporal upscaling of instantaneous evapotranspiration to daily values with seven upscaling methods. Hydrology and Earth System Sciences, 25(8), pp.4417-4433. https://doi.org/10.5194/hess-25-4417-2021
  
  Miralles, D. G., Jiménez, C., Jung, M., Michel, D., Ershadi, A., McCabe, M. F., … Fernández-Prieto, D. (2016). The WACMOS-ET project – Part 2: Evaluation of global terrestrial evaporation data sets. Hydrology and Earth System Sciences, 20(2), 823–842. doi:10.5194/hess-20-823-2016
  
  Shoko, C., Clark, D., Mengistu, M., Dube, T., & Bulcock, H. (2015). Effect of spatial resolution on remote sensing estimation of total evaporation in the uMngeni catchment, South Africa. Journal of Applied Remote Sensing, 9(1), 095997. doi:10.1117/1.jrs.9.095997
  
  Stisen, S., Soltani, M., Mendiguren, G., Langkilde, H., Garcia, M., & Koch, J. (2021). Spatial Patterns in Actual Evapotranspiration Climatologies for Europe. Remote Sensing, 13(12), 2410. doi:10.3390/rs13122410
  
  Sun, L., Anderson, M. C., Gao, F., Hain, C., Alfieri, J. G., Sharifi, A., … McKee, L. (2017). Investigating water use over the Choptank River Watershed using a multisatellite data fusion approach. Water Resources Research, 53(7), 5298–5319. doi:10.1002/2017wr020700
  
  Tang, R., Li, Z.L., Chen, K.S., Jia, Y., Li, C. and Sun, X., 2013. Spatial-scale effect on the SEBAL model for evapotranspiration estimation using remote sensing data. Agricultural and forest meteorology, 174, pp.28-42.
  
  Van Niel, T.G., McVicar, T.R., Roderick, M.L., van Dijk, A.I., Beringer, J., Hutley, L.B. and Van Gorsel, E., 2012. Upscaling latent heat flux for thermal remote sensing studies: Comparison of alternative approaches and correction of bias. Journal of Hydrology, 468, pp.35-46. https://doi.org/10.1016/j.jhydrol.2012.08.005
  
  Wang, K., Dickinson, R.E.: A review of global terrestrial evapotranspiration: Observation, modeling, climatology, and climatic variability. Rev. Geophys. 50. https://doi.org/10.1029/2011RG000373, 2012.
  
  Xu, T., Liu, S., Xu, L., Chen, Y., Jia, Z., Xu, Z. and Nielson, J., 2015. Temporal upscaling and reconstruction of thermal remotely sensed instantaneous evapotranspiration. Remote Sensing, 7(3), pp.3400-3425. https://doi.org/10.3390/rs70303400
  
  Zhang, K., Kimball, J.S., Running, S.W.: A review of remote sensing based actual evapotranspiration estimation. Wiley Interdisciplinary Reviews: Water 3, 834–853. https://doi.org/10.1002/wat2.1168, 2016.
  
  Zhang, X., Wu, J., Wu, H., Chen, H. and Zhang, T., 2013. Improving temporal extrapolation for daily evapotranspiration using radiation measurements. Journal of Applied Remote Sensing, 7(1), pp.073538-073538. https://doi.org/10.1117/1.JRS.7.073538
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC3
RC2:
'Comment on egusphere-2023-725', Anonymous Referee #2, 22 May 2023

First of all, I would like to extend my congratulations to the authors for their valuable research presented in the article titled "Uncertainty Assessment of Satellite Remote Sensing-based Evapotranspiration Estimates: A Systematic Review of Methods and Gaps." The authors have demonstrated a significant research effort and I acknowledge the extensive work invested in this study. However, I believe it would be beneficial for the ET community if the discussion also focused on the performance and uncertainties of the analysed products/models, as well as the underlying reasons for their uncertainty/performance. In light of this, I have several comments and suggestions that I believe can contribute to enhancing the manuscript.

1. Due to the nature of a systematic review, it is difficult to differentiate between articles that evaluate the performance of existing ET products and ET-based models. It would be very beneficial to clarify the distinction between evaporation products and the models used to estimate ET. Currently, it is challenging for readers to differentiate between them, making it difficult to follow certain ideas. For instance, in Line 231, the authors discuss eight topics for assessing uncertainty in RS-ET, where some points relate to the evaluation of ET products while others to the models. It would be beneficial to clearly indicate what is defined as RS-ET in the manuscript and which results are from models or open-access gridded products.

2. The article is lengthy, and it would be beneficial to condense the sections "Theoretical frameworks" and "Systematic quantitative literature review method" for brevity.

3. The manuscript could benefit from discussing which methods and products perform better in specific contexts. It would be helpful to provide insights on the performance of models and products in relation to specific regions, climates, and relevant factors. For example, i) identifying the errors associated with each method/product; ii) the reported advantages and disadvantages of different models/products; iii) important parameters that drive the estimation of ET in existing models; iv) lessons learned from previous evaluations; and v) which models/products have demonstrated higher physical consistency.

4. In the section "Review of methods for RS-ET uncertainty assessment", the authors could focus on the performance of models/products and relate their findings to specific regions and climates when reported. Addressing questions such as which models performed better in certain areas and why, the sources of uncertainty, the relevance of spatio-temporal resolution in operational applications, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance would greatly enhance this section.

5. Consider reducing the use of acronyms that are infrequently mentioned in the manuscript, as it can improve readability and comprehension.

6. The authors should clarify the timeframe of their study. While they mention focusing on the period from 2011, the end date or year is only specified on L187 stating that the databases were last accessed on 21.09.2021. It would be very valuable to update the research up to a more recent date to provide a comprehensive evaluation :)

7. The authors mention using keywords like "accuracy," "bias," and "precision" to assess uncertainty in products, although these terms differ from the proper definition of uncertainty. It would be important to include the term "performance" in the evaluation, as many studies summarize their findings in terms of model or product performance.

8. Section 6, "Results of RS-ET uncertainty assessment," primarily evaluates articles based on RMSE. However, comparing articles solely on RMSE is not very meaningful, as this goodness-of-fit metric does not allow for comparisons across areas with different climates and ET patterns. Therefore, the metrics presented in Table 4 (median, mean, quartiles, standard deviations) and Figure 14, which are grouped by evaluated temporal scale, may be misleading. A good and valuable recommendation that the authors could make in their article is that researchers should report uncertainty/performance metrics using indices that are comparable across studies and not influenced by regional climate or specific ET patterns. It would be valuable to discuss which metric is reported to be a better proxy of model/product performance and which models/products performed better.

9. The summary of the manuscript could address relevant questions for researchers and practitioners, such as recommended evaluations for assessing the performance of ET products when data is and is not available.
___________________________________________________________________________________________________________________
Furthermore, I would like to provide a few additional minor suggestions for improvement:

L10: The authors can emphasise here that evapotranspiration is often referred to as evaporation. As it is currently written, it seems that the authors are referring to both evaporation and evapotranspiration.

L39-42: Here, the authors mention some methods, but the list is not exhaustive. They can add GLEAM to this list, for example, which is a well-known method that drives an ET product with the same name.

L44-45: This sentence can be rewritten for better clarity.

L46: The authors mention that retrieving ET estimates from some models requires expertise about the models. However, this is true for every model, so this sentence can be deleted.

L50: This sentence is a bit convoluted. It would be helpful if the authors could clarify their intended meaning.

L51-53: Here, the authors mention that uncertainty assessment helps data users determine the level of confidence they can have in ET estimates and inferred information about water resources. Since readers of this article may be researchers exploring ET products and models for the first time, it would be a good idea to mention that the use of the products is also limited by their spatio-temporal resolution, specific applications, and latency.

L55: "foci" should be changed to "focus." The focus of multiple articles is explained in Table S2.

L59: What do the authors mean by "spatial data production"?

L60: What do the authors mean by "a good practice protocol for operational validation"?

L59: What do the authors mean by "complete documentation"?

Figure 1: This figure is very good and helpful. It will surely assist readers in accessing previous literature review articles. Could the authors complete the list of existing manuscripts related to the review of RS-ET estimation, uncertainty, and validation of products (and models)?

L130: "reanalyzes" should be "reanalyses." Additionally, could the authors rewrite this sentence to better explain what is considered a high level of processing?

L143: What about replacing "true" with "more accurately representing the ET values"?

Figure 3: Please replace "support" with "resolution." Why does the model calculation not have a number? There is uncertainty regarding whether the model is able to resemble physical processes or not. Finally, the authors can mention in the figure that compound uncertainty is the sum of all other uncertainties.

L153: Why specifically refer to Monte Carlo when there are more advanced techniques to assess uncertainty propagation?

L165: Can the authors add a sentence on how the definition of validation has changed over time?

L170: This sentence is not very clear to me. What do the authors mean by "model validation and data" in this context? Maybe the parentheses are misplaced and disrupt the flow of the sentence?

L171-172: Can this sentence be deleted? I think the idea is clearly explained in the following sentences.

L176: This sentence could be rewritten for clarity. Something like: "Validating a model used to derive ET estimates does not necessarily imply that it can be used with different forcing data and provide accurate results. Therefore, when a model is applied to derive ET estimates with different forcings or in different settings, its performance must be evaluated." In the current version, it is difficult to disentangle what is a model, an ET product, and a product based on running the model with different forcings :)

L183: "by" instead of "tby"

L189-190 and Table 1: It would be interesting to know how these terms were chosen. What about other terms like "performance," "quality" (alone), and "error"?

L200: "process" instead of "system."

Figure 4: What does "not using the same method to report uncertainty" mean?

Figure 5 and 9: Why are there 38 articles without any link to a topic? Could the authors provide an explanation in the caption?

Figure 6: It is difficult to see the low values on the graph. Maybe consider using a barplot to visualise this more straightforwardly.

L256: "Estimate ET" instead of "observe ET." Remember that ET cannot be directly observed ;)

Figure 7: I really liked this figure! In the caption, the authors can add an explanation of the "others" category. Are irrigation and water balance articles combined in this category?

L303: Here, the authors could briefly mention the assumptions of the simplified water balance.

L309: Still less known compared to what? Maybe rephrase the sentence to clarify.

L342: Some acronyms are introduced more than once, e.g., SA.

L447: Maybe consider renaming this subsection to something other than "Research Objectives." For example: "Assessment based on the objectives of the analysed manuscripts."

L580: There is a missing space between "in" and "a."

L580-581: I completely agree that further research should combine local and global evaluation efforts, but including a reason for this in the text could be very beneficial for the readers.

L593-593: I do not completely agree with this argument. The RMSE ranges could serve as a baseline, but we have to keep in mind that they are not directly comparable.

L601: What do the authors mean by "matched as much as possible"?

L602: I do not completely agree with this statement. We should report metrics that enable a fair comparison between regions with different climates/patterns.

L605: What do the authors mean by this statement? Please provide further clarification.

L611-613: I did not understand this sentence. Could the authors please provide additional clarification or rephrase the sentence for clarity?

I hope these comments and suggestions are helpful in improving the manuscript. Once again, congratulations to the authors on their research, and I look forward to reading a revised version of the manuscript.

Citation: https://doi.org/10.5194/egusphere-2023-725-RC2
- AC2: 'Reply on RC2', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your extensive and constructive suggestions. Our responses are in bold below.
  
  *Major comments*
  
  1. Due to the nature of a systematic review, it is difficult to differentiate between articles that evaluate the performance of existing ET products and ET-based models. It would be very beneficial to clarify the distinction between evaporation products and the models used to estimate ET. Currently, it is challenging for readers to differentiate between them, making it difficult to follow certain ideas. For instance, in Line 231, the authors discuss eight topics for assessing uncertainty in RS-ET, where some points relate to the evaluation of ET products while others relate to the models. It would be beneficial to clearly indicate what is defined as RS-ET in the manuscript and which results are from models or open-access gridded products.
  
  We will clarify in section 3.2 that we focus on RS-ET estimates, which are defined as ET values or maps obtained from an RS-based data product or by implementing an RS-based model. 17% of the articles assessed open-access gridded products (Figure 8). There are also studies that compare both data products and model outputs.
  2. The article is lengthy, and it would be beneficial to condense the sections "Theoretical frameworks" and "Systematic quantitative literature review method" for brevity.
  
  We will review these sections and reduce the text where it is unnecessarily wordy.
  
  3. The manuscript could benefit from discussing which methods and products perform better in specific contexts. It would be helpful to provide insights on the performance of models and products in relation to specific regions, climates, and relevant factors. For example, i) identifying the errors associated with each method/product; ii) the reported advantages and disadvantages of different models/products; iii) important parameters that drive the estimation of ET in existing models; iv) lessons learned from previous evaluations; and v) which models/products have demonstrated higher physical consistency.
  The suggested topics are important. However, they were not the objectives of this manuscript. Our goal in this study is to investigate the status of the various methods applied for uncertainty assessment of RS-ET estimates, discuss the advances and caveats of these methods, identify assessment gaps, and provide recommendations for future assessment. Our argument is that because these models and products are evaluated using different assessment methods and reference data, it is not reliable to rank their performance and generalize the conclusion to all contexts.
  
  Furthermore, many literature reviews (Figure 1) have discussed some of these topics repetitively:
  
  i) identifying the errors associated with each method/product
  
  ii) the reported advantages and disadvantages of different models/products
  
  iii) important parameters that drive the estimation of ET in existing models
  
  Regarding the performance of models and products in relation to specific regions, climates, and relevant factors and v) which models/products have demonstrated higher physical consistency, we prefer not to draw conclusions from the reviewed literature because not all models have been compared simultaneously. We do think that this could be the focus of a different paper.
  
  However, we do think that “iv) lessons learned from previous evaluations” could be relevant to our manuscript. We have discussed some in section 7. We will extend our discussion to emphasize the role of developing uncertainty assessment methods to investigate the other topics that you mentioned.
  4. In the section "Review of methods for RS-ET uncertainty assessment", the authors could focus on the performance of models/products and relate their findings to specific regions and climates when reported. Addressing questions such as which models performed better in certain areas and why, the sources of uncertainty, the relevance of spatio-temporal resolution in operational applications, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance would greatly enhance this section.
  Section 4 “review methods for RS-ET uncertainty assessment” focuses on the methods of uncertainty assessment (how each method was applied in reviewed literature), not the results of those assessments per se, which is more discussed in Section 6. Therefore, we don’t think focusing on the performance of models/products and their relation to specific regions and climates should be the focus of this section. As we point out in our response above, this is clearly an important topic that could be addressed by another article or even a special issue.
  
  We want to emphasize that other literature reviews (Figure 1) focused on the performance of RS-ET models/products, while our review discusses the methods to assess them, as we have outlined in the research questions. The research questions suggested by the reviewers are important to investigate, and we will add them to our recommendation for future assessments. However, we found that our methods did not aim to answer these questions (i.e., Which models performed better in certain areas and why, the impact of geographical features on model/product uncertainty, and the influence of climate on product performance) and consequently our results do not address them. There are gaps in uncertainty assessment in terms of geographical regions and models, and also inconsistency in methods (Sections 5 and 6). We believe that it is unfair to conclude which models performed better based on the current literature. Furthermore, as reviewer 1 also mentioned, the RMSE depends on the value of ET.
  
  We have discussed the sources of uncertainty in sections 2.2 and 5.2.
  
  We will elaborate on the discussion on the relevance of spatio-temporal resolution in operational applications in section 5.3.
  5. Consider reducing the use of acronyms that are infrequently mentioned in the manuscript, as it can improve readability and comprehension
  
  We agree that some acronyms are used infrequently or unnecessarily. We will not use the following acronyms in the revision:
  
  • Essential Climate Variables (ECVs)
  
  • Monte Carlo method (MCM)
  
  • Sensitivity Analysis (SA)
  
  • Systematic Quantitative Literature Review (SQLR)
  
  • Web of Science (WoS)
  6. The authors should clarify the timeframe of their study. While they mention focusing on the period from 2011, the end date or year is only specified on L187 stating that the databases were last accessed on 21.09.2021. It would be very valuable to update the research up to a more recent date to provide a comprehensive evaluation :)
  
  We agree that we should clarify the period of our study. 21.09.2021 is the date we started the literature analysis. L187 was rewritten as follows:
  
  “The search result was limited to a publication date from 2011 until 21.09.2021 and then refined using the available filters of Scopus and Web of Science.”
  
  As the body of literature is huge and growing faster than ever (Annex 2), there will always be a gap between the last date of articles accessed and the most recent literature by the time the analysis is complete. At this stage, we consider that more than 600 articles and one decade of recent literature are extensive and comprehensive enough to draw conclusions on our research objectives.
  
  7. The authors mention using keywords like "accuracy," "bias," and "precision" to assess uncertainty in products, although these terms differ from the proper definition of uncertainty. It would be important to include the term "performance" in the evaluation, as many studies summarize their findings in terms of model or product performance.
  
  We are not sure if this comment relates to the search terms in Table 1. The definitions of these terms are indeed different from ‘uncertainty’ but, as discussed in Section 2.1, they are used by various authors to describe uncertainty. We acknowledge that the term “performance” is also often used. Studies that use the term “performance” usually also mention “uncertainty”, “accuracy”, “data quality”, “variability”, “reliability”, “evaluat*”, and “validat*” in their titles or abstracts. These variants of the ‘uncertainty’ keyword were selected by iterating our search several times until the results included all the articles in Supplementary Information Annex 1. Since we combined these terms with “OR” in our search query, we have included all the articles that use one or more of these terms. We will run a new search with “performance” to double-check.
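
  As a rough illustration of how such an OR-combined query behaves, the sketch below joins the uncertainty-related variants named above into one Boolean clause. The function name `build_or_clause` and the quoting rule are assumptions made for illustration only; the actual search strings are those in Table 1, and the exact field syntax differs between Scopus and Web of Science.

```python
def build_or_clause(terms):
    """Join search terms with OR; quote multi-word phrases, keep wildcards as-is."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

# Variants of the 'uncertainty' keyword listed in the response above.
uncertainty_terms = [
    "uncertainty", "accuracy", "data quality", "variability",
    "reliability", "evaluat*", "validat*",
]

clause = build_or_clause(uncertainty_terms)
print(clause)

# A record matches the clause if it contains at least one variant, so adding
# "performance" can only enlarge (never shrink) the set of retrieved articles.
extended = build_or_clause(uncertainty_terms + ["performance"])
print(extended)
```

  Because the terms are joined with OR, a study that mentions “performance” together with any existing variant is already retrieved; the additional search only matters for studies that use “performance” alone.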
  
  8. Section 6, "Results of RS-ET uncertainty assessment," primarily evaluates articles based on RMSE. However, comparing articles solely on RMSE is not very meaningful, as this goodness-of-fit metric does not allow for comparisons across areas with different climates and ET patterns. Therefore, the metrics presented in Table 4 (median, mean, quartiles, standard deviations) and Figure 14, which are grouped by evaluated temporal scale, may be misleading. A valuable recommendation that the authors could make in their article is that researchers should report uncertainty/performance metrics using indices that are comparable across studies and not influenced by regional climate or specific ET patterns. It would also be valuable to discuss which metric is reported to be a better proxy of model/product performance and which models/products performed better.
  
  We did not aim to compare articles or models/products based on RMSE. We agree that it is not fair to compare models/products across areas with different climates and ET patterns using solely RMSE. The purpose of Figure 14 and Table 4 is to identify the typical range of reported uncertainty in RS-ET estimates globally (our third research question). The spread of the RMSE is partially due to the effect of different climates and site-specific conditions. This is why we did not use our results to conclude which models/products perform better or worse than others. We will clarify this important issue in Section 6.2.
  
  We find that grouping the reported RMSE by temporal scale is valuable to show the effect of temporal upscaling across hundreds of studies. We will include a discussion on this aspect, as also suggested by Reviewer 1.
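
  To make the effect of temporal upscaling concrete, the sketch below compares the RMSE of a synthetic daily series with the RMSE of the same series averaged over 30-day blocks. The seasonal reference signal, the error model (independent, zero-mean Gaussian errors with a standard deviation of 1 mm/day), and the helper names are illustrative assumptions, not data from the reviewed articles.

```python
import math
import random

def rmse(est, ref):
    """Root-mean-squared error between paired estimate and reference values."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))

def block_means(values, size):
    """Average consecutive blocks of `size` values (e.g. 30 days -> 1 'month')."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]

rng = random.Random(0)
days = 360
# Synthetic reference ET with a seasonal cycle (mm/day); purely illustrative.
ref = [3.0 + 2.0 * math.sin(2 * math.pi * d / days) for d in range(days)]
# Estimate = reference + ASSUMED independent, zero-mean Gaussian error (sd 1 mm/day).
est = [r + rng.gauss(0.0, 1.0) for r in ref]

rmse_daily = rmse(est, ref)
rmse_monthly = rmse(block_means(est, 30), block_means(ref, 30))
print(f"daily RMSE:   {rmse_daily:.2f} mm/day")
print(f"monthly RMSE: {rmse_monthly:.2f} mm/day")
```

  Under these assumptions the random errors largely cancel when averaged, so the monthly RMSE is much smaller than the daily RMSE. Errors in real RS-ET estimates are often biased or autocorrelated, in which case the reduction would be smaller.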
  
  It is indeed valuable to report uncertainty using metrics that are comparable across studies, in order to assess which models/products perform better in different contexts. We will add this to our recommendations in Section 7.
  *Minor comments*
  
  L10: The authors can emphasise here that evapotranspiration is often referred to as evaporation. As it is currently written, it seems that the authors are referring to both evaporation and evapotranspiration.
  
  The sentence will be rewritten as “Satellite remote sensing (RS) data are increasingly being used to estimate total evaporation, often referred to as evapotranspiration (ET), over large regions.”
  L39-42: Here, the authors mention some methods, but the list is not exhaustive. They can add GLEAM to this list, for example, which is a well-known method that drives an ET product with the same name.
  
  The list is definitely not exhaustive. We will add GLEAM and PT-JPL.
  
  L44-45: This sentence can be rewritten for better clarity.
  
  L46: The authors mention that retrieving ET estimates from some models requires expertise about the models. However, this is true for every model, so this sentence can be deleted.
  L44-46 will be rewritten as “Furthermore, retrieving ET estimates requires access to the data, software or source code, and expertise in these models. The limited accessibility of RS-ET models poses significant challenges to operational applications of RS-ET estimates (e.g., irrigation scheduling and drought monitoring).”
  L50: This sentence is a bit convoluted. It would be helpful if the authors could clarify their intended meaning.
  
  L50 will be rewritten as “Given that more RS-ET data products are becoming available, information about the uncertainties in RS-ET estimates is important for data users (i.e., water managers and policymakers) to apply them properly.”
  L51-53: Here, the authors mention that uncertainty assessment helps data users determine the level of confidence they can have in ET estimates and inferred information about water resources. Since readers of this article may be researchers exploring ET products and models for the first time, it would be a good idea to mention that the use of the products is also limited by their spatio-temporal resolution, specific applications, and latency.
  
  Good point. We will add the following sentence after L51-53: “Inferences based on RS-ET data products are also limited by their spatio-temporal resolution, latency, and specifications.”
  
  L55: "foci" should be changed to "focus." The focus of multiple articles is explained in Table S2.
  
  Since we mean to say that each of these reviews has a different focus, we want to keep the plural form of the word. But we understand that this collocation of words might sound odd to some readers. We will change “foci” to “main topics”.
  L59: What do the authors mean by "spatial data production"?
  
  We mean the generation of spatial data, which also covers methods other than remote sensing.
  
  L60: What do the authors mean by "a good practice protocol for operational validation"?
  
  An operational validation workflow as defined by Bayat et al. (2021) has four components, one of which is based on a good practice protocol for validation agreed upon by the community. A good practice protocol for validation is a set of guidelines that are known to produce reliable validation results. For example, the authors have pointed to good practice protocol for validation of Land Surface Temperature (Guillevic et al., 2018), Surface Albedo (Wang et al., 2019), Leaf Area Index (Fernandes et al., 2014), Soil Moisture (Gruber et al., 2020).
  L59: What do the authors mean by "complete documentation"?
  
  Documentation of the ET estimation that provides sufficient information for data users to judge the accuracy and representativeness of the estimates. Allen et al. (2011) have recommended which information to be included in such documentation.
  
  Figure 1: This figure is very good and helpful. It will surely assist readers in accessing previous literature review articles. Could the authors complete the list of existing manuscripts related to the review of RS-ET estimation, uncertainty, and validation of products (and models)?
  
  Figure 1 consists only of literature review articles. The purpose is to direct readers to previous literature reviews and distinguish the topics of those literature reviews from our review. We will search for other relevant review articles to extend the list. It would be very helpful if you could point to the review articles that you find missing.
  L130: "reanalyzes" should be "reanalyses."
  
  We will change to “reanalyses”.
  
  Additionally, could the authors rewrite this sentence to better explain what is considered a high level of processing?
  
  By ‘level of processing’, we meant that they are model outputs or results from analyses of less processed data, and we referred to the data user guides by ESA and NASA. The sentence is rewritten as follows:
  
  “ET is not measured directly by sensors, but it is resulting from models or reanalyses, and thus, RS-ET data products are considered high level of processing by data providers (ESA, 2021; NASA, 2021).”
  
  L143: What about replacing "true" with "more accurately representing the ET values"?
  
  Yes. It sounds clearer.
  
  Figure 3: Please replace "support" with "resolution."
  
  We understand why “resolution” is suggested because it is linked to the resampling of RS data. ‘Resolution’ is how detailed RS data are, measured by the size of the pixel, while ‘support’ is the volume, shape, size, and orientation that a measurement represents. In RS data, these two are similar because the support of the ET value in a pixel is also the size of that pixel. However, we wanted to use “support” here because when ET estimates are derived from RS data or validated with reference data, uncertainty also arises from a ‘change of spatial support’ from the pixel size to the footprint size of the measurement. We will use “scale” because this term is more general and includes both “resolution” and “support” (also “extent” and “spacing”) (Blöschl and Sivapalan, 1995). We will also add a footnote to clarify the terminology in the text of Section 2.2.
  Why does the model calculation not have a number? There is uncertainty regarding whether the model is able to resemble physical processes or not.
  
  In remote sensing literature, “uncertainty regarding whether the model is able to resemble physical processes or not” is less often acknowledged (Povey and Grainger, 2015; Foody and Atkinson, 2003) than in hydrological modeling (Liu and Gupta, 2007; Nearing et al., 2014). This is because RS retrieval models usually share common concepts or formulas, especially for low-level data products (e.g., Surface Radiance, NDVI). Since we have argued before that high-level RS data such as ET are outputs of models that often have different concepts and assumptions (e.g., SEB vs. PM), we should indeed include uncertainty from the ‘model conceptualization’, especially for the RS-ET processing chain. We will add “model conceptualization” linked with “model calculation” in the figure. We will add this explanation to Section 2.2 as well.
  
  Finally, the authors can mention in the figure that compound uncertainty is the sum of all other uncertainties.
  
  We will mention that compound uncertainty is the aggregation of all other uncertainties in the figure caption.
  
  L153: Why specifically refer to Monte Carlo when there are more advanced techniques to assess uncertainty propagation?
  
  It is the method we observed most frequently when reviewing the literature. We will mention more advanced techniques for readers.
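
  For readers unfamiliar with the approach, a minimal Monte Carlo sketch is given below. It propagates input uncertainty through a simplified water balance, ET = P - Q - dS/dt, by repeatedly sampling the inputs and summarizing the spread of the resulting ET values. The input values, the assumption of independent Gaussian errors, and the function name `monte_carlo_et` are illustrative assumptions, not values or methods taken from the reviewed articles.

```python
import random
import statistics

def monte_carlo_et(p, q, ds_dt, n=20000, seed=42):
    """Propagate input uncertainty through ET = P - Q - dS/dt.

    Each input is a (mean, standard deviation) pair in mm/month.
    Errors are ASSUMED Gaussian and mutually independent.
    """
    rng = random.Random(seed)
    samples = [
        rng.gauss(*p) - rng.gauss(*q) - rng.gauss(*ds_dt)
        for _ in range(n)
    ]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical catchment values (mean, std), chosen only for illustration.
et_mean, et_std = monte_carlo_et(p=(100.0, 10.0), q=(30.0, 5.0), ds_dt=(10.0, 8.0))
print(f"ET = {et_mean:.1f} +/- {et_std:.1f} mm/month")
```

  For independent Gaussian errors in a linear model, the analytical standard deviation is sqrt(10^2 + 5^2 + 8^2), about 13.7 mm/month, which the sampled spread should approach. Sampling becomes more useful when the model is nonlinear or the input errors are correlated, where such a closed-form result is not available.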
  
  L165: Can the authors add a sentence on how the definition of validation has changed over time?
  
  The sentence will be rewritten as “However, the definition of validation in modeling has become more well-defined over time and is context-dependent (Bellocchi et al., 2011)”. This reference also summarizes different definitions in Table 1.
  
  L170: This sentence is not very clear to me. What do the authors mean by "model validation and data" in this context? Maybe the parentheses are misplaced and disrupt the flow of the sentence?
  
  We understand the confusion. The sentence will be rewritten as “Since RS-ET retrieval models can be used with different sets of satellite data, validation of model and validation of data (i.e., model result or output) should be distinguished.” The paragraph continues to explain what we mean by “model validation/validation of model” and “validation of data/model results”.
  L171-172: Can this sentence be deleted? I think the idea is clearly explained in the following sentences.
  
  We will remove this sentence.
  
  L176: This sentence could be rewritten for clarity. Something like: "Validating a model used to derive ET estimates does not necessarily imply that it can be used with different forcing data and provide accurate results. Therefore, when a model is applied to derive ET estimates with different forcings or in different settings, its performance must be evaluated." In the current version, it is difficult to disentangle what is a model, an ET product, and a product based on running the model with different forcings :)
  
  Thank you for your suggestion. We will rewrite the sentence as “Validating an RS-ET model does not imply that the model can be applied with any forcing data and produce accurate outputs. Therefore, when a model is applied to derive ET estimates with different forcing data or settings, the model output must be evaluated.” Also, in the introduction, we will clarify what we mean by “data product”.
  
  L183: "by" instead of "tby"
  
  This will be corrected.
  
  L189-190 and Table 1: It would be interesting to know how these terms were chosen. What about other terms like "performance," "quality" (alone), and "error"?
  
  We explained this in L194-195. We will move these lines to the previous paragraph.
  
  L200: "process" instead of "system."
  
  We will change that.
  
  Figure 4: What does "not using the same method to report uncertainty" mean?
  
  For the meta-analysis, we wanted to include studies that assess uncertainty using the same approach (validation), reference data (Eddy Covariance), and metrics. We will add this explanation to the caption.
  
  Figure 5 and 9: Why are there 38 articles without any link to a topic? Could the authors provide an explanation in the caption?
  
  Thank you very much for pointing this out. We realized that these are the articles excluded after scanning full-text, which is why they are not linked with any topic. We made the mistake of not excluding them when visualizing the dataset. We will correct this as well as Figure 4 (the number n=35 was supposed to be 38).
  Figure 6: It is difficult to see the low values on the graph. Maybe consider using a barplot to visualise this more straightforwardly.
  
  We will update the graph to make the low values more visible.
  
  L256: "Estimate ET" instead of "observe ET." Remember that ET cannot be directly observed ;)
  
  Indeed. We will change that.
  
  Figure 7: I really liked this figure! In the caption, the authors can add an explanation of the "others" category. Are irrigation and water balance articles combined in this category?
  
  Thank you. We will add an explanation of the “others” category. The irrigation water balance is different from the catchment water balance (Section 4.1.2). These papers used measurements of rainfall, irrigation, and drainage of agricultural plots to derive ET and did not use a lysimeter, so we put them in a different category. Agricultural plots are at a similar scale to Scintillometer and Eddy Covariance measurements, so we consider these in-situ references. We will clarify this in the figure caption and in the text of section 4.1.
  
  L303: Here, the authors could briefly mention the assumptions of the simplified water balance.
  
  We will add that to the text.
  
  L309: Still less known compared to what? Maybe rephrase the sentence to clarify.
  
  We consider that it is more challenging to estimate uncertainty in the gap-filled dS/dt than in the original dS/dt. We will rewrite the sentence as follows: “Some techniques have been developed to reconstruct this gap in the GRACE time series (e.g., Yang et al., 2021). However, the uncertainties in gap-filled dS/dt estimates are still less known than those in the initial estimates from GRACE and GRACE-FO (Boergens et al., 2022).”
  
  L342: Some acronyms are introduced more than once, e.g., SA.
  
  We will double-check the use of acronyms and avoid introducing them more than once.
  L447: Maybe consider renaming this subsection to something other than "Research Objectives." For example: "Assessment based on the objectives of the analysed manuscripts."
  
  Indeed, the subsection heading does sound a little confusing. We will change it to “Objectives of the reviewed articles”
  
  L580: There is a missing space between "in" and "a."
  
  We will correct that.
  
  L580-581: I completely agree that further research should combine local and global evaluation efforts, but including a reason for this in the text could be very beneficial for the readers.
  
  We will include more justifications for this in the text.
  
  L593-593: I do not completely agree with this argument. The RMSE ranges could serve as a baseline, but we have to keep in mind that they are not directly comparable.
  
  The sentence will be rewritten as “The RMSE range reported in our study can be used as a baseline for future studies that validate RS-ET estimates using Eddy Covariance.”
  L601: What do the authors mean by "matched as much as possible"?
  
  We will rewrite this sentence to improve clarity as follows “Upscaling methods should be applied to RS-ET data to derive estimates at the temporal and spatial scale of reference datasets.”
  
  L602: I do not completely agree with this statement. We should report metrics that enable a fair comparison between regions with different climates/patterns.
  
  We agree that to compare uncertainties of ET between regions with different climates (thus, different ranges of ET), we need to use scale-independent metrics. We will rewrite the recommendations as follows:
  
  “• The four common metrics (RMSE, bias/mean error, correlation coefficient, coefficient of determination), mean ET, the number of data points, and statistical significance test should be reported.

  • In addition, uncertainties in RS-ET estimates should be characterized using multiple metrics that are scale-independent to enable comparison between regions with different ranges of ET.”
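
  The difference between scale-dependent and scale-independent metrics can be sketched as follows. The example computes the common metrics for two hypothetical sites whose ET values differ only by a factor of four. The data, the function name `validation_metrics`, the use of RMSE divided by mean reference ET as a normalized metric, and the use of the squared Pearson correlation as the coefficient of determination are illustrative assumptions, not a prescription of which metrics to use.

```python
import math

def validation_metrics(est, ref):
    """Return common and normalized metrics for paired ET series (same units)."""
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_est = sum(est) / n
    errors = [e - r for e, r in zip(est, ref)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    bias = sum(errors) / n
    cov = sum((e - mean_est) * (r - mean_ref) for e, r in zip(est, ref))
    var_est = sum((e - mean_est) ** 2 for e in est)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    corr = cov / math.sqrt(var_est * var_ref)
    return {
        "n": n, "mean_ref": mean_ref, "rmse": rmse, "bias": bias,
        "r": corr, "r2": corr * corr,      # r2 here is the squared Pearson r
        "nrmse": rmse / mean_ref,          # RMSE relative to mean reference ET
        "pbias": 100.0 * bias / mean_ref,  # bias as a percentage of mean reference ET
    }

# Hypothetical daily ET (mm/day) at a low-ET site; the second site is 4x larger.
ref_dry = [1.0, 1.5, 0.8, 1.2, 1.1]
est_dry = [1.2, 1.3, 1.0, 1.1, 1.4]
ref_humid = [4 * v for v in ref_dry]
est_humid = [4 * v for v in est_dry]

dry = validation_metrics(est_dry, ref_dry)
humid = validation_metrics(est_humid, ref_humid)
print(dry["rmse"], humid["rmse"])    # RMSE scales with the magnitude of ET
print(dry["nrmse"], humid["nrmse"])  # normalized RMSE is the same at both sites
```

  In this constructed case the RMSE at the second site is four times larger although the relative agreement is identical, which is why reporting mean ET and the number of data points alongside RMSE helps readers interpret the values.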
  
  L605: What do the authors mean by this statement? Please provide further clarification.
  
  We will rewrite the statement as “Validation of RS-ET models and data products should be reported at different levels of spatial and temporal scales, covering multiple locations.”
  L611-613: I did not understand this sentence. Could the authors please provide additional clarification or rephrase the sentence for clarity?
  
  We will rewrite L610-L615 as follows:
  
  “Several studies have aimed to offer spatially explicit uncertainty in thematic classification, such as land cover and soil type. These studies, like the ones mentioned by Woodcock (2002), have primarily focused on qualitative mapping techniques. However, for quantitative remote sensing, which involves mapping continuous variables like ET, there is a need for methods that can effectively characterize spatially explicit uncertainty. Therefore, we strongly recommend the development and application of methods to evaluate spatiotemporal uncertainty in RS-ET datasets.”
  References
  
  Allen, R.G., Pereira, L.S., Howell, T.A. and Jensen, M.E., 2011. Evapotranspiration information reporting: II. Recommended documentation. Agricultural Water Management, 98(6), pp.921-929.
  
  Blöschl, G. and Sivapalan, M., 1995. Scale issues in hydrological modelling: a review. Hydrological processes, 9(3‐4), pp.251-290. https://doi.org/10.1002/hyp.3360090305
  
  Boergens, E., Kvas, A., Eicker, A., Dobslaw, H., Schawohl, L., Dahle, C., Murböck, M. and Flechtner, F., 2022. Uncertainties of GRACE‐Based Terrestrial Water Storage Anomalies for Arbitrary Averaging Regions. Journal of Geophysical Research: Solid Earth, 127(2), p.e2021JB022081.
  
  Gruber, A., De Lannoy, G., Albergel, C., Al-Yaari, A., Brocca, L., Calvet, J.C., Colliander, A., Cosh, M., Crow, W., Dorigo, W. and Draper, C., 2020. Validation practices for satellite soil moisture retrievals: What are (the) errors?. Remote sensing of environment, 244, p.111806.
  
  Liu, Y.Q. and Gupta, H.V., 2007. Uncertainty in hydrologic modeling: toward an integrated data assimilation framework. Water Resources Research, 43 (7), W07401. doi:10.1029/2006WR005756
  
  Guillevic, P., Göttsche, F., Nickeson, J. and Román, M. (Eds.), 2018. Best Practice for Satellite-Derived Land Product Validation. Land Product Validation Subgroup (WGCV/CEOS), p.58. doi:10.5067/doc/ceoswgcv/lpv/lst.001

  Fernandes, R.A., Plummer, S.E., Nightingale, J., Baret, F., Camacho, F., Fang, H., Garrigues, S., Gobron, N., Lang, M., Lacaze, R., Leblanc, S.G., Meroni, M., Martinez, B., Nilson, T., Pinty, B., Pisek, J., Sonnentag, O., Verger, A., Welles, J.M., Weiss, M., Widlowski, J.-L., Schaepman-Strub, G., Román, M.O. and Nickeson, J., 2014. Global Leaf Area Index Product Validation Good Practices. CEOS Working Group on Calibration and Validation, Land Product Validation Sub-Group. doi:10.5067/doc/ceoswgcv/lpv/lai.002

  Wang, Z., Nickeson, J. and Román, M. (Eds.), 2019. Best Practice for Satellite Derived Land Product Validation. Land Product Validation Subgroup (WGCV/CEOS), p.45. doi:10.5067/DOC/CEOSWGCV/LPV/ALBEDO.001
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC2
RC3:
'Comment on egusphere-2023-725', Anonymous Referee #3, 28 May 2023

‘Uncertainty Assessment of Satellite Remote Sensing-based Evapotranspiration Estimates: A Systematic Review of Methods and Gaps’ by Tran et al., HESS-2023-725
The manuscript surveyed and reviewed the status of the various methods used for uncertainty assessment of remote sensing based estimation of evapotranspiration. It discussed the advances and caveats of the different methods, identified assessment gaps, and provided recommendations for future studies.
This reviewer considers such an assessment very useful for the community in using the various RS-ET estimates in hydrological studies. It feels however that some important aspects are missing which concern the model physics and dynamics and the considered physical processes in estimating ET using remote sensing data as input. The urgent challenge to the hydrological remote sensing community is therefore investigating the physics and dynamics of the processes involved in evapotranspiration and devising adequate methods to represent such processes in generating the RS-ET estimates. Once a chosen model is able to adequately represent such physics and dynamics for a few quality controlled reference in-situ sites, the uncertainty in their application to other sites and the globe is considerably reduced, because we can confidently expect that the physics is the same everywhere and the dynamics can be attributed to the temporal resolution of the model and the input data.
Fig. 14 needs some more explanation for the different symbols (this is obviously a box plot, but it is not clear to the reader by itself what the different statistics are compared to Table 4).

Citation: https://doi.org/10.5194/egusphere-2023-725-RC3
- AC1: 'Reply on RC3', Bich Ngoc Tran, 19 Jul 2023
  
  Thank you very much for your comments and suggestions. Our responses are in bold below.
  The manuscript surveyed and reviewed the status of the various methods used for uncertainty assessment of remote-sensing-based estimation of evapotranspiration. It discussed the advances and caveats of the different methods, identified assessment gaps, and provided recommendations for future studies.
  This reviewer considers such an assessment very useful for the community in using the various RS-ET estimates in hydrological studies. However, some important aspects appear to be missing, namely the model physics and dynamics and the physical processes considered when estimating ET from remote sensing input data. The urgent challenge for the hydrological remote sensing community is therefore to investigate the physics and dynamics of the processes involved in evapotranspiration and to devise adequate methods to represent such processes in generating the RS-ET estimates. Once a chosen model is able to adequately represent such physics and dynamics for a few quality-controlled reference in-situ sites, the uncertainty in its application to other sites and to the globe is considerably reduced, because we can confidently expect that the physics is the same everywhere and the dynamics can be attributed to the temporal resolution of the model and the input data.
  Indeed, it is very important to investigate the physics and dynamics of the processes involved in ET. However, that is not the intention of this paper. Our premise is that, given the availability of satellite data, we have the opportunity to estimate ET based on its relationship with variables that are observable from satellites. Many models have been developed to represent processes (physically based) or to derive ET from data (empirical or semi-empirical), as reviewed by many authors (Figure 1 in this paper). However, the methods used to evaluate the uncertainty of these models are not consistent, as this paper shows.
  Regardless of the model physics, end-users of RS-ET estimates need an assessment of the uncertainty in these estimates. Here, we consider the uncertainty in RS-ET estimates, which depends not only on the model physics but also on the input data. As mentioned in L170-175, if a model is validated at only a few sites, the uncertainty in RS-ET outputs at other sites with different characteristics can be different.
  It is challenging to assess uncertainty everywhere with only a few in-situ sites. The physics is expected to be the same everywhere, but the dominant processes and factors are not (Zhang et al., 2016). The quality of RS observations varies spatially because atmospheric conditions vary, and the quality of meteorological input data also differs from place to place. Therefore, we recommend using multiple assessment methods, which will help to determine whether the uncertainty can be attributed to the input data or to the model.
  Fig. 14 needs some more explanation of the different symbols (this is obviously a box plot, but it is not clear to the reader how the statistics shown relate to those in Table 4).
  Thank you for pointing this out. We will add a legend for the boxplot and probability density curve in Figure 14 and explain how they relate to Table 4.
  
  Citation: https://doi.org/10.5194/egusphere-2023-725-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (20 Jul 2023) by Alexander Gruber

AR by Bich Ngoc Tran on behalf of the Authors (07 Sep 2023): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (26 Sep 2023) by Alexander Gruber

RR by Joshua Fisher (27 Sep 2023)

ED: Publish as is (20 Oct 2023) by Alexander Gruber

AR by Bich Ngoc Tran on behalf of the Authors (07 Nov 2023): Manuscript

Journal article(s) based on this preprint

20 Dec 2023 | Highlight paper

Uncertainty assessment of satellite remote-sensing-based evapotranspiration estimates: a systematic review of methods and gaps

Bich Ngoc Tran, Johannes van der Kwast, Solomon Seyoum, Remko Uijlenhoet, Graham Jewitt, and Marloes Mul

Hydrol. Earth Syst. Sci., 27, 4505–4528, https://doi.org/10.5194/hess-27-4505-2023, 2023

Supplement

https://doi.org/10.5194/egusphere-2023-725-supplement

Data sets

Systematic Quantitative Literature Review - Uncertainty assessment of Evapotranspiration Remote Sensing, Bich Tran, https://doi.org/10.4121/797dcaff-56e3-45ae-a931-f6f4a3135d26.v1

Meta-analysis of Remotely sensed Evapotranspiration validation with Eddy Covariance, Bich Tran and Marloes Mul, https://doi.org/10.4121/e6e1713a-0c2b-4775-a7f4-9e6e0b2cf40f.v1

Viewed

Total article views: 1,190 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
685	478	27	1,190	53	13	15

Views and downloads (calculated since 25 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	140	54	4	198
May 2023	192	96	8	296
Jun 2023	52	63	2	117
Jul 2023	76	46	8	130
Aug 2023	26	53	0	79
Sep 2023	65	54	3	122
Oct 2023	61	51	2	114
Nov 2023	55	39	0	94
Dec 2023	18	22	0	40
Jan 2024	0
Feb 2024	0
Mar 2024	0
Apr 2024	0
May 2024	0
Jun 2024	0
Jul 2024	0
Aug 2024	0
Sep 2024	0

Viewed (geographical distribution)

Total article views: 1,175 (including HTML, PDF, and XML) Thereof 1,175 with geography defined and 0 with unknown origin.

Latest update: 04 Sep 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

Satellite data are increasingly used to estimate evapotranspiration, or the amount of water lost from plants and soil, over large areas. However, uncertainties from various sources can affect the accuracy of these estimates. This study reviews the current methods used to assess the uncertainties of these estimates and offers specific recommendations for interpreting them comprehensively, to support their use in research, monitoring, and management.