This work is distributed under the Creative Commons Attribution 4.0 License.
An Adaptive Method to Estimate Evapotranspiration using Satellite and Reanalysis Products
Abstract. Accurate estimation of evapotranspiration (ET) is critical for hydrological, agricultural, and climate-related applications. However, spatially and temporally consistent ET datasets are often limited, particularly in regions like Ireland, where cloud cover is high and ground-based observations are sparse. This study evaluates ten global, operational open-access ET products by comparing them to Penman-Monteith (PM) reference values derived from weather station data across 22 locations in Ireland between 2019 and 2023. Systematic errors were identified in all ET products, varying across sites, seasons, and years. An adaptive bias correction (AB) method was applied, which dynamically adjusts each product based on recent errors. Although the AB method significantly improved individual ET estimates, no single product consistently exhibited superior performance under all conditions. To further enhance ET accuracy, a novel Combination (COM) method was introduced. This method assigns dynamic weights to each bias-corrected ET product based on recent skill scores, enabling the creation of an optimally merged ET estimate. Unlike traditional static statistical methods, which are interpretable but inflexible, and machine learning approaches, which are adaptive but opaque and data-intensive, the COM method offers a transparent, computationally efficient, and interpretable solution. It requires minimal historical data and runs efficiently on non-specialised systems, making it particularly suitable for operational settings. Results show that the merged COM product outperformed all individual ET datasets, achieving lower errors and stronger correlations with PM observations. Given the persistent cloud cover and variable satellite retrieval accuracy in regions like Ireland, the ability to adapt to recent performance represents a significant advancement.
Overall, the proposed adaptive merging framework provides a scalable, lightweight solution for improving ET monitoring. This method holds promise for enhancing operational hydrology, agricultural decision-making, and climate impact assessments in Ireland and other regions facing similar challenges.
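To make the two-stage scheme described in the abstract concrete, a minimal sketch is given below. This is an illustrative reconstruction, not the authors' code: the function names, the choice of RMSE as the skill score, and the window lengths are assumptions for demonstration (the paper reports optimising windows of iw=4 for AB and iw=8 for COM).

```python
import numpy as np

def adaptive_bias_correct(product, reference, window=4):
    """AB-style correction: subtract the mean error over the previous `window` steps."""
    product = np.asarray(product, dtype=float)
    reference = np.asarray(reference, dtype=float)
    corrected = product.copy()
    for t in range(window, len(product)):
        recent_bias = np.mean(product[t - window:t] - reference[t - window:t])
        corrected[t] = product[t] - recent_bias
    return corrected

def skill_weighted_merge(products, reference, window=8):
    """COM-style merge: weight each series inversely to its RMSE
    over the previous `window` steps, then combine."""
    products = np.asarray(products, dtype=float)   # shape: (n_products, n_times)
    reference = np.asarray(reference, dtype=float)
    merged = np.full(products.shape[1], np.nan)
    for t in range(window, products.shape[1]):
        err = products[:, t - window:t] - reference[t - window:t]
        rmse = np.sqrt(np.mean(err ** 2, axis=1))
        weights = 1.0 / np.maximum(rmse, 1e-9)     # guard against zero RMSE
        weights /= weights.sum()
        merged[t] = weights @ products[:, t]
    return merged
```

Because both steps use only a short trailing window, the scheme needs minimal history and updates in a single pass, which is consistent with the lightweight, operational character claimed for the method.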
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4221', Anonymous Referee #1, 20 Nov 2025
AC1: 'Reply on RC1', Haneen Muhammad, 14 Jan 2026
We sincerely thank the reviewer for the thoughtful and constructive comments, as well as for the positive assessment of the clarity and overall quality of the manuscript. We agree with all points raised and will address them fully in the revised manuscript. Our responses are summarised below.
Major Comments
- Temporal coverage and dataset versions
We agree this section requires clarification. In the revision, we will specify the exact versions of all 10 ET products and provide their temporal coverage. This additional detail will clarify the true temporal overlap between the products.
- Inclusion of 2018
We greatly appreciate the reviewer’s suggestion regarding the inclusion of 2018, as we may not have fully considered the implications of omitting this year. In the revised manuscript, we will take this recommendation into account and incorporate 2018 appropriately, with corresponding updates to the analysis and discussion.
- Expanded justification of study period
We agree and will revise Section 2 to provide a clearer explanation of the chosen analysis period. The updated table described in Point 1 will also help justify the study period by clearly showing the temporal coverage and overlap of all products.
Minor Comments
- Grass-reference assumption
We agree and will add the FAO-56 citation.
- Disaggregation of 10-day composites
We acknowledge that the disaggregation–reaggregation procedure represents a physical simplification; however, such temporal standardisation is essential for harmonising ET products with differing native resolutions in comparison and fusion studies. This approach has precedent in the remote-sensing ET literature. Numerous studies uniformly distribute multi-day composite ET (e.g. 8-day totals) to daily mean values prior to analysis or further aggregation (e.g. Ehlert et al., 2024; Etchanchu et al., 2025; Fu et al., 2025; Harvey et al., 2023; Liu et al., 2025; Maherry et al., 2016; Montibeller et al., 2021; Petrakis et al., 2024). Some studies work directly with the resulting daily ET (mm day⁻¹), while others subsequently reaggregate to other temporal scales. For example, Petrakis et al. (2024), Liu et al. (2025), and Montibeller et al. (2021) disaggregate 8-day composites before aggregating to monthly totals. In this study, we reaggregate to 8-day totals to maintain consistency with the native MODIS ET product format, which is widely adopted in ET studies, and to ensure methodological consistency across products used in the fusion framework. Appropriate citations supporting this assumption will be added in Section 3.1.2.
- Comparison with machine-learning approaches
We will soften the comparison and cite the recommended study.
- Figures 4–5 placement
We agree and will move them to the Results section.
- Typos and acronyms
All wording, spacing, and acronym consistency issues will be corrected.
- Updated citation
We will include the suggested recent reference in the Introduction.
- Acronyms in figure captions
We will restate key acronyms in relevant captions.
- Coastal vs. inland classification
We will briefly state the classification method in Section 5.1 or Figure 1.
- Transferability and limitations
We will add a short paragraph addressing applicability to other land covers, climates, and data-sparse regions.
- Uncertainty in PM benchmark
We will include a short paragraph on uncertainties associated with PM-based reference ET.
We thank the reviewer again for the valuable and constructive feedback, which will be fully incorporated in the revised manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-4221-AC1
RC2: 'Comment on egusphere-2025-4221', Anonymous Referee #2, 10 Dec 2025
This manuscript presents an innovative contribution to the field of evapotranspiration (ET) estimation. The evaluation of global ET products against a robust Penman-Monteith (PM) reference dataset derived from weather stations is thorough and well-executed. It provides valuable insights into spatial and temporal product performance. The proposed adaptive bias correction (AB) and combination (COM) methods represent a clever alternative to static statistical fusions or machine learning approaches that lack transparency. However, the manuscript would be improved by addressing the specific comments below.
1) The reanalysis products (section 2.2) include MODIS vegetation inputs. How do you account for the inclusion of satellite data into these products? I suggest including a brief discussion of the implications of the shared data for interpretation of relative performance and for independence of the final ensemble product.
2) In section 2.3 (specifically lines 145 – 152), the assumption that ET = ETo should be supported. Evidence that the grass at all stations was consistently well-watered and unstressed is quite important. You could use LAI at each station to indicate stability.
3) In line 185, conversion to depth: latent heat flux in many products is given in W/m². It needs to be clearly stated how these values were converted to mm of water equivalent. Was pixel area involved (energy flux is already per unit area)?
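For reference, the standard conversion the reviewer asks the authors to state is sketched below. This is a generic illustration, not taken from the manuscript; the function name is hypothetical and the latent heat of vaporisation is approximated as a constant (≈2.45 MJ kg⁻¹ near 20 °C).

```python
LAMBDA_V = 2.45e6  # latent heat of vaporization, J kg^-1 (approx., near 20 degC)

def latent_heat_flux_to_mm_per_day(le_w_m2):
    """Convert a daily-mean latent heat flux (W m^-2) to water-equivalent
    depth (mm day^-1).

    1 kg of evaporated water per m^2 equals 1 mm of depth, so no pixel-area
    term is needed: the flux is already expressed per unit area.
    """
    joules_per_day = le_w_m2 * 86400.0       # J m^-2 accumulated over one day
    kg_per_day = joules_per_day / LAMBDA_V   # kg m^-2 day^-1
    return kg_per_day                        # numerically equal to mm day^-1
```

For example, a daily-mean flux of 100 W m⁻² corresponds to roughly 3.5 mm day⁻¹ of ET.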
4) Which variables does flux refer to? It is unclear because in section 2.2, flux is mentioned as latent heat net flux (line 133). You indicate that latent heat net flux represents total ET but also provides hourly ET estimates. Please clarify. It may be useful to replace generic uses of flux with “latent heat flux”, “eddy-covariance measurements”, “FLUXCOM”, etc. to avoid confusion.
5) Validation uses single-pixel extraction at each station, but many products have coarse native resolution (for example >5 km). Point-to-pixel comparisons may miss strong coastal gradients or small-scale topographic variations captured by finer resolutions. The validation should use 3×3 or 5×5 pixel averaging centered on each station, with changes in key metrics reported. Otherwise, you should explicitly discuss and justify why single-pixel extraction is preferred despite known scale mismatches.
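The windowed extraction the reviewer suggests could be implemented along these lines. This is a hypothetical helper, not code from the manuscript; edge clipping and NaN handling are assumptions.

```python
import numpy as np

def window_mean(grid, row, col, half=1):
    """Mean ET over a (2*half+1) x (2*half+1) pixel window centered on
    (row, col), clipped at the grid edges and ignoring NaN pixels.
    half=1 gives a 3x3 window; half=2 gives 5x5."""
    r0, r1 = max(row - half, 0), min(row + half + 1, grid.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, grid.shape[1])
    return np.nanmean(grid[r0:r1, c0:c1])
```

Running the validation twice, once with single-pixel values and once with `window_mean`, would directly quantify the sensitivity of the skill metrics to the point-to-pixel scale mismatch.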
6) In sections 3.3.1 – 3.3.2 (figures 4 – 5), the optimization of rolling windows (iw=4 for AB, iw=8 for COM) via mean error/RMSE minimization is a nice touch, but the rationale for testing only 1–12 days feels ad hoc. Why not include longer windows (like 16–30 days) to capture intra-seasonal variability? A supplementary sensitivity table showing skill scores across a broader range would demonstrate method stability and preempt critiques on overfitting to the 2019 – 2023 period.
7) Section 5.3 provides comparison of your adaptive method to traditional static fusion methods, and the positioning of AB/COM against static (BMA, TC) and ML (ANN, ERT) techniques is compelling, with evidence of superior error reduction (i.e. RMSE 1.40 mm/8d for COM compared to literature benchmarks). However, a quantitative head-to-head (perhaps reimplementing a simple BMA on your data for direct RMSE/CC comparison) would bolster claims of "significant advancement." Also, I suggest addressing computational trade-offs more explicitly (while COM is "lightweight," how does its runtime scale for national gridding, in section 7)?
Minor errors:
Line 24: leaf area “infuleced” is misspelled and should be corrected to “influenced”
Line 133: “latent” should be capitalized at the beginning of the sentence. Also, variables such as (lhtfl1have-sfc-fc-gauss) would be better placed in Table 1, along with the initial units of each variable (to address line 176).
Table 1: include the versions of each product.
Lines 168 and 169: change “sunshine” to “daylight”.
Please add references for JRA-3Q in section 2.2.4.
Figure 3. The words “Observations” and “Aggregation” are misspelled.
Figure 7 caption, add indication of dashed line, such as: "Dashed arc: PM SD benchmark." or "Dashed arcs: perfect correlation and PM standard deviation benchmark."
Citation: https://doi.org/10.5194/egusphere-2025-4221-RC2
AC2: 'Reply on RC2', Haneen Muhammad, 14 Jan 2026
We sincerely thank the reviewer for the constructive and detailed comments, and for the positive assessment of the novelty and robustness of the proposed methodology. We agree with all comments and confirm that they will be fully addressed in the revised manuscript. Our responses are summarised briefly below.
Major Comments
- Shared satellite inputs in reanalysis products
We agree and will add a short discussion in Section 5.1 addressing the implications of shared MODIS vegetation inputs for interpretation of relative performance and independence of the final ensemble product.
- Support for the ET = ETo assumption
We agree and will strengthen Section 2.3 by adding supporting evidence and citations confirming that the grass reference stations are well maintained and representative of unstressed reference conditions. We will prioritise station-scale evidence most relevant to the PM reference and will also consider supplementary indicators such as LAI.
- Conversion of latent heat flux to ET depth
We agree and will explicitly describe the conversion from latent heat flux (W m⁻²) to water-equivalent depth (mm) in Section 3.1.2, clarifying that no pixel-area scaling is required as fluxes are already expressed per unit area.
- Clarification of “flux” terminology
We agree and will revise the text throughout to use precise terminology (e.g. latent heat flux) consistently and avoid ambiguity.
- Point-to-pixel scale mismatch in validation
We agree and will address this by either incorporating 3×3 or 5×5 spatial averaging and reporting the updated metrics, or by explicitly justifying the use of single-pixel extraction and discussing its limitations.
- Rolling-window sensitivity range
We agree and will extend the sensitivity analysis to include longer windows and provide a supplementary table summarising performance across a broader range to demonstrate methodological stability.
- Comparison with BMA and computational considerations
We agree and will include a direct quantitative comparison with a simple BMA implementation, and expand the discussion of computational cost and scalability in Section 7.
Minor Comments
All minor corrections (spelling errors, capitalization, table additions, missing references, and figure caption clarifications) will be implemented in the revised manuscript as suggested.
Regarding the proposed terminology change from "sunshine" to "daylight", we respectfully retain the term "sunshine". In this study, "sunshine" refers specifically to the meteorologically measured variable sunshine duration (defined as periods of direct irradiance exceeding 120 W m⁻²), as recorded by the Campbell–Stokes recorders and/or the SPN1 pyranometer at the Met Éireann stations. This differs from "daylight", which denotes the astronomical interval between sunrise and sunset. Retaining "sunshine" therefore ensures technical accuracy and consistency with WMO standards and the observed variable.
We again thank the reviewer for the valuable and constructive feedback. All discussed revisions will be incorporated into the updated manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-4221-AC2
RC1: 'Comment on egusphere-2025-4221', Anonymous Referee #1, 20 Nov 2025
This manuscript presents an adaptive framework for estimating evapotranspiration (ET) by combining ten satellite-derived and reanalysis ET products over Ireland. The authors first construct a grass-reference ET benchmark using the Penman–Monteith equation and ground-based data, then evaluate individual products against this reference. Building on this, they propose an AB preprocessing scheme that (i) applies seasonal bias correction and (ii) dynamically updates combination weights based on recent skill scores (error, bias, RMSE, correlation).
Overall, the paper is good and well-written, with a clear structure, strong methodological grounding, and meaningful results. However, several important clarifications and revisions are needed before it can be considered for publication.
Major comments:
1) In Section 2, the manuscript states: “Data were extracted for the period 2019–2023, the earliest interval where all 10 ET products overlapped completely. Although 2018 also offered full coverage, it was excluded to avoid bias from that year’s extreme European heatwave.”
This statement appears to be factually incorrect or at least misleading. Most of the selected satellite and reanalysis ET datasets (GLEAM, ERA5-Land, GLDAS, MOD16, MERRA-2, WaPOR, SSEBop, JRA-3Q, LSA-SAF) have global coverage starting well before 2019, typically in the 2000s or even earlier. Correct this statement or provide a more precise justification; you may consider the earliest period with homogeneous versions, gap-filled “GF” products, post-reprocessing consistency, or stable input forcing. I recommend adding the temporal coverage in Table 1. Also add and specify the exact version used (e.g., GLEAM v4.1a, WaPOR v3, SSEBop v6, ERA5-Land, MERRA-2, etc.).
2) The manuscript notes that 2018 “offered full coverage” but was excluded “to avoid bias from that year’s extreme European heatwave.” The paper aims to develop an adaptive method that should be robust to varying conditions (including extremes). Excluding a documented extreme year may weaken the claim that the method is suitable for operational and climate-related applications, where extremes are precisely the periods of greatest interest. As you have the 2018 dataset, and no sensitivity test is shown to demonstrate how including 2018 would affect the skill scores, this would add value to your paper. More clearly justify why 2018 must be excluded (e.g., known data quality anomalies or product discontinuities), not only because it is an extreme meteorological year.
3) Given that the core of this study is multi-product fusion, the temporal coverage explanation in Section 2 is too brief and currently causes confusion. A clearer justification of the analysis period—supported by a table summarizing coverage dates, version numbers, processing levels, and potential reprocessing events—would greatly improve the transparency and reproducibility of the study.
Minor comment:
1) The manuscript states: “For methodology development under well-maintained synoptic station conditions, Ks = Kc = 1 … We therefore assume the grass reference ETo represents actual ET from the grass surface at these stations (ET = ETo).”
This assumption is standard in FAO-56 terminology, but it requires proper citation such as Allen, R. G., Pereira, L. S., Raes, D., & Smith, M. (1998). Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements. FAO Irrigation and Drainage Paper 56.
2) The sentence “10-day resolution composites were uniformly distributed into daily averages, then reaggregated into 8-day summed totals” describes a temporal disaggregation–reaggregation procedure that implicitly assumes constant ET across each 10-day interval. While this approach is dimensionally consistent, it represents a substantial physical simplification because ET varies significantly from day to day with changes in radiation, temperature, humidity, and wind. Please justify and clarify this methodological choice. If such methods have precedent in the remote-sensing ET literature, please provide an appropriate citation to support this assumption.
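The uniform disaggregation–reaggregation step quoted here might look as follows. This is an illustrative sketch under the constant-ET-within-composite assumption the reviewer describes; the function name and the handling of a trailing partial period are assumptions, not the authors' implementation.

```python
import numpy as np

def dekadal_to_8day(dekadal_totals, n_days=None):
    """Spread 10-day ET totals uniformly into daily means, then re-sum
    the daily series into consecutive 8-day totals.

    Any trailing days that do not fill a complete 8-day period are dropped.
    """
    daily = np.repeat(np.asarray(dekadal_totals, dtype=float) / 10.0, 10)
    if n_days is not None:
        daily = daily[:n_days]          # e.g. trim to the calendar length
    n_full = len(daily) // 8
    return daily[:n_full * 8].reshape(n_full, 8).sum(axis=1)
```

By construction the procedure conserves total ET over complete periods but flattens all day-to-day variability within each 10-day composite, which is exactly the simplification the comment asks the authors to justify.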
3) In the Introduction, the paper compares its results to machine learning approaches; however, since the study does not directly evaluate ML models—and doing so would be outside the scope—I recommend tempering the degree of comparison. Machine learning and bias-correction methods have been widely applied in recent work, such as “Analysis of historical global warming impacts on climatological trends for the partially gauged Hirmand River Basin based on multiple data products and bias correction methods.” I strongly recommend considering and citing this study to provide a more balanced context and to strengthen the discussion of existing methodological alternatives. This will also help position your method as one of the potential complementary approaches within the broader suite of emerging ET estimation techniques.
4) Figures 4 and 5 currently appear in the Methods section, but they clearly present results of parameter testing (the sensitivity of the AB and COM window sizes to performance metrics). These figures are therefore conceptually part of the Results rather than the methodology.
5) Typos and Acronyms: Abstract: “regions like the Ireland” → should be “regions like Ireland” (drop “the”). Please check for small spacing issues such as “8days”, “mm/8days”, “Km” vs “km”, and make them consistent (e.g., “8 days”, “mm per 8 days”, “km”). PM is sometimes referred to as “Penman–Monteith”, “PM”, and “PM model”. Standardize the phrasing, e.g., “Penman–Monteith (PM) reference ET” on first use and then use “PM” consistently. AB and COM are clearly defined, but in the Methods and Results it would help to remind the reader once (e.g., “COM (combined product)”) when first mentioned in Section 4.
6) In the Introduction: “Remote sensing and reanalysis products provide ET estimates with broad spatial coverage, making them particularly useful for regions with sparse ground-based measurements (Li et al., 2009).” In this sentence, the citation is old. I strongly recommend considering and citing “Assimilation of Sentinel‐Based Leaf Area Index for Modeling Surface‐Ground Water Interactions in Irrigation Districts” to strengthen your sentence.
7) Some figures (e.g., Figures 6, 7, 11, 13, 15) are well described, but you might briefly restate key acronyms in captions (e.g., “PM = Penman–Monteith reference ET; COM = combined ET product”) so figures can be interpreted independently of the main text.
8) In Section 5.1 and Figure 15, you distinguish “coastal” vs “inland” stations. It would be helpful to briefly state how this classification was made (e.g., distance threshold from coastline, visual assessment), or add a note to Figure 1 or the text.
9) The study focuses on well-maintained synoptic stations over predominantly grassland surfaces in Ireland, using PM-based grass reference ET as the benchmark. While this is appropriate for method development, it would be useful to expand the Discussion to address transferability of the AB/COM framework to (i) other land-cover types (e.g., crops, forests), (ii) more water-limited or arid climates, and (iii) regions with sparser or lower-quality meteorological data.
A short paragraph explicitly outlining key limitations and assumptions (e.g., reliance on a high-quality PM benchmark, grass reference conditions, relatively humid maritime climate) would help readers understand in which contexts the method is expected to perform well and where additional adaptation or testing would be needed.
10) Since all skill scores are computed relative to the PM-based benchmark, it would be helpful to briefly discuss the uncertainty in the reference ET itself (e.g., effects of gap-filled radiation, wind, humidity; representativeness of station-scale PM ET for the product pixel). Even a short qualitative statement or a reference to typical PM uncertainties would help frame the evaluation.
After these revisions are completed, I believe the paper will be of high quality and suitable for publication.