Triple collocation validates CONUS-wide evapotranspiration inferred from atmospheric conditions
Abstract. Large-scale estimation of evapotranspiration (ET) remains challenging because no direct remote sensing estimates of ET exist and because most data-driven estimation approaches require assumptions about the impact of moisture conditions and biogeography on ET. The surface flux equilibrium (SFE) approach offers an alternative, deriving ET directly from atmospheric temperature and humidity under the assumption that conditions in the atmospheric boundary layer reflect ET’s land boundary condition. We present a 4 km resolution, continental United States-wide, daily ET dataset spanning from 1979 to 2024 using the SFE method. The Bowen ratio is first calculated using the SFE method solely based on temperature and specific humidity estimates from gridMET and then converted to ET using net radiation and ground heat fluxes from ERA5-Land. We evaluate its performance using extended triple collocation to estimate the standard deviation of the random error and the correlation coefficient of SFE ET compared to true ET, as well as those of three widely used alternative ET datasets: GLEAM, FluxCom, and ERA5-Land. Despite its extreme simplicity, SFE ET achieves performance comparable to or exceeding the other datasets across large portions of CONUS, particularly in the Western U.S., while requiring no information about land surface, vegetation, or soil properties and no assumptions about ET’s response to environmental and climate drivers. Our results support the use of SFE as a scalable, observation-driven method for estimating ET.
This manuscript presents a rigorous analysis of the uncertainties in evapotranspiration estimates from "classical" methods and an alternative method, using triple collocation analysis. It is very well written, clear, sound, relevant, and fits well within the scope of HESS. I recommend that this manuscript be published after minor revisions, which mainly concern clarifications of the methodology and justifications for certain assumptions.
My two main concerns are:
Sec. 2.2: How justified is the linear error model for ET? Given the non-linear nature of Eq. 1, I am a bit worried that it might not be. Then again, I don't know much about error structures in ET data, so this is more of a personal gut feeling. I can imagine other people having similar concerns, though, so perhaps you could add some words on that, or a reference to previous work that has looked into it?
Discussion: Your discussion revolves around the different patterns you see in \sigma_eps and R_T. If I understand your methodology correctly, you compare *unscaled* \sigma_eps estimates (Eq. 6). How meaningful is such a comparison? In Fig. 2 you show clearly that the different data sets have different means and variabilities, so we do expect variations in the \beta terms. I'm not an ET guy, but I assume that most data set applications would try to remove any systematic error and therefore scale the random errors accordingly. I would therefore argue that it only makes sense to compare scaled random error variances, i.e., \sigma_eps values that relate to the same signal variability. After all, it is the signal-to-noise ratio that determines how well the data set information can be separated from the underlying noise, and this is directly reflected (in a normalized way) by R_T.
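To illustrate the point with a minimal synthetic sketch (all numbers and names are illustrative, not taken from the manuscript): two products with very different multiplicative biases, and hence very different unscaled random-error magnitudes, can have identical R_T, because R_T reflects only the signal-to-noise ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
truth = rng.normal(2.0, 1.0, n)            # synthetic "true" ET signal

# Two products with the same signal-to-noise ratio (SNR = 4) but
# different multiplicative biases (betas) and error magnitudes.
a = 1.0 * truth + rng.normal(0, 0.5, n)    # beta = 1.0, sigma_eps = 0.5
b = 2.0 * truth + rng.normal(0, 1.0, n)    # beta = 2.0, sigma_eps = 1.0

for name, d in [("a", a), ("b", b)]:
    r = np.corrcoef(d, truth)[0, 1]
    print(name, "R_T ~", round(r, 3))
```

Both correlations come out near sqrt(SNR/(SNR+1)) = sqrt(4/5) ≈ 0.894, even though b's unscaled random-error standard deviation is twice a's; comparing the unscaled \sigma_eps values alone would suggest b is the worse product.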
Other comments:
It is stated repeatedly that one advantage of the SFE method is that it doesn't make assumptions about root-zone soil moisture or vegetation status. I understand that the SFE method doesn't require one to do that directly, but aren't such assumptions necessary for the computation of air temperature and humidity that are used as input for the SFE method?
The term "error" and variations thereof are used a bit loosely. There is currently a push to harmonize the terminology concerning "errors" across communities; I recommend having a look at Merchant et al. (2017) and considering adopting their proposed terminology (in particular the usage of "error" vs. "uncertainty").
Sec. 2.3: I'm missing an explanation of what you do with the redundant TCA estimates from the different triplets. In the supplement you show the results of the individual triplets, which is fine, but in the main text it is not clear what you show. I assume it is the average of the estimates from all triplets? Did you average both \sigma_eps and R_T? If so, it is generally advised NOT to average correlation coefficients, although this advice comes from averaging Pearson correlations; I'm not sure whether it holds here too. One could actually put all four data sets into a least-squares estimator to obtain adjusted estimates of the signal and error variances (see Gruber et al., 2016), and then derive the R_T estimates from these, which may be a bit more robust, but I'm only speculating here. In any case, I think it would be good to at least elaborate on what you did/show.
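To make one possible treatment concrete, here is a hedged synthetic sketch (all data sets and numbers are invented for illustration): average the error *variances* over the redundant triplets containing a given product, then derive R_T from the averaged signal/error split rather than averaging correlation coefficients directly. The full least-squares adjustment of Gruber et al. (2016) would go further; this is only the simple averaging variant.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n = 100_000
truth = rng.gamma(2.0, 1.5, n)             # synthetic "true" ET signal

# Four synthetic products; "w" is the one we estimate errors for.
data = {
    "w": 1.0 * truth + rng.normal(0, 0.3, n),
    "x": 0.9 * truth + rng.normal(0, 0.4, n),
    "y": 1.2 * truth + rng.normal(0, 0.5, n),
    "z": 0.7 * truth + rng.normal(0, 0.6, n),
}

def tc_error_var(a, b, c):
    """Classical TC error variance of a from triplet (a, b, c)."""
    C = np.cov([a, b, c])
    return C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]

# Three redundant triplets contain "w"; average the variances, then
# derive R_T from the averaged signal/error split instead of
# averaging correlation coefficients.
evars = [tc_error_var(data["w"], data[b], data[c])
         for b, c in combinations(["x", "y", "z"], 2)]
evar = np.mean(evars)
svar = np.var(data["w"], ddof=1) - evar    # signal variance beta_w^2 sigma_T^2
R_T = np.sqrt(svar / (svar + evar))
print(round(np.sqrt(evar), 3), round(R_T, 3))
```

With the prescribed error std of 0.3 for "w", the averaged estimate recovers roughly that value, and R_T follows from the variance split rather than from averaged correlations.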
Sec. 3.2: The section titles for 3.1 and 3.3 state what is shown, whereas the section title for 3.2 is a spelled-out conclusion. In the discussion, the titles change again to questions. I suggest choosing one title naming style and staying consistent.
L81-82: "This suggests that..." The embedded relative clause with a dangling preposition reads a bit awkwardly; I suggest rephrasing this sentence.
L104: As far as I know, triple collocation is similar to, but not the same as, the "three-cornered hat" approach (see, e.g., Sjoberg et al., 2021). I recommend just removing this parenthetical clause.
L139: C_p here is upper case, but in Eq. 1 it is lower case. Also, perhaps set all equation symbols in the text in equation mode (italic) to be consistent with the equations?
L145: Why 10%? Can you justify that number, and might it be useful to mention the implications of this assumption?
L161: I find the explanation "By treating the product of \sigma_T as a single unknown variable ..." a bit misleading. It is not the fact that they are treated as a single variable that lets you solve for the error variance; it is the fact that the betas of the two other data sets cancel in the covariance ratio, which then lets you eliminate the \sigma_T term by subtracting the resulting estimate from the variance of the data set.
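For reference, the cancellation is visible in a few lines of synthetic code (betas and error levels are purely illustrative): the covariance ratio cov(x,y) cov(x,z) / cov(y,z) equals \beta_x^2 \sigma_T^2 regardless of \beta_y and \beta_z, so subtracting it from var(x) isolates x's error variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Synthetic truth and three products with different multiplicative
# biases (betas) and independent zero-mean random errors.
truth = rng.gamma(2.0, 1.5, n)
x = 1.0 * truth + rng.normal(0, 0.3, n)    # sigma_eps,x = 0.3
y = 0.8 * truth + rng.normal(0, 0.5, n)
z = 1.3 * truth + rng.normal(0, 0.4, n)

def cov(a, b):
    return np.cov(a, b)[0, 1]

# cov(x,y)*cov(x,z)/cov(y,z) = beta_x^2 * var(truth): the betas of
# y and z cancel, leaving only x's signal variance.
err_var_x = np.var(x, ddof=1) - cov(x, y) * cov(x, z) / cov(y, z)
print(np.sqrt(err_var_x))   # should come out close to the prescribed 0.3
```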
L199: "increasing the robustness of TC assumptions" sounds a bit odd. I guess you mean that the convergence of the error estimates increases our confidence that the assumptions are valid?
L278--: You compare the ET estimates qualitatively and mention some numbers in the text, but I think it would also be useful to show a summary table with all the relevant metrics (e.g., correlations and biases between all data set combinations).
L421: The acronym MAP hasn't been introduced.
L472-482: Weighted averaging comes from least-squares theory and serves the purpose of reducing random errors only. I guess what is meant by "this approach has the disadvantage of obscuring the individual problems" is that if data sets have different systematic errors, especially non-stationary ones, then you create an uncontrolled blend of biased estimates, and any improvement is only a matter of luck, because weights derived from random error variances do not account for these biases, which are instead assumed to be zero.
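A small synthetic sketch of this argument (all values illustrative): inverse-error-variance weights reduce the random error below that of either input, but an additive bias in one product is blended in uncontrolled, because the weights implicitly assume it is zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
truth = rng.normal(3.0, 1.0, n)

# Two products with known random-error std devs; b also carries a
# systematic (additive) bias the weights know nothing about.
a = truth + rng.normal(0, 0.4, n)          # unbiased,  sigma = 0.4
b = truth + 0.8 + rng.normal(0, 0.3, n)    # +0.8 bias, sigma = 0.3

# Least-squares (inverse-error-variance) weights:
w_a, w_b = 1 / 0.4**2, 1 / 0.3**2
merged = (w_a * a + w_b * b) / (w_a + w_b)

err = merged - truth
print("random error std:", np.std(err))    # ~0.24, below both inputs
print("inherited bias:", err.mean())       # ~0.51, an uncontrolled blend
```

The random-error reduction (0.24 vs. 0.3 and 0.4) is exactly what least-squares theory promises, while the merged product inherits roughly two thirds of b's bias simply because b receives the larger weight.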
L482-484: Isn't this statement trivial and already implied by the paragraph's introductory statement: "It is possible to average ET estimates weighted by each dataset's performance"?
L521: Why is this contrary to expectation? You do state that this might have to do with the lower ET amounts in these regions, so considering my argument in the beginning concerning scaling in TCA, I would argue that this is simply a result of showing unscaled \sigma_eps estimates. When looking at signal-to-noise ratios instead, this gradient vanishes, right?
L589: "complex" instead of "complicated"?
Eq. (4)-(7): The introduction of Q_ii seems a bit unnecessary to me. Since you define Q_ii just as equivalent to \sigma^2_ii, you could use the latter instead of Q directly in Eqs. 6 and 7, which I don't think would make it any more difficult to read. This might be just a personal preference though.
Figure 1: The x-axis date labelling confused me when I first looked at it. The figure caption only states "Mean annual SFE from 1979 to 2024"... Perhaps also spell out the date range shown in the example time series: "Points show time series for [...] from Dec. 2000 to Dec. 2002"?
Figure 7/8: The order of the Figure panels is inconsistent.
Supplement: I always find it hard to visually compare patterns like these. You draw the conclusion that differences are small when using different triplets, and therefore that the assumptions can be considered valid. But when exactly are differences "small enough" to draw this conclusion? There isn't an awful lot of contrast in the figures, and there do seem to be regions with somewhat greater differences. Perhaps it would be worth plotting the actual *differences* between the TCA results for the triplet combinations, or complementing the maps you show with boxplots of the differences?
References:
Gruber et al. (2016): https://doi.org/10.1002/2015JD024027
Merchant et al. (2017): https://doi.org/10.5194/essd-9-511-2017
Sjoberg et al. (2021): https://doi.org/10.1175/JTECH-D-19-0217.1