Evaluating the quality of the E-OBS meteorological forcing data in EStreams for large-sample hydrology studies in Europe

Clerc-Schwarzenbach, Franziska; do Nascimento, Thiago V. M.

doi:https://doi.org/10.5194/egusphere-2025-3710

Preprints

08 Sep 2025

Abstract. To conduct large-sample hydrological studies over large spatial domains, standardized meteorological forcing data are often desired. For large-sample studies across Europe, the EStreams dataset and catalogue satisfies this demand. In EStreams, the meteorological time series are obtained from the Ensemble Observation (E-OBS) product which is available for all of Europe. Due to the large spatial extent of this dataset, limitations of data quality have to be expected when the dataset is compared to smaller-scale datasets, e.g., national level. In this study, we compare the meteorological time series included for 3423 catchments in EStreams to nine smaller datasets (mostly CAMELS datasets). We assess how the different meteorological data impact the performance of a bucket-type hydrological model. For most catchments, the precipitation amounts derived from E-OBS are lower than the ones from the CAMELS datasets, while the temperature and the potential evapotranspiration values are higher. Model performances tend to be (slightly) lower when the E-OBS data are used than when the CAMELS datasets are used for calibration. Exceptions arise when the CAMELS data were derived from global datasets rather than national products, as well as when the station density in the E-OBS data is high. This study provides the first assessment of the E-OBS data at a continental scale for hydrological applications and shows that, despite some limitations, the dataset offers a reasonable basis for large-sample hydrological modelling across Europe.

Received: 30 Jul 2025 – Discussion started: 08 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-3710', Alexander Dolich, 23 Sep 2025
I thank the authors for this study, which shows that E-OBS offers high-quality meteorological data that can be used for large-sample hydrology studies in Europe, while also clearly highlighting the limitations of E-OBS. The manuscript is easy to follow and of high overall quality. A few minor comments should be addressed, but the manuscript is already in good shape.
Minor specific comments:

L11: “limitations of data quality” -> maybe indicate that data quality is expected to vary in space? (e.g. “limitations and regional variations of data quality”)

L36-L39: The number of catchments is not directly the problem; the challenge is the mixture of different regions and countries, as meteorological data are often available at a national level (e.g., provided by national meteorological organizations).

L40, L45, L55, L72, L417…: I know what you want to say here, but I don’t like the word “standardization” in this context, as it usually refers to something else in data processing. ERA5 and E-OBS, for example, are just datasets on a larger scale with different sources and processing methods; they do not “standardize” smaller datasets.

L91-L93: Why did you apply these criteria?

L111-L112: The 0.25° resolution of E-OBS is very coarse. The precipitation data for CAMELS-DE, for example, have a resolution of 1 × 1 km. This could be an additional source of limitations for the EStreams data, and also for the comparisons in this study. Maybe you could think about an update of EStreams in the future? I think this could be worth it (not part of this study).

L116-L117: I think the main point here is that the quality and uncertainty of the E-OBS data have a larger (regional) spread: some regions will have very good data (where station measurements are available), while regions with fewer station measurements will have worse data quality. Even if the data come from the same source (E-OBS), quality and uncertainty vary regionally. I think this is a major challenge in LSH and people need to be aware of it.

L149: Maybe add a small explanation on why you designed the scenarios this way, and which questions you aim to answer with the different scenarios (I and II are quite clear, but why did you do III-V?)

L152: This could also go into limitations, but the catchment shapes are also not identical between EStreams and the CAMELS datasets, which results in different areas for which the meteorological data was “cut out” and aggregated, which can also lead to differences.

L312: I think it is hard to see any patterns in this figure with the mixture of scenarios I and II shown as circles and triangles. I am not sure how to improve the figure, but you could calculate a regression line and also report the p-values. This could also be used to back up your statement in L310-311.

L324-L325: The results in Austria are poor because ERA5 data are used rather than a “local” dataset; maybe add this here?

L344-L353: Here you have the paragraph about the limitations of ERA5 data in Austria, but I think it does not really fit in this section (“Evaluation of the E-OBS data in comparison to the E-OBS station density”). Maybe it would fit better in Section 4.1? I think the point about ERA5 data being used in Austria is very important in this study, as this is a fundamental difference from the other CAMELS datasets, where local, highest-quality data are used; in Austria it is quite the opposite. You should make this point very clear, also at the beginning, as you do not test whether "local" CAMELS data are better than E-OBS data in the case of Austria.

It would also be interesting to see how the different CAMELS precipitation data were collected and processed (maybe not so easy to find out). I only know about CAMELS-DE, but HYRAS is also based on interpolated station data (I guess mostly the same stations as used for E-OBS), which would explain the relative similarities. It is still interesting to see that there are differences (maybe due to different interpolation or processing methods, or the coarser resolution of E-OBS).

L398-L401: For smaller catchments, having E-OBS data from the 0.1° version could also help (again, maybe this is worth an update for EStreams, which of course is not part of this study, just a general suggestion).

L402…: You could add to the conclusion that local datasets are usually the best, but that E-OBS data and EStreams offer a great harmonized data source for LSH studies covering all of Europe, especially as an alternative to ERA5, which has shown limitations in Austria. Maybe expand a little on this and on how E-OBS could be an alternative to ERA5, which was mostly the standard before.

So in general, we now have a great (and still growing) coverage of streamflow data in Europe through the different national CAMELS datasets. With the addition of E-OBS forcing data, which are of high quality (especially compared to ERA5) and readily available in EStreams, there is a great data basis for LSH studies covering the entirety of Europe.
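The regression line and p-value suggested in the comment on L312 could be obtained along the following lines. This is a minimal sketch, not the authors' or the referee's method: the function names, the choice of a permutation test (instead of a parametric t-test), and the meaning of `x` (e.g., station count per catchment) and `y` (e.g., KGE difference between two scenarios) are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


def regression_with_permutation_p(x, y, n_perm=5000, seed=0):
    """Return (slope, intercept, two-sided permutation p-value).

    The p-value is the share of random re-pairings of x and y whose
    absolute correlation is at least as large as the observed one.
    """
    slope, intercept, r_obs = fit_line(x, y)
    rng = random.Random(seed)  # fixed seed for reproducibility
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(fit_line(x, y_perm)[2]) >= abs(r_obs) - 1e-12:
            hits += 1
    # +1 in numerator and denominator avoids reporting p = 0
    return slope, intercept, (hits + 1) / (n_perm + 1)
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries; a standard t-test on the slope would be an equally valid choice.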

Technical Corrections
L27: “too” -> “as well”

L42: “than” -> “compared to”

L203: “(Fig. 2Figure 2)”

L277: “was the performance gap between scenarios II and I…” -> “the performance gap between scenarios II and I was…”
Citation: https://doi.org/10.5194/egusphere-2025-3710-RC1
CC1:
'Comment from EGU peer review training', Adriaan J. (Ryan) Teuling, 29 Sep 2025

Dear authors, editor,
As editor of HESS, I have participated in the recent EGU peer review training for which I selected the current manuscript as one of the possible manuscripts to review for the participants. One participant asked me to share the contents of the review in the online discussion in the hope it might help to improve the manuscript, which I hereby do. I have left out the introduction to the review, and only copied the (specific) comments.
best regards
Ryan Teuling

Main points:
Regarding the methods, the authors have included all catchments in their study, regardless of whether they are affected by human activities (L94-97). This is questionable, as only the climate forcings are used as inputs to the hydrological model, and such a modeling approach may only be applicable to natural sites without human intervention. Across the 3423 catchments, this introduces considerable noise into the modeling results. Importantly, it becomes hard to tell whether one type of meteorological data performs better than the other because of natural conditions, human intervention, or the quality (or station density) of the meteorological data itself. What about other possible governing factors, such as climate type, topography, land use and land cover, and geology? None of these are addressed or analyzed before the results are attributed to spatial resolution and station density. Simply stating that one dataset is better than the other without analyzing the possible governing factors could limit the applicability and generalization of the research outputs. Therefore, more analysis aimed at process-based understanding and transferable knowledge is needed to reach robust, evidence-supported conclusions.
The authors adopted potential evapotranspiration data derived with different approaches: it is calculated with the simplest approach (temperature-based only) in E-OBS, but with a variety of methods in the CAMELS datasets. If the authors want to make a comparison, it should be “apples to apples”. It is recommended that potential evapotranspiration be calculated with the same method for both types of datasets.
Regarding the results, it would be more useful to identify the governing factors (climatology, topography, land use, etc.) that explain why E-OBS over- or underestimates relative to CAMELS, rather than simply stating which countries or regions have higher or lower meteorological values. More precisely, why is one dataset better than the other in some countries but not in others?
Another key aspect is that the authors calibrate the models individually with the different climate datasets. As a result, not only the climate data differ but also the model parameters. A lower or higher model performance is therefore due not only to the quality of the climate data but also to the model parameters.

Specific comments:
L10: Maybe mention the spatial resolution of the meteorological data from E-OBS?
L16: Model performance is SLIGHTLY lower when E-OBS data are used compared with CAMELS data: is this difference statistically significant?
L48-53: The authors actually come to the same conclusion as the cited literature, and mention the same thing in the abstract. So what is the added value of evaluating E-OBS vs. CAMELS? Is it just the larger scale of the detailed dataset?
L84: Why exclude catchments with an area of more than 2000 km²? What is the impact of, or the relation between, the catchment area and the meteorological data?
L123-132: Why are annual differences in precipitation and evapotranspiration compared between the datasets, but not seasonal differences, while for temperature you compared daily differences?
Figure 4: What are the reasons for the different model performance among the countries? What are the governing factors? Simply stating that the KGE is higher here or lower there without providing further reasons is not helpful.
Figure 6: The important thing is not the exact number of catchments in a country where the E-OBS dataset is better or worse than the CAMELS datasets, but why E-OBS is better or worse than CAMELS in these catchments.
L275-278: Why is the model performance lower in Great Britain, which shows the opposite behavior? Please explain.
Figure 7: Simply stating that station density plays the key role is not convincing, as the authors state that other factors may also play a role. It would be more interesting to analyze other factors as well. Are the relationships between the station numbers and the KGE statistically significant?
Figure 8: What about a trend assessment on the data? Is there a significant relationship between model performance and aridity index?
L366-369: This is too assertive and not supported by evidence. The method used to calculate the potential evapotranspiration is very simple and does not consider the impact of solar radiation. It is also too assertive to say that different calculation approaches for potential evapotranspiration will not change the results.
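Several of the comments above ask whether the performance difference between forcing datasets is statistically significant. One way to check this on paired per-catchment scores is a sign-flip permutation test, sketched below. The function name and inputs are hypothetical: `kge_a` and `kge_b` stand for KGE values of the same catchments calibrated with two different forcings. Nothing here is taken from the manuscript's actual analysis.

```python
import random
from statistics import mean


def paired_sign_flip_test(kge_a, kge_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on per-catchment differences.

    Under the null hypothesis that both forcings perform equally well,
    the sign of each difference is arbitrary, so signs are flipped at
    random and the resulting mean differences are compared with the
    observed one. Returns (mean difference, p-value).
    """
    if len(kge_a) != len(kge_b) or not kge_a:
        raise ValueError("need two non-empty series of equal length")
    diffs = [a - b for a, b in zip(kge_a, kge_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator avoids reporting p = 0
    return mean(diffs), (hits + 1) / (n_perm + 1)
```

A Wilcoxon signed-rank test would be a common alternative. Either way, the test only addresses whether the difference is systematic, not why it arises.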

Citation: https://doi.org/10.5194/egusphere-2025-3710-CC1
- EC1: 'Reply on CC1', Albrecht Weerts, 29 Sep 2025
  
  Dear Ryan,
  
  Many thanks for passing on the review.
  
  Regards,
  Albrecht Weerts
  
  Citation: https://doi.org/10.5194/egusphere-2025-3710-EC1
RC2: 'Comment on egusphere-2025-3710', Anonymous Referee #2, 08 Oct 2025

See attached file

Citation: https://doi.org/10.5194/egusphere-2025-3710-RC2
RC3:
'Comment on egusphere-2025-3710', Anonymous Referee #3, 20 Oct 2025
This paper compares the performance of two forcing datasets used for calibration with a classical hydrological model. Overall, I liked the paper, particularly the idea of comparing the quality of forcings using a model, without needing to assume that the model itself is an unflawed representation of the hydrological reality.

Major remarks
First, your paper deserves a better title. As I understand it, you ask a much more general and (from my point of view) interesting question: how can we use a classical precipitation-runoff model such as HBV to compare the quality of precipitation data? I believe you should put this point at the forefront of your paper. You should discuss the “good sense” (almost philosophical) hypothesis of your approach: even if your hydrological model is imperfect, the difference in efficiency when calibrating the model with different forcings cannot be due to some random factor. Better performances cannot be due to chance. You could perhaps look at the chapter by the famous Ray Linsley (1982), who discussed the topic. I only remember this short quotation, “if the data are too poor for the use of a good simulation model they are also inadequate for any other model”, but there must be other interesting quotations there.

Second, I believe that it is worth comparing the CAMELS outputs with the E-OBS outputs, but introducing a further class of inputs (ERA5) makes things more complex. I would simply have discarded the LamaH dataset, stating that you aim to compare the “best ground-based estimate” of the CAMELS datasets with E-OBS. It is definitely not a big surprise that the ERA5 estimates are not good, and including them makes your paper unnecessarily complex. You do not have to show us everything you have done: if you have pushed at a few open doors in the course of your research (as we all do), you do not need to tell us about it.

Third, I was wondering whether it would have been interesting to restrict the dataset to the less-regulated (reservoir-impacted) catchments. I know for example that there are quite a few regulated catchments in the Swiss CAMELS dataset. It will not change the results, but a focus on the less regulated catchments could perhaps show even clearer differences.

Minor remarks
I believe that before mentioning (l.36) that “the inclusion of an increasing number of catchments in one dataset almost always goes hand in hand with difficulties in providing high-quality forcing data” you should underline that large samples also come with their share of problematic discharge stations. In my experience of building a CAMELS dataset, a large part of the effort was absorbed by collectively scrutinizing the time series, the locations, etc. And because EStreams did not do any sorting, there must be, along with the hydrometric stations, a few (or more) nonsense stations (probably a few buoys in France...) or at least stations that measure a level that cannot be related to any significant hydrological flux.

Did you check that the discharge data were exactly the same in EStreams and CAMELS?

In 3.3.1 (Number of E-OBS precipitation stations): I believe you should mention that the number of E-OBS stations is correlated with the size of the catchments, and, as you (and all conceptual modelers) know, the largest catchments get the best KGE values.
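For readers less familiar with the KGE criterion referred to in these comments, the sketch below computes the Kling-Gupta efficiency in its original 2009 form, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations and beta the ratio of means of simulated to observed streamflow. Whether the manuscript uses this variant or a modified one is not stated on this page, so the choice is an assumption, as is the function name.

```python
import math
from statistics import mean, pstdev


def kge(sim, obs):
    """Kling-Gupta efficiency (original 2009 formulation, assumed here).

    sim, obs: equally long sequences of simulated and observed streamflow.
    A value of 1 indicates a perfect match.
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    if sd_s == 0 or sd_o == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero mean flow")
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_s * sd_o)   # correlation component
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Because the three components are combined into a single distance, the same KGE can result from quite different error structures, which is one reason to inspect r, alpha and beta separately when comparing forcings.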

References
Linsley, R.K., 1982. Rainfall-runoff models-an overview. In: V.P. Singh (Editor), Proceedings of the international symposium on rainfall-runoff modelling. Water Resources Publications, Littleton, CO, pp. 3-22.
Citation: https://doi.org/10.5194/egusphere-2025-3710-RC3

Viewed

Total article views: 2,241 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,158	66	17	2,241	13	16

Views and downloads (calculated since 08 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	2,014	31	8	2,053
Oct 2025	144	35	9	188

Viewed (geographical distribution)

Total article views: 2,108 (including HTML, PDF, and XML) Thereof 2,108 with geography defined and 0 with unknown origin.

Latest update: 23 Oct 2025

Short summary

This study provides the first assessment of a European meteorological dataset (E-OBS) for hydrological applications. We compared the dataset to meteorological datasets developed at a country level, and tested how the different data influenced the simulation of streamflow with a hydrological model. Our findings show that, despite some limitations, the European dataset offers a reasonable basis for hydrological modelling in many river catchments across Europe.