HESS Opinions: A few camels or a whole caravan?

Clerc-Schwarzenbach, Franziska Maria; Selleri, Giovanni; Neri, Mattia; Toth, Elena; van Meerveld, Ilja; Seibert, Jan

https://doi.org/10.5194/egusphere-2024-864


02 Apr 2024


Abstract. Large-sample datasets containing hydrometeorological time series and catchment attributes for hundreds of catchments in a country, many of them known as “Camels” (catchment attributes and meteorology for large-sample studies), have revolutionized hydrological modelling and enabled comparative analyses. The Caravan dataset is a compilation of several (“Camels” and other) large-sample datasets with uniform attribute names and data structure. This simplifies large-sample hydrology across regions, continents, or the globe. However, using the Caravan dataset instead of the original Camels or other large-sample datasets may affect model results and the conclusions derived from them. In the Caravan dataset, the meteorological forcing data are based on ERA5-Land reanalysis data. Here, we describe the differences between the original precipitation, temperature, and potential evapotranspiration (E_pot) data for 1252 catchments in the CAMELS-US, CAMELS-BR, and CAMELS-GB datasets and the forcing data for these catchments in the Caravan dataset. The E_pot in the Caravan dataset is unrealistically high for many catchments, but there are, not surprisingly, also considerable differences in the precipitation data. We show that using the forcing data from the Caravan dataset impairs hydrological model calibration for the vast majority of catchments, i.e., calibration performance drops when the Caravan forcing data are used instead of those from the original Camels datasets. This drop is mainly due to the differences in the precipitation data. Therefore, we suggest extending the Caravan dataset with the forcing data included in the original Camels datasets wherever possible, so that users can choose which forcing data they want to use, or at least clearly indicating that the forcing data in Caravan come with a loss in data quality and that using the original datasets is recommended.
Moreover, we suggest not using the E_pot data (and derived catchment attributes, such as the aridity index) from the Caravan dataset and replacing these with (or based on) alternative E_pot estimates.
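The warning about derived attributes can be made concrete. The aridity index is commonly defined (e.g., in CAMELS-US) as mean potential evapotranspiration divided by mean precipitation, so a multiplicative overestimation of E_pot carries over one-to-one into the attribute. The Python sketch below is illustrative only: the function name, the seven-day series, and the 80 % inflation factor are hypothetical and are not taken from Caravan or CAMELS.

```python
from statistics import fmean


def aridity_index(pet_mm, precip_mm):
    """Ratio of mean daily potential evapotranspiration to mean daily precipitation."""
    if len(pet_mm) != len(precip_mm):
        raise ValueError("PET and precipitation series must have equal length")
    mean_p = fmean(precip_mm)
    if mean_p <= 0:
        raise ValueError("mean precipitation must be positive")
    return fmean(pet_mm) / mean_p


# Hypothetical daily series (mm/day), for illustration only.
precip = [0.0, 4.0, 12.0, 0.0, 2.0, 0.0, 3.0]
pet_reference = [2.0, 1.5, 1.0, 2.5, 2.0, 3.0, 2.0]
pet_inflated = [1.8 * e for e in pet_reference]  # assumed 80 % overestimation of E_pot

ai_ref = aridity_index(pet_reference, precip)
ai_inflated = aridity_index(pet_inflated, precip)
print(f"aridity (reference E_pot): {ai_ref:.2f}")
print(f"aridity (inflated E_pot):  {ai_inflated:.2f}")
```

With these numbers the index rises from about 0.67 to 1.2, so a catchment that would be classed as energy-limited (index below 1) appears water-limited instead.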

Received: 22 Mar 2024 – Discussion started: 02 Apr 2024


Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

12 Sep 2024

Highlight paper

Large-sample hydrology – a few camels or a whole caravan?

Franziska Clerc-Schwarzenbach, Giovanni Selleri, Mattia Neri, Elena Toth, Ilja van Meerveld, and Jan Seibert

Hydrol. Earth Syst. Sci., 28, 4219–4237, https://doi.org/10.5194/hess-28-4219-2024, 2024


Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2024-864', Thorsten Wagener, 27 Apr 2024

Comparative hydrology with large samples of catchment scale data is a rapidly growing topic in hydrology. Samples are growing to sizes of many thousands of catchments around the world. This offers tremendous opportunities for new learning, but it also creates potential problems. One problem is that errors or inconsistencies in the data get propagated into subsequent studies because there is an assumption that available datasets are ready for use.
Clerc-Schwarzenbach and co-authors address this issue with the example of the popular Caravan dataset, in which multiple datasets have been combined. To harmonize the data, some meteorological variables of the original national datasets have been replaced by global products. However, Clerc-Schwarzenbach and co-authors found that this can cause significant problems, given some large differences between national and global estimates. This is a very relevant and timely study. It is nice work with a well-written manuscript. My comments are mainly suggestions for further improvement.
Main Comments
Are there evaluations of the ERA5-Land reanalysis dataset outside of hydrological modelling that might offer relevant insights into regional differences? The studies currently cited seem largely focused on hydrological applications, though I assume there must also be other uses of this dataset.
(Section 4.3) As the authors discuss in this section, hydrological models can generally cope well with poor PET values given that they scale this input variable anyway. What would be nice to add to the discussion is the potential problem of biased parameters. Depending on the model structure, one or more parameters will absorb the bias in the forcing data. This is problematic if the resulting values are used to characterize the system (e.g. Bouaziz et al., 2022, HESS, https://doi.org/10.5194/hess-26-1295-2022 and references therein). Are there parameters in HBV that would show this bias? I could not find a good example in the literature, but it would be interesting to see how stepwise increases in PET are reflected in stepwise bias in a parameter.
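The question about stepwise PET increases can be explored with a toy experiment. The Python sketch below is a minimal illustration under loudly stated assumptions: `bucket_model` is a hypothetical one-store model (not HBV) with a multiplicative PET correction factor `ce`, the forcing is synthetic, and the "observed" flow is simulated with the unbiased PET and `ce = 1`. Because the simulated flow depends only on the product `ce × PET`, recalibrating `ce` against PET inflated by a factor f should return roughly `ce ≈ 1/f`; the parameter absorbs the forcing bias.

```python
import math
import random


def bucket_model(precip, pet, ce, capacity=150.0, k=0.02, s0=75.0):
    """Hypothetical one-store model; ce is a multiplicative PET correction factor."""
    s = s0
    flows = []
    for p, e in zip(precip, pet):
        s += p
        s -= min(ce * e, s)          # actual ET, limited by the water in storage
        overflow = max(s - capacity, 0.0)
        s -= overflow                # saturation excess leaves as quick flow
        baseflow = k * s             # linear-reservoir drainage
        s -= baseflow
        flows.append(overflow + baseflow)
    return flows


def calibrate_ce(precip, pet, q_obs, grid):
    """Return the grid value of ce with the smallest sum of squared errors."""
    def sse(ce):
        q_sim = bucket_model(precip, pet, ce)
        return sum((a - b) ** 2 for a, b in zip(q_sim, q_obs))
    return min(grid, key=sse)


# Synthetic three-year forcing (mm/day); all numbers are hypothetical.
rng = random.Random(42)
days = 3 * 365
precip = [rng.expovariate(1 / 6.0) if rng.random() < 0.5 else 0.0 for _ in range(days)]
pet_true = [1.5 + math.sin(2 * math.pi * t / 365) for t in range(days)]
q_obs = bucket_model(precip, pet_true, ce=1.0)   # "observed" flow from unbiased PET

grid = [round(0.30 + 0.01 * i, 2) for i in range(121)]   # 0.30 to 1.50
results = {}
for factor in (1.0, 1.25, 1.5, 2.0):              # stepwise PET overestimation
    pet_biased = [factor * e for e in pet_true]
    results[factor] = calibrate_ce(precip, pet_biased, q_obs, grid)
    print(f"PET x {factor}: calibrated ce = {results[factor]:.2f}")
```

In a real model the compensation is rarely this clean; the bias would more likely be spread over several parameters (in HBV, for example, those controlling soil-moisture-limited evaporation), which is what the suggested experiment would reveal.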
In addition to the specific comments regarding the Caravan dataset, are there more general lessons to be learned? E.g. regarding how to benchmark new datasets? This general problem might come up more often in the future in various datasets.
Minor Comments
(Section 4.2) HBV and HyMod have been calibrated for the MOPEX catchments (the precursor of CAMELS-US) with NSE (KGE was not used then) to identify problematic catchments (Kollat et al., 2012, WRR, doi:10.1029/2011WR011534). This study might provide a useful comparison for difficult-to-model catchments.
(Section 4.3) The low performance of models like HBV in chalk catchments in the south of the UK improves significantly when a model structure more suitable for groundwater processes is used. See the recent study by Kiraz et al. (2023, HSJ, https://doi.org/10.1080/02626667.2023.2251968); results for KGE are in the supplemental material of the study.
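The comments above contrast NSE with KGE, and they imply (we have not verified it here) that the manuscript expresses calibration performance as KGE. The Python sketch below implements both criteria from their standard definitions (Nash and Sutcliffe, 1970; Gupta et al., 2009) to show that they are not interchangeable: KGE separates correlation, variability, and bias, so for a simulation with perfect timing but a uniform 20 % overestimation of flow (as a multiplicative precipitation bias might produce), the two scores differ. The function names and the short flow series are our own, hypothetical choices.

```python
import math
from statistics import fmean, pstdev


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors / sum of squared anomalies)."""
    mean_obs = fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    mean_sim, mean_obs = fmean(sim), fmean(obs)
    sd_sim, sd_obs = pstdev(sim), pstdev(obs)
    cov = fmean([(s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs)])
    r = cov / (sd_sim * sd_obs)
    alpha = sd_sim / sd_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 3.0, 2.0, 5.0, 4.0]     # hypothetical observed flows
sim = [1.2 * q for q in obs]        # perfect timing, +20 % volume
print(f"NSE = {nse(sim, obs):.2f}, KGE = {kge(sim, obs):.2f}")
# NSE = 0.78, KGE = 0.72
```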

Citation: https://doi.org/10.5194/egusphere-2024-864-RC1
- AC1: 'Reply on RC1', Franziska Clerc-Schwarzenbach, 06 Jun 2024
  
  Publisher’s note: a supplement was added to this comment on 6 June 2024.
  We thank Thorsten Wagener for his comments and feedback on our manuscript. We provide our reply in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-864-AC1
RC2: 'Comment on egusphere-2024-864', François Brissette, 24 May 2024

HESS Opinions: A Few Camels or a Whole Caravan? By Franziska Maria Clerc-Schwarzenbach et al.
General Comments
As a strong believer in the importance of large-sample studies in hydrology, I read the paper with much interest. I found the results very interesting, despite not being surprised by them. The strengths and drawbacks of ERA5 and its little brother ERA5-Land are fairly well known, especially when it comes to temperature (very good) and precipitation (good, with some issues such as regional biases). Potential evapotranspiration is more of an unknown, and I found its use in Caravan a bit perplexing for that reason. I believe that 'exotic' data from reanalysis should be thoroughly validated before being incorporated into any hydrological study. My understanding is that potential evapotranspiration is computed using the surface energy balance assuming a crop soil surface, as it was included for irrigation purposes. As such, it is not surprising that catchment-scale estimates would be severely overestimated in many cases. See Muñoz-Sabater et al. (2021), for example.
Despite calling the results 'unsurprising', I believe this paper makes some very good observations that are useful to the community and is therefore worthy of publication. In particular:
1-Observations on three continents (and many catchments) confirm results from previous regional studies, notably that ERA5 temperatures are excellent substitutes for observations, at least in the context of hydrological models, and that precipitation is more problematic. Precipitation from ERA5 can certainly be used, but a modeling performance decrease is to be expected.
2-It clearly shows that potential evapotranspiration (PET) from ERA5-Land is problematic. This could have been an educated guess prior to this study, but now we have a clear and well-documented issue.
3-The documentation that precipitation deficiencies are more important than PET deficiencies, despite the much larger biases of the latter, is also very interesting.
4-PET deficiencies can be largely removed by recalculating it using PET formulas based on other variables (e.g., temperature, as done in this work). Other formulations using additional reanalysis variables would likely perform even better.
5-The seven modeling experiments provide very useful information on the strengths of the various datasets.
6-The paper provides a clear warning to hydrologists who are increasingly willing to use such datasets without a clear understanding of the outstanding issues related to reanalysis data.
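Point 4 notes that PET can be recalculated from other variables such as temperature. One widely used temperature-based option is the formula of Oudin et al. (2005), which in mm/day reads PET = Ra/λ · (T + 5)/100 for T + 5 > 0 and zero otherwise, where Ra is extraterrestrial radiation (MJ m⁻² d⁻¹), T is daily mean air temperature (°C), and λ = 2.45 MJ kg⁻¹. The Python sketch below computes Ra from latitude and day of year with the FAO-56 equations. It is an illustration only; we do not claim that it is the formula used in the manuscript, and the function names are hypothetical.

```python
import math


def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1) following FAO-56."""
    gsc = 0.0820                                               # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)         # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)   # solar declination, rad
    x = -math.tan(phi) * math.tan(delta)
    ws = math.acos(max(-1.0, min(1.0, x)))                     # sunset hour angle, clamped for polar day/night
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def oudin_pet(temp_c, lat_deg, doy):
    """Oudin et al. (2005) temperature-based PET in mm/day (zero when T + 5 <= 0)."""
    if temp_c + 5 <= 0:
        return 0.0
    latent_heat = 2.45                                         # MJ kg-1; 1 kg m-2 of water = 1 mm
    return extraterrestrial_radiation(lat_deg, doy) / latent_heat * (temp_c + 5) / 100


# Equator, near the March equinox, at 20 degC.
print(f"{oudin_pet(20.0, 0.0, 80):.2f} mm/day")
```

Because the formula needs only temperature and latitude, it avoids the surface assumptions behind the reanalysis PET, at the cost of ignoring humidity, wind, and actual radiation.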

Specific Comments
I am not fully sure why this is considered an Opinion paper. To me, the breadth of the research work and analysis clearly qualifies it as a research paper. I did not see much 'opinion' in this paper, as most of the arguments/discussions are results-based. Consequently, I suggest this submission be reclassified as a research paper.
I think the title does a disservice to the paper. I like the catchy phrase, but in reality, I would think that the majority of hydrologists are not familiar with CAMELS, and an even smaller number are aware of Caravan. A more generic title referring to large-sample datasets and global datasets would be more appropriate for the varied readership of HESS.
I believe an additional discussion point should be added regarding the choice of a particular hydrological model. It is well known that some models are more flexible than others in adapting to biases in input variables such as precipitation, and that some can easily scale PET with dedicated calibration parameters. This is mentioned in the paper, but I believe some other hydrological models may perform better than the one used in this study, and the performance drop reported here might then be smaller. I certainly would not expect the conclusions of this paper to be any different, but this should be mentioned.
There should be a mention of the upcoming ERA6 reanalysis. The ERA5 reanalysis used in Caravan will soon be a thing of the past. In addition to improved resolution, ERA6 will have a full overhaul of the model physics, including radiation, which is overestimated in ERA5 and likely part of the PET problem, in addition to the issue discussed above. Based on past history, we can expect a significant performance increase with ERA6. This should be mentioned in the paper. I believe that reanalysis is indeed the future of large-sample hydrology and that merging reanalysis with Deep Learning approaches will produce very high-quality global datasets much sooner than most people think. Already, the merging of deep-learning methods with weather forecasting models promises to revolutionize weather forecasting—exciting times.
I would also like to add one important advantage of global datasets based on reanalysis that was not mentioned in the paper: they are easily updated once a new version comes out. In addition, new data are produced in near-real time. Comparatively, datasets relying on observations (e.g., CAMELS) are much more complex to update (missing data, stations being decommissioned, etc.) and, based on past history, are unlikely to be updated at all, or only very infrequently. A dataset such as Caravan will still need to be updated, but the process is much more straightforward.
It should be clarified from the outset (line 70) whether 'significant/ly' is used in the statistical sense. In some cases it clearly is, but not so much in others.
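On the statistical sense of "significant": one simple, distribution-free way to test whether calibration performance drops for more catchments than chance would explain is an exact two-sided sign test on paired per-catchment differences (the Wilcoxon signed-rank test is a common, more powerful alternative). The Python sketch below is our illustration, not the manuscript's method, and the KGE values in the usage lines are hypothetical.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided sign test for paired differences; zero differences are discarded."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)
    # Probability of a split at least as extreme as k positives out of n under p = 0.5.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical per-catchment KGE values (not results from the manuscript).
kge_original = [0.82, 0.75, 0.90, 0.68, 0.71, 0.88, 0.79, 0.64, 0.85, 0.77]
kge_caravan = [0.74, 0.70, 0.86, 0.55, 0.69, 0.80, 0.72, 0.60, 0.81, 0.70]
diffs = [a - b for a, b in zip(kge_original, kge_caravan)]
print(f"p = {sign_test(diffs):.4f}")
# p = 0.0020
```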
I would suggest the use of PET instead of Epot, with the former being a lot more common, in my opinion.
Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., ... & Thépaut, J. N. (2021). ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data, 13(9), 4349-4383.

Citation: https://doi.org/10.5194/egusphere-2024-864-RC2
- AC2: 'Reply on RC2', Franziska Clerc-Schwarzenbach, 06 Jun 2024
  
  We thank François Brissette for his comments and feedback on our manuscript. We provide our reply in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-864-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to minor revisions (further review by editor) (11 Jun 2024) by Thom Bogaard

AR by Franziska Clerc-Schwarzenbach on behalf of the Authors (01 Jul 2024): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (12 Jul 2024) by Thom Bogaard

AR by Franziska Clerc-Schwarzenbach on behalf of the Authors (19 Jul 2024)

Model code and software

HESS Opinions: A few camels or a whole caravan? Franziska M. Clerc-Schwarzenbach https://doi.org/10.5281/zenodo.10784701

Viewed

Total article views: 1,087 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
803	241	43	1,087	25	18

Views and downloads (calculated since 02 Apr 2024)

Month	HTML	PDF	XML	Total
Apr 2024	418	111	9	538
May 2024	127	39	6	172
Jun 2024	119	37	13	169
Jul 2024	80	32	6	118
Aug 2024	46	20	6	72
Sep 2024	13	2	3	18

Viewed (geographical distribution)

Total article views: 1,048 (including HTML, PDF, and XML). Of these, 1,048 have a defined geography and 0 are of unknown origin.


Cited

Latest update: 12 Sep 2024

Executive editor

Large-sample hydrology datasets such as Caravan provide the community with hydrometeorological information and catchment attributes for many catchments around the world and offer opportunities for hydrological research. However, there are considerable differences between the forcing data of Caravan and those of the CAMELS datasets, especially for potential evaporation. This can lead to wrong conclusions on catchment hydrological drivers and affect regionalization. This paper shows the importance of robust large-sample datasets and the need to keep assessing their quality.

Short summary

We compare the catchment forcing data provided in large-sample datasets, namely the Caravan dataset and three of the original CAMELS datasets (US, BR, GB). We show that the differences affect hydrological model performance and that the data quality in the Caravan dataset is lower than that in the CAMELS datasets, for both precipitation and potential evapotranspiration. We want to raise awareness of the lower data quality in Caravan, and we suggest possible improvements for the Caravan dataset.


Total:	0
HTML:	0
PDF:	0
XML:	0

HESS Opinions: A few camels or a whole caravan?

Journal article(s) based on this preprint

Interactive discussion

Interactive discussion

Peer review completion

Journal article(s) based on this preprint

Model code and software

Viewed

Viewed (geographical distribution)

Cited

1 citation as recorded by Crossref.