HESS Opinions: A few camels or a whole caravan?

Clerc-Schwarzenbach, Franziska Maria; Selleri, Giovanni; Neri, Mattia; Toth, Elena; van Meerveld, Ilja; Seibert, Jan

doi:10.5194/egusphere-2024-864

Preprints

https://doi.org/10.5194/egusphere-2024-864

02 Apr 2024

Abstract. Large-sample datasets containing hydrometeorological time series and catchment attributes for hundreds of catchments in a country, many of them known as “Camels” (catchment attributes and meteorology for large-sample studies), have revolutionized hydrological modelling and enabled comparative analyses. The Caravan dataset is a compilation of several (“Camels” and other) large-sample datasets with uniform attribute names and data structure. This simplifies large-sample hydrology across regions, continents, or the globe. However, the use of the Caravan dataset instead of the original Camels or other large-sample datasets may affect model results and the conclusions derived from them. For the Caravan dataset, the meteorological forcing data are based on ERA5-Land reanalysis data. Here, we describe the differences between the original precipitation, temperature, and potential evapotranspiration (E_pot) data for 1252 catchments in the CAMELS-US, CAMELS-BR, and CAMELS-GB datasets and the forcing data for these catchments in the Caravan dataset. The E_pot in the Caravan dataset is unrealistically high for many catchments but there are, not surprisingly, also considerable differences in the precipitation data. We show that the use of the forcing data from the Caravan dataset impairs hydrological model calibration for the vast majority of catchments, i.e., there is a drop in the calibration performance when using the forcing data from the Caravan dataset compared to the original Camels datasets. This drop is mainly due to the differences in the precipitation data. Therefore, we suggest extending the Caravan dataset with the forcing data included in the original Camels datasets wherever possible, so that users can choose which forcing data they want to use, or at least indicating clearly that the forcing data in Caravan come with a data quality loss and using the original datasets is recommended.
Moreover, we suggest not using the E_pot data (and derived catchment attributes, such as the aridity index) from the Caravan dataset and replacing these with (or based on) alternative E_pot estimates.

Received: 22 Mar 2024 – Discussion started: 02 Apr 2024

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

12 Sep 2024 | Highlight paper

Large-sample hydrology – a few camels or a whole caravan?

Franziska Clerc-Schwarzenbach, Giovanni Selleri, Mattia Neri, Elena Toth, Ilja van Meerveld, and Jan Seibert

Hydrol. Earth Syst. Sci., 28, 4219–4237, https://doi.org/10.5194/hess-28-4219-2024, 2024

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-864', Thorsten Wagener, 27 Apr 2024

Comparative hydrology with large samples of catchment scale data is a rapidly growing topic in hydrology. Samples are growing to sizes of many thousands of catchments around the world. This offers tremendous opportunities for new learning, but it also creates potential problems. One problem is that errors or inconsistencies in the data get propagated into subsequent studies because there is an assumption that available datasets are ready for use.
Clerc-Schwarzenbach and co-authors address this issue with the example of the popular Caravan dataset in which multiple datasets have been combined. To harmonize the data, some meteorological variables of the original national datasets have been replaced by global products. However, Clerc-Schwarzenbach and co-authors found that this can cause significant problems given some large differences between national and global estimates. This is a very relevant and timely study. It is nice work with a well-written manuscript. My comments are mainly suggestions for further improvement.
Main Comments
Are there evaluations of the ERA5-Land reanalysis dataset outside of its use for hydrological modelling that might offer relevant insights into regional differences? The studies currently cited seem largely focused on hydrological applications, though I assume there must also be other uses of this dataset.
(Section 4.3) As the authors discuss in this section, hydrological models can generally cope well with poor PET values given that they scale this input variable anyway. What would be nice to add to the discussion is the potential problem of biased parameters. Depending on the model structure, one or more parameters will absorb the bias in the forcing data. This is problematic if the resulting values are used to characterize the system (e.g. Bouaziz et al., 2022, HESS, https://doi.org/10.5194/hess-26-1295-2022 and references therein). Are there parameters in HBV that would show this bias? I could not find a good example in the literature, but it would be interesting to see how stepwise increases in PET are reflected in stepwise bias in a parameter.
In addition to the specific comments regarding the Caravan dataset, are there more general lessons to be learned? E.g. regarding how to benchmark new datasets? This general problem might come up more often in the future in various datasets.
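The parameter-bias concern raised in the second main comment can be illustrated with a toy example. The sketch below is not HBV and does not reproduce any result from the manuscript. It assumes a hypothetical one-parameter relation ET = c · PET, synthetic PET values, and a hypothetical helper `fit_et_coefficient`. Under these assumptions, inflating the PET input by a factor k makes the fitted coefficient shrink to c / k, so the parameter absorbs the forcing bias completely.

```python
import numpy as np


def fit_et_coefficient(pet, et_obs):
    """Least-squares estimate of c in the toy relation et = c * pet (no intercept)."""
    pet = np.asarray(pet, dtype=float)
    et_obs = np.asarray(et_obs, dtype=float)
    return float(np.dot(pet, et_obs) / np.dot(pet, pet))


# Synthetic "true" PET series (mm/d) and a hypothetical true coefficient.
rng = np.random.default_rng(0)
pet_true = rng.uniform(1.0, 5.0, size=365)
c_true = 0.7
et_obs = c_true * pet_true

# Refit the coefficient with PET inflated stepwise by a factor k.
fitted = {k: fit_et_coefficient(k * pet_true, et_obs) for k in (1.0, 1.5, 2.0)}

for k, c_hat in fitted.items():
    print(f"PET factor {k:.1f} -> fitted c = {c_hat:.3f} (c_true / k = {c_true / k:.3f})")
```

In a real model with several interacting parameters the bias would not necessarily map onto a single parameter this cleanly, which is why a parameter-by-parameter check is needed.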
Minor Comments
(Section 4.2) HBV and HyMod have been calibrated to the MOPEX catchments (precursor of CAMELS-US) with NSE (no KGE then) to identify problematic catchments (Kollat et al., 2012, WRR, doi:10.1029/2011WR011534). This might be a possible comparison of difficult-to-model catchments.
(Section 4.3) The performance of models like HBV in chalk catchments in the south of the UK improves significantly when a more suitable model structure for groundwater processes is used. See the recent study by Kiraz et al. (2023, HSJ, https://doi.org/10.1080/02626667.2023.2251968) – results for KGE are in the supplemental material of the study.
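For readers less familiar with the two metrics mentioned in these comments, the following is a minimal sketch of the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE) in their standard textbook forms. The function names and the example values are assumptions for illustration. Whether the manuscript uses exactly this KGE variant is not stated on this page.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to the variance of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2))


# Illustrative values only: a simulation that doubles every observed flow.
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = 2.0 * obs
print(nse(obs, sim), kge(obs, sim))
```

The doubled simulation is perfectly correlated with the observations, yet both metrics penalize it. KGE does so through its bias and variability terms, which is one reason the two scores are not directly interchangeable when comparing studies.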

Citation: https://doi.org/10.5194/egusphere-2024-864-RC1
- AC1: 'Reply on RC1', Franziska Clerc-Schwarzenbach, 06 Jun 2024
  
  Publisher’s note: a supplement was added to this comment on 6 June 2024.
  We thank Thorsten Wagener for his comments and feedback on our manuscript. We provide our reply in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-864-AC1
RC2:
'Comment on egusphere-2024-864', François Brissette, 24 May 2024

HESS Opinions: A Few Camels or a Whole Caravan? By Franziska Maria Clerc-Schwarzenbach et al.
General Comments
As a strong believer in the importance of large sample studies in hydrology, I read the paper with much interest. I found the results very interesting despite being unsurprised by them. The strengths and drawbacks of ERA5 and its little brother ERA5-Land data are fairly well known, especially when it comes to temperature (very good) and precipitation (good with some issues such as regional biases). Potential evapotranspiration is more of an unknown, and I found its use in Caravan a bit perplexing since it is an unknown quantity. I believe that 'exotic' data from reanalysis should be thoroughly validated before their incorporation into any hydrological study. My understanding is that potential evapotranspiration is computed using the surface energy balance assuming a crop soil surface, as it was included for irrigation purposes. As such, it is not surprising that catchment scale estimates would be severely overestimated in many cases. See Muñoz-Sabater et al. (2021) for example.
Despite calling the results 'unsurprising', I believe this paper makes some very good observations that are useful to the community and is therefore worthy of publication. In particular:
1-Observations on three continents (and many catchments) confirm results from previous regional studies, notably that ERA5 temperatures are excellent substitutes for observations, at least in the context of hydrological models, and that precipitation is more problematic. Precipitation from ERA5 can certainly be used, but a modeling performance decrease is to be expected.
2-It clearly shows that potential evapotranspiration (PET) from ERA5-Land is problematic. This could have been an educated guess prior to this study, but now we have a clear and well-documented issue.
3-The documentation that precipitation deficiencies are more important than PET deficiencies, despite the much larger biases of the latter, is also very interesting.
4-PET deficiencies can be largely removed by recalculating it using PET formulas based on other variables (e.g., temperature, as done in this work). Other formulations using additional reanalysis variables would likely perform even better.
5-The seven modeling experiments provide very useful information on the strengths of the various datasets.
6-The paper provides a clear warning to hydrologists who are increasingly willing to use such datasets without a clear understanding of the outstanding issues related to reanalysis data.
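Point 4 above refers to recalculating PET from temperature. As a sketch of what such a recalculation can look like, the code below uses the temperature-based formulation of Oudin et al. (2005) together with the FAO-56 expression for extraterrestrial radiation. This choice is an assumption for illustration. This page does not state which formula the authors used, and the function names are hypothetical.

```python
import numpy as np

GSC = 0.0820          # solar constant, MJ m-2 min-1 (FAO-56)
LATENT_HEAT = 2.45    # latent heat of vaporization, MJ kg-1


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m-2 d-1 (FAO-56 formulation)."""
    phi = np.deg2rad(lat_deg)
    j = np.asarray(day_of_year, dtype=float)
    dr = 1.0 + 0.033 * np.cos(2.0 * np.pi * j / 365.0)      # inverse relative Earth-Sun distance
    delta = 0.409 * np.sin(2.0 * np.pi * j / 365.0 - 1.39)  # solar declination, rad
    # Clip to handle polar day and night, where the sunset angle is undefined.
    ws = np.arccos(np.clip(-np.tan(phi) * np.tan(delta), -1.0, 1.0))
    return (24.0 * 60.0 / np.pi) * GSC * dr * (
        ws * np.sin(phi) * np.sin(delta) + np.cos(phi) * np.cos(delta) * np.sin(ws)
    )


def pet_oudin(temp_c, lat_deg, day_of_year):
    """Daily PET in mm/d: Ra / latent heat * (T + 5) / 100, set to zero when T + 5 <= 0."""
    ra = extraterrestrial_radiation(lat_deg, day_of_year)
    temp_c = np.asarray(temp_c, dtype=float)
    # With a water density of 1000 kg m-3, Ra / LATENT_HEAT is already in mm/d.
    return np.where(temp_c + 5.0 > 0.0, ra / LATENT_HEAT * (temp_c + 5.0) / 100.0, 0.0)


# Illustrative call: 25 degC at the equator around the March equinox.
print(float(pet_oudin(25.0, 0.0, 80)))
```

The only inputs are daily mean air temperature, latitude and day of year, so an estimate of this kind can be derived from any temperature series, including the ERA5-Land temperatures that the comments above describe as reliable.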

Specific Comments
I am not fully sure why this is considered an Opinion paper. To me, the breadth of the research work and analysis clearly qualifies it as a research paper. I did not see much 'opinion' in this paper, as most of the arguments/discussions are results-based. Consequently, I suggest this submission be reclassified as a research paper.
I think the title does a disservice to the paper. I like the catchy phrase, but in reality, I would think that a majority of hydrologists are not familiar with CAMELS, and an even smaller number are aware of Caravan. A more generic title referring to large sample datasets and global datasets would be more appropriate for the varied readership of HESS.
I believe an additional discussion point should be added regarding the choice of a particular hydrological model. It is well known that some models may be more flexible than others at adapting to biases in input variables such as precipitation and easily scale PET with specific calibration parameters. This is mentioned in the paper, but I believe some other hydrological models may perform better than the one used in this study, and the performance drop mentioned in this study may not be as bad. I certainly would not expect the conclusions of this paper to be any different, but this should be mentioned.
There should be a mention of the upcoming ERA6 reanalysis. The ERA5 reanalysis used in Caravan will soon be a thing of the past. In addition to improved resolution, ERA6 will have a full overhaul of the model physics, including radiation, which is overestimated in ERA5 and likely part of the PET problem, in addition to the issue discussed above. Based on past history, we can expect a significant performance increase with ERA6. This should be mentioned in the paper. I believe that reanalysis is indeed the future of large-sample hydrology and that merging reanalysis with Deep Learning approaches will produce very high-quality global datasets much sooner than most people think. Already, the merging of deep-learning methods with weather forecasting models promises to revolutionize weather forecasting—exciting times.
I would also like to add one important advantage of global datasets based on reanalysis that was not mentioned in the paper: they are easily updated once a new version comes out. In addition, new data is produced in near-real time. Comparatively, datasets relying on observations (e.g., CAMELS) are much more complex to update (missing data, stations being decommissioned, etc.) and, based on past history, are unlikely to be updated at all, or very infrequently. A dataset such as Caravan will still need to be updated, but the process is much more straightforward.
The use of 'significant/ly' should be clarified if it is in the 'statistical' sense from the get-go at line 70. In some cases, it clearly is, but not so much in others.
I would suggest the use of PET instead of Epot, with the former being a lot more common, in my opinion.
Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., ... & Thépaut, J. N. (2021). ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data, 13(9), 4349–4383.

Citation: https://doi.org/10.5194/egusphere-2024-864-RC2
- AC2: 'Reply on RC2', Franziska Clerc-Schwarzenbach, 06 Jun 2024
  
  We thank François Brissette for his comments and feedback on our manuscript. We provide our reply in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-864-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (further review by editor) (11 Jun 2024) by Thom Bogaard

AR by Franziska Clerc-Schwarzenbach on behalf of the Authors (01 Jul 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (12 Jul 2024) by Thom Bogaard

AR by Franziska Clerc-Schwarzenbach on behalf of the Authors (19 Jul 2024)

Model code and software

HESS Opinions: A few camels or a whole caravan? Franziska M. Clerc-Schwarzenbach https://doi.org/10.5281/zenodo.10784701

Viewed

Total article views: 3,637 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,595	901	141	3,637	160	181

Views and downloads (calculated since 02 Apr 2024)

Month	HTML	PDF	XML	Total
Apr 2024	418	111	9	538
May 2024	127	39	6	172
Jun 2024	119	37	13	169
Jul 2024	160	66	12	238
Aug 2024	100	42	12	154
Sep 2024	124	18	10	152
Oct 2024	50	14	6	70
Nov 2024	46	16	0	62
Dec 2024	36	12	0	48
Jan 2025	60	16	2	78
Feb 2025	40	28	2	70
Mar 2025	48	26	0	74
Apr 2025	60	40	0	100
May 2025	34	10	4	48
Jun 2025	62	44	0	106
Jul 2025	50	28	4	82
Aug 2025	126	20	2	148
Sep 2025	390	50	24	464
Oct 2025	56	40	4	100
Nov 2025	62	72	4	138
Dec 2025	60	18	2	80
Jan 2026	80	28	6	114
Feb 2026	96	12	4	112
Mar 2026	120	72	6	198
Apr 2026	34	18	2	54
May 2026	21	17	3	41
Jun 2026	6	1	2	9
Jul 2026	10	6	2	18

Viewed (geographical distribution)

Total article views: 3,590 (including HTML, PDF, and XML) Thereof 3,590 with geography defined and 0 with unknown origin.

Latest update: 28 Jul 2026

Short summary

We compare the catchment forcing data provided in large-sample datasets, namely the Caravan dataset and three of the original CAMELS datasets (US, BR, GB). We show that the differences affect hydrological model performance and that the data quality in the Caravan dataset is lower than that in the CAMELS datasets, both for precipitation and potential evapotranspiration. We want to raise awareness of the lower data quality in Caravan and we suggest possible improvements for the Caravan dataset.