evalhyd v0.1.1: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions

Hallouin, Thibault; Bourgin, François; Perrin, Charles; Ramos, Maria-Helena; Andréassian, Vazken

doi:https://doi.org/10.5194/egusphere-2023-1424

Preprints

29 Jun 2023


Abstract. The evaluation of streamflow predictions forms an essential part of most hydrological modelling studies published in the literature. The evaluation process typically involves the computation of some evaluation metrics, but it can also involve the pre-processing of the predictions and the post-processing of the computed metrics. In order for published hydrological studies to be reproducible, these steps need to be carefully documented by the authors. The availability of a single tool performing all of these tasks would simplify the documentation by the authors, but also the reproducibility by the readers. However, this requires for such a tool to be polyglot (i.e. usable in a variety of programming languages) and openly accessible, so that it can be used by everyone in the hydrological community. To this end, we developed a new tool named evalhyd that offers metrics and functionalities for the evaluation of deterministic and probabilistic streamflow predictions. It is open source and it can be used in Python, in R, in C++, or as a command line tool. This article describes the tool and illustrates its functionalities using Global Flood Awareness System (GloFAS) reforecasts over France as an example data set.

Received: 28 Jun 2023 – Discussion started: 29 Jun 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

10 Jun 2024

EvalHyd v0.1.2: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions

Thibault Hallouin, François Bourgin, Charles Perrin, Maria-Helena Ramos, and Vazken Andréassian

Geosci. Model Dev., 17, 4561–4578, https://doi.org/10.5194/gmd-17-4561-2024, 2024

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-1424', Barbara Casati, 17 Sep 2023

Dear T. Hallouin and co-authors,
I really enjoyed reading this article, and evalhyd seems a very nice verification tool: congratulations! I have uploaded a file with some minor suggestions; with this comment, I wish to bring one aspect of verification to the attention of the scientific community, for the general online discussion.
I was particularly interested in the option in evalhyd of performing conditional verification. I will share here some of our own experience (at the Canadian Met Service) with conditional verification, which may inspire further developments in the tool and, more generally, awareness in the interpretation of the results.
Conditioning on the verification sample can have strong impacts on the verification results (e.g. it can flip the sign of a bias), and hence allows in-depth analysis and understanding of the prediction performance, since the conditioning is usually related to physically-driven phenomena. In a sense, conditional verification is the first step towards process-based diagnostics.
In verification exercises that include several variables (e.g. pressure, temperature, clouds, etc.), applying a condition to one variable while verifying a different variable is common practice (for example, verification of surface temperature in cloudy versus clear-sky conditions informs on the model performance in reproducing the radiation budget). The condition, however, should be applied to both observed and forecast values (e.g. forecast AND observation being cloudy): I will refer to this double condition as bilateral. On the other hand, when a unilateral condition is applied to only the observed or the forecast variable (e.g. cloudy conditions only for the forecast), this can synthetically introduce a bias in the verification results: in the cloud/temperature example above, stratifying for cloudy conditions only in the forecast leads to a synthetic warm bias for the surface temperature, because the sample is bound to contain both cloudy and clear-sky observations, and when the observations have clear sky the surface temperature is expected to be colder. In other words, the bilateral condition will sample all the “hits” for cloudy sky, whereas the unilateral condition will sample the “hits” and “false alarms” for cloudy sky. From our experience, we advise bilateral over unilateral conditioning. (Of course, one can also do unilateral conditioning, but one needs to be aware of the biases it introduces when interpreting the verification results.)
Applying a unilateral condition to the same variable that is verified might also lead to synthetic biases. As an example, if you stratify your sample for strong predicted streamflows, you are bound to include in the sample several strong observed streamflows (the “hits”), but also some average or weak observed streamflows (because the prediction might have some “false alarms”). You then tend to “artificially” diagnose over-prediction for strong streamflow (and vice versa for low streamflow: conditioning only on the prediction, you are bound to find under-estimation, because your sample will include some observed events that are medium or strong).
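As an illustration of the effect described above (added for clarity, not part of the referee's comment), the following minimal sketch uses synthetic data. The shared-signal-plus-noise model, the threshold of 1.0, and all function and variable names are assumptions chosen for the example; the predictions are unbiased by construction, so any non-zero mean error in a subsample comes from the conditioning alone.

```python
import random
from statistics import mean


def simulate_pairs(n=20000, seed=42):
    """Return synthetic (observation, prediction) pairs, unbiased by construction."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)        # shared underlying signal (assumed model)
        obs = signal + rng.gauss(0.0, 0.5)  # observation = signal + independent noise
        prd = signal + rng.gauss(0.0, 0.5)  # prediction = signal + independent noise
        pairs.append((obs, prd))
    return pairs


def mean_error(pairs):
    """Mean of (prediction - observation); positive means over-prediction."""
    return mean(prd - obs for obs, prd in pairs)


pairs = simulate_pairs()
threshold = 1.0  # hypothetical "strong flow" threshold, in arbitrary units

# Unilateral conditions: only one side of the pair must exceed the threshold.
unilateral_prd = [(o, p) for o, p in pairs if p > threshold]
unilateral_obs = [(o, p) for o, p in pairs if o > threshold]
# Bilateral condition: both observation AND prediction must exceed the threshold.
bilateral = [(o, p) for o, p in pairs if o > threshold and p > threshold]

bias_all = mean_error(pairs)
bias_uni_prd = mean_error(unilateral_prd)
bias_uni_obs = mean_error(unilateral_obs)
bias_bilateral = mean_error(bilateral)

print(f"whole sample:             {bias_all:+.3f}")
print(f"unilateral on prediction: {bias_uni_prd:+.3f}")
print(f"unilateral on observation:{bias_uni_obs:+.3f}")
print(f"bilateral:                {bias_bilateral:+.3f}")
```

In this symmetric toy setting, conditioning only on the prediction yields an apparent over-prediction, conditioning only on the observation yields an apparent under-prediction, and the bilateral condition leaves the mean error close to zero. Real streamflow data need not behave this way.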
I was amazed (and also a bit puzzled) to see that your Figure 6 shows the opposite of what I expected (underprediction for high predicted streamflows, more overprediction for low predicted streamflows; the under-dispersion for the average predicted streamflow is instead expected). It would be interesting to understand why: is it due to the characteristics of streamflow prediction (where the timing is always predicted well, and hence false alarms and misses are very rare)? What is the behaviour at the other stations? What would you obtain with the bilateral condition?
I would be grateful if you could add in the article some discussion about unilateral versus bilateral conditions.
Thank you and best regards, Barbara Casati

Citation: https://doi.org/10.5194/egusphere-2023-1424-RC1
- AC1: 'Reply on RC1', Thibault Hallouin, 02 Oct 2023
  
  Dear Barbara Casati,
  
  This is simply a brief comment to thank you for your review and your constructive comments. We will provide a detailed reply in due course.
  
  We thank you for sharing your insight and experience with conditional verification and detailing the distinction between unilateral and bilateral conditioning. We will make sure to cover this distinction in the revised version of the article, to explicitly mention which type of conditioning is possible with evalhyd, and to make the reader aware of the possible bias introduced with unilateral conditioning.
  
  We also encourage the scientific community to join the discussion on this specific aspect of verification while the discussion is still open (until 01 Nov 2023).
  
  Kind regards,
  
  Thibault Hallouin on behalf of the authors.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1424-AC1
CC1:
'Comment on egusphere-2023-1424', Elizabeth Cooper, 02 Oct 2023

Just a very short comment to say that I used the Python version of evalhyd v0.1.1 to quickly and easily calculate some streamflow metrics for a paper now under review for GMD. The tool was simple to install and made it very easy to compare different model runs against observations for more metrics than I might otherwise have used. Thanks!

Citation: https://doi.org/10.5194/egusphere-2023-1424-CC1
- AC2: 'Reply on CC1', Thibault Hallouin, 05 Oct 2023
  
  Dear Elizabeth Cooper,
  Thank you for sharing your experience here. We are glad to hear that you found evalhyd useful and easy to use.
  Cheers,
  Thibault Hallouin
  
  Citation: https://doi.org/10.5194/egusphere-2023-1424-AC2
RC2:
'Comment on egusphere-2023-1424', Anonymous Referee #2, 14 Oct 2023

The paper introduces evalhyd, an interesting software tool designed for the evaluation of streamflow predictions. The tool's commitment to standardization and open-source accessibility is commendable, providing a valuable contribution to enhancing reproducibility in hydrological studies. Notably, the well-thought-out design principles, which incorporate a compiled C++ core and thin bindings for multiple languages, contribute to the tool's efficiency and usability.
However, despite the paper positioning evalhyd as a contribution to hydroinformatics, the manuscript's focus on the technical aspects of model development limits its scientific impact. The paper could benefit from a more explicit emphasis on the broader scientific implications and advancements in hydrologic science that the tool facilitates. Some specific comments are as follows:
The introduction would benefit from more clearly articulating what new capabilities evalhyd provides compared to existing hydrologic evaluation packages. As it stands, the motivation around standardization across languages is a bit weak.
In the key functionalities section, the masking and bootstrapping methods need more detailed explanation. Pseudocode or formulas would help make these clearer. For the evaluation metrics, links or references to the original sources for each metric should be provided. More justification for the specific metrics included would also help show the comprehensiveness.
For the case study, more novel demonstrations of the tool would strengthen this section.
The conclusions would be improved by specifically emphasizing the limitations around extensibility, visualizations, and support for continuous distributions. Comparisons to other existing packages may help contextualize the pros/cons.
Overall more critical analysis is needed on how evalhyd improves on the current state of the art in hydrologic evaluation tools. The paper currently lacks motivation and innovation.
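
The masking and bootstrapping functionalities mentioned in the comments above can be sketched generically as follows. This is a hypothetical illustration on synthetic data, not evalhyd's implementation or API: the helper names (`nse`, `apply_mask`, `bootstrap_scores`), the threshold, and the naive resampling of individual time steps are all assumptions. A block-based resampling may be more appropriate for autocorrelated streamflow series.

```python
import math
import random
from statistics import mean


def nse(obs, prd):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / spread of observations."""
    obs_mean = mean(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, prd))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def apply_mask(obs, prd, mask):
    """Keep only the time steps where the boolean mask is True."""
    kept = [(o, p) for o, p, m in zip(obs, prd, mask) if m]
    return [o for o, _ in kept], [p for _, p in kept]


def bootstrap_scores(obs, prd, metric, n_samples=200, seed=0):
    """Recompute the metric on time steps resampled with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    scores = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([obs[i] for i in idx], [prd[i] for i in idx]))
    return scores


# Synthetic two-year daily series: seasonal cycle plus noise (assumed, arbitrary units).
rng = random.Random(1)
obs = [10 + 5 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 1) for t in range(730)]
prd = [o + rng.gauss(0, 1) for o in obs]

# Metric on the whole period.
nse_full = nse(obs, prd)

# Masking: evaluate only the time steps with high observed flow (threshold is arbitrary).
high_flow_mask = [o > 12 for o in obs]
obs_high, prd_high = apply_mask(obs, prd, high_flow_mask)
nse_high = nse(obs_high, prd_high)

# Bootstrapping: spread of the metric across resampled series.
scores = sorted(bootstrap_scores(obs, prd, nse))
low, high = scores[int(0.05 * len(scores))], scores[int(0.95 * len(scores)) - 1]

print(f"NSE (whole period): {nse_full:.3f}")
print(f"NSE (high flows):   {nse_high:.3f}")
print(f"bootstrap 5-95 % range: [{low:.3f}, {high:.3f}]")
```

The mask here selects time steps based on the observations only, so it is a unilateral condition in the sense discussed in RC1.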

Citation: https://doi.org/10.5194/egusphere-2023-1424-RC2
- AC3: 'Reply on RC2', Thibault Hallouin, 27 Oct 2023
  
  Dear Referee,
  Thank you for your feedback and advice on our manuscript.
  As we chose to submit our manuscript to the journal Geoscientific Model Development, we consequently chose to focus on the more technical aspects relevant to software development and, in particular, on the software design decisions motivated by the specific needs identified in hydrological science. We believe that advancements in good practices and science can be fuelled by the availability of efficient and easy-to-use tools. However, we do take note of your requirement for a stronger emphasis on the scientific implications of our work, and we will endeavour to spell out more explicitly in the revised version what these are and how they fulfil particular needs and address current shortcomings in hydrology. In particular, we believe that the reproducibility of hydrological studies is currently not always achievable and that this partially stems from the lack of information about the many steps involved in the evaluation of hydrological modelling studies. The piece of software evalhyd contributes to lessening the need for detailed explanations by standardising some of these steps and gathering the same methodologies and the same capabilities in a single tool that is made accessible to a variety of users. In addition, some methodological capabilities offered by the software, such as masking, bootstrapping, and multivariate scoring, can be considered as advanced methods in the science and practice of hydrological forecasting evaluation. The comment made by Barbara Casati (RC1) about the impacts of conditioning on the verification sample is a good example of the scientific implication that the masking functionality can have.
  We will also address your specific comments by providing further details and illustrations around the key functionalities of masking and bootstrapping, and by strengthening the presentation of the case study. We will also add the missing references for the evaluation metrics where possible and motivate their presence in the set of metrics selected. Finally, we will present more explicitly the strengths and limitations of the tool, especially in comparison to what existing state-of-the-art tools can already offer.
  Kind regards,
  The authors.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1424-AC3

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Thibault Hallouin on behalf of the Authors (31 Jan 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (08 Feb 2024) by Lele Shu

RR by Anonymous Referee #3 (19 Feb 2024)

RR by Anonymous Referee #4 (19 Mar 2024)

ED: Publish subject to minor revisions (review by editor) (28 Mar 2024) by Lele Shu

AR by Thibault Hallouin on behalf of the Authors (07 Apr 2024) Author's response Author's tracked changes Manuscript

ED: Publish subject to minor revisions (review by editor) (18 Apr 2024) by Lele Shu

AR by Thibault Hallouin on behalf of the Authors (25 Apr 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (27 Apr 2024) by Lele Shu

AR by Thibault Hallouin on behalf of the Authors (27 Apr 2024)

Model code and software

evalhyd: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions Thibault Hallouin, François Bourgin https://hal.science/hal-04088473

Viewed

Total article views: 921 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
659	224	38	921	25	25

Views and downloads (calculated since 29 Jun 2023)

Month	HTML	PDF	XML	Total
Jun 2023	54	20	5	79
Jul 2023	118	45	3	166
Aug 2023	85	19	0	104
Sep 2023	106	22	3	131
Oct 2023	121	30	12	163
Nov 2023	12	5	1	18
Dec 2023	24	11	3	38
Jan 2024	21	10	0	31
Feb 2024	31	13	0	44
Mar 2024	28	20	2	50
Apr 2024	27	7	6	40
May 2024	27	20	3	50
Jun 2024	5	2	0	7

Latest update: 10 Jun 2024


Short summary

The evaluation of the quality of hydrological model outputs against streamflow observations is widespread in the hydrological literature. In order to improve on the reproducibility of published studies, a new evaluation tool dedicated to hydrological applications is presented. It is open source and usable in a variety of programming languages to make it as accessible as possible in the community. Thus, authors and readers can use the same tool to produce and reproduce the results, respectively.
