This work is distributed under the Creative Commons Attribution 4.0 License.
evalhyd v0.1.1: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions
Abstract. The evaluation of streamflow predictions forms an essential part of most hydrological modelling studies published in the literature. The evaluation process typically involves the computation of some evaluation metrics, but it can also involve the pre-processing of the predictions and the post-processing of the computed metrics. In order for published hydrological studies to be reproducible, these steps need to be carefully documented by the authors. The availability of a single tool performing all of these tasks would simplify both the documentation by the authors and the reproduction of the results by the readers. However, this requires such a tool to be polyglot (i.e. usable in a variety of programming languages) and openly accessible, so that it can be used by everyone in the hydrological community. To this end, we developed a new tool named evalhyd that offers metrics and functionalities for the evaluation of deterministic and probabilistic streamflow predictions. It is open source and can be used in Python, in R, in C++, or as a command line tool. This article describes the tool and illustrates its functionalities using Global Flood Awareness System (GloFAS) reforecasts over France as an example data set.
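The sketch below gives a flavour of the workflow described in the abstract, driving the Python bindings through the tool's two entry points, evald (deterministic evaluation) and evalp (probabilistic evaluation); the array shapes and metric identifiers used are assumptions made for illustration only and should be checked against the evalhyd documentation.

```python
import numpy as np
import evalhyd  # Python bindings to the evalhyd C++ core

rng = np.random.default_rng(7)

# Synthetic daily streamflow for illustration only: one year of
# observations and a matching deterministic prediction series.
obs = rng.gamma(2.0, 10.0, (1, 365))            # shape (series, time) assumed
prd = obs * rng.normal(1.0, 0.15, (1, 365))

# Deterministic evaluation: metrics are requested by name
# ("NSE", "KGE" assumed here for illustration); the return value is
# assumed to be a list of arrays, one per requested metric.
nse, kge = evalhyd.evald(obs, prd, ["NSE", "KGE"])
print("NSE:", nse.squeeze(), "KGE:", kge.squeeze())

# Probabilistic evaluation: predictions gain extra dimensions for
# sites, lead times, and ensemble members (shapes assumed).
ens = prd[np.newaxis, np.newaxis, :, :] * rng.normal(1.0, 0.2, (1, 1, 50, 365))
crps, = evalhyd.evalp(obs, ens, ["CRPS"])       # metric name assumed
print("CRPS:", crps.squeeze())
```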
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1424', Barbara Casati, 17 Sep 2023
Dear T. Hallouin and co-authors,
I really enjoyed reading this article, and evalhyd seems a very nice verification tool: felicitations! I have uploaded a file with some minor suggestions, while with the following comment I wish to bring one aspect of verification to the attention of the scientific community, for the general online discussion.
I was particularly intrigued by the option in evalhyd of performing conditional verification. I will share here some of our own experience (at the Canadian Met Service) with conditional verification, which may inspire further developments in the tool and, more generally, awareness in the interpretation of the results.
Conditioning on the verification sample can have strong impacts on the verification results (e.g. it can flip the sign of a bias), and hence allows in-depth analysis and understanding of the prediction performance, since the conditioning is usually related to physically-driven phenomena. In a sense, conditional verification is the first step towards process-based diagnostics.
In verification exercises which include several variables (e.g. pressure, temperature, clouds, etc.), applying a condition to one variable while verifying a different variable is common practice (as an example, verification of surface temperature in cloudy versus clear-sky conditions informs on the model performance in reproducing the radiation budget). The condition, however, should be applied to both observed and forecast values (e.g. forecast AND observation being cloudy): I will refer to this double condition as bilateral. On the other hand, when a unilateral condition is applied to only the observed or forecast variable (e.g. cloudy conditions only for the forecast), this can synthetically introduce a bias in the verification results: in the cloud/temperature example above, stratifying for cloudy conditions only in the forecast leads to a synthetic warm bias for the surface temperature, because the sample is bound to contain both cloudy and clear-sky observations, and when the observations have clear sky the surface temperature is expected to be colder. In other words, the bilateral condition will sample all the “hits” for cloudy sky, whereas the unilateral condition will sample the “hits” and “false alarms” for cloudy sky. From our experience, we advise bilateral over unilateral conditioning. (Of course one can also do unilateral conditioning, but one needs to be aware of the introduced biases when interpreting the verification results.)
Applying the unilateral condition to the same variable which is verified might also lead to synthetic biases. As an example, if you stratify your sample for the strong predicted streamflows, you are bound to include in the sample several strong observed streamflows (the “hits”), but also some average or weak observed streamflows (because the prediction might have some “false alarms”). Then you tend to “artificially” diagnose over-prediction for the strong streamflows (and vice versa for the low streamflows: conditioning only on the prediction, you are bound to find under-estimation, because your sample will include some observed events which are medium or strong).
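To make the difference concrete, here is a minimal sketch (plain NumPy, with arbitrary values and an arbitrary high-flow threshold, independent of evalhyd's own masking interface) contrasting the two sampling strategies:

```python
import numpy as np

# Illustrative observed and predicted streamflow series and an arbitrary
# "high flow" threshold (values chosen only for demonstration).
obs = np.array([12.0, 45.0, 80.0, 30.0, 95.0, 10.0, 60.0])
prd = np.array([15.0, 70.0, 75.0, 25.0, 50.0, 65.0, 58.0])
threshold = 55.0

# Unilateral condition: keep time steps where the *prediction* is high.
# This sample mixes "hits" (obs also high) and "false alarms" (obs low).
unilateral = prd > threshold

# Bilateral condition: keep time steps where *both* prediction and
# observation are high, i.e. only the "hits".
bilateral = (prd > threshold) & (obs > threshold)

for name, mask in [("unilateral", unilateral), ("bilateral", bilateral)]:
    bias = np.mean(prd[mask] - obs[mask])
    print(f"{name}: n={mask.sum()}, mean error={bias:+.1f}")
```

With these values, the unilateral sample shows a positive mean error (apparent over-prediction) because it includes the false alarms, while the bilateral sample retains only the hits and shows a small negative mean error.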
I was amazed (also a bit puzzled) to see that in your Figure 6 you have results opposite to what I expected (under-prediction for high predicted streamflows, more over-prediction for low predicted streamflows; the under-dispersion for the average predicted streamflows is instead expected). It would be interesting to understand why: is it due to the characteristics of streamflow prediction (where the timing is always predicted well, and hence false alarms and misses are very rare)? What is the behaviour at the other stations? What would you obtain with the bilateral condition?
I would be grateful if you could add in the article some discussion about unilateral versus bilateral conditions.
Thank you + Bests Barbara Casati
AC1: 'Reply on RC1', Thibault Hallouin, 02 Oct 2023
Dear Barbara Casati,
This is simply a brief comment to thank you for your review and your constructive comments; we will provide a detailed reply in due course. We thank you for sharing your insight and experience with conditional verification and for detailing the distinction between unilateral and bilateral conditioning. We will make sure to cover this distinction in the revised version of the article, to explicitly mention which type of conditioning is possible with evalhyd, and to make the reader aware of the possible bias introduced by unilateral conditioning.
We also encourage the scientific community to join the discussion on this specific aspect of verification while the discussion is still open (until 01 Nov 2023).
Kind regards,
Thibault Hallouin, on behalf of the authors.
Citation: https://doi.org/10.5194/egusphere-2023-1424-AC1
CC1: 'Comment on egusphere-2023-1424', Elizabeth Cooper, 02 Oct 2023
Just a very short comment to say that I used the python version of evalhyd v0.1.1 to quickly and easily calculate some streamflow metrics for a paper now under review for GMD. The tool was simple to install and made it very easy to compare different model runs against observations for more metrics than I might otherwise have used. Thanks!
Citation: https://doi.org/10.5194/egusphere-2023-1424-CC1
AC2: 'Reply on CC1', Thibault Hallouin, 05 Oct 2023
Dear Elizabeth Cooper,
Thank you for sharing your experience here. We are glad to hear that you found evalhyd useful and easy to use.
Cheers,
Thibault Hallouin
Citation: https://doi.org/10.5194/egusphere-2023-1424-AC2
RC2: 'Comment on egusphere-2023-1424', Anonymous Referee #2, 14 Oct 2023
The paper introduces evalhyd, an interesting software tool designed for the evaluation of streamflow predictions. The tool's commitment to standardization and open-source accessibility is commendable, providing a valuable contribution to enhancing reproducibility in hydrological studies. Notably, the well-thought-out design principles, which incorporate a compiled C++ core and thin bindings for multiple languages, contribute to the tool's efficiency and usability.
However, despite the paper positioning evalhyd as a contribution to hydroinformatics, the manuscript's focus on the technical aspects of model development limits its scientific impact. The paper could benefit from a more explicit emphasis on the broader scientific implications and advancements in hydrologic science that the tool facilitates. Some specific comments are as follows:
The introduction would benefit from more clearly articulating what new capabilities evalhyd provides compared to existing hydrologic evaluation packages. As it stands, the motivation around standardization across languages is a bit weak.
In the key functionalities section, the masking and bootstrapping methods need more detailed explanation. Pseudocode or formulas would help make these clearer. For the evaluation metrics, links or references to the original sources for each metric should be provided. More justification for the specific metrics included would also help show the comprehensiveness.
For the case study, more novel demonstrations of the tool would strengthen this section.
The conclusions would be improved by specifically emphasizing the limitations around extensibility, visualizations, and support for continuous distributions. Comparisons to other existing packages may help contextualize the pros/cons.
Overall more critical analysis is needed on how evalhyd improves on the current state of the art in hydrologic evaluation tools. The paper currently lacks motivation and innovation.
Citation: https://doi.org/10.5194/egusphere-2023-1424-RC2
AC3: 'Reply on RC2', Thibault Hallouin, 27 Oct 2023
Dear Referee,
Thank you for your feedback and advice on our manuscript.
As we chose to submit our manuscript to the journal Geoscientific Model Development, we consequently chose to focus on the more technical aspects relevant to software development and, in particular, on the software design decisions motivated by the specific needs identified in hydrological science. We believe that advancements in good practices and science can be fuelled by the availability of efficient and easy-to-use tools. However, we do take note of your request for a stronger emphasis on the scientific implications of our work, and we will endeavour to spell out more explicitly in the revised version what these are and how they fulfil particular needs and address current shortcomings in hydrology. In particular, we believe that the reproducibility of hydrological studies is currently not always achievable and that this partially stems from the lack of information about the many steps involved in the evaluation of hydrological modelling studies. The piece of software evalhyd contributes to lessening the need for detailed explanations by standardising some of these steps and gathering the same methodologies and the same capabilities in a single tool that is made accessible to a variety of users. In addition, some methodological capabilities offered by the software, such as masking, bootstrapping, and multivariate scoring, can be considered as advanced methods in the science and practice of hydrological forecasting evaluation. The comment made by Barbara Casati (RC1) about the impacts of conditioning on the verification sample is a good example of the scientific implications that the masking functionality can have.
We will also address your specific comments by providing further details and illustrations around the key functionalities that are masking and bootstrapping, and by strengthening the presentation of the case study. We will also add the missing references for the evaluation metrics where possible and motivate their presence in the set of metrics selected. Finally, we will present more explicitly the strengths and limitations of the tool, especially in comparison to what existing state-of-the-art tools can already offer.
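As a generic indication of what such a bootstrapping procedure involves, the following sketch resamples whole years with replacement and reports percentiles of the metric's sampling distribution (plain NumPy, not evalhyd's own implementation; the Nash–Sutcliffe efficiency, block length, and number of samples are chosen arbitrarily for illustration):

```python
import numpy as np

def bootstrap_metric(obs, prd, years, metric, n_samples=100, len_sample=10, seed=42):
    """Estimate the sampling distribution of an evaluation metric by
    resampling whole years with replacement (generic sketch only)."""
    rng = np.random.default_rng(seed)
    unique_years = np.unique(years)
    scores = []
    for _ in range(n_samples):
        picked = rng.choice(unique_years, size=len_sample, replace=True)
        # concatenate the time steps of the selected years
        idx = np.concatenate([np.flatnonzero(years == y) for y in picked])
        scores.append(metric(obs[idx], prd[idx]))
    return np.percentile(scores, [5, 50, 95])

# Example with a simple Nash-Sutcliffe efficiency as the metric.
def nse(obs, prd):
    return 1.0 - np.sum((prd - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

rng = np.random.default_rng(0)
years = np.repeat(np.arange(2000, 2020), 365)      # 20 synthetic years
obs = rng.gamma(2.0, 10.0, years.size)             # synthetic observations
prd = obs * rng.normal(1.0, 0.2, years.size)       # synthetic predictions
print(bootstrap_metric(obs, prd, years, nse))      # 5th, 50th, 95th percentiles
```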
Kind regards,
The authors.
Citation: https://doi.org/10.5194/egusphere-2023-1424-AC3
Model code and software
evalhyd: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions. Thibault Hallouin and François Bourgin. https://hal.science/hal-04088473
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 659 | 224 | 38 | 921 | 25 | 25 |
Thibault Hallouin
François Bourgin
Charles Perrin
Maria-Helena Ramos
Vazken Andréassian