the Creative Commons Attribution 4.0 License.
Long-term evaluation of the Sub-seasonal to Seasonal (S2S) dataset and derived hydrological forecasts at the catchment scale
Abstract. Recently, projects such as the S2S (Sub-seasonal to Seasonal) have surfaced with the goal of investigating the potential benefits of operational applications of medium- to long-term weather forecasts from two weeks to three months. Key challenges are to quantify forecast uncertainty and verify these predictions considering the downstream users. This work evaluates the meteorological lead-time performance and 5-year skill evolution of nine models of the S2S project alongside discharge predictions from a coupled hydrological model. Moreover, an analysis of the predictors of Numerical Weather Prediction (NWP) quality and an evaluation of the correlation between meteorological and hydrological quality improvement over time is carried out. Results show that the S2S models have skill at the catchment-scale, particularly for lower threshold levels, and that ensemble size is the main predictor of NWP performance. Discharge simulations forced with S2S predictions remain skilful up to one month. The quality of the S2S has increased over time, and there is a strong correlation between meteorological and hydrological improvements. We conclude that S2S products may provide added value to end-users of water resources applications.
Status: closed
RC1: 'Comment on egusphere-2022-419', Anonymous Referee #1, 13 Jul 2022
Summary
In this study, the authors evaluate the accuracy and hydrologic utility of subseasonal to seasonal (S2S) forecasts generated from the S2S project. The paper is well written and provides a good overview of the performance of the current state-of-the-art S2S forecasts. However, the paper, as it stands, is lacking in detail, and would especially benefit from better discussion of results.
Major Comments
- Why were these two study regions selected? The total area is 4100 km². However, the model grid size is 1.5 x 1.5 deg (~20,000 km²) but the authors mention that the study regions encompass 4 grid points. I am not sure how this number was arrived at.
- I do not fully understand the reason behind not showing the performance of temperature. As NWP modules use different physics for simulating precipitation and temperature, similarity in performance metrics should not preclude the inclusion and discussion of the temperature results as they pertain to different aspects of the NWP model.
- Why does the performance improve after the yield point? This is very surprising and the authors should provide some explanation as to why performance improves with increase in lead times.
- The manuscript does not discuss the results comprehensively. What is the impact of ensemble member size on forecast performance? Why do ECMWF models perform better? Have other studies found out the same?
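The grid-size arithmetic in the first major comment can be checked with a short calculation. The sketch below is illustrative only: it assumes a spherical Earth and a cell centred at 50° N (an assumed latitude for central Germany, not a value taken from the manuscript), and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def grid_cell_area_km2(lat_deg: float, res_deg: float = 1.5) -> float:
    """Area of a res_deg x res_deg latitude-longitude cell centred at lat_deg."""
    lat_south = math.radians(lat_deg - res_deg / 2)
    lat_north = math.radians(lat_deg + res_deg / 2)
    dlon = math.radians(res_deg)
    # Exact area of a lat-lon box on a sphere: R^2 * dlon * (sin(north) - sin(south))
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


# 50 N is an assumption for central Germany; 4100 km2 is the area quoted in the comment.
cell_area = grid_cell_area_km2(50.0)
study_area = 4100.0
cells_covered = study_area / cell_area

print(f"cell area ~ {cell_area:.0f} km2, study area covers ~ {cells_covered:.2f} of one cell")
```

Under these assumptions one cell is on the order of 18,000 km², so the quoted study area corresponds to roughly a quarter of a single cell. This is consistent with the referee's question about how four grid points were obtained.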
Minor Comments
- Apart from being a project, S2S is generally used to refer to a specific forecast horizon in the forecasting community. I request the authors to explicitly mention that they are referring to the project.
- Abstract: ‘Results show that the S2S models have skill at the catchment-scale, particularly for lower threshold levels …’. What does ‘lower threshold levels’ mean here?
- Line 30: What is ‘hydrological horizon of skillful predictability’?
- ECMWF models are referred to as ‘ecmf’ and ISAC-CNR as ‘isac’ in the figures. Please maintain consistency.
Citation: https://doi.org/10.5194/egusphere-2022-419-RC1
- AC1: 'Reply on RC1', Marianne Brum, 02 Sep 2022
RC2: 'Comment on egusphere-2022-419', Anonymous Referee #2, 14 Jul 2022
Summary:
The authors present an evaluation of the forecast skill of sub-seasonal to seasonal predictions for two regions in Central Germany. The analysis encompasses nine numerical weather prediction (NWP) models. The authors use a set of well-established forecast skill metrics for this purpose. For both of these regions, the authors analyse meteorological forecast skill for precipitation and temperature. For one of the regions, a hydrologic model (HBV) is applied and forecast skill is evaluated for discharge forecasts. The main findings of the study are that hydrologic forecasts are skillful at longer lead times at which meteorological forecasts are no longer skillful. The study also shows that forecasts have become better over time (i.e., better in 2020 than in 2015), with a larger gain for meteorological variables than for hydrologic variables. The nature of the study is more a report than a scientific exploration. For example, the authors provide no explanation of why the forecast skill is higher for discharge than for meteorological variables or why the forecasts have gained skill over time. The authors could have performed experiments with HBV to show that it is indeed the hydrologic inertia that results in higher skill at longer leads. Similarly, the authors report that a forecast gain over time is present, but do not explain why this is happening. Which changes have been applied to the forecast system that could explain the gain? If no change to the forecast system has been applied, this finding might well be an artifact of the experimental design and, truly, no gain is present. These substantial points have to be addressed thoroughly before the manuscript can be published. The manuscript is well-structured, overall well written, and easy to follow. I only found the figure captions too short and also lacking an introduction of the abbreviations used in the figures.
Major comments:
The selection of Rheinland-Pfalz (RLP) for evaluating the meteorological performance seems arbitrary. As seasonal forecasts are global in nature, this evaluation could be done at that scale. RLP is also very close to the selected catchment (both are located in Germany, a few hundred kilometers apart) and shares a very similar climate. Please consider using another setting for the meteorological evaluation.
One of the major points is the gain in forecast performance from the beginning of the forecast period (2015) to the end (2020). Five years is a very short period that is insufficient to characterize the climatology of the investigated catchment (upper Main in Germany). It is unclear whether the gain in skill is an artefact of catchment observations. In other words, it could well be that the upper Main was easier to forecast in 2020 than in 2015 because of the weather that occurred in 2020. The authors need to elaborate on the reasons why they think it is actually an improvement in the system. It is surprising to me that the meteorological lead-time gain can increase by 20 days over such a short period. This means that a forecast at lead time 40 in 2020 is as skillful as a forecast at lead time 20 in 2015. This would be an enormous improvement.
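To make the "lead time 40 in 2020 versus lead time 20 in 2015" reading concrete, the following sketch computes a lead-time gain by matching skill levels between two skill curves. It is one possible interpretation, not the manuscript's method. The function names are hypothetical and the skill values are synthetic numbers chosen only to reproduce the 20-day example above; they are not results.

```python
def lead_time_at_skill(leads, skill, target):
    """Lead time at which a decreasing skill curve first drops to `target`.

    Uses linear interpolation between tabulated leads; returns None if the
    curve never reaches the target within the available lead times.
    """
    points = list(zip(leads, skill))
    for (l0, s0), (l1, s1) in zip(points, points[1:]):
        if s0 >= target >= s1 and s0 != s1:
            return l0 + (s0 - target) * (l1 - l0) / (s0 - s1)
    return None


def lead_time_gain(leads, skill_ref, skill_new, ref_lead):
    """Extra lead time at which the newer curve matches the reference skill."""
    target = skill_ref[leads.index(ref_lead)]
    matched_lead = lead_time_at_skill(leads, skill_new, target)
    return None if matched_lead is None else matched_lead - ref_lead


# Synthetic, made-up skill curves (e.g. a skill score versus lead time in days).
leads = [0, 10, 20, 30, 40, 50]
skill_2015 = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
skill_2020 = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

gain = lead_time_gain(leads, skill_2015, skill_2020, ref_lead=20)
print(f"lead-time gain at day 20: {gain:.1f} days")
```

With curves this far apart the gain is about 20 days, which illustrates how large the difference between the yearly skill curves would need to be. A single easy-to-forecast year could shift the later curve in the same way, which is the artefact the comment warns about.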
Table 3: Within this table, only the correlation of the forecast feature with the highest correlation is shown and the others are not reported. It would be necessary to know the correlations of the other model features too, in order to judge how much better the model feature with the highest correlation really is. Additionally, the authors should conduct a statistical significance test to demonstrate that the outperformance by the model features with the highest correlation is significant.
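As a rough illustration of why a significance test matters here, the sketch below computes an approximate 95 % confidence interval for a correlation coefficient using the Fisher z-transformation. It assumes the correlations in Table 3 are computed across the nine models (n = 9) and that the data are approximately bivariate normal. The value r = 0.7 is a hypothetical example, not a number from the manuscript.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a Pearson correlation.

    Applies the Fisher z-transformation, whose standard error is
    1 / sqrt(n - 3); z_crit = 1.96 gives a two-sided 95 % interval.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r = 0.7 is a hypothetical correlation; n = 9 assumes one value per S2S model.
low, high = fisher_ci(0.7, 9)
print(f"95% CI for r = 0.7 with n = 9: ({low:.2f}, {high:.2f})")
```

With only nine samples the interval is very wide, so the intervals of two model features with moderately different correlations would overlap substantially. This is why reporting all correlations alongside a formal test is more informative than reporting the highest one alone.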
In general, the Figure captions are too short to understand the Figures and need to be expanded. Additionally, abbreviations used in figures are not explained (kwbc in Figure 4) and coloring of lines is also unclear.
Minor edits:
Abstract
- should mention Germany as location of catchments
- should provide reference to S2S project
Intro:
- paragraph starting at line 48 can be merged with paragraph ending at line 33. They cover the same points.
- line 54: which time? The past decades or lead time?
- line 56: what do the authors mean by "definite model predictors"?
- line 58: What do the authors verify here? The sentence seems to be incorrect. Verify is also not an appropriate word in the context of modelling. Maybe better choose validate.
Materials and Methods
- line 74: please modify sentence to make clear which statistic corresponds to which gauge
- line 98: it is not clear how temperature dataset has been interpolated from the stations to the grid scale.
- line 125ff. Some metrics that are introduced here are not used in the results section, for example 'value'. These should be removed.
- line 155: HBV uses a triangular weighting function for the river routing. It is not clear to me how this is used to connect the nodes N0, N1, N2, and N3. It needs to be clarified what these nodes represent.
- Line 157: It is not clear to me how data assimilation (DA) is run for HBV. HBV is a conceptual hydrologic model where model state variables cannot be mapped to observation-based variables. Please expand. It seems like DA improved the performance (see line 166ff), but the model performance even without DA is very high and certainly within the range deemed as useful for end users. Please clarify why DA is necessary at all for this application.
Results:
Line 178: it should be stated that Q97 is not shown
Line 180: The sentence starting at 'Persistence, on the...' is not clear to me. Could you please rephrase.
Line 198: please state that this statement is only valid for the meteorological forecasts.
Figure 3: it is confusing that the names of both gauges contain Schwuerbitz.
Figure 3: kwbc is not explained
Line 239ff: It is unclear to me how the composite-model lead-time gain is calculated.
Citation: https://doi.org/10.5194/egusphere-2022-419-RC2
- AC2: 'Reply on RC2', Marianne Brum, 02 Sep 2022
Model code and software
Forecast Evaluation Code Marianne Brum https://gitlab.com/mbrum/forecast-evaluation
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
440 | 262 | 27 | 729 | 29 | 20 |