Long-term evaluation of the Sub-seasonal to Seasonal (S2S) dataset and derived hydrological forecasts at the catchment scale

Brum, Marianne; Schwanenberg, Dirk

doi: https://doi.org/10.5194/egusphere-2022-419

Preprints

09 Jun 2022

Abstract. Recently, projects such as the S2S (Sub-seasonal to Seasonal) project have emerged with the goal of investigating the potential benefits of operational applications of medium- to long-term weather forecasts, with lead times from two weeks to three months. Key challenges are to quantify forecast uncertainty and to verify these predictions with downstream users in mind. This work evaluates the meteorological lead-time performance and the 5-year skill evolution of nine models of the S2S project, alongside discharge predictions from a coupled hydrological model. Moreover, it analyses the predictors of Numerical Weather Prediction (NWP) quality and evaluates the correlation between meteorological and hydrological quality improvements over time. Results show that the S2S models have skill at the catchment scale, particularly for lower threshold levels, and that ensemble size is the main predictor of NWP performance. Discharge simulations forced with S2S predictions remain skilful up to one month. The quality of the S2S forecasts has increased over time, and there is a strong correlation between meteorological and hydrological improvements. We conclude that S2S products may provide added value to end-users of water resources applications.

Received: 31 May 2022 – Discussion started: 09 Jun 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2022-419', Anonymous Referee #1, 13 Jul 2022
Summary

In this study, the authors evaluate the accuracy and hydrologic utility of subseasonal to seasonal (S2S) forecasts generated from the S2S project. The paper is well written and provides a good overview of the performance of the current state-of-the-art S2S forecasts. However, the paper, as it stands, is lacking in detail, and would especially benefit from better discussion of results.

Major Comments

Why were these two study regions selected? The total area is 4100 km². However, the model grid size is 1.5° x 1.5° (~20,000 km²), yet the authors mention that the study regions encompass 4 grid points. I am not sure how this number was arrived at.
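As a rough arithmetic check of the cell-size figure quoted in this comment, the area of a regular latitude-longitude cell on a sphere is R² · Δλ · (sin φ_N − sin φ_S). The sketch below assumes a spherical Earth with a mean radius of 6371 km and a cell centred near 50°N, which is only an approximate latitude for the study regions in Germany; the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def grid_cell_area_km2(lat_center_deg: float, dlat_deg: float, dlon_deg: float) -> float:
    """Area of a regular latitude-longitude cell on a sphere, in km^2."""
    lat_south = math.radians(lat_center_deg - dlat_deg / 2)
    lat_north = math.radians(lat_center_deg + dlat_deg / 2)
    dlon = math.radians(dlon_deg)
    # A = R^2 * dlon * (sin(lat_north) - sin(lat_south))
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


for lat in (0.0, 50.0):
    area = grid_cell_area_km2(lat, 1.5, 1.5)
    print(f"1.5 x 1.5 deg cell centred at {lat:.0f} N: {area:,.0f} km2")
```

Under these assumptions a 1.5° x 1.5° cell covers roughly 27,800 km² at the equator and roughly 17,900 km² at 50°N, so a 4100 km² study area is smaller than a single such cell.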

I do not fully understand the reason behind not showing the performance of temperature. As NWP modules use different physics for simulating precipitation and temperature, similarity in performance metrics should not preclude the inclusion and discussion of the temperature results as they pertain to different aspects of the NWP model.

Why does the performance improve after the yield point? This is very surprising, and the authors should provide some explanation as to why performance improves with increasing lead time.

The manuscript does not discuss the results comprehensively. What is the impact of ensemble member size on forecast performance? Why do ECMWF models perform better? Have other studies found the same?

Minor Comments

Apart from being a project, S2S is generally used to refer to a specific forecast horizon in the forecasting community. I request the authors to explicitly mention that they are referring to the project.

Abstract: ‘Results show that the S2S models have skill at the catchment-scale, particularly for lower threshold levels …’. What does ‘lower threshold levels’ mean here?
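Threshold-based verification, which the phrase questioned here appears to refer to, usually converts each ensemble forecast into a probability of exceeding a threshold and scores it against the observed outcome, for example with the Brier score. The following is a generic sketch, not the paper's method: the function names and toy numbers are hypothetical, and using the sample base rate of the observations as the climatological reference is an assumption.

```python
from typing import Sequence


def exceedance_probability(members: Sequence[float], threshold: float) -> float:
    """Fraction of ensemble members strictly above the threshold."""
    return sum(m > threshold for m in members) / len(members)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(ensembles, observations, threshold: float) -> float:
    """Brier skill score against a constant climatological probability.

    The reference probability is the sample base rate of the observed events
    (an in-sample climatology, assumed here for simplicity).
    """
    outcomes = [int(obs > threshold) for obs in observations]
    probs = [exceedance_probability(ens, threshold) for ens in ensembles]
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("event never or always occurs; BSS undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Toy example: four forecasts with four members each (invented numbers).
ensembles = [[1, 3, 5, 7], [0, 0, 1, 2], [6, 8, 9, 10], [2, 4, 4, 5]]
observations = [4, 0, 9, 3]
print(brier_skill_score(ensembles, observations, threshold=3.5))  # 0.1875
```

Positive values indicate skill over the climatological reference; repeating the calculation for several thresholds gives skill as a function of threshold level.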

Line 30: What is ‘hydrological horizon of skillful predictability’?

ECMWF models are referred to as ‘ecmf’ and ISAC-CNR as ‘isac’ in the figures. Please maintain consistency.
Citation: https://doi.org/10.5194/egusphere-2022-419-RC1
- AC1: 'Reply on RC1', Marianne Brum, 02 Sep 2022
  
  Dear Reviewer, thank you for your comments. We appreciate your suggestions and have used them to improve the quality of this paper. Answers to each comment are in the Supplement attached.
  
  Citation: https://doi.org/10.5194/egusphere-2022-419-AC1
RC2:
'Comment on egusphere-2022-419', Anonymous Referee #2, 14 Jul 2022

Summary:

The authors present an evaluation of the forecast skill of sub-seasonal to seasonal predictions for two regions in Central Germany. The analysis encompasses nine numerical weather prediction (NWP) models. The authors use a set of well-established forecast skill metrics for this purpose. For both regions, the authors analyse meteorological forecast skill for precipitation and temperature. For one of the regions, a hydrologic model (HBV) is applied and forecast skill is evaluated for discharge forecasts. The main findings of the study are that hydrologic forecasts are skillful at longer lead times at which meteorological forecasts are no longer skillful. The study also shows that forecasts have become better over time (i.e., better in 2020 than in 2015), with a larger gain for meteorological variables than for hydrologic variables. The nature of the study is more a report than a scientific exploration. For example, the authors provide no explanation of why the forecast skill is higher for discharge than for meteorological variables, or of why the forecasts have gained skill over time. The authors could have performed experiments with HBV to show that it is indeed the hydrologic inertia that results in higher skill at longer leads. Similarly, the authors report that a forecast gain over time is present, but do not explain why this is happening. Which changes have been applied to the forecast system that could explain the gain? If no change to the forecast system has been applied, this finding might well be an artifact of the experimental design, and in truth no gain is present. These substantial points have to be addressed thoroughly before the manuscript can be published. The manuscript is well structured, overall well written and easy to follow. I only found the figure captions too short and lacking an introduction of the abbreviations used in the figures.

Major comments:

The selection of Rhineland-Palatinate (Rheinland-Pfalz, RLP) for evaluating the meteorological performance seems arbitrary. As seasonal forecasts are global in nature, this evaluation could be done at that scale. RLP is also very close to the selected catchment (both are located in Germany, a few hundred kilometers apart) and shares a very similar climate. Please consider using another setting for the meteorological evaluation.

One of the major points is the gain in forecast performance from the beginning of the forecast period (2015) to the end (2020). Five years is a very short period, insufficient to characterize the climatology of the investigated catchment (upper Main in Germany). It is unclear whether the gain in skill is an artefact of the catchment observations. In other words, it could well be that the upper Main was easier to forecast in 2020 than in 2015 because of the weather that occurred in 2020. The authors need to elaborate on why they think it is actually an improvement in the system. It is surprising to me that the meteorological lead-time gain can increase by 20 days over such a short period. This means that a forecast at lead time 40 in 2020 is as skillful as a forecast at lead time 20 in 2015. This would be an enormous improvement.
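The reviewer's reading of "lead-time gain" (a 2020 forecast at lead time 40 being as skillful as a 2015 forecast at lead time 20) can be made concrete as the shift between the lead times at which two skill-versus-lead curves fall to the same skill level. The sketch below shows one possible definition, assumed here rather than taken from the paper; the reference level of zero skill, the function names and the toy curves are all hypothetical, chosen to reproduce the 20-day example.

```python
from typing import Optional, Sequence


def lead_time_at_skill(leads: Sequence[float], skill: Sequence[float], level: float) -> Optional[float]:
    """First lead time at which a decreasing skill curve reaches `level`.

    Uses linear interpolation between evaluated lead times; returns None if
    the curve does not fall to `level` within the evaluated range.
    """
    for i in range(1, len(leads)):
        s0, s1 = skill[i - 1], skill[i]
        if s0 >= level >= s1 and s0 != s1:
            frac = (s0 - level) / (s0 - s1)
            return leads[i - 1] + frac * (leads[i] - leads[i - 1])
    return None


def lead_time_gain(leads, skill_early, skill_late, level: float = 0.0) -> Optional[float]:
    """Hypothetical gain: how much later the later period's skill falls to `level`."""
    t_early = lead_time_at_skill(leads, skill_early, level)
    t_late = lead_time_at_skill(leads, skill_late, level)
    if t_early is None or t_late is None:
        return None
    return t_late - t_early


# Invented curves: skill reaches zero at lead 20 in the early period, at lead 40 later.
leads = [0, 10, 20, 30, 40, 50]
skill_2015 = [0.8, 0.4, 0.0, -0.1, -0.2, -0.3]
skill_2020 = [0.9, 0.6, 0.4, 0.2, 0.0, -0.1]
print(lead_time_gain(leads, skill_2015, skill_2020))  # 20.0
```

Because skill curves from short verification periods are noisy, the crossing point, and hence the gain, can be sensitive to sampling, which is the concern raised in the comment.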

Table 3: Within this table, only the forecast feature with the highest correlation is shown and the others are not reported. It would be necessary to know the correlations of the other model features too, to judge how much better the model feature with the highest correlation really is. Additionally, the authors should conduct a statistical significance test to demonstrate that the outperformance of the model feature with the highest correlation is significant.
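One assumption-light way to test whether the top-ranked feature really correlates more strongly with forecast quality than another feature, given only a handful of models (nine here), is a paired bootstrap of the difference between the two correlations, which share the same target variable. The sketch below is hypothetical: the function names are illustrative and no values from Table 3 are reproduced. If the resulting interval excludes zero, the difference would be judged significant at the chosen level; with only nine models the interval is likely to be wide.

```python
import math
import random
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_difference(
    feat_a: Sequence[float],
    feat_b: Sequence[float],
    target: Sequence[float],
    n_boot: int = 5000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for r(feat_a, target) - r(feat_b, target).

    Models are resampled with replacement and each model's three values stay
    together, so the dependence between the two correlations is preserved.
    Resamples in which any variable is constant are skipped.
    """
    rng = random.Random(seed)
    n = len(target)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = [feat_a[i] for i in idx]
        b = [feat_b[i] for i in idx]
        t = [target[i] for i in idx]
        if min(len(set(a)), len(set(b)), len(set(t))) < 2:
            continue
        diffs.append(pearson(a, t) - pearson(b, t))
    if not diffs:
        raise ValueError("no valid bootstrap resamples")
    diffs.sort()
    lo = diffs[int(alpha / 2 * len(diffs))]
    hi = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return lo, hi
```

If a feature is negatively related to skill, its sign would need to be flipped (or absolute correlations compared) before taking the difference.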

In general, the Figure captions are too short to understand the Figures and need to be expanded. Additionally, abbreviations used in figures are not explained (kwbc in Figure 4) and coloring of lines is also unclear.

Minor edits:

Abstract

- should mention Germany as the location of the catchments

- should provide a reference to the S2S project

Intro:

- paragraph starting at line 48 can be merged with paragraph ending at line 33. They cover the same points.

- line 54: which time? The past decades or lead time?

- line 56: what do the authors mean by "definite model predictors"?

- line 58: What do the authors verify here? The sentence seems to be incorrect. "Verify" is also not an appropriate word in the context of modelling; "validate" may be a better choice.

Materials and Methods

- line 74: please modify the sentence to make clear which statistic corresponds to which gauge

- line 98: it is not clear how the temperature dataset has been interpolated from the stations to the grid scale.

- line 125ff: Some metrics that are introduced here are not used in the results section, for example 'value'. These should be removed.

- line 155: HBV uses a triangular weighting function for the river routing. It is not clear to me how this is used to connect the nodes N0, N1, N2, and N3. It needs to be clarified what these nodes represent.
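For reference, the triangular weighting function mentioned here spreads each time step's runoff over the following steps according to a triangle whose base length is a model parameter (often called MAXBAS in HBV variants). The sketch below is a minimal version assuming a symmetric triangle whose area is integrated over integer time steps; it does not describe how the paper connects nodes N0 to N3, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def triangular_weights(maxbas: float) -> List[float]:
    """Weights of a symmetric triangle with base `maxbas` time steps.

    Weight i is the triangle's area over the interval [i - 1, i]; weights sum to 1.
    """
    if maxbas <= 0:
        raise ValueError("maxbas must be positive")

    def cdf(t: float) -> float:
        # Cumulative area of a unit-area triangle on [0, maxbas], peak at maxbas / 2.
        if t <= 0:
            return 0.0
        if t <= maxbas / 2:
            return 2 * t * t / maxbas**2
        if t < maxbas:
            return 1 - 2 * (maxbas - t) ** 2 / maxbas**2
        return 1.0

    n = math.ceil(maxbas)
    return [cdf(i) - cdf(i - 1) for i in range(1, n + 1)]


def route(inflow: Sequence[float], maxbas: float) -> List[float]:
    """Convolve a runoff series with the triangular weights (same output length)."""
    w = triangular_weights(maxbas)
    return [
        sum(w[k] * inflow[t - k] for k in range(len(w)) if t - k >= 0)
        for t in range(len(inflow))
    ]


print(triangular_weights(3))  # approximately [2/9, 5/9, 2/9]
print(route([1.0, 0.0, 0.0, 0.0, 0.0], 3))  # a unit pulse spread over three steps
```

Integrating the triangle over each step, rather than sampling it at step midpoints, keeps the weights summing to one for non-integer base lengths, so routing preserves runoff volume.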

- Line 157: It is not clear to me how data assimilation (DA) is run for HBV. HBV is a conceptual hydrologic model whose state variables cannot be mapped to observation-based variables. Please expand. It seems that DA improved the performance (see line 166ff), but the model performance even without DA is very high and certainly within the range deemed useful for end users. Please clarify why DA is necessary at all for this application.

Results:

Line 178: it should be stated that Q97 is not shown

Line 180: The sentence starting at 'Persistence, on the...' is not clear to me. Could you please rephrase it?
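In forecast verification, "persistence" usually denotes the naive benchmark that carries the latest observation forward in time. Assuming that is the sense used in the quoted sentence, skill relative to persistence can be expressed as an MSE-based skill score. The sketch below is generic; the function names, the alignment convention for `forecast` and the choice of MSE are assumptions, not the paper's definitions.

```python
from typing import Sequence


def mse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between paired forecasts and observations."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def skill_vs_persistence(observed: Sequence[float], forecast: Sequence[float], lead: int) -> float:
    """MSE skill score of `forecast` relative to persistence at the given lead.

    Assumes forecast[t] is valid at time t and was issued `lead` steps earlier,
    so the persistence reference for time t is observed[t - lead].
    1 = perfect, 0 = no better than persistence, < 0 = worse than persistence.
    """
    if lead < 1 or lead >= len(observed):
        raise ValueError("lead must be between 1 and len(observed) - 1")
    obs = observed[lead:]
    fc = forecast[lead:]
    ref = observed[:-lead]
    mse_ref = mse(ref, obs)
    if mse_ref == 0:
        raise ValueError("persistence is perfect; skill score undefined")
    return 1.0 - mse(fc, obs) / mse_ref


obs = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
print(skill_vs_persistence(obs, obs, lead=1))  # 1.0 (perfect forecast)
```

A forecast identical to persistence scores exactly 0 under this definition, which makes the score easy to read as "added value over carrying the last observation forward".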

Line 198: please state that this statement is only valid for the meteorological forecasts.

Figure 3: it is confusing that the names of both gauges contain Schwuerbitz.

Figure 3: kwbc is not explained

Line 239ff: It is unclear to me how the composite-model lead-time gain is calculated.

Citation: https://doi.org/10.5194/egusphere-2022-419-RC2
- AC2: 'Reply on RC2', Marianne Brum, 02 Sep 2022
  
  Dear Reviewer, thank you for your comments. We appreciate your suggestions and have used them to improve the quality of this paper. Answers to each comment are in the Supplement attached.
  
  Citation: https://doi.org/10.5194/egusphere-2022-419-AC2

Model code and software

Forecast Evaluation Code Marianne Brum https://gitlab.com/mbrum/forecast-evaluation

Viewed

Total article views: 869 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
510	329	30	869	39	33

Views and downloads (calculated since 09 Jun 2022)

Month	HTML	PDF	XML	Total
Jun 2022	117	29	5	151
Jul 2022	53	12	2	67
Aug 2022	15	32	0	47
Sep 2022	48	23	6	77
Oct 2022	18	7	0	25
Nov 2022	12	31	0	43
Dec 2022	8	16	0	24
Jan 2023	12	5	0	17
Feb 2023	23	17	0	40
Mar 2023	8	14	0	22
Apr 2023	9	5	0	14
May 2023	3	8	0	11
Jun 2023	12	3	2	17
Jul 2023	5	10	2	17
Aug 2023	3	5	0	8
Sep 2023	10	9	1	20
Oct 2023	6	6	1	13
Nov 2023	2	1	0	3
Dec 2023	3	0	0	3
Jan 2024	5	3	0	8
Feb 2024	5	4	0	9
Mar 2024	5	6	2	13
Apr 2024	15	2	2	19
May 2024	5	3	0	8
Jun 2024	11	1	1	13
Jul 2024	6	1	1	8
Aug 2024	6	2	2	10
Sep 2024	2	3	0	5
Oct 2024	9	4	0	13
Nov 2024	7	6	0	13
Dec 2024	3	1	0	4
Jan 2025	3	4	0	7
Feb 2025	8	0	0	8
Mar 2025	14	8	0	22
Apr 2025	3	5	0	8
May 2025	8	3	3	14
Jun 2025	4	16	0	20
Jul 2025	7	7	0	14
Aug 2025	4	7	0	11
Sep 2025	12	8	0	20
Oct 2025	1	2	0	3

Viewed (geographical distribution)

Total article views: 845 (including HTML, PDF, and XML), of which 845 have a defined geographical origin and 0 an unknown origin.

Cited

Latest update: 09 Oct 2025

1 citation as recorded by Crossref.

Short summary

Systems such as navigation and water supply rely on river flow forecasts to manage their assets; these flow forecasts in turn use predictions from weather models (NWPs) as inputs. We evaluate the quality of sub-seasonal to seasonal (S2S) NWPs with these applications in mind. We conclude that the quality of S2S forecasts has increased over time, that the resulting flow simulations are skillful up to one month, and that there is a strong correlation between hydrological and meteorological quality improvements.