This work is distributed under the Creative Commons Attribution 4.0 License.
Skilful probabilistic predictions of UK floods months ahead using machine learning models trained on multimodel ensemble climate forecasts
Abstract. Seasonal streamflow forecasts are an important component of flood risk management. Hybrid forecasting methods that predict seasonal streamflow using machine learning models driven by climate model outputs are currently underexplored, yet have some important advantages over traditional approaches using hydrological models. Here we develop a hybrid subseasonal to seasonal streamflow forecasting system to predict the monthly maximum daily streamflow up to four months ahead. We train a random forest machine learning model on dynamical precipitation and temperature forecasts from a multimodel ensemble of 196 members (eight seasonal climate forecast models) from the Copernicus Climate Change Service (C3S) to produce probabilistic hindcasts for 579 stations across the UK for the period 2004–2016, with up to four months lead time. We show that multi-site ML models trained on pooled catchment data together with static catchment attributes are significantly more skilful than single-site ML models trained on data from each catchment individually. Considering all initialization months, 60 % of stations show positive skill (CRPSS > 0) relative to climatological reference forecasts in the first month after initialization. This falls to 41 % in the second month, 38 % in the third month and 33 % in the fourth month.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2324', Anonymous Referee #1, 25 Oct 2024
Review comment: Skilful probabilistic predictions of UK floods months ahead using machine learning models trained on multimodel ensemble climate forecasts by Simon Moulds et al.
The manuscript by Moulds et al. presents a new hybrid model approach for flood forecasting at a subseasonal to seasonal (S2S) scale for the UK. In the process of developing the forecasting system, different model setups were tested, comparing a single- vs multi-site approach and including additional catchment attributes alongside the dynamical input data (precipitation and temperature) from the multimodel seasonal forecasting system C3S, to predict monthly maximum daily streamflow values.
The manuscript highlights the importance of incorporating the multi-site approach into modelling practices and further elaborates on the skill of the framework for the four lead months considered in the analysis. While 60% of stations (over all initialization months) indicate positive skill compared to a climatological reference forecast, the skill over the following lead months decreases, which is to be expected. However, the skill compared to commonly used forecasting systems like EFAS remains higher over the lead periods considered. Overall, the manuscript tackles the questions of a) how skilfully monthly maximum daily flow can be predicted up to four months of lead time and b) to what extent the skill of S2S streamflow predictions can be improved by a multi- vs single-site framework. While the manuscript is well written and showcases an interesting approach to hybrid modelling for flood forecasting, some extra information on the model development process, such as the training, as well as a more in-depth comparison of the single- vs multi-site model results, would help the reader understand and follow the key findings.
The following major comments/suggestions/questions came up during the reading process and could help to strengthen and clarify some aspects of the manuscript:
Model development:
- lines 150-161: I would suggest adding an overview figure highlighting the forecast design setup (with the different forecasting months and the data and timesteps used), as well as the different model options described (line 199) in the following section (and potentially combine it with table 1). I think something like this would greatly help the reader and prepare them for understanding and following the figures in the results section quicker.
- lines 204-218: It would generally be interesting and helpful to include some more aspects of the training, testing and validation approach to better understand the model setup.
-- For example: How many or which percentage of the climate forecast ensembles was used for training, testing and validation? (As Table S1 highlights, there are many different options with varying ensemble sizes.)
-- Was there a clear split between training, testing and validation datasets?
-- In line 216 it’s mentioned that the training period gets extended with every year: was the model retrained completely from scratch with all the data or just updated? Would that not lead to overfitting the model?
-- Was it tested separately what the influence of this training method is compared to not extending the training periods and just providing the current available data?
-- How did the training, testing and validation differ between the single- and multi-site (ID or 15 attributes) approaches (if they did)? How were the attributes incorporated? Are only the medians used as listed in Table S3? Was there a random selection of the catchments for training or were all considered?
Single vs multi-site model results:
- line 259: it would be nice if it could be highlighted in the description of Figure 1b) (and 1a)) that this shows the average performance of the models. Furthermore, as a reader I was more interested in Figure S2 and in understanding where and when the multi-site model with attributes tends to outperform the single-site model than in the Gini importance. Maybe consider switching Figure S2 and 1c) from/to the appendix? While I think the Gini importance is interesting and the discussion in lines 275-284 should remain, I believe having a more detailed look at Figure S2 might also help with the interpretation of Figure 2 and others where the discussion also goes over different months. Furthermore, the difference between single- and multi-site model performance is apparently one of the key findings of the work.
Minor comments:
- line 23: could refer directly to the model used (quantile regression forest)
- line 33: abstract could include additional line on outcome, relevance or future plans of this developed framework to highlight the relevance of their findings
- line 53: one period too many before ‘While the skill…’
- lines 105-108: a few lines explaining the necessity and urgency of implementing and testing such an approach (in the UK but also generally) could be added to strengthen the objectives of the manuscript
- line 110: short explanation of C3S multimodel would be nice to be able to follow (similar to the EFAS one in previous lines). Furthermore, consider introducing the abbreviation already here and not only in the Data section on line 133
- section 2.1 and 2.2: could consider adding subsections for the different paragraphs to make it more obvious for the reader to find which information where
- line 128: could consider adding a map with the locations of the 579 stations throughout the country to give the reader a better understanding of whether these locations are equally distributed or whether there are, for example, multiple stations per catchment (also with regard to training the model on static variables and whether some locations might be overrepresented compared to others).
- line 148: do the authors know how much of a bias there is in the precipitation and temperature hindcasts compared to the observations of the catchment? Just curious.
As it was mentioned previously that the aim is to include uncorrected monthly dynamical climate forecasts (line 118), I was wondering what the specific reasoning for this is as well?
- line 228: general question out of curiosity: why evaluate only on a monthly scale? The model seemed to be trained on daily (?) timesteps and floods are relatively quick phenomena. Would it not also be interesting to see if the model can forecast the peaks or the timing of floods at a shorter timescale?
- line 250: same as before: consider adding subchapters to make it easier for the reader to follow, and potentially add a line on how the results are structured
- line 251: for these results and this statement it would be interesting again to know where the stations are located, along with their IDs and static catchment attributes
- line 287: consider clarifying also in the text that the following focus of the results lies on the multi-site model with catchment attributes (it’s in the description of Figure 2 but would be nice to have in the main text as well)
- Figure 2: is it possible to make the same figure for comparison also for the single-site model? It would also be interesting to see the single-site and multi-site models in the EFAS comparison
- line 288: ‘with lower skill during spring and autumn.’ Would it be possible to give the reader an estimate of which seasons or months are generally high flood seasons for the UK or different catchments? In other words, is the model able to simulate floods in those months or not, or does it only show good performance in months where there are fewer floods?
- line 309: appreciate that explanation and clarification as a reader
- Table 2: out of curiosity: any idea why the percentage of stations in lead time 2 in March seems to increase and is even higher than in lead 0?
- Figure 3: would it be possible to change the background/areas of the map that is not considered in the analysis to a different color (e.g. white or transparent) to make the distinction between catchments with low CRPSS a bit clearer?
- Figure 4: What is the Qmax range of the observations for the different catchments shown? Might the difference in catchments (e.g. size or average Q) have an impact? And was this considered in the training selection? Or can it be related to the Gini importance of the different variables?
- line 342: are these 90 stations roughly in the same area? Is there a common factor that explains their positive skill for all four lead times in a few select months?
- line 338, Discussion: consider coming back to the initial research questions from the introduction more clearly
- line 353: consider adding a line with the comparison to the reference (climatology and EFAS) to highlight how much or little difference the model framework brings to the current forecasting systems used or available in the UK.
- is there a specific reason for not having a conclusion to round up the manuscript, highlighting the main findings? I believe adding one, coming back to and answering the initial research questions from the introduction, would strengthen the manuscript
Citation: https://doi.org/10.5194/egusphere-2024-2324-RC1
AC1: 'Reply on RC1', Simon Moulds, 13 Dec 2024
Review comment: Skilful probabilistic predictions of UK floods months ahead using machine learning models trained on multimodel ensemble climate forecasts by Simon Moulds et al.
The manuscript by Moulds et al. presents a new hybrid model approach for flood forecasting at a subseasonal to seasonal (S2S) scale for the UK. In the process of developing the forecasting system, different model setups were tested, comparing a single- vs multi-site approach and including additional catchment attributes alongside the dynamical input data (precipitation and temperature) from the multimodel seasonal forecasting system C3S, to predict monthly maximum daily streamflow values.
The manuscript highlights the importance of incorporating the multi-site approach into modelling practices and further elaborates on the skill of the framework for the four lead months considered in the analysis. While 60% of stations (over all initialization months) indicate positive skill compared to a climatological reference forecast, the skill over the following lead months decreases, which is to be expected. However, the skill compared to commonly used forecasting systems like EFAS remains higher over the lead periods considered. Overall, the manuscript tackles the questions of a) how skilfully monthly maximum daily flow can be predicted up to four months of lead time and b) to what extent the skill of S2S streamflow predictions can be improved by a multi- vs single-site framework. While the manuscript is well written and showcases an interesting approach to hybrid modelling for flood forecasting, some extra information on the model development process, such as the training, as well as a more in-depth comparison of the single- vs multi-site model results, would help the reader understand and follow the key findings.
REPLY: Thank you for taking the time to review our manuscript, and for your thoughtful suggestions, which are addressed in further detail below.
The following major comments/suggestions/questions came up during the reading process and could help to strengthen and clarify some aspects of the manuscript:
Model development:
- lines 150-161: I would suggest adding an overview figure highlighting the forecast design setup (with the different forecasting months and the data and timesteps used), as well as the different model options described (line 199) in the following section (and potentially combine it with table 1). I think something like this would greatly help the reader and prepare them for understanding and following the figures in the results section quicker.
REPLY: This is a great suggestion – we will add a new figure showing the forecast design setup to the revised manuscript.
- lines 204-218: It would generally be interesting and helpful to include some more aspects of the training, testing and validation approach to better understand the model setup.
REPLY: We will add some further details of the training/testing/validation approach to the revised manuscript.
-- For example: How many or which percentage of the climate forecast ensembles was used for training, testing and validation? (As Table S1 highlights, there are many different options with varying ensemble sizes.)
REPLY: We took the multimodel ensemble mean of the climate predictors from each model/member combination listed in Table S1. During training, we used forecasts for all years prior to the forecast year in question: for example, to make forecasts during 2005 we used a model trained on data from 1994-2004.
-- Was there a clear split between training, testing and validation datasets?
REPLY: Yes, there was a clear temporal split between training and test datasets. We did not include a separate validation dataset because we found there was limited benefit to be gained from tuning the hyperparameters of the QRF model. The forward-chain cross-validation scheme is described in L213-218 of the manuscript: “We use a forward-chain cross-validation approach whereby the models are trained on reforecasts from the previous n years and tested on the current year. For example, to predict all months in 2004, the first training period was taken as January 1994 to December 2003. For 2005, we then extended the training period by one year to December 2004, and continued adding one year until 2016, the final year in the test period, at which point the training period for the QRF models was January 1994 to December 2015.”
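[Editor's note] The expanding-window scheme described in this reply can be sketched as follows; this is an illustrative sketch only, with function and variable names of our own, not the manuscript's code:

```python
# Forward-chain (expanding-window) cross-validation: train on all years
# before the test year, then advance the test year by one and retrain.
def forward_chain_splits(first_train_year, first_test_year, last_test_year):
    """Yield (train_years, test_year) pairs with an expanding training window."""
    for test_year in range(first_test_year, last_test_year + 1):
        train_years = list(range(first_train_year, test_year))
        yield train_years, test_year

splits = list(forward_chain_splits(1994, 2004, 2016))
# The first split trains on 1994-2003 to predict 2004;
# the last trains on 1994-2015 to predict 2016.
```

Because each split tests strictly on a year outside its training window, the scheme mimics an operational setup where only past data are available at forecast time.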
-- In line 216 it’s mentioned that the training period gets extended with every year: was the model retrained completely from scratch with all the data or just updated? Would that not lead to overfitting the model?
REPLY: The model was retrained from scratch every year using data up to the previous year. For example, to make predictions in 2010, the model was trained on data from 1994-2009. The model is never tested on data that it has also been trained on. Our aim was to reproduce an operational setup as far as possible, where the maximum amount of data would be used for training. Moreover, the QRF architecture is robust to overfitting because it combines predictions from multiple decision trees trained on random subsets of data and features, reducing the likelihood that any single tree dominates and captures noise in the training data.
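[Editor's note] The quantile-prediction step that distinguishes QRF from a plain random forest can be illustrated with a toy: each tree routes the query to a leaf, the training targets stored in the matched leaves are pooled, and empirical quantiles are read off the pooled sample. The leaf contents below are hand-made placeholders, not a real fitted forest:

```python
def qrf_predict_quantiles(leaf_samples_per_tree, quantiles):
    """QRF-style prediction: pool the training targets in the leaf that each
    tree routes the query to, then take empirical (nearest-rank) quantiles
    of the pooled sample -- the essence of Meinshausen's QRF."""
    pooled = sorted(y for leaf in leaf_samples_per_tree for y in leaf)
    n = len(pooled)
    return [pooled[min(int(q * n), n - 1)] for q in quantiles]

# Three hypothetical trees route a query to leaves holding these Qmax values:
leaves = [[1.0, 2.0], [1.5, 2.5, 3.0], [2.0]]
median = qrf_predict_quantiles(leaves, [0.5])
```

Because the prediction aggregates samples across many independently grown trees, no single tree's noise determines the predictive distribution, which is the robustness argument made in the reply.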
-- Was it tested separately what the influence of this training method is compared to not extending the training periods and just providing the current available data?
REPLY: We did not perform this test, for the reasons outlined above.
-- How did the training, testing and validation differ between the single- and multi-site (ID or 15 attributes) approaches (if they did)? How were the attributes incorporated? Are only the medians used as listed in Table S3? Was there a random selection of the catchments for training or were all considered?
REPLY: We did not include the static catchment attributes in the single-site models. In the single-site case the attributes would have no explanatory power because they would not vary across the training data. We therefore left out these attributes from the single-site model.
Single vs multi-site model results:
- line 259: it would be nice if it could be highlighted in the description of Figure 1b) (and 1a)) that this shows the average performance of the models. Furthermore, as a reader I was more interested in Figure S2 and in understanding where and when the multi-site model with attributes tends to outperform the single-site model than in the Gini importance. Maybe consider switching Figure S2 and 1c) from/to the appendix? While I think the Gini importance is interesting and the discussion in lines 275-284 should remain, I believe having a more detailed look at Figure S2 might also help with the interpretation of Figure 2 and others where the discussion also goes over different months. Furthermore, the difference between single- and multi-site model performance is apparently one of the key findings of the work.
REPLY: Thank you for this suggestion. We will clarify how these aggregate scores are computed in the revised manuscript. We would like to keep Figure 1c in the main manuscript, but will promote Figure S2 to the main manuscript.
Minor comments:
- line 23: could refer directly to the model used (quantile regression forest)
REPLY: Good point – we will do this in the revised manuscript.
- line 33: abstract could include additional line on outcome, relevance or future plans of this developed framework to highlight the relevance of their findings
REPLY: This is a good suggestion.
- line 53: one period too many before ‘While the skill…’
REPLY: Thanks for pointing this out.
- lines 105-108: a few lines explaining the necessity and urgency of implementing and testing such an approach (in the UK but also generally) could be added to strengthen the objectives of the manuscript
REPLY: Thanks for this suggestion – we will add a sentence or two on this in the revised manuscript.
- line 110: short explanation of C3S multimodel would be nice to be able to follow (similar to the EFAS one in previous lines). Furthermore, consider introducing the abbreviation already here and not only in the Data section on line 133
REPLY: Good suggestion.
- section 2.1 and 2.2: could consider adding subsections for the different paragraphs to make it more obvious for the reader to find which information where
REPLY: Thanks – this is a good suggestion. We will add some subsections in the revised manuscript.
- line 128: could consider adding a map with the locations of the 579 stations throughout the country to give the reader a better understanding of whether these locations are equally distributed or whether there are, for example, multiple stations per catchment (also with regard to training the model on static variables and whether some locations might be overrepresented compared to others).
REPLY: Thanks, we agree this would be helpful but to avoid having too many figures we will include a figure in the supplementary material.
- line 148: do the authors know how much of a bias there is in the precipitation and temperature hindcasts compared to the observations of the catchment? Just curious.
REPLY: We didn’t compute the bias.
As it was mentioned previously that the aim is to include uncorrected monthly dynamical climate forecasts (line 118), I was wondering what the specific reasoning for this is as well?
REPLY: We used monthly uncorrected climate forecasts to test the ability of the machine learning model to implicitly perform bias correction by relating input (uncorrected monthly dynamical climate forecasts) to output (monthly maximum daily streamflow).
- line 228: general question out of curiosity: why evaluate only on a monthly scale? The model seemed to be trained on daily (?) timesteps and floods are relatively quick phenomena. Would it not also be interesting to see if the model can forecast the peaks or the timing of floods at a shorter timescale?
REPLY: We did this because of the limited skill of the seasonal forecasts, which improves to some degree at coarser temporal resolutions. In fact, while the dependent variable is the monthly maximum daily streamflow, the climate predictors are all drawn from monthly predictions. We agree that further experiments at finer temporal scales would be beneficial – in this case probably using a model architecture such as an LSTM.
- line 250: same as before: consider adding subchapters to make it easier for the reader to follow, and potentially add a line on how the results are structured
REPLY: Thanks – we will look at doing this in the revised manuscript.
- line 251: for these results and this statement it would be interesting again to know where the stations are located, along with their IDs and static catchment attributes
REPLY: Thanks – see our previous response on including a figure with catchment locations.
- line 287: consider clarifying also in the text that the following focus of the results lies on the multi-site model with catchment attributes (it’s in the description of Figure 2 but would be nice to have in the main text as well)
REPLY: Thanks – we will do this.
- Figure 2: is it possible to make the same figure for comparison also for the single-site model? It would also be interesting to see the single-site and multi-site models in the EFAS comparison
REPLY: Certainly – we will do this and include it in the supplementary materials.
- line 288: ‘with lower skill during spring and autumn.’ Would it be possible to give the reader an estimate of which seasons or months are generally high flood seasons for the UK or different catchments? In other words, is the model able to simulate floods in those months or not, or does it only show good performance in months where there are fewer floods?
REPLY: Thanks for noticing this omission. In the UK the main flood season is winter (DJF). We will add a sentence to this effect in the revised manuscript.
- line 309: appreciate that explanation and clarification as a reader
REPLY: Thanks for mentioning this.
- Table 2: out of curiosity: any idea why the percentage of stations in lead time 2 in March seems to increase and is even higher than in lead 0?
REPLY: Not really, but it’s possibly related to the skill that is drawn from model initialization in these months.
- Figure 3: would it be possible to change the background/areas of the map that is not considered in the analysis to a different color (e.g. white or transparent) to make the distinction between catchments with low CRPSS a bit clearer?
REPLY: Thank you for this suggestion – we will modify the figure in the revised manuscript.
- Figure 4: What is the Qmax range of the observations for the different catchments shown? Might the difference in catchments (e.g. size or average Q) have an impact? And was this considered in the training selection? Or can it be related to the Gini importance of the different variables?
REPLY: We normalise discharge by catchment area, so although catchment area may be indirectly related (e.g. larger catchments will have a slower response time), the effect on streamflow magnitude is taken into account.
- line 342: are these 90 stations roughly in the same area? Is there a common factor that explains their positive skill for all four lead times in a few select months?
REPLY: Unfortunately we did not notice a pattern in the location and/or characteristics of these basins.
- line 338, Discussion: consider coming back to the initial research questions from the introduction more clearly
REPLY: Thanks for this suggestion. We will do this in the revised manuscript.
- line 353: consider adding a line with the comparison to the reference (climatology and EFAS) to highlight how much or little difference the model framework brings to the current forecasting systems used or available in the UK.
REPLY: Thanks – we will do this in the revised manuscript.
- is there a specific reason for not having a conclusion to round up the manuscript, highlighting the main findings? I believe adding one, coming back to and answering the initial research questions from the introduction, would strengthen the manuscript
REPLY: The main reason was for brevity. However, for additional clarity and to highlight our contribution we will add a conclusion to the revised manuscript.
Citation: https://doi.org/10.5194/egusphere-2024-2324-AC1
RC2: 'Comment on egusphere-2024-2324', Anonymous Referee #2, 03 Nov 2024
Summary
This is a consistently interesting and very well presented study where the authors have used a machine learning method (Quantile Regression Forests) to forecast the maximum daily streamflow occurring in a month at long lead times. The authors use a very large number of catchments (579) and stringent cross-validation to show that they can produce skillful forecasts in a majority of catchments in the first month. Skill declines in subsequent months, but a substantial minority of catchments is still skillful at long lead times. Importantly, the authors show they are able to outperform a credible dynamical forecasting system for these predictions. The analyses are appropriately rigorous, and strongly support the authors' conclusions. The discussion clearly outlines the significance of the work and limitations of their study. The paper is concise and enjoyable to read.
Accordingly, I recommend the study be published essentially as is, with minor technical corrections based on the comments below.
Minor comments/typos
L27-L30 "We show that multi-site ML models trained on pooled catchment data together with static catchment attributes are significantly more skilful compared to single-site ML models trained on data from each catchment individually." I think the authors should use the same phrasing for this result as they have used in body of the paper - i.e. 'narrowly but significantly more skillful'. Figure S2 shows that the advantage of multi-site forecasts over single site forecasts, while there, is generally slight. It is a sad truth that many who cite this paper will deprive themselves of the excellent contents and look only at the abstract, and the current phrasing is a little at odds with the body of the paper.
L92 "Hybrid methods are unconstrained by the need to conserve the water balance and implicitly handle biases in the climate data" No change here, but this is also true of conceptual models.
158-159 "estimate initial hydrologic condition predictability" I would put this as: "are proxies for hydrologic initial conditions"
L176-177 "to predict the monthly maximum of mean daily streamflow (Qmax) using" What does 'mean' imply here? Is it not simply the maximum daily streamflow?
L213-218 An admirably stringent cross-validation scheme!
L223-225 "We evaluated our forecasts against an observation-based ensemble climatological forecast consisting of the observed monthly streamflow values from the previous 20 years (e.g. Hauswirth et al., 2023)." I assume the climatology was an ensemble of observed Qmax (i.e. the same variable as is being forecast)? Please confirm.
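[Editor's note] For readers following this comment: the CRPS against which both the QRF forecasts and the climatological ensemble are scored can be computed for an m-member ensemble with the standard energy form. A minimal sketch with made-up values (not the paper's data):

```python
def ensemble_crps(members, obs):
    """CRPS of an ensemble forecast in the energy form:
    mean member error minus half the mean pairwise member spread."""
    m = len(members)
    error = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return error - spread

# Skill relative to a climatological reference ensemble is then
# CRPSS = 1 - CRPS_forecast / CRPS_climatology (positive = more skilful).
print(ensemble_crps([0.0, 2.0], 1.0))  # 0.5: error 1.0 minus spread 0.5
```

For a single-member ensemble the spread term vanishes and the CRPS reduces to the absolute error, which is why a sharp, well-centred ensemble scores well.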
L232 "reliability index (RI)" please provide more details on this index or a reference.
L302-303 "We bias corrected the EFAS outputs using a quantile mapping approach" Did the QM use parametric or empirical distributions to describe the CDFs? Because Qmax will tend to fall in the tails, fitting an appropriate parametric distribution as part of the QM could matter.
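[Editor's note] To illustrate the distinction this comment raises: an empirical quantile mapping reads a model value's rank off the observed climatology, so it cannot extrapolate beyond the largest observed value, which matters in the tails where Qmax falls. A hypothetical sketch, not the authors' implementation:

```python
import bisect

def empirical_quantile_map(value, model_sorted, obs_sorted):
    """Map a model value to the observed distribution by matching its
    empirical quantile (illustrative empirical QM; both inputs sorted)."""
    n = len(model_sorted)
    # Empirical quantile (rank) of `value` within the model climatology.
    rank = bisect.bisect_left(model_sorted, value)
    q = min(rank / max(n - 1, 1), 1.0)
    # Read off the same quantile in the observed climatology.
    idx = min(int(round(q * (len(obs_sorted) - 1))), len(obs_sorted) - 1)
    return obs_sorted[idx]

model = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical model climatology
obs = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical observed climatology
corrected = empirical_quantile_map(2.0, model, obs)
```

A parametric alternative (e.g. fitting an extreme-value distribution to the Qmax climatology before mapping) would allow extrapolation beyond the largest observation, which is the concern the reviewer raises about the tails.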
L305 "Figure S1" - Personally I think this is easily interesting enough to include in the main body of the text, and adds to the growing body of forecasting systems where ML methods outperform dynamical systems. I think this figure in particular is likely to be of considerable significance because it is showing a prediction for events in the tails of distributions; I've often heard the view expressed that ML models are less able to make such predictions than dynamical models. I urge the authors to consider including this figure in the main body of the text.
Citation: https://doi.org/10.5194/egusphere-2024-2324-RC2
AC2: 'Reply on RC2', Simon Moulds, 13 Dec 2024
Summary
This is a consistently interesting and very well presented study where the authors have used a machine learning method (Quantile Regression Forests) to forecast the maximum daily streamflow occurring in a month at long lead times. The authors use a very large number of catchments (579) and stringent cross-validation to show that they can produce skillful forecasts in a majority of catchments in the first month. Skill declines in subsequent months, but a substantial minority of catchments is still skillful at long lead times. Importantly, the authors show they are able to outperform a credible dynamical forecasting system for these predictions. The analyses are appropriately rigorous, and strongly support the authors' conclusions. The discussion clearly outlines the significance of the work and limitations of their study. The paper is concise and enjoyable to read.
Accordingly, I recommend the study be published essentially as is, with minor technical corrections based on the comments below.
REPLY: Thank you for your positive evaluation of our manuscript, and for the helpful technical suggestions you have made.
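For readers unfamiliar with the method named in the summary, a quantile regression forest (Meinshausen, 2006) derives predictive quantiles from the training targets that share leaf nodes with the query point, pooled across trees. The sketch below is a minimal illustration of that idea, not the authors' implementation; the class and method names are hypothetical, and the simple pooling used here is a rough stand-in for Meinshausen's exact per-leaf weighting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class SimpleQRF:
    """Minimal quantile regression forest: predictive quantiles are
    taken from the training targets that fall in the same leaves as
    the query point, pooled across all trees in the forest."""

    def __init__(self, **kwargs):
        self.rf = RandomForestRegressor(**kwargs)

    def fit(self, X, y):
        self.rf.fit(X, y)
        self.y = np.asarray(y, dtype=float)
        # leaf index of every training sample in every tree
        self.train_leaves = self.rf.apply(X)   # (n_samples, n_trees)
        return self

    def predict_quantiles(self, X, q):
        leaves = self.rf.apply(X)              # (n_query, n_trees)
        out = np.empty((leaves.shape[0], len(q)))
        for i, row in enumerate(leaves):
            # pool the training targets sharing a leaf with the query
            pooled = np.concatenate([
                self.y[self.train_leaves[:, t] == leaf]
                for t, leaf in enumerate(row)])
            out[i] = np.quantile(pooled, q)
        return out
```

Pooling the leaf co-occupants rather than averaging tree-wise point predictions is what gives the method a full predictive distribution, which is what makes probabilistic scores such as the CRPS applicable.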
Minor comments/typos
L27-L30 "We show that multi-site ML models trained on pooled catchment data together with static catchment attributes are significantly more skilful compared to single-site ML models trained on data from each catchment individually." I think the authors should use the same phrasing for this result as they have used in the body of the paper - i.e. 'narrowly but significantly more skillful'. Figure S2 shows that the advantage of multi-site forecasts over single-site forecasts, while there, is generally slight. It is a sad truth that many who cite this paper will deprive themselves of the excellent contents and look only at the abstract, and the current phrasing is a little at odds with the body of the paper.
REPLY: Thank you for pointing this out. We will amend the abstract accordingly.
L92 "Hybrid methods are unconstrained by the need to conserve the water balance and implicitly handle biases in the climate data" No change here, but this is also true of conceptual models.
REPLY: Thank you for highlighting this.
L158-159 "estimate initial hydrologic condition predictability" I would put this as: "are proxies for hydrologic initial conditions"
REPLY: Thanks – your suggested wording is much clearer. We will add this to the revised manuscript.
L176-177 "to predict the monthly maximum of mean daily streamflow (Qmax) using" What does 'mean' imply here? Is it not simply the maximum daily streamflow?
REPLY: Yes, it is the maximum daily streamflow. The mean was meant to imply the “mean streamflow over the course of a day”, but we agree that the description is slightly convoluted, and will change it to “maximum daily streamflow” in the revised manuscript.
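The quantity being clarified here, the monthly maximum of the daily-mean streamflow, is straightforward to compute. A minimal pandas sketch with made-up sub-daily data (the variable names are illustrative, not from the paper):

```python
import numpy as np
import pandas as pd

# Hypothetical sub-daily streamflow series (m^3/s) at 15-minute resolution
idx = pd.date_range("2004-01-01", "2004-03-31 23:45", freq="15min")
flow = pd.Series(np.random.default_rng(0).gamma(2.0, 5.0, len(idx)), index=idx)

# Step 1: mean streamflow over the course of each day
daily_mean = flow.resample("D").mean()

# Step 2: Qmax = monthly maximum of the daily means
qmax = daily_mean.resample("MS").max()
```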
L213-218 An admirably stringent cross-validation scheme!
REPLY: Thank you for mentioning this.
L223-225 "We evaluated our forecasts against an observation-based ensemble climatological forecast consisting of the observed monthly streamflow values from the previous 20 years (e.g. Hauswirth et al., 2023)." I assume the climatology was an ensemble of observed Qmax (i.e. the same variable as is being forecast)? Please confirm.
REPLY: Yes, you are correct. We will clarify this in the revised manuscript.
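As confirmed above, the reference forecast is an ensemble of the observed Qmax values from the previous 20 years. A sketch of such a benchmark, scored with a sample-based CRPS estimator (the function names are illustrative, and the paper's exact CRPS estimator may differ):

```python
import numpy as np

def climatology_ensemble(qmax_by_year, target_year, window=20):
    """Ensemble climatological forecast: the observed Qmax for the same
    calendar month in each of the previous `window` years.
    qmax_by_year maps year -> observed monthly-max daily streamflow."""
    years = [y for y in range(target_year - window, target_year)
             if y in qmax_by_year]
    return np.array([qmax_by_year[y] for y in years])

def crps_ensemble(members, obs):
    """Sample-based CRPS estimator for an ensemble X and observation y:
    E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# Hypothetical usage: skill score relative to climatology
# crpss = 1.0 - crps_ensemble(fcst, obs) / crps_ensemble(clim, obs)
```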
L232 "reliability index (RI)" please provide more details on this index or a reference.
REPLY: We will provide additional details on the reliability index in the revised manuscript.
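Pending the additional detail promised above, one widely used formulation of a reliability index compares the empirical distribution of the forecasts' probability integral transform (PIT) values with the uniform distribution (e.g. Renard et al., 2010). The sketch below uses that formulation as an assumption; the paper's definition of RI may differ.

```python
import numpy as np

def reliability_index(pit_values):
    """Reliability index: 1 minus twice the mean absolute deviation of
    the sorted PIT values from the corresponding uniform quantiles.
    RI = 1 for a perfectly reliable forecast, approaching 0 for a
    maximally unreliable one. (One common formulation, assumed here.)"""
    pit = np.sort(np.asarray(pit_values, dtype=float))
    n = pit.size
    uniform_q = (np.arange(1, n + 1) - 0.5) / n
    return 1.0 - 2.0 * np.mean(np.abs(pit - uniform_q))
```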
L302-303 "We bias corrected the EFAS outputs using a quantile mapping approach" Did the QM use parametric or empirical distributions to describe the CDFs? Because Qmax will tend to fall in the tails, fitting an appropriate parametric distribution as part of the QM could matter.
REPLY: Thanks for highlighting this. We agree that the choice of distribution would be important in this instance. However, we used an empirical CDF to perform quantile mapping. It would be interesting to evaluate the impact of different bias correction strategies on this result.
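For concreteness, empirical quantile mapping of the kind described in this reply can be sketched as follows. This is an illustrative implementation, not the one used for EFAS; interpolation and tail-handling choices will differ, and as the referee notes, the empirical approach simply clips values outside the calibration range rather than extrapolating into the tails.

```python
import numpy as np

def empirical_quantile_map(fcst, fcst_ref, obs_ref):
    """Empirical quantile mapping: locate each forecast value's quantile
    in the reference forecast distribution, then map it to the same
    quantile of the observed distribution. Linear interpolation between
    empirical quantiles; out-of-range values are clipped by np.interp."""
    fcst = np.asarray(fcst, dtype=float)
    fcst_sorted = np.sort(np.asarray(fcst_ref, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_ref, dtype=float))
    n = fcst_sorted.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    # quantile of each forecast value in the reference forecast ECDF
    p = np.interp(fcst, fcst_sorted, probs)
    # map to the same quantile of the observed ECDF
    m = obs_sorted.size
    obs_probs = (np.arange(1, m + 1) - 0.5) / m
    return np.interp(p, obs_probs, obs_sorted)
```

For a constant additive bias the mapping recovers the shift exactly within the calibration range, which is why the choice of distribution mainly matters in the tails where Qmax tends to fall.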
L305 "Figure S1" - Personally I think this is easily interesting enough to include in the main body of the text, and adds to the growing body of forecasting systems where ML methods outperform dynamical systems. I think this figure in particular is likely to be of considerable significance because it is showing a prediction for events in the tails of distributions; I've often heard the view expressed that ML models are less able to make such predictions than dynamical models. I urge the authors to consider including this figure in the main body of the text.
REPLY: Thank you for this suggestion – we were hesitant to include this figure in the main body of the manuscript because EFAS is not necessarily optimised to predict monthly maximum daily streamflow. However, at your encouragement we will add it to the main body in the revised manuscript.
Citation: https://doi.org/10.5194/egusphere-2024-2324-AC2
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 175
- PDF: 0
- XML: 0
- Total: 175
- BibTeX: 0
- EndNote: 0