the Creative Commons Attribution 4.0 License.
Enhancing hydrological hazard early warning: A 60-day streamflow forecasting framework integrating deep learning and process-based modeling
Abstract. Reliable medium- and long-term streamflow forecasting is a cornerstone of hydrological hazard early warning and water resources management, yet achieving accurate predictions with sufficient lead time remains a formidable challenge. This study proposes a 60-day streamflow forecasting framework to strengthen early warning capabilities by systematically integrating a convolutional neural network (CNN) for bias correction of precipitation forecasts from the UK Met Office (UKMO) numerical weather prediction model, the Geomorphology-Based Eco-Hydrological Model (GBEHM) for streamflow simulation, and an autoregressive with exogenous input (ARX) model for statistical post-processing. When the framework is applied to the Upper Yangtze River Basin, the CNN reduces the areal-averaged precipitation root mean square error (RMSE) by around 35 % and raises the temporal correlation coefficient (TCC) from 0.62 to 0.74 relative to raw UKMO forecasts across the 60-day horizon, with gains growing at longer lead times. When the GBEHM is driven with the corrected precipitation and ARX post-processing is applied, the streamflow forecasts improve substantially in terms of 60-day mean performance: RMSE falls by 36 %, relative error (RE) decreases from 48.2 % to 17.4 %, and Nash–Sutcliffe efficiency (NSE) rises from 0.33 to 0.72 compared with forecasts driven by raw precipitation forecasts. Error decomposition identifies precipitation forecast errors, which intensify with lead time, as the dominant source of uncertainty for medium- and long-term streamflow forecasting. It also confirms that hydrological model uncertainty remains a significant component, which highlights that selecting a robust hydrological model is crucial for the reliability and predictive skill of the streamflow forecasts.
By systematically leveraging the CNN to mitigate drifting meteorological biases, the GBEHM to capture physical catchment dynamics, and the ARX to minimize residual errors, the proposed framework extends the effective early warning horizon to 60 days with high volumetric accuracy and temporal consistency, providing vital decision support for flood and drought risk management and regional water security.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-393', Ningpeng Dong, 08 Mar 2026
  - AC1: 'Reply on RC1', Zhijie Liu, 31 Mar 2026
- RC2: 'Comment on egusphere-2026-393', Samantha Hartke, 17 Mar 2026
This manuscript demonstrates a method for generating probabilistic 60-day streamflow forecasts in the Upper Yangtze River Basin. Using convolutional neural networks (CNNs), the authors demonstrate a considerable improvement in forecast performance, as measured by RMSE, Nash-Sutcliffe efficiency, and bias. Since precipitation error is often highest at long lead times, performance gains were greatest at long lead times, where the new methodology could quantify and compensate for forecast precipitation uncertainty. This manuscript represents a novel contribution to the literature on the role of CNN methods in streamflow forecasting, particularly at lead times over 30 days.
Given the elevation and complex topography of the study area, it would be useful to have a visualization of the gage density of the CGDPA observational product in this region. Are the authors concerned about the accuracy of the CGDPA product in gage-sparse regions of the basin? Additionally, the mismatch between the GBEHM resolution (8 km) and the meteorological forcing (25 km) may be one source of error in the streamflow forecasts.
The precipitation bias correction is clearly impactful; this manuscript would benefit from a clearer explanation of which relationships the CNN captures better than traditional statistical models.
The following step from Line 230 is unclear and could benefit from further explanation or a figure: “the model generates a deterministic forecast by constructing a large-scale pseudo-ensemble from the predicted CSG distribution at equal quantiles and calculating the ensemble mean.”
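One possible reading of the quoted step, offered only as an illustrative sketch and not as the authors' implementation: sample the predicted censored, shifted distribution at equally spaced probability levels, treat those quantiles as ensemble members, and average them. The function names, the mid-point levels (i - 0.5)/n, and all parameter values below are assumptions. To keep the example dependency-free it uses the shape = 1 special case of the gamma distribution (an exponential), whose quantile function has a closed form; a general CSG would need a gamma quantile routine such as `scipy.stats.gamma.ppf`.

```python
import math
from typing import Callable, List, Tuple


def equal_quantile_levels(n: int) -> List[float]:
    """Mid-point probability levels (i - 0.5) / n for i = 1..n (an assumed choice)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def censored_shifted_quantile(
    base_quantile: Callable[[float], float], shift: float
) -> Callable[[float], float]:
    """Quantile function of max(0, X - shift), given the quantile function of X."""
    return lambda p: max(0.0, base_quantile(p) - shift)


def exponential_quantile(scale: float) -> Callable[[float], float]:
    """Closed-form quantile of a gamma distribution with shape = 1 (exponential)."""
    return lambda p: -scale * math.log(1.0 - p)


def pseudo_ensemble_mean(
    quantile: Callable[[float], float], n_members: int = 1000
) -> Tuple[float, List[float]]:
    """Build a pseudo-ensemble from equal quantiles and return (mean, members)."""
    members = [quantile(p) for p in equal_quantile_levels(n_members)]
    return sum(members) / n_members, members


# Hypothetical distribution parameters (mm/day), chosen only for illustration.
scale_mm, shift_mm = 5.0, 2.0
csg_quantile = censored_shifted_quantile(exponential_quantile(scale_mm), shift_mm)
mean_mm, members = pseudo_ensemble_mean(csg_quantile, n_members=1000)

# For this special case the exact mean is scale * exp(-shift / scale).
exact_mean_mm = scale_mm * math.exp(-shift_mm / scale_mm)
print(f"pseudo-ensemble mean = {mean_mm:.3f} mm, exact mean = {exact_mean_mm:.3f} mm")
```

In this sketch the lower members are exactly zero because of the censoring, so the ensemble mean reflects both the probability of no rain and the wet-day amounts.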
Citation: https://doi.org/10.5194/egusphere-2026-393-RC2
- AC2: 'Reply on RC2', Zhijie Liu, 31 Mar 2026
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 215 | 117 | 24 | 356 | 44 | 56 | 54 |
Greetings! This study proposes a 60-day streamflow forecasting framework for the Upper Yangtze River Basin that integrates a convolutional neural network for precipitation bias correction, a hydrological model for runoff simulation, and an autoregressive model for error post-processing. The paper is generally well-written, and below are some of my comments that I hope will help improve the paper.
My first concern is the temporal partitioning of the CNN training data. The paper states that the CNN is trained on 20 years of data using cross-validation, but the specific years assigned to each fold should also be given. Since the final streamflow evaluation covers the period from 2009 to 2012, there is a risk of data leakage if any of those years were included in the training set. It is also unclear why the calibration periods for the GBEHM and ARX components differ from one another and from the CNN training period.
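To make the leakage concern concrete, here is a minimal sketch of year-blocked cross-validation with an explicit leakage check. It does not describe the authors' actual folds: the 1993-2012 span, the five contiguous folds, and the function names are assumptions made only for illustration; the 2009-2012 evaluation period is the one stated in the manuscript.

```python
from typing import List, Sequence, Set, Tuple


def year_block_folds(
    years: Sequence[int], n_folds: int
) -> List[Tuple[List[int], List[int]]]:
    """Split years into contiguous blocks; return (train_years, held_out_years) per fold."""
    ordered = sorted(years)
    base, extra = divmod(len(ordered), n_folds)
    folds = []
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        held_out = ordered[start:start + size]
        train = ordered[:start] + ordered[start + size:]
        folds.append((train, held_out))
        start += size
    return folds


def leaked_years(train_years: Sequence[int], target_years: Sequence[int]) -> Set[int]:
    """Years that appear both in the training set and in the target (forecast) set."""
    return set(train_years) & set(target_years)


# Hypothetical 20-year record; the manuscript's actual years are not assumed here.
all_years = list(range(1993, 2013))
evaluation_years = [2009, 2010, 2011, 2012]

folds = year_block_folds(all_years, n_folds=5)
for train, held_out in folds:
    # Forecasts for the held-out block come from a model that never saw those years.
    assert not leaked_years(train, held_out)

for train, held_out in folds:
    leak = sorted(leaked_years(train, evaluation_years))
    print(f"held out {held_out[0]}-{held_out[-1]}: evaluation years in training = {leak}")
```

Under these assumptions, only the fold that holds out 2009-2012 yields a CNN that is free of evaluation-period data, so corrected forecasts for the evaluation period would need to come from that fold's model.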
Second, the CNN generates probabilistic precipitation forecasts via the CSG distribution, which is a well-motivated design choice. However, this statistical information appears to be discarded before the forecasts are fed to the hydrological model. Is this intentional, and if so, why? For a system intended for hazard early warning, I think propagating uncertainty information through the modelling chain would greatly enhance its reliability.
Third, more evaluation metrics targeting hazards (i.e., the topic of this paper) could be introduced. NSE and MSE are often dominated by baseflow conditions and do not necessarily reflect model performance during extreme events. I recommend that the authors select representative flood events from the 2009-2012 evaluation period and present event-scale forecast performance, such as peak flow errors, to demonstrate the model's ability for hazard warning.
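As a rough illustration of what such event-scale metrics might look like, the sketch below computes a relative peak flow error and a peak timing error for one event window, alongside NSE for comparison. The function names and the flow values are hypothetical and synthetic; they are not taken from the manuscript.

```python
from typing import List


def nse(obs: List[float], sim: List[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def peak_flow_relative_error(obs: List[float], sim: List[float]) -> float:
    """(simulated peak - observed peak) / observed peak; negative means underestimation."""
    return (max(sim) - max(obs)) / max(obs)


def peak_timing_error(obs: List[float], sim: List[float]) -> int:
    """Simulated peak index minus observed peak index, in time steps (positive = late)."""
    return sim.index(max(sim)) - obs.index(max(obs))


# Synthetic daily flows (m^3/s) for a single flood event window, for illustration only.
observed = [1000.0, 1500.0, 4000.0, 2500.0, 1200.0]
forecast = [1100.0, 1400.0, 2600.0, 3400.0, 1300.0]

print(f"NSE over event window:    {nse(observed, forecast):.2f}")
print(f"Relative peak flow error: {peak_flow_relative_error(observed, forecast):+.0%}")
print(f"Peak timing error (days): {peak_timing_error(observed, forecast):+d}")
```

Reporting the peak magnitude and timing errors separately shows whether a forecast misses a flood peak in size, in timing, or both, which a single aggregate score does not reveal.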
Finally, the paper would be much stronger if it provided more technical detail on the GBEHM calibration and its operational feasibility. There is currently little information on which specific parameters were tuned. The study states “In this application, the UYRB is discretized into an 8 km × 8 km grid system and further delineated into 479 sub-basins based on the DEM”, which I find confusing. Is the model grid-based or sub-basin-based? More details should be provided. The manuscript would also benefit from a brief discussion of operational feasibility: for example, whether the modelling system can be applied operationally and, if so, what the challenges and possible solutions are.
Best,
Ningpeng Dong