the Creative Commons Attribution 4.0 License.
A GNN Routing Module Is All You Need for LSTM Rainfall–Runoff Models
Abstract. Rainfall-Runoff (R-R) modeling is crucial for hydrological forecasting and water resource management, yet traditional deep learning approaches, such as Long Short-Term Memory (LSTM) networks, often overlook explicit runoff routing, leading to inaccuracies in complex river basins. This study introduces a novel LSTM-Graph Neural Network (GNN) framework that integrates LSTM for local runoff generation with GNN for spatial flow routing, leveraging river network topology as a directed graph. Applied to the Upper Danube River Basin using the LamaH-CE dataset (1987–2017), the model partitions the basin into 530 subbasins and evaluates four GNN architectures: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Graph SAmple and aggreGatE (GraphSAGE), and Chebyshev Spectral Graph Convolutional Network (ChebNet). Results demonstrate that all LSTM-GNN architectures outperform the baseline LSTM, with LSTM-GAT achieving the highest performance (mean NSE=0.61, KGE=0.65, Correlation Coefficient=0.84, RMSE reduction of ~35 %). Improvements are most evident in downstream stations with high connectivity and large contributing areas, where adaptive attention in GAT effectively captures heterogeneous upstream influences. These findings underscore the potential of GNN-based approaches for large-scale, spatially aware hydrological modelling and provide a foundation for future applications in flood forecasting and climate adaptation.
Status: open (until 12 Dec 2025)
- CC1: 'Comment on egusphere-2025-5008', Zilin Li, 29 Oct 2025
- AC1: 'Reply on CC1', Hamidreza Mosaffa, 09 Nov 2025
Thank you for your thoughtful questions and for reviewing the manuscript. We provide detailed responses below:
1) In our study, the travel time between subbasins is estimated using the Kirpich formula, which provides a physically-based approximation of the time of concentration as a function of channel length and slope. This approach is widely used in large-sample hydrology and rainfall–runoff routing studies where detailed hydraulic measurements are unavailable. It is also important to note that in the LSTM–GAT configuration, the Kirpich-derived travel-time weights serve as the initial physical prior, while the GAT attention mechanism further adapts the influence strength among upstream subbasins during training. To evaluate the impact of different graph constructions, we conducted a comprehensive ablation study comparing four edge-weighting schemes: (1) binary connectivity, (2) distance-based weights, (3) travel-time weights, and (4) inverse travel-time weights (1/tc). Among these, the inverse travel-time weighting consistently produced the best performance. Therefore, we chose the inverse travel-time weighting as the edge representation in our model.
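For readers unfamiliar with the Kirpich formula, the travel-time prior and the inverse travel-time edge weighting described above can be sketched as follows. This is an illustrative sketch, not the authors' code; the function names, units, and the epsilon guard are our assumptions.

```python
def kirpich_tc_minutes(channel_length_m: float, slope_m_per_m: float) -> float:
    """Kirpich time of concentration (minutes) from channel length (m)
    and mean channel slope (m/m): tc = 0.0195 * L^0.77 * S^-0.385."""
    return 0.0195 * channel_length_m ** 0.77 * slope_m_per_m ** -0.385


def inverse_travel_time_weight(tc_minutes: float, eps: float = 1e-6) -> float:
    """Edge weight 1/tc: shorter travel time -> stronger upstream influence.
    In the LSTM-GAT setup these weights serve only as the initial physical
    prior; the attention mechanism adapts influence strengths during training."""
    return 1.0 / (tc_minutes + eps)
```

Longer channels raise the travel time and lower the edge weight, while steeper slopes do the opposite, which matches the intuition that hydraulically "closer" upstream subbasins exert stronger influence.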
2) For extreme-flow oversampling, we employed a straightforward frequency adjustment strategy rather than synthetic augmentation. Specifically, we identified the top 2.5% highest-discharge sequences and duplicated each sequence four times in the training set, increasing their representation from approximately 2.5% to ~10% of the training data. This approach was chosen to (1) address the severe class imbalance inherent in flood prediction, where extreme events are rare but hydrologically critical; (2) maintain the physical consistency of extreme flow episodes without introducing artificial noise; and (3) preserve the observed spatiotemporal structure of flood events. Results on the test period show that this strategy improved the model’s ability to capture peak magnitude and timing, reducing the common tendency of LSTM-based models to underestimate high flows. No noise-based or synthetic alteration is applied—the high-flow sequences are duplicated exactly to maintain physical realism.
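The duplication strategy above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code; the function name and the peak-discharge ranking criterion are ours.

```python
import numpy as np

def oversample_extremes(seq_indices, peak_discharge, top_frac=0.025, n_total=4):
    """Duplicate the top `top_frac` highest-peak-discharge sequences so that
    each appears `n_total` times in the training set (~2.5% -> ~10%).
    Copies are exact: no noise or synthetic alteration is introduced."""
    peak = np.asarray(peak_discharge, dtype=float)
    threshold = np.quantile(peak, 1.0 - top_frac)
    extremes = [i for i in seq_indices if peak[i] >= threshold]
    # append n_total - 1 extra exact copies of each extreme sequence
    return list(seq_indices) + extremes * (n_total - 1)
```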
3) Thank you for your question regarding the training objective and the balance between routine flow and extreme events. Our approach combines two complementary strategies: 1. Station-Weighted MSE Loss: The model is trained end-to-end using a mean-squared-error loss with station-specific weighting (L_total = (1/N) Σ_i w_i × MSE(ŷ_i, y_i)). The station weights are computed from discharge variability (variance-based) and normalized to mean 1, with clipping to the range 0.2–5.0 to prevent domination by any single basin. This ensures that stations with different discharge magnitudes contribute comparably to the objective, preventing the loss from being dominated by large rivers. 2. Data-Level Balancing of Extreme Events (No Synthetic Augmentation): Rather than modifying the loss, we address the rarity of extreme floods at the dataset level. We duplicate the top ~2.5% highest-discharge sequences four times in the training set, increasing their representation to ~10%. This increases the frequency with which large-error peak events influence gradients, allowing the weighted MSE to penalize flood misestimation more strongly. No synthetic noise or altered hydrographs are introduced; duplicated sequences are identical, preserving physical realism.
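The station-weighted loss can be sketched as follows. This is an illustration only: the reply does not specify the exact variance-to-weight mapping, so the proportional-to-variance choice below is our assumption.

```python
import numpy as np

def station_weights(discharge_by_station, lo=0.2, hi=5.0):
    """Variance-based station weights, normalized to mean 1 and clipped to
    [lo, hi] so no single basin dominates the objective."""
    var = np.array([np.var(q) for q in discharge_by_station], dtype=float)
    w = var / var.mean()  # assumed proportional-to-variance mapping, mean 1
    return np.clip(w, lo, hi)

def weighted_mse(y_hat, y, w):
    """L_total = (1/N) * sum_i w_i * MSE(y_hat_i, y_i) over N stations."""
    err = np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2, axis=-1)
    return float(np.mean(w * err))
```

With identical discharge records every station receives weight 1 and the loss reduces to plain MSE; the clipping bounds (0.2–5.0) are the ones stated in the reply.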
4) In our case, the reported mean daily NSE of ~0.60 is the average performance across 530 subbasins that span a very large and hydrologically diverse watershed. This is a fundamentally different evaluation setting from many previous GNN or DL rainfall–runoff studies, which typically report NSE for a small number of selected gauged catchments. In such large-sample settings, an average NSE of ~0.60 is generally considered good skill, particularly when rivers vary widely in size and climatic regime. Our goal in this work is not to present a finalized operational forecasting model, but rather to demonstrate a proof of concept: that explicitly modeling runoff routing using a GNN improves performance over an LSTM-based baseline. Relative to the baseline, the proposed model shows consistent and meaningful improvements. This indicates that graph-based routing contributes useful hydrological structure beyond what the LSTM alone can represent.
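For reference, the skill score discussed above is the Nash–Sutcliffe efficiency, NSE = 1 − Σ(Q_sim − Q_obs)² / Σ(Q_obs − mean(Q_obs))². A minimal implementation (ours, for illustration only):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model is
    no better than predicting the observed mean, and < 0 is worse than that."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

The reported ~0.60 is the mean of this per-gauge score over all 530 subbasins, not the score of a single selected catchment.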
Citation: https://doi.org/10.5194/egusphere-2025-5008-AC1
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 257 | 32 | 9 | 298 | 7 | 5 |
Hi, thanks for sharing this work — it's very interesting and I appreciate that the manuscript is public. I had a few questions / comments as a reader:
1. The graph/routing part isn’t fully clear. How exactly is the “travel time” between subbasins computed, and why is that formula appropriate at this basin scale? Also, was this edge weighting compared against something simpler (e.g. uniform weights or distance-based)?
2. Related: for the extreme-flow oversampling, are high-flow cases just duplicated, or are they augmented in some way? And does this cause bias (e.g. systematic overprediction at high flows), or is it actually improving peak prediction?
3. The training objective is not very transparent. It looks like the model is trained with a standard regression loss, but it’s not clear how the authors make the model care about both “normal” daily flow and rare extremes. Is there any special loss term, weighting, or multi-objective setup to balance routine behavior vs flood peaks? From the example plots, peak magnitude and timing are still not consistently captured, so it would be good to clarify what the model is actually being optimized to do.
4. On performance: the average daily NSE is around 0.6. Is that considered good enough for the intended application (flood forecasting, water management, ungauged prediction, etc.)? It would help if the paper discussed the practical usefulness of that skill level, not just the improvement over the baseline.