the Creative Commons Attribution 4.0 License.
A GNN Routing Module Is All You Need for LSTM Rainfall–Runoff Models
Abstract. Rainfall-Runoff (R-R) modeling is crucial for hydrological forecasting and water resource management, yet traditional deep learning approaches, such as Long Short-Term Memory (LSTM) networks, often overlook explicit runoff routing, leading to inaccuracies in complex river basins. This study introduces a novel LSTM-Graph Neural Network (GNN) framework that integrates LSTM for local runoff generation with GNN for spatial flow routing, leveraging river network topology as a directed graph. Applied to the Upper Danube River Basin using the LamaH-CE dataset (1987–2017), the model partitions the basin into 530 subbasins and evaluates four GNN architectures: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Graph SAmple and aggreGatE (GraphSAGE), and Chebyshev Spectral Graph Convolutional Network (ChebNet). Results demonstrate that all LSTM-GNN architectures outperform the baseline LSTM, with LSTM-GAT achieving the highest performance (mean NSE=0.61, KGE=0.65, Correlation Coefficient=0.84, RMSE reduction of ~35 %). Improvements are most evident in downstream stations with high connectivity and large contributing areas, where adaptive attention in GAT effectively captures heterogeneous upstream influences. These findings underscore the potential of GNN-based approaches for large-scale, spatially aware hydrological modelling and provide a foundation for future applications in flood forecasting and climate adaptation.
Status: open (until 12 Dec 2025)
- CC1: 'Comment on egusphere-2025-5008', Zilin Li, 29 Oct 2025
- AC1: 'Reply on CC1', Hamidreza Mosaffa, 09 Nov 2025
Thank you for your thoughtful questions and for reviewing the manuscript. We provide detailed responses below:
1) In our study, the travel time between subbasins is estimated using the Kirpich formula, which provides a physically-based approximation of the time of concentration as a function of channel length and slope. This approach is widely used in large-sample hydrology and rainfall–runoff routing studies where detailed hydraulic measurements are unavailable. It is also important to note that in the LSTM–GAT configuration, the Kirpich-derived travel-time weights serve as the initial physical prior, while the GAT attention mechanism further adapts the influence strength among upstream subbasins during training. To evaluate the impact of different graph constructions, we conducted a comprehensive ablation study comparing four edge-weighting schemes: (1) binary connectivity, (2) distance-based weights, (3) travel-time weights, and (4) inverse travel-time weights (1/tc). Among these, the inverse travel-time weighting consistently produced the best performance. Therefore, we chose the inverse travel-time weighting as the edge representation in our model.
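For readers unfamiliar with the Kirpich formula, the travel-time prior and the inverse travel-time edge weighting described above can be sketched as follows. This is an illustrative sketch, not the authors' code; the function names, units, and the epsilon guard are our assumptions.

```python
def kirpich_tc_minutes(channel_length_m: float, slope_m_per_m: float) -> float:
    """Kirpich time of concentration (minutes) from channel length (m)
    and mean channel slope (m/m): tc = 0.0195 * L^0.77 * S^-0.385."""
    return 0.0195 * channel_length_m ** 0.77 * slope_m_per_m ** -0.385


def inverse_travel_time_weight(tc_minutes: float, eps: float = 1e-6) -> float:
    """Edge weight 1/tc: shorter travel time -> stronger upstream influence.
    In the LSTM-GAT setup these weights serve only as the initial physical
    prior; the attention mechanism adapts influence strengths during training."""
    return 1.0 / (tc_minutes + eps)
```

Longer channels raise the travel time and lower the edge weight, while steeper slopes do the opposite, which matches the intuition that hydraulically "closer" upstream subbasins exert stronger influence.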
2) For extreme-flow oversampling, we employed a straightforward frequency adjustment strategy rather than synthetic augmentation. Specifically, we identified the top 2.5% highest-discharge sequences and duplicated each sequence four times in the training set, increasing their representation from approximately 2.5% to ~10% of the training data. This approach was chosen to (1) address the severe class imbalance inherent in flood prediction, where extreme events are rare but hydrologically critical; (2) maintain the physical consistency of extreme flow episodes without introducing artificial noise; and (3) preserve the observed spatiotemporal structure of flood events. Results on the test period show that this strategy improved the model’s ability to capture peak magnitude and timing, reducing the common tendency of LSTM-based models to underestimate high flows. No noise-based or synthetic alteration is applied—the high-flow sequences are duplicated exactly to maintain physical realism.
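The duplication strategy above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code; the function name and the peak-discharge ranking criterion are ours.

```python
import numpy as np

def oversample_extremes(seq_indices, peak_discharge, top_frac=0.025, n_total=4):
    """Duplicate the top `top_frac` highest-peak-discharge sequences so that
    each appears `n_total` times in the training set (~2.5% -> ~10%).
    Copies are exact: no noise or synthetic alteration is introduced."""
    peak = np.asarray(peak_discharge, dtype=float)
    threshold = np.quantile(peak, 1.0 - top_frac)
    extremes = [i for i in seq_indices if peak[i] >= threshold]
    # append n_total - 1 extra exact copies of each extreme sequence
    return list(seq_indices) + extremes * (n_total - 1)
```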
3) Thank you for your question regarding the training objective and the balance between routine flow and extreme events. Our approach combines two complementary strategies: 1. Station-Weighted MSE Loss: The model is trained end-to-end using a mean-squared-error loss with station-specific weighting (L_total = (1/N) Σ_i w_i × MSE(ŷ_i, y_i)). The station weights are computed from discharge variability (variance-based) and normalized to mean 1, with clipping to the range 0.2–5.0 to prevent domination by any single basin. This ensures that stations with different discharge magnitudes contribute comparably to the objective, preventing the loss from being dominated by large rivers. 2. Data-Level Balancing of Extreme Events (No Synthetic Augmentation): Rather than modifying the loss, we address the rarity of extreme floods at the dataset level. We duplicate the top ~2.5% highest-discharge sequences four times in the training set, increasing their representation to ~10%. This increases the frequency with which large-error peak events influence gradients, allowing the weighted MSE to penalize flood misestimation more strongly. No synthetic noise or altered hydrographs are introduced; duplicated sequences are identical, preserving physical realism.
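The station-weighted loss can be sketched as follows. This is an illustration only: the reply does not specify the exact variance-to-weight mapping, so the proportional-to-variance choice below is our assumption.

```python
import numpy as np

def station_weights(discharge_by_station, lo=0.2, hi=5.0):
    """Variance-based station weights, normalized to mean 1 and clipped to
    [lo, hi] so no single basin dominates the objective."""
    var = np.array([np.var(q) for q in discharge_by_station], dtype=float)
    w = var / var.mean()  # assumed proportional-to-variance mapping, mean 1
    return np.clip(w, lo, hi)

def weighted_mse(y_hat, y, w):
    """L_total = (1/N) * sum_i w_i * MSE(y_hat_i, y_i) over N stations."""
    err = np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2, axis=-1)
    return float(np.mean(w * err))
```

With identical discharge records every station receives weight 1 and the loss reduces to plain MSE; the clipping bounds (0.2–5.0) are the ones stated in the reply.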
4) In our case, the reported mean daily NSE of ~0.60 is the average performance across 530 subbasins that span a very large and hydrologically diverse watershed. This is a fundamentally different evaluation setting from many previous GNN or DL rainfall–runoff studies, which typically report NSE for a small number of selected gauged catchments. In such large-sample settings, an average NSE of ~0.60 is generally considered good skill, particularly when rivers vary widely in size and climatic regime. Our goal in this work is not to present a finalized operational forecasting model, but rather to demonstrate a proof of concept: that explicitly modeling runoff routing using a GNN improves performance over an LSTM-based baseline. Relative to the baseline, the proposed model shows consistent and meaningful improvements. This indicates that graph-based routing contributes useful hydrological structure beyond what the LSTM alone can represent.
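For reference, the skill score discussed above is the Nash–Sutcliffe efficiency, NSE = 1 − Σ(Q_sim − Q_obs)² / Σ(Q_obs − mean(Q_obs))². A minimal implementation (ours, for illustration only):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model is
    no better than predicting the observed mean, and < 0 is worse than that."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

The reported ~0.60 is the mean of this per-gauge score over all 530 subbasins, not the score of a single selected catchment.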
Citation: https://doi.org/10.5194/egusphere-2025-5008-AC1
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 257 | 32 | 9 | 298 | 7 | 5 |
Hi, thanks for sharing this work — it's very interesting and I appreciate that the manuscript is public. I had a few questions / comments as a reader:
1. The graph/routing part isn’t fully clear. How exactly is the “travel time” between subbasins computed, and why is that formula appropriate at this basin scale? Also, was this edge weighting compared against something simpler (e.g. uniform weights or distance-based)?
2. Related: for the extreme-flow oversampling, are high-flow cases just duplicated, or are they augmented in some way? And does this cause bias (e.g. systematic overprediction at high flows), or is it actually improving peak prediction?
3. The training objective is not very transparent. It looks like the model is trained with a standard regression loss, but it’s not clear how the authors make the model care about both “normal” daily flow and rare extremes. Is there any special loss term, weighting, or multi-objective setup to balance routine behavior vs flood peaks? From the example plots, peak magnitude and timing are still not consistently captured, so it would be good to clarify what the model is actually being optimized to do.
4. On performance: the average daily NSE is around 0.6. Is that considered good enough for the intended application (flood forecasting, water management, ungauged prediction, etc.)? It would help if the paper discussed the practical usefulness of that skill level, not just the improvement over the baseline.