the Creative Commons Attribution 4.0 License.
End-to-End Graph Neural Networks for Real-Time Hydraulic Prediction in Stormwater Systems
Abstract. Urban stormwater systems (SWS) play a critical role in protecting communities from pluvial flooding, ensuring public safety, and supporting resilient infrastructure planning. As climate variability intensifies and urbanization accelerates, there is a growing need for timely and accurate hydraulic predictions to support real-time control and flood mitigation strategies. While physics-based models such as SWMM provide detailed simulations of rainfall-runoff and flow routing processes, their computational demands often limit their feasibility for real-time applications. Surrogate models based on machine learning offer faster alternatives, but most rely on fully connected or grid-based architectures that struggle to capture the irregular spatial structure of drainage networks, often requiring precomputed runoff inputs and focusing only on node-level predictions. To address these limitations, we present GNN-SWS, a novel end-to-end graph neural network (GNN) surrogate model that emulates rainfall-driven hydraulic behavior across stormwater systems. The model predicts hydraulic states at both junctions and conduits directly from rainfall inputs, capturing the coupled dynamics of runoff generation and flow routing. It incorporates a spatiotemporal encoder–processor–decoder architecture with tailored message passing, autoregressive forecasting, and physics-guided constraints to improve predictive accuracy and physical consistency. Additionally, a training strategy based on the pushforward trick enhances model stability over extended prediction horizons. Applied to a real-world urban watershed, GNN-SWS demonstrates strong potential as a fast, scalable, and data-efficient alternative to traditional solvers. This framework supports key applications in urban flood risk assessment, real-time stormwater control, and the optimization of resilient infrastructure systems.
Status: open (until 28 Oct 2025)
CC1: 'Comment on egusphere-2025-3655', Riccardo Taormina, 02 Sep 2025
AC1: 'Reply on CC1', Zanko Zandsalimi, 08 Sep 2025
Thank you for your feedback and comments. We have addressed each point below.
On Points 1 & 2
There are several advantages to having an end-to-end surrogate. First, it eliminates dependence on mechanistic sub-models. In contrast, a hybrid design requires the SWMM codebase to remain in the loop, which complicates deployment across different computational environments (limiting portability) and restricts the ability to adapt the model for advanced applications (limiting extensibility). Specifically, a hybrid model treats the SWMM component as a non-differentiable 'black box,' preventing seamless integration with gradient-based control and optimization frameworks. Removing this dependency therefore paves the way for greater generalizability. A natural next step for this research is to extend applicability beyond settings where SWMM is readily available and well-calibrated.
On Point 3
Our approach to physics-guided learning is distinct because our model is end-to-end. Prior works (Garzón et al., 2024; Zhang et al., 2024) rely on pre-computed runoff as a key input, which makes their water-balance constraints explicitly dependent on this external runoff. Since our model learns the entire rainfall-to-hydraulic response without this input, such constraints are not applicable. Therefore, while we also penalize physically implausible states such as negative depths, a constraint similar in principle to the post-processing step in Garzón et al. (2024), we introduce an additional, more sophisticated differential consistency loss. This novel constraint enforces the physical principle of hydraulic gradients by ensuring that the predicted differences in depth and inflow between connected nodes are physically consistent, a core tenet of flow dynamics in any pipeline system.
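As an illustration only, the two kinds of loss terms discussed in this point could be sketched as follows. The function names, the three-node chain, the depth values, and in particular the exact form of the differential consistency term (a mean squared mismatch between predicted and reference node-to-node differences) are assumptions made for this sketch. The reply does not state the actual formulation used in the paper.

```python
def negative_depth_penalty(depths):
    """Mean squared magnitude of negative predicted depths (0 if all >= 0)."""
    return sum(min(d, 0.0) ** 2 for d in depths) / len(depths)


def differential_consistency_loss(pred, target, edges):
    """Mean squared mismatch between predicted and reference differences
    across each connected node pair (i, j). Assumed form, for illustration."""
    total = 0.0
    for i, j in edges:
        total += ((pred[i] - pred[j]) - (target[i] - target[j])) ** 2
    return total / len(edges)


# Hypothetical three-node chain 0 -> 1 -> 2 with depths in metres.
edges = [(0, 1), (1, 2)]
target = [1.0, 0.8, 0.5]   # reference depths (e.g. from a simulator)
pred = [1.1, 0.7, -0.1]    # predicted depths, one of them negative

penalty = negative_depth_penalty(pred)
consistency = differential_consistency_loss(pred, target, edges)
print(f"negative-depth penalty: {penalty:.4f}")
print(f"differential consistency: {consistency:.4f}")
```

Note that a penalty of this kind only discourages negative depths during training. It does not rule them out at inference time, which is what a hard clamp such as `max(d, 0.0)` would do.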
On Points 4 & 5
We would like to clarify that the "pushforward trick" operates on a different principle than a conventional curriculum learning strategy. A curriculum approach typically tries to solve the long-term stability problem by gradually increasing the simulation length during training, a method referred to as unrolled training. This teaches the model to perfect an entire trajectory over an expanding time horizon by minimizing the final accumulated error.
Our pushforward method, however, is an adversarial technique that directly targets the distribution shift problem, i.e., the tendency for a model's own small errors to compound and push the system into unfamiliar states (Brandstetter et al., 2022). We achieve this by using the model’s own single-step prediction as a realistically flawed input for the subsequent step. Crucially, we cut the backpropagation gradient from this initial prediction, which forces the model to learn robustness and error recovery rather than simple error avoidance. This distinction is vital; our goal is not trajectory perfection, but rather ensuring the model is stable and can dampen perturbations when it inevitably drifts from the ground truth, a property known as zero-stability. This makes our method a more targeted and computationally efficient solution to the specific challenge of long-term autoregressive stability.
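The effect of cutting the gradient can be made concrete with a deliberately tiny example. The sketch below uses a hypothetical one-parameter surrogate (next state = w * current state) in place of the GNN, and writes the gradients out by hand instead of using an autodiff library. All names and numbers are assumptions for illustration, not the authors' implementation.

```python
def step(w, x):
    """Hypothetical one-parameter surrogate: next state = w * current state."""
    return w * x


def pushforward_grad(w, x0, y2):
    """d/dw of (f(sg(f(x0))) - y2)^2, where sg is a stop-gradient.
    The first prediction x1 is treated as a fixed (flawed) input."""
    x1 = step(w, x0)             # forward pass only, no gradient through x1
    residual = step(w, x1) - y2
    return 2.0 * residual * x1   # d(w * x1)/dw with x1 held constant


def unrolled_grad(w, x0, y2):
    """d/dw of (f(f(x0)) - y2)^2 with backpropagation through both steps.
    Here f(f(x0)) = w^2 * x0, so its derivative is 2 * w * x0."""
    residual = step(w, step(w, x0)) - y2
    return 2.0 * residual * 2.0 * w * x0


w, x0 = 0.5, 2.0
y2 = 0.9 * 0.9 * x0  # reference state two steps ahead for a true factor of 0.9
g_push = pushforward_grad(w, x0, y2)
g_unroll = unrolled_grad(w, x0, y2)
print(g_push, g_unroll)
```

Both variants feed the model's own prediction back in as input. They differ only in whether the first step contributes to the gradient, which is the distinction drawn above between the pushforward trick and unrolled training.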
We appreciate you taking the time to leave this comment. We will make sure to clarify these points in the revised version of the paper.
REFERENCES
Garzón, Alexander, et al. "Transferable and data efficient metamodeling of storm water system nodal depths using auto-regressive graph neural networks." Water Research 266 (2024): 122396.
Zhang, Z., Tian, W., Lu, C., Liao, Z., and Yuan, Z.: Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks, Water Research, 263, 122142, 2024.
Brandstetter, J., Worrall, D., and Welling, M.: Message passing neural PDE solvers, arXiv preprint arXiv:2202.03376, 2022.
Citation: https://doi.org/10.5194/egusphere-2025-3655-AC1
RC1: 'Comment on egusphere-2025-3655', Anonymous Referee #1, 26 Sep 2025
The paper presents a graph neural network model to surrogate the node and edge hydraulic variables in a sewer system.
On a small case study in Virginia, USA, the model shows almost perfect performance across most predicted metrics.
Despite the paper being in a decent state, I have several major concerns regarding the novelty of the paper.
The authors claim that the paper has 5 main novelties, but they are either not novel or the results are not validated enough, as I justify hereafter.
1) The model converts rainfall to hydraulic variables at junctions and conduits.
There already exists a paper that models both node and edge variables (Garzon et al., 2024b), although your paper predicts different variables.
The main difference consists in directly taking rainfall as an input rather than the runoff generated by a hydrological model.
However, this introduces other issues that are not discussed in the paper, such as including all of the hydrological characteristics of each catchment in the proper training of the model.
2) Use of a heterogeneous GNN: despite using two different types of nodes, i) there is no clear indication of what changes from the GNN perspective other than having different input values, ii) there is no comparison with a baseline to justify the need of heterogeneous nodes, and iii) there is no analysis showing that this representation "enables structured hydrologic representation".
3) Physics-guided constraints: penalizing negative values doesn't enforce/constrain them to be positive. There is also no ablation on whether this component actually improves the model's performance.
4) Autoregressive forecasting structure: this same approach has been presented in previous papers (e.g., Bentivoglio et al. 2023, Garzon et al. 2024).
5) Pushforward trick: the version presented here is equivalent in terms of equations to the autoregressive approaches mentioned above. The pushforward trick, as described in Brandstetter et al. 2022, is implemented differently from the version in this paper. Even if it were the same, there is again no analysis of whether this component benefits the training procedure.
One novelty of the paper that is not mentioned here is the estimation of flood volumes directly from the predicted hydraulic variables, but on its own it does not justify publication of the paper.
I think that the paper would have to go through a large series of modifications for it to be novel enough to justify its publication.
Because of these concerns, I have to recommend rejection.
General comments:
Introduction:
You mention as main knowledge gap the use of runoff rather than rainfall as a node input.
Although emulating that part as well may yield some additional speed-up, it creates a new challenge: your model must now also generalize over different hydrological parameters, which could otherwise simply be left out of the GNN inputs.
I didn't see in the experiments any results on how your model would behave for changes in the hydrological characteristics of the node catchments.
This limitation seems to be missing as well from the model limitations later on.
Moreover, there is a paper from Garzon et al. (2024b) that already includes edge-level features.
You should at least compare how your models differ and clarify in the introduction that there are already examples tackling this gap.
Section 2:
I don't see the point in having a background section as it currently is.
Consider removing it and integrating it directly in the methodology, since there are already some overlaps in the GNN part and the part on SWMM does not seem to be relevant for the rest of the paper.
Section 3:
Figure 1:
The whole figure needs to be re-designed as it is quite confusing in the current state.
For instance, x_1, x_4, etc. are not defined anywhere in the figure;
What should be a vector \textbf{x}_1 is written as a scalar;
These vectors (e.g., x_1) seem to have the same input repeated (x_1^1 present twice, same in the others), though I suppose it has different features;
The message-passing block could be better referenced to the "GN Block" you have above "Aggregation";
You define both \hat{y}_i and \hat{y}_{ij}, which seem to indicate the same variable, even if you are referring to two different outputs;
From the figure, it also seems like the decoded node features are given as input to the edge decoder: is it the case? This was not clear in the rest of the paper.
The colouring of the cells inside the figure may seem to help show that the features are mixing, but it also creates more confusion, especially after the first mixing.
There is a variable called h_{selected} that appears only in the figure and is unclear.
The caption of the figure should also better explain what is happening in the figure, clarifying the different variables.
I would recommend a simpler design, removing all MLP figures, colored shapes, and case-specific names (x_1 or x_14). If you want to leave the latter, consider adding a reference graph with the corresponding node and edge names.
line 181: you start defining static and dynamic features without having them introduced before.
You also include xy coordinates as inputs: can you expand more on the implications on transferability of this approach to other case studies? (similarly for example to what is done in Garzon et al. 2024)
line 186: is there a reason why you chose 3 time steps as an input history? How is this related to "support multi-step autoregressive prediction"? You can predict autoregressively even without multiple input time steps in theory.
line 185: how do you calculate the node inflow? is it determined by the predicted edge flows?
line 194: with "globally normalized" do you mean that you create a single scaler for all variables or do you have one for each variable?
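The two readings of "globally normalized" raised in this comment can be contrasted in a short sketch. The feature names and values are hypothetical, and min-max scaling is assumed purely for illustration; the manuscript's actual scaler is not described in this discussion.

```python
def minmax(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical features with very different magnitudes.
features = {
    "depth_m": [0.0, 0.5, 1.0],
    "flow_m3s": [0.0, 10.0, 20.0],
}

# Reading 1: a single scaler fitted on all variables pooled together.
pooled = [v for vals in features.values() for v in vals]
g_lo, g_hi = min(pooled), max(pooled)
single = {k: minmax(v, g_lo, g_hi) for k, v in features.items()}

# Reading 2: one scaler per variable, fitted on that variable only.
per_variable = {k: minmax(v, min(v), max(v)) for k, v in features.items()}

print(single["depth_m"])        # depth is squeezed into a narrow range
print(per_variable["depth_m"])  # depth spans the full [0, 1] range
```

With a single pooled scaler, the variable with the largest magnitude dominates the range, so the answer to this question affects how the inputs are presented to the model.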
Figure 2 and line 198:
It seems that your model predicts in one go the following three time steps, but in the figure's caption it seems like you first predict t+1, then t+2, and so on. Which of the two is it? And if it's the first, why, again, choosing 3 as a number of time steps?
Figure 3 seems to clarify this issue as it shows that you do the second. Please change the rest of the paper clarifying that your model predicts only one step into the future, meaning that it's not limited to only 3 steps ahead.
Since your model can predict any number of future time steps, why do you limit your predictions to the same size as the input time steps?
In other works, the two characteristics are independent (e.g., Bentivoglio et al. 2023, Garzon et al. 2024).
line 197: maybe add a reference to Fig. 2, otherwise it was unclear to me how you combine static and dynamic features.
Figure 3:
Part a could be a bit clearer: you can for example clarify what the inputs and outputs are, since so far they might look the same.
You should also better show which inputs are taken from ground-truth simulations and which ones are predicted by your model, as the update in the red box seems to show that you use ground-truth data as input.
This figure seems to also indicate that there are no overlaps between training windows from t-p to t+p. This decreases by a factor p the number of training samples, making the training faster but potentially less effective. Please add some justification for this choice.
Part b: same comment regarding the coloring of ground-truth data as before.
Section 3.2:
While it is true that no hydraulic papers consider the pushforward trick, other papers that you cite deal with the same problem by directly using a multi-step-ahead loss that generalizes the pushforward trick to multiple time steps ahead (Bentivoglio et al. 2023, Garzon et al. 2024).
Indeed, Eq. 9 is identical to that of Bentivoglio et al. 2023 and Garzon et al. 2024, so please clarify your novelty claims.
Moreover, in the original paper from Brandstetter et al. 2022, the gradients were cut after the first time step, but it seems you are not doing that.
Eq 9 and 10: you are missing the subscript _v on both variables y. Also mention at some point that these are mean squared errors.
Eq. 10: if you are always predicting your outputs based on ground-truth data, is this equivalent to a one-step-ahead loss accumulated over multiple time steps?
Figure 4:
As for the previous figures, it would be helpful to have a legend that clarifies what each color represents, mainly to highlight which outputs are predicted and which ones are ground-truth.
Also, the top figure includes 3 previous input time steps while figure a and b only 2.
It might make the figure easier to understand if you compressed all static and dynamic features that are always ground-truth into a single block of a more transparent shade.
Section 3.3:
This section and 3.2 should be merged for clarity as they both deal with a loss function.
line 239: penalty term: penalising negative values doesn't "ensure that your model respects hydraulic feasibility", it just helps skew the results in that direction.
Some other works, like Palmitessa et al. (2022), directly use a ReLU activation to guarantee that there are no negative values. Did you also try out this approach? Does the presence of this loss term improve the results?
line 247: please use the same notation on flows, inflows, etc. throughout the paper. I think using these symbols (also in the rest of Section 3) makes it clearer to identify which variables you are considering.
Eq 13: If you decide to keep the penalty term, with a valid justification, please define it before mentioning it in the loss function.
line 250: you seem to imply that in validation the loss is given by the base term, which comprises both ground-truth and predicted inputs.
Is this the case or are you only considering the "stability" term?
line 260: why do you also measure the Pearson correlation?
Section 4:
line 285: why did you consider this coastal case study if you then have to adapt the real conditions (pipes with sea water) to a simplified version? Doesn't the model work with presence of water in the system?
Figure 5:
It would help to have an elevation map as well to visualize the slope of the sewer system.
It seems that all SWMM nodes are flooded according to Waze. How did you calibrate the SWMM model then based on these observations (lines 288-289)?
It also seems like there are some disconnected parts in your system. Is it an error in the map or do you model separate parts?
lines 291-293: What does this sensitivity analysis mean? Is this the variability of the static catchment attributes across all SWMM nodes?
line 298: "each event was scaled by factors of 1.2, 1.4, or 1.6." How did you choose which ones to scale with which factor? The final number of events is 300 but you start with 85 events.
Moreover, it would be useful to show the variability in hyetographs between training, validation, and testing, as, based on the basically perfect results, it looks like there might be some overlap between them.
lines 367-372: these are a repetition of the training details that to me mostly add confusion, as it seemed that now your model was predicting only 3 steps ahead into the future (line 368), while in line 373 it seems that you don't use any ground-truth. Please clarify in case it's the latter.
In the results section, do you consider dry-period events for the simulations? If so, can you tell how much they affect the NSE values?
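The concern behind this question can be shown with a small worked example of the Nash-Sutcliffe efficiency, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The series below are hypothetical, and the sketch assumes that dry time steps are reproduced exactly as zeros by the surrogate.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical wet-period depths (reference vs. surrogate).
wet_obs = [0.2, 0.6, 1.0, 0.6, 0.2]
wet_sim = [0.3, 0.5, 0.8, 0.7, 0.3]

# Hypothetical dry period: 20 steps of zero depth, predicted exactly.
dry = [0.0] * 20

nse_wet = nse(wet_obs, wet_sim)
nse_with_dry = nse(wet_obs + dry, wet_sim + dry)
print(f"NSE, wet period only: {nse_wet:.3f}")
print(f"NSE, with dry steps:  {nse_with_dry:.3f}")
```

In this example the squared error is unchanged, while the dry steps increase the variance of the observations about their mean, so the NSE rises even though wet-period accuracy is identical.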
Section 5.3:
Why do you test the flooding performance only on a single test scenario?
Figure 13:
It should include a comparison with the ground-truth flooded nodes according to SWMM, which you compare against.
lines 468-469: please add a reference for this claim.
Section 5.3.2:
This section on limitations doesn't address one important limitation that you introduce with your approach, i.e., the range of hydrological parameters that you now should be modelling in place of SWMM.
It would also be useful to add some insights on the transferability of this model to other case studies.
Section 5:
At some point in the paper, you should report the computational times needed for the model to train and test, since one of the main drivers of your research is speed.
Section 6:
Please remove the whole section as it does not add any relevant information to the paper.
It also resembles a lot the interactive dashboard provided by Garzon et al. (2024) (https://github.com/alextremo0205/SWMM_GNN_Repository_Paper_version).
Other comments:
line 95: there is a reference error
line 382: please reference fig 10 and then 11, and not vice versa.
References:
Garzón, A., Kapelan, Z., Langeveld, J. and Taormina, R., 2024b. Accelerating Urban Drainage Simulations: A Data-Efficient GNN Metamodel for SWMM Flowrates. Engineering Proceedings, 69(1), p.137.
Garzón, A., Kapelan, Z., Langeveld, J. and Taormina, R., 2024. Transferable and data efficient metamodeling of storm water system nodal depths using auto-regressive graph neural networks. Water Research, 266, p.122396.
Bentivoglio, R., Isufi, E., Jonkman, S.N. and Taormina, R., 2023. Rapid spatio-temporal flood modelling via hydraulics-based graph neural networks. Hydrology and Earth System Sciences, 27(23), pp.4227-4246.
Brandstetter, J., Worrall, D. and Welling, M., 2022. Message passing neural PDE solvers. arXiv preprint arXiv:2202.03376.
Palmitessa, R., Grum, M., Engsig-Karup, A.P. and Löwe, R., 2022. Accelerating hydrodynamic simulations of urban drainage systems with physics-guided machine learning. Water Research, 223, p.118972.
Citation: https://doi.org/10.5194/egusphere-2025-3655-RC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
2,150 | 94 | 13 | 2,257 | 18 | 16 |
Dear authors,
Hopefully I will find more time to read your work, but I would like to raise some comments based on what I read so far.
You state the following as main contributions at the end of the Introduction:
"The primary contributions of this paper are as follows:
1. An end-to-end spatiotemporal GNN surrogate model that jointly learns rainfall-runoff generation and hydraulic flow routing directly from rainfall inputs.
2. A heterogeneous message-passing architecture that distinguishes node types based on subcatchment connectivity, that enables structured hydrologic representation and prediction of hydraulic states at both junction and conduit levels.
3. Integration of physics-guided constraints to improve physical consistency and flood detection accuracy.
4. An autoregressive forecasting structure that supports multi-step prediction of hydraulic states over extended time horizons.
5. Application of the pushforward trick to enhance stability and accuracy in multi-step stormwater forecasting."
I think a better framing on these contributions and novelty with respect to existing work is required:
1 & 2. The SWMM hydrological model is extremely fast; the vast majority of the simulation time is lost in the hydrodynamics. That is why most works so far focused on accelerating that. SWMM is open source, so it should be relatively easy to retain the hydrological model and swap the hydrodynamic component with a trained GNN-based surrogate. The hydrological model could also be implemented in PyTorch/Tensorflow in a differentiable fashion, letting gradient-descent figure out the parameters from the data.
3. Physical constraints have already been proposed in (Palmitessa et al., 2022; adopted by Garzon et al. 2024 for GNN-based models). What's the added novelty of the approach here? Is there any comparison on the improvements against this existing approach?
4 & 5. MAIN COMMENT: How are these contributions new? Garzon et al. 2024 already proposed an autoregressive GNN for multi-step ahead predictions (i.e., indefinitely long simulations), working also for flows (2024b) that implements the "push forward" trick, using a curriculum learning strategy.
Very best,
Dr Riccardo Taormina
TU Delft
REFERENCES
Palmitessa, Rocco, et al. "Accelerating hydrodynamic simulations of urban drainage systems with physics-guided machine learning." Water Research 223 (2022): 118972.
Garzón, Alexander, et al. "Transferable and data efficient metamodeling of storm water system nodal depths using auto-regressive graph neural networks." Water Research 266 (2024): 122396.
Garzón, Alexander, et al. "Accelerating Urban Drainage Simulations: A Data-Efficient GNN Metamodel for SWMM Flowrates." Engineering Proceedings 69.1 (2024b): 137.