the Creative Commons Attribution 4.0 License.
Spatially Resolved Rainfall Streamflow Modeling in Central Europe
Abstract. Climate change increases the risk of disastrous floods and makes intelligent fresh water management an ever more important issue for society. A central prerequisite is the ability to accurately predict the water level in rivers from a range of predictors, mainly meteorological forecasts. The field of rainfall runoff modeling has seen neural network models surge in popularity over the last few years, but a lot of this early research on model design has been conducted on catchments with smaller size and a low degree of human impact to ensure optimal conditions. Here we present a pipeline that extends the previous neural network approaches in order to better suit the requirements of larger catchments or those characterized by human activity. Unlike previous studies, we do not aggregate the inputs per catchment, but train a neural network to predict local runoff spatially resolved on a regular grid. In a second stage, another neural network routes these quantities into and along entire river networks. The whole pipeline is trained end-to-end, exclusively on empirical data. We show that this architecture is able to capture spatial variation and model large catchments accurately, while increasing data efficiency. Furthermore, it offers the possibility to interpret and influence internal states due to its simple design. Our contribution helps to make neural networks more operations-ready in this field and opens up new possibilities to more explicitly account for human activity in the water cycle.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3649', Yang Wang, 02 Apr 2025
Overall, this study presents a promising and interpretable framework for large-scale, spatially resolved streamflow modeling. The integration of routing structure and end-to-end training offers valuable insights for bridging physical understanding with data-driven methods. Here are a few suggestions and points for discussion:
1. While the proposed pipeline is compared with its own simplified variants (e.g., aggregated processing and naive routing), the manuscript does not include any direct comparison with established hydrological or deep learning models, such as traditional LSTM-based models or conceptual hydrological models. This limits the ability to assess the relative performance and novelty of the proposed approach. Moreover, the results section lacks standard comparative elements such as performance tables, or time series comparisons that would help illustrate how the model performs relative to well-known baselines. Including such benchmarks is important to validate its practical advantages.
2. The paper mentions that predictions are first made at the grid level and then routed to stations, but it does not show how the grid relates to the study area and the river basins. I suggest adding a figure that shows the input grid overlaid on the basin boundaries. This would help readers understand how many grids are used per basin and how they are spatially arranged. In addition, it’s not clear how the model handles grid cells that cross multiple basins. For example, if a grid overlaps two basins, how are the dynamic and static inputs assigned?
3. The current description of the paper’s contributions is somewhat lengthy and overly complex, which may make it difficult for readers to quickly grasp the key innovations. A more concise and structured presentation would improve clarity.
4. The authors mention that they include sine-cosine embeddings of the day of the week and the day of the year, describing them as a coarse proxy for human activity. However, this design choice is not clearly linked to the later discussion in the results section on human influence. It is unclear how these embeddings contribute to modeling human activity or whether they have any measurable effect on model performance.
5. It appears that the input time window used in the local stage is fixed at nine days. The paper does not clearly explain how this value was selected. And there is no discussion of how the model performs under different forecast horizons. This is a critical aspect for practical forecasting applications.
Citation: https://doi.org/10.5194/egusphere-2024-3649-RC1
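For readers unfamiliar with the sine-cosine embeddings raised in point 4 of the comment above, the following is a minimal sketch of how such calendar features are commonly built. It is an illustration only, not the authors' code: the function names `cyclic_embedding` and `calendar_features` and the period of 365.25 days for the day of the year are assumptions.

```python
import math
from datetime import date


def cyclic_embedding(value: float, period: float) -> tuple[float, float]:
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def calendar_features(day: date) -> list[float]:
    """Return [sin_dow, cos_dow, sin_doy, cos_doy] for one calendar day."""
    day_of_week = day.weekday()                 # Monday = 0 ... Sunday = 6
    day_of_year = day.timetuple().tm_yday - 1   # shifted to start at 0
    # The period of 365.25 days is an assumption chosen for this sketch.
    return [
        *cyclic_embedding(day_of_week, 7.0),
        *cyclic_embedding(day_of_year, 365.25),
    ]


features = calendar_features(date(2024, 1, 1))
```

The point of the embedding is continuity across the wrap-around: Sunday and Monday end up as close to each other as any other pair of neighbouring weekdays, which a plain integer encoding (6 versus 0) would not give.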
AC1: 'Reply on RC1', Marc Vischer, 19 May 2025
Dear Dr. Yang Wang,
We thank you very much for the constructive comments and the time spent reviewing the manuscript. We have carefully revised the manuscript according to the comments and suggestions. Below we provide a point-by-point response.
- R1. While the proposed pipeline is compared with its own simplified variants (e.g., aggregated processing and naive routing), the manuscript does not include any direct comparison with established hydrological or deep learning models, such as traditional LSTM-based models or conceptual hydrological models. This limits the ability to assess the relative performance and novelty of the proposed approach. Moreover, the results section lacks standard comparative elements such as performance tables, or time series comparisons that would help illustrate how the model performs relative to well-known baselines. Including such benchmarks is important to validate its practical advantages.
A1. We appreciate the reviewer pointing this out. Indeed, our aggregated baseline is the established LSTM model introduced by Kratzert (2019), apart from a few small differences in parameter values. Since its introduction, this model has been used in a number of studies across different regions and has become the benchmark neural network model. We changed the Subsection "Baselines" to make this point clearer to the reader, and we now explicitly compare the results of our baseline with the results achieved by the same model in other studies. To this end, we included a new paragraph right at the beginning of "Results and Discussion", before discussing our own model. We attached the paragraphs as baselines_paragraph.png and comparison_paragraph.png because inline images are restricted to 500 by 500 pixels.
As far as physical or conceptual models are concerned, our dataset does not contain the features required for the regional calibration that is usually done with these models. We would very much like to see a proper comparison of our model with a physical or conceptual model, but the dataset would need to be extended first (particularly with static hydrological signatures like those in the CAMELS dataset). This was beyond the scope of this paper, but we are currently working on re-creating a spatially resolved version of the CAMELS dataset to provide more straightforward comparability in the future. Finally, we also discuss that the referenced studies employing the baseline model in turn contain comparisons with physical and conceptual models on their respective datasets. In all these studies, the baseline LSTM model consistently outperformed the other model types.
Your point about performance measures is also very valid: we added a summary version of appendix A2, containing the most important experimental conditions and performance metrics, to the main text. This way, the reader can get a better initial idea of the model performance without having to jump to the appendix. It is attached as results_summary_table.png.
While creating this excerpt of appendix table A1, we noticed an unfortunate error in our KGE scores. We fixed the corresponding code and updated the values (scores_fixed.png).
- R2. The paper mentions that predictions are first made at the grid level and then routed to stations, but it does not show how the grid relates to the study area and the river basins. I suggest adding a figure that shows the input grid overlaid on the basin boundaries. This would help readers understand how many grids are used per basin and how they are spatially arranged. In addition, it’s not clear how the model handles grid cells that cross multiple basins. For example, if a grid overlaps two basins, how are the dynamic and static inputs assigned?
A2. Indeed, an illustration of how the grid relates to the river network is very informative and we should have included one in the first place. Please have a look at the attached figure, in which we overlay grid, catchment boundaries, stations and station network. For better visibility, we zoomed in on a single basin (upper Danube). We combined this as subfigure (b) with an overview of the study area that the second reviewer requested and the already existing overview of input data types into a paneled figure that provides a more comprehensive overview and visual understanding of the study area and input data pipeline (data_overview.png).
We also added a clarifying sentence on how the grid cells are handled to our description of the input grid: "If a grid cell is located along a catchment boundary, we assign it entirely to the catchment that contains the cell's center point. This avoids having to represent fractional cells in the pipeline and seemed an acceptable trade-off for the sake of simplicity, considering that the area covered by each grid cell is relatively small."
- R3. The current description of the paper’s contributions is somewhat lengthy and overly complex, which may make it difficult for readers to quickly grasp the key innovations. A more concise and structured presentation would improve clarity.
A3. We rewrote the Contributions section and hope it is more readily understandable and focused now. We attached it as contributions_reworked.png.
- R4. The authors mention that they include sine-cosine embeddings of the day of the week and the day of the year, describing them as a coarse proxy for human activity. However, this design choice is not clearly linked to the later discussion in the results section on human influence. It is unclear how these embeddings contribute to modeling human activity or whether they have any measurable effect on model performance.
A4. To address the question of human activity, some datasets such as GAGES-II include static estimates of human activity within each catchment. These are usually based on map data such as roads or population density. We did include a map of land use, but thoroughly deriving such estimates was unfortunately beyond the scope of this paper. The same holds for temporal aspects of human activity: to properly evaluate the sine-cosine embedding, we would need time series data or estimates of human activity for the study period, which are even harder to come by. We want to investigate this important question in a principled manner in a context where we have suitable ground truth data. For the purpose of this paper, we decided to focus on developing a suitable model structure as a first step.
- R5. It appears that the input time window used in the local stage is fixed at nine days. The paper does not clearly explain how this value was selected. And there is no discussion of how the model performs under different forecast horizons. This is a critical aspect for practical forecasting applications.
A5. The 9-day parameter is actually the size of the time-convolution kernel in the routing module, meaning that the routing has a "memory" of 9 days. We determined this by a back-of-envelope calculation of the maximum time that water would flow inside the river networks that we investigated and multiplied the result by 2 as a generous safety margin. The input length to our model can in fact be chosen flexibly: the LSTM and routing modules do not map sequences of fixed length, but instead allow for continuous mapping, one day at a time. For the study, we used an input length of 400 days (with varying start days of year) to fully capture the yearly hydrological cycle plus some safety margin. For inference, the entire length of the validation and test sets (several years) is processed in one sweep.
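To illustrate what a routing "memory" of 9 days means in the paragraph above, the following minimal sketch applies a fixed causal convolution kernel of length 9 to a daily series. It is not the authors' routing module, which is a trained neural network; the kernel weights, the function names `route_causal` and `kernel_length_days`, and the 4.5-day travel time are assumptions made purely for illustration.

```python
import numpy as np


def kernel_length_days(max_travel_days: float, safety_factor: float = 2.0) -> int:
    """Back-of-envelope kernel size: longest travel time times a safety factor."""
    return int(np.ceil(max_travel_days * safety_factor))


def route_causal(local_runoff: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Causal convolution: out[t] = sum_k kernel[k] * local_runoff[t - k].

    Values before the start of the series are treated as zero, so the output
    on day t depends only on day t and the len(kernel) - 1 preceding days.
    """
    full = np.convolve(local_runoff, kernel, mode="full")
    return full[: len(local_runoff)]


# Hypothetical numbers: a 4.5-day maximum travel time gives a 9-day kernel.
n_days = kernel_length_days(4.5)
kernel = np.arange(1, n_days + 1, dtype=float)
kernel /= kernel.sum()  # illustrative weights that sum to one

# A single pulse of local runoff on day 0 of a 12-day series.
pulse = np.zeros(12)
pulse[0] = 1.0
routed = route_causal(pulse, kernel)
```

In this sketch the pulse influences the output for exactly 9 days and has no effect afterwards, which is the sense in which the kernel size bounds the routing memory. Because the operation works one day at a time, the input series can have any length.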
As you point out correctly, the forecast horizon is extremely relevant for practical applications. Please note, however, that our model relies on the weather forecast as input and integrates it with its own representation of the state of the system in order to generate predictions for the future. The performance of our model thus hinges on the accuracy of the weather forecast. For the purpose of this study, we therefore limited ourselves to reanalysis data as a first step in order to obtain clear results as far as our model is concerned. We agree that before using our model or the results of this paper in real-world operations, a thorough evaluation of the accuracy relative to the forecast horizon should be performed, comparing various weather models and perhaps an ensemble of such models. In practice, we would assume the quality of the weather forecast, and thus the quality of our model, to be good for a few days, acceptable for a week and deteriorating over another week before becoming practically unusable.
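The center-point rule quoted in A2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: catchments are represented as simple polygons in arbitrary planar coordinates, the names `point_in_polygon` and `assign_cell` are hypothetical, and the point-in-polygon test uses standard ray casting.

```python
from typing import Optional

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_cell(
    cell_bounds: tuple[float, float, float, float],
    catchments: dict[str, list[Point]],
) -> Optional[str]:
    """Assign a whole grid cell to the catchment that contains its center point."""
    xmin, ymin, xmax, ymax = cell_bounds
    center = ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)
    for name, polygon in catchments.items():
        if point_in_polygon(center, polygon):
            return name
    return None  # center lies outside every catchment


# Two hypothetical square catchments sharing the boundary x = 2.
catchments = {
    "A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
# Both cells straddle the shared boundary but have centers on opposite sides.
left_cell = assign_cell((1.2, 0.5, 2.2, 1.5), catchments)
right_cell = assign_cell((1.9, 0.5, 2.9, 1.5), catchments)
```

Each cell is assigned to exactly one catchment, so no fractional cell weights are needed. A center lying exactly on a boundary is an edge case that this sketch does not treat specially.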
RC2: 'Comment on egusphere-2024-3649', Anonymous Referee #2, 25 Apr 2025
The usefulness of NN models in flood prediction is an important field that has recently attracted much attention. NN measures can increase the correctness and timeliness of flood forecasts, which is essential for reducing the damage that floods cause to property and human life. Overall, this research article highlights the importance of NN models, which are very useful for other authors working in this field. The study would surely contribute as a significant collection to this journal library. The article can be accepted after minor revision.
- The manuscript is dense and could benefit from clearer section transitions and subheadings to improve readability.
- The study area map lacks coordinates, which suggests that the authors wrote the paper carelessly. Kindly insert the coordinates.
- Please use DEM 30 m or 90 m resolution to prepare the study area map.
- Show the study area map on a world map. This will be very helpful for international readers.
- The methods section should not include the study area portion. Please create a separate section for the study area. Other significant information regarding the study region should also be included in the study area section using figures and graphs.
- Figures are informative but would benefit from more transparent labels and captions.
- Some sentences are overly long or awkwardly phrased. A thorough proofreading for clarity and conciseness is recommended.
Citation: https://doi.org/10.5194/egusphere-2024-3649-RC2
AC2: 'Reply on RC2', Marc Vischer, 19 May 2025
Dear Reviewer,
We thank you very much for the constructive comments and the time spent reviewing the manuscript. We have carefully revised the manuscript according to the comments and suggestions. Below we provide a point-by-point response.
- R1. The manuscript is dense and could benefit from clearer section transitions and subheadings to improve readability.
A1. Thank you very much for this comment. We have substantially modified the manuscript and believe that these changes make it clearer and more readable. To make all changes readily noticeable, we attached a LaTeX differential file (text_differential.pdf) that highlights the changes right next to the original version. Following the suggestion, we separated the Section "Data and Methods" into two separate sections, "Data" and "Methods".
- R2. The study area map lacks coordinates, which suggests that the authors wrote the paper carelessly. Kindly insert the coordinates.
A2. We inserted the coordinates and combined your other figure-related suggestions into a new, paneled figure (attached as data_overview.png); more on this below.
- R3. Please use DEM 30 m or 90 m resolution to prepare the study area map.
A3. Thank you for the suggestion. We used the Copernicus DEM 90 m to create the map of the study area featured in the "data_overview.png" figure.
- R4. Show the study area map on a world map. This will be very helpful for international readers.
A4. Thank you for the suggestion. We show the study area on a map of Europe instead of a world map to be more precise and readable.
- R5. The methods section should not include the study area portion. Please create a separate section for the study area. Other significant information regarding the study region should also be included in the study area section using figures and graphs.
A5. The new "Data" section now features a "Study Area" subsection with a paneled figure that contains the updated input type maps (discussed above), a map of the study area in Europe (discussed above) and a visualization of the input grid that the first reviewer requested ("data_overview.png"). We hope that this provides the reader with a clear yet comprehensive visual explanation of the study area and input data pipeline.
- R6. Figures are informative but would benefit from more transparent labels and captions.
A6. We improved the figure labels and captions. Please compare the changes in the above-mentioned differential file (text_differential.pdf).
- R7. Some sentences are overly long or awkwardly phrased. A thorough proofreading for clarity and conciseness is recommended.
A7. We proofread the entire manuscript and made a number of changes to improve readability. Please refer to the differential file (text_differential.pdf).