Toward Routing River Water in Land Surface Models with Recurrent Neural Networks

Lima, Mauricio; Deck, Katherine; Dunbar, Oliver R. A.; Schneider, Tapio

doi:10.48550/arXiv.2404.14212

Preprints

03 Jun 2024

Status: this preprint is open for discussion.

Abstract. Machine learning is playing an increasing role in hydrology, supplementing or replacing physics-based models. One notable example is the use of recurrent neural networks (RNNs) for forecasting streamflow given observed precipitation and geographic characteristics. Training of such a model over the continental United States has demonstrated that a single set of model parameters can be used across independent catchments, and that RNNs can outperform physics-based models. In this work, we take a next step and study the performance of RNNs for river routing in land surface models (LSMs). Instead of observed precipitation, the LSM-RNN uses instantaneous runoff calculated from physics-based models as an input. We train the model with data from river basins spanning the globe and test it in streamflow hindcasts. The model demonstrates skill at generalization across basins (predicting streamflow in unseen catchments) and across time (predicting streamflow during years not used in training). We compare the predictions from the LSM-RNN to an existing physics-based model calibrated with a similar dataset and find that the LSM-RNN outperforms the physics-based model. Our results give further evidence that RNNs are effective for global streamflow prediction from runoff inputs and motivate the development of complete routing models that can capture nested sub-basin connections.
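The two generalization tests named in the abstract differ only in what is held out: years (the time split, i.e. gauged basins) or whole basins (the basin split, i.e. ungauged basins). The sketch below illustrates that distinction. It assumes each training record is a dict with hypothetical `basin` and `year` keys; the test fraction and seed are illustrative and are not the preprint's settings.

```python
import random


def time_split(records, train_years):
    """Time split (gauged): every basin appears in both sets,
    but only the listed years are used for training."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] not in train_years]
    return train, test


def basin_split(records, test_fraction=0.2, seed=0):
    """Basin split (ungauged): whole basins are held out, so no
    test basin is ever seen during training."""
    basins = sorted({r["basin"] for r in records})
    rng = random.Random(seed)  # illustrative seed, not from the preprint
    rng.shuffle(basins)
    n_test = max(1, round(test_fraction * len(basins)))
    test_basins = set(basins[:n_test])
    train = [r for r in records if r["basin"] not in test_basins]
    test = [r for r in records if r["basin"] in test_basins]
    return train, test


# Hypothetical record layout: one record per basin-year.
records = [{"basin": b, "year": y} for b in "ABCDE" for y in range(2000, 2004)]
```

In the time split the test set shares every basin with training, so it measures extrapolation in time only; the basin split removes that overlap and is the harder test of regionalization.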

Received: 24 Apr 2024 – Discussion started: 03 Jun 2024

Status: open (until 29 Jul 2024)

RC1: 'Comment on egusphere-2024-1206', Anonymous Referee #1, 22 Jul 2024

Dear Authors,

In this paper, LSTM networks are trained at two scales (over the US and globally) to simulate streamflow. The paper investigates (a) the potential benefits of using surface and subsurface runoff instead of precipitation (the input variable used in almost all previous rainfall-runoff modeling studies) to represent a water-routing model, and (b) the model's generalization performance in both time and space. I find the study's idea of replacing precipitation in the LSTM inputs with runoff-related variables interesting, and the paper is well written overall. However, I believe that major revisions are necessary before it can be considered for publication.
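For reference, the forward pass of a standard single-layer LSTM is sketched below in plain Python, with surface and subsurface runoff as the per-step inputs and a linear read-out to streamflow. The class name `TinyLSTM`, the hidden size, the input ordering, and the read-out layer are assumptions for illustration only; the weights are random and untrained, and the referee notes that the preprint does not document its actual architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM with a linear read-out.

    Assumed inputs per time step: [surface_runoff, subsurface_runoff].
    """

    def __init__(self, n_inputs=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        n_cols = n_inputs + n_hidden  # gates act on the concatenation [x_t, h_{t-1}]
        # One weight matrix and bias vector per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)]
                      for _ in range(n_hidden)] for g in "ifgo"}
        self.b = {g: [0.0] * n_hidden for g in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def forward(self, sequence):
        """Map a sequence of runoff vectors to one streamflow value per step."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state (the LSTM's memory)
        outputs = []
        for x_t in sequence:
            z = list(x_t) + h
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out)
        return outputs
```

The cell state `c` is what lets the network carry water-storage-like memory across time steps, which is why LSTMs are a natural candidate for routing runoff to a downstream gauge.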

Introduction – The various concepts underlying the study are explained well and are interesting. However, a review of the existing literature on these concepts is almost completely missing. A good introduction should present the key concepts of the problem at hand, i.e., water routing (which the current version does*). It should then review what has been done so far with both classical and AI-based methods, thereby revealing the gap(s) to which your paper contributes (this is poorly addressed in the current version).

* That said, this section could be more engaging if it concisely explained how water routing is represented in physics-based models.

Methods – Important LSTM training details are missing. For example, the loss function, with its full definition, is a crucial element of the optimization algorithm and should be presented in the Methods section, not in the Metrics section (and it is not sufficient to refer the reader to another work for its definition). The optimization algorithm and the LSTM architecture are also completely missing here and throughout the paper.
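Writing the loss out in full also settles how it is summed and normalized, which bears on the "mean squared error" wording raised in the minor comments. As an illustration only, the sketch below implements a per-basin normalized sum of squared errors resembling the NSE* loss of Kratzert et al. (2019). Whether the preprint uses this loss, the function name, the data layout, and the value of `eps` are all assumptions.

```python
import math


def basin_normalized_sse(predictions, observations, eps=0.1):
    """Hypothetical loss: (1/B) * sum_b sum_n (yhat - y)^2 / (s_b + eps)^2.

    predictions, observations: dicts mapping basin id -> list of flows.
    s_b is the (population) standard deviation of observed flow in basin b,
    so basins with large, highly variable flows do not dominate the loss.
    eps keeps the weight finite for basins with near-constant flow.
    """
    total = 0.0
    for basin, obs in observations.items():
        pred = predictions[basin]
        mean_obs = sum(obs) / len(obs)
        s_b = math.sqrt(sum((y - mean_obs) ** 2 for y in obs) / len(obs))
        sse = sum((p - y) ** 2 for p, y in zip(pred, obs))  # summed, not averaged, over time
        total += sse / (s_b + eps) ** 2
    return total / len(observations)  # averaged over basins only
```

Note that the only averaging here is over basins; within a basin the squared errors are summed, which is exactly the kind of detail a full definition in the Methods should make explicit.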

Benchmark – In its current form, the comparison with LISFLOOD is not fully justified in my opinion, as the existing LISFLOOD simulations were conducted under a different setup that seems to be unknown to the authors. Such simulations involve several subtleties that need to be carefully managed; otherwise, any conclusions drawn would be biased. Why don’t the authors conduct these simulations themselves under controlled conditions corresponding to their LSTM experiments?
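One prerequisite of a controlled benchmark is scoring both models on exactly the same basins and time steps, using the Nash–Sutcliffe efficiency (NSE) that the paper reports. A minimal sketch, assuming each model's output and the observations are stored as nested dicts of basin id -> {date: discharge} (a hypothetical layout). This only aligns the evaluation; rerunning LISFLOOD under matched forcing and calibration, as suggested above, is a separate and larger task.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def paired_nse(model_a, model_b, observed):
    """Score two models only on the basins and dates that all three inputs share.

    Each argument maps basin id -> {date: discharge}; returns
    {basin: (nse_a, nse_b)} so the models can be compared basin by basin.
    """
    scores = {}
    for basin in sorted(set(model_a) & set(model_b) & set(observed)):
        dates = sorted(set(model_a[basin]) & set(model_b[basin]) & set(observed[basin]))
        obs = [observed[basin][d] for d in dates]
        if len(set(obs)) < 2:
            continue  # NSE is undefined when the observations have no variance
        sim_a = [model_a[basin][d] for d in dates]
        sim_b = [model_b[basin][d] for d in dates]
        scores[basin] = (nse(sim_a, obs), nse(sim_b, obs))
    return scores
```

Dates or basins present in only one model's output are silently dropped, so the two score sets are directly comparable.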

Results (and Discussion) – The analysis of the results does not appear to be sufficiently in-depth, particularly in relation to the few previous regionalization studies using LSTMs over the US continent. There should be a thorough discussion comparing the findings of this study with those of previous research to highlight the contributions and significance of your work, or to explain any divergence from their results (this is missing in the current version).

The Use of the Term "Forecast" – Based on the content, this paper is not about forecasting but rather about prediction (simulation). This should be corrected throughout (the mistake does not appear in the Conclusion, which correctly states: “We have successfully trained and validated an LSTM for the task of predicting streamflow from runoff worldwide”).

MINOR COMMENTS

- PDF Version – The PDF version of the paper did not include line numbers, which made reviewing it impractical.

- P.2, Introduction – The phrase “common ungauged basins” in the sentence “This indicates that information in large-scale hydrological datasets is sufficient for generalization tasks, especially to the common ungauged basins (Nearing et al., 2021)” needs clarification. What does “common ungauged basins” mean in this context?

- Table 1 – Please specify the range for each level presented in the table.

- Page 9, Line 6 – Remove "mean" in “mean squared error,” as the term does not include any averaging.

- Appendix B – What are the tested values for each of the three hyperparameters? This information is important and concise enough to be included in the main text.

- Figure 5 – Place the legend above the subplots as it applies to both of them.

- Figure 6 – (Maybe) Place the labels (a), (b), (c), (d) inside the respective subplots to save space.

- P. 16 – It would be interesting if you could present the top 4–5 attributes that, according to your results, are related to model performance. For instance, in which regions do variables like “karst percent cover” and “groundwater table depth” partly explain model performance?

- Figure 8 (caption) – Provide the definition of the aridity index both in the caption and in the text. For instance, you mention “Drier regions (i.e., regions with lower aridity index),” but it is natural to expect that the higher a region's index, the more its climate lacks effective moisture. Additionally, the following sentence in the caption should be stated more carefully: “There is a tendency for worse scores for smaller aridity indexes (i.e., drier basins),” since within the same range of aridity indices there are basins with good NSE values. Also, state what each point in the figure represents.

- P.18 – “However, it is not clear if this increase in performance is due to a change in the LSTM model.” How is Nearing et al.’s LSTM model different from yours? This is an example of studies that should be included in the literature review. The provided context and highlighted differences can then be used as an element of result analysis in your discussion section.

- P.18 – The equivalency between gauged and time-split configuration, as well as between ungauged and basin-split, should be mentioned at their first introduction. Additionally, consider using the terms "gauged" and "ungauged" instead of "time-split" and "basin-split," as these are far more intuitive.

- P.18 – The conclusion “suggesting that drier regions pose unique challenges for the LSTM model” is incorrect. As mentioned above, many of your basins with good NSEs fall within the same aridity interval. Please revise this conclusion to reflect the actual results.

- P.19 – Remove the parentheses around “Hoedt et al., 20211.”

- I find the mass balance analysis interesting. You may consider moving it into the main body of the paper.

Citation: https://doi.org/10.5194/egusphere-2024-1206-RC1

Viewed

Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.

Total article views: 159 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
159	0	0	159	0	0

Views and downloads (calculated since 03 Jun 2024)

Month	HTML	PDF	XML	Total
Jun 2024	126	0	0	126
Jul 2024	33	0	0	33

Viewed (geographical distribution)

Total article views: 150 (including HTML, PDF, and XML), of which 150 have a defined geographic origin and 0 have an unknown origin.

Latest update: 26 Jul 2024

Short summary

Machine learning is playing an increasingly important role in hydrological modeling. In this paper, we introduce an adaptation of existing machine learning models that forecast streamflow in river basins, redesigning them so that they can be integrated into climate models. We demonstrate the effectiveness of our adapted model by showing that it outperforms a physics-based river model. These results motivate further study of machine-learning-based river models inside climate models.

