From RNNs to Transformers: benchmarking deep learning architectures for hydrologic prediction

Liu, Jiangtao; Shen, Chaopeng; O'Donncha, Fearghal; Song, Yalan; Zhi, Wei; Beck, Hylke E.; Bindas, Tadd; Kraabel, Nicholas; Lawson, Kathryn

doi:10.5194/egusphere-2025-1706

Preprints

https://doi.org/10.5194/egusphere-2025-1706

25 Apr 2025

Abstract. Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) have achieved significant success in hydrological modeling. However, the recent successes of foundation models like ChatGPT and Segment Anything Model (SAM) in natural language processing and computer vision have raised curiosity about the potential of Attention mechanism-based models in the hydrologic domain. In this study, we propose a deep learning framework that seamlessly integrates multi-source, multi-scale data and multi-model modules, providing a flexible automated platform for multi-dataset benchmarking and attention-based model comparisons beyond LSTM-centered tasks. Furthermore, we evaluate pretrained Large Language Models (LLMs) and Time Series Attention-based Models (TSAMs) in terms of their forecasting capabilities in data-sparse regions. This general framework can be applied to regression tasks, autoregression tasks, and zero-shot forecasting tasks (i.e., tasks without prior training data). We evaluated 11 different Transformer models under different scenarios in comparison to benchmark models, particularly LSTM, using datasets for runoff, soil moisture, snow water equivalent, and dissolved oxygen on global and regional scales. Results show that LSTM models perform the best in memory-dependent regression tasks, especially on the global streamflow dataset. However, as tasks become complex (from regression and data integration to autoregression and zero-shot prediction), attention-based models gradually surpass LSTM models. This study provides a robust framework for comparing and developing different model structures in the era of large-scale models, providing a valuable reference and benchmark for water resource modeling, forecasting and management.

Received: 10 Apr 2025 – Discussion started: 25 Apr 2025

Competing interests: Kathryn Lawson and Chaopeng Shen have financial interests in HydroSapient, Inc., a company which could potentially benefit from the results of this research. This interest has been reviewed by The Pennsylvania State University in accordance with its individual conflict of interest policy for the purpose of maintaining the objectivity and the integrity of research.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2683 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Supplement (1308 KB)

Journal article(s) based on this preprint

01 Dec 2025

| Highlight paper

From RNNs to Transformers: benchmarking deep learning architectures for hydrologic prediction

Jiangtao Liu, Chaopeng Shen, Fearghal O'Donncha, Yalan Song, Wei Zhi, Hylke E. Beck, Tadd Bindas, Nicholas Kraabel, and Kathryn Lawson

Hydrol. Earth Syst. Sci., 29, 6811–6828, https://doi.org/10.5194/hess-29-6811-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1706', Anonymous Referee #1, 12 Jun 2025

The manuscript is of good quality and of high relevance. However, the methods are not yet described in sufficient detail to finally judge the value of the results. In the supplementary document, I provide more details of the points that I am missing in the method section and other points that should be addressed.

Citation: https://doi.org/10.5194/egusphere-2025-1706-RC1
- AC1: 'Reply on RC1', Jiangtao Liu, 23 Jul 2025
  
  Thank you for your comments. Please see the attached PDF for our replies.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1706-AC1
RC2:
'Comment on egusphere-2025-1706', Anonymous Referee #2, 24 Jun 2025

The authors evaluate different transformer models under different scenarios (i.e., regression, data integration, and autoregression) in comparison with benchmark models, LSTM networks and DLinear, using various hydrologic datasets. In addition, they compare the performance of LSTM networks with pre-trained LLM in the zero-shot forecasting for autoregression tasks. They show that LSTM networks outperform transformers in regression and data integration tasks, and attention-based methods surpass LSTM networks in autoregression and zero-shot forecasting. The paper is well written, with deep discussion. I have minor comments below.
For broader audiences with limited ML/DL backgrounds, it would be helpful to provide brief introductions to the DL models used in the manuscript. The information can be provided in the SI if space is limited in the main text. Table 1 provides the main features of different variants of transformers. Since ML/DL has many unique terms, simply providing the names of features does not really help readers understand their differences. Maybe the authors can adapt the table to use more general features, such as “Trained on time series”.
In Section 2.3 (attention models), it would be better to mention that the authors use pre-trained LLMs for zero-shot forecasting. This information only comes out in Section 2.4.5. In addition, for zero-shot forecasting, the authors only compare DL model performance in autoregression tasks, and state that LSTM underperforms pre-trained LLMs in zero-shot forecasting. How about regression and data integration tasks? I would expect that the LLMs cannot really understand the relationship between input and output without fine-tuning.
Specific comments:
Some equation images are blurry, e.g., Equation 5.
Equation 6: please state that r is Pearson’s correlation coefficient.
Equation 8: I think the equation is wrong. It should be (RMSE^2-Bias^2)^0.5.
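The referee's suggested form for Equation 8 matches the standard decomposition of mean squared error into squared bias plus the variance of the errors, so (RMSE^2-Bias^2)^0.5 is the quantity usually called unbiased RMSE (ubRMSE). The manuscript's own Equation 8 is not reproduced on this page, so the sketch below is only an illustration of that identity under assumed definitions: the function names and the sample values are hypothetical, and bias is taken as the mean of simulated minus observed.

```python
import math

# Hypothetical helper names and sample data; not taken from the manuscript.

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def bias(sim, obs):
    """Mean error, assumed here to be mean(sim - obs)."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)

def ubrmse_from_identity(sim, obs):
    """ubRMSE via the form the referee gives: sqrt(RMSE^2 - Bias^2)."""
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(rmse(sim, obs) ** 2 - bias(sim, obs) ** 2, 0.0))

def ubrmse_direct(sim, obs):
    """ubRMSE computed directly as the RMSE of the bias-removed errors."""
    b = bias(sim, obs)
    return math.sqrt(sum(((s - o) - b) ** 2 for s, o in zip(sim, obs)) / len(obs))

sim = [1.2, 2.9, 4.1, 5.6]  # made-up simulated values
obs = [1.0, 3.0, 4.0, 5.0]  # made-up observed values

# The two routes agree up to floating-point rounding.
print(ubrmse_from_identity(sim, obs), ubrmse_direct(sim, obs))
```

Both routes give the same number because subtracting the mean error before squaring removes exactly the Bias^2 term from RMSE^2.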

Citation: https://doi.org/10.5194/egusphere-2025-1706-RC2
- AC2: 'Reply on RC2', Jiangtao Liu, 23 Jul 2025
  
  Thank you for your comments. Please see the attached PDF for our replies.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1706-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (28 Jul 2025) by Alexander Gruber

AR by Jiangtao Liu on behalf of the Authors (14 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Aug 2025) by Alexander Gruber

RR by Anonymous Referee #1 (16 Sep 2025)

RR by Anonymous Referee #2 (18 Sep 2025)

ED: Publish as is (24 Sep 2025) by Alexander Gruber

AR by Jiangtao Liu on behalf of the Authors (02 Nov 2025) Author's response Manuscript

Supplement

https://doi.org/10.5194/egusphere-2025-1706-supplement

Viewed

Total article views: 5,031 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
3,896	1,040	95	5,031	333	96	121

Views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	268	46	6	320
May 2025	282	40	4	326
Jun 2025	246	40	10	296
Jul 2025	318	40	8	366
Aug 2025	422	60	2	484
Sep 2025	1,204	46	4	1,254
Oct 2025	226	28	4	258
Nov 2025	214	160	2	376
Dec 2025	182	158	6	346
Jan 2026	116	86	20	222
Feb 2026	96	88	10	194
Mar 2026	156	160	10	326
Apr 2026	74	52	3	129
May 2026	70	22	4	96
Jun 2026	11	7	1	19
Jul 2026	11	7	1	19

Viewed (geographical distribution)

Total article views: 5,022 (including HTML, PDF, and XML) Thereof 5,022 with geography defined and 0 with unknown origin.

Latest update: 25 Jul 2026

Short summary

Using global and regional datasets, we compared attention-based models and Long Short-Term Memory (LSTM) models to predict hydrologic variables. Our results show LSTM models perform better in simpler tasks, whereas attention-based models perform better in complex scenarios, offering insights for improved water resource management.

