Technical note: An approach for handling multiple temporal frequencies with different input dimensions using a single LSTM cell

Acuña Espinoza, Eduardo; Kratzert, Frederik; Klotz, Daniel; Gauch, Martin; Álvarez Chaves, Manuel; Loritz, Ralf; Ehret, Uwe

doi:10.5194/egusphere-2024-3355


12 Dec 2024

Abstract. Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art performance for rainfall-runoff hydrological modeling. However, most studies focus on daily-scale predictions, limiting the benefits of sub-daily (e.g. hourly) predictions in applications like flood forecasting. Moreover, training an LSTM exclusively on sub-daily data is computationally expensive, and may lead to model-learning difficulties due to the extended sequence lengths. In this study, we introduce a new architecture, multi-frequency LSTM (MF-LSTM), designed to use inputs of various temporal frequencies to produce sub-daily (e.g. hourly) predictions at a moderate computational cost. Building on two existing methods previously proposed by coauthors of this study, the MF-LSTM processes older inputs at coarser temporal resolutions than more recent ones. The MF-LSTM can handle different temporal frequencies, with different numbers of input dimensions, in a single LSTM cell, enhancing generality and simplicity of use. Our experiments, conducted on 516 basins from the CAMELS-US dataset, demonstrate that MF-LSTM retains state-of-the-art performance while offering a simpler design. Moreover, the MF-LSTM architecture achieved a 5x reduction in processing time compared to models trained exclusively on hourly data.
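
The idea of processing older inputs at a coarser resolution can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a one-year hourly lookback in which the most recent 14 days stay hourly and everything older is averaged to daily values; the 351-day and two-week figures echo values mentioned by the referees in the discussion, but the way they are combined here, and the function name `build_mixed_sequence`, are illustrative assumptions.

```python
from statistics import fmean

HOURS_PER_DAY = 24


def build_mixed_sequence(hourly, n_recent_days=14):
    """Average all but the last `n_recent_days` of an hourly series to daily values.

    Returns (daily_part, hourly_part). Hypothetical helper for illustration only.
    """
    if len(hourly) % HOURS_PER_DAY != 0:
        raise ValueError("expected a whole number of days of hourly data")
    split = len(hourly) - n_recent_days * HOURS_PER_DAY
    if split < 0:
        raise ValueError("series is shorter than the requested hourly window")
    old, recent = hourly[:split], hourly[split:]
    # One mean value per complete day of the older period.
    daily = [fmean(old[i:i + HOURS_PER_DAY]) for i in range(0, len(old), HOURS_PER_DAY)]
    return daily, list(recent)


# Synthetic one-year hourly series (assumed data, a repeating 0..23 ramp).
hourly_year = [float(h % HOURS_PER_DAY) for h in range(365 * HOURS_PER_DAY)]
daily_part, hourly_part = build_mixed_sequence(hourly_year)
n_steps = len(daily_part) + len(hourly_part)
print(len(daily_part), len(hourly_part), n_steps)  # 351 336 687
```

Under these assumptions the network would step through 687 inputs instead of 8,760. The ratio of sequence lengths is not the same thing as the reported reduction in processing time, which also depends on hardware and implementation.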

Received: 28 Oct 2024 – Discussion started: 12 Dec 2024

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

26 Mar 2025

Technical note: An approach for handling multiple temporal frequencies with different input dimensions using a single LSTM cell

Eduardo Acuña Espinoza, Frederik Kratzert, Daniel Klotz, Martin Gauch, Manuel Álvarez Chaves, Ralf Loritz, and Uwe Ehret

Hydrol. Earth Syst. Sci., 29, 1749–1758, https://doi.org/10.5194/hess-29-1749-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3355', Anonymous Referee #1, 06 Jan 2025

The paper addresses the challenge of producing sub-daily forecasts. In such cases, sub-daily inputs are used to achieve optimal performance. However, when longer dependencies are present, processing all data at a sub-daily resolution can be quite time-consuming, as both sub-daily and monthly information may be required.
The authors introduce a simple and innovative approach to handle both short and long dependencies using the same LSTM model. They demonstrate that LSTM can effectively manage data with different frequencies by incorporating a label that indicates the data frequency, without sacrificing performance. Additionally, they show that LSTM can accommodate varying numbers of inputs at different frequencies by including an embedding layer before the LSTM.
These findings apply to any forecasting problem involving multiple time dependencies, suggesting that the proposed approach could have widespread utility.
The paper is well-written, with clear results, and I believe it should be accepted with minor comments.
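
The two mechanisms summarized in this report, a label indicating the data frequency and an embedding layer placed before the LSTM, can be sketched in plain Python. This is an illustrative toy rather than the authors' code: the helpers `make_linear` and `with_flag`, the input sizes (3 daily and 5 hourly variables), the embedding size of 4, and the use of a single untrained linear layer as the embedding are all assumptions.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return a function applying a randomly initialised linear layer.

    Hypothetical stand-in for a trained per-frequency embedding network.
    """
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out

    def apply(x):
        if len(x) != n_in:
            raise ValueError(f"expected {n_in} inputs, got {len(x)}")
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

    return apply


def with_flag(x, is_hourly):
    """Mechanism 1: same variables at both frequencies, plus a frequency flag."""
    return list(x) + [1.0 if is_hourly else 0.0]


rng = random.Random(0)
EMBED_DIM = 4  # assumed common input size seen by the single LSTM cell

# Mechanism 2: one embedding per frequency maps different input sizes to EMBED_DIM.
embed_daily = make_linear(3, EMBED_DIM, rng)
embed_hourly = make_linear(5, EMBED_DIM, rng)

daily_steps = [[0.1, 0.2, 0.3] for _ in range(2)]             # older, coarse inputs
hourly_steps = [[0.1, 0.2, 0.3, 0.4, 0.5] for _ in range(3)]  # recent, fine inputs

# Older daily steps come first, recent hourly steps last; every vector has the same size.
sequence = [embed_daily(x) for x in daily_steps] + [embed_hourly(x) for x in hourly_steps]
flagged = with_flag([0.1, 0.2, 0.3], is_hourly=False)
print(len(sequence), {len(v) for v in sequence}, flagged)
```

Because every embedded vector has the same length, the whole sequence could be passed to one recurrent cell, which is the property the report highlights.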

Minor comments:
Lines 25-26: I believe that one year is insufficient to capture groundwater behavior due to the longer residence times in these systems. Even in snowmelt-dominated catchments, additional memory may be necessary if snow accumulates between years. If you wish to retain this sentence, you must include a reference to support this assertion or refrain from mentioning specific processes.
Line 98: It would be helpful to provide a brief explanation of the example before presenting any values. For example, why are you using 351?
Lines 106-107: This section indicates that the value of 351 is arbitrary and that any other value could be used. If this is the case, does it imply that this value is a hyperparameter? How should it be estimated? Additionally, how do you determine the duration when dealing with hourly, daily, and monthly periods?
Line 159: You mentioned that the median KGE was similar, but what about the entire distribution (CDF)? If there are no significant differences, you could include the figure in the appendix. Did you consider extending the sequence beyond one year, particularly since you can now process longer sequences with reduced computational costs?
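
The distinction raised in the last comment, between a median KGE and the full distribution, can be made concrete with a short Python sketch. It uses the standard Kling-Gupta efficiency, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means. The basin names and values are made-up placeholders, not results from the paper.

```python
from math import sqrt
from statistics import fmean, median, pstdev


def kge(obs, sim):
    """Kling-Gupta efficiency of a simulated series (1 is a perfect score)."""
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Placeholder data: one observed series and three hypothetical basin simulations.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
simulations = {
    "basin_a": [1.0, 2.0, 3.0, 4.0, 5.0],   # perfect match
    "basin_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # doubled mean and variability
    "basin_c": [1.5, 2.5, 3.5, 4.5, 5.5],   # constant offset
}
scores = {name: kge(obs, sim) for name, sim in simulations.items()}
median_kge = median(scores.values())
cdf = empirical_cdf(scores.values())
print(median_kge, cdf)
```

Two models can share a median while differing in the tails of `cdf`, which is why comparing the whole curve is more informative than comparing a single summary value.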

Citation: https://doi.org/10.5194/egusphere-2024-3355-RC1
- AC1: 'Reply on RC1', Eduardo Acuna, 15 Jan 2025
  
  We want to thank the referee for the detailed evaluation of our paper. In the attached document, we answer the questions, comments and suggestions given.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3355-AC1
RC2:
'Comment on egusphere-2024-3355', Anonymous Referee #2, 19 Jan 2025

The technical note builds on previous work by Gauch et al. (2021) by presenting an improved multi-time-step / multi-frequency LSTM (MF-LSTM) architecture. The MF-LSTM is capable of handling inputs with different temporal resolutions and input variables within a single LSTM cell and provides streamflow predictions at high-resolution time steps with the same performance and computational efficiency as Gauch et al. (2021).
I think the paper is well-written, clearly structured and has the potential to advance multi-frequency LSTM applications. I think it fits the scope well and I support publication as a Technical Note in HESS.

I understand that the improvements to the LSTM simplify the structure and thus potentially improve the maintenance and flexibility of the LSTM code. However, I have difficulty identifying the major added value of the approach to the hydrological community:

The part "enhancing generality and simplicity of use" (Abstract, l.9) is, as I understand it, the main difference between previous work conducted by Gauch et al. (2021) and this Technical Note. I suggest elaborating more on this point in the paper as, in its current state, it is not clear to me why this is the case. If a single LSTM cell is able to handle the same data and processes as two cells, isn't that single LSTM cell becoming more complex? What is the tradeoff / advantage here? I provide further comments below that might help to show where I think more details could help in that regard.
l.15 - I would add that this is particularly the case for small, fast responding catchments.
l.18 - I know that different processes can be at play, but you might want to mention that shorter and flexible time steps are also a prerequisite for eventually being able to depict flash floods, which would also be a strong motivation.
l.50 - why two weeks? (also l.99). I see that it is explained in l.135; I suggest giving that explanation earlier.
l.98 - I acknowledge that the LSTM normalizes the data internally anyway and per se does not 'care' about the physical plausibility of the inputs vs outputs. But given the hydrological focus of the journal, it might make sense to mention that it is not necessary to use the sum instead of the average as an input for precipitation?
l.112-118 - This section is important for understanding the difference between the MTS-LSTM and the MF-LSTM. However, I find it hard to grasp the structural difference between these two approaches. Can you give more details on the structure of the two different LSTM cells (MTS-LSTM) vs the embedding networks (MF-LSTM)? I suggest focusing particularly on the advantage a user gains. Computationally, both approaches are similar, as you state later; both can handle the same temporal and variable flexibility, and both yield the same performance. For an end-user of the code you provide, the question arises why one should choose the MF-LSTM over the MTS-LSTM (with which a user might already be familiar). Is input data generation and provision simpler, and if so, how?
l.131ff - I think you should provide a few more details about the application of the MF-LSTM: (1) Did you conduct hyperparameter tuning and, if so, how do the hyperparameters compare to those of Gauch et al. (2021) (could you provide a small table comparing the hyperparameters)? If you transferred the hyperparameters from the previous study, is that plausible given the different architecture? (2) You don't mention the time period for which your results are presented. I assume you show the testing results?
l.137 - median: would that be the "median streamflow across the 10 models for each time step"? If yes, I would suggest adding this in brackets.
l.170 - Similar to my comment on l.112-118: you mention that there is no significant difference when processing a batch. I wonder why this is the case. You now have only a single LSTM cell while for the MTS and sMTS you have two. What, then, is the advantage of your approach over the previous architectures?
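
The point raised under l.98 about sum versus average can be checked with a small Python sketch. It assumes that aggregated inputs are standardized (zero mean, unit variance) before entering the network and that every aggregation window contains the same number of hours; the precipitation values and the helper `standardize` are made up for illustration. Under those assumptions the daily sum and the daily mean differ only by a constant factor of 24, which standardization removes.

```python
from statistics import fmean, pstdev

HOURS_PER_DAY = 24


def standardize(values):
    """Z-score a series using its own mean and population standard deviation."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Three days of synthetic hourly precipitation (assumed values, mm per hour).
hourly_precip = [
    [(day + 1) * 0.1 * (hour % 3) for hour in range(HOURS_PER_DAY)]
    for day in range(3)
]

daily_sum = [sum(day) for day in hourly_precip]              # physically a daily total
daily_mean = [total / HOURS_PER_DAY for total in daily_sum]  # average hourly rate

z_sum = standardize(daily_sum)
z_mean = standardize(daily_mean)
max_diff = max(abs(a - b) for a, b in zip(z_sum, z_mean))
print(max_diff < 1e-9)  # True
```

The equivalence would not hold exactly if windows had different lengths (for example, calendar months) or if a normalization other than a linear rescaling were applied.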

Language suggestions:
l.76 - correct "same experimental allowed"
l.87 - "observation that" could be replaced by something like "principle" ?
l.88 - "reservoir" I find "storage" more appropriate in this context
l.91 - "time-varying" wouldn't "time-independent" gating be more appropriate?
Figure 1 caption : "where one has" suggest to rephrase to "where the same ... are available" or "... exist"?
l.167 - "comparing the total training time ... influenced by external factors"? I think you want to say that "total training time is influenced by external factors"?

Citation: https://doi.org/10.5194/egusphere-2024-3355-RC2
- AC2: 'Reply on RC2', Eduardo Acuna, 23 Jan 2025
  
  We want to thank the referee for the detailed evaluation of our paper. In the attached document, we answer the questions, comments and suggestions given.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3355-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (further review by editor) (28 Jan 2025) by Fabrizio Fenicia

AR by Eduardo Acuna on behalf of the Authors (03 Feb 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (04 Feb 2025) by Fabrizio Fenicia

AR by Eduardo Acuna on behalf of the Authors (04 Feb 2025) Manuscript

Viewed

Total article views: 3,383 (including HTML, PDF, and XML)

HTML: 2,340
PDF: 954
XML: 89
Total: 3,383
BibTeX: 85
EndNote: 130

Views and downloads (calculated since 12 Dec 2024)

Month	HTML	PDF	XML	Total
Dec 2024	274	86	6	366
Jan 2025	282	76	12	370
Feb 2025	122	72	2	196
Mar 2025	166	84	6	256
Apr 2025	110	36	4	150
May 2025	138	60	0	198
Jun 2025	78	48	12	138
Jul 2025	84	38	0	122
Aug 2025	110	38	2	150
Sep 2025	350	42	4	396
Oct 2025	84	38	0	122
Nov 2025	68	104	10	182
Dec 2025	96	56	0	152
Jan 2026	106	34	16	156
Feb 2026	90	62	4	156
Mar 2026	96	52	6	154
Apr 2026	48	14	3	65
May 2026	32	11	2	45
Jun 2026	6	3	0	9
Jul 2026	0

Latest update: 08 Jul 2026

Short summary

Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art performance for rainfall-runoff hydrological modeling. However, most studies focus on daily-scale predictions, limiting the benefits of sub-daily (e.g. hourly) predictions in applications like flood forecasting. In this study, we introduce a new architecture, multi-frequency LSTM (MF-LSTM), designed to use input of various temporal frequencies to produce sub-daily (e.g. hourly) predictions at a moderate computational cost.


Total:	0
HTML:	0
PDF:	0
XML:	0