This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: An approach for handling multiple temporal frequencies with different input dimensions using a single LSTM cell
Abstract. Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art performance for rainfall-runoff hydrological modeling. However, most studies focus on daily-scale predictions, which limits the benefits that sub-daily (e.g. hourly) predictions could offer in applications such as flood forecasting. Moreover, training an LSTM exclusively on sub-daily data is computationally expensive and may lead to learning difficulties due to the extended sequence lengths. In this study, we introduce a new architecture, the multi-frequency LSTM (MF-LSTM), designed to use inputs of various temporal frequencies to produce sub-daily (e.g. hourly) predictions at a moderate computational cost. Building on two existing methods previously proposed by coauthors of this study, the MF-LSTM processes older inputs at coarser temporal resolutions than more recent ones. The MF-LSTM can handle different temporal frequencies, each with a different number of input dimensions, in a single LSTM cell, which enhances generality and simplicity of use. Our experiments, conducted on 516 basins from the CAMELS-US dataset, demonstrate that the MF-LSTM retains state-of-the-art performance while offering a simpler design. Moreover, the MF-LSTM architecture achieved a 5x reduction in processing time compared with models trained exclusively on hourly data.
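The central idea of the abstract, processing older inputs at coarser temporal resolutions than recent ones, can be illustrated with a minimal Python sketch that assembles a one-year lookback from a single hourly variable. Everything here is an illustrative assumption rather than the MF-LSTM code: the function name, the 0/1 frequency flags, the sum/mean aggregation switch, and the split into 351 daily steps plus 14 days of hourly steps.

```python
from typing import List, Sequence, Tuple

HOURS_PER_DAY = 24
DAILY, HOURLY = 0, 1  # hypothetical frequency flags


def build_multifrequency_sequence(
    hourly: Sequence[float], n_hourly_days: int, aggregate: str = "sum"
) -> List[Tuple[float, int]]:
    """Turn an hourly series into (value, frequency_flag) steps, oldest first.

    Days older than the last `n_hourly_days` are collapsed to one daily step;
    the most recent `n_hourly_days` days keep their hourly resolution.
    """
    if aggregate not in ("sum", "mean"):
        raise ValueError("aggregate must be 'sum' or 'mean'")
    if len(hourly) % HOURS_PER_DAY != 0:
        raise ValueError("the hourly series must cover whole days")
    n_days = len(hourly) // HOURS_PER_DAY
    if not 0 <= n_hourly_days <= n_days:
        raise ValueError("n_hourly_days must lie between 0 and the number of days")

    split = (n_days - n_hourly_days) * HOURS_PER_DAY  # index of the first hourly step
    steps: List[Tuple[float, int]] = []
    # Older part: one aggregated value per day.
    for start in range(0, split, HOURS_PER_DAY):
        total = sum(hourly[start:start + HOURS_PER_DAY])
        value = total if aggregate == "sum" else total / HOURS_PER_DAY
        steps.append((value, DAILY))
    # Recent part: every hourly value is kept.
    steps.extend((value, HOURLY) for value in hourly[split:])
    return steps


# One year of constant, illustrative hourly forcing: 351 daily + 14 * 24 hourly steps.
sequence = build_multifrequency_sequence([1.0] * (365 * HOURS_PER_DAY), n_hourly_days=14)
print(len(sequence))  # 687
```

The resulting sequence has 687 recurrent steps instead of 8760 for a purely hourly year; a real setup would carry several variables per step, possibly a different set for each frequency.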
Status: open (until 23 Jan 2025)
RC1: 'Comment on egusphere-2024-3355', Anonymous Referee #1, 06 Jan 2025
The paper addresses the challenge of producing sub-daily forecasts. In such cases, sub-daily inputs are used to achieve optimal performance. However, when longer dependencies are present, processing these data at sub-daily resolution can be quite time-consuming, as both sub-daily and monthly information may be required.
The authors introduce a simple and innovative approach to handle both short and long dependencies using the same LSTM model. They demonstrate that an LSTM can effectively manage data with different frequencies, without sacrificing performance, by incorporating a label that indicates the data frequency. Additionally, they show that an LSTM can accommodate varying numbers of inputs at different frequencies by including an embedding layer before the LSTM.
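The mechanism summarized above (a frequency label plus an embedding layer ahead of a single LSTM) can be sketched as a toy forward pass in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the class name, the purely linear embeddings, the one-hot flag concatenated after the embedding, and the random initialization are choices made here, and there is no training loop.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMFLSTM:
    """Toy single LSTM cell fed by frequency-specific linear embeddings (hypothetical)."""

    def __init__(self, input_dims: Dict[str, int], embed_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.freqs = sorted(input_dims)
        self.hidden_dim = hidden_dim
        # One embedding per frequency maps that frequency's own input size to embed_dim.
        self.embed = {f: (rand(embed_dim, d), [0.0] * embed_dim) for f, d in input_dims.items()}
        # The shared cell sees [embedding, one-hot frequency flag, previous hidden state].
        cell_in = embed_dim + len(self.freqs) + hidden_dim
        self.gates = {g: (rand(hidden_dim, cell_in), [0.0] * hidden_dim) for g in ("i", "f", "g", "o")}

    def step(self, x: Vector, freq: str, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        weights, bias = self.embed[freq]  # KeyError for an unknown frequency
        if len(x) != len(weights[0]):
            raise ValueError(f"{freq} input must have {len(weights[0])} features")
        flag = [1.0 if name == freq else 0.0 for name in self.freqs]
        z = linear(weights, bias, x) + flag + h
        inp = [sigmoid(v) for v in linear(*self.gates["i"], z)]
        forget = [sigmoid(v) for v in linear(*self.gates["f"], z)]
        cand = [math.tanh(v) for v in linear(*self.gates["g"], z)]
        out = [sigmoid(v) for v in linear(*self.gates["o"], z)]
        c = [fg * cv + ig * gv for fg, cv, ig, gv in zip(forget, c, inp, cand)]
        h = [og * math.tanh(cv) for og, cv in zip(out, c)]
        return h, c

    def forward(self, sequence: Sequence[Tuple[Vector, str]]) -> Vector:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x, freq in sequence:
            h, c = self.step(x, freq, h, c)
        return h


# Daily steps carry 3 features, hourly steps 2; both pass through the same cell.
model = TinyMFLSTM({"daily": 3, "hourly": 2}, embed_dim=4, hidden_dim=5)
sequence = [([1.0, 0.5, 0.2], "daily")] * 3 + [([0.1, 0.3], "hourly")] * 4
h = model.forward(sequence)
print(len(h))  # 5
```

The embeddings absorb the differing input sizes, so the cell weights are shared across frequencies and only the small embedding layers are frequency-specific.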
These findings apply to any forecasting problem involving multiple time dependencies, suggesting that the proposed approach could have widespread utility.
The paper is well-written, with clear results, and I believe it should be accepted with minor comments.
Minor comments:
Lines 25-26: I believe that one year is insufficient to capture groundwater behavior due to the longer residence times in these systems. Even in snowmelt-dominated catchments, additional memory may be necessary if snow accumulates over multiple years. If you wish to retain this sentence, you must include a reference to support this assertion or refrain from mentioning specific processes.
Line 98: It would be helpful to provide a brief explanation of the example before presenting any values. For example, why are you using 351?
Lines 106-107: This section indicates that the value of 351 is arbitrary and that any other value could be used. If this is the case, does it imply that this value is a hyperparameter? How should it be estimated? Additionally, how do you determine the duration when dealing with hourly, daily, and monthly periods?
Line 159: You mentioned that the median KGE was similar, but what about the entire distribution (CDF)? If there are no significant differences, you could include the figure in the appendix. Did you consider extending the sequence beyond one year, particularly since you can now process longer sequences with reduced computational costs?
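The questions above about the value 351 and about sequence durations can be made concrete with a short count of recurrent steps. This assumes, for illustration, a one-year (365-day) lookback in which the oldest days are fed as daily steps and the last $h$ days as hourly steps; the two-week hourly window mentioned elsewhere in the discussion corresponds to $h = 14$, which leaves 351 daily steps.

```latex
\begin{aligned}
N(h) &= \underbrace{(365 - h)}_{\text{daily steps}} + \underbrace{24\,h}_{\text{hourly steps}} = 365 + 23\,h,\\
N(14) &= 351 + 336 = 687, \qquad N(365) = 24 \times 365 = 8760 .
\end{aligned}
```

Under this reading the split point $h$ is a free choice, each extra hourly day adds 23 steps, and the mixed sequence is roughly 12.8 times shorter than a purely hourly one. This counts recurrent steps only; it is not an estimate of wall-clock training time.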
Citation: https://doi.org/10.5194/egusphere-2024-3355-RC1
AC1: 'Reply on RC1', Eduardo Acuna, 15 Jan 2025
We thank the referee for the detailed evaluation of our paper. In the attached document, we respond to the questions, comments and suggestions.
RC2: 'Comment on egusphere-2024-3355', Anonymous Referee #2, 19 Jan 2025
The technical note builds on previous work by Gauch et al. (2021) with an improved multi-time-step / multi-frequency LSTM (MF-LSTM) architecture. The MF-LSTM is capable of handling inputs with different temporal resolutions and input variables within a single LSTM cell, and it provides streamflow predictions at high-resolution time steps with the same performance and computational efficiency as Gauch et al. (2021).
I think the paper is well-written, clearly structured and has the potential to advance multi-frequency LSTM applications. I think it fits well within the scope of the journal, and I support publication as a Technical Note in HESS.
I understand that the improvements to the LSTM simplify the structure and thereby potentially the code maintenance and flexibility of the LSTM code. However, I have difficulty identifying the major added value of the approach for the hydrological community:
The part "enhancing generality and simplicity of use" (Abstract, l.9) is, as I understand it, the main difference between the previous work by Gauch et al. (2021) and this Technical Note. I suggest elaborating on this point in the paper since, in its current state, it is not clear to me why this is the case. If a single LSTM cell is able to handle the same data and processes as two cells, doesn't that single LSTM cell become more complex? What is the tradeoff or advantage here? I have provided further comments below that might help to show where I think more details would be useful in that regard.
l.15 - I would add that this is particularly the case for small, fast-responding catchments.
l.18 - I know that different processes can be at play, but you might want to mention that shorter and more flexible time steps are also a prerequisite for eventually being able to depict flash floods, which would also be a strong motivation.
l.50 - Why two weeks (also l.99)? I see that it is explained in l.135; I suggest giving that explanation earlier.
l.98 - I acknowledge that the LSTM normalizes the data internally anyway and per se does not 'care' about the physical plausibility of the inputs vs outputs. But given the hydrological focus of the journal, would it make sense to mention that it is not necessary to use the sum instead of the average as the precipitation input?
l.112-118 - This section is important for understanding the difference between the MTS-LSTM and the MF-LSTM. However, I find it hard to grasp the structural difference between these two approaches. Can you give more details on the structure of the two different LSTM cells (MTS-LSTM) vs the embedding networks (MF-LSTM)? I suggest focusing in particular on the advantage a user gains. Computationally, both approaches are similar, as you state later; both approaches can handle the same temporal and variable flexibility, and both approaches yield the same performance. For an end user of the code you provide, the question arises why they should choose the MF-LSTM over the MTS-LSTM (with which they might already be familiar). Is input data generation and provision simpler, and if so, how?
l.131ff - I think you should provide a few more details about the application of the MF-LSTM: (1) Did you conduct hyperparameter tuning and, if so, how do the hyperparameters compare with those of Gauch et al. (2021) (could you provide a small table comparing them)? If you transferred the hyperparameters from the previous study, is that plausible given the different architecture? (2) You don't mention for which time period your results are presented. I assume you show the testing results?
l.137 - Median: would that be the "median streamflow across the 10 models for each time step"? If so, I would suggest adding this in brackets.
l.170 - Similar to my comment on l.112-118: you mention that there is no significant difference between the architectures in the time needed to process a batch. I wonder why this is the case. You now have only a single LSTM cell, while the MTS and sMTS have two. What, then, is the advantage of your approach over the previous architectures?
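The l.98 comment about feeding daily precipitation as a sum or as an average can be checked with a small calculation. The sketch below assumes that each input series is standardized (z-scored) with its own mean and standard deviation before entering the network; the precipitation values are invented for illustration, and whether the MF-LSTM normalizes its inputs this way is not stated here.

```python
import math
import statistics
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize a series with its own mean and population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical hourly precipitation (mm/h) for three days, 24 values per day.
hourly = [0.0] * 20 + [1.0, 2.0, 0.5, 0.0] + [0.2] * 24 + [0.0] * 12 + [3.0] * 12
days = [hourly[i:i + 24] for i in range(0, len(hourly), 24)]

daily_sum = [sum(day) for day in days]        # daily total, mm/day
daily_mean = [sum(day) / 24 for day in days]  # mean intensity, mm/h

# The mean is the sum divided by 24; a positive rescaling cancels in the z-score.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(zscore(daily_sum), zscore(daily_mean))
)
print(same)  # True
```

This invariance only holds when the daily series is standardized on its own. If daily and hourly values of the same variable shared one input column and one set of normalization statistics, daily sums would be about 24 times larger than the hourly values while daily means would stay on the hourly scale, so the choice would matter in that setting.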
Language suggestions:
l.76 - correct "same experimental allowed"
l.87 - "observation that" could be replaced by something like "principle" ?
l.88 - "reservoir" I find "storage" more appropriate in this context
l.91 - "time-varying" wouldn't "time-independent" gating be more appropriate?
Figure 1 caption: "where one has" - suggest rephrasing to "where the same ... are available" or "... exist"?
l.167 - "comparing the total training time ... influenced by external factors"? I think you want to say that "total training time is influenced by external factors"?
Citation: https://doi.org/10.5194/egusphere-2024-3355-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
227 | 67 | 6 | 300 | 4 | 5