Towards Interpretable LSTM-based Modelling of Hydrological Systems

De la Fuente, Luis Andres; Ehsani, Mohammad Reza; Gupta, Hoshin Vijai; Condon, Laura E.

doi:https://doi.org/10.5194/egusphere-2023-666

Preprints

https://doi.org/10.5194/egusphere-2023-666

Preprints

20 Apr 2023

| 20 Apr 2023

Towards Interpretable LSTM-based Modelling of Hydrological Systems

Luis Andres De la Fuente, Mohammad Reza Ehsani, Hoshin Vijai Gupta, and Laura E. Condon

Abstract. Several studies have demonstrated the ability of Long Short-Term Memory (LSTM) machine learning based modeling to outperform traditional spatially lumped process-based modeling approaches for streamflow prediction. However, due mainly to the structural complexity of the LSTM network (which includes gating operations and sequential processing of the data), difficulties can arise when interpreting the internal processes and weights in the model.

Here, we propose and test a modification of LSTM architecture that represents internal system processes in a manner that is analogous to a hydrological reservoir. Our architecture, called HydroLSTM, simulates behaviors inherent in a dynamic system, such as sequential updating of the Markovian storage. Specifically, we modify how data is fed to the new representation to facilitate simultaneous access to past lagged inputs, thereby explicitly acknowledging the importance of trends and patterns in the data.

We compare the performance of the HydroLSTM and LSTM architectures using data from 10 hydro-climatically varied catchments. We further examine how the new architecture exploits the information in lagged inputs, for 588 catchments across the USA. The HydroLSTM-based models require fewer cell states to obtain similar performance to their LSTM-based counterparts. Further, the weights patterns associated with lagged input variables are interpretable and consistent with regional hydroclimatic characteristics (snowmelt-dominated, recent rainfall-dominated, and historical rainfall-dominated). These findings illustrate how the hydrological interpretability of LSTM-based models can be enhanced by appropriate architectural modifications that are physically and conceptually consistent with our understanding of the system.

Received: 05 Apr 2023 – Discussion started: 20 Apr 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Luis Andres De la Fuente, Mohammad Reza Ehsani, Hoshin Vijai Gupta, and Laura E. Condon

Status: closed

CC1:
'Comment on egusphere-2023-666', Grey Nearing, 22 Apr 2023

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-666/egusphere-2023-666-CC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2023-666-CC1
- AC1: 'Reply on CC1', Luis De La Fuente, 29 Apr 2023
  
  We would like to thank Grey Nearing for reviewing and commenting on our paper. We find this discussion very interesting and feel that it will enrich the final version of the paper.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC1
RC1:
'Comment on egusphere-2023-666', Tadd Bindas, 30 May 2023
Hello,
Thank you for the lovely preprint. I enjoyed reading your work and offer the following suggestions below. I believe the paper should be reconsidered for HESS, with major revisions, and look forward to reading the next submission.
Best,
Tadd Bindas
Does the paper address relevant scientific questions within the scope of HESS?

Yes.

Does the paper present novel concepts, ideas, tools, or data?

The concept proposed by their HydroLSTM model is novel. The authors are looking to add more interpretability to the LSTM architecture and get similar results with fewer cell-states using the HydroLSTM code they developed.

Are substantial conclusions reached?

I’m not sure. As a summary of my understanding of the paper: the results obtained from their first experiment showcase that a simplified LSTM framework (similar to our understanding of a reservoir) can use one cell to learn a relationship between inputted forcings. The second experiment shows how their model performs when compared to 588 CAMELS basin observations.

My confusion arises with how the authors train their HydroLSTM and LSTM in experiments 5 and 6. From what I’ve read, and understood from talks at conferences, LSTM models should be trained using all basin data, then tested at individual sites using either a PUB, PUR approach, or median NSE/KGE metric for all catchments. I do not believe the authors are doing this, thus, I am curious if training their HydroLSTM and LSTM models on all catchments would show the same results.

Are the scientific methods and assumptions valid and clearly outlined?

Yes. Table 1 does a good job of showing similarities between Storage and LSTM equations.

Are the results sufficient to support the interpretations and conclusions?

I believe more work needs to be done to validate the conclusion that HydroLSTM provides comparable performance with LSTM, but with added interpretability. A PUR or PUB experiment to see how a HydroLSTM trained on all CAMELS basins performs would be appreciated.

Is the description of experiments and calculations sufficiently complete and precise to allow their reproduction by fellow scientists (traceability of results)?

Almost. I still need clarification on the model training procedure.

Do the authors give proper credit to related work and clearly indicate their own new/original contribution?

Yes

Does the title clearly reflect the contents of the paper?

Yes

Does the abstract provide a concise and complete summary?

Yes

Is the overall presentation well-structured and clear?

Yes

Is the language fluent and precise?

Yes

Are mathematical formulae, symbols, abbreviations, and units correctly defined and used?

It would be appreciated to italicize all equations when in-line. It was hard to read/locate them amongst the text. There are also some repeated variable names (See the comments for an example).

Should any parts of the paper (text, formulae, figures, tables) be clarified, reduced, combined, or eliminated?

The model training could be a little clearer (Similar to the above comment).

Are the number and quality of references appropriate?

Yes

Is the amount and quality of supplementary material appropriate?

Yes

Major Comments:
Can you italicize all in-line variables and equations? It’s hard to determine which parts of the text describe equations/LSTM properties. In some cases, I’ve had to reread a paragraph multiple times to search for an equation I missed.

(Lines 245) Are any static attributes used in model training?

(Lines 252-259) I suggest swapping Calibration, Selection, and Evaluation periods with the training, validation, and testing periods within the parentheses. It looks like you are using the train, validation, and test verbiage throughout the paper, and only referring to calibration, selection, and evaluation periods once (Line 427) after being defined.

(Section 5.1) Would it be possible to include a PUR analysis rather than a 10-basin (PUB) holdout? So, rather than having two basins from each region, you would test on all gages within a snowmelt-dominant or Recent rainfall-dominant region. I believe this study would benefit from comparing how each LSTM performs on regions not included in the training set. This analysis would strengthen the claim that HydroLSTM has similar model performance to LSTM, but with heightened interpretability.

(Line 310) How many total catchments were included in the training period? It is mentioned in Section 6.1, but not in 5.1. Is it just one catchment?

(Line 428) From my understanding of the literature, the best-performing LSTM models are using forcings, and attributes, from all basins in their inputs. For example, if there are 588 catchments, all catchments would be included in the training set. Then, testing would be done on all catchments, to determine a median KGE. Is training HydroLSTM on all catchments, or using basin attributes, something you have explored? Is the optimal lag memory hyperparameter the reason against having an entire CAMELS-trained LSTM? More explanation would be appreciated.

(Section 6) Is it possible to add a comparison against an LSTM applied to a large sample of catchments?

(Section 6) Is it possible to add a PUB comparison to this section?

Minor Comments:
(Affiliations) The s in the United States is cut off

(Line 58) Is Expected Gradient supposed to be capitalized?

(Line 107, Line 120) The symbol for the output gate, and the time-constant value, are both o. This could lead to some confusion.

(Line 130-135) Physical state and informational state don’t need to be italicized.

(Table 1) Are the brackets supposed to be facing outward? (ex: o = ]0,1[)

(Line 212) Typo. There needs to be a space inside Wand

(Line 253) I believe you mean “Commonly referred to as Training, Validation, and Testing.” You used evaluation twice in this part.

(Line 286) You didn’t establish what a testing period is (see earlier comment for Line 253). Testing should be replaced with “Evaluation.”

(Line 304) The header “5 Experiment 1” reads weird. Maybe change to “5 First Experiment?”

(Figure 3) It may be clearer to the reader that rows are the Catchment Studied if you put the gage number on the row’s y-axis in bold above “Cells.”

(Line 417) Same as the above comment. Maybe replace this with 6 Second Experiment. The section title reads weird.

(Line 439) There is an unnecessary space before “However”
Citation: https://doi.org/10.5194/egusphere-2023-666-RC1
- AC2: 'Reply on RC1', Luis De La Fuente, 15 Jun 2023
  
  Dear Tadd,
  Thank you for your kind comments. We will incorporate your suggestions and clarify parts that were not clear enough.
  Best regards,
  
  De la Fuente et al.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC2
RC2:
'Comment on egusphere-2023-666', Anonymous Referee #2, 15 Jun 2023
Major Comments:
The structure of the manuscript may need to be optimized to make it easier to read. For instance, Section 2.3 is titled “Differences with the Hydrologic Reservoir”, but I am hard to get the differences between LSTM and the Hydrologic Reservoir; some terms, such as optimal lag memory, significant values for weights and so on, are not well defined in the manuscript; I am confused about how Experiment 2 helped with the topic.

The manuscript uses a long length to compare linear reservoir model and LSTM. However, the linear reservoir model is not used in the case study. Is it possible to use the linear reservoir model as the benchmark model to relate the parameters of the linear reservoir model and LSTM in order to discuss the physical meaning of LSTM parameters.

My primary concern is that the HydroLSTM/LSTM is not well validated. From Figure 4, the performance of HydroLSTM/LSTM is not very good. Only 5 out of 10 basins have a KGE value larger than 0.7. In some previous studies, most runoff simulations with a local LSTM can obtain a KGE greater than 0.7. I am concerned about whether the parameters and structure of the ML model are well set. Further, for a ML model with poor performance, the interpretability of the model does not seem to be very meaningful. Also, I wonder why the authors did not use more catchments to verify the reliability of the model. In my opinion, the summary of 588 catchments makes Figure 4 more credible.

From Figure 5, the uncertainty of the model parameters seems large. The large uncertainty of parameters may make the model less interpretable. I think it is necessary to explain the effect of parameter uncertainty on the model.

Why does Figure 5 use the logarithmic horizontal axis? If using a regular coordinate axis, I think it is hard to distinguish the fluctuation of precip weights between 0-10 days and after 10 days in the ID11473900 catchment. The fluctuation of Pot. Evapot. Weights in a regular coordinate axis seem to be a periodic variation in the ID9035900 catchment. The logarithmic horizontal axis may mislead readers into thinking that there is a trend from 0-1 days.

The manuscript analyses the physical meaning of the output gate. I'm wondering if the forget gate and input gate have a corresponding interpretation.

Why dose Experiment 2 classify the catchments with Aridity rather than catchment dominance factors in Experiment 1?

How is the optimal number of lagged days obtained in Experiment 2? From Figure 8, I think the optimal numbers of lagged days of the catchments with AI<0.6 and AI>1.0 are also 128 days. There needs to be a more discussion about the relationship between required memory time scales and aridity.

Minor Comments:
We usually use the Hydrologic Reservoir model rather than the “Hydrologic Reservoir”. Just a suggestion.

Table 2. It is necessary to explain the difference between “Recent” and “Historical”.

Figure 4. How to choose the “red *”?
Citation: https://doi.org/10.5194/egusphere-2023-666-RC2
- AC3: 'Reply on RC2', Luis De La Fuente, 06 Jul 2023
  
  Dear reviewer,
  Thank you for your kind comments. We will incorporate your feedback and address the clarifications you have highlighted.
  Best regards,
  De la Fuente et al.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC3
RC3:
'Comment on egusphere-2023-666', Anonymous Referee #3, 18 Jul 2023

egusphere-2023-666 “Towards Interpretable LSTM-based Modelling of Hydrological Systems” LA.De la Fuente, MR.Ehsani, HV.Gupta and LE.Condon
This article is really well written and constructed, making it an easy manuscript to read. I find little to critique about the presentation, but will talk in generalities from a hydrological modellers perspective..
Temporal weighting of lagged events is highly reminiscent of unit hydrograph theory and is potentially another interpretation of the weights obtained (e.g., Sherman 1932; Lienhard 1964; Rodriguez-Iturbe and Valdes 1979). Similarly, the number of cell states may be broadly associated with the number of linear reservoirs in series that produce these unit hydrographs (e.g., Ocak and Bayazit 2003). Clearly a unit hydrograph that partitions daily total runoff into an hourly signal (for estimating peak flow for example) is not the same, but the concepts are equivalent when representing inputs as a temporal output distributed with the memory of prior inputs.
The titles of Sections 5 and 6 should be more descriptive than “Experiment 1” and “Experiment 2”, maybe “Comparison of LSTM and HydroLSTM across hydrological regime” and “HydroLSTM performance with a single cell state”. These suggestions are very rough, but the opening paragraph of each section should follow logically from the section title and expand upon it. It is unclear how the 10 catchments in Section 5 were selected, but it interesting to observe that in each of the five hydro-climate regimes that LSTM had one low cell state result, and one much higher. Was this deliberate or simply to support the later message regarding spatial variability of lag, that definitive patterns of number of cell states or lag are difficult to establish based on hydro-climate for multiple cell state representations?
Please be careful in your equations that the hyperbolic tangent function “tanh” stays a single word and not split to “tan_^h” with a space, as I notice the language of tangent hyperbolic in the text.
Figure 6a has four clearly inferior lag times (4, 8, 16 and 32 days) with the other three (64, 128 and 256 days) being the same for all practical purposes for KGE>0.4. It is hard to reconcile in the text (S6.2) that the graph has “saturated” at lag=256 if very similar results are obtained with lag=64 and 128, and there are no lag values >256 to confirm it. With Figure 6b it is not surprising that modelling with a single cell state is more difficult with increased aridity, as this is well known in the standard hydrologic modelling literature (e.g., Pilgrim et al. 1988). It also seems that the general success of HydroLSTM with a single cell state alludes to the usefulness (success?) of simple lumped hydrologic models such as GR4J or SIMHYD with few parameters that may be physically interpretable.
The Discussion is very interesting and points to useful future endeavours, with multiple output criteria to make the possible solution space smaller and get the right answer (outputs) for the right reason (see Kirchner 2006). As far as groundwater characterisation for rainfall-runoff modelling, the lag times may be very different for a “land” cell state and a “groundwater” cell state and potentially exceed one calendar year. Statistical correlation methods for groundwater variation with lag measured in months to years such as HARTT (e.g., Ferdowsian et al. 2001; Goodarzi 2020) are abstract methods that rely on variability of the assumed controlling mechanism, and have also been compared with neural network implementations. Whether these might provide some information or inspiration for additional work is unknown.

Ferdowsian R, Pannell DJ, McCarron C, Ryder A and Crossing L (2001) Explaining groundwater hydrographs: separating atypical rainfall events from time trends. Australian Journal of Soil Research, 39(4), 861–875, doi: 10.1071/SR00037
Goodarzi M (2020) Application and performance evaluation of time series, neural networks and HARTT models in predicting groundwater level changes, Najafabad Plain, Iran. Sustainable Water Resources Management, 6, 67, doi: 10.1007/s40899-020-00427-2
Kirchner, J. W. (2006) Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology. Water Resources Research, 42, W03S04, doi:10.1029/2005WR004362
Lienhard JH (1964) A statistical mechanical prediction of the dimensionless unit hydrograph. Journal of Geophysical Research, 69(24), 5231-5238, doi: 10.1029/JZ069i024p05231
Ocak A and Bayazit M (2003) Linear Reservoirs in Series Model for Unit Hydrograph of Finite Duration. Turkish Journal of Engineering and Environmental Science, 27(2), 107-113, https://search.trdizin.gov.tr/en/yayin/detay/31619/
Pilgrim DH, Chapman TG and Doran DG (1988) Problems of rainfall-runoff modelling in arid and semiarid regions. Hydrological Sciences Journal, 33(4), 379-400, doi: 10.1080/02626668809491261
Rodriguez-Iturbe I and Valdes JB (1979) The geomorphic structure of hydrologic response. Water Resources Research, 15(6), 1409-1420, doi: 10.1029/WR015i006p01409
Sherman LK (1932) Streamflow from rainfall by the unit-graph method. Engineering News Record, 108, 501–505.

Citation: https://doi.org/10.5194/egusphere-2023-666-RC3
- AC4: 'Reply on RC3', Luis De La Fuente, 31 Jul 2023
  
  Thanks for your compliment. We appreciate your contribution and comments.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC4

Status: closed

CC1:
'Comment on egusphere-2023-666', Grey Nearing, 22 Apr 2023

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-666/egusphere-2023-666-CC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2023-666-CC1
- AC1: 'Reply on CC1', Luis De La Fuente, 29 Apr 2023
  
  We would like to thank Grey Nearing for reviewing and commenting on our paper. We find this discussion very interesting and feel that it will enrich the final version of the paper.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC1
RC1:
'Comment on egusphere-2023-666', Tadd Bindas, 30 May 2023
Hello,
Thank you for the lovely preprint. I enjoyed reading your work and offer the following suggestions below. I believe the paper should be reconsidered for HESS, with major revisions, and look forward to reading the next submission.
Best,
Tadd Bindas
Does the paper address relevant scientific questions within the scope of HESS?

Yes.

Does the paper present novel concepts, ideas, tools, or data?

The concept proposed by their HydroLSTM model is novel. The authors are looking to add more interpretability to the LSTM architecture and get similar results with fewer cell-states using the HydroLSTM code they developed.

Are substantial conclusions reached?

I’m not sure. As a summary of my understanding of the paper: the results obtained from their first experiment showcase that a simplified LSTM framework (similar to our understanding of a reservoir) can use one cell to learn a relationship between inputted forcings. The second experiment shows how their model performs when compared to 588 CAMELS basin observations.

My confusion arises with how the authors train their HydroLSTM and LSTM in experiments 5 and 6. From what I’ve read, and understood from talks at conferences, LSTM models should be trained using all basin data, then tested at individual sites using either a PUB, PUR approach, or median NSE/KGE metric for all catchments. I do not believe the authors are doing this, thus, I am curious if training their HydroLSTM and LSTM models on all catchments would show the same results.

Are the scientific methods and assumptions valid and clearly outlined?

Yes. Table 1 does a good job of showing similarities between Storage and LSTM equations.

Are the results sufficient to support the interpretations and conclusions?

I believe more work needs to be done to validate the conclusion that HydroLSTM provides comparable performance with LSTM, but with added interpretability. A PUR or PUB experiment to see how a HydroLSTM trained on all CAMELS basins performs would be appreciated.

Is the description of experiments and calculations sufficiently complete and precise to allow their reproduction by fellow scientists (traceability of results)?

Almost. I still need clarification on the model training procedure.

Do the authors give proper credit to related work and clearly indicate their own new/original contribution?

Yes

Does the title clearly reflect the contents of the paper?

Yes

Does the abstract provide a concise and complete summary?

Yes

Is the overall presentation well-structured and clear?

Yes

Is the language fluent and precise?

Yes

Are mathematical formulae, symbols, abbreviations, and units correctly defined and used?

It would be appreciated to italicize all equations when in-line. It was hard to read/locate them amongst the text. There are also some repeated variable names (See the comments for an example).

Should any parts of the paper (text, formulae, figures, tables) be clarified, reduced, combined, or eliminated?

The model training could be a little clearer (Similar to the above comment).

Are the number and quality of references appropriate?

Yes

Is the amount and quality of supplementary material appropriate?

Yes

Major Comments:
Can you italicize all in-line variables and equations? It’s hard to determine which parts of the text describe equations/LSTM properties. In some cases, I’ve had to reread a paragraph multiple times to search for an equation I missed.

(Lines 245) Are any static attributes used in model training?

(Lines 252-259) I suggest swapping Calibration, Selection, and Evaluation periods with the training, validation, and testing periods within the parentheses. It looks like you are using the train, validation, and test verbiage throughout the paper, and only referring to calibration, selection, and evaluation periods once (Line 427) after being defined.

(Section 5.1) Would it be possible to include a PUR analysis rather than a 10-basin (PUB) holdout? So, rather than having two basins from each region, you would test on all gages within a snowmelt-dominant or Recent rainfall-dominant region. I believe this study would benefit from comparing how each LSTM performs on regions not included in the training set. This analysis would strengthen the claim that HydroLSTM has similar model performance to LSTM, but with heightened interpretability.

(Line 310) How many total catchments were included in the training period? It is mentioned in Section 6.1, but not in 5.1. Is it just one catchment?

(Line 428) From my understanding of the literature, the best-performing LSTM models are using forcings, and attributes, from all basins in their inputs. For example, if there are 588 catchments, all catchments would be included in the training set. Then, testing would be done on all catchments, to determine a median KGE. Is training HydroLSTM on all catchments, or using basin attributes, something you have explored? Is the optimal lag memory hyperparameter the reason against having an entire CAMELS-trained LSTM? More explanation would be appreciated.

(Section 6) Is it possible to add a comparison against an LSTM applied to a large sample of catchments?

(Section 6) Is it possible to add a PUB comparison to this section?

Minor Comments:
(Affiliations) The s in the United States is cut off

(Line 58) Is Expected Gradient supposed to be capitalized?

(Line 107, Line 120) The symbol for the output gate, and the time-constant value, are both o. This could lead to some confusion.

(Line 130-135) Physical state and informational state don’t need to be italicized.

(Table 1) Are the brackets supposed to be facing outward? (ex: o = ]0,1[)

(Line 212) Typo. There needs to be a space inside Wand

(Line 253) I believe you mean “Commonly referred to as Training, Validation, and Testing.” You used evaluation twice in this part.

(Line 286) You didn’t establish what a testing period is (see earlier comment for Line 253). Testing should be replaced with “Evaluation.”

(Line 304) The header “5 Experiment 1” reads weird. Maybe change to “5 First Experiment?”

(Figure 3) It may be clearer to the reader that rows are the Catchment Studied if you put the gage number on the row’s y-axis in bold above “Cells.”

(Line 417) Same as the above comment. Maybe replace this with 6 Second Experiment. The section title reads weird.

(Line 439) There is an unnecessary space before “However”
Citation: https://doi.org/10.5194/egusphere-2023-666-RC1
- AC2: 'Reply on RC1', Luis De La Fuente, 15 Jun 2023
  
  Dear Tadd,
  Thank you for your kind comments. We will incorporate your suggestions and clarify parts that were not clear enough.
  Best regards,
  
  De la Fuente et al.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC2
RC2:
'Comment on egusphere-2023-666', Anonymous Referee #2, 15 Jun 2023
Major Comments:
The structure of the manuscript may need to be optimized to make it easier to read. For instance, Section 2.3 is titled “Differences with the Hydrologic Reservoir”, but I am hard to get the differences between LSTM and the Hydrologic Reservoir; some terms, such as optimal lag memory, significant values for weights and so on, are not well defined in the manuscript; I am confused about how Experiment 2 helped with the topic.

The manuscript uses a long length to compare linear reservoir model and LSTM. However, the linear reservoir model is not used in the case study. Is it possible to use the linear reservoir model as the benchmark model to relate the parameters of the linear reservoir model and LSTM in order to discuss the physical meaning of LSTM parameters.

My primary concern is that the HydroLSTM/LSTM is not well validated. From Figure 4, the performance of HydroLSTM/LSTM is not very good. Only 5 out of 10 basins have a KGE value larger than 0.7. In some previous studies, most runoff simulations with a local LSTM can obtain a KGE greater than 0.7. I am concerned about whether the parameters and structure of the ML model are well set. Further, for a ML model with poor performance, the interpretability of the model does not seem to be very meaningful. Also, I wonder why the authors did not use more catchments to verify the reliability of the model. In my opinion, the summary of 588 catchments makes Figure 4 more credible.

From Figure 5, the uncertainty of the model parameters seems large. The large uncertainty of parameters may make the model less interpretable. I think it is necessary to explain the effect of parameter uncertainty on the model.

Why does Figure 5 use the logarithmic horizontal axis? If using a regular coordinate axis, I think it is hard to distinguish the fluctuation of precip weights between 0-10 days and after 10 days in the ID11473900 catchment. The fluctuation of Pot. Evapot. Weights in a regular coordinate axis seem to be a periodic variation in the ID9035900 catchment. The logarithmic horizontal axis may mislead readers into thinking that there is a trend from 0-1 days.

The manuscript analyses the physical meaning of the output gate. I'm wondering if the forget gate and input gate have a corresponding interpretation.

Why dose Experiment 2 classify the catchments with Aridity rather than catchment dominance factors in Experiment 1?

How is the optimal number of lagged days obtained in Experiment 2? From Figure 8, I think the optimal numbers of lagged days of the catchments with AI<0.6 and AI>1.0 are also 128 days. There needs to be a more discussion about the relationship between required memory time scales and aridity.

Minor Comments:
We usually use the Hydrologic Reservoir model rather than the “Hydrologic Reservoir”. Just a suggestion.

Table 2. It is necessary to explain the difference between “Recent” and “Historical”.

Figure 4. How to choose the “red *”?
Citation: https://doi.org/10.5194/egusphere-2023-666-RC2
- AC3: 'Reply on RC2', Luis De La Fuente, 06 Jul 2023
  
  Dear reviewer,
  Thank you for your kind comments. We will incorporate your feedback and address the clarifications you have highlighted.
  Best regards,
  De la Fuente et al.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC3
RC3:
'Comment on egusphere-2023-666', Anonymous Referee #3, 18 Jul 2023

egusphere-2023-666 “Towards Interpretable LSTM-based Modelling of Hydrological Systems” LA.De la Fuente, MR.Ehsani, HV.Gupta and LE.Condon
This article is really well written and constructed, making it an easy manuscript to read. I find little to critique about the presentation, but will talk in generalities from a hydrological modellers perspective..
Temporal weighting of lagged events is highly reminiscent of unit hydrograph theory and is potentially another interpretation of the weights obtained (e.g., Sherman 1932; Lienhard 1964; Rodriguez-Iturbe and Valdes 1979). Similarly, the number of cell states may be broadly associated with the number of linear reservoirs in series that produce these unit hydrographs (e.g., Ocak and Bayazit 2003). Clearly a unit hydrograph that partitions daily total runoff into an hourly signal (for estimating peak flow for example) is not the same, but the concepts are equivalent when representing inputs as a temporal output distributed with the memory of prior inputs.
The titles of Sections 5 and 6 should be more descriptive than “Experiment 1” and “Experiment 2”, maybe “Comparison of LSTM and HydroLSTM across hydrological regime” and “HydroLSTM performance with a single cell state”. These suggestions are very rough, but the opening paragraph of each section should follow logically from the section title and expand upon it. It is unclear how the 10 catchments in Section 5 were selected, but it interesting to observe that in each of the five hydro-climate regimes that LSTM had one low cell state result, and one much higher. Was this deliberate or simply to support the later message regarding spatial variability of lag, that definitive patterns of number of cell states or lag are difficult to establish based on hydro-climate for multiple cell state representations?
Please be careful in your equations that the hyperbolic tangent function “tanh” stays a single word and not split to “tan_^h” with a space, as I notice the language of tangent hyperbolic in the text.
Figure 6a has four clearly inferior lag times (4, 8, 16 and 32 days) with the other three (64, 128 and 256 days) being the same for all practical purposes for KGE>0.4. It is hard to reconcile in the text (S6.2) that the graph has “saturated” at lag=256 if very similar results are obtained with lag=64 and 128, and there are no lag values >256 to confirm it. With Figure 6b it is not surprising that modelling with a single cell state is more difficult with increased aridity, as this is well known in the standard hydrologic modelling literature (e.g., Pilgrim et al. 1988). It also seems that the general success of HydroLSTM with a single cell state alludes to the usefulness (success?) of simple lumped hydrologic models such as GR4J or SIMHYD with few parameters that may be physically interpretable.
The Discussion is very interesting and points to useful future endeavours, with multiple output criteria to make the possible solution space smaller and get the right answer (outputs) for the right reason (see Kirchner 2006). As far as groundwater characterisation for rainfall-runoff modelling, the lag times may be very different for a “land” cell state and a “groundwater” cell state and potentially exceed one calendar year. Statistical correlation methods for groundwater variation with lag measured in months to years such as HARTT (e.g., Ferdowsian et al. 2001; Goodarzi 2020) are abstract methods that rely on variability of the assumed controlling mechanism, and have also been compared with neural network implementations. Whether these might provide some information or inspiration for additional work is unknown.

Ferdowsian R, Pannell DJ, McCarron C, Ryder A and Crossing L (2001) Explaining groundwater hydrographs: separating atypical rainfall events from time trends. Australian Journal of Soil Research, 39(4), 861–875, doi: 10.1071/SR00037
Goodarzi M (2020) Application and performance evaluation of time series, neural networks and HARTT models in predicting groundwater level changes, Najafabad Plain, Iran. Sustainable Water Resources Management, 6, 67, doi: 10.1007/s40899-020-00427-2
Kirchner, J. W. (2006) Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology. Water Resources Research, 42, W03S04, doi:10.1029/2005WR004362
Lienhard JH (1964) A statistical mechanical prediction of the dimensionless unit hydrograph. Journal of Geophysical Research, 69(24), 5231-5238, doi: 10.1029/JZ069i024p05231
Ocak A and Bayazit M (2003) Linear Reservoirs in Series Model for Unit Hydrograph of Finite Duration. Turkish Journal of Engineering and Environmental Science, 27(2), 107-113, https://search.trdizin.gov.tr/en/yayin/detay/31619/
Pilgrim DH, Chapman TG and Doran DG (1988) Problems of rainfall-runoff modelling in arid and semiarid regions. Hydrological Sciences Journal, 33(4), 379-400, doi: 10.1080/02626668809491261
Rodriguez-Iturbe I and Valdes JB (1979) The geomorphic structure of hydrologic response. Water Resources Research, 15(6), 1409-1420, doi: 10.1029/WR015i006p01409
Sherman LK (1932) Streamflow from rainfall by the unit-graph method. Engineering News Record, 108, 501–505.

Citation: https://doi.org/10.5194/egusphere-2023-666-RC3
- AC4: 'Reply on RC3', Luis De La Fuente, 31 Jul 2023
  
  Thanks for your compliment. We appreciate your contribution and comments.
  
  Citation: https://doi.org/10.5194/egusphere-2023-666-AC4

Luis Andres De la Fuente, Mohammad Reza Ehsani, Hoshin Vijai Gupta, and Laura E. Condon

Viewed

Total article views: 1,374 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
908	422	44	1,374	31	27

HTML: 908
PDF: 422
XML: 44
Total: 1,374
BibTeX: 31
EndNote: 27

Views and downloads (calculated since 20 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	314	118	10	442
May 2023	98	39	0	137
Jun 2023	83	23	7	113
Jul 2023	96	25	9	130
Aug 2023	69	37	0	106
Sep 2023	22	29	1	52
Oct 2023	32	29	2	63
Nov 2023	6	6	1	13
Dec 2023	22	27	2	51
Jan 2024	38	27	2	67
Feb 2024	17	7	1	25
Mar 2024	22	15	1	38
Apr 2024	25	15	2	42
May 2024	15	12	2	29
Jun 2024	40	8	2	50
Jul 2024	9	5	2	16

Cumulative views and downloads (calculated since 20 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	314	118	10	442
May 2023	98	39	0	137
Jun 2023	83	23	7	113
Jul 2023	96	25	9	130
Aug 2023	69	37	0	106
Sep 2023	22	29	1	52
Oct 2023	32	29	2	63
Nov 2023	6	6	1	13
Dec 2023	22	27	2	51
Jan 2024	38	27	2	67
Feb 2024	17	7	1	25
Mar 2024	22	15	1	38
Apr 2024	25	15	2	42
May 2024	15	12	2	29
Jun 2024	40	8	2	50
Jul 2024	9	5	2	16

Viewed (geographical distribution)

Total article views: 1,364 (including HTML, PDF, and XML) Thereof 1,364 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 26 Jul 2024

Short summary

Long Short-Term Memory (LSTM) is a widely-used machine learning (ML) model in hydrology. However, it is difficult to extract knowledge from it. We propose HydroLSTM which represents processes analogous to a hydrological reservoir. Models using HydroLSTM perform similarly to LSTM but require fewer cell states. The learned parameters are informative about the dominant hydroclimatic characteristics of a catchment. Our results demonstrate how hydrological knowledge is encoded in the new structure.


Total:	0
HTML:	0
PDF:	0
XML:	0