An explainable deep learning model based on hydrological principles for flood simulation and forecasting

Xiang, Xin; Guo, Shenglian; Li, Chenglong; Wang, Yun

doi:10.5194/egusphere-2025-279

Preprints

https://doi.org/10.5194/egusphere-2025-279

07 Feb 2025

Abstract. Deep learning (DL) models generally perform well in hydrological simulation but lack a physical basis. To address this gap, we integrate the runoff generation and flow routing principles of the Xinanjiang (XAJ) model into the architecture of recurrent neural network (RNN) units and establish a physically based XAJRNN neural network layer. This layer is then fused with LSTM layers to construct an explainable deep learning (EDL) model, which was tested in the Lushui River and Qingjiang River basins in China. Compared with benchmark models, the proposed EDL model performs very well: the average Nash-Sutcliffe efficiency (NSE) values for the two basins are 0.98 and 0.94, respectively. The small flood peak relative errors (PRE) and peak timing differences (∆T) close to zero demonstrate that the EDL model can accurately simulate flood events. Notably, the EDL model not only improves simulation accuracy over ordinary DL models but also enhances interpretability by incorporating physical principles, thereby offering fresh insights into the fusion of DL and hydrological models for flood simulation and forecasting.

Received: 21 Jan 2025 – Discussion started: 07 Feb 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 3309 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

19 Dec 2025

An explainable deep learning model based on hydrological principles for flood simulation and forecasting

Xin Xiang, Shenglian Guo, Chenglong Li, and Yun Wang

Hydrol. Earth Syst. Sci., 29, 7217–7239, https://doi.org/10.5194/hess-29-7217-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-279', Anonymous Referee #1, 11 Mar 2025
The paper is well-structured and provides a solid foundation. However, there are a few suggestions for improvement regarding this study:
Lines 30–31 mention the limitations of traditional hydrological models. More details on these limitations should be provided.

Line 51 contains a typo: “As an RNN subset in DL.”

Figure 1 shows that the rainfall gauge color is too similar to the elevation color. Please use a different color for better distinction. Besides that, the elevations of the two basins can be unified in color scale to make it easier to compare the terrain differences.

In Section 2.2, please mention that the time step of the data differs between the two basins. For example, the rainfall data for the Lushui River basin have a 3 h time step, whereas those for the Qingjiang River basin have a 6 h time step.

The Xinanjiang (XAJ) hydrological model should be explicitly mentioned in Line 182.

Equation 2 and Figure 3 present the proposed XAJRNN layer. How are its parameters optimized?

Lines 247-248 state that “The choice of LSTM is based on the numerous studies demonstrating its ability to improve the performance of hydrological model simulations.” The authors should add references to support this statement.

Figure 4 illustrates the structure of the EDL model. In the XAJRNN layer, the model outputs actual evapotranspiration (Et), areal mean free water storage (S0), areal mean tension water storage (W), and basin outflow discharge (Q). Is this output generated through supervised learning? Subsequently, in the LSTM layer, the model produces runoff, which may also be trained using supervised learning. Does it make sense to train the model twice?

Is there a typo in Line 296, “and vis vasa.”? (vice versa?)

The RMSE values of the two basins in Table 1 are quite different. Please explain the results based on data statistics.

Table 1 presents the model performance. During the testing phase, the LSTM model achieved better NSE, RE, and RMSE for the Qingjiang River. What could be the reason for this?

What do the colors in the scatter plots of Figure 5 and Figure 6 represent? Please add a legend.

Figure 5 shows that there are flood events exceeding 3000 m³/s during the test period, while there are fewer flood events exceeding 3000 m³/s during the training period. Please explain whether this is the reason why the scatter points are below the 1:1 ideal line in the high flow range.

Table 2 indicates that the ∆𝑇 of XAJ and LSTM model are more than one day in 20170702 event. Is there a calculation error since the NSE is 0.93 for LSTM?

There is a problem with the statements for Lines 394-396. The discrepancies in the rising speed during the flood rising phase compared to the observations may be due to the slow response of the model to rainfall rather than to the models' insufficient ability to simulate low flow conditions.

Lines 401-403 mention that all three models underestimated the peak flow, and the simulated peak was significantly delayed compared to the observed peak, especially under complex terrain conditions. Please select stations with complex terrain and simple terrain for result comparisons to illustrate the impact of terrain on model simulations.

The author should add some statements about the simulated time horizon (e.g. T+1, T+2, …).
Citation: https://doi.org/10.5194/egusphere-2025-279-RC1
- AC1: 'Reply on RC1', Shenglian Guo, 21 Mar 2025
  
  Reply to Reviewers’ comments (Reviewer#1)
  The paper is well-structured and provides a solid foundation. However, there are a few suggestions for improvement regarding this study:
  Response: We thank the reviewer for his/her time in reviewing our manuscript and providing comprehensive suggestions for further improvements. Below is our detailed response to the reviewers' comments and suggestions.
  (1) Lines 30–31 mention the limitations of traditional hydrological models. More details on these limitations should be provided.
  Response: Thank you for this suggestion. Traditional hydrological models have several limitations in simulating hydrological processes. First, they struggle to accurately capture the complex nonlinear relationships in hydrological processes, especially under extreme weather events or in the context of climate change, which limits their applicability. Additionally, traditional hydrological models rely on predefined mathematical equations and assumptions, making them less adaptable to environmental changes such as land use alterations and human activities. At the same time, these models have limited consideration for spatial heterogeneity, often employing simplified approaches that overlook local hydrological characteristics, thereby affecting simulation accuracy. Lastly, physics-based hydrological models require high computational costs, restricting their application in large-scale and long-term simulations, while conceptual models, despite being computationally efficient, generally have lower accuracy and applicability. Therefore, traditional hydrological models face significant challenges in dealing with complex hydrological processes, extreme events, and rapidly changing environmental conditions.
  (2) Line 51 contains a typo: “As an RNN subset in DL.”
  Response: Thank you for pointing this out. This is indeed a typo. The revised sentence now reads: “As a subset of RNN in DL, …”.
  (3) Figure 1 shows that the rainfall gauge color is too similar to the elevation color. Please use a different color for better distinction. Besides that, the elevations of the two basins can be unified in color scale to make it easier to compare the terrain differences.
  Response: Thank you for your valuable suggestions. We have modified Figure 1 to make the color of the rain gauges more distinguishable. Additionally, we have unified the elevation color scale of the two basins to make it easier to compare the terrain differences.
  (4) In Section 2.2, please mention that the time step of the data differs between the two basins. For example, the rainfall data for the Lushui River basin have a 3 h time step, whereas those for the Qingjiang River basin have a 6 h time step.
  Response: Thank you for your valuable suggestions. The timesteps of the two basin datasets in this study are different. Therefore, we add the following statement to explain this issue.
  It should be noted that the time step of these data is 3 h in the Lushui River basin, whereas it is 6 h in the Qingjiang River basin.
  (5) The Xinanjiang (XAJ) hydrological model should be explicitly mentioned in Line 182.
  Response: Thank you for your valuable comment. We agree that the XAJ hydrological model should be explicitly mentioned in Line 182. The revised text now reads: The XAJ hydrological model is a classic conceptual hydrological model that is widely used in basin hydrological simulation and water resource management. The model was first proposed by Chinese scholars in the 1970s (Zhao, 1992, 1993), with the aim of simulating regional hydrological processes. The core idea of the XAJ model is to describe the hydrological processes within a basin by combining physical processes and empirical formulas. The XAJ model consists mainly of modules for evapotranspiration, runoff generation, runoff separation, and flow routing.
  Zhao, R.: The Xinanjiang model applied in China, J. Hydrol., 135, 371–381, https://doi.org/10.1016/0022-1694(92)90096-E, 1992.
  Zhao, R.: A non-linear system model for basin concentration, J. Hydrol., 142, 477–482, https://doi.org/10.1016/0022-1694(93)90024-4, 1993.
  (6) Equation 2 and Figure 3 present the proposed XAJRNN layer. How are its parameters optimized?
  Response: Thank you for your question. Following the general optimization methods of deep learning models, the parameters of the XAJRNN layer are optimized by gradient descent, specifically with the Adam optimizer, to minimize the loss function. In our study, the loss function is NSE, which measures the agreement between the model’s simulations and the observed values. We tune hyperparameters such as the learning rate and batch size and select the best configuration, thereby optimizing the model's performance.
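The optimization described in this response can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the model is a hypothetical one-parameter relation q = k · p rather than the XAJRNN layer, the loss is taken as 1 − NSE so that minimizing it maximizes NSE (the response says only that the loss function is NSE), and the Adam update is written out by hand with the commonly used defaults β1 = 0.9, β2 = 0.999 and an assumed learning rate.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def fit_k_with_adam(p, q_obs, k0=0.1, lr=0.05, steps=500,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit the toy model q = k * p by minimising the loss 1 - NSE with Adam."""
    k, m, v = k0, 0.0, 0.0
    denom = np.sum((q_obs - q_obs.mean()) ** 2)
    for t in range(1, steps + 1):
        resid = k * p - q_obs
        grad = 2.0 * np.sum(resid * p) / denom      # d(1 - NSE)/dk
        m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)              # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        k -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return k

# Synthetic data (hypothetical): rainfall p and flow generated with k = 0.6.
rng = np.random.default_rng(0)
p = rng.gamma(shape=2.0, scale=5.0, size=200)
q_obs = 0.6 * p
k_fit = fit_k_with_adam(p, q_obs)
```

In a full model the same loss would be back-propagated through all layers by an automatic differentiation framework; the hand-written gradient here only makes the update rule visible.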
  (7) Lines 247-248 state that “The choice of LSTM is based on the numerous studies demonstrating its ability to improve the performance of hydrological model simulations.” The authors should add references to support this statement.
  Response: Thank you for the suggestion. In support of the statement regarding the choice of LSTM in lines 247-248, several studies demonstrate that LSTM performs well in hydrological modelling and can effectively improve simulation accuracy. For example, Alizadeh et al. (2021) demonstrated that the SAINA-LSTM model outperforms EnsPost and MS-EnsPost in the low, medium, and high flow ranges and at forecast horizons of 1 to 7 days, and that it significantly reduces the root mean square error of flow predictions. Additionally, Xu et al. (2022) combined the particle swarm optimization (PSO) algorithm with the LSTM model to obtain the PSO-LSTM model. Their results show that the PSO-LSTM model outperforms the artificial neural network (ANN) and PSO-ANN models at all stations in the basin.
  Alizadeh, B., Ghaderi Bafti, A., Kamangir, H., Zhang, Y., Wright, D.B., Franz, K.J., 2021. A novel attention-based LSTM cell post-processor coupled with Bayesian optimization for streamflow prediction. J. Hydrol. 601, 126526. https://doi.org/10.1016/j.jhydrol.2021.126526
  Xu, Y., Hu, C., Wu, Q., Jian, S., Li, Z., Chen, Y., Zhang, G., Zhang, Z., Wang, S., 2022. Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. J. Hydrol. 608, 127553. https://doi.org/10.1016/j.jhydrol.2022.127553
  (8) Figure 4 illustrates the structure of the EDL model. In the XAJRNN layer, the model outputs actual evapotranspiration (Et), areal mean free water storage (S0), areal mean tension water storage (W), and basin outflow discharge (Q). Is this output generated through supervised learning? Subsequently, in the LSTM layer, the model produces runoff, which may also be trained using supervised learning. Does it make sense to train the model twice?
  Response: Thank you for your question. In our model, the training of the XAJRNN layer and the LSTM layer is complementary and not an independent "two-step training" process. The XAJRNN layer simulates actual evapotranspiration (E_t), water storage (S₀, W), and basin outflow (Q) based on input data through supervised learning. These intermediate outputs then serve as the input for the Normalization layer and LSTM layer, which further processes them to simulate runoff. The training process of the entire model is integrated, and the parameters of both the XAJRNN and LSTM layers are jointly optimized to improve simulation accuracy. Therefore, even though there are multiple layers of output, their training is interconnected, and there is no issue of redundant training.
  (9) Is there a typo in Line 296, “and vis vasa.”? (vice versa?)
  Response: Thank you for your careful observation. Indeed, "and vis vasa" in line 296 is a typo; it should read "vice versa".
  (10) The RMSE values of the two basins in Table 1 are quite different. Please explain the results based on data statistics.
  Response: Thank you for your question. The significant difference in RMSE between the two basins is primarily due to the disparity in their annual average flows. Based on statistical calculations, the annual average flow of the Qingjiang River basin is 272 m³/s, while that of the Lushui River basin is only 93 m³/s. Additionally, the overall simulation performance in the Lushui River basin is better than in the Qingjiang River basin, which contributes to the higher RMSE observed in the Qingjiang River basin.
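The scale effect mentioned in this response can be checked with a short sketch. It assumes a synthetic hydrograph shape and an identical 10 % relative error in both basins (both hypothetical); only the two mean flows, 272 m³/s and 93 m³/s, come from the response. Under these assumptions RMSE scales with the mean flow, while RMSE divided by the mean flow is the same in both basins.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

# Hypothetical unit-mean hydrograph shape shared by both basins.
rng = np.random.default_rng(42)
shape = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
shape /= shape.mean()

rel_err = 1.10  # assumed: every simulated value is 10 % too high

# Mean flows (m3/s) quoted in the response.
mean_flows = {"Qingjiang": 272.0, "Lushui": 93.0}

results = {}
for basin, qbar in mean_flows.items():
    obs = qbar * shape
    sim = rel_err * obs
    e = rmse(obs, sim)
    results[basin] = {"rmse": e, "nrmse": e / obs.mean()}

ratio = results["Qingjiang"]["rmse"] / results["Lushui"]["rmse"]
```

Here `ratio` equals 272/93, about 2.9, purely because of the difference in flow magnitude. The sketch isolates that effect only; it says nothing about the second factor in the response, the difference in simulation quality between the basins.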
  (11) Table 1 presents the model performance. During the testing phase, the LSTM model achieved better NSE, RE, and RMSE for the Qingjiang River. What could be the reason for this?
  Response: Thank you for your questions. During the testing phase, the LSTM model demonstrated better simulation performance in the Qingjiang River basin, primarily due to the close integration of our EDL model with the XAJ model. Specifically, the XAJRNN layer in the EDL model adopts the runoff generation and routing principles of the XAJ model, making the model’s performance closely related to the simulation accuracy of the XAJ model. When the XAJ model performs well, the EDL model also improves accordingly; conversely, if the XAJ model’s performance is suboptimal, the EDL model is similarly affected.
  (12) What do the colors in the scatter plots of Figure 5 and Figure 6 represent? Please add a legend.
  Response: Thank you for your suggestion. The colors in Figures 5 and 6 represent the density of scatter point distribution, where higher density corresponds to higher color intensity. To clarify this, we add a legend.
  (13) Figure 5 shows that there are flood events exceeding 3000 m³/s during the test period, while there are fewer flood events exceeding 3000 m³/s during the training period. Please explain whether this is the reason why the scatter points are below the 1:1 ideal line in the high flow range.
  Response: Thank you for your suggestion. We agree with your point that multiple flood events with flow exceeding 3,000 m³/s occurred during the test period, while such events were relatively rare during the training period. This indicates that the model had limited training in the high-flow range, leading to scatter points in this range being positioned below the 1:1 ideal line. Below is the statement we added to explain this issue.
  In Figure 5, for the EDL model and LSTM model, the reason that the scatter points fall below the 1:1 ideal line in the high flow range may be due to the fact that during the training period, there were few flow values exceeding 3000 m³/s, while in the test period, there were relatively more high flows exceeding 3000 m³/s.
  (14) Table 2 indicates that the ∆𝑇 of XAJ and LSTM model are more than one day in 20170702 event. Is there a calculation error since the NSE is 0.93 for LSTM?
  Response: Thank you for your question. In the 20170702 flood event, the ∆𝑇 of both the XAJ model and the LSTM model exceeded one day, yet the NSE of the LSTM model still reached 0.93. This is not a calculation error, and the reasons are as follows: As shown in Figure 7(b), the 20170702 flood event was a typical double-peak flood. The LSTM model overestimated the first peak, resulting in a ∆𝑇 of more than one day, a phenomenon also observed in the XAJ model. However, the overall flood hydrograph simulated by the LSTM model closely matches the observed hydrograph, leading to a high NSE of 0.93. In contrast, the XAJ model's simulated hydrograph deviated more from the observed hydrograph, resulting in a lower NSE of only 0.65.
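A small synthetic example shows how a large ∆T and a high NSE can coexist for a double-peak flood. The hydrographs below are invented for illustration and are not the 20170702 event. The metric definitions are commonly used forms and are assumptions here: PRE is the relative difference between simulated and observed maximum flow in percent, and ∆T is the time of the simulated maximum minus the time of the observed maximum, with an assumed 3 h time step.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def peak_relative_error(obs, sim):
    """PRE in percent (assumed definition): (max sim - max obs) / max obs * 100."""
    return 100.0 * (np.max(sim) - np.max(obs)) / np.max(obs)

def peak_timing_difference(obs, sim, step_hours):
    """Delta T in hours (assumed sign convention): negative means the simulated peak is early."""
    return float((np.argmax(sim) - np.argmax(obs)) * step_hours)

t = np.arange(60)  # time-step index

def bump(centre, width=3.0):
    """Gaussian-shaped flood wave centred on a given time step."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

# Observed flood: the second peak (1000) is the larger one.
obs = 100.0 + 800.0 * bump(20) + 900.0 * bump(40)
# Simulated flood: the first peak is overestimated (1060) and becomes the maximum.
sim = 100.0 + 960.0 * bump(20) + 880.0 * bump(40)

event_nse = nse(obs, sim)
event_pre = peak_relative_error(obs, sim)
event_dt = peak_timing_difference(obs, sim, step_hours=3.0)
```

Because the simulated maximum jumps to the first peak, `event_dt` is −60 h even though both peaks are well timed individually and the overall hydrograph fit remains close.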
  (15) There is a problem with the statements for Lines 394-396. The discrepancies in the rising speed during the flood rising phase compared to the observations may be due to the slow response of the model to rainfall rather than to the models' insufficient ability to simulate low flow conditions.
  Response: Thank you for your valuable comments. We agree with your viewpoint that the discrepancy in the rising speed during the flood rising phase may be related to the model's response speed to rainfall, rather than solely due to its ability to simulate low flow conditions. The revised statement is as follows:
  This may be due to the slow response of the model to rainfall.
  (16) Lines 401-403 mention that all three models underestimated the peak flow, and the simulated peak was significantly delayed compared to the observed peak, especially under complex terrain conditions. Please select stations with complex terrain and simple terrain for result comparisons to illustrate the impact of terrain on model simulations.
  Response: Thank you for your suggestion. In the manuscript, we mentioned that all three models underestimated the peak flow and there was a delay in the simulated peak. This is because our study involved two basins, the Lushui and Qingjiang basins. As shown in Figure 1, the Qingjiang Basin has a more complex terrain and a more winding river system. Based on the flood simulation results in Figures 7 and 8, the simulation performance in the Lushui Basin is better than in the Qingjiang Basin. This led us to conclude that under complex terrain conditions, the model’s simulation results might not perform as well as under simpler terrain conditions.
  (17) The author should add some statements about the simulated time horizon (e.g. T+1, T+2, …).
  Response: Thank you for your suggestion. We understand what you mean by the simulated time horizon, but in the current stage of our research, we are using simulations similar to traditional hydrological models. Specifically, the input consists of a period of areal mean rainfall and evaporation, and the output is the corresponding flow, without involving the concept of a horizon.
  
  Citation: https://doi.org/10.5194/egusphere-2025-279-AC1

RC2: 'Comment on egusphere-2025-279', Anonymous Referee #2, 12 Mar 2025

This paper integrates the runoff generation and flow routing principles of the Xinanjiang model into a recurrent neural network framework, proposing the XAJRNN layer and constructing an EDL model. This approach enhances the physical interpretability of deep learning-based flood forecasting. Using the Lushui River and Qingjiang River basins as case studies, the EDL model is compared with benchmark models, demonstrating superior performance in flood simulation. The study is well-structured, data-driven, and methodologically rigorous, offering a novel perspective and valuable tool for explainable deep learning in hydrology. However, improvements in clarity, graphical details, and language are needed.

(1) Line 128: Please provide additional explanation on why the runoff generation and flow routing principles of the Xinanjiang model were chosen to construct an explainable deep learning model, specifically elaborating on its advantages and applicability.

(2) Line 131: Please further explain the rationale for using LSTM neural network layers to construct the model, highlighting its superiority.

(3) Lines 154-166: To enhance the completeness of the research background, it is recommended that information on the magnitude and frequency of historical floods in the study area be supplemented.

(4) Line 168: Please adjust the scale of the river curves in Figure 1 to improve the aesthetic quality and clarity of the illustration.

(5) Line 224: The author mentions "a similar structure"; please specify in which aspects this similarity is reflected to improve clarity.

(6) Equations (1) and (2) and Figure 3 (b): The parameter symbols in the equations do not match those used in Figure 3 (b). Please carefully verify and ensure consistency.

(7) Lines 258-260: The XAJRNN layer outputs four physical variables of interest. Please explain why these four variables were selected as outputs instead of others.

(8) Lines 274-280: The paper mentions that a genetic algorithm was used to optimize the parameters of the Xinanjiang model. Please provide the obtained optimal parameter values and include them in the relevant section.

(9) Figure 8 (d): The simulation performance of the EDL and the benchmark models appears to be poor. Please analyze the potential reasons for this issue.

(10) Language expression: Some parts of the paper contain repetitive phrasing. It is recommended to refine the text to improve fluency and conciseness.

(11) Reference formatting: Please carefully check the reference formatting to ensure compliance with the journal’s requirements, including the correct spelling of author names, publication year format, DOI, and page ranges.

Citation: https://doi.org/10.5194/egusphere-2025-279-RC2

AC2: 'Reply on RC2', Shenglian Guo, 21 Mar 2025

Reply to Reviewers’ comments (Reviewer#2)

Response: We thank the reviewer for his/her time in reviewing our manuscript and providing comprehensive suggestions for further improvements. Below is our detailed response to the reviewers' comments and suggestions.

(1) Line 128: Please provide additional explanation on why the runoff generation and flow routing principles of the Xinanjiang model were chosen to construct an explainable deep learning model, specifically elaborating on its advantages and applicability.

Response: Thank you for providing this comprehensive review. We chose the runoff generation and flow routing principles of the Xinanjiang (XAJ) model as the foundation for constructing an explainable deep learning model based on the following considerations. First, the XAJ model has demonstrated excellent performance in hydrological simulation and forecasting across various watersheds. Its hydrological principles have been extensively validated over time, ensuring high reliability and maturity. Second, our study area is located in the Yangtze River Basin, which falls within a typical humid and semi-humid climate zone. The XAJ model’s saturation excess runoff mechanism effectively captures the nonlinear runoff response under such climatic conditions. This mechanism is particularly suitable for depicting the runoff response of our study area under varying rainfall intensities, thereby providing a solid theoretical foundation for both the interpretability and accuracy of our model.

(2) Line 131: Please further explain the rationale for using LSTM neural network layers to construct the model, highlighting its superiority.

Response: Thank you very much for the notice. We chose LSTM as a component of the explainable deep learning model primarily based on two considerations. First, LSTM's memory units can store hydrological information over long periods, enabling it to effectively model the temporal dependencies in the rainfall-runoff process and enhance flood prediction accuracy. Second, flood evolution involves multiple dynamic processes, including precipitation, evapotranspiration, surface runoff, and groundwater recharge. LSTM can adaptively learn the nonlinear relationships among these variables.

(3) Lines 154-166: To enhance the completeness of the research background, it is recommended that information on the magnitude and frequency of historical floods in the study area be supplemented.

Response: Thank you very much for the notice. We agree with your viewpoint, and we will add information on the magnitude and frequency of historical floods in the background section of the study area. For example, the Qingjiang River basin experienced major floods in 2016 and 2017, with peak inflow discharge into the Shuibuya Reservoir reaching 13,100 m³/s and 6,710 m³/s, respectively.

(4) Line 168: Please adjust the scale of the river curves in Figure 1 to improve the aesthetic quality and clarity of the illustration.

Response: Thank you very much for the notice. We revise the scale of the river curves in Figure 1 to improve the aesthetic quality and clarity.

(5) Line 224: The author mentions "a similar structure"; please specify in which aspects this similarity is reflected to improve clarity.

Response: Thank you very much for your suggestion. The “similar structure” mentioned in our manuscript refers primarily to the composition of Equations (2) and (3). Both consist of two parts, an ordinary differential equation and an output equation. In the ordinary differential equation part, both equations include the state variable from the previous time step (h(t-1)), the state variable at the current time step (h(t)), the input (x), and the parameters (φ, W, b). In the output equation part, both equations rely on the current state variable (h(t)), the output (y), and the same set of parameters (φ, W, b).
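The shared structure described in this response, a state update followed by an output equation, can be sketched in discrete time. The two cells below are illustrative assumptions and are not Equations (2) and (3) of the manuscript: a scalar tanh RNN cell with hypothetical weights, and a single linear-reservoir bucket standing in for a physically based cell.

```python
import numpy as np

def run_recurrent(step, readout, h0, xs):
    """Generic recurrence: h(t) = step(h(t-1), x(t)); y(t) = readout(h(t))."""
    h = h0
    ys = []
    for x in xs:
        h = step(h, x)
        ys.append(readout(h))
    return np.array(ys), h

# (a) Ordinary RNN cell with hypothetical scalar weights W_h, W_x and bias b.
W_h, W_x, b = 0.5, 1.0, 0.0

def rnn_step(h, x):
    return np.tanh(W_h * h + W_x * x + b)

def rnn_readout(h):
    return h

# (b) Physically based cell: a linear reservoir with storage S (mm).
# A fraction k of the water available in each step leaves as outflow Q.
k = 0.3

def bucket_step(s, p):
    return (s + p) * (1.0 - k)   # storage left after outflow

def bucket_readout(s):
    return k / (1.0 - k) * s     # outflow of the step, recovered from the new storage

rain = np.array([10.0, 0.0, 0.0, 5.0, 0.0])  # hypothetical rainfall input (mm)
y_rnn, _ = run_recurrent(rnn_step, rnn_readout, 0.0, rain)
q_bucket, s_final = run_recurrent(bucket_step, bucket_readout, 0.0, rain)
```

Both cells run through the same loop; the difference is that the bucket's state and output are physical quantities and close the water balance (total outflow plus final storage equals total rainfall), which is the sense in which a layer built this way is interpretable.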

(6) Equations (1) and (2) and Figure 3 (b): The parameter symbols in the equations do not match those used in Figure 3 (b). Please carefully verify and ensure consistency.

Response: Thank you very much for your suggestion. We have reviewed the manuscript and found a minor error. We revise the parameter symbols in Figure 3(b) to ensure consistency with Equations (1) and (2).

(7) Lines 258-260: The XAJRNN layer outputs four physical variables of interest. Please explain why these four variables were selected as outputs instead of others.

Response: Thank you very much for your suggestion. We chose actual evapotranspiration (Eₜ), areal mean free water storage (S₀), areal mean tension water storage (W), and outflow discharge (Q) as the output variables of the XAJRNN layer primarily because of their high hydrological relevance to flood forecasting. Actual evapotranspiration (Eₜ) is a key component of the hydrological cycle; it directly affects water availability and is crucial for runoff processes and flood simulation. Areal mean free water storage (S₀) and tension water storage (W) represent the states of free water and water under tension in the watershed and reflect the watershed's storage capacity, which in turn influences flood occurrence and intensity. Outflow discharge (Q), as the direct output of the basin system, is a core indicator for flood simulation and directly reflects downstream flood risk. The selection of these variables takes into account their physical significance and practical value in flood simulation.

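(8) Lines 274-280: The paper mentions that a genetic algorithm was used to optimize the parameters of the Xinanjiang model. Please provide the obtained optimal parameter values and include them in the relevant section.

Response: Thank you very much for your suggestion. Below are the calibrated parameter values of the Xinanjiang model.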

Parameter	Value range	Lushui River basin	Qingjiang River basin
Kc	[0.6,1.5]	0.95	0.85
c	[0.01,0.2]	0.18	0.19
Wum	[5,30]	28.75	23.15
Wlm	[60,90]	84.36	64.47
Wdm	[15,60]	23.19	15.60
Aimp	[0.01,0.2]	0.02	0.01
b	[0.1,0.4]	0.40	0.35
Sm	[10,50]	49.97	39.86
ex	[1,1.5]	1.08	1.06
Ki	[0.1,0.55]	0.19	0.37
Kg	[0.7-]	0.51	0.33
ci	[0.1,0.9]	0.87	0.89
cg	[0.9,0.988]	0.98	0.97
Kf	[0.1,10]	3.99	1.58
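
As an illustration of how some of these parameters enter the runoff generation step, the sketch below evaluates the standard Xinanjiang saturation-excess formula (tension water capacity curve) with the calibrated Lushui River basin values of Wum, Wlm, Wdm and b from the table. It is a simplified sketch, not the authors' code: the impervious area fraction Aimp is ignored, the total tension water capacity is taken as Wum + Wlm + Wdm, and the function and variable names are hypothetical.

```python
def saturation_excess_runoff(pe, w, wm, b):
    """Runoff depth (mm) generated by net rainfall pe (mm) when the areal mean
    tension water storage is w (mm), using the Xinanjiang capacity curve."""
    if pe <= 0.0:
        return 0.0
    wmm = wm * (1.0 + b)  # maximum point tension water capacity
    # Ordinate of the capacity curve corresponding to the current storage w.
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # Only part of the basin is saturated.
        return pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    # The whole basin is saturated: the remaining deficit is filled first.
    return pe - (wm - w)

# Calibrated Lushui River basin values from the table above.
WUM, WLM, WDM, B = 28.75, 84.36, 23.19, 0.40
WM = WUM + WLM + WDM  # total tension water capacity (mm), 136.30

pe = 50.0  # hypothetical net rainfall in one time step (mm)
runoff_dry = saturation_excess_runoff(pe, w=0.0, wm=WM, b=B)
runoff_half = saturation_excess_runoff(pe, w=0.5 * WM, wm=WM, b=B)
runoff_wet = saturation_excess_runoff(pe, w=WM, wm=WM, b=B)
```

The same rainfall produces more runoff as the basin gets wetter, and all of it becomes runoff once the tension water storage is full, which is the nonlinear response the saturation-excess mechanism is meant to capture.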

(9) Figure 8 (d): The simulation performance of the EDL and the benchmark models appears to be poor. Please analyze the potential reasons for this issue.

Response: Thank you very much for your suggestion. By analyzing the simulation performance of the EDL model and the benchmark model in Figure 8(d), we identified two major influencing factors: First, the location of the heavy rainfall center has a significant impact on the simulation results. Since the model input uses areal average rainfall, it fails to fully account for the spatial distribution characteristics of rainfall. As shown in Figure 8(d), when the heavy rainfall center is close to the Shuibuya Reservoir, the shortened routing time leads to a significant decline in the model's simulation performance. Second, the impact of upstream reservoir regulation cannot be ignored. During multiple flood events in the Qingjiang River Basin in 2020, the upstream reservoirs of Shuibuya increased their outflow to cope with the severe flood control situation. Combined with the effects of rainfall, this further reduced the model's simulation accuracy.

(10) Language expression: Some parts of the paper contain repetitive phrasing. It is recommended to refine the text to improve fluency and conciseness.

Response: Thank you very much for your suggestion. We will carefully review and refine the language in the manuscript.

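(11) Reference formatting: Please carefully check the reference formatting to ensure compliance with the journal’s requirements, including the correct spelling of author names, publication year format, DOI, and page ranges.

Response: Thank you very much for your suggestion. We will carefully check the reference format in the manuscript according to the journal's requirements, including names, spelling, and publication years.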

Citation: https://doi.org/10.5194/egusphere-2025-279-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-279', Anonymous Referee #1, 11 Mar 2025
The paper is well-structured and provides a solid foundation. However, there are a few suggestions for improvement regarding this study:
Lines 30–31 mention the limitations of traditional hydrological models. More details on these limitations should be provided.

Line 51 contains a typo: “As an RNN subset in DL.”

Figure 1 shows that the rainfall gauge color is too similar to the elevation color. Please use a different color for better distinction. Besides that, the elevations of the two basins can be unified in color scale to make it easier to compare the terrain differences.

In section 2.2, mention that the timestep of data are different in two basins. For example, the rainfall data of Lushui River basin is 3h, but Qingjiang River basin is 6h.

The Xinanjiang (XAJ) hydrological model should be explicitly mentioned in Line 182.

Equation 2 and Figure 3 present the proposed XAJRANN layer. How are its parameters optimized?

Lines 247-248 mention that “The choice of LSTM is based on the numerous studies demonstrating its ability to improve the performance of hydrological model simulations.”, the author should add some references for these statements.

Figure 4 illustrates the structure of the EDL model. In the XAJRNN layer, the model outputs actual evapotranspiration (Et), areal mean free water storage (S0), areal mean tension water storage (W), and basin outflow discharge (Q). Is this output generated through supervised learning? Subsequently, in the LSTM layer, the model produces runoff, which may also be trained using supervised learning. Does it make sense to train the model twice?

Is there a typo in Lines 296 “and vis vasa.”? (vice versa?)

The RMSE values of the two basins in Table 1 are quite different. Please explain the results based on data statistics.

Table 1 presents the model performance. During the testing phase, the LSTM model achieved better NSE, RE, and RMSE for the Qingjiang River. What could be the reason for this?

What do the colors in the scatter plots of Figure 5 and Figure 6 represent? Please add a legend.

Figure 5 shows that there are flood events exceeding 3000 m³/s during the test period, while there are fewer flood events exceeding 3000 m³/s during the training period. Please explain whether this is the reason why the scatter points are below the 1:1 ideal line in the high flow range.

Table 2 indicates that the ∆𝑇 values of the XAJ and LSTM models exceed one day for the 20170702 event. Is there a calculation error, given that the NSE of the LSTM model is 0.93?

There is a problem with the statements for Lines 394-396. The discrepancies in the rising speed during the flood rising phase compared to the observations may be due to the slow response of the model to rainfall rather than to the models' insufficient ability to simulate low flow conditions.

Lines 401-403 mention that all three models underestimated the peak flow, and the simulated peak was significantly delayed compared to the observed peak, especially under complex terrain conditions. Please select stations with complex terrain and simple terrain for result comparisons to illustrate the impact of terrain on model simulations.

The author should add some statements about the simulated time horizon (e.g. T+1, T+2, …).
Citation: https://doi.org/10.5194/egusphere-2025-279-RC1
- AC1: 'Reply on RC1', Shenglian Guo, 21 Mar 2025
  
  Reply to Reviewers’ comments (Reviewer#1)
  The paper is well-structured and provides a solid foundation. However, there are a few suggestions for improvement regarding this study:
  Response: We thank the reviewer for taking the time to review our manuscript and for providing comprehensive suggestions for further improvement. Our detailed responses to the reviewer's comments and suggestions are given below.
  (1) Lines 30–31 mention the limitations of traditional hydrological models. More details on these limitations should be provided.
  Response: Thank you for this suggestion. Traditional hydrological models have several limitations in simulating hydrological processes. First, they struggle to accurately capture the complex nonlinear relationships in hydrological processes, especially under extreme weather events or in the context of climate change, which limits their applicability. Additionally, traditional hydrological models rely on predefined mathematical equations and assumptions, making them less adaptable to environmental changes such as land use alterations and human activities. At the same time, these models have limited consideration for spatial heterogeneity, often employing simplified approaches that overlook local hydrological characteristics, thereby affecting simulation accuracy. Lastly, physics-based hydrological models require high computational costs, restricting their application in large-scale and long-term simulations, while conceptual models, despite being computationally efficient, generally have lower accuracy and applicability. Therefore, traditional hydrological models face significant challenges in dealing with complex hydrological processes, extreme events, and rapidly changing environmental conditions.
  (2) Line 51 contains a typo: “As an RNN subset in DL.”
  Response: Thank you for pointing this out. This is indeed a typo. The revised sentence now reads: “As a subset of RNN in DL, …”.
  (3) Figure 1 shows that the rainfall gauge color is too similar to the elevation color. Please use a different color for better distinction. Besides that, the elevations of the two basins can be unified in color scale to make it easier to compare the terrain differences.
  Response: Thank you for your valuable suggestions. We have modified Figure 1 to make the rain gauge color more distinguishable. We have also unified the elevation color scale for the two basins so that the terrain differences are easier to compare.
  (4) In Section 2.2, please mention that the time step of the data differs between the two basins. For example, the rainfall data have a 3 h time step in the Lushui River basin but a 6 h time step in the Qingjiang River basin.
  Response: Thank you for your valuable suggestions. The timesteps of the two basin datasets in this study are different. Therefore, we add the following statement to explain this issue.
  It should be noted that the time step of these data is 3 h in the Lushui River basin, whereas it is 6 h in the Qingjiang River basin.
  (5) The Xinanjiang (XAJ) hydrological model should be explicitly mentioned in Line 182.
  Response: Thank you for your valuable comment. We agree that the XAJ hydrological model should be explicitly mentioned in Line 182. The revised text now reads: The XAJ hydrological model is a classic conceptual hydrological model that is widely used in basin hydrological simulation and water resource management. The model was first proposed by Chinese scholars in the 1970s (Zhao, 1992, 1993), with the aim of simulating regional hydrological processes. The core idea of the XAJ model is to describe the hydrological processes within a basin by combining physical processes and empirical formulas. The XAJ model consists mainly of modules for evapotranspiration, runoff generation, runoff separation, and flow routing.
  Zhao, R.: The Xinanjiang model applied in China, J. Hydrol., 135, 371–381, https://doi.org/10.1016/0022-1694(92)90096-E, 1992.
  Zhao, R.: A non-linear system model for basin concentration, J. Hydrol., 142, 477–482, https://doi.org/10.1016/0022-1694(93)90024-4, 1993.
  (6) Equation 2 and Figure 3 present the proposed XAJRNN layer. How are its parameters optimized?
  Response: Thank you for your question. Following the general optimization methods of deep learning models, the parameters of the XAJRNN layer are optimized by gradient descent, specifically with the Adam optimizer, to minimize the loss function. In our study, the loss function is based on the NSE, which measures the agreement between the model’s simulations and the observed values. We also tune hyperparameters, such as the learning rate and batch size, and select the configuration that gives the best model performance.
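As an illustration of the optimization described in the response above, the following is a minimal sketch, not the authors' implementation. It assumes the quantity minimized is 1 − NSE (the response only states that the loss is NSE) and replaces the XAJRNN layer with a hypothetical one-parameter toy model `sim = k * rain`. The rainfall values, the function name `fit_runoff_coefficient`, and the Adam hyperparameters are all illustrative assumptions.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs from its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def fit_runoff_coefficient(rain, obs, k=0.1, lr=0.05, steps=500,
                           beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit the toy model sim = k * rain by minimizing (1 - NSE) with Adam updates."""
    mean_obs = sum(obs) / len(obs)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    m = 0.0  # first-moment (mean) estimate of the gradient
    v = 0.0  # second-moment (uncentered variance) estimate of the gradient
    for t in range(1, steps + 1):
        # Analytic gradient of the loss (1 - NSE) = SSE / SST with respect to k.
        grad = sum(2.0 * (k * r - o) * r for r, o in zip(rain, obs)) / sst
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** t)
        k -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return k

rain = [0.0, 5.0, 20.0, 10.0, 2.0, 0.0]  # hypothetical rainfall series
obs = [0.6 * r for r in rain]            # synthetic "observed" flow with true k = 0.6
k_fit = fit_runoff_coefficient(rain, obs)
sim = [k_fit * r for r in rain]
print(round(k_fit, 3), round(nse(obs, sim), 4))
```

In a deep learning framework the gradient would come from automatic differentiation rather than the hand-derived expression used here; the update rule is otherwise the same.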
  (7) Lines 247-248 state that “The choice of LSTM is based on the numerous studies demonstrating its ability to improve the performance of hydrological model simulations.” The authors should add references to support this statement.
  Response: Thank you for the suggestion. To support the statement regarding the choice of LSTM in Lines 247-248, we cite studies demonstrating that LSTM performs well in hydrological modelling and can effectively improve simulation accuracy. For example, Alizadeh et al. (2021) demonstrated that the SAINA-LSTM model outperforms EnsPost and MS-EnsPost in the low, medium, and high flow ranges and at forecast horizons of 1 to 7 days, and that it significantly reduces the root mean square error of flow predictions. Additionally, Xu et al. (2022) combined the particle swarm optimization (PSO) algorithm with the LSTM model to obtain the PSO-LSTM model. Their results show that the PSO-LSTM model outperforms the Artificial Neural Network (ANN) and PSO-ANN models at all stations in the basin.
  Alizadeh, B., Ghaderi Bafti, A., Kamangir, H., Zhang, Y., Wright, D.B., Franz, K.J., 2021. A novel attention-based LSTM cell post-processor coupled with Bayesian optimization for streamflow prediction. J. Hydrol. 601, 126526. https://doi.org/10.1016/j.jhydrol.2021.126526
  Xu, Y., Hu, C., Wu, Q., Jian, S., Li, Z., Chen, Y., Zhang, G., Zhang, Z., Wang, S., 2022. Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. J. Hydrol. 608, 127553. https://doi.org/10.1016/j.jhydrol.2022.127553
  (8) Figure 4 illustrates the structure of the EDL model. In the XAJRNN layer, the model outputs actual evapotranspiration (Et), areal mean free water storage (S0), areal mean tension water storage (W), and basin outflow discharge (Q). Is this output generated through supervised learning? Subsequently, in the LSTM layer, the model produces runoff, which may also be trained using supervised learning. Does it make sense to train the model twice?
  Response: Thank you for your question. In our model, the training of the XAJRNN layer and the LSTM layer is complementary and not an independent "two-step training" process. The XAJRNN layer simulates actual evapotranspiration (E_t), water storage (S₀, W), and basin outflow (Q) based on input data through supervised learning. These intermediate outputs then serve as the input for the Normalization layer and LSTM layer, which further processes them to simulate runoff. The training process of the entire model is integrated, and the parameters of both the XAJRNN and LSTM layers are jointly optimized to improve simulation accuracy. Therefore, even though there are multiple layers of output, their training is interconnected, and there is no issue of redundant training.
  (9) Is there a typo in Line 296, “and vis vasa.”? (vice versa?)
  Response: Thank you for your careful observation. Indeed, "and vis vasa" in Line 296 is a typo; it should read "and vice versa".
  (10) The RMSE values of the two basins in Table 1 are quite different. Please explain the results based on data statistics.
  Response: Thank you for your question. The significant difference in RMSE between the two basins is primarily due to the disparity in their annual average flows. Based on statistical calculations, the annual average flow of the Qingjiang River basin is 272 m³/s, while that of the Lushui River basin is only 93 m³/s. Additionally, the overall simulation performance in the Lushui River basin is better than in the Qingjiang River basin, which contributes to the higher RMSE observed in the Qingjiang River basin.
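The scale dependence of RMSE invoked in the response above can be illustrated with a small sketch. The two flow series below are hypothetical values chosen for illustration, not data from either basin. Only the ratio 272/93 of the mean annual flows is taken from the response. Multiplying both observed and simulated flows by the same factor multiplies the RMSE by that factor, while the NSE is unchanged, so RMSE values from basins with different flow magnitudes are not directly comparable.

```python
import math

def rmse(obs, sim):
    """Root mean square error, in the same units as the flows."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (dimensionless)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical observed and simulated flows (m3/s) for a smaller basin.
obs_small = [50.0, 80.0, 200.0, 120.0, 60.0]
sim_small = [55.0, 70.0, 180.0, 130.0, 65.0]

# Rescale both series by the ratio of mean annual flows quoted in the response.
scale = 272.0 / 93.0
obs_large = [scale * q for q in obs_small]
sim_large = [scale * q for q in sim_small]

rmse_ratio = rmse(obs_large, sim_large) / rmse(obs_small, sim_small)
nse_small = nse(obs_small, sim_small)
nse_large = nse(obs_large, sim_large)

# RMSE divided by the mean observed flow removes the scale effect.
nrmse_small = rmse(obs_small, sim_small) / (sum(obs_small) / len(obs_small))
nrmse_large = rmse(obs_large, sim_large) / (sum(obs_large) / len(obs_large))
print(rmse_ratio, nse_small, nse_large, nrmse_small, nrmse_large)
```

A normalized RMSE such as the one computed at the end is one possible way to compare errors across basins; whether the authors use such a normalization is not stated in the discussion.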
  (11) Table 1 presents the model performance. During the testing phase, the LSTM model achieved better NSE, RE, and RMSE for the Qingjiang River. What could be the reason for this?
  Response: Thank you for your questions. During the testing phase, the LSTM model demonstrated better simulation performance in the Qingjiang River basin, primarily due to the close integration of our EDL model with the XAJ model. Specifically, the XAJRNN layer in the EDL model adopts the runoff generation and routing principles of the XAJ model, making the model’s performance closely related to the simulation accuracy of the XAJ model. When the XAJ model performs well, the EDL model also improves accordingly; conversely, if the XAJ model’s performance is suboptimal, the EDL model is similarly affected.
  (12) What do the colors in the scatter plots of Figure 5 and Figure 6 represent? Please add a legend.
  Response: Thank you for your suggestion. The colors in Figures 5 and 6 represent the density of scatter point distribution, where higher density corresponds to higher color intensity. To clarify this, we add a legend.
  (13) Figure 5 shows that there are flood events exceeding 3000 m³/s during the test period, while there are fewer flood events exceeding 3000 m³/s during the training period. Please explain whether this is the reason why the scatter points are below the 1:1 ideal line in the high flow range.
  Response: Thank you for your suggestion. We agree with your point that multiple flood events with flow exceeding 3,000 m³/s occurred during the test period, while such events were relatively rare during the training period. This indicates that the model had limited training in the high-flow range, leading to scatter points in this range being positioned below the 1:1 ideal line. Below is the statement we added to explain this issue.
  In Figure 5, the scatter points of the EDL and LSTM models fall below the 1:1 ideal line in the high-flow range. This may be because few flows exceeded 3000 m³/s during the training period, whereas relatively more flows exceeded 3000 m³/s during the test period.
  (14) Table 2 indicates that the ∆𝑇 values of the XAJ and LSTM models exceed one day for the 20170702 event. Is there a calculation error, given that the NSE of the LSTM model is 0.93?
  Response: Thank you for your question. In the 20170702 flood event, the ∆𝑇 of both the XAJ model and the LSTM model exceeded one day, yet the NSE of the LSTM model still reached 0.93. This is not a calculation error, and the reasons are as follows: As shown in Figure 7(b), the 20170702 flood event was a typical double-peak flood. The LSTM model overestimated the first peak, resulting in a ∆𝑇 of more than one day, a phenomenon also observed in the XAJ model. However, the overall flood hydrograph simulated by the LSTM model closely matches the observed hydrograph, leading to a high NSE of 0.93. In contrast, the XAJ model's simulated hydrograph deviated more from the observed hydrograph, resulting in a lower NSE of only 0.65.
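The argument in the response above, that a large ∆T can coexist with a high NSE for a double-peak flood, can be checked on a synthetic example. The sketch below uses made-up hydrographs, not the 20170702 event. It assumes a 3 h time step and defines ∆T as the time of the simulated maximum minus the time of the observed maximum. Both the sign convention and the helper names are assumptions, since the discussion does not state them.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency over the whole event."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def peak_index(flow):
    """Index of the first occurrence of the maximum flow."""
    return max(range(len(flow)), key=flow.__getitem__)

def peak_timing_difference(obs, sim, step_hours):
    """Delta T in hours: simulated peak time minus observed peak time (assumed convention)."""
    return (peak_index(sim) - peak_index(obs)) * step_hours

def peak_relative_error(obs, sim):
    """Relative error of the simulated maximum against the observed maximum."""
    return (max(sim) - max(obs)) / max(obs)

# Synthetic double-peak event (m3/s): the observed maximum is the second peak.
obs = [100, 300, 900, 700, 500, 400, 350, 300,
       350, 500, 800, 1000, 800, 500, 300, 150]
# The simulation tracks the hydrograph closely but overestimates the first peak,
# so its maximum occurs at the first peak instead of the second.
sim = [100, 320, 1050, 740, 520, 410, 350, 310,
       360, 490, 780, 980, 790, 510, 310, 150]

delta_t = peak_timing_difference(obs, sim, step_hours=3)  # hypothetical 3 h step
event_nse = nse(obs, sim)
pre = peak_relative_error(obs, sim)
print(delta_t, round(event_nse, 3), round(pre, 3))
```

Because ∆T depends only on which single time step holds the maximum, a modest overestimate of the first peak shifts it by more than a day here, while the NSE, which averages over the whole hydrograph, remains high.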
  (15) There is a problem with the statements for Lines 394-396. The discrepancies in the rising speed during the flood rising phase compared to the observations may be due to the slow response of the model to rainfall rather than to the models' insufficient ability to simulate low flow conditions.
  Response: Thank you for your valuable comments. We agree with your viewpoint that the discrepancy in the rising speed during the flood rising phase may be related to the model's response speed to rainfall, rather than solely due to its ability to simulate low flow conditions. The revised statement is as follows:
  This may be due to the slow response of the model to rainfall.
  (16) Lines 401-403 mention that all three models underestimated the peak flow, and the simulated peak was significantly delayed compared to the observed peak, especially under complex terrain conditions. Please select stations with complex terrain and simple terrain for result comparisons to illustrate the impact of terrain on model simulations.
  Response: Thank you for your suggestion. In the manuscript, we mentioned that all three models underestimated the peak flow and there was a delay in the simulated peak. This is because our study involved two basins, the Lushui and Qingjiang basins. As shown in Figure 1, the Qingjiang Basin has a more complex terrain and a more winding river system. Based on the flood simulation results in Figures 7 and 8, the simulation performance in the Lushui Basin is better than in the Qingjiang Basin. This led us to conclude that under complex terrain conditions, the model’s simulation results might not perform as well as under simpler terrain conditions.
  (17) The author should add some statements about the simulated time horizon (e.g. T+1, T+2, …).
  Response: Thank you for your suggestion. We understand what you mean by the simulated time horizon, but in the current stage of our research, we are using simulations similar to traditional hydrological models. Specifically, the input consists of a period of areal mean rainfall and evaporation, and the output is the corresponding flow, without involving the concept of a horizon.
  
  Citation: https://doi.org/10.5194/egusphere-2025-279-AC1