A hybrid model based on Boruta feature selection and neural network for forecasting hydrological drought

Li, Min; Yao, Yuhang; Ou, Ming; Yin, Changman

doi:10.5194/egusphere-2026-1033

Preprints

https://doi.org/10.5194/egusphere-2026-1033

02 Apr 2026

Abstract. Accurate hydrological drought prediction is vital for water management. This study proposes a hybrid model combining Boruta feature selection, a convolutional neural network (CNN), and bidirectional long short-term memory (BiLSTM) to predict hydrological drought in the Huaihe River Basin of China. The Boruta algorithm selected key predictors from 31 potential drought-influencing factors. Comparison of the proposed Boruta-CNN-BiLSTM model with Boruta-CNN-LSTM, Boruta-CNN-XGBoost, Boruta-BiLSTM, Boruta-LSTM, and Boruta-XGBoost shows that Boruta significantly enhances all models. The Boruta-CNN-BiLSTM model achieved the highest accuracy across 28 basin grid regions and exhibited the largest performance gains. Furthermore, the model's prediction performance is most affected by precipitation, followed by volumetric soil water (0–7 cm) and volumetric soil water (7–28 cm), while surface net solar radiation has the least impact. The model provides enhanced support for basin-scale drought risk assessment and water resources management.

Received: 23 Feb 2026 – Discussion started: 02 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

CC1:
'Comment on egusphere-2026-1033', Nima Zafarmomen, 06 Apr 2026
The manuscript presents a relevant and well-structured contribution to hydrological drought prediction by proposing a hybrid Boruta–CNN–BiLSTM framework. The integration of feature selection with deep learning is timely and aligns well with current research trends in hydroinformatics and data-driven modeling.
One of the main strengths of the study is the systematic combination of feature selection (Boruta) and hybrid deep learning architectures, which addresses a common limitation in drought prediction models: the presence of redundant or irrelevant predictors. The use of 31 potential predictors and their reduction through Boruta provides a clear methodological advantage and improves model interpretability.
Overall, the manuscript is methodologically sound, clearly organized, and relevant for both scientific and applied drought prediction contexts. I have some minor comments:
While the hybrid Boruta–CNN–BiLSTM framework performs well, similar CNN–LSTM/BiLSTM hybrid approaches have been widely explored. The manuscript would benefit from more explicitly clarifying what is fundamentally new beyond performance improvement.

The manuscript could be strengthened by incorporating recent studies that integrate hydrological process understanding with data-driven modeling. For example, Zafarmomen et al. (2024), “Assimilation of Sentinel‐based Leaf Area Index for Modeling Surface–Ground Water Interactions in Irrigation Districts,” demonstrates how integrating remotely sensed vegetation dynamics can improve hydrological representation and predictive performance.

The study mainly focuses on predictive accuracy. A deeper discussion on hydrological interpretability of the model outputs would strengthen the contribution.

The analysis is limited to monthly SRI-1. Since drought processes are scale-dependent, a short discussion on multi-timescale applicability (e.g., SRI-3, SRI-6) would be valuable.

While multiple models are compared, the inclusion of a simpler baseline (e.g., MLP or linear model) would help better quantify the added value of the hybrid architecture.

The manuscript is strong and suitable for publication after minor revisions. The suggested comments mainly aim to improve clarity, positioning, and broader impact rather than requiring major methodological changes.
Citation: https://doi.org/10.5194/egusphere-2026-1033-CC1
- AC1:
  'Reply on CC1', Li min, 08 May 2026
  While the hybrid Boruta-CNN-BiLSTM framework performs well, similar CNN-LSTM/BiLSTM hybrid approaches have been widely explored. The manuscript would benefit from more explicitly clarifying what is fundamentally new beyond performance improvement.
  
  Response: Thank you for your valuable comments. In the original manuscript, the research contribution was not clearly stated. To address this issue, paragraphs 5 and 6 of the Introduction section are amended as follows:
  Integrating hydrological process understanding with data-driven models has become a research focus in hydrological simulation, as it can effectively enhance the physical rationality and prediction reliability of models. For instance, Zafarmomen et al. (2024) demonstrated how integrating remotely sensed vegetation dynamics can improve hydrological representation and predictive performance. Such studies collectively highlight that incorporating physically meaningful variables and process knowledge is crucial for strengthening the interpretability of data-driven models, which is a key direction for current hydrological drought prediction research. However, existing studies on data-driven hydrological drought prediction still have obvious shortcomings. On the one hand, most studies rely on a large set of input variables without explicitly evaluating their relevance to drought, which easily introduces redundant information and further affects model stability. On the other hand, although many hybrid models have attempted to combine the advantages of multiple algorithms to improve prediction performance, they lack effective optimization of the matching between input features and model structures, which limits the further improvement of prediction accuracy.
  To address the above shortcomings, this study aims to develop a novel hybrid machine learning model (Boruta-CNN-BiLSTM) to improve the accuracy and interpretability of hydrological drought prediction. The specific research objectives are as follows: First, the Boruta algorithm is adopted to objectively and accurately select the most relevant features for hydrological drought from 31 potential hydro-meteorological variables. This step is intended to reduce feature redundancy, mitigate the risk of model overfitting, and lay a solid foundation for improving model performance. Second, the selected key features are combined with the CNN-BiLSTM model to fully leverage the spatial feature extraction capability of CNN and the bidirectional temporal data processing capability of BiLSTM. This integration is designed to enhance the model’s ability to characterize complex hydrological drought dynamics and further improve its predictive performance. Finally, the performance of the proposed Boruta-CNN-BiLSTM model is validated using actual hydrological data from 28 regions in the Huaihe River Basin. Meanwhile, the model is compared with other benchmark models to verify its applicability and superiority under different spatial conditions. Notably, the Boruta-CNN-BiLSTM framework developed in this study enables the quantitative interpretation of the relative importance of key drought-controlling factors (e.g., precipitation, soil moisture, and net radiation). This not only improves the mechanistic understandability of data-driven drought prediction but also makes the model consistent with hydrological mechanisms, thereby providing a useful reference for integrating hydrological process understanding with data-driven drought forecasting.
  
  The manuscript could be strengthened by incorporating recent studies that integrate hydrological process understanding with data-driven modeling. For example, Zafarmomen et al. (2024), “Assimilation of Sentinel‐based Leaf Area Index for Modeling Surface–Ground Water Interactions in Irrigation Districts,” demonstrates how integrating remotely sensed vegetation dynamics can improve hydrological representation and predictive performance.
  
  Response: Thank you for this valuable and constructive suggestion. We fully agree that integrating hydrological process understanding with data-driven modeling can significantly improve the physical rationality and interpretability of drought prediction models. Following your advice, we have carefully read the recommended study by Zafarmomen et al. (2024) and supplemented the relevant discussion in the Introduction section (paragraphs 5 and 6). We have added citations and comments on the importance of combining physical mechanism understanding with advanced data-driven approaches. These revisions have strengthened the connection between our data-driven framework and hydrological process understanding, and enriched the background and significance of this research.
  
  The study mainly focuses on predictive accuracy. A deeper discussion on hydrological interpretability of the model outputs would strengthen the contribution.
  
  Response: Thank you for your valuable comments. In the original manuscript, the discussion section mainly focused on statistical relations, and the hydrological interpretation was limited. In order to solve this problem, the first three paragraphs of the Discussion section have been revised as follows:
  In order to analyze the reasons for the spatial differences in the prediction accuracy of the Boruta-CNN-BiLSTM model, the most influential factors obtained by the Boruta method were selected, namely precipitation, volumetric soil water (0-7cm), volumetric soil water (7-28cm) and surface net solar radiation. The CCM method was used to quantify the impacts of each influencing factor on the model evaluation index R2. The results are shown in Figures 16 and 17. According to Figures 15, 16 and 17, precipitation is the most significant influencing factor affecting the prediction accuracy of the model across the entire watershed. The mean value of the ρ-value is close to 0.9, and the range is mainly between 0.8 and 1.0. The data are relatively concentrated, which indicates that the model's prediction accuracy is sensitive to precipitation and is distributed relatively uniformly in space. This suggests that SRI-1 is strongly controlled by short-term water input, and precipitation directly influences runoff generation on a monthly scale.
  VSW1 and VSW2 have the next-largest impact on model prediction accuracy after precipitation. Their mean ρ-values range from 0.4 to 0.5, with most values between 0.2 and 0.7. This wide distribution indicates that the sensitivity of the model's prediction accuracy to volumetric soil water varies significantly in space: sensitivity is greater in the upper and middle reaches than in downstream areas. This suggests that antecedent soil moisture influences runoff response through its persistence effect and plays a crucial role in drought persistence. The distribution of VSW1 and VSW2 across the basin is uneven, though it is partly consistent with the sensitivity pattern.
  The factor that has the least impact on the model's prediction accuracy is SNSR, with a mean close to 0.2 and a distribution range between 0 and 0.4. Although its direct impact is limited, it may still influence hydrological drought indirectly through surface energy balance and evapotranspiration processes. The most influential factors obtained by the Boruta method indicate that model performance is closely related to key hydrological processes, including precipitation-driven runoff generation, soil moisture memory effects, and energy-controlled evapotranspiration, which jointly influence short-term hydrological drought evolution.
  
  The analysis is limited to monthly SRI-1. Since drought processes are scale-dependent, a short discussion on multi-timescale applicability (e.g., SRI-3, SRI-6) would be valuable.
  
  Response: Thank you for your valuable comments. In the original manuscript, the discussion on timescale applicability was relatively brief. To address this, paragraph 5 of the Discussion section is amended as follows:
  This study analyzed the SRI at a one-month time scale, constructed several prediction models, and evaluated them from multiple aspects. The results show that the Boruta-CNN-BiLSTM model performs best. However, the time scale of the SRI may significantly affect the performance of the prediction model. At longer timescales, such as SRI-3 or SRI-6, hydrological drought is more strongly influenced by cumulative precipitation, basin storage conditions, and low-frequency climate variability. As a result, the relative importance of predictors, particularly soil moisture and large-scale climatic factors, may change, and model performance may vary across timescales. Evaluating the proposed framework under multi-timescale conditions would provide a more comprehensive understanding of its applicability. In addition, drought is also affected by human activities, basin geographical features, and other factors. Future research could consider the uncertainty in the model's prediction performance arising from different time scales and influencing factors.
  
  While multiple models are compared, the inclusion of a simpler baseline (e.g., MLP or linear model) would help better quantify the added value of the hybrid architecture.
  
  Response: Thank you for this valuable and constructive suggestion. We fully agree that including simpler baseline models helps to better quantify the added value of the proposed Boruta-CNN-BiLSTM hybrid architecture, as it more directly reflects the performance advantages brought by the hybrid structure and the Boruta feature selection strategy, thereby enhancing the comprehensiveness and rigor of the model comparison. To address this, paragraph 4 of the Discussion section is amended as follows:
  The deep learning model relies on deep network structures and is adept at capturing spatio-temporal correlations and complex nonlinear patterns in data, making it a research hotspot in drought prediction in recent years. Traditional statistical models, such as linear regression models, are essentially unable to capture the complex nonlinear relationship between drought influencing factors and SRI, while traditional machine learning models, such as multi-layer perceptrons (MLPs), lack the ability to extract spatial features and capture bidirectional temporal dependencies. In contrast, the Boruta-CNN-BiLSTM model integrates Boruta feature selection, CNN-based spatial feature extraction, and BiLSTM-based bidirectional temporal learning, effectively overcoming the inherent limitations of simple baseline models.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1033-AC1

RC1: 'Comment on egusphere-2026-1033', Anonymous Referee #1, 11 May 2026

Li et al. have presented their analyses of six ML-based models implemented for spatio-temporal forecasting of the SPI-1 drought index. The analysis and the proposed algorithms are interesting, and the manuscript language is easy to understand.

However, the manuscript currently lacks a sufficient description of the dataset, especially the predicted variable, to validate the authors’ conclusions and understand its utility. I also have concerns regarding data leakage based on my interpretation of the manuscript, especially related to using SPI-1 as both predicted and predictor variable, and using the validation set for hyperparameter tuning. Additionally, more information about the chosen models, parameters, methods, and validation would further improve the manuscript for future readers.

I have listed my major and minor comments for improvement below:

What is the lead time over which SPI-1 is predicted? For example, is it predicted 1 month in advance, or 10 years in advance? Are multiple temporal points predicted, or is a single future prediction made?
Table 2: If SPI-1 is the predicted variable, how is it also an influencing factor? Using it both as an input and the output of the model creates data leakage, and would result in a perfect model. This would also explain the extremely high importance of this variable in the analyses.
Line 319: What was the cross-validation dataset used for hyperparameter tuning? If the authors used the same post-2010 data as validation set for hyperparameter tuning, then that creates a case for data leakage, and the predictive ability of the trained models over the same validation set can no longer be generalized.
Paragraph 37: When describing drought prediction, can the authors add information about the time horizons for predictions, e.g., are the droughts predicted one year in advance, etc.?
Line 21: Can the authors add some examples of loss amounts to further support the statement and help orient readers?
Line 103: Can the authors add more information about the droughts, e.g., how often do they recur?
Line 115: What is meant by “interpolation method in array”? Can more information about mapping data to the grid be added? This also relates to previous comment about spatial resolution to better understand the interpolation methodology.
Eq1, 2: The equations do not have any SRI term, so it is unclear how the SRI relates to the probability distributions. Can the authors clarify the STI calculation, including the values of the parameters alpha, gamma, and x.
Section 3.3.4: Can the authors also list the loss function that they used for the model depicted in Fig 4?
Line 242: Both RMSE and MSE emphasize the larger error since they both use the squared error. Why is one better than the other?
Section 3.5: Since the observations are spatiotemporal, can the authors also list how the observations are combined spatially and temporally for the listed metrics?
Section 4: Can the predicted variable: SPI-1 range be also included? Additionally at what value is a drought state considered? How many times did droughts occur in the study region based on the SPI-1 value, both in the training and the validation sets?
Section 4.1: How is the final list of the important features determined after identifying them separately at each of the 28 grid points?
Line 320: The six models have not been described prior to their mention here except one statement in the Abstract. Suggest explaining the models prior to their inclusion here.
Line 321: Can the authors provide a list of all the hyperparameters that were tuned for each model, the range used for hyperparameters, scaling (linear vs exponential), and number of iterations? Please also include the loss function and whether they were different across the models.
Table 3: What is the input size used for CNN? Based on previous descriptions, it appears that the data is prepared for the 28 grid points, which is constructed roughly from a 10x5 grid. If that’s the case, how can the filter size be greater than the number of grid points, e.g., 25?
Line 327: What is the baseline model against which the improvement is shown?
Line 334: Without any information about the baseline model, it is not possible to verify the accuracy of this conclusion.
Line 14 and 16: Repetitive
Line 28: The authors list the drought indices as addressing the challenges of “monitoring and predicting droughts”. Since they are using SRI to characterize droughts, the sentence structure gives the impression that SRI can also be used to “predict” droughts. Suggest clarifying that the listed indices are only designed to characterize droughts.
Line 54: The sentence is too broad, and does not provide sufficient reasons for the importance of feature selection. For neural network models, it is in fact the lack of need to select features that make them attractive models so why would feature selection be needed to improve their performance? This is important to include as it is one of the key motivations for the manuscript.
Line 67: What is “tool wear prediction” in drought analysis?
Line 102: It is not clear what is meant by “greater”. Does it mean the area of mountains within the selected grid is larger than plains? Can the authors add the area values in the sentence for a better comparison and understanding?
Line 105: Can some examples of extreme climatic events be added?
Line 106: Is the area primarily used for agriculture so that cropland losses are the most significant source of losses?
Line 106: What is the total area of the grid compared to the affected area?
Line 106: Is the affected area from only droughts or all climatic events?
Line 110: Can the authors add temporal and spatial resolutions of each dataset, perhaps in a Table?
Line 117: What is meant by “potential climate prediction factors”?
Line 134: What does item refer to?
Line 163: Can the authors clarify what will y(0) be, i.e., how is the loss calculated for the first base model?
Line 172: Since the authors have not used RNN, it is unnecessary to compare LSTM with RNN.
Line 268: How are the time lag step and embedding dimension determined?
Line 270: Mx is undefined: I assume it refers to the manifold.
Line 281: What does it mean by “library size” and how is it increased?
Line 283: What are the “drivers”?
Line 288: What is meant by “scale data”?
Line 294: The sentence is unclear: what is 1st, 17th, etc. region? Is the analysis done for only certain grid points, or is the analysis included in the manuscript for only certain grid points?
Table 2: What does lead time mean? Does T=1 SPI-1 mean the SPI-1 value 1 (month?) prior to the value being predicted?
Table 2: Please include definitions of acronyms
Table 2: Please include units where applicable
Table 2: What are the ranges of each of the features in the training dataset and the validation dataset?
Line 298: How were 35 factors selected out of 31?
Line 299: How was the random forest model, over which the Boruta selection algorithm is applied, trained and its hyperparameters selected? Which dataset was used for training? Was any validation done for the trained model before implementing the Boruta feature selection?
Line 300: What is the cutoff for determining feature significance, and what are the ranges and units of feature importance?
Fig 5, 6: What are the error bars on the plot?
Line 302: The figure scale and resolution did not allow for identifying the Blue color on the plots. As a result it is unclear if the blue features mark the boundary between red and green. I suspect that since blue features are randomly generated, any features less important than Blue implies non-significance.
Line 305: What leads to indetermination of feature significance (yellow)?
Line 307: Feature redundancy is not considered a significant concern for well trained neural net models. Can the authors clarify whether they intend the statement to apply only for the random forest model that the Boruta selection algorithm is applied on?
Line 311: The basis of the statement is unclear. Since the authors are “removing” unimportant features, why does the feature selection process result in the conclusion that “addition of climate indices improves prediction”?
Section 4.2: What were the other parameter values used in the models, e.g., stride, padding, batch normalization, layer normalization, etc.?
Table 3: How is xgboost used for time series forecasting in the 3rd and 6th models?
Figure 8: What is the range of SPI-1 index?
Figure 13: Can the authors include a description of the boxplots in the figure caption: e.g., the interquartiles of the whiskers, outliers, etc.
Figure 13: Is the box plot constructed across the entire spatiotemporal domain, i.e., all 28 grid points and the 120 months of validation set?
Line 376: It is unclear how to interpret Figure 13 to identify that Boruta-CNN-XGBoost has the highest error. Error bars of other models are much higher with similar medians.
Line 378: Why did the addition of CNN reduce the performance in the Xgboost variants, and improve the performance in LSTM?
Line 380: I am unfamiliar with Taylor diagrams. Can the authors add a short description about how to interpret them?
Line 394: Suggest including the results of the CCM in the Results section, instead of Discussion section.
Line 408, 409: Repeated statements
Figure 17: Can a raincloud plot be described and how to interpret it?

Citation: https://doi.org/10.5194/egusphere-2026-1033-RC1

AC2: 'Reply on RC2', Li min, 31 May 2026

Reviewer:

What is the lead time over which SPI-1 is predicted? For example, is it predicted 1 month in advance, or 10 years in advance? Are multiple temporal points predicted, or is a single future prediction made?

Response: Thank you for the valuable comment. In this study, the SRI-1 value at month t+1 was predicted using historical drought-related variables up to month t, representing a one-month-ahead hydrological drought forecasting framework. Continuous monthly predictions were generated during 2011-2020. Added in Results: “The SRI-1 at t+1 is predicted by using the drought-related variables at t and before.”
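
The one-month-ahead setup described in this reply can be pictured with a minimal sketch, assuming a simple lagged-window layout. The function name, the `n_lags` window length, and the toy data are hypothetical and are not taken from the manuscript; only the pairing of predictors up to month t with SRI-1 at month t+1 comes from the reply.

```python
def make_one_month_ahead_samples(predictors, sri1, n_lags=3):
    """Pair predictors from months t-n_lags+1..t with SRI-1 at month t+1.

    predictors: list of per-month feature lists (hypothetical layout).
    sri1: list of monthly SRI-1 values aligned with `predictors`.
    n_lags: number of past months per sample (an assumption, not from the paper).
    """
    if len(predictors) != len(sri1):
        raise ValueError("predictors and sri1 must cover the same months")
    samples, targets = [], []
    # t is the last month whose predictors are known; t + 1 is the forecast month.
    for t in range(n_lags - 1, len(sri1) - 1):
        window = predictors[t - n_lags + 1 : t + 1]
        # Flatten the window so each sample is a single feature vector.
        samples.append([value for month in window for value in month])
        targets.append(sri1[t + 1])
    return samples, targets


# Toy data: 6 months, 2 predictors per month.
predictors = [[m, 10 * m] for m in range(6)]
sri1 = [0.1 * m for m in range(6)]
X, y = make_one_month_ahead_samples(predictors, sri1, n_lags=3)
```

Because the target index is always one month later than the newest predictor in the window, no value from the forecast month itself enters the inputs.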

Table 2: If SPI-1 is the predicted variable, how is it also an influencing factor? Using it both as an input and the output of the model creates data leakage, and would result in a perfect model. This would also explain the extremely high importance of this variable in the analyses.

Response: Thank you for the valuable comment. We would like to clarify that the prediction target in this study is SRI-1 rather than SPI-1. SPI-1, SPI-3, SPI-6, and SPI-9 are used as meteorological drought-related input variables for hydrological drought prediction and are not prediction targets. Therefore, no data leakage occurs between model inputs and outputs. To avoid ambiguity between drought time scales and prediction lead times, the notation of current-month variables was revised to SPI-1(t), SPI-3(t), SPI-6(t), and SPI-9(t), while “T=1”, “T=2”, and “T=3” were retained to represent different lead times.

A note was added on Table 2: “SPI-1(t), SPI-3(t), SPI-6(t), and SPI-9(t) represent drought indices at month t, while “T=1”, “T=2”, and “T=3” denote one-, two-, and three-month lead times, respectively. The prediction target is SRI-1 at month t+1.”

Line 319: What was the cross-validation dataset used for hyperparameter tuning? If the authors used the same post-2010 data as validation set for hyperparameter tuning, then that creates a case for data leakage, and the predictive ability of the trained models over the same validation set can no longer be generalized.

Response: Bayesian optimization was performed only within the calibration period (1960-2010). The independent validation period (2011-2020) was used exclusively for final model evaluation and was not involved in hyperparameter tuning. In Section 4.2, the description of Bayesian optimization now reads: “In this study, Bayesian optimization is used to fine-tune the hyperparameters of each model during the model training period from 1960 to 2010 to ensure optimal performance.”
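
One way to keep tuning inside the calibration period is a purely chronological split with an inner hold-out, sketched below. The 2010 boundary follows the reply; the `tune_start` year, the record layout, and the function name are assumptions made for illustration, since the reply does not say how the calibration period was subdivided.

```python
def split_by_year(records, calib_end=2010, tune_start=2001):
    """Split (year, month, features, target) records chronologically.

    calib_end follows the 1960-2010 calibration period stated in the reply.
    tune_start is a hypothetical cut-off for an inner tuning hold-out.
    """
    fit, tune, test = [], [], []
    for rec in records:
        year = rec[0]
        if year > calib_end:
            test.append(rec)   # 2011-2020: final evaluation only
        elif year >= tune_start:
            tune.append(rec)   # inner hold-out for hyperparameter search
        else:
            fit.append(rec)    # used to fit candidate models
    return fit, tune, test


# Toy monthly records covering 1960-2020 (features and targets omitted).
records = [(year, month, None, None)
           for year in range(1960, 2021) for month in range(1, 13)]
fit, tune, test = split_by_year(records)
```

With this layout, hyperparameters are scored on `tune` only, and `test` is touched once, after the search is finished.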

Paragraph 37: When describing drought prediction, can the authors add information about the time horizons for predictions, e.g., are the droughts predicted one year in advance, etc.?

Response: Additional descriptions regarding the one-month-ahead forecasting framework were added throughout the manuscript.

Line 21: Can the authors add some examples of loss amounts to further support the statement and help orient readers?

Response: Thank you for this helpful suggestion. We have added the following to the first paragraph of the Introduction: “For example, severe drought events in China have affected millions of hectares of cropland annually and resulted in substantial economic losses exceeding billions of dollars in some extreme years (Wang et al., 2022).”

Line 103: Can the authors add more information about the droughts, e.g., how often do they recur?

Response: Thank you for the valuable suggestion. Additional background information regarding drought characteristics in the Huaihe River Basin has been added in the revised manuscript: “Previous studies have shown that hydrological droughts in the basin occur mainly from October to May, indicating a pronounced seasonal recurrence pattern, and that drought variability also exhibits multi-year periodicity, with significant periods of about 2-7 years for annual drought and 1-3 years for seasonal drought (Sun et al., 2019; Li et al., 2022).”

Line 115: What is meant by “interpolation method in array”? Can more information about mapping data to the grid be added? This also relates to previous comment about spatial resolution to better understand the interpolation methodology.

Response: The unclear expression was revised to “Using the interpolation method in Xarray.”

Eq1, 2: The equations do not have any SRI term, so it is unclear how the SRI relates to the probability distributions. Can the authors clarify the STI calculation, including the values of the parameters alpha, gamma, and x.

Response: Thank you for the helpful suggestion. We have added in Section 3.1: “The calculation of SRI usually assumes that the runoff series obey the Gamma distribution, and the calculation formula is as follows”
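
The reply does not reproduce Eqs. 1 and 2, so the following is only a sketch of the gamma-based standardization commonly used for SPI-type indices, not necessarily the authors' exact formulation. The symbols are assumptions: x is the monthly runoff, α and β are the gamma shape and scale parameters fitted to the runoff series, and Φ is the standard normal cumulative distribution function.

```latex
\begin{align}
g(x) &= \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > 0 \\
G(x) &= \int_{0}^{x} g(t)\,\mathrm{d}t \\
\mathrm{SRI} &= \Phi^{-1}\!\bigl(G(x)\bigr)
\end{align}
```

Under this reading, the index is the standard normal quantile of the cumulative gamma probability of the observed runoff, which is how the probability distribution and the SRI value are linked.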

Section 3.3.4: Can the authors also list the loss function that they used for the model depicted in Fig 4?

Response: Thank you for the helpful suggestion. We have added a brief description in Section 3.3.4 to clarify that the mean square error (MSE) was adopted as the loss function during model training.

Line 242: Both RMSE and MSE emphasize the larger error since they both use the squared error. Why is one better than the other?

Response: We thank the reviewer for pointing out this inaccurate statement. We do not claim that one is better than the other; each serves its own practical purpose in reporting. In Section 3.5, the sentence “RMSE calculates the square root of the average square difference between the predicted value and the observed value, reflecting the overall size of the prediction error.” has been revised to: “RMSE is the square root of the MSE. Because it is also derived from squared errors, RMSE is equally sensitive to large prediction errors. It is expressed in the same units as the target variable, providing an intuitive measure of the average magnitude of error.” The relevant formulas are as follows:

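$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.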

Section 3.5: Since the observations are spatiotemporal, can the authors also list how the observations are combined spatially and temporally for the listed metrics?

Response: Evaluation metrics were first calculated separately for each grid region using monthly sequences and then averaged across the 28 regions.
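
The aggregation order described here (per-region metric first, basin average second) can be sketched as follows. The dictionary layout, region names, and toy values are hypothetical; only the order of operations is taken from the reply.

```python
import math


def mse(observed, predicted):
    """Mean squared error over one region's monthly series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(observed, predicted))


def basin_mean_rmse(obs_by_region, pred_by_region):
    """Compute RMSE per region over its monthly series, then average."""
    scores = [rmse(obs_by_region[r], pred_by_region[r]) for r in obs_by_region]
    return sum(scores) / len(scores)


# Toy example with two regions and three months each.
obs = {"region_1": [0.0, 1.0, -1.0], "region_2": [0.5, 0.5, 0.5]}
pred = {"region_1": [0.0, 0.0, 0.0], "region_2": [0.5, 0.5, 0.5]}
basin_score = basin_mean_rmse(obs, pred)
```

Averaging per-region RMSE values is not the same as computing one RMSE over all pooled region-months: in the toy data the first gives about 0.41 and pooling would give about 0.58, so the aggregation order is worth stating alongside any reported score.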

Section 4: Can the predicted variable: SPI-1 range be also included? Additionally at what value is a drought state considered? How many times did droughts occur in the study region based on the SPI-1 value, both in the training and the validation sets?

Response: Thank you for the important comment. We would like to clarify that the predicted variable in this study is SRI-1 rather than SPI-1. Following the reviewer’s suggestion, we have added a brief description of the SRI-1 range and clarified that hydrological drought conditions were identified when the SRI-1 value was lower than -0.5. Since this study focuses on continuous SRI-1 prediction rather than drought event frequency analysis, drought occurrence counts during the training and validation periods were not further analyzed in this work.
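
For readers who want the drought counts the referee asked about, a minimal sketch of how they could be tabulated from an SRI-1 series is given below. The -0.5 threshold is the one stated in the reply; treating each run of consecutive below-threshold months as one event is an assumption, and the series is toy data rather than results from the study.

```python
DROUGHT_THRESHOLD = -0.5  # drought when SRI-1 is lower than this value (from the reply)


def drought_months(sri1):
    """Return the indices of months classified as drought."""
    return [i for i, value in enumerate(sri1) if value < DROUGHT_THRESHOLD]


def count_drought_events(sri1):
    """Count runs of consecutive drought months (event definition is an assumption)."""
    events, in_drought = 0, False
    for value in sri1:
        below = value < DROUGHT_THRESHOLD
        if below and not in_drought:
            events += 1  # a new run of drought months starts here
        in_drought = below
    return events


# Toy SRI-1 series; a value of exactly -0.5 is not below the threshold.
series = [0.2, -0.6, -1.1, -0.5, 0.3, -0.8, 0.1]
months = drought_months(series)
events = count_drought_events(series)
```

Applying the same two functions separately to the calibration and validation series would give comparable counts for both periods.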

Section 4.1: How is the final list of the important features determined after identifying them separately at each of the 28 grid points?

Response: Thank you for the helpful comment. In this study, feature selection was conducted independently for each grid region because hydrological and climatic characteristics vary spatially across the basin. Therefore, the final set of important features used for model training may differ among the 28 grid regions. We have added a clarification in Section 4.1 to improve the description of the feature selection process: “Since hydrological and climatic conditions differ across regions, feature selection was conducted independently for each grid region. Therefore, the final set of important features used for model training may vary among the 28 grid regions.”

Line 320: The six models have not been described prior to their mention here except one statement in the Abstract. Suggest explaining the models prior to their inclusion here.

Response: Thank you for your valuable suggestion. To improve the logical flow and readability of the text, we added a brief description of the six comparison models in Section 3.3: “To comprehensively evaluate the prediction performance of the proposed framework, six different machine learning and deep learning models were developed in this study, including Boruta-CNN-BiLSTM, Boruta-CNN-LSTM, Boruta-CNN-XGBoost, Boruta-BiLSTM, Boruta-LSTM, and Boruta-XGBoost. These models were compared to investigate the effects of CNN structure, recurrent learning architecture, and feature selection strategy on hydrological drought prediction performance.”

Line 321: Can the authors provide a list of all the hyperparameters that were tuned for each model, the range used for hyperparameters, scaling (linear vs exponential), and number of iterations? Please also include the loss function and whether they were different across the models.

Response: Thank you for the detailed suggestion. We state that “The Bayesian iteration of each model in 28 different regions is 50 times.” To keep the manuscript readable and concise, the detailed parameter ranges and optimization settings are not expanded further in the text. The ranges of the relevant model parameters are as follows:

Model	Hyperparameter ranges
LSTM	numhidden_units1 [3, 50]; InitialLearnRate [1e-5, 0.1]; dropoutRate [0.1, 0.5]; MaxEpochs [50, 250]
XGBoost	eta [1e-5, 0.1]; max_depth [2, 10]; num_trees [100, 1000]
BiLSTM	numhidden_units [3, 50]; InitialLearnRate [1e-5, 0.1]; dropoutRate [0.1, 0.5]; MaxEpochs [50, 250]
CNN-LSTM	learningRate [1e-5, 0.1]; numUnits [3, 50]; convFilterSize [1, 32]; convNumFilters [1, 32]; poolSize [1, 4]; MaxEpochs [50, 200]; dropoutRate [0.1, 0.5]
CNN-XGBoost	convFilterSize [1, 32]; convNumFilters [1, 32]; poolSize [1, 3]; numTrees [100, 1000]; eta [0.01, 0.3]; maxDepth [2, 10]
CNN-BiLSTM	learningRate [1e-5, 0.1]; numUnits [3, 50]; convFilterSize [1, 32]; convNumFilters [1, 32]; poolSize [1, 4]; MaxEpochs [50, 200]; dropoutRate [0.1, 0.5]
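
For readers who want the ranges in machine-readable form, the CNN-BiLSTM row could be encoded as below. This is only a sketch: the scale of each parameter (log, integer, linear) is an assumption, since the response does not state it, and plain random sampling is used here as a stand-in for the Bayesian optimizer, which is not reproduced.

```python
import math
import random

# Ranges copied from the CNN-BiLSTM row; the scale labels are assumptions.
SEARCH_SPACE = {
    "learningRate": (1e-5, 0.1, "log"),
    "numUnits": (3, 50, "int"),
    "convFilterSize": (1, 32, "int"),
    "convNumFilters": (1, 32, "int"),
    "poolSize": (1, 4, "int"),
    "MaxEpochs": (50, 200, "int"),
    "dropoutRate": (0.1, 0.5, "linear"),
}

def sample(space, rng):
    """Draw one candidate configuration from the search space."""
    config = {}
    for name, (low, high, scale) in space.items():
        if scale == "log":
            # Uniform in log space, so each decade is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif scale == "int":
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config

rng = random.Random(0)
# 50 candidates, matching the stated number of iterations per model and region.
candidates = [sample(SEARCH_SPACE, rng) for _ in range(50)]
```

A Bayesian optimizer would choose each new candidate from the results of earlier ones rather than independently, so this sketch shows only the search space, not the search strategy.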

Table 3: What is the input size used for CNN? Based on previous descriptions, it appears that the data is prepared for the 28 grid points, which is constructed roughly from a 10x5 grid. If that’s the case, how can the filter size be greater than the number of grid points, e.g., 25?

Response: Thank you for your important comment. We would like to clarify that the CNN structure used in this study is a one-dimensional temporal convolution network rather than a spatial convolution network applied to grid layouts. The convolutional filter size refers to the temporal sequence dimension of the input data rather than the number of spatial grid points. Therefore, filter sizes larger than the number of grid regions are reasonable in the present framework. We add in Section 3.3.4: “It should be noted that the convolutional filter size in this study refers to the temporal dimension of the input sequence rather than the spatial grid dimension. The CNN structure was designed as a one-dimensional temporal convolution network operating on sequential feature data.”
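
The distinction can be illustrated with a minimal sketch of a one-dimensional convolution along the time axis. The series length, kernel values, and function name are hypothetical, and the sketch uses no padding for simplicity.

```python
def conv1d_valid(series, kernel):
    """Slide a kernel along the time axis (cross-correlation, no padding)."""
    k = len(kernel)
    return [
        sum(series[t + j] * kernel[j] for j in range(k))
        for t in range(len(series) - k + 1)
    ]

# Hypothetical input: 36 monthly values of one feature for one grid region.
monthly_feature = [float(m % 12) for m in range(36)]

# A filter of size 25 spans 25 consecutive months, not 25 grid cells.
kernel = [1.0 / 25] * 25
out = conv1d_valid(monthly_feature, kernel)
```

Because the kernel moves only along time, its size is bounded by the length of the input sequence and is unrelated to the number of grid regions.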

Line 327: What is the baseline model against which the improvement is shown?

Response: Thank you for your important comment. In section 4.3, we add a related explanation: “The improvement shown in Figure 7 is calculated relative to the corresponding machine learning model without Boruta feature selection.”

Line 334: Without any information about the baseline model, it is not possible to verify the accuracy of this conclusion.

Response: Thank you for your comment. In section 4.3, we add a related explanation: “The improvement shown in Figure 7 is calculated relative to the corresponding machine learning model without Boruta feature selection.”

Line 14 and 16: Repetitive

Response: Thank you for your important comment. We have modified “Furthermore, the prediction performance of the model is mainly influenced by factors such as precipitation, volumetric soil water (0-7cm), volumetric soil water (7-28cm) and surface net solar radiation. The model's prediction performance is most affected by precipitation, followed by volumetric soil water (0-7cm), volumetric soil water (7-28cm), and surface net solar radiation has the least impact.” to “Furthermore, the prediction performance of the model is mainly influenced by precipitation, followed by volumetric soil water (0-7 cm), volumetric soil water (7-28 cm), and surface net solar radiation, which has the least impact.”

Line 28: The authors list the drought indices as addressing the challenges of “monitoring and predicting droughts”. Since they are using SRI to characterize droughts, the sentence structure gives the impression that SRI can also be used to “predict” droughts. Suggest clarifying that the listed indices are only designed to characterize droughts.

Response: Thank you for your important comment. We modified the original sentence to: “To monitor and characterize drought conditions, researchers have developed various drought indices, such as the Palmer Drought Severity Index (PDSI) (Palmer, 1965), the Standardized Precipitation Index (SPI) (McKee et al., 1993), and the Standardized Runoff Index (SRI) (Shukla and Wood, 2008), which are designed to quantify drought severity.”

Line 54: The sentence is too broad, and does not provide sufficient reasons for the importance of feature selection. For neural network models, it is in fact the lack of need to select features that make them attractive models so why would feature selection be needed to improve their performance? This is important to include as it is one of the key motivations for the manuscript.

Response: Thank you for your comment. We modify the original sentence “While these hybrid models have achieved significant advancements by leveraging the strengths of multiple models, they still face major challenges in feature selection. Poor feature selection can lead to unstable model performance and increased computational costs, limiting the overall effectiveness of these approaches.” to “While these hybrid models have achieved significant advancements by leveraging the strengths of multiple models, they still face major challenges when dealing with high-dimensional and highly correlated hydrological variables. Although deep learning models can automatically learn nonlinear relationships, redundant and irrelevant input features may still introduce noise, increase computational complexity, and reduce model stability, especially under complex hydrological conditions.”

Line 67: What is “tool wear prediction” in drought analysis?

Response: Thank you for your comment. We agree that the example from tool wear prediction was not sufficiently relevant to the context of hydrological drought prediction. Therefore, the related sentence has been removed from the revised manuscript to improve the coherence and relevance of the Introduction section.

Line 102: It is not clear what is meant by “greater”. Does it mean the area of mountains within the selected grid is larger than plains? Can the authors add the area values in the sentence for a better comparison and understanding?

Response: Thank you for your comment. We agree that the previous wording was unclear. The related sentence has been revised to provide a clearer description of the spatial distribution characteristics of mountainous and plain areas within the Huaihe River Basin. We modify the original sentence “Mountainous areas are greater than plains, and coastal areas are greater than inland.” to “Mountainous and hilly areas are mainly distributed in the western and southern parts of the basin, while plains are mainly distributed in the eastern regions.”

Line 105: Can some examples of extreme climatic events be added?

Response: Thank you for your comment. We modified the original sentence “Due to recurrent droughts in winter and spring, coupled with high temperatures and intense rainfall during summer and autumn, the Huaihe River Basin is highly susceptible to extreme climatic events” to “Due to recurrent droughts in winter and spring, coupled with high temperatures and intense rainfall during summer and autumn, the Huaihe River Basin is highly susceptible to extreme climatic events, particularly seasonal droughts and precipitation-induced floods”. We also added: “Previous studies have shown that hydrological droughts in the basin mainly occur from October to May, indicating a pronounced seasonal recurrence pattern, while drought variability also exhibits multi-year periodicity, with significant periods of about 2-7 years for annual drought and 1-3 years for seasonal drought (Sun et al., 2019; Li et al., 2022).”

Line 106: Is the area primarily used for agriculture so that cropland losses are the most significant source of losses?
Line 106: What is the total area of the grid compared to the affected area?
Line 106: Is the affected area from only droughts or all climatic events?

Response: Thank you for your valuable comments. We have revised the related description to clarify that the Huaihe River Basin is an important agricultural production region and that the reported affected cropland area specifically refers to drought-related agricultural impacts.

Line 110: Can the authors add temporal and spatial resolutions of each dataset, perhaps in a Table?

Response: Thank you for the helpful suggestion. A new table entitled “Summary of datasets used in this study” has been added in Section 2.2 to summarize the temporal resolution, spatial resolution, and data sources of the datasets used in this study.

Line 117: What is meant by “potential climate prediction factors”?

Response: Thank you for your valuable comments. We modified the original sentence “the potential climate prediction factors” to “large-scale climate indices potentially related to hydrological drought prediction”.

Line 134: What does item refer to?

Response: Thank you for the comment. We agree that the term “item” was unclear in the original manuscript. The related description has been revised to explicitly indicate that the cumulative probability refers to the runoff series used for SRI calculation. In addition, the wording related to the SRI calculation process was further clarified for improved accuracy and readability.

Line 163: Can the authors clarify what will y(0) be, i.e., how is the loss calculated for the first base model?

Response: Thank you for your comment. We have added a clarification in the revised manuscript indicating that y(0) represents the initial prediction value before the iterative boosting process begins. This modification improves the clarity of the model formulation.
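
For orientation, the usual additive form of gradient boosting can be written as below. The symbols are assumed here for illustration and are not copied from the manuscript; in particular, the constant $c$ used as the starting value in this study is not stated in the response.

```latex
\hat{y}_i^{(0)} = c, \qquad
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i), \quad t = 1, \dots, T
```

Under this formulation, the loss for the first base model $f_1$ is evaluated between the observations $y_i$ and the constant initial prediction $\hat{y}_i^{(0)}$, and each later tree $f_t$ is fitted against the prediction accumulated up to step $t-1$.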

Line 172: Since the authors have not used RNN, it is unnecessary to compare LSTM with RNN.

Response: Thank you for commenting. In this study, RNN was not used as a comparative model. The related description was included only to briefly introduce the theoretical background and development motivation of the LSTM architecture, since LSTM is a variant of recurrent neural networks. Therefore, we retained the original description to preserve the completeness and continuity of the methodological introduction.

Line 268: How are the time lag step and embedding dimension determined?

Response: Thank you for your comment. Additional clarification has been added in the revised manuscript to explain that the embedded dimension and time lag step were determined during the CCM reconstruction process through repeated experimental testing to ensure stable causal analysis results.

Line 270: Mx is undefined: I assume it refers to the manifold.

Response: Thank you for your comment. We have added the following at the relevant position in Section 3.6: “Mx represents the reconstructed shadow manifold of variable X.”

Line 281: What does it mean by “library size” and how is it increased?

Response: Thank you for your comment. In CCM analysis, the library size represents the number of samples used for manifold reconstruction, which gradually increases as more time series samples are included in the analysis.
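
The roles of the embedding dimension, time lag, and library size can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the library is taken as the first `lib_size` embedded points rather than random subsamples, the coupled logistic maps are a common textbook test system, and all function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def shadow_manifold(x, E, tau):
    """Delay-embed x: one point (x[t], x[t-tau], ..., x[t-(E-1)*tau]) per valid t."""
    start = (E - 1) * tau
    return [[x[t - k * tau] for k in range(E)] for t in range(start, len(x))]

def cross_map_skill(x, y, E, tau, lib_size):
    """Estimate y from the shadow manifold of x using the first lib_size points."""
    start = (E - 1) * tau
    library = shadow_manifold(x, E, tau)[:lib_size]
    target = y[start:start + len(library)]
    estimates = []
    for i, point in enumerate(library):
        # E + 1 nearest neighbours within the library, excluding the point itself.
        nearest = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(library) if j != i
        )[:E + 1]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(w * target[j] for w, (_, j) in zip(weights, nearest))
        estimates.append(estimate / sum(weights))
    return pearson(target, estimates)

# Hypothetical test system: two weakly coupled logistic maps.
x, y = [0.4], [0.2]
for _ in range(499):
    x_next = x[-1] * (3.8 - 3.8 * x[-1] - 0.02 * y[-1])
    y_next = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_next)
    y.append(y_next)

# Cross-map skill (rho) evaluated at increasing library sizes.
rho_by_library = {
    size: cross_map_skill(x, y, E=2, tau=1, lib_size=size)
    for size in (50, 200, 400)
}
```

In CCM, a ρ-value that rises and levels off as the library size grows is read as evidence that the estimated variable leaves a signature in the variable used to build the manifold. Repeating the calculation for several values of `E` and `tau` is one way to carry out the experimental testing mentioned in the response to Line 268.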

Line 283: What are the “drivers”?

Response: Thank you for your comment. We have changed “drivers” to “driving factors”.

Line 288: What is meant by “scale data”?

Response: Thank you for the comment. We have changed “monthly scale data” to “monthly data”.

Line 294: The sentence is unclear: what is 1st, 17th, etc. region? Is the analysis done for only certain grid points, or is the analysis included in the manuscript for only certain grid points?

Response: Thank you for your comments. The 1st, 17th, and other numbered regions refer to the grid regions within the Huaihe River Basin (Figure 1). In addition, the revised manuscript now clearly states that all 28 grid regions were analysed and that only representative regions (1st, 7th, 17th, and 21st) were discussed due to space constraints.

Table 2: What does lead time mean? Does T=1 SPI-1 mean the SPI-1 value 1 (month?) prior to the value being predicted?

Response: Thank you for your comment. We have added the relevant explanation to the caption of Table 2: “Drought influencing factors input by the Boruta feature selection algorithm. (t) denotes drought indices at month t, while “T=1”, “T=2”, and “T=3” denote one-, two-, and three-month lead times, respectively. The prediction target is SRI-1 at month t+1.”

Table 2: Please include definitions of acronyms

Response: Thank you for your comment. The abbreviations used in Table 2 are defined in Section 2.2.

Table 2: Please include units where applicable

Response: Thank you for your comment. Units for the relevant variables have been added to Table 2 in the revised manuscript to improve the clarity and completeness of the dataset description.

Table 2: What are the ranges of each of the features in the training dataset and the validation dataset?

Response: Thank you for your comment. In this study, all input features were derived from monthly datasets covering the period 1960 to 2020. The training dataset included monthly samples from 1960 to 2010, while the validation dataset covered the period from 2011 to 2020.

Line 298: How were 35 factors selected out of 31?

Response: Thank you for your comment. The number “35” in the original manuscript was a typographical error. It has been corrected to “31” in the revised manuscript.

Line 299: How was the random forest model, over which the Boruta selection algorithm is applied, trained and its hyperparameters selected? Which dataset was used for training? Was any validation done for the trained model before implementing the Boruta feature selection?

Response: Thank you for your valuable comment. In this study, the Boruta feature selection process was implemented based on the random forest algorithm using the training dataset only. The random forest model was employed as an internal feature importance estimator within the Boruta framework rather than as an independent predictive model. Therefore, separate validation of the random forest model was not performed.

Line 300: What is the cutoff for determining feature significance, and what are the ranges and units of feature importance?

Response: Thank you for your comment. We have supplemented the relevant description in Section 4.1: “In this study, feature importance is quantified as a Z-score derived from the Mean Decrease Accuracy (MDA) of Random Forest; this score is unitless and data‑dependent. The significance cutoff is not fixed but adaptive: a feature is considered significant if its importance consistently exceeds the maximum importance of the shadow features across Boruta iterations.”
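
The adaptive cutoff can be illustrated with a simplified sketch of the shadow-feature comparison. It is not the Boruta implementation used in the study: the absolute Pearson correlation is swapped in as a stand-in for the random forest MDA Z-score, the sketch only counts hits and omits the statistical test that Boruta applies to them, and the data and names are hypothetical.

```python
import random

def abs_corr(a, b):
    """Stand-in importance score: |Pearson correlation| with the target."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return abs(cov) / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def shadow_hits(features, target, n_iter=50, seed=0):
    """Count how often each real feature beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features are shuffled copies: same values, no link to the target.
        shadow_max = 0.0
        for column in features.values():
            shuffled = column[:]
            rng.shuffle(shuffled)
            shadow_max = max(shadow_max, abs_corr(shuffled, target))
        for name, column in features.items():
            if abs_corr(column, target) > shadow_max:
                hits[name] += 1
    return hits

# Hypothetical data: 120 monthly samples, one informative and one irrelevant feature.
data_rng = random.Random(1)
target = [data_rng.gauss(0, 1) for _ in range(120)]
features = {
    "informative": [t + data_rng.gauss(0, 0.1) for t in target],
    "irrelevant": [data_rng.gauss(0, 1) for _ in range(120)],
}
hits = shadow_hits(features, target)
```

A feature that exceeds the shadow maximum in nearly every iteration would be confirmed, one that rarely does would be rejected, and one in between corresponds to the undetermined case.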

Fig 5, 6: What are the error bars on the plot?

Response: Thank you for your comment. We have added the following to Section 4.1: “The boxplots represent the distribution of Z-score feature importance values generated during the Boruta iterations.”

Line 302: The figure scale and resolution did not allow for identifying the Blue color on the plots. As a result it is unclear if the blue features mark the boundary between red and green. I suspect that since blue features are randomly generated, any features less important than Blue implies non-significance.

Response: Thank you for the comment. The blue features do not represent the boundary between the red and green regions. Instead, the different colors indicate different contribution characteristics of the input features to the model prediction. We have modified “Blue represents shadow features, which are randomly disrupted features for comparison with actual features to determine whether actual features are more important than random features.” in Section 4.1 to “Blue boxplots represent the shadow features generated randomly by the Boruta algorithm and are used as the reference threshold for feature significance evaluation.”

Line 305: What leads to indetermination of feature significance (yellow)?

Response: Thank you for the comment. The importance of these yellow features cannot stably exceed the shadow features, so they are eliminated. We have added in Section 4.1: “Since the importance of these undetermined features could not consistently exceed that of the shadow features during the Boruta iterations, they were not retained in the final feature subset.”

Line 307: Feature redundancy is not considered a significant concern for well trained neural net models. Can the authors clarify whether they intend the statement to apply only for the random forest model that the Boruta selection algorithm is applied on?

Response: Thank you for the comment. In this study, a large number of drought-related influencing factors were considered as model inputs. The results indicate that Boruta feature selection effectively improved the prediction accuracy of the neural network models by removing redundant and irrelevant variables. We believe that, when a large number of input variables are involved in hydrological drought prediction, Boruta feature selection can effectively enhance the learning ability and prediction performance of neural network models.

Line 311: The basis of the statement is unclear. Since the authors are “removing” unimportant features, why does the feature selection process result in the conclusion that “addition of climate indices improves prediction”?

Response: Thank you for the comment. We agree that the original wording could lead to ambiguity. We have deleted the relevant statements.

Section 4.2: What were the other parameter values used in the models, e.g., stride, padding, batch normalization, layer normalization, etc.?

Response: Thank you for your comments. To improve the reproducibility of the model, we have added the following sentence to Section 4.2: “Additional model settings include pooling stride = 2, padding = “same”, batch normalization, ReLU activation, and Adam optimizer.”

Table 3: How is xgboost used for time series forecasting in the 3rd and 6th models?

Response: Thank you for your comments. In the XGBoost-based models, the input features were constructed using hydrometeorological variables from the previous month, while the target variable corresponded to the current-month SRI value. Therefore, the models performed one-month lead-time hydrological drought prediction through a lagged input-output structure.
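
The lagged input-output structure can be sketched as below. The feature columns and values are hypothetical; only the one-month lag and the drought threshold of -0.5 for SRI-1 come from the responses.

```python
def make_lagged_pairs(features, sri):
    """Pair predictors at month t-1 with the SRI value at month t."""
    X = [features[t - 1] for t in range(1, len(sri))]
    y = [sri[t] for t in range(1, len(sri))]
    return X, y

# Hypothetical monthly rows: [precipitation, volumetric soil water 0-7 cm].
features = [[80.0, 0.30], [20.0, 0.25], [5.0, 0.18], [60.0, 0.22]]
sri = [0.4, 0.1, -0.7, -0.2]

X, y = make_lagged_pairs(features, sri)

# Hydrological drought is flagged when SRI-1 is lower than -0.5.
drought = [value < -0.5 for value in y]
```

Each row of `X` is an ordinary tabular sample, which is why a tree-based regressor can be trained on it without any recurrent structure.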

Figure 8: What is the range of SPI-1 index?

Response: Thank you for the comment. The revised manuscript now clarifies that the predicted variable in this study is SRI-1 rather than SPI-1, and that drought conditions were identified when the SRI-1 value was lower than -0.5.

Figure 13: Can the authors include a description of the boxplots in the figure caption: e.g., the interquartiles of the whiskers, outliers, etc.

Response: Thank you for the comment. Additional descriptions of the boxplot components, including the interquartile range and whiskers, have been added to the caption of Figure 13.

Figure 13: Is the box plot constructed across the entire spatiotemporal domain, i.e., all 28 grid points and the 120 months of validation set?

Response: Thank you for the comment. Figure 13 summarizes the prediction errors across all 28 grid regions and all validation-period samples.

Line 376: It is unclear how to interpret Figure 13 to identify that Boruta-CNN-XGBoost has the highest error. Error bars of other models are much higher with similar medians.

Response: Thank you for the comment. We have modified “In contrast, the Boruta-CNN-XGBoost model showed the highest error level among all indicators, reflecting weak predictive performance.” in Section 4.3 to “In contrast, the Boruta-CNN-XGBoost model generally exhibited relatively large errors and greater dispersion across multiple evaluation indicators, reflecting comparatively weaker predictive performance.”.

Line 378: Why did the addition of CNN reduce the performance in the Xgboost variants, and improve the performance in LSTM?

Response: Thank you for the comment. We have modified “For BiLSTM and LSTM models, prediction accuracy is significantly improved after adding CNN, which confirms the role of CNN in improving spatial feature extraction and model robustness.” in Section 4.3 to “For the BiLSTM and LSTM models, prediction accuracy improved significantly after introducing CNN-based feature extraction, indicating that CNN can effectively enhance local feature representation and improve temporal learning ability in recurrent neural network structures. However, for the XGBoost-based models, the transformed CNN features may weaken some of the original relationships among the input variables utilized by tree-based learning methods, leading to relatively limited performance improvement.”.

Line 380: I am unfamiliar with Taylor diagrams. Can the authors add a short description about how to interpret them?

Response: Thank you for the comment. We added the relevant explanation in Section 4.3: “In the Taylor diagram, points closer to the reference point indicate better model performance, corresponding to higher correlation coefficients and smaller deviations from the observed values.”
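
The geometry behind this reading can be stated with the standard relation on which Taylor diagrams are built. The symbols below are the conventional ones and are assumed here rather than taken from the manuscript.

```latex
E'^2 = \sigma_f^2 + \sigma_r^2 - 2\,\sigma_f\,\sigma_r\,R
```

Here $\sigma_f$ and $\sigma_r$ are the standard deviations of the predicted and observed series, $R$ is their correlation coefficient, and $E'$ is the centered root-mean-square difference. The relation has the form of the law of cosines, so the distance from a model point to the reference point equals $E'$, which is why points nearer the reference indicate better agreement.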

Line 394: Suggest including the results of the CCM in the Results section, instead of Discussion section.

Response: Thank you for the comment. In the revised manuscript, the main CCM analysis results and corresponding figures have been incorporated into the Results section, while the related mechanism interpretation and discussion have been retained in the Discussion section to improve the organization and readability of the manuscript.

Line 408, 409: Repeated statements

Response: Thank you for the comment. We have deleted the repeated expression.

Figure 17: Can a raincloud plot be described and how to interpret it?

Response: Thank you for the comment. We have added the relevant explanation in the new section 4.4: “Figure 17 is a raincloud plot, which combines probability density distributions, boxplots, and scatter distributions to visualize the distribution characteristics, dispersion, and variability of the CCM ρ-values.”

Citation: https://doi.org/10.5194/egusphere-2026-1033-AC2

RC2:
'Comment on egusphere-2026-1033', Anonymous Referee #2, 12 May 2026
This study presents a hybrid machine learning framework (Boruta-CNN-BiLSTM) for predicting hydrological drought (SRI-1) in the Huaihe River Basin. The authors integrated the Boruta feature selection algorithm with a CNN-BiLSTM deep learning architecture to enhance prediction accuracy. The methodology is generally sound and well-executed from a computational perspective. Below are specific comments and suggestions for improvement.
The conclusion mentions that "the average RMSE decreased by 0.42". It is suggested to add an explanation of which baseline model this is relative to (e.g., the model without using Boruta, or the single BiLSTM model), otherwise it will be difficult for readers to intuitively judge the extent of improvement.

The terms "Huaihe River Basin" and "Huaihe River basin" are used interchangeably throughout the text. It is recommended to use the term "Huaihe River Basin" with the first letter capitalized.

In the description of variable names in formulas (5)-(8) by LSTM in Section 3.3.2, "donate" is a spelling error and should be "denote".

The layout of Table 2 is relatively messy. It is recommended to divide it into two columns: "Factor ID" and "Description".

The paper primarily explains how to enhance the accuracy of the prediction model; it is suggested that the discussion of hydrological physical mechanisms be appropriately strengthened.

Apart from enhancing the accuracy of model predictions, the innovation of this paper lies in its appropriate integration with hydrological processes, elucidating its academic contribution to improving the interpretability of hydrological drought predictions.

Overall, the methodology is robust, the results are comprehensively validated across 28 grid regions, and the conclusions are well-supported by the data. Therefore, this paper makes a valuable contribution to the field of drought forecasting and water resources management and is recommended for publication after minor revisions.
Citation: https://doi.org/10.5194/egusphere-2026-1033-RC2
- AC3:
  'Reply on RC3', Li min, 31 May 2026
  Reviewer:
  The conclusion mentions that "the average RMSE decreased by 0.42". It is suggested to add an explanation of which baseline model this is relative to (e.g., the model without using Boruta, or the single BiLSTM model), otherwise it will be difficult for readers to intuitively judge the extent of improvement.
  
  Response: Thank you for the valuable comment. We agree that in order to improve the readability of the conclusions, the benchmark model should be clearly defined. Therefore, in the conclusion part, we modify “Boruta-CNN-BiLSTM showed the largest gain, with an average RMSE decrease of 0.42, a mean MAE decrease of 0.32, an average NRMSE decrease of 0.008, a mean MSE decrease of 0.03, and a median R² increase of 3.65%.” to “Boruta-CNN-BiLSTM showed the largest improvement compared with the original CNN-BiLSTM model without Boruta feature selection, with an average RMSE decrease of 0.42, a mean MAE decrease of 0.32, an average NRMSE decrease of 0.008, a mean MSE decrease of 0.03, and a median R² increase of 3.65%.”.
  
  The terms "Huaihe River Basin" and "Huaihe River basin" are used interchangeably throughout the text. It is recommended to use the term "Huaihe River Basin" with the first letter capitalized.
  
  Response: Thank you for your valuable comment. We have checked the entire manuscript and standardized the terminology by consistently using “Huaihe River Basin” throughout the paper.
  
  In the description of variable names in formulas (5)-(8) by LSTM in Section 3.3.2, "donate" is a spelling error and should be "denote".
  
  Response: Thank you for your valuable comment. The word “donate” has been corrected to “denote” in the descriptions of Eqs. (5)–(8) in Section 3.3.2.
  
  The layout of Table 2 is relatively messy. It is recommended to divide it into two columns: "Factor ID" and "Description".
  
  Response: Thank you for your valuable comment. We have reorganized Table 2 into a clear two-column format consisting of “Factor ID” and “Description”, which improves the readability and presentation of the drought influencing factors.
  
  The paper primarily explains how to enhance the accuracy of the prediction model; it is suggested that the discussion of hydrological physical mechanisms be appropriately strengthened.
  
  Response: Thank you for your valuable comment. In order to solve this problem, the first three paragraphs of the Discussion section have been revised as follows:
  To analyze the reasons for the spatial differences in the prediction accuracy of the Boruta-CNN-BiLSTM model, the most influential factors identified by the Boruta method were selected, namely precipitation, volumetric soil water (0-7 cm), volumetric soil water (7-28 cm), and surface net solar radiation. The CCM method was used to quantify the impact of each factor on the model evaluation index R². The results are shown in Figures 16 and 17. According to Figures 15, 16, and 17, precipitation is the most significant factor affecting the prediction accuracy of the model across the entire watershed. Its mean ρ-value is close to 0.9, and the values lie mainly between 0.8 and 1.0. The values are relatively concentrated, which indicates that the model's prediction accuracy is sensitive to precipitation and that this sensitivity is distributed relatively uniformly in space. This suggests that SRI-1 is strongly controlled by short-term water inputs, and precipitation directly influences runoff generation on a monthly scale.
  VSW1 and VSW2 are the factors with the next-greatest impact on model prediction accuracy after precipitation. Their mean ρ-values range from 0.4 to 0.5, and the values lie mainly between 0.2 and 0.7. This wide spread indicates that the sensitivity of the model's prediction accuracy to volumetric soil water varies significantly in space. Specifically, the sensitivity in the upper and middle reaches is greater than that in downstream areas. This suggests that antecedent soil moisture influences runoff response through its persistence effect and plays a crucial role in drought persistence. The distribution of VSW1 and VSW2 across the basin is uneven and is partly consistent with the spatial pattern of this sensitivity.
  The factor with the least impact on the model's prediction accuracy is SNSR, with a mean ρ-value close to 0.2 and a range between 0 and 0.4. Although its direct impact is limited, it may still influence hydrological drought indirectly through surface energy balance and evapotranspiration processes. The most influential factors identified by the Boruta method indicate that model performance is closely related to key hydrological processes, including precipitation-driven runoff generation, soil moisture memory effects, and energy-controlled evapotranspiration, which jointly influence short-term hydrological drought evolution.
  
  Apart from enhancing the accuracy of model predictions, the innovation of this paper lies in its appropriate integration with hydrological processes, elucidating its academic contribution to improving the interpretability of hydrological drought predictions.
  
  Response: Thank you for your valuable comment. In the last paragraph of the introduction, we modified “Notably, the Boruta-CNN-BiLSTM framework developed in this study enables the quantitative interpretation of the relative importance of key drought-controlling factors. This not only improves the mechanistic understanding of data-driven drought prediction but also makes the model consistent with hydrological mechanisms, thereby providing a useful reference for integrating hydrological process understanding with data-driven drought forecasting.” to “Notably, the Boruta-CNN-BiLSTM framework developed in this study, together with the CCM analysis, enables the quantitative interpretation of the relative importance and hydrological influence of key drought-controlling factors. This not only improves the mechanistic interpretability of data-driven hydrological drought prediction, but also enhances the consistency between the model prediction results and hydrological response processes, thereby providing a useful reference for integrating hydrological process understanding with data-driven drought forecasting.”.
  We add in the third point of the conclusion: These findings indicate that the proposed model not only achieves high prediction accuracy, but also effectively reflects key hydrological response relationships associated with hydrological drought evolution across different regions of the Huaihe River Basin.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1033-AC3

Short summary

We built a hybrid machine learning model that first screened many weather and land metrics, retaining only the most informative ones, and then learned from decades of monthly records to predict droughts. In tests across 28 regions of the Huaihe River Basin of China from 2011 to 2020, its accuracy was higher than that of multiple comparison models.