the Creative Commons Attribution 4.0 License.
A hybrid model based on Boruta feature selection and neural network for forecasting hydrological drought
Abstract. Accurate hydrological drought prediction is vital for water management. This study proposes a hybrid model combining Boruta feature selection, a convolutional neural network (CNN), and bidirectional long short-term memory (BiLSTM) to predict hydrological drought in the Huaihe River Basin of China. The Boruta algorithm selected key predictors from 31 potential drought-influencing factors. Comparing the proposed Boruta-CNN-BiLSTM model with Boruta-CNN-LSTM, Boruta-CNN-XGBoost, Boruta-BiLSTM, Boruta-LSTM, and Boruta-XGBoost shows that Boruta significantly enhances all models. The Boruta-CNN-BiLSTM model achieved the highest accuracy across 28 basin grid regions and exhibited the largest performance gains. The model's prediction performance is most affected by precipitation, followed by volumetric soil water (0–7 cm) and volumetric soil water (7–28 cm); surface net solar radiation has the least impact. The model provides enhanced support for basin-scale drought risk assessment and water resources management.
Status: open (until 05 Jun 2026)
CC1: 'Comment on egusphere-2026-1033', Nima Zafarmomen, 06 Apr 2026
AC1: 'Reply on CC1', Li min, 08 May 2026
- While the hybrid Boruta-CNN-BiLSTM framework performs well, similar CNN-LSTM/BiLSTM hybrid approaches have been widely explored. The manuscript would benefit from more explicitly clarifying what is fundamentally new beyond performance improvement.
Response: Thank you for your valuable comments. The original manuscript did not state the research contribution clearly. To address this, paragraphs 5 and 6 of the Introduction section have been amended as follows:
Integrating hydrological process understanding with data-driven models has become a research focus in hydrological simulation, as it can effectively enhance the physical rationality and prediction reliability of models. For instance, Zafarmomen et al. (2024) demonstrated how integrating remotely sensed vegetation dynamics can improve hydrological representation and predictive performance. Such studies collectively highlight that incorporating physically meaningful variables and process knowledge is crucial for strengthening the interpretability of data-driven models, which is a key direction for current hydrological drought prediction research. However, existing studies on data-driven hydrological drought prediction still have obvious shortcomings. On the one hand, most studies rely on a large set of input variables without explicitly evaluating their relevance to drought, which easily introduces redundant information and further affects model stability. On the other hand, although many hybrid models have attempted to combine the advantages of multiple algorithms to improve prediction performance, they lack effective optimization of the matching between input features and model structures, which limits the further improvement of prediction accuracy.
To address the above shortcomings, this study aims to develop a novel hybrid machine learning model (Boruta-CNN-BiLSTM) to improve the accuracy and interpretability of hydrological drought prediction. The specific research objectives are as follows: First, the Boruta algorithm is adopted to objectively and accurately select the most relevant features for hydrological drought from 31 potential hydro-meteorological variables. This step is intended to reduce feature redundancy, mitigate the risk of model overfitting, and lay a solid foundation for improving model performance. Second, the selected key features are combined with the CNN-BiLSTM model to fully leverage the spatial feature extraction capability of CNN and the bidirectional temporal data processing capability of BiLSTM. This integration is designed to enhance the model’s ability to characterize complex hydrological drought dynamics and further improve its predictive performance. Finally, the performance of the proposed Boruta-CNN-BiLSTM model is validated using actual hydrological data from 28 regions in the Huaihe River Basin. Meanwhile, the model is compared with other benchmark models to verify its applicability and superiority under different spatial conditions. Notably, the Boruta-CNN-BiLSTM framework developed in this study enables the quantitative interpretation of the relative importance of key drought-controlling factors (e.g., precipitation, soil moisture, and net radiation). This not only improves the mechanistic understandability of data-driven drought prediction but also makes the model consistent with hydrological mechanisms, thereby providing a useful reference for integrating hydrological process understanding with data-driven drought forecasting.
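The shadow-feature idea behind the Boruta step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: absolute Pearson correlation is swapped in for the random forest importance that Boruta uses, the statistical test on the hit counts is omitted, and the function name `shadow_feature_screen` and its parameters are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def shadow_feature_screen(features, target, n_rounds=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) feature.

    features: dict mapping a name to a list of floats.
    target:   list of floats of the same length.
    Importance is |Pearson r| here (an assumption for illustration);
    Boruta itself uses random forest importance.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadow features keep each feature's distribution but break
        # its relationship with the target by shuffling.
        shadow_scores = []
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs(pearson(shuffled, target)))
        threshold = max(shadow_scores)
        for name, values in features.items():
            if abs(pearson(values, target)) > threshold:
                hits[name] += 1
    return hits
```

A feature that beats the best shadow in nearly every round would be a candidate for confirmation, while one that rarely does would be a candidate for rejection.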
- The manuscript could be strengthened by incorporating recent studies that integrate hydrological process understanding with data-driven modeling. For example, Zafarmomen et al. (2024), “Assimilation of Sentinel‐based Leaf Area Index for Modeling Surface–Ground Water Interactions in Irrigation Districts,” demonstrates how integrating remotely sensed vegetation dynamics can improve hydrological representation and predictive performance.
Response: Thank you for this valuable and constructive suggestion. We fully agree that integrating hydrological process understanding with data-driven modeling can significantly improve the physical rationality and interpretability of drought prediction models. Following your advice, we have carefully read the recommended study by Zafarmomen et al. (2024) and added a relevant discussion to the Introduction section (paragraphs 5 and 6). We have also added citations and comments on the importance of combining physical mechanism understanding with advanced data-driven approaches. These revisions strengthen the connection between our data-driven framework and hydrological process understanding, and enrich the background and significance of this research.
- The study mainly focuses on predictive accuracy. A deeper discussion on hydrological interpretability of the model outputs would strengthen the contribution.
Response: Thank you for your valuable comments. In the original manuscript, the discussion section mainly focused on statistical relations, and the hydrological interpretation was limited. In order to solve this problem, the first three paragraphs of the Discussion section have been revised as follows:
To analyze the reasons for the spatial differences in the prediction accuracy of the Boruta-CNN-BiLSTM model, the most influential factors identified by the Boruta method were selected, namely precipitation, volumetric soil water (0–7 cm), volumetric soil water (7–28 cm), and surface net solar radiation. The CCM method was used to quantify the impact of each factor on the model evaluation index R². The results are shown in Figures 16 and 17. According to Figures 15, 16 and 17, precipitation is the factor that most strongly affects the prediction accuracy of the model across the entire watershed. Its mean ρ-value is close to 0.9, and most values lie between 0.8 and 1.0. The values are relatively concentrated, which indicates that the model's prediction accuracy is sensitive to precipitation and that this sensitivity is distributed relatively uniformly in space. This suggests that SRI-1 is strongly controlled by short-term water input, and precipitation directly influences runoff generation on a monthly scale.
After precipitation, VSW1 and VSW2 have the greatest impact on model prediction accuracy. Their mean ρ-values range from 0.4 to 0.5, and most values lie between 0.2 and 0.7. This wide spread indicates that the sensitivity of the model's prediction accuracy to volumetric soil water varies significantly in space: sensitivity in the upper and middle reaches is greater than in downstream areas. This suggests that antecedent soil moisture influences runoff response through its persistence effect and plays a crucial role in drought persistence. The distribution of VSW1 and VSW2 across the basin is uneven, though their distribution is consistent with partial sensitivity.
The factor that has the least impact on the model's prediction accuracy is SNSR, with a mean ρ-value close to 0.2 and a distribution range between 0 and 0.4. Although its direct impact is limited, it may still influence hydrological drought indirectly through surface energy balance and evapotranspiration processes. The most influential factors identified by the Boruta method indicate that model performance is closely related to key hydrological processes, including precipitation-driven runoff generation, soil moisture memory effects, and energy-controlled evapotranspiration, which jointly influence short-term hydrological drought evolution.
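To illustrate what a ρ-value of this kind measures, the following is a minimal sketch of a cross-map skill calculation, assuming that CCM here refers to convergent cross mapping (the text above does not expand the acronym). The embedding dimension, lag, neighbour weighting, and all function names are illustrative assumptions, not the settings used in the manuscript.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def delay_embed(series, dim, lag):
    """Return (time_index, vector) pairs of lagged coordinates."""
    start = (dim - 1) * lag
    return [(t, [series[t - k * lag] for k in range(dim)])
            for t in range(start, len(series))]


def cross_map_skill(source, target, dim=2, lag=1):
    """Estimate `target` from the delay-embedded manifold of `source`.

    Returns the Pearson rho between observed and estimated target values.
    """
    points = delay_embed(source, dim, lag)
    observed, estimated = [], []
    for t, vec in points:
        # dim + 1 nearest neighbours on the source manifold, excluding t.
        nearest = sorted(
            (math.dist(vec, other), s) for s, other in points if s != t
        )[: dim + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        total = sum(weights)
        estimate = sum(w * target[s] for w, (_, s) in zip(weights, nearest)) / total
        estimated.append(estimate)
        observed.append(target[t])
    return pearson(observed, estimated)
```

In convergent cross mapping, this skill is usually recomputed for increasing library sizes, and a ρ that rises and levels off at a high value is read as evidence of coupling; the sketch uses the full series as a single library.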
- The analysis is limited to monthly SRI-1. Since drought processes are scale-dependent, a short discussion on multi-timescale applicability (e.g., SRI-3, SRI-6) would be valuable.
Response: Thank you for your valuable comments. In the original manuscript, the discussion of timescale applicability was relatively brief. To address this, paragraph 5 of the Discussion section has been amended as follows:
This study analyzed the SRI on a one-month time scale, constructed several prediction models, and evaluated their effectiveness from multiple aspects. The results show that the Boruta-CNN-BiLSTM model performs best. However, the time scale of the SRI may have a significant impact on the performance of the prediction model. At longer timescales, such as SRI-3 or SRI-6, hydrological drought is more strongly influenced by cumulative precipitation, basin storage conditions, and low-frequency climate variability. As a result, the relative importance of predictors, particularly soil moisture and large-scale climatic factors, may change, and model performance may vary across timescales. Evaluating the proposed framework under multi-timescale conditions would provide a more comprehensive understanding of its applicability. In addition, drought is also affected by human activities, basin geographical features, and other factors. Future research could consider the uncertainty in the model's prediction performance arising from different time scales and additional influencing factors.
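As a rough illustration of how the accumulation window distinguishes SRI-1 from SRI-3 or SRI-6, the sketch below accumulates monthly runoff over k months and converts the totals to standard-normal scores. It is a simplified sketch under stated assumptions: empirical ranks stand in for a fitted probability distribution, all months are pooled rather than standardized per calendar month, ties are broken by position, and the function names are hypothetical.

```python
from statistics import NormalDist


def accumulate(monthly_runoff, k):
    """k-month running totals; the first k - 1 months have no value."""
    return [sum(monthly_runoff[i - k + 1: i + 1])
            for i in range(k - 1, len(monthly_runoff))]


def standardize_empirical(values):
    """Map values to standard-normal scores via empirical ranks.

    Uses the plotting position rank / (n + 1) as a non-parametric
    stand-in for fitting a distribution to the accumulated runoff.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        scores[i] = normal.inv_cdf(rank / (n + 1))
    return scores


def sri(monthly_runoff, k):
    """Simplified k-month standardized runoff index."""
    return standardize_empirical(accumulate(monthly_runoff, k))
```

Calling `sri(runoff, 1)` and `sri(runoff, 3)` on the same series shows how longer windows smooth short dry spells and lengthen the memory of the index.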
- While multiple models are compared, the inclusion of a simpler baseline (e.g., MLP or linear model) would help better quantify the added value of the hybrid architecture.
Response: Thank you for this valuable and constructive suggestion. We fully agree that including simpler baseline models helps to quantify the added value of the proposed Boruta-CNN-BiLSTM hybrid architecture, as it more directly reflects the performance advantages brought by the hybrid structure and the Boruta feature selection strategy, thereby enhancing the comprehensiveness and rigor of the model comparison. To address this, paragraph 4 of the Discussion section has been amended as follows:
Deep learning models rely on deep network structures and are adept at capturing spatio-temporal correlations and complex nonlinear patterns in data, which has made them a research hotspot in drought prediction in recent years. Traditional statistical models, such as linear regression, are essentially unable to capture the complex nonlinear relationships between drought-influencing factors and the SRI, while traditional machine learning models, such as multi-layer perceptrons (MLPs), lack the ability to extract spatial features and capture bidirectional temporal dependencies. In contrast, the Boruta-CNN-BiLSTM model integrates Boruta feature selection, CNN-based spatial feature extraction, and BiLSTM-based bidirectional temporal learning, effectively overcoming the inherent limitations of simple baseline models.
RC1: 'Comment on egusphere-2026-1033', Anonymous Referee #1, 11 May 2026
Li et al. have presented their analyses of 6 ML-based models implemented for spatio-temporal forecasting of the SPI-1 drought index. The analysis and the proposed algorithms are interesting, and the manuscript language is easy to understand.
However, the manuscript currently lacks a sufficient description of the dataset, especially the predicted variable, to validate the authors’ conclusions and understand its utility. I also have concerns regarding data leakage based on my interpretation of the manuscript, especially related to using SPI-1 as both predicted and predictor variable, and using the validation set for hyperparameter tuning. Additionally, more information about the chosen models, parameters, methods, and validation would further improve the manuscript for future readers.
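To make the leakage concern concrete, the sketch below shows one way lagged predictors and a chronological split could be arranged so that no input is dated later than the forecast origin and the tuning period is kept separate from the final test period. It is an illustration only: the function names, the one-month lead, the number of lags, and the split points are assumptions, not the manuscript's setup.

```python
def make_lagged_samples(predictors, target, lead=1, n_lags=3):
    """Pair target[t] with predictor values from t-lead-n_lags+1 .. t-lead.

    predictors: dict mapping a name to a list aligned with `target`.
    A lagged copy of the target may be included as a predictor, because
    nothing dated later than t - lead enters the inputs for month t.
    """
    samples = []
    first_t = lead + n_lags - 1
    for t in range(first_t, len(target)):
        window = range(t - lead - n_lags + 1, t - lead + 1)
        x = [predictors[name][s] for name in sorted(predictors) for s in window]
        samples.append((t, x, target[t]))
    return samples


def chronological_split(samples, train_end, tune_end):
    """Split by time index: train < train_end <= tune < tune_end <= test."""
    train = [s for s in samples if s[0] < train_end]
    tune = [s for s in samples if train_end <= s[0] < tune_end]
    test = [s for s in samples if s[0] >= tune_end]
    return train, tune, test
```

Hyperparameters would be chosen on the tuning block only, so that scores on the test block remain an estimate of out-of-sample performance.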
I have listed my major and minor comments for improvement below:
- What is the lead time over which SPI-1 is predicted? For example, is it predicted 1 month in advance, or 10 years in advance? Are multiple temporal points predicted, or is a single future prediction made?
- Table 2: If SPI-1 is the predicted variable, how is it also an influencing factor? Using it both as an input and the output of the model creates data leakage, and would result in a perfect model. This would also explain the extremely high importance of this variable in the analyses.
- Line 319: What was the cross-validation dataset used for hyperparameter tuning? If the authors used the same post-2010 data as validation set for hyperparameter tuning, then that creates a case for data leakage, and the predictive ability of the trained models over the same validation set can no longer be generalized.
- Paragraph 37: When describing drought prediction, can the authors add information about the time horizons for predictions, e.g., are the droughts predicted one year in advance, etc.?
- Line 21: Can the authors add some examples of loss amounts to further support the statement and help orient readers?
- Line 103: Can the authors add more information about the droughts, e.g., how often do they recur?
- Line 115: What is meant by “interpolation method in array”? Can more information about mapping data to the grid be added? This also relates to previous comment about spatial resolution to better understand the interpolation methodology.
- Eq1, 2: The equations do not have any SRI term, so it is unclear how the SRI relates to the probability distributions. Can the authors clarify the STI calculation, including the values of the parameters alpha, gamma, and x?
- Section 3.3.4: Can the authors also list the loss function that they used for the model depicted in Fig 4?
- Line 242: Both RMSE and MSE emphasize the larger error since they both use the squared error. Why is one better than the other?
- Section 3.5: Since the observations are spatiotemporal, can the authors also list how the observations are combined spatially and temporally for the listed metrics?
- Section 4: Can the range of the predicted variable (SPI-1) also be included? Additionally, at what value is a drought state considered to occur? How many times did droughts occur in the study region based on the SPI-1 value, in both the training and the validation sets?
- Section 4.1: How is the final list of the important features determined after identifying them separately at each of the 28 grid points?
- Line 320: The six models have not been described prior to their mention here except one statement in the Abstract. Suggest explaining the models prior to their inclusion here.
- Line 321: Can the authors provide a list of all the hyperparameters that were tuned for each model, the range used for hyperparameters, scaling (linear vs exponential), and number of iterations? Please also include the loss function and whether they were different across the models.
- Table 3: What is the input size used for CNN? Based on previous descriptions, it appears that the data is prepared for the 28 grid points, which is constructed roughly from a 10x5 grid. If that’s the case, how can the filter size be greater than the number of grid points, e.g., 25?
- Line 327: What is the baseline model against which the improvement is shown?
- Line 334: Without any information about the baseline model, it is not possible to verify the accuracy of this conclusion.
- Line 14 and 16: Repetitive
- Line 28: The authors list the drought indices as addressing the challenges of “monitoring and predicting droughts”. Since they are using SRI to characterize droughts, the sentence structure gives the impression that SRI can also be used to “predict” droughts. Suggest clarifying that the listed indices are only designed to characterize droughts.
- Line 54: The sentence is too broad, and does not provide sufficient reasons for the importance of feature selection. For neural network models, it is in fact the lack of need to select features that make them attractive models so why would feature selection be needed to improve their performance? This is important to include as it is one of the key motivations for the manuscript.
- Line 67: What is “tool wear prediction” in drought analysis?
- Line 102: It is not clear what is meant by “greater”. Does it mean the area of mountains within the selected grid is larger than plains? Can the authors add the area values in the sentence for a better comparison and understanding?
- Line 105: Can some examples of extreme climatic events be added?
- Line 106: Is the area primarily used for agriculture so that cropland losses are the most significant source of losses?
- Line 106: What is the total area of the grid compared to the affected area?
- Line 106: Is the affected area from only droughts or all climatic events?
- Line 110: Can the authors add temporal and spatial resolutions of each dataset, perhaps in a Table?
- Line 117: What is meant by “potential climate prediction factors”?
- Line 134: What does item refer to?
- Line 163: Can the authors clarify what will y(0) be, i.e., how is the loss calculated for the first base model?
- Line 172: Since the authors have not used RNN, it is unnecessary to compare LSTM with RNN.
- Line 268: How are the time lag step and embedding dimension determined?
- Line 270: Mx is undefined: I assume it refers to the manifold.
- Line 281: What does it mean by “library size” and how is it increased?
- Line 283: What are the “drivers”?
- Line 288: What is meant by “scale data”?
- Line 294: The sentence is unclear: what is 1st, 17th, etc. region? Is the analysis done for only certain grid points, or is the analysis included in the manuscript for only certain grid points?
- Table 2: What does lead time mean? Does T=1 SPI-1 mean the SPI-1 value 1 (month?) prior to the value being predicted?
- Table 2: Please include definitions of acronyms
- Table 2: Please include units where applicable
- Table 2: What are the ranges of each of the features in the training dataset and the validation dataset?
- Line 298: How were 35 factors selected out of 31?
- Line 299: How was the random forest model, over which the Boruta selection algorithm is applied, trained and its hyperparameters selected? Which dataset was used for training? Was any validation done for the trained model before implementing the Boruta feature selection?
- Line 300: What is the cutoff for determining feature significance, and what are the ranges and units of feature importance?
- Fig 5, 6: What are the error bars on the plot?
- Line 302: The figure scale and resolution did not allow for identifying the Blue color on the plots. As a result it is unclear if the blue features mark the boundary between red and green. I suspect that since blue features are randomly generated, any features less important than Blue implies non-significance.
- Line 305: What leads to indetermination of feature significance (yellow)?
- Line 307: Feature redundancy is not considered a significant concern for well trained neural net models. Can the authors clarify whether they intend the statement to apply only for the random forest model that the Boruta selection algorithm is applied on?
- Line 311: The basis of the statement is unclear. Since the authors are “removing” unimportant features, why does the feature selection process result in the conclusion that “addition of climate indices improves prediction”?
- Section 4.2: What were the other parameter values used in the models, e.g., stride, padding, batch normalization, layer normalization, etc.?
- Table 3: How is xgboost used for time series forecasting in the 3rd and 6th models?
- Figure 8: What is the range of SPI-1 index?
- Figure 13: Can the authors include a description of the boxplots in the figure caption, e.g., the interquartile range, the whiskers, outliers, etc.?
- Figure 13: Is the box plot constructed across the entire spatiotemporal domain, i.e., all 28 grid points and the 120 months of validation set?
- Line 376: It is unclear how to interpret Figure 13 to identify that Boruta-CNN-XGBoost has the highest error. Error bars of other models are much higher with similar medians.
- Line 378: Why did the addition of CNN reduce the performance in the Xgboost variants, and improve the performance in LSTM?
- Line 380: I am unfamiliar with Taylor diagrams. Can the authors add a short description about how to interpret them?
- Line 394: Suggest including the results of the CCM in the Results section, instead of Discussion section.
- Line 408, 409: Repeated statements
- Figure 17: Can a raincloud plot be described and how to interpret it?
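For the comments on Line 242 and Line 380, the relations involved can be written out in a short sketch. It shows that RMSE is the square root of MSE, so both rank models identically, and it computes the three statistics a Taylor diagram combines: the standard deviations of observations and predictions, their correlation, and the centered RMSE, which satisfy E'² = σ_o² + σ_p² − 2σ_oσ_pR. The function names are illustrative assumptions, not taken from the manuscript.

```python
import math


def mse(obs, pred):
    """Mean squared error."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error: the square root of MSE, in the data's units."""
    return math.sqrt(mse(obs, pred))


def taylor_stats(obs, pred):
    """Return (std_obs, std_pred, correlation, centered_rmse).

    Population (divide-by-n) statistics are used throughout so that the
    Taylor identity holds exactly.
    """
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    dev_o = [o - mean_o for o in obs]
    dev_p = [p - mean_p for p in pred]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    centered = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, centered
```

On a Taylor diagram, a model plots closer to the observation point when its standard deviation matches the observed one and its correlation is high, which is equivalent to a small centered RMSE.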
Citation: https://doi.org/10.5194/egusphere-2026-1033-RC1
RC2: 'Comment on egusphere-2026-1033', Anonymous Referee #2, 12 May 2026
This study presents a hybrid machine learning framework (Boruta-CNN-BiLSTM) for predicting hydrological drought (SRI-1) in the Huaihe River Basin. The authors integrated the Boruta feature selection algorithm with a CNN-BiLSTM deep learning architecture to enhance prediction accuracy. The methodology is generally sound and well-executed from a computational perspective. Below are specific comments and suggestions for improvement.
- The conclusion mentions that "the average RMSE decreased by 0.42". It is suggested to add an explanation of which baseline model this is relative to (e.g., the model without using Boruta, or the single BiLSTM model), otherwise it will be difficult for readers to intuitively judge the extent of improvement.
- The terms "Huaihe River Basin" and "Huaihe River basin" are used interchangeably throughout the text. It is recommended to use the term "Huaihe River Basin" with the first letter capitalized.
- In the description of variable names in formulas (5)-(8) by LSTM in Section 3.3.2, "donate" is a spelling error and should be "denote".
- The layout of Table 2 is relatively messy. It is recommended to divide it into two columns: "Factor ID" and "Description".
- The paper primarily explains how to enhance the accuracy of the prediction model. It is suggested to strengthen the discussion of the underlying hydrological physical mechanisms.
- Apart from enhancing the accuracy of model predictions, the innovation of this paper lies in its integration with hydrological processes. It is suggested to elucidate the paper's academic contribution to improving the interpretability of hydrological drought predictions.
Overall, the methodology is robust, the results are comprehensively validated across 28 grid regions, and the conclusions are well-supported by the data. Therefore, this paper makes a valuable contribution to the field of drought forecasting and water resources management and is recommended for publication after minor revisions.
Citation: https://doi.org/10.5194/egusphere-2026-1033-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 216 | 81 | 23 | 320 | 12 | 19 |
The manuscript presents a relevant and well-structured contribution to hydrological drought prediction by proposing a hybrid Boruta–CNN–BiLSTM framework. The integration of feature selection with deep learning is timely and aligns well with current research trends in hydroinformatics and data-driven modeling.
One of the main strengths of the study is the systematic combination of feature selection (Boruta) and hybrid deep learning architectures, which addresses a common limitation in drought prediction models: the presence of redundant or irrelevant predictors. The use of 31 potential predictors and their reduction through Boruta provides a clear methodological advantage and improves model interpretability.
Overall, the manuscript is methodologically sound, clearly organized, and relevant for both scientific and applied drought prediction contexts. I offer some minor comments:
The manuscript is strong and suitable for publication after minor revisions. The suggested comments mainly aim to improve clarity, positioning, and broader impact rather than requiring major methodological changes.