the Creative Commons Attribution 4.0 License.
Hydrological drought prediction and its influencing factors analysis based on a machine learning model
Abstract. Predicting future drought conditions is crucial for effective disaster management. In this study, a machine learning framework is proposed to predict hydrological drought in the Huaihe River Basin, China. An interpretable Extreme Gradient Boosting (XGBoost) model is applied to forecast four drought categories in 28 grid regions, using 26 factors for monthly predictions and 18 for seasonal predictions. The framework also integrates the SHapley Additive exPlanations (SHAP) variable importance index to identify the factors that drive the drought predictions. The model achieves 79.9 % accuracy in classifying droughts, with the Standardized Precipitation Index (SPI) being the most influential factor. The SHAP values of SPI are 0.360, 0.261, 0.169, and 0.247 for spring, summer, autumn, and winter, respectively. Soil moisture content and evapotranspiration are particularly influential in spring and autumn, while large-scale climatic factors are more significant in summer and winter. Overall, this study offers valuable decision support for regional drought management and water resource allocation.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1891', Anonymous Referee #1, 14 Jul 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1891/egusphere-2025-1891-RC1-supplement.pdf
AC2: 'Reply on RC1', Li min, 26 Jul 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1891/egusphere-2025-1891-AC2-supplement.pdf
RC2: 'Comment on egusphere-2025-1891', Anonymous Referee #2, 16 Jul 2025
This paper proposes an interpretable machine learning framework to predict hydrological droughts in the Huaihe River Basin of China, emphasizing the impact of meteorological precursors, large-scale climate indices, and land surface processes across spatial grids and seasons. By integrating 26 influencing factors and quantifying their contributions through SHAP values, this study advances drought prediction beyond conventional statistical models and provides practical insights for regional water resources management. The work addresses the timely need for interpretable AI in climate risk assessment, and its structure is clear. However, the following points need attention to strengthen its scientific rigor.
- The Methods section refers to 'Table 1' (SPI/SRI classification standard), but the subsequent 'Table 2' (monthly-scale factors) is mislabeled as 'Table 1'. Please check the table numbering for consistency throughout the text.
- Figure 5 does not explain the specific meaning of 'positive/negative difference'.
- The manuscript only mentions that XGBoost is superior to traditional regression models; it is not compared with mainstream time-series models (such as LSTM). Please briefly explain why XGBoost was chosen over a time-series model.
- Table 1 currently lists only the drought levels. To avoid ambiguity, please include the precise SPI/SRI value range for each level.
- Section 3.2 describes the goals of XGBoost but omits the specific hyperparameters and how they were selected. Please add the hyperparameter tuning method.
- Figure 1: why do the authors divide the basin into 28 grids? Some regions are not numbered in this figure.
- The study period is 1960-2014. Can more recent data be collected?
- In the figures, only the 7th grid region is shown. Can you provide the results for other regions for comparison?
- The Discussion is very brief. Please discuss the results, limitations, and future work in more depth.
Citation: https://doi.org/10.5194/egusphere-2025-1891-RC2
AC1: 'Reply on RC2', Li min, 26 Jul 2025
This paper proposes an interpretable machine learning framework to predict hydrological droughts in the Huaihe River Basin of China, emphasizing the impact of meteorological precursors, large-scale climate indices, and land surface processes across spatial grids and seasons. By integrating 26 influencing factors and quantifying their contributions through SHAP values, this study advances drought prediction beyond conventional statistical models and provides practical insights for regional water resources management. The work addresses the timely need for interpretable AI in climate risk assessment, and its structure is clear. However, the following points need attention to strengthen its scientific rigor.
- The Methods section refers to 'Table 1' (SPI/SRI classification standard), but the subsequent 'Table 2' (monthly-scale factors) is mislabeled as 'Table 1'. Please check the table numbering for consistency throughout the text.
Response: Thank you for your valuable comments. We have made the necessary revisions.
- Figure 5 does not explain the specific meaning of 'positive/negative difference'.
Response: Thank you for your valuable comments. An explanation has been added to the caption of Figure 5.
- The manuscript only mentions that XGBoost is superior to traditional regression models; it is not compared with mainstream time-series models (such as LSTM). Please briefly explain why XGBoost was chosen over a time-series model.
Response: Thank you for your valuable comments. While LSTM and similar time-series models are effective for sequential data, our choice of XGBoost was driven by two considerations: (1) significantly lower computational requirements for operational prediction, and (2) compatibility with SHAP interpretation.
- Table 1 currently lists only the drought levels. To avoid ambiguity, please include the precise SPI/SRI value range for each level.
Response: Thank you for your valuable comments. Table 1 in Section 3.1 of the Methods has been revised accordingly.
- Section 3.2 describes the goals of XGBoost but omits the specific hyperparameters and how they were selected. Please add the hyperparameter tuning method.
Response: Thank you for your valuable comments. The following has been added to Section 3.3 of the Methods: The model uses Bayesian hyperparameter optimization to find optimal parameters, such as the learning rate, tree depth, and number of iterations.
- Figure 1: why do the authors divide the basin into 28 grids? Some regions are not numbered in this figure.
Response: Thank you for your valuable comments. We divided the entire basin into a 1° × 1° grid. A grid cell is numbered only if its center lies within the basin.
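The numbering rule in this response (a cell is numbered only when its center falls inside the basin) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the polygon `toy_basin` is a made-up outline rather than the Huaihe River Basin boundary, the north-west-first numbering order is an assumption, and the helper names `point_in_polygon` and `number_grid_cells` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def number_grid_cells(polygon, lon_min, lat_min, n_cols, n_rows, step=1.0):
    """Number cells row by row from the north-west corner (an assumed order),
    keeping only cells whose center lies inside the polygon."""
    numbered = {}
    next_id = 1
    for row in range(n_rows - 1, -1, -1):   # northernmost row first
        for col in range(n_cols):           # west to east
            center = (lon_min + (col + 0.5) * step,
                      lat_min + (row + 0.5) * step)
            if point_in_polygon(center[0], center[1], polygon):
                numbered[next_id] = center
                next_id += 1
    return numbered


# Hypothetical outline, NOT the real Huaihe River Basin boundary.
toy_basin = [(112.0, 31.0), (116.0, 31.0), (116.0, 32.5),
             (113.5, 35.0), (112.0, 35.0)]
cells = number_grid_cells(toy_basin, lon_min=112.0, lat_min=31.0,
                          n_cols=4, n_rows=4)
print(len(cells), "of 16 cells are numbered")
```

Under this rule, cells that overlap the basin only at their edges stay unnumbered, which would be consistent with the unnumbered regions the referee noticed in Figure 1.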
- The study period is 1960-2014. Can more recent data be collected?
Response: Thank you for your valuable comments. More recent data are not available at present; they can be added in future studies.
- In the figures, only the 7th grid region is shown. Can you provide the results for other regions for comparison?
Response: Thank you for your valuable comments. Owing to space limitations, we include figures only for the 7th grid region. Although the other regions are not shown graphically, Table 6 lists the top three drought-influencing features and their mean absolute SHAP values for all 28 grid regions in the Huaihe River Basin. These top three features are crucial to drought prediction.
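As a rough sketch of how a per-grid summary like the one described for Table 6 could be derived, the snippet below ranks features by mean absolute SHAP value and keeps the top three. The SHAP matrix and feature names are invented for illustration and are not results from the paper; in practice the values would come from a SHAP explainer applied to the trained model. The helper name `top_features` is hypothetical.

```python
def top_features(shap_rows, feature_names, k=3):
    """Rank features by mean absolute SHAP value and return the top k
    as (name, mean_abs_shap) pairs."""
    n_samples = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up SHAP values for one grid region: one row per sample,
# one column per feature. These are NOT values from the study.
feature_names = ["SPI", "soil_moisture", "evapotranspiration", "climate_index"]
shap_rows = [
    [0.40, -0.10, 0.05, 0.02],
    [-0.30, 0.20, -0.05, 0.01],
    [0.20, -0.30, 0.05, -0.03],
]

for name, value in top_features(shap_rows, feature_names):
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging matters here: positive and negative contributions would otherwise cancel, and a feature that pushes predictions strongly in both directions would look unimportant.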
- The Discussion is very brief. Please discuss the results, limitations, and future work in more depth.
Response: Thank you for your valuable comments. The following has been added to the Discussion: This study demonstrates the efficacy of an XGBoost-SHAP framework for hydrological drought prediction in the Huaihe River Basin. The model achieved robust accuracy for the ND and D1 categories, yet underperformed for the more severe categories (D2 and D3), likely due to the limited number of extreme-event samples. Prediction with a one-month lead time is helpful for drought monitoring: it enables water managers to adjust reservoir operations and irrigation schedules based on predicted drought conditions. The framework provides a 30-day buffer for proactive measures, such as mobilizing drought relief resources and issuing crop recommendations.
In the second paragraph of the Discussion, we added: ‘For example, Tanriverdi and Batmaz (2025), in a study of U.S. drought prediction, also identified SPI as one of the most critical features across diverse regions and advanced models (including LightGBM, LSTM, and Transformer architectures). Their SHAP analysis consistently ranked SPI among the top predictors, reinforcing its fundamental role as a primary driver of drought conditions, even within sophisticated deep learning frameworks.’
‘Future research can extend the existing one-month-ahead framework to multiple lead times to evaluate their impact on prediction accuracy. A variety of ensemble learning schemes can be compared to explore ways to improve the robustness of the model. In addition, introducing uncertainty quantification and data augmentation could help to alleviate class imbalance and improve prediction reliability. Applying these methods would support more accurate drought trend prediction and better management strategies.’
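The gap noted in this response between overall accuracy and performance on the rarer D2 and D3 categories can be illustrated with a small per-class recall calculation. The labels below are a toy example constructed for illustration only, not data or results from the study; the category names ND, D1, D2, and D3 follow the response, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted category matches the true one."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true category, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {category: hits[category] / totals[category] for category in totals}


# Toy, imbalanced labels (NOT from the study): 12 ND, 5 D1, 2 D2, 1 D3.
y_true = ["ND"] * 12 + ["D1"] * 5 + ["D2"] * 2 + ["D3"]
y_pred = (["ND"] * 11 + ["D1"]      # one ND month misread as D1
          + ["D1"] * 4 + ["ND"]     # one D1 month misread as ND
          + ["D2", "D1"]            # one of two D2 months missed
          + ["D2"])                 # the single D3 month missed

print("overall accuracy:", accuracy(y_true, y_pred))
for category, recall in per_class_recall(y_true, y_pred).items():
    print(category, round(recall, 3))
```

In this toy case the overall accuracy looks respectable while the rarest category is never detected, which is why per-category metrics are worth reporting alongside a single accuracy figure when extreme-event samples are scarce.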
Citation: https://doi.org/10.5194/egusphere-2025-1891-AC1