Hydrological drought prediction and its influencing factors analysis based on a machine learning model

Li, Min; Yao, Yuhang; Feng, Zilong; Ou, Ming

doi:https://doi.org/10.5194/egusphere-2025-1891

Preprints

25 Jun 2025

Abstract. Predicting future drought conditions is crucial for effective disaster management. In this study, a machine learning framework is proposed to predict hydrological drought in the Huaihe River Basin, China. An interpretable Extreme Gradient Boosting (XGBoost) model is applied to forecast four drought categories in 28 grid regions, using 26 factors for monthly and 18 for seasonal predictions. The framework also integrates the SHapley Additive exPlanations (SHAP) variable importance index to identify the factors that drive the predictions. The model achieves 79.9 % accuracy in classifying droughts, with the Standardized Precipitation Index (SPI) being the most influential factor. The SHAP values of SPI are 0.360, 0.261, 0.169, and 0.247 for spring, summer, autumn, and winter, respectively. Soil moisture content and evapotranspiration are particularly influential in spring and autumn, while large-scale climatic factors are more significant in summer and winter. Overall, this study offers decision support for regional drought management and water resource allocation.
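
The SHAP importance reported in the abstract rests on Shapley values, which attribute a single prediction to the individual input features. The sketch below is not the authors' code. It computes exact Shapley values for a hypothetical three-feature linear scoring function (`toy_model`, with made-up coefficients, samples, and baseline) and then averages their absolute values per feature, which is one common way to summarize feature importance. The study itself applies SHAP to an XGBoost classifier; the brute-force enumeration used here is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits very small feature sets.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_importance(predict, samples, baseline):
    """Average |Shapley value| per feature over a set of samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    return [t / len(samples) for t in totals]


# Hypothetical toy model (assumption): a drought score driven mostly by SPI.
def toy_model(features):
    spi, soil_moisture, evapotranspiration = features
    return -0.6 * spi - 0.3 * soil_moisture + 0.1 * evapotranspiration


# Made-up standardized inputs, for illustration only.
baseline = [0.0, 0.0, 0.0]
samples = [[-1.5, -0.5, 0.8], [0.4, 0.2, -0.3], [-0.7, -1.0, 0.5]]
importance = mean_abs_importance(toy_model, samples, baseline)
```

For a linear model with a zero baseline, each Shapley value reduces to the coefficient times the feature value, so the result is easy to check by hand.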

Received: 21 Apr 2025 – Discussion started: 25 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1744 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

04 Nov 2025

Hydrological drought prediction and its influencing features analysis based on a machine learning model

Min Li, Yuhang Yao, Zilong Feng, and Ming Ou

Nat. Hazards Earth Syst. Sci., 25, 4299–4316, https://doi.org/10.5194/nhess-25-4299-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1891', Anonymous Referee #1, 14 Jul 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1891/egusphere-2025-1891-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-1891-RC1
- AC2: 'Reply on RC1', Li min, 26 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1891/egusphere-2025-1891-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1891-AC2
RC2:
'Comment on egusphere-2025-1891', Anonymous Referee #2, 16 Jul 2025
This paper proposes an interpretable machine learning framework to predict hydrological droughts in the Huaihe River Basin of China, emphasizing the impact of meteorological precursors, large-scale climate indices, and land surface processes across spatial grids and seasons. By integrating 26 influencing factors and quantifying their contributions through SHAP values, the study advances drought prediction beyond conventional statistical models and provides practical insights for regional water resources management. The work addresses the timely need for interpretable AI in climate risk assessment, and its structure is clear. However, the following points need attention to strengthen its scientific rigor.

The Methods section refers to 'Table 1' (SPI/SRI classification standard), but the subsequent 'Table 2' (monthly-scale factors) is mislabeled as 'Table 1'. Please check the table numbering for consistency throughout the text.

Figure 5 does not explain the specific meaning of 'positive/negative difference'.

The paper states only that XGBoost is superior to traditional regression models; it is not compared with mainstream time-series models (such as LSTM). Please briefly explain why XGBoost was chosen over a time-series model.

Table 1 currently lists only the drought levels. To avoid ambiguity, please include the precise SPI/SRI value ranges.

Section 3.2 describes the goals of XGBoost but omits the specific hyperparameters and how they were selected. Please add the hyperparameter tuning method.

Figure 1: why do the authors divide the basin into 28 grids? Some regions are not numbered in this figure.

The study period is 1960–2014. Can more recent data be collected?

In the figures, only the 7th grid region is displayed. Can you provide the results of other regions for comparison?

The Discussion is very brief. Please discuss the results, limitations, and future work in more depth.
Citation: https://doi.org/10.5194/egusphere-2025-1891-RC2
- AC1:
  'Reply on RC2', Li min, 26 Jul 2025
  The Methods section refers to 'Table 1' (SPI/SRI classification standard), but the subsequent 'Table 2' (monthly-scale factors) is mislabeled as 'Table 1'. Please check the table numbering for consistency throughout the text.
  
  Response: Thank you for your valuable comments. We have made the necessary revisions.
  
  Figure 5 does not explain the specific meaning of 'positive/negative difference'.
  
  Response: Thank you for your valuable comments. The explanation has been added to the caption of Figure 5.
  
  The paper states only that XGBoost is superior to traditional regression models; it is not compared with mainstream time-series models (such as LSTM). Please briefly explain why XGBoost was chosen over a time-series model.
  
  Response: Thank you for your valuable comments. While LSTM and similar time-series models are effective for sequential data, our choice of XGBoost was driven by (1) significantly lower computational requirements for operational prediction and (2) compatibility with SHAP interpretation.
  
  Table 1 currently lists only the drought levels. To avoid ambiguity, please include the precise SPI/SRI value ranges.
  
  Response: Thank you for your valuable comments. Table 1 in Method 3.1 has been modified accordingly.
  
  Section 3.2 describes the goals of XGBoost but omits the specific hyperparameters and how they were selected. Please add the hyperparameter tuning method.
  
  Response: Thank you for your valuable comments. The following has been added to Method 3.3: 'The model uses Bayesian hyperparameter optimization to find optimal parameters, such as learning rate, tree depth, and number of iterations.'
  
  Figure 1: why do the authors divide the basin into 28 grids? Some regions are not numbered in this figure.
  
  Response: Thank you for your valuable comments. We divided the entire basin into a 1° × 1° grid. A grid cell is numbered only if its center lies within the basin.
  
  The study period is 1960–2014. Can more recent data be collected?
  
  Response: Thank you for your valuable comments. More recent data are not available at present; they can be added in future studies.
  
  In the figures, only the 7th grid region is displayed. Can you provide the results of other regions for comparison?
  
  Response: Thank you for your valuable comments. Owing to space limitations, we show figures only for the 7th grid region. Although the other regions are not shown graphically, Table 6 lists the top three drought-influencing features and their mean absolute SHAP values for all 28 grid areas in the Huaihe River Basin. These top three features are crucial to drought prediction.
  
  The Discussion is very brief. Please discuss the results, limitations, and future work in more depth.
  
  Response: Thank you for your valuable comments. The following has been added to the discussion: ‘This study demonstrates the efficacy of an XGBoost-SHAP framework for hydrological drought prediction in the Huaihe River Basin. The model achieved robust accuracy for the ND and D1 categories, yet underperformed for the more severe categories (D2 and D3), likely due to limited extreme event samples. Prediction with a one-month lead time is helpful for drought monitoring. It enables water managers to adjust reservoir operations and irrigation schedules based on predicted drought conditions. The framework provides a 30-day buffer for proactive measures, such as mobilizing drought relief resources and implementing crop recommendations.’
  
  In the second paragraph of the discussion, we added: ‘For example, Tanriverdi and Batmaz (2025), in a study of U.S. drought prediction, also identified SPI as one of the most critical features across diverse regions and advanced models (including LightGBM, LSTM, and Transformer architectures). Their SHAP analysis consistently ranked SPI among the top predictors, reinforcing its fundamental role as a primary driver of drought conditions, even within sophisticated deep learning frameworks.’
  
  We also added: ‘Future research can extend the existing one-month-ahead framework to multiple prediction periods to evaluate the impact of different lead times on prediction accuracy. A variety of ensemble learning schemes can be compared to explore ways to improve the robustness of the model. At the same time, introducing uncertainty quantification and data augmentation can help to alleviate class imbalance and improve prediction reliability. The application of these methods provides strong support for more accurate drought trend prediction and management strategies.’
  
  Citation: https://doi.org/10.5194/egusphere-2025-1891-AC1
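
The gridding rule described in the reply to RC2 (a 1° × 1° grid, with a cell numbered only if its center lies within the basin) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the `outline` polygon is a made-up shape rather than the Huaihe River Basin boundary, the north-to-south, west-to-east numbering order is assumed, and the inside test is a plain ray-casting routine.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def numbered_cells(polygon, lon_min, lon_max, lat_min, lat_max, step=1.0):
    """Return {cell_number: (center_lon, center_lat)} for cells whose
    center falls inside the polygon. Numbering order is an assumption."""
    n_lat = int(round((lat_max - lat_min) / step))
    n_lon = int(round((lon_max - lon_min) / step))
    cells = {}
    number = 1
    for row in range(n_lat):            # north to south
        lat = lat_max - (row + 0.5) * step
        for col in range(n_lon):        # west to east
            lon = lon_min + (col + 0.5) * step
            if point_in_polygon(lon, lat, polygon):
                cells[number] = (lon, lat)
                number += 1
    return cells


# Hypothetical outline in (lon, lat); NOT the real basin boundary.
outline = [(112.0, 31.0), (116.0, 30.5), (120.0, 32.0),
           (119.0, 36.0), (114.0, 35.5)]
cells = numbered_cells(outline, 112.0, 121.0, 30.0, 37.0)
for number, center in cells.items():
    print(number, center)
```

Under this rule, cells that overlap the basin edge but whose centers fall outside it receive no number, which is consistent with the unnumbered regions the referee noticed in Figure 1.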

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (02 Sep 2025) by Anne Van Loon

AR by Li min on behalf of the Authors (02 Sep 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Sep 2025) by Anne Van Loon

RR by Anonymous Referee #1 (09 Sep 2025)

RR by Anonymous Referee #2 (16 Sep 2025)

ED: Publish as is (16 Sep 2025) by Anne Van Loon

AR by Li min on behalf of the Authors (26 Sep 2025) Manuscript

Viewed

Total article views: 814 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
684	111	19	814	11	22

Views and downloads (calculated since 25 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	62	10	3	75
Jul 2025	71	37	13	121
Aug 2025	125	35	1	161
Sep 2025	384	17	1	402
Oct 2025	31	8	1	40
Nov 2025	11	4	0	15

Viewed (geographical distribution)

Total article views: 812 (including HTML, PDF, and XML), thereof 812 with geography defined and 0 with unknown origin.

Latest update: 04 Nov 2025

Short summary

This study proposes an innovative method for predicting drought in the Huaihe River Basin of China using advanced machine learning and interpretable artificial intelligence techniques. By analyzing more than 50 years of data, the model successfully predicted four drought categories with an accuracy of 79.9 %. It used explanatory methods to analyze the contribution of different drought influencing factors, providing key insights for early warning systems and water resources planning.
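
The 79.9 % figure is an overall accuracy across the four categories, and the authors' reply in the interactive discussion notes weaker performance for D2 and D3 than for ND and D1. The sketch below shows, with made-up labels, how a single accuracy number can coexist with very different per-category recall when severe categories are rare. The label lists are invented for illustration and are not results from the study; only the category names (ND, D1, D2, D3) come from the discussion.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred, labels):
    """Recall per category: correct predictions / true samples of that category."""
    recall = {}
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recall[label] = hits / total if total else float("nan")
    return recall


labels = ["ND", "D1", "D2", "D3"]

# Made-up, imbalanced sample: severe categories are rare (illustration only).
y_true = ["ND"] * 12 + ["D1"] * 5 + ["D2"] * 2 + ["D3"] * 1
y_pred = (["ND"] * 11 + ["D1"]      # 11 of 12 ND cases predicted correctly
          + ["D1"] * 4 + ["ND"]     # 4 of 5 D1 cases predicted correctly
          + ["D1", "D2"]            # 1 of 2 D2 cases predicted correctly
          + ["D2"])                 # the single D3 case is missed

print("overall accuracy:", accuracy(y_true, y_pred))
for label, value in per_class_recall(y_true, y_pred, labels).items():
    print(label, round(value, 3))
```

Reporting recall per category alongside overall accuracy makes this kind of imbalance visible, which is relevant to the class-imbalance remedies the authors list as future work.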
