the Creative Commons Attribution 4.0 License.
Regionalization of IDF Curves for Mainland China: A Comparative Evaluation of Machine Learning versus Spatial Interpolation Techniques
Abstract. Regionalization of Intensity-Duration-Frequency (IDF) curves is essential for designing stormwater drainage systems, especially in regions without rainfall data of high temporal resolution. However, most studies have not thoroughly compared regionalization methods using sub-daily site observations versus gridded daily precipitation products. The potential of machine learning (ML) methods driven by daily gridded precipitation remains largely underexplored. This study addresses these gaps by regionalizing the IDF curves across mainland China for durations ranging between 1 and 72 hours and return periods ranging from 2 to 1,000 years. Five interpolation methods based on hourly observations from 2363 stations and five machine learning methods based on a gridded daily dataset were tested for accuracy. Both ML and traditional interpolation methods showed robust performances based on the Kling-Gupta Efficiency (KGE) performance measure. The most successful interpolation method was Kriging with External Drift using mean annual precipitation, with KGE > 0.96 for 1-hr-5-yr and 24-hr-5-yr storms and KGE > 0.84 for 1-hr-100-yr and 24-hr-100-yr storms, while Gradient Boosting was the best-performing ML model, with KGE > 0.94 for 1-hr-5-yr and 24-hr-5-yr storms and KGE > 0.87 for 1-hr-100-yr and 24-hr-100-yr storms. Notably, despite ML using daily data and interpolation using hourly data, the accuracy of ML gradually improved, eventually approaching or even surpassing the interpolation methods as duration and return period increased. Consequently, a regionalized dataset on IDF curves for mainland China with a spatial resolution of 0.1 degrees (and optionally 0.5 degrees) was generated using the optimal regionalization method.
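For readers unfamiliar with the accuracy measure quoted above, the following is a minimal sketch of the Kling-Gupta Efficiency. It assumes the original 2009 formulation (correlation, variability ratio, and bias ratio combined as a Euclidean distance from the ideal point); the abstract does not say which KGE variant the authors used, and the function and argument names here are illustrative only.

```python
import math

def kge(sim, obs):
    """Kling-Gupta Efficiency, assumed 2009 form; 1.0 means a perfect match."""
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations; the ratio alpha is the same with n or n-1.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A simulation that doubles every observed value keeps r = 1 but has alpha = beta = 2, so its KGE drops to 1 - sqrt(2), which shows how the score penalizes bias and variability errors even when the correlation is perfect.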
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3228', Anonymous Referee #1, 28 Jul 2025
The manuscript presents a comprehensive and methodologically sound comparison of traditional interpolation and machine learning methods for regionalizing Intensity-Duration-Frequency (IDF) curves across mainland China. The evaluation of five interpolation methods and five machine learning methods is a significant strength, demonstrating robust performance metrics and providing a valuable dataset for flood risk assessment and infrastructure planning. It is particularly noteworthy that the authors find that machine learning methods, using only daily data, can achieve accuracy comparable to interpolation methods relying on hourly data, as this highlights the potential for IDF regionalization in regions lacking data coverage. The use of four representative IDF cases effectively captures the variability in prediction challenges across durations and return periods. The manuscript is well organized, with a clear structure that guides readers through the methodology, results, and implications. However, the study could be strengthened by addressing some scientific gaps, such as the mechanisms behind machine learning's temporal downscaling, the reliability of results in data-limited regions like the southwest, and the lack of comprehensive uncertainty quantification. Additionally, minor typographical errors and inconsistent figure formatting slightly detract from the presentation. Overall, this is a high-quality study with significant contributions to hydrology and climate adaptation, but it requires minor revisions to enhance clarity, rigor, and practical applicability.
Specific comments are as follows:
The study demonstrates that ML models, like gradient boosting, can estimate sub-daily intensities from daily gridded data with accuracy comparable to interpolation methods using hourly data. However, the manuscript lacks a detailed explanation of how ML achieves this temporal downscaling. What specific features or model structures enable this capability? For example, are statistical features like daily extreme precipitation or skewness critical? A discussion or sensitivity analysis of key input variables (Table 1) would clarify this process.
Table 1 lists geographic coordinates, elevation, and precipitation statistics as independent variables for ML. Why were these variables chosen, and were other meteorological variables, such as temperature or humidity, tested? Given their potential influence on extreme precipitation, justifying their exclusion or inclusion would enhance the robustness of the ML approach.
Line 291: you mention that it was repeated five times. Please clarify whether this was with or without replacement.
Section 2.1, the division of mainland China into four regions (NE, SE, NW, SW) is based on climate and topography, with the Eastern Monsoon region split along the Qinling-Huaihe line due to its heterogeneity. Was this subdivision sufficient to capture regional variability, particularly in the SE region with extreme precipitation? Could further sub-regionalization or alternative regionalization schemes improve model performance?
The study interpolates missing hourly data for gaps <12 hours and assigns zero for gaps ≥12 hours (beginning on line 157). How was the impact of this imputation strategy assessed, and what are its implications for IDF curve accuracy in regions with frequent missing data?
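To make the imputation rule in question concrete, here is a minimal sketch of it, not the authors' implementation. It assumes that "interpolates" means linear interpolation between the neighbouring valid hours, that missing values are marked as None, and that gaps touching the start or end of the record are set to zero because they have no neighbour on one side; the name fill_gaps and its max_gap parameter are hypothetical.

```python
def fill_gaps(hourly, max_gap=12):
    """Fill None entries: linear interpolation for interior gaps shorter
    than max_gap hours, 0.0 for longer gaps and for gaps at the edges."""
    filled = list(hourly)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1                      # j is the first index after the gap
        gap = j - i
        interior = i > 0 and j < n      # valid neighbours on both sides
        if gap < max_gap and interior:
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
        else:
            for k in range(i, j):
                filled[k] = 0.0         # long or edge gap: assume no rain
        i = j
    return filled
```

Under this rule a 12-hour outage during a storm is recorded as dry, so annual maxima at stations with frequent long gaps can be biased low, which is the concern behind the question above.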
The SW region shows significantly lower accuracy (KGE as low as 0.31 for KED_AP and 0.14 for GB), attributed to sparse station density and complex topography. Given the lack of validation stations in parts of the NW and SW regions, how reliable are the IDF curves in these areas? Should users be explicitly cautioned against using these curves without further validation?
The manuscript notes that hyperparameter tuning via grid search did not significantly improve ML performance, so default settings were used. Why do you think tuning was ineffective? Were the default parameters near-optimal, or were the tuning ranges too narrow? Clarifying this would help readers assess the robustness of the ML models.
The introduction references non-stationarity in IDF curves due to climate change, but the methodology does not account for it (for example, different RCP scenarios). Were tests conducted to evaluate the impact of non-stationarity, particularly for long return periods, 100 or 1000 years? A brief discussion or analysis of this issue would align the study with current climate research trends.
The manuscript cites a high-resolution IDF dataset in the introduction for the Qinghai-Tibet Plateau (Ren et al., 2025). A quantitative comparison with this dataset in the SW region would benchmark the study’s results and highlight its unique contributions.
The IDF curves are provided at 0.1° and 0.5° resolutions, but their alignment with specific applications (such as urban drainage design, flood modeling) is unclear. Are these resolutions optimized for particular use cases, and how should users select between them? Providing guidance would enhance the dataset’s practical utility.
GB outperforms other ML methods, but the manuscript does not discuss its interpretability or the relative importance of input features. A feature importance analysis would provide insights into which variables drive performance, aiding future model development.
Inconsistent spacing in "machine learning" vs. "machinelearning" appears in several instances (for example lines 103, 408). Standardize to "machine learning."
Inconsistent spacing before references (for example, lines 108, 110, and 113). Check formatting.
Line 735: "Deepseek R1" clarify the tool’s name and provide a citation or link for transparency.
Figure 1: Include a description of the inset in the caption.
Figure 6: Standardize color scales across panels (a–d for KED_AP, e–h for GB) to facilitate direct comparisons. Ensure units (mm/h) are explicitly labeled in the caption or legend.
Figure 7: The caption mentions 500 samples but does not explain the sampling method (for example, bootstrap or Monte Carlo). Add a brief clarification.
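As an illustration of one of the two options named in this comment, the sketch below draws 500 bootstrap resamples (sampling with replacement) and reports a percentile interval for a statistic. It is an assumption about what such a procedure could look like, not a description of how Figure 7 was produced; the function name, the 90% level, and the fixed seed are all hypothetical.

```python
import random

def bootstrap_interval(values, stat, n_samples=500, level=0.90, seed=0):
    """Percentile bootstrap interval for stat(values) from n_samples resamples."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    n = len(values)
    stats = sorted(
        # Each resample draws n items with replacement from the original data.
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_samples)
    )
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * (n_samples - 1))]
    upper = stats[int((1.0 - tail) * (n_samples - 1))]
    return lower, upper
```

A Monte Carlo alternative would instead draw the 500 samples from a fitted distribution rather than from the observed values, which is why the caption needs to state which approach was used.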
Table 2 and Table 3: Ensure consistent formatting of numerical values (for example, PBIAS values should all include the % symbol). Add a footnote clarifying that negative PBIAS indicates underestimation.
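The sign convention matters because PBIAS is defined in two opposite ways in the literature. The sketch below assumes the "simulated minus observed" form, which is the one consistent with the statement that negative values indicate underestimation; the manuscript's actual formula is not quoted here, so this is an assumption, and the function name is illustrative.

```python
def pbias(sim, obs):
    """Percent bias, assumed as 100 * sum(sim - obs) / sum(obs).
    Under this convention a negative value means the estimates are too low."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)
```

With estimates of 9 and 18 mm/h against observations of 10 and 20 mm/h, this returns -10, that is, a 10% underestimation. The opposite convention (observed minus simulated) would report +10 for the same data, hence the value of an explicit footnote.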
The manuscript contains numerous acronyms and would benefit from a consolidated list or table of definitions.
Final recommendation: Minor revisions. The manuscript is of high quality and makes significant contributions to IDF curve regionalization and the scientific community, but addressing the specific comments (like clarifying temporal downscaling, quantifying uncertainty, and improving regional performance in the southwest) and technical corrections will enhance its scientific rigor and accessibility.
Citation: https://doi.org/10.5194/egusphere-2025-3228-RC1
RC2: 'Comment on egusphere-2025-3228', Anonymous Referee #2, 23 Aug 2025
The study investigates two methodologies (i.e., interpolation and machine learning (ML)) for regionalising Intensity-Duration-Frequency (IDF) curves across various time scales. The findings indicate that machine learning outperforms interpolation in terms of both accuracy and data requirements. Given its methodological rigour and practical relevance, the study is well suited for publication in HESS.
However, the following questions could enhance the depth of the manuscript:
- With extreme events predicted to become more frequent in the future, why hasn’t the study considered durations shorter than 1 hour (e.g., 30 min)?
- The study mentions using widely adopted ML methods from previous research. Given LSTM’s proven effectiveness in IDF studies, why hasn’t it been considered?
- What are the limitations of the study?
- Is it possible to transfer the outcomes to other geographically similar regions? If so, what considerations or adaptations would be necessary?
Citation: https://doi.org/10.5194/egusphere-2025-3228-RC2