Regionalization of IDF Curves for Mainland China: A Comparative Evaluation of Machine Learning versus Spatial Interpolation Techniques

Jiang, Yuantian; Wang, Wenting; Fullhart, Andrew T.; Yu, Bofu; Chen, Bo

doi:10.5194/egusphere-2025-3228

Preprints

23 Jul 2025

Abstract. Regionalization of Intensity-Duration-Frequency (IDF) curves is essential for designing stormwater drainage systems, especially in regions without rainfall data of high temporal resolution. However, most studies have not thoroughly compared regionalization methods using sub-daily site observations versus gridded daily precipitation products. The potential of machine learning (ML) methods driven by daily gridded precipitation remains largely underexplored. This study addresses these gaps by regionalizing the IDF curves across mainland China for durations ranging between 1 and 72 hours and return periods ranging from 2 to 1,000 years. Five interpolation methods based on hourly observations from 2363 stations and five machine learning methods based on a gridded daily dataset were tested for accuracy. Both ML and traditional interpolation methods showed robust performances based on the Kling-Gupta Efficiency (KGE) performance measure. The most successful interpolation method was Kriging with External Drift using mean annual precipitation, with KGE > 0.96 for 1-hr-5-yr and 24-hr-5-yr storms and KGE > 0.84 for 1-hr-100-yr and 24-hr-100-yr storms, while Gradient Boosting was the best-performing ML model, with KGE > 0.94 for 1-hr-5-yr and 24-hr-5-yr storms and KGE > 0.87 for 1-hr-100-yr and 24-hr-100-yr storms. Notably, despite ML using daily data and interpolation using hourly data, the accuracy of ML gradually improved, eventually approaching or even surpassing the interpolation methods as duration and return period increased. Consequently, a regionalized dataset on IDF curves for mainland China with a spatial resolution of 0.1 degrees (and optionally 0.5 degrees) was generated using the optimal regionalization method.
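For readers unfamiliar with the Kling-Gupta Efficiency cited above, the following is a minimal sketch of the original formulation (Gupta et al., 2009), which combines correlation, a variability ratio, and a bias ratio into one score with an optimum of 1. The abstract does not state which KGE variant the authors used, so this choice is an assumption, and the function name `kge` and the sample values are purely illustrative.

```python
import math


def kge(sim, obs):
    """Kling-Gupta Efficiency, original 2009 form (assumed variant).

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
    r     : linear correlation between simulated and observed values
    alpha : ratio of standard deviations (sim / obs)
    beta  : ratio of means (sim / obs)
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have equal length >= 2")

    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations and covariance
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    if sd_s == 0 or sd_o == 0 or mean_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n

    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical rainfall intensities (mm/h) at four validation stations
observed = [10.0, 20.0, 30.0, 40.0]
perfect = list(observed)
doubled = [2 * v for v in observed]

print(kge(perfect, observed))  # a perfect match scores 1 (up to rounding)
print(kge(doubled, observed))  # r = 1 but alpha = beta = 2, so the score drops below 0
```

Because the three components are penalized jointly, a prediction can be perfectly correlated with the observations and still score poorly if it is biased or has the wrong spread, as the doubled series shows.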

Received: 07 Jul 2025 – Discussion started: 23 Jul 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1990 KB)

Supplement (1012 KB)

Status: final response (author comments only)

RC1: 'Comment on egusphere-2025-3228', Anonymous Referee #1, 28 Jul 2025

The manuscript presents a comprehensive and methodologically sound comparison of traditional interpolation and machine learning methods for regionalizing Intensity-Duration-Frequency (IDF) curves across mainland China. The evaluation of five interpolation methods and five machine learning methods is a significant strength, demonstrating robust performance metrics and providing a valuable dataset for flood risk assessment and infrastructure planning. It is particularly noteworthy that the authors find that machine learning methods, using only daily data, can achieve accuracy comparable to that of interpolation methods relying on hourly data, as this highlights the potential for IDF regionalization in regions lacking data coverage. The use of four representative IDF cases effectively captures the variability in prediction challenges across durations and return periods. The manuscript is well organized, with a clear structure that guides readers through the methodology, results, and implications. However, the study could be strengthened by addressing some scientific gaps, such as the mechanisms behind machine learning's temporal downscaling, the reliability of results in data-limited regions like the southwest, and the lack of comprehensive uncertainty quantification. Additionally, minor typographical errors and inconsistent figure formatting slightly detract from the presentation. Overall, this is a high-quality study with significant contributions to hydrology and climate adaptation, but it requires minor revisions to enhance clarity, rigor, and practical applicability.
Specific comments are as follows:
The study demonstrates that ML models, like gradient boosting, can estimate sub-daily intensities from daily gridded data with accuracy comparable to interpolation methods using hourly data. However, the manuscript lacks a detailed explanation of how ML achieves this temporal downscaling. What specific features or model structures enable this capability? For example, are statistical features like daily extreme precipitation or skewness critical? A discussion or sensitivity analysis of key input variables (Table 1) would clarify this process.
Table 1 lists geographic coordinates, elevation, and precipitation statistics as independent variables for ML. Why were these variables chosen, and were other meteorological variables, such as temperature or humidity, tested? Given their potential influence on extreme precipitation, justifying their exclusion or inclusion would enhance the robustness of the ML approach.
Line 291, you mention it was repeated five times. Clarify if this was with or without replacement.
Section 2.1, the division of mainland China into four regions (NE, SE, NW, SW) is based on climate and topography, with the Eastern Monsoon region split along the Qinling-Huaihe line due to its heterogeneity. Was this subdivision sufficient to capture regional variability, particularly in the SE region with extreme precipitation? Could further sub-regionalization or alternative regionalization schemes improve model performance?
The study interpolates missing hourly data for gaps <12 hours and assigns zero for gaps ≥12 hours (beginning on line 157). How was the impact of this imputation strategy assessed, and what are its implications for IDF curve accuracy in regions with frequent missing data?
The SW region shows significantly lower accuracy (KGE as low as 0.31 for KED_AP and 0.14 for GB), attributed to sparse station density and complex topography. Given the lack of validation stations in parts of the NW and SW regions, how reliable are the IDF curves in these areas? Should users be explicitly cautioned against using these curves without further validation?
The manuscript notes that hyperparameter tuning via grid search did not significantly improve ML performance, so default settings were used. Why do you think tuning was ineffective? Were the default parameters near-optimal, or were the tuning ranges too narrow? Clarifying this would help readers assess the robustness of the ML models.
The introduction references non-stationarity in IDF curves due to climate change, but the methodology does not account for it (for example, different RCP scenarios). Were tests conducted to evaluate the impact of non-stationarity, particularly for long return periods, 100 or 1000 years? A brief discussion or analysis of this issue would align the study with current climate research trends.
The manuscript cites a high-resolution IDF dataset in the introduction for the Qinghai-Tibet Plateau (Ren et al., 2025). A quantitative comparison with this dataset in the SW region would benchmark the study’s results and highlight its unique contributions.
The IDF curves are provided at 0.1° and 0.5° resolutions, but their alignment with specific applications (such as urban drainage design, flood modeling) is unclear. Are these resolutions optimized for particular use cases, and how should users select between them? Providing guidance would enhance the dataset’s practical utility.
GB outperforms other ML methods, but the manuscript does not discuss its interpretability or the relative importance of input features. A feature importance analysis would provide insights into which variables drive performance, aiding future model development.
Inconsistent spacing in "machine learning" vs. "machinelearning" appears in several instances (for example lines 103, 408). Standardize to "machine learning."
Inconsistent spacing before references (for example, lines 108, 110, and 113). Check formatting.
Line 735: "Deepseek R1": clarify the tool’s name and provide a citation or link for transparency.
Figure 1: Include a description of the inset in the caption.
Figure 6: Standardize color scales across panels (a–d for KED_AP, e–h for GB) to facilitate direct comparisons. Ensure units (mm/h) are explicitly labeled in the caption or legend.
Figure 7: The caption mentions 500 samples but does not explain the sampling method (for example, bootstrap or Monte Carlo). Add a brief clarification.
Table 2 and Table 3: Ensure consistent formatting of numerical values (for example, PBIAS values should all include the % symbol). Add a footnote clarifying that negative PBIAS indicates underestimation.
The manuscript contains numerous acronyms and would benefit from a consolidated list or table of definitions.
Final recommendation: Minor revisions. The manuscript is of high quality and makes significant contributions to IDF curve regionalization and the scientific community, but addressing the specific comments (like clarifying temporal downscaling, quantifying uncertainty, and improving regional performance in the southwest) and technical corrections will enhance its scientific rigor and accessibility.
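
The gap-handling rule queried above (interpolate gaps shorter than 12 hours, assign zero to gaps of 12 hours or longer) can be sketched as follows. This is an illustration only, not the authors' implementation: linear interpolation between the neighbouring observations, and zero-filling of gaps that touch the start or end of the record, are both assumptions, and the function name `fill_gaps` is hypothetical.

```python
def fill_gaps(values, max_gap=12):
    """Fill missing hourly precipitation values (None) in a list.

    Gaps shorter than `max_gap` hours are linearly interpolated between
    the neighbouring observations (assumed method); gaps of `max_gap`
    hours or longer, and gaps at either end of the record, are set to 0.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values: indices i .. j-1
        j = i
        while j < n and filled[j] is None:
            j += 1
        gap = j - i
        left = filled[i - 1] if i > 0 else None
        right = filled[j] if j < n else None
        if gap < max_gap and left is not None and right is not None:
            for k in range(i, j):
                weight = (k - i + 1) / (gap + 1)
                filled[k] = left + weight * (right - left)
        else:
            for k in range(i, j):
                filled[k] = 0.0
        i = j
    return filled


# Hypothetical hourly series (mm): a 1-hour gap and a 12-hour gap
short_gap = [1.0, None, 3.0]
long_gap = [2.0] + [None] * 12 + [4.0]

print(fill_gaps(short_gap))  # the single missing hour is interpolated
print(fill_gaps(long_gap))   # the 12 missing hours are set to zero
```

A sketch like this makes the sensitivity question concrete: a storm peak that falls inside a long gap is replaced by zeros, which would bias annual maxima downward at stations with frequent outages.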

Citation: https://doi.org/10.5194/egusphere-2025-3228-RC1
- AC1: 'Reply on RC1', Wenting Wang, 28 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3228/egusphere-2025-3228-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-3228-AC1
RC2: 'Comment on egusphere-2025-3228', Anonymous Referee #2, 23 Aug 2025
The study investigates two methodologies (i.e., interpolation and machine learning (ML)) for regionalising Intensity-Duration-Frequency (IDF) curves across various time scales. The findings indicate that machine learning outperforms interpolation, both in terms of accuracy and data requirements. Given its methodological rigour and practical relevance, the study merits publication in HESS.
However, the following questions could enhance the depth of the manuscript:
With extreme events predicted to become more frequent in the future, why hasn’t the study considered durations shorter than 1 hour (e.g., 30 min)?

The study mentions using widely adopted ML methods from previous research. Given LSTM’s proven effectiveness in IDF studies, why hasn’t it been considered?

What are the limitations of the study?

Is it possible to transfer the outcomes to other geographically similar regions? If so, what considerations or adaptations would be necessary?
Citation: https://doi.org/10.5194/egusphere-2025-3228-RC2
- AC2: 'Reply on RC2', Wenting Wang, 28 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3228/egusphere-2025-3228-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-3228-AC2

Supplement

https://doi.org/10.5194/egusphere-2025-3228-supplement

Viewed

Total article views: 1,191 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
922	226	43	1,191	81	33	39

Views and downloads (calculated since 23 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	103	25	4	132
Aug 2025	216	26	8	250
Sep 2025	388	18	9	415
Oct 2025	51	16	6	73
Nov 2025	30	15	2	47
Dec 2025	41	55	7	103
Jan 2026	39	32	4	75
Feb 2026	23	11	2	36
Mar 2026	31	28	1	60

Latest update: 23 Mar 2026

Short summary

Intensity-Duration-Frequency (IDF) curves are important for designing infrastructure that can withstand floods. We compared traditional interpolation methods with machine learning (ML) to map these curves across mainland China. ML using widely available daily gridded data can estimate sub-daily intensity as accurately as methods that need rarer hourly site data. This study offers valuable insight into IDF estimation in data-limited regions and generates a new IDF dataset for mainland China.

