Saudi Rainfall (SaRa): Hourly 0.1° Gridded Rainfall (1979–Present) for Saudi Arabia via Machine Learning Fusion of Satellite and Model Data

Wang, Xuetong; Alharbi, Raied S.; Baez-Villanueva, Oscar M.; Green, Amy; McCabe, Matthew F.; Wada, Yoshihide; Van Dijk, Albert I. J. M.; Abid, Muhammad A.; Beck, Hylke

doi:10.5194/egusphere-2025-254

Preprints

03 Feb 2025

Abstract. We introduce Saudi Rainfall (SaRa), a gridded historical and near real-time precipitation (P) product specifically designed for the Arabian Peninsula, one of the most arid, water-stressed, and data-sparse regions on Earth. The product has an hourly 0.1° resolution spanning from 1979 to the present and is continuously updated with a latency of less than two hours. The algorithm underpinning the product involves 18 machine learning model stacks trained for different combinations of satellite and (re)analysis P products along with several static predictors. As a training target, hourly and daily P observations from gauges in Saudi Arabia (n=113) and globally (n=14,256) are used. To evaluate the performance of SaRa, we carried out the most comprehensive evaluation of gridded P products in the region to date, using observations from independent gauges (excluded from training) in Saudi Arabia as a reference (n=119). Among the 20 evaluated P products, our new product, SaRa, consistently ranked first across all evaluation metrics, including the Kling-Gupta Efficiency (KGE), correlation, bias, peak bias, wet days bias, and critical success index. Notably, SaRa achieved a median KGE (a summary statistic combining correlation, bias, and variability) of 0.36, while widely used non-gauge-based products such as CHIRP, ERA5, GSMaP V8, and IMERG-L V07 achieved values of -0.07, 0.21, -0.13, and -0.39, respectively. SaRa also outperformed four gauge-based products, namely CHIRPS V2, CPC Unified, IMERG-F V07, and MSWEP V2.8, which had median KGE values of 0.17, -0.03, 0.29, and 0.20, respectively. Our new P product, available at www.gloh2o.org/sara, addresses a crucial need in the Arabian Peninsula, providing a robust and reliable dataset to support hydrological modeling, water resource assessments, flood management, and climate research.
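The KGE cited in the abstract folds correlation, bias and variability into one score, and the critical success index measures wet/dry event detection. A minimal sketch of both follows, assuming the original Gupta et al. (2009) form of KGE; the abstract does not say whether the evaluation uses this form or the modified KGE' of Kling et al. (2012), which replaces the standard-deviation ratio with a ratio of coefficients of variation. The 0.5 mm wet threshold for the CSI is a hypothetical default.

```python
import math
from statistics import mean, pstdev


def kge(sim, obs):
    """Kling-Gupta Efficiency in the Gupta et al. (2009) form.

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where
    r     = Pearson correlation between sim and obs,
    alpha = std(sim) / std(obs)   (variability ratio),
    beta  = mean(sim) / mean(obs) (bias ratio).
    """
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def csi(sim, obs, threshold=0.5):
    """Critical success index: hits / (hits + misses + false alarms).

    A time step is 'wet' when rainfall >= threshold (mm); the 0.5 mm
    default is a hypothetical choice, not taken from the paper.
    """
    hits = misses = false_alarms = 0
    for s, o in zip(sim, obs):
        sim_wet, obs_wet = s >= threshold, o >= threshold
        if sim_wet and obs_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")
```

For example, doubling every observed value leaves r = 1 but gives alpha = beta = 2, so `kge([2 * x for x in obs], obs)` returns 1 - sqrt(2), about -0.414: a product can be perfectly correlated with the gauges and still score poorly.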

Received: 21 Jan 2025 – Discussion started: 03 Feb 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

08 Oct 2025

Saudi Rainfall (SaRa): hourly 0.1° gridded rainfall (1979–present) for Saudi Arabia via machine learning fusion of satellite and model data

Xuetong Wang, Raied S. Alharbi, Oscar M. Baez-Villanueva, Amy Green, Matthew F. McCabe, Yoshihide Wada, Albert I. J. M. Van Dijk, Muhammad A. Abid, and Hylke E. Beck

Hydrol. Earth Syst. Sci., 29, 4983–5003, https://doi.org/10.5194/hess-29-4983-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-254', Anonymous Referee #1, 05 Mar 2025
This study presents a machine-learning-based approach to estimating gridded rainfall data for Saudi Arabia, an arid region with significant data limitations. The proposed dataset, SaRa, is compared against multiple existing precipitation datasets. The approach uses a combination of random forest and XGBoost models. While not very novel, the results suggest superior performance, thus adding value and contributing to data availability in the region. Although the paper is well structured and the methodology is sound, fundamental concerns arise regarding model accuracy away from training sites, generalizability, and the reliability of the identified trends.
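In a stacked ensemble of the kind the referee describes, a second-level meta-learner learns how to combine the base models' predictions. The sketch below assumes a linear meta-learner fitted by ordinary least squares on held-out base-model predictions; the random forest and XGBoost base models are stubbed out as precomputed prediction lists, and this page does not state which meta-learner the SaRa stacks actually use. (In scikit-learn the same pattern is available as `sklearn.ensemble.StackingRegressor`.)

```python
def solve_linear(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_learner(base_preds, target):
    """Ordinary least squares for target ~ w0 + sum_k w_k * base_preds[k].

    base_preds: one list per base model (e.g. a random forest and an XGBoost
    model), each holding that model's held-out predictions for the same samples.
    Returns [w0, w1, ..., wK].
    """
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_meta(weights, preds):
    """Combine one sample's base-model predictions with the fitted weights."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds))
```

Fitting the meta-learner on out-of-fold rather than in-sample base predictions matters: otherwise it rewards whichever base model overfits the training gauges most.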
Major:
I appreciate the authors filtering out potential duplicate precipitation gauges within 2 km of each other, but the paper needs more clarity on how the training/testing split was performed. Was it random? Stratified? Distance-based?
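One way to make a split distance-based is spatial block cross-validation: gauges are grouped into latitude/longitude blocks and each whole block goes to either training or testing, so no test gauge shares a block with a training gauge. The sketch assumes hypothetical 1° blocks and a 50% test fraction; it is not the procedure used for SaRa, which this page does not describe. Gauges in neighboring blocks can still be close, so a buffer zone around test blocks is a common refinement.

```python
import math
import random


def block_id(lat, lon, block_deg=1.0):
    """Index of the lat/lon block (block_deg x block_deg degrees) containing a gauge."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def spatial_block_split(gauges, block_deg=1.0, test_fraction=0.5, seed=0):
    """Split gauges so that each spatial block goes entirely to training or to testing.

    gauges: dict mapping gauge id -> (lat, lon) in decimal degrees.
    Returns (train_ids, test_ids) as sorted lists.
    """
    blocks = {}
    for gid, (lat, lon) in gauges.items():
        blocks.setdefault(block_id(lat, lon, block_deg), []).append(gid)
    keys = sorted(blocks)  # deterministic order before the seeded shuffle
    random.Random(seed).shuffle(keys)
    n_test = round(len(keys) * test_fraction)
    test_blocks = set(keys[:n_test])
    train = sorted(g for k in keys if k not in test_blocks for g in blocks[k])
    test = sorted(g for k in test_blocks for g in blocks[k])
    return train, test
```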

When applying ML to geospatial datasets, a critical issue is the use of testing sites near training sites, which often artificially boosts validation statistics because precipitation is spatially correlated. To enhance transparency and trust in ML approaches, the accuracy of the ML models should also be evaluated as a function of distance from the training sites. Please plot the KGE of each testing point against its distance (km) from the nearest training site. This would show how far the proposed ML approach can be trusted in distant or ungauged areas. Such a plot would be informative for the main individual ML models and for the ensemble stack.
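The requested diagnostic needs, for each test gauge, the great-circle distance to its nearest training gauge, followed by a summary of KGE by distance. A minimal sketch using the haversine formula with a mean Earth radius of 6371 km; the function names, the bin edges and the choice of per-bin medians are illustrative assumptions, and per-gauge KGE scores are taken as already computed.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_training_distance(test_sites, train_sites):
    """For each test gauge id, distance (km) to the closest training gauge."""
    return {
        tid: min(haversine_km(lat, lon, tlat, tlon) for tlat, tlon in train_sites.values())
        for tid, (lat, lon) in test_sites.items()
    }


def median_kge_by_distance(distances, kge_scores, bin_edges_km):
    """Median KGE of test gauges in each distance bin [lo, hi); None if a bin is empty."""
    result = {}
    for lo, hi in zip(bin_edges_km[:-1], bin_edges_km[1:]):
        scores = [kge_scores[t] for t, d in distances.items() if lo <= d < hi]
        result[(lo, hi)] = median(scores) if scores else None
    return result
```

Plotting the raw (distance, KGE) pairs as a scatter shows the same information without binning; the binned medians make a decline with distance easier to read when gauges are few.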

The ensemble approach, while interesting, results in a black-box system: there is little discussion of the physical interpretability of the model structures or the predictive power of the inputs. Random forests in scikit-learn and XGBoost come with out-of-the-box tools that can easily be used to evaluate model interpretability further. This could improve model understanding and expand the generalizability of the proposed approach.
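Permutation importance is one model-agnostic way to quantify the predictive power of each input: shuffle one predictor column, re-predict, and record how much the error grows. scikit-learn provides this as `sklearn.inspection.permutation_importance`; the dependency-free sketch below shows the idea with mean squared error as the score. The `predict` callable stands in for a trained model stack, and the predictor columns (for example satellite P, reanalysis P and static predictors) are hypothetical.

```python
import random


def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions.
    X: list of feature rows (lists of floats); y: list of targets.
    Returns one importance value per column.
    """
    rng = random.Random(seed)
    baseline = mse(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict(X_perm), y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

An importance near zero means the model barely uses that input. With strongly correlated predictors, as different P products often are, shuffling one column can understate its importance because a correlated column still carries the same signal.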

The study does not sufficiently address uncertainty in trend estimations. There are no confidence intervals, no discussion of interannual variability, and no attempt to separate natural variability from long-term trends. Given the known issues with historical precipitation datasets, particularly in arid regions, one must question how much of the trend results from dataset evolution rather than actual climate change.
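Trend confidence intervals can be obtained without distributional assumptions from the Theil-Sen (Sen's) slope estimator, whose standard interval uses the variance of the Mann-Kendall S statistic. The sketch below assumes annual totals at distinct time steps and applies no correction for ties or serial correlation; positive year-to-year persistence would typically widen a properly corrected interval. It illustrates the kind of interval the referee asks for and is not the method used in the manuscript.

```python
import math
from statistics import NormalDist, median


def sens_slope_ci(t, y, alpha=0.05):
    """Theil-Sen slope with its distribution-free (1 - alpha) confidence interval.

    Uses var(S) = n(n-1)(2n+5)/18 for Kendall's S (no tie or
    serial-correlation correction). t: distinct time values (e.g. years);
    y: values such as annual rainfall totals.
    Returns (lower, slope, upper) in units of y per unit of t.
    """
    n = len(t)
    slopes = sorted(
        (y[j] - y[i]) / (t[j] - t[i]) for i in range(n) for j in range(i + 1, n)
    )
    n_pairs = len(slopes)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    c = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_s)
    m1 = (n_pairs - c) / 2  # 1-based rank of the lower limit
    m2 = (n_pairs + c) / 2  # the upper limit is 1-based rank m2 + 1
    lower = slopes[max(0, round(m1) - 1)]
    upper = slopes[min(n_pairs - 1, round(m2))]
    return lower, median(slopes), upper
```

An interval that straddles zero indicates that the record cannot distinguish a trend from interannual variability at the chosen confidence level.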

Moderate:
The paper would benefit from a quantitative analysis and discussion of how temporal resolution mismatches in the gauge data impact validation results.
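A common source of such mismatch is that daily gauges accumulate over a fixed observation window (for example 06 to 06 UTC) while hourly gridded products are usually summed over UTC calendar days, so a single storm can land on different days in the two records. The sketch below aggregates hourly totals under a configurable day boundary; the `day_end_hour_utc` parameter and the convention of labeling each window by its start date are assumptions for illustration, not details of the SaRa gauge records.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta


def hourly_to_daily(hourly, day_end_hour_utc=0):
    """Aggregate hourly rainfall into daily totals for a given observation-day convention.

    hourly: dict mapping the END time (naive UTC datetime) of each one-hour
            accumulation interval to rainfall in mm.
    day_end_hour_utc: UTC hour at which the 24 h window closes
            (0 = UTC calendar day; 6 = window from 06 UTC to 06 UTC).
    Returns a dict mapping the date on which each 24 h window STARTS to its total.
    """
    daily = defaultdict(float)
    for end_time, p in hourly.items():
        # Shift so the window boundary falls at midnight.
        shifted = end_time - timedelta(hours=day_end_hour_utc)
        label = shifted.date()
        # An interval ending exactly on the boundary closes the previous window.
        if shifted.time() == time(0):
            label -= timedelta(days=1)
        daily[label] += p
    return dict(daily)


storm = {datetime(2020, 1, 2, 5): 10.0}  # 10 mm in the hour ending 05 UTC on 2 Jan
print(hourly_to_daily(storm, day_end_hour_utc=0))  # attributed to 2 Jan
print(hourly_to_daily(storm, day_end_hour_utc=6))  # attributed to 1 Jan
```

Comparing daily scores computed under both conventions against the same gauges gives a rough, quantitative measure of how much of the validation error is due to timing rather than rainfall amount.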

Minor:
L143: clarify what is meant by gross errors.
Citation: https://doi.org/10.5194/egusphere-2025-254-RC1
- AC1: 'Reply on RC1', Xuetong Wang, 05 Jun 2025
  
  Dear Anonymous Reviewer 1,
  Thank you for your comments and suggestions! Please find attached our detailed responses to your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-254-AC1
RC2:
'Comment on egusphere-2025-254', Anonymous Referee #2, 30 May 2025
Review for "Saudi Rainfall (SaRa): Hourly 0.1° Gridded Rainfall (1979–Present) for Saudi Arabia via Machine Learning Fusion of Satellite and Model Data" by Wang et al. submitted to EGUsphere (MS No.: egusphere-2025-254).
General comments:
The authors introduce Saudi Rainfall (SaRa), a gridded precipitation product for the Arabian Peninsula developed using Machine Learning (ML) techniques. They clearly present the motivation behind the development of such a dataset, describe the procedures used to generate the SaRa product, and evaluate its performance. By leveraging a large number of available gauge-based and gridded datasets, the authors produce a new dataset that shows improved performance compared to existing products, particularly in areas with sparse station observations and in the predominantly arid regions of the Arabian Peninsula.
This work makes a valuable contribution to the data community and enhances scientific understanding of precipitation patterns in data-scarce, arid environments. The overall quality of the manuscript is good, with well-cited references and generally clear writing. However, there is still room for further improvement. In particular, I would like to raise two main concerns:
Limitations of Machine Learning: What are the potential limitations, challenges and sources of error introduced by using Machine Learning techniques in generating this dataset? A discussion on uncertainties and biases associated with ML itself would strengthen the paper.

Broader Impact and Global Appeal: What is the relevance of this work beyond the Arabian Peninsula? Discussing the broader applicability of the methodology and insights would enhance the global significance of the study.

In addition, I suggest the authors consider the following points:
Include a study area map: Add a map of the Arabian Peninsula showing the region’s topography and its location in a global context. This would help orient readers unfamiliar with the area.

Describe ML Challenges: Provide a more detailed discussion of the challenges and limitations in implementing ML for P data generation.

Discuss Practical Applications: Expand the discussion to highlight potential applications of the dataset, such as its use in flash flood risk mitigation, water resource management, or climate-related decision-making in arid regions.
Citation: https://doi.org/10.5194/egusphere-2025-254-RC2
- AC2: 'Reply on RC2', Xuetong Wang, 05 Jun 2025
  
  Dear Anonymous Reviewer 2,
  Thank you for your comments and suggestions! Please find attached our detailed responses to your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-254-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (06 Jun 2025) by Rohini Kumar

AR by Xuetong Wang on behalf of the Authors (11 Jul 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (14 Jul 2025) by Rohini Kumar

AR by Xuetong Wang on behalf of the Authors (17 Jul 2025): Manuscript

Viewed

Total article views: 1,346 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,111	205	30	1,346	44	67

Views and downloads (calculated since 03 Feb 2025)

Month	HTML	PDF	XML	Total
Feb 2025	170	45	5	220
Mar 2025	70	20	1	91
Apr 2025	51	18	2	71
May 2025	69	28	2	99
Jun 2025	84	28	11	123
Jul 2025	41	17	0	58
Aug 2025	129	27	0	156
Sep 2025	476	19	9	504
Oct 2025	21	3	0	24
Nov 2025	0	0	0	0
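The monthly rows can be checked against the headline counts. The short sketch below transcribes the table, sums each format over all months and builds the running (cumulative) total; the per-format sums reproduce the reported 1,111 HTML, 205 PDF and 30 XML views, 1,346 in total. The dictionary layout and function names are illustrative.

```python
# Monthly article views (HTML, PDF, XML) transcribed from the table above.
MONTHLY_VIEWS = {
    "Feb 2025": (170, 45, 5),
    "Mar 2025": (70, 20, 1),
    "Apr 2025": (51, 18, 2),
    "May 2025": (69, 28, 2),
    "Jun 2025": (84, 28, 11),
    "Jul 2025": (41, 17, 0),
    "Aug 2025": (129, 27, 0),
    "Sep 2025": (476, 19, 9),
    "Oct 2025": (21, 3, 0),
    "Nov 2025": (0, 0, 0),
}


def column_totals(monthly):
    """Sum each format (HTML, PDF, XML) over all months."""
    return tuple(sum(col) for col in zip(*monthly.values()))


def cumulative_totals(monthly):
    """Running total of all views (HTML + PDF + XML) at the end of each month."""
    running, out = 0, {}
    for month, counts in monthly.items():
        running += sum(counts)
        out[month] = running
    return out


print(column_totals(MONTHLY_VIEWS))
print(cumulative_totals(MONTHLY_VIEWS))
```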

Viewed (geographical distribution)

Total article views: 1,289 (including HTML, PDF, and XML). Of these, 1,289 have a defined geographic origin and 0 are of unknown origin.

Latest update: 13 Nov 2025

Short summary

Our paper introduces Saudi Rainfall (SaRa), a high-resolution, near real-time rainfall product for the Arabian Peninsula. Using machine learning, SaRa combines multiple satellite and (re)analysis datasets with static predictors and outperforms existing products in the region. Given the region's rapid development and continuing growth in water demand, SaRa could help address water challenges and support resource management.

