This work is distributed under the Creative Commons Attribution 4.0 License.
Saudi Rainfall (SaRa): Hourly 0.1° Gridded Rainfall (1979–Present) for Saudi Arabia via Machine Learning Fusion of Satellite and Model Data
Abstract. We introduce Saudi Rainfall (SaRa), a gridded historical and near-real-time precipitation (P) product designed specifically for the Arabian Peninsula, one of the most arid, water-stressed, and data-sparse regions on Earth. The product has an hourly 0.1° resolution, spans 1979 to the present, and is continuously updated with a latency of less than two hours. The underlying algorithm uses 18 machine learning model stacks, trained for different combinations of satellite and (re)analysis P products together with several static predictors. Hourly and daily P observations from gauges in Saudi Arabia (n=113) and worldwide (n=14,256) serve as the training target. To evaluate SaRa, we carried out the most comprehensive evaluation of gridded P products in the region to date, using observations from independent gauges in Saudi Arabia (excluded from training; n=119) as the reference. Among the 20 evaluated P products, SaRa consistently ranked first across all evaluation metrics, including the Kling-Gupta Efficiency (KGE), correlation, bias, peak bias, wet-day bias, and critical success index. Notably, SaRa achieved a median KGE (a summary statistic combining correlation, bias, and variability) of 0.36. By comparison, widely used non-gauge-based products such as CHIRP, ERA5, GSMaP V8, and IMERG-L V07 achieved -0.07, 0.21, -0.13, and -0.39, respectively. SaRa also outperformed four gauge-based products (CHIRPS V2, CPC Unified, IMERG-F V07, and MSWEP V2.8), which had median KGE values of 0.17, -0.03, 0.29, and 0.20, respectively. Our new P product, available at www.gloh2o.org/sara, addresses a crucial need in the Arabian Peninsula by providing a robust and reliable dataset to support hydrological modeling, water resource assessments, flood management, and climate research.
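The abstract summarizes performance with the KGE and the critical success index (CSI). The functions below are a minimal sketch of how both scores can be computed for one gauge time series. They are not the authors' code. The sketch makes three assumptions:

- It uses the original KGE formulation of Gupta et al. (2009). Kling et al. (2012) replace the standard-deviation ratio with a ratio of coefficients of variation.
- A time step counts as wet at or above a 0.5 mm threshold, chosen for illustration only.
- The function names and the sample values are hypothetical.

The preprint's exact metric definitions may differ.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kge(sim, obs):
    """Kling-Gupta Efficiency, original form (Gupta et al., 2009).

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with
    r = correlation, alpha = std(sim) / std(obs), beta = mean(sim) / mean(obs).
    A perfect match gives KGE = 1.
    """
    r = pearson_r(sim, obs)
    alpha = statistics.pstdev(sim) / statistics.pstdev(obs)
    beta = statistics.fmean(sim) / statistics.fmean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def csi(sim, obs, wet_threshold_mm=0.5):
    """Critical success index: hits / (hits + misses + false alarms).

    A time step is 'wet' when P >= wet_threshold_mm (an assumed threshold).
    """
    hits = misses = false_alarms = 0
    for s, o in zip(sim, obs):
        sim_wet, obs_wet = s >= wet_threshold_mm, o >= wet_threshold_mm
        if sim_wet and obs_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


# Illustrative daily series (mm) for one gauge; the values are made up.
obs = [0.0, 2.0, 0.0, 10.0, 1.0, 0.0]
sim = [0.0, 1.5, 0.2, 8.0, 1.0, 0.0]
print(round(kge(sim, obs), 3), csi(sim, obs))
```

To get a station-median score like those reported above, compute `kge` for each independent gauge and take `statistics.median` over the gauges.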
Status: open (extended)
RC1: 'Comment on egusphere-2025-254', Anonymous Referee #1, 05 Mar 2025
This study presents a machine-learning-based approach to estimating gridded rainfall for Saudi Arabia, an arid region with significant data limitations. The proposed dataset, SaRa, is compared against multiple existing precipitation datasets. The approach uses a combination of random forest and XGBoost models. Although the approach is not very novel, the results suggest superior performance, so the dataset adds value and improves data availability in the region. The paper is well structured and the methodology is sound. However, fundamental concerns remain about the model's accuracy away from training sites, its generalizability, and the reliability of the identified trends.
Major:
- I appreciate that the authors filtered out potentially duplicate precipitation gauges within 2 km, but the paper needs to clarify how the training/testing split was performed. Was it random? Stratified? Distance-based?
- When applying ML to geospatial datasets, a critical issue is the use of testing sites located near training sites. Because precipitation is spatially correlated, such nearby sites often artificially inflate validation statistics. To enhance transparency and trust in ML approaches, the accuracy of the ML models should also be evaluated as a function of distance from the training sites. Please plot the KGE of each testing gauge against its distance (km) from the nearest training site. This would show how far the proposed ML approach can be trusted in distant or ungauged areas. Such a plot would be informative for both the main individual ML models and the ensemble stack.
- The ensemble approach, while interesting, results in a black-box system. There is little discussion of the physical interpretability of the model structures or of the predictive power of the inputs. The scikit-learn (random forest) and XGBoost libraries provide out-of-the-box tools that can easily be used to evaluate model interpretability further. This could improve understanding of the models and support the generalizability of the proposed approach.
- The study does not sufficiently address uncertainty in trend estimations. There are no confidence intervals, no discussion of interannual variability, and no attempt to separate natural variability from long-term trends. Given the known issues with historical precipitation datasets, particularly in arid regions, one must question how much of the trend results from dataset evolution rather than actual climate change.
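The distance diagnostic requested in the second major comment can be prototyped without any ML library. The sketch below is an illustration under stated assumptions, not code from the preprint. For each test gauge, it computes the great-circle (haversine) distance to the nearest training gauge. It then summarizes per-gauge KGE within distance bins. The function names, the 6371 km mean Earth radius, the bin edges, and the gauge coordinates are all hypothetical choices.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_training_distance_km(site, training_sites):
    """Distance from one (lat, lon) test gauge to the closest training gauge."""
    lat, lon = site
    return min(haversine_km(lat, lon, t_lat, t_lon) for t_lat, t_lon in training_sites)


def kge_vs_distance(test_sites, test_kge, training_sites, bin_edges_km):
    """Return per-gauge (distance, KGE) pairs and the median KGE per distance bin."""
    pairs = [
        (nearest_training_distance_km(s, training_sites), k)
        for s, k in zip(test_sites, test_kge)
    ]
    bins = []
    for lo, hi in zip(bin_edges_km[:-1], bin_edges_km[1:]):
        in_bin = [k for d, k in pairs if lo <= d < hi]
        median = statistics.median(in_bin) if in_bin else None
        bins.append((lo, hi, median, len(in_bin)))
    return pairs, bins


# Hypothetical gauges on the equator, where 1 degree of longitude is ~111 km.
training = [(0.0, 0.0)]
test = [(0.0, 0.5), (0.0, 2.0)]
pairs, bins = kge_vs_distance(test, [0.6, 0.2], training, [0, 100, 300])
for lo, hi, med, n in bins:
    print(f"{lo}-{hi} km: median KGE={med} (n={n})")
```

The `pairs` list is exactly what a scatter plot of test KGE against distance needs. Repeating the calculation with each individual model's KGE values would give the per-model comparison the reviewer asks for.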
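On the interpretability comment, scikit-learn already offers `sklearn.inspection.permutation_importance`, and fitted random forests expose `feature_importances_`. To make the idea concrete without dependencies, the sketch below re-implements model-agnostic permutation importance in plain Python. It shuffles one predictor at a time and records how much the mean squared error rises. The function name `permutation_importance_mse`, the toy model, and the data are hypothetical. With the real stacks, one would instead pass each fitted model's prediction function and held-out gauge data.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions.
    X: list of feature rows (lists of floats); y: list of target values.
    A large increase means the model relies heavily on that predictor.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy stand-in for a fitted model: it uses only predictor 0
# (e.g. a satellite P estimate) and ignores predictor 1 (e.g. a static field).
def toy_predict(rows):
    return [2.0 * r[0] for r in rows]


X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * i for i in range(20)]
print(permutation_importance_mse(toy_predict, X, y))
```

Because the toy model ignores predictor 1, that predictor's importance is exactly zero. The same reading applies to real predictors whose shuffling leaves the skill scores unchanged.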
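For the trend-uncertainty comment, one common way to attach a confidence interval to a precipitation trend is a Theil-Sen slope with a bootstrap interval. The sketch below is illustrative only and does not reproduce the preprint's trend method. It assumes annual totals at a single grid cell and independent years. The series and function names are hypothetical. Because resampling individual years ignores serial correlation, a moving-block bootstrap or an autocorrelation-corrected Mann-Kendall test would be more careful choices for real data.

```python
import random
import statistics


def theil_sen_slope(years, values):
    """Median of all pairwise slopes: a robust trend estimate (units per year)."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i in range(len(years))
        for j in range(i + 1, len(years))
        if years[j] != years[i]
    ]
    return statistics.median(slopes)


def bootstrap_slope_ci(years, values, n_boot=500, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the Theil-Sen slope.

    Resamples (year, value) pairs with replacement, which assumes independent
    years; autocorrelated series call for a (moving-)block bootstrap instead.
    """
    rng = random.Random(seed)
    n = len(years)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [years[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # slope undefined if only one distinct year was drawn
        boot.append(theil_sen_slope(ys, [values[i] for i in idx]))
    boot.sort()
    lower = boot[int((1 - level) / 2 * (len(boot) - 1))]
    upper = boot[int((1 + level) / 2 * (len(boot) - 1))]
    return lower, upper


# Made-up annual totals (mm) for one grid cell: a 0.5 mm/yr trend plus
# alternating +/-3 mm noise, 1979-2023.
years = list(range(1979, 2024))
totals = [50 + 0.5 * (yr - 1979) + (3 if yr % 2 else -3) for yr in years]
slope = theil_sen_slope(years, totals)
low, high = bootstrap_slope_ci(years, totals)
print(f"slope = {slope:.2f} mm/yr, 95% CI [{low:.2f}, {high:.2f}]")
```

The same procedure could be repeated for each input product, or for sub-periods with different input combinations, to check whether a trend survives changes in the underlying data sources.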
Moderate:
- The paper would benefit from a quantitative analysis and discussion of how temporal resolution mismatches in the gauge data impact validation results.
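Daily aggregation windows are one way to illustrate the moderate comment on temporal-resolution mismatches. Daily gauge totals can be accumulated to a local observation time rather than to 00 UTC. In that case, the same storm can fall on different days in the gauge record and in a product aggregated to UTC days. The sketch below uses a hypothetical 06 UTC reading time and made-up hourly values to show how shifting the window boundary moves a 10 mm event into a different daily window. The function name `daily_totals` is illustrative.

```python
from collections import defaultdict


def daily_totals(hourly_mm, boundary_hour_utc=0):
    """Aggregate an hourly rainfall series into daily totals.

    hourly_mm[h] is the rainfall (mm) accumulated during hour [h, h + 1),
    counting h from 00 UTC on day 0. Each daily window starts at
    boundary_hour_utc, so hours before the first boundary fall in window -1.
    """
    totals = defaultdict(float)
    for h, mm in enumerate(hourly_mm):
        totals[(h - boundary_hour_utc) // 24] += mm
    return dict(totals)


# Made-up 48-hour record: a 10 mm storm at 03-05 UTC on day 0 and 2 mm on day 1.
hourly = [0.0] * 48
hourly[3], hourly[4], hourly[30] = 4.0, 6.0, 2.0

print(daily_totals(hourly, boundary_hour_utc=0))  # UTC days
print(daily_totals(hourly, boundary_hour_utc=6))  # hypothetical 06 UTC reading time
```

With UTC days, the storm falls in window 0. With 06 UTC windows, it belongs to the window that began at 06 UTC on the previous day. Depending on how each record labels its days, the event can therefore be compared against the wrong day, turning one hit into a miss plus a false alarm. Repeating the daily validation for several boundaries would quantify this sensitivity.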
Minor:
- L143: clarify what is meant by gross errors.
Citation: https://doi.org/10.5194/egusphere-2025-254-RC1