GraphIDW: Incorporating spatial autocorrelation in satellite–gauge precipitation merging using graph neural networks over a tropical region

Peiris, Nadee; Perera, Chamal; Wijayaratna, Nimal; Rajapakse, Lalith; Wijemannage, Ajith

doi:10.5194/egusphere-2025-6551

Preprints

13 Feb 2026

Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Nadee Peiris, Chamal Perera, Nimal Wijayaratna, Lalith Rajapakse, and Ajith Wijemannage

Abstract. Ground-based rain gauges remain the benchmark for accurate precipitation measurement; however, their sparse spatial distribution limits the representation of rainfall heterogeneity. Satellite-based Precipitation Products (SPPs) provide consistent spatial coverage but are often affected by retrieval errors and regional biases, restricting their direct use in local-scale hydrological applications. To overcome these limitations, Precipitation Data Merging (PDM) techniques integrating gauge and satellite observations have gained prominence. This study introduces a novel Machine Learning (ML) framework, GraphIDW, which combines Graph Neural Networks (GNNs) with Inverse Distance Weighting (IDW) interpolation to explicitly incorporate spatial autocorrelation into the merging process, addressing a major limitation of traditional ML-based PDM approaches. The framework was evaluated across the Wet Zone of Sri Lanka from 2001 to 2015 using two state-of-the-art SPPs (IMERG and CHIRPS) together with ground observations. IMERG data (0.1°) were first downscaled to 0.05° using CHIRPS, after which the downscaled product was merged with gauge observations through GraphIDW. A total of 60 gauges (70 %) were used for training and 28 (30 %) for validation. Results show that GraphIDW outperforms conventional ML algorithms, including Random Forest, Artificial Neural Network, Support Vector Regression, and XGBoost. It achieved the highest probability of detection (0.97) and reduced root mean square error (RMSE) and mean absolute error (MAE) by 13 %–41 % and 9 %–36 %, respectively, compared with the original SPPs. The results demonstrate that explicitly accounting for spatial dependence through graph-based learning significantly improves precipitation estimation, particularly in regions characterized by strong spatial heterogeneity. 
By embedding spatial autocorrelation directly into the merging process, GraphIDW provides a robust and computationally efficient framework for generating high-resolution rainfall datasets that are better suited for hydrological analysis in complex climatic and topographic settings.
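
The scores quoted above (probability of detection, RMSE, MAE) follow standard definitions. A minimal Python sketch is given below, assuming paired gauge and estimate series of equal length. The 1 mm rain/no-rain threshold and the function names are illustrative assumptions, not values taken from the manuscript.

```python
import numpy as np

def pod(est, obs, threshold=1.0):
    """Probability of detection: hits / (hits + misses).

    `threshold` (mm) separating rain from no-rain is an assumed value.
    Returns nan (with a NumPy warning) if `obs` has no rain events.
    """
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)
    hits = np.sum((obs >= threshold) & (est >= threshold))
    misses = np.sum((obs >= threshold) & (est < threshold))
    return hits / (hits + misses)

def rmse(est, obs):
    """Root mean square error between estimates and observations."""
    diff = np.asarray(est, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(est, obs):
    """Mean absolute error between estimates and observations."""
    diff = np.asarray(est, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.mean(np.abs(diff)))
```

A relative reduction such as the 13 %–41 % reported for RMSE would then be computed as `(rmse_spp - rmse_merged) / rmse_spp` for each original product.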

Received: 30 Dec 2025 – Discussion started: 13 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 19 Apr 2026)

RC1: 'Comment on egusphere-2025-6551', Anonymous Referee #1, 11 Mar 2026
This manuscript presents a novel study applying graph-based machine learning methods to precipitation estimation. The use of Graph Neural Networks (GNNs) is becoming increasingly popular in the Earth sciences, particularly for problems involving non-Euclidean data structures. In this regard, the study addresses an important topic and has the potential to contribute to the growing research exploring graph-based approaches in the spatial mapping of precipitation.
Overall, the paper is well structured and generally easy to follow, with a clear presentation of the study objectives and methodology. However, several major issues related to the methodology, evaluation framework, and clarity of some sections should be addressed before the manuscript can be considered for further review. These comments are outlined in the section below. Additional minor comments, including grammar, typographical corrections, and reference-related issues (like missing references), will be provided in a subsequent review round after the major concerns have been addressed.
Minor Comments
The introduction would benefit from incorporating several important recent studies that have applied innovative deep learning approaches and explicitly accounted for spatial autocorrelation in precipitation estimation frameworks. Including these studies would better position the current work within the existing literature and highlight methodological differences and contributions. Some relevant examples are listed below (several papers on GNN methods and their recent applications are also missing):
https://doi.org/10.3390/rs15174160

https://doi.org/10.1016/j.rse.2023.113723

https://doi.org/10.1016/j.atmosres.2022.106159

Although the study primarily uses IMERG V6, the Data Availability section mentions the use of IMERG V7. This inconsistency should be clarified.

It is recommended to include the Kling–Gupta Efficiency (KGE) metric in the evaluation. KGE has become a widely accepted performance metric in hydrological studies because it simultaneously accounts for correlation, bias, and variability, providing a more balanced assessment of model performance.
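
For reference, KGE in its original 2009 formulation combines the linear correlation r, the variability ratio α = σ_sim/σ_obs, and the bias ratio β = μ_sim/μ_obs as KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²). A minimal Python sketch follows; the function name `kge` and the choice of this particular variant (rather than a modified form) are assumptions, since the comment does not specify one.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency (2009 form); 1.0 indicates a perfect match.

    Assumes `obs` has non-zero mean and non-zero variance.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear (Pearson) correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```

Because the three components are available separately, reporting r, α, and β alongside the combined score would show whether a merged product gains mainly through correlation, bias, or variability.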

A more comprehensive statistical analysis of the gauge observations is needed. For example, it would be helpful to present seasonal variability of precipitation, mean precipitation distribution across stations, and the elevation-precipitation correlation.

In the manuscript, it is stated in the table that only monthly CHIRPS data were used. However, the methodology section and Figure 3 indicate that daily CHIRPS data were also used for downscaling. This discrepancy should be clarified.

Additionally, the manuscript should include a comparison of IMERG before and after downscaling. Downscaling should ideally lead to at least some improvement in accuracy; otherwise, simple interpolation techniques such as bilinear interpolation or nearest neighbor resampling might produce similar results.

Latitude and longitude were used as input features in the model. However, these variables are static spatial attributes, while satellite observations are dynamic temporal features. It can be concluded that the location is already encoded in the graph structures through adjacency and edge weight matrices.

Since the grid structure appears to be regular, the edge weights between nodes are likely identical. In such cases, a binary adjacency matrix may be sufficient. The manuscript should clarify whether weighted edges provide additional benefits in this context.
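
The point can be illustrated with a small sketch, under stated assumptions: a regular grid with 4-neighbour connectivity, edge weights defined as inverse distance in grid units (degrees), and the symmetric normalisation commonly used in graph convolution layers. None of these choices is confirmed by the manuscript, and all function names are hypothetical.

```python
import numpy as np

def grid_adjacency(n_rows, n_cols, spacing=0.05):
    """Binary and inverse-distance adjacency for a 4-neighbour regular grid."""
    n = n_rows * n_cols
    binary = np.zeros((n, n))
    weighted = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Link each node to its right and lower neighbour (symmetric edges).
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    j = rr * n_cols + cc
                    dist = spacing * np.hypot(dr, dc)
                    binary[i, j] = binary[j, i] = 1.0
                    weighted[i, j] = weighted[j, i] = 1.0 / dist
    return binary, weighted

def sym_normalize(adj):
    """D^{-1/2} A D^{-1/2}; assumes every node has at least one edge."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

binary, weighted = grid_adjacency(3, 4)
# Under these assumptions the weighted matrix is a constant multiple of the
# binary one, and the constant cancels after normalisation.
same_after_norm = np.allclose(sym_normalize(binary), sym_normalize(weighted))
```

This equivalence holds only under the assumptions above. With diagonal neighbours, or with distances computed in kilometres from latitude and longitude, the weights would no longer be uniform.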

Please clarify which software packages or libraries were used to implement the GraphIDW model and the other machine learning methods. Providing implementation details improves reproducibility.

Major Comments
The post-processing residual correction was applied only to the GraphIDW approach, while the other machine learning methods were evaluated without this correction. This introduces an inconsistency in the comparison. It is recommended to apply the correction method to all machine learning models, which would allow for a more direct (apples-to-apples) comparison between GraphIDW and the traditional ML approaches.

Inverse Distance Weighting (IDW) is a simple yet effective spatial interpolation method and is commonly used as a benchmark method. However, its spatial patterns are often strongly influenced by the uniform weighting scheme and bull’s-eye effect. The manuscript should clearly position IDW as a baseline and discuss its limitations relative to more advanced methods.
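
As a point of reference for such a baseline, plain IDW can be sketched in a few lines of Python. The power of 2, the planar Euclidean distance, and the function name `idw` are illustrative assumptions, and this is not the GraphIDW implementation.

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse Distance Weighting estimate at each query point.

    xy_known: (n, 2) gauge coordinates; values: (n,) gauge rainfall;
    xy_query: (m, 2) target coordinates. Distances are planar Euclidean.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances, shape (m, n).
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    # Clamp tiny distances so a query located on a gauge stays finite.
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * values).sum(axis=1) / weights.sum(axis=1)
```

Because each weight depends only on distance, the estimate is pulled toward the nearest gauge value around every gauge, which is the source of the concentric bull's-eye patterns noted above.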

The manuscript mentions the Single Mass Curve method, but it is not sufficiently explained. Please provide a brief description of the method and explain how it contributes to assessing the reliability of the products.

It is strongly recommended to include maps showing the spatial distribution of mean precipitation (or representative high-intensity events) across the study region. Such visual comparisons are important because realistic spatial precipitation patterns are a critical indicator of model performance.

The manuscript states that the proposed approach follows the methodology of Baez-Villanueva et al. (2020) and Zhang et al. (2021). However, based on the description provided in Section 3.3, it appears that the implementation corresponds only to the method proposed by Baez-Villanueva et al. (2020). The approach introduced by Zhang et al. (2021) differs slightly, so the statement that the study follows both approaches needs clarification. Incorporating the method proposed by Zhang et al. (2021) could improve model accuracy. It would therefore be helpful if the authors could explain exactly which method they implemented.

It is recommended to reconsider the inclusion of Figure 13. The comparison of computational speed may not be fully informative without providing details about the computational hardware. For example, methods such as ANN and GNNs can be significantly accelerated when implemented on GPUs, whereas Random Forest (RF) models typically depend heavily on CPU-based multithreading. It is also unclear whether multithreaded training was used for the RF model and how many CPU cores were available. If the authors intend to keep this analysis, it is strongly recommended to report the hardware configuration used for training, including CPU specifications, number of cores, GPU usage (if any), and relevant software settings.

Citation: https://doi.org/10.5194/egusphere-2025-6551-RC1

Viewed

Total article views: 376 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
254	103	19	376	41	38

Views and downloads (calculated since 13 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	174	71	12	257
Mar 2026	80	32	7	119

Viewed (geographical distribution)

Total article views: 348 (including HTML, PDF, and XML). Thereof 348 with geography defined and 0 with unknown origin.

Latest update: 28 Mar 2026

Short summary

Rain gauges give very accurate rainfall estimates, but they are too widely spaced to capture local rainfall variability. Satellites cover large regions but often contain local errors. Our study introduces GraphIDW, a new method that smartly combines satellite data and ground observations, considering spatial rainfall patterns. Applied across Sri Lanka, the method produced more accurate rainfall estimates, offering clear benefits for flood forecasting and climate analysis in complex environments.

