Technical note: Mind the gap – benchmarking of various imputation approaches for precipitation stable isotope time series
Abstract. Stable isotopes of hydrogen and oxygen in precipitation (δₚ) are important natural tracers in a wide range of environmental applications (e.g., exploration of the water cycle, ecology, and food authenticity), yet observational records commonly contain gaps, even though applications in hydrology and earth science frequently require complete cases. Eight imputation approaches were benchmarked using monthly δₚ time series from Austria, Slovenia, and Hungary. Uninterrupted periods were selected, and monthly data were masked site-wise with an increasing degree of missingness, removing 1 to 32 % of the data using bootstrapping. The imputation performance of the following methods was assessed on the masked monthly data using the mean absolute difference and root mean square error between the observed and imputed values for primary and secondary isotopic parameters: Last Observation Carried Forward, Linear Interpolation, Spline Interpolation, Stineman Interpolation, Kalman Smoothing, Moving Average Imputation, Sinusoidal Fit, and a spatial proximity-based imputation (SPbI) approach introduced in the present paper. SPbI estimates missing δₚ values as the mean of altitude-corrected δₚ data from stations within a predefined search radius. Across masking levels, SPbI was the most accurate and the least prone to amplitude damping in δₚ records. Sinusoidal imputation remained robust under increasing missingness but showed a tendency to reduce extremes, indicating amplitude loss in both δₚ and d-excess. Spline interpolation performed worst overall, with the remaining methods performing similarly up to ~16 % masking, beyond which their performance deteriorated. A sensitivity analysis using non-cumulative 50-km distance bands up to 400 km showed that SPbI errors increase with distance; beyond ~250 km, mean errors approach those of the sinusoidal method, making the sinusoidal fit, or even the simpler linear interpolation, a viable alternative when proximal observations are sparse.
The benchmarking results recommend the use of SPbI where station data are available within 250 km, and the sinusoidal or linear approach otherwise.
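For readers unfamiliar with the approach, the SPbI rule summarized in the abstract (the mean of altitude-corrected δₚ values from stations within a search radius) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the function names, the assumed δ¹⁸O altitude gradient of −0.2 ‰ per 100 m, and the use of haversine distance are my own assumptions for the sketch.

```python
# Hypothetical sketch of the SPbI idea from the abstract: impute a missing
# monthly value as the mean of altitude-corrected values from stations
# within a search radius. The lapse rate and distance metric are assumed
# for illustration only.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def spbi_impute(target, neighbours, radius_km=250.0, lapse_per_100m=-0.2):
    """Mean of altitude-corrected neighbour values inside the search radius.

    target: dict with 'lat', 'lon', 'alt' (m).
    neighbours: dicts with 'lat', 'lon', 'alt', 'd18o' for the same month.
    lapse_per_100m: assumed d18O altitude gradient (per mil / 100 m).
    Returns None when no station lies within the radius.
    """
    corrected = []
    for st in neighbours:
        d = haversine_km(target["lat"], target["lon"], st["lat"], st["lon"])
        if d <= radius_km:
            # shift the neighbour's value from its altitude to the target's
            corrected.append(
                st["d18o"] + lapse_per_100m * (target["alt"] - st["alt"]) / 100.0
            )
    return sum(corrected) / len(corrected) if corrected else None
```

The `None` return for an empty search radius corresponds to the abstract's fallback recommendation: when no proximal observations exist, a sinusoidal or linear method would be used instead.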
This review is for the manuscript “Technical note: Mind the gap – benchmarking of various imputation approaches for precipitation stable isotope time series” by Kern and Hatvani. The study compares different approaches for imputing gaps in monthly precipitation isotope time series, such as those most commonly archived by the Global Network of Isotopes in Precipitation. The authors present several existing methods as well as a new method that they developed, and they test the efficacy of each using a study domain located largely in Austria with a few additional sites in Hungary and Slovenia. The methods are compared by omitting varying percentages of the data and evaluating the accuracy of the imputation approaches against the known values. Overall, I found the manuscript readable and informative, but in places the writing could be expanded for clarity, and in others condensed for brevity; I mark a few such examples in the line-specific comments below. The topic is relevant for the journal and would be of interest to its readers. While I appreciate that this text was submitted as a technical note, I think it would be worthwhile to expand the analyses to include other regions and to test how the approaches perform when whole stations are removed, not just some data points from each station's record. My recommendation is that this should be published after at least minor work to address the small comments, but it could be greatly improved with additional analyses, and the impact of the work would ultimately be higher if the authors decided to take things a step further.
General comments:
The evaluation of the different methods was done in a single geographic region with relatively high station density. The effectiveness of the different gap-filling procedures was evaluated by randomly dropping varying numbers of observations while retaining all stations (no whole datasets were dropped). This leaves me wondering how the results might differ in another part of the world with a different climate regime, different geography (e.g., more or fewer mountains or coastlines), and fewer stations overall. Since the code is already written, it would not be a tremendous effort to pick, say, two more geographic regions in other parts of the world, repeat the analysis there, and compare the results. While it would require some new coding, it also would not be extremely difficult to analyze the role that station density plays in the results. In other words, how do the results compare if increasing numbers of complete time series are omitted, and not just individual observations within each time series?
Another suggestion is that it would be useful to compare the imputation approaches against predictions from existing model products that can generate monthly values, such as Piso.AI and IsoGSM, and possibly even OIPC and RCWIP. Since these models already exist, is it actually better to perform imputation, or to use their predictions directly? Of course, OIPC and RCWIP do not make predictions for specific months, but even so, it might be useful to include them as a benchmark for the imputation methods tested here.
Line specific comments:
Line 33: “Stable isotope” to “The stable isotope”
Line 34: Could update reference to the Coplen (2011) paper
Line 43: suggest changing “data sets” to “datasets”
Line 53: clarify what a “decadal problem” is
Line 54: “issue” to “issues”
Line 72: “would outperforms” to “outperforms”
Line 84: This is a general comment, but using this citation as a specific example, some of the references are not formatted consistently. In this case, I looked for the Hatvani (2026) reference but could not locate the paper; I assume it corresponds to the following entry:
Hatvani, I. G., Erdélyi, D., Vreča, P., Lojen, S., Žagar, K., Gačnik, J., and Kern, Z.: Online screening tool for precipitation stable isotopes records: hybrid distance / density based outlier filtering approach via interactive web application, Journal of Hydrology, HYDROL69130R69131, 2026.
Lines 87-89: I don’t follow this. So, you calculated the d-excess, and then “This metric was applied consistently to identify the same candidate period with the most continuous station data for both parameters.” Please expand on and clarify what exactly you mean by this. I also do not understand the next sentence: what does it mean to compare the set of months with d-excess measurements against the full time span of a given record? I think the ideas in these two sentences could be clarified by separating statements about what was done from statements about why it was done. This would make for a few more, but shorter, sentences and offer more space to explain clearly what exactly was done.
Line 90: “required” for what? We still don’t have an overview of what’s going on.
Line 92: “became meaningless” for what? We are again missing some context.
Lines 146-156: This paragraph may not be necessary. I’m not sure that you need to give such a detailed explanation with equations for RMSE and MAD. In any case, the mean absolute difference uses the difference between each prediction and the mean of all predictions. It does not use the actual value of the predictand. So, in other words, it characterizes precision rather than accuracy. If you want to use the known value in an equation like this, then you want to calculate the mean absolute error (MAE).
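To make the distinction in the comment above concrete, here is a small illustration with made-up numbers (not from the manuscript): the mean absolute deviation of a set of imputed values around their own mean captures spread (precision), while the mean absolute error against the withheld observations captures accuracy, so a constant bias is invisible to the former.

```python
# Illustration of the MAD-vs-MAE point with made-up numbers: a biased
# imputer can have a small MAD (spread around its own mean) but a large
# MAE (error against the known withheld values).
def mad(values):
    """Mean absolute deviation of values from their own mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def mae(predicted, observed):
    """Mean absolute error against the known (masked) observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

observed = [-12.0, -8.0, -4.0]   # withheld "true" monthly values
biased   = [-10.0, -6.0, -2.0]   # imputations offset by a constant +2

print(mad(biased))            # 2.666...  same spread as the truth itself
print(mae(biased, observed))  # 2.0       the constant bias MAD cannot see
```

Here `mad(observed)` and `mad(biased)` are identical, yet every imputed value is off by 2; only the MAE (or RMSE) reflects that.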
Lines 177-178: “In the meanwhile” is incorrect/uncommon phrasing.
Lines 183-184: It is not exactly fair to say this. The pattern is certainly smaller than it is for other methods, but the point cloud is in all cases still not flat.