the Creative Commons Attribution 4.0 License.
Evaluating Surface Mass Balance Variability from Climate Models using GPS Bedrock Vertical Time Series data
Abstract. Accurate estimates of Antarctic Surface Mass Balance (SMB) are essential for quantifying ice-sheet mass changes and their contributions to global sea level rise. Regional Climate Models (RCMs) and atmospheric reanalyses provide SMB products that are widely used in glaciology and climatology studies, yet substantial discrepancies between models persist. This study evaluates interannual to decadal variability in seven SMB models by comparing computed SMB elastic vertical bedrock displacements with GPS vertical time series from across Antarctica. The models vary in spatial and temporal resolution: RACMO2.3p2 (27 km), RACMO2.4p1 (11 km), statistically downscaled RACMO2.3p2 (2 km), MAR (35 km), GEMB (10 km), HIRHAM5 (12.5 km) and MERRA2 (12.5 km). Model performance is assessed by quantifying the low-frequency variance reduction in GPS residuals after SMB loading correction and by computing scale factors between the observed and model time series. Results indicate that all considered SMB models reduce long-period (>1.5 yr) GPS variance on average, but performance varies across Antarctic regions and GPS sites. The RACMO variants, particularly the higher-resolution ones (2 and 11 km), perform best overall, typically achieving the largest variance reductions and yielding scale factors closest to unity, particularly in the Antarctic Peninsula and coastal margin of Antarctica; MERRA2 and HIRHAM5 have the weakest overall performance. Our findings suggest that GPS observations, with some limitations, provide a useful new constraint on SMB model evaluation that yields insights into spatial and temporal variability that traditional SMB model evaluations are unable to fully resolve.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2026-1146', Anonymous Referee #1, 30 Apr 2026
-
AC1: 'Reply on RC1', Jenan Rajavarathan, 10 Jun 2026
We thank Reviewer 1 for their positive assessment of our study and the constructive comments below.
Reviewer Comment:
- The authors indicate in Section 2.1 that they compute the elastic loading displacements in a centre-of-solid Earth (CE) reference frame at each GPS site location. However, GPS displacement time series are in the centre-of-figure (CF) frame. Although the CE frame closely approximates the CF frame, it is important to acknowledge this potential mismatch in the manuscript.
Author’s response: We thank the reviewer for raising this point. While the CE-CF approximation is already noted in the manuscript (line 89), we agree that this deserves a more explicit acknowledgement. We will expand the relevant sentence in Section 2.1 to explicitly acknowledge the CE/CF frame distinction and its negligible impact (~2%; Dong et al. (1997)) at the interannual to decadal periods considered in this study.
Reviewer Comment:
- The authors also mention that SMB anomalies are bilinearly interpolated onto a common regular grid of 2 km resolution. It would be very complementary to the discussion to address the effect of interpolation, specifically smoothing of the signal versus using the native grid resolution.
Author’s response: Interpolation onto a common 2 km grid was chosen to provide a consistent spatial framework for the elastic loading calculations and to enable cross-model comparison. Bilinear interpolation produces smooth fields, whereas using the native resolution introduces errors at the coastline that map into the loading computations. We will explicitly discuss this issue in Section 2.1.
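As an illustration of the smoothing effect mentioned in this reply, the following minimal sketch bilinearly interpolates a coarse field onto a finer regular grid. It is not the authors' code: the function name `bilinear_resample`, the toy 2 x 2 field and the refinement factor are assumptions made purely for illustration.

```python
def bilinear_resample(field, factor):
    """Resample a 2-D list-of-lists (at least 2 x 2) onto a grid `factor` times finer.

    Coarse node values are preserved; every value in between is a linear
    blend of the four surrounding coarse nodes, so the result is smooth
    and contains no detail finer than the input grid.
    """
    n_rows, n_cols = len(field), len(field[0])
    out_rows = (n_rows - 1) * factor + 1
    out_cols = (n_cols - 1) * factor + 1
    out = []
    for i in range(out_rows):
        y = i / factor
        i0 = min(int(y), n_rows - 2)   # upper coarse row of the enclosing cell
        ty = y - i0                    # fractional position inside the cell
        row = []
        for j in range(out_cols):
            x = j / factor
            j0 = min(int(x), n_cols - 2)
            tx = x - j0
            top = (1 - tx) * field[i0][j0] + tx * field[i0][j0 + 1]
            bottom = (1 - tx) * field[i0 + 1][j0] + tx * field[i0 + 1][j0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


# Toy field: a sharp step between two coarse cells (e.g. ocean = 0, land = 100).
coarse = [[0.0, 100.0],
          [0.0, 100.0]]
fine = bilinear_resample(coarse, 4)
```

In this toy case the step between the two coarse nodes becomes a linear ramp on the fine grid, which is the kind of smoothing that a nominally finer common grid cannot undo.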
Reviewer Comment:
- In line 170, it is also worth mentioning that different SMB models adopt different topography grids, which might potentially contribute to spatial coherence variability.
Author’s response: We agree with the suggestion. We will add a line near line 170 (in section 3.1) noting that the differences in model topography grids represent an additional source of spatial variability in the coherence.
Citation: https://doi.org/10.5194/egusphere-2026-1146-AC1
-
CC1: 'Comment on egusphere-2026-1146', Nicole-Jeanne Schlegel, 01 May 2026
Hi Authors,
Thank you for your work on this. Just a note about the spatial resolution of GEMB. It looks like you are using the Zenodo version forced with ERA5. As opposed to the version in the GEMB paper, which was forced with high spatial resolution RACMO output, the ERA5 GEMB runs are directly forced with low-resolution reanalysis (~0.25 degrees). This means the input is lower resolution than 10 km. Even though the model is run, and reported, at a higher spatial resolution, in this version of GEMB there is no true "downscaling" involved. The ERA5 inputs are simply interpolated to the GEMB grid before the energy and mass balance is calculated. This is probably the reason that the results look smoother and more like a lower resolution output. While the current GEMB release simply runs the column snow model on the input it receives, downscaling routines will be part of a release in the near future, and we should expect to see a sharpening of gradients in areas like the margins. Happy to discuss more if you need more information.
Best wishes and good luck with the paper, Nicole
Citation: https://doi.org/10.5194/egusphere-2026-1146-CC1 -
AC2: 'Reply on CC1', Jenan Rajavarathan, 10 Jun 2026
Community Comment:
- Thank you for your work on this. Just a note about the spatial resolution of GEMB. It looks like you are using the Zenodo version forced with ERA5. As opposed to the version in the GEMB paper, which was forced with high spatial resolution RACMO output, the ERA5 GEMB runs are directly forced with low-resolution reanalysis (~0.25 degrees). This means the input is lower resolution than 10 km. Even though the model is run, and reported, at a higher spatial resolution, in this version of GEMB there is no true "downscaling" involved. The ERA5 inputs are simply interpolated to the GEMB grid before the energy and mass balance is calculated. This is probably the reason that the results look smoother and more like a lower resolution output. While the current GEMB release simply runs the column snow model on the input it receives, downscaling routines will be part of a release in the near future, and we should expect to see a sharpening of gradients in areas like the margins. Happy to discuss more if you need more information.
Author's response:
We confirm that we used the ERA5-forced GEMB product (Zenodo version 1) in this study. We thank Nicole for this very helpful comment, which clarifies an important aspect of the GEMB product used in our study. This comment helps explain our finding that the performance of GEMB_10, particularly over the Antarctic Peninsula, is comparatively weaker than RACMO_2DS and RACMO_11, despite its nominally comparable grid resolution. We will update the relevant material in the data section and will add a sentence in the discussion to clarify that the GEMB product used is fundamentally at a lower resolution.
Citation: https://doi.org/10.5194/egusphere-2026-1146-AC2
-
RC2: 'Comment on egusphere-2026-1146', Brooke Medley, 12 May 2026
Summary
Here, the authors use novel GPS vertical time series from sites across Antarctica to evaluate the variability in SMB from several atmospheric models, focusing on the low frequency variance reduction in the GPS signals after SMB loading correction. They find all models reduce the variance but to different magnitudes and their performance also varies by location. The paper adds a new element to SMB model evaluation: the ability to assess variations in SMB as they are often evaluated against annual to centennial averages in net SMB from in situ measurements or ground-based and airborne radar studies.
Evaluation
Overall, the manuscript was well-written and contained clear and concise descriptions of the methodology and interpretations (although there are a few minor suggestions for some additional detail). Given the difficulty in evaluating SMB across Antarctica, it is a timely and very important submission, especially as more SMB models come online at increasingly finer spatial resolution. While the solid earth modeling is beyond my expertise and I have few comments regarding the methodology, I do believe that the paper would benefit from additional exploration or discussion of various interpretations of their findings, which are detailed below. A few minor comments follow these more substantive comments.
- The authors are clearly aware of this issue as it is discussed briefly in the text, but it is not really clear (and perhaps that is because it is not) which dimension of variability is being evaluated here, and I’m assuming the answer is likely both variability in space and time. The title states, “Evaluating Surface Mass Balance Variability from Climate Models…”, but it’s obviously a convolution of space and time, which makes it somewhat more difficult for the modelers to assess why their model performs as it does. This conundrum is central to the paper, and I believe deserves more exploration. A few additional thoughts to consider:
- Can one evaluate from each model the cells where the spatial signal exceeds the local signal? As in, where does the far field signal exceed the near field? Perhaps this would help us better understand the strengths/weaknesses of each model.
- Related to the above, the GPS measurements are typically restricted to areas of high spatial SMB variability in coastal regions (although a few interior sites do exist). Because of this potential bias, the GPS sites that you use might be focused on regions where the far field signal exceeds the local scale, suggesting the GPS are more evaluating the spatial signal in the variability as opposed to the actual local variability through time, which would then bias performance metrics towards those models with finer spatial resolution.
- And related to the above, was there consideration of weighting of the variance reductions across all sites for each model? One might attribute the best performance to a specific model but perhaps that is solely because the sites happened to be concentrated in their region of highest performance. Some additional discussion of the sparsity of sampling is worthwhile.
- It would be interesting to explore the site-by-site variance reductions more. Which sites are all negative or positive contributors to variance reduction? Which has the largest range of performances? Across all models there are a substantial number of sites where the variance increases (red dots); what does this imply? There is SMB signal that the model is not capturing? It’s also important to note that the model with the highest variance explained had a median reduction in variance of 23%, which is still very small, suggesting there is a lot of unexplained SMB variability even in the highest performer.
- Finally, is there any impact of the choice of the time span of the reference climate interval or the long-term mean SMB on your results? After generating the cumulative anomalies, the authors detrend the time series, so that likely accounts for much of the differences that would result from a variable choice of reference time interval. There could still be minor differences, however. Is it worthwhile to explore various time reference intervals to see if it impacts the ability of the SMB models to reduce the variance in the residuals? Does the solid earth response have a memory of the historical mass loadings? The authors state that (while not perfect) there does appear to be some correlation between higher accumulation and higher accumulation variability in time with more variance explained. It could be that regions with higher mass loadings have a higher mass flux where the total mass signal has a “shorter” memory, resulting in a ~40 year record that can approximate most of the fluctuations whereas at lower accumulation sites there is a much longer memory that models cannot explain.
Minor Comments
- Line 65 states that the SMB anomalies are determined using a reference climate computed over 1980-2022, but Table S1 shows different start and end dates. I’m assuming that indeed they are all calculated over their common interval and that the start and end dates are simply the temporal extent of the model. Also, I assume that the detrending is based on that same interval (1980-2022)?
- For clarification, the MERRA-2 SMB product is provided at 12.5 km resolution, but the temporal variability comes from the MERRA-2 reanalysis, which is very coarse in space (0.5 degrees latitude by 0.625 degrees longitude). The precipitation magnitudes come from the 12.5 km high resolution M2R12K data product. Therefore, it is more of a blend between 12.5 km and several 10s of km. Given that this paper focuses more on the variability, its representative resolution is likely much coarser than 12.5 km because the variability is driven by MERRA-2 at coarse resolution (see Medley et al., 2022).
References
- Medley, B., Neumann, T. A., Zwally, H. J., Smith, B. E., & Stevens, C. M. (2022). Simulations of firn processes over the Greenland and Antarctic ice sheets: 1980–2021. The Cryosphere, 16(10), 3971-4011.
Citation: https://doi.org/10.5194/egusphere-2026-1146-RC2 -
AC3: 'Reply on RC2', Jenan Rajavarathan, 10 Jun 2026
We thank Reviewer 2 for their thorough and constructive review. We address the major and minor comments below:
Reviewer Comment:
- The authors are clearly aware of this issue as it is discussed briefly in the text, but it is not really clear (and perhaps that is because it is not) which dimension of variability is being evaluated here, and I’m assuming the answer is likely both variability in space and time. The title states, “Evaluating Surface Mass Balance Variability from Climate Models…”, but it’s obviously a convolution of space and time, which makes it somewhat more difficult for the modelers to assess why their model performs as it does. This conundrum is central to the paper, and I believe deserves more exploration. A few additional thoughts to consider:
Author's response: Yes, the variability occurs across both space and time. In addition, GPS records from different sites often begin at different epochs, may end early, or contain data gaps. We acknowledge that these limitations are inherent to the dataset; we will revise the data section to explain them clearly and will revisit the issue in the discussion section.
The primary aim of our study is to evaluate the utility of GPS bedrock vertical time series for testing SMB model variability, rather than to diagnose the physical reasons why individual models perform as they do, which would require a dedicated model-intercomparison study beyond the scope of the current paper. We will review the text to make sure it is clear on this point.
Reviewer Comment:
- Can one evaluate from each model the cells where the spatial signal exceeds the local signal? As in, where does the far field signal exceed the near field? Perhaps this would help us better understand the strengths/weaknesses of each model.
Author's response: This is a very valuable suggestion. Figure 1 in the current manuscript illustrates that, at the CAS1 site, a far-field signal dominates over the immediate near-field signals. We propose to add further examples of this computation to illustrate how the sensitivity of each site varies as a function of both the distance to the load variability and the magnitude of that variability. A comprehensive per-site, per-model computation of near-field and far-field signal dominance across all 98 sites and 7 models would be a computationally extensive analysis. In the revised manuscript, we will therefore include a figure showing the deformation contributions from SMB load variability around a selected set of stations, along with the corresponding variance reductions for three models (RACMO_2DS, MERRA2_12 and GEMB_10). We believe this additional analysis will also be useful for SMB model developers.
Reviewer Comment:
- Related to the above, the GPS measurements are typically restricted to areas of high spatial SMB variability in coastal regions (although a few interior sites do exist). Because of this potential bias, the GPS sites that you use might be focused on regions where the far field signal exceeds the local scale, suggesting the GPS are more evaluating the spatial signal in the variability as opposed to the actual local variability through time, which would then bias performance metrics towards those models with finer spatial resolution.
Author's response: The method is most useful when there is large mass variability close to a GPS station. The additional figures and table we will produce will illustrate this further. The Antarctic GPS network, including the 98 sites selected in this study, is distributed predominantly along the coastal margins of Antarctica. We do not agree that the technique is evaluating the spatial signal as opposed to the temporal signal. Rather, the GPS is assessing the spatially integrated (a weighted spatial convolution) temporal mass variability. We will add an explicit discussion of this sampling bias in Section 4, noting that the GPS-based evaluation is most informative in the coastal and peninsula regions. Because we do not have enough sites in the interior, conclusions regarding model performance in the interior of Antarctica are comparatively under-constrained by the current network.
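The "weighted spatial convolution" in this reply can be written compactly in the standard elastic Green's function form. This is a generic sketch rather than the manuscript's exact formulation, and all symbols are introduced here for illustration only:

```latex
u(\mathbf{r}_s, t) \;=\; \sum_{j} G\!\left(\psi_{sj}\right)\, \Delta m_j(t)
```

Here $u(\mathbf{r}_s, t)$ is the vertical displacement at GPS site $s$, $\Delta m_j(t)$ is the SMB mass anomaly of grid cell $j$ at time $t$, $\psi_{sj}$ is the angular distance between the site and that cell, and $G$ is the vertical elastic Green's function (displacement per unit mass). Because the magnitude of $G$ decreases with distance, nearby cells carry the largest weights, yet a large anomaly farther away can still outweigh a small local one. Each site therefore samples a time series that is already integrated over space.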
Reviewer Comment:
- And related to the above, was there consideration of weighting of the variance reductions across all sites for each model? One might attribute the best performance to a specific model but perhaps that is solely because the sites happened to be concentrated in their region of highest performance. Some additional discussion of the sparsity of sampling is worthwhile.
Author's response: Spatial weighting was not applied in the analysis, and we agree that unweighted median variance reduction metrics are influenced by the geographic clustering of sites, particularly in the Antarctic Peninsula and West Antarctica. The regional breakdown analysis in Section 3.3 partially addresses this by allowing region-based interpretation rather than relying on a single continent-wide metric. We will add further discussion noting that the site distribution must be taken into account when assessing a model’s performance.
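The influence of site clustering on a pooled median can be seen in a small sketch. The per-site numbers and the region grouping below are invented for illustration and are not results from the manuscript; the "median of regional medians" is just one possible way to balance regions, not a method the authors state they use.

```python
from statistics import median

# Hypothetical per-site variance reductions (%), grouped by region.
site_vr = {
    "Antarctic Peninsula": [30.0, 28.0, 35.0, 25.0, 32.0, 27.0],
    "West Antarctica": [15.0, 10.0, 12.0],
    "East Antarctica": [2.0, -5.0],
}

# Pooled (unweighted) median: every site counts equally, so the densely
# instrumented region dominates the result.
all_sites = [v for values in site_vr.values() for v in values]
pooled_median = median(all_sites)

# Region-balanced alternative: one median per region, then the median of those.
regional_medians = {region: median(values) for region, values in site_vr.items()}
balanced_median = median(regional_medians.values())
```

With these invented numbers the pooled median sits well above the region-balanced value, because six of the eleven sites lie in the best-performing region.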
Reviewer Comment:
- It would be interesting to explore the site-by-site variance reductions more. Which sites are all negative or positive contributors to variance reduction? Which has the largest range of performances? Across all models there are a substantial number of sites where the variance increases (red dots); what does this imply? There is SMB signal that the model is not capturing? It’s also important to note that the model with the highest variance explained had a median reduction in variance of 23%, which is still very small, suggesting there is a lot of unexplained SMB variability even in the highest performer.
Author's response: We will expand the discussion regarding the sites that are consistently, across models, seeing increased variance and the sites which show the largest range of variance reductions across models.
Regarding the sites where variance increases (red circles in Figure 4b and Figure 5): as discussed in Sections 3.3 and 4, the increases in variance can arise from several causes: (i) the SMB loading signal at that site has poor temporal agreement with the GPS signal; (ii) the GPS record at the site contains substantial non-SMBL low-frequency signal, such as from ice dynamic mass changes or volcanic site deformation; (iii) the SMB amplitude is significantly over- or under-estimated by the model, such that the correction degrades rather than improves the residual; and (iv) the GPS time series is compromised at low frequencies by snow intrusion into the radome or other equipment issues. We will make these interpretations more explicit in the discussion of the sites showing variance increases.
Regarding the 23% median variance reduction of the best-performing model, we agree that, according to Figure 4, substantial unexplained variability remains when all sites are considered. A substantial portion of this residual variance at GPS sites is attributable to non-SMBL geophysical signals that are common to all models evaluated, including solid Earth responses to ice dynamic changes, viscoelastic deformation, and residual unmodelled non-SMB loading signals, as well as low-frequency GPS noise. These signals are not expected to be reduced by any SMB loading correction and thus limit the achievable variance reduction in the evaluation. The 23% median figure therefore does not imply that 77% of the GPS variance reflects SMB model deficiencies; rather, it reflects the combined effect of model limitations and the non-SMBL signals described above. We will add explicit discussion of this interpretation in Section 4.
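A toy calculation illustrates both points made in this reply. The metric definitions used here (percentage variance reduction and a least-squares scale factor after mean removal) are common choices assumed for illustration and may differ in detail from the manuscript's; the series are invented.

```python
from statistics import fmean, pvariance


def variance_reduction(gps, model):
    """Percentage of GPS variance removed by subtracting the modelled loading."""
    residual = [g - m for g, m in zip(gps, model)]
    return 100.0 * (1.0 - pvariance(residual) / pvariance(gps))


def scale_factor(gps, model):
    """Least-squares s minimising sum((gps - s * model)**2) after mean removal."""
    g_mean, m_mean = fmean(gps), fmean(model)
    num = sum((g - g_mean) * (m - m_mean) for g, m in zip(gps, model))
    den = sum((m - m_mean) ** 2 for m in model)
    return num / den


# Invented series (mm): the "GPS" record is the true SMB loading plus an
# unrelated signal standing in for ice dynamics, GPS noise, and so on.
smb_loading = [1.0, -1.0, 2.0, -2.0, 0.0, 0.0]
other_signal = [0.5, 0.5, -0.5, -0.5, 1.0, -1.0]
gps = [s + o for s, o in zip(smb_loading, other_signal)]

# A perfect SMB model still cannot remove the unrelated signal.
vr_perfect = variance_reduction(gps, smb_loading)
s_perfect = scale_factor(gps, smb_loading)

# A model with the right timing but three times the amplitude.
tripled = [3.0 * s for s in smb_loading]
vr_tripled = variance_reduction(gps, tripled)
s_tripled = scale_factor(gps, tripled)
```

In this sketch even the perfect model removes only about 77% of the variance, since the remainder belongs to the unrelated signal. The amplitude-inflated model makes the variance reduction negative while its scale factor falls to one third, which corresponds to cause (iii) above.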
Reviewer Comment:
- Finally, is there any impact of the choice of the time span of the reference climate interval or the long-term mean SMB on your results? After generating the cumulative anomalies, the authors detrend the time series, so that likely accounts for much of the differences that would result from a variable choice of reference time interval. There could still be minor differences, however. Is it worthwhile to explore various time reference intervals to see if it impacts the ability of the SMB models to reduce the variance in the residuals?
Author's response: The reference period affects the mean of the anomalies, which in turn affects GPS site velocities, as indicated in previous studies (e.g., King et al. (2022)). If one chooses a very short reference period, this would affect the variability. However, choosing any sensible reference period (of several decades) results in tiny differences in loading variability. We will add a brief statement that the results are not sensitive to the choice of any sensible reference period.
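The reason detrending absorbs most of the reference-period dependence can be shown with a single idealised series. The sketch below is an assumption-laden illustration, not the authors' processing: the SMB values are invented, the helper names are hypothetical, and the real workflow (gridded loads, GPS data gaps, additional fitted terms) may leave the small differences mentioned above.

```python
def cumulative_anomaly(smb, ref_slice):
    """Cumulative sum of SMB anomalies relative to the mean over `ref_slice`."""
    ref = smb[ref_slice]
    ref_mean = sum(ref) / len(ref)
    total, out = 0.0, []
    for value in smb:
        total += value - ref_mean
        out.append(total)
    return out


def detrend(series):
    """Remove the least-squares straight line (offset and trend) from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


# Invented annual SMB values (arbitrary units).
smb = [10.0, 12.0, 9.0, 11.0, 14.0, 8.0, 10.0, 13.0, 9.0, 12.0]

raw_full = cumulative_anomaly(smb, slice(0, 10))   # reference = whole record
raw_short = cumulative_anomaly(smb, slice(0, 5))   # reference = first half only

# Before detrending the two choices differ by a straight line in time ...
raw_diff = max(abs(a - b) for a, b in zip(raw_full, raw_short))
# ... which the detrending step removes.
detrended_diff = max(abs(a - b)
                     for a, b in zip(detrend(raw_full), detrend(raw_short)))
```

Changing the reference mean by a constant adds a term that grows linearly with time to the cumulative anomaly, so in this idealised case the detrended series agree to within rounding error.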
Reviewer Comment:
- Does the solid earth response have a memory of the historical mass loadings? The authors state that (while not perfect) there does appear to be some correlation between higher accumulation and higher accumulation variability in time with more variance explained. It could be that regions with higher mass loadings have a higher mass flux where the total mass signal has a “shorter” memory, resulting in a ~40-year record that can approximate most of the fluctuations whereas at lower accumulation sites there is a much longer memory that models cannot explain.
Author's response:
Over periods shorter than centuries, the Earth is typically considered to be purely elastic, and hence there is no memory. However, in some parts of West Antarctica and the Antarctic Peninsula, mantle viscosity is low enough that there can be an effect. This is the focus of the recent study by Nield et al. (2025), which we discuss at line 373. Note that this is a small effect compared to the purely elastic response; we will explicitly include this discussion starting at line 373.
Reviewer's minor comment:
- Line 65 states that the SMB anomalies are determined using a reference climate computed over 1980-2022, but Table S1 shows different start and end dates. I’m assuming that indeed they are all calculated over their common interval and that the start and end dates are simply the temporal extent of the model. Also, I assume that the detrending is based on that same interval (1980-2022)?
Author's response: The reviewer is correct on both counts. The common reference period of 1980-2022 is used both for the SMB anomaly computations and for detrending. Table S1 shows the temporal coverage of each model considered in the study. We will modify the caption of Table S1 to state this unambiguously.
Reviewer's minor comment:
- For clarification, the MERRA-2 SMB product is provided at 12.5 km resolution, but the temporal variability comes from the MERRA-2 reanalysis, which is very coarse in space (0.5 degrees latitude by 0.625 degrees longitude). The precipitation magnitudes come from the 12.5 km high resolution M2R12K data product. Therefore, it is more of a blend between 12.5 km and several 10s of km. Given that this paper focuses more on the variability, its representative resolution is likely much coarser than 12.5 km because the variability is driven by MERRA-2 at coarse resolution (see Medley et al., 2022).
Author's response: We thank the reviewer for this clarification. We will modify our discussion to note the coarser effective resolution of the MERRA2_12 model as one factor contributing to its comparatively weaker performance.
Citation: https://doi.org/10.5194/egusphere-2026-1146-AC3
The paper "Evaluating Surface Mass Balance Variability from Climate Models using GPS Bedrock Vertical Time Series Data", by Rajavarathan et al., presents a new approach for assessing the performance of Surface Mass Balance (SMB) models using vertical land motion derived from GPS observations. In this paper, the authors use seven SMB model products to calculate the corresponding loading displacements, which are then compared to GPS as an independent observational reference. The paper suggests that GPS provides a useful constraint on SMB model evaluation, with varying performance between models depending on their resolution and forcing. Interestingly, all SMB-corrected GPS time series consistently show reduced long-period (>1.5 yr) variance on average, but performance varies across Antarctic regions and GPS sites.
The data processing is rigorous, and the discussion of the influence of SMBL on GPS time series, as well as the spectral analysis of residual time series, is adequate. Overall, the study is well executed and contributes valuable insights into estimates of ice-sheet mass variability and its varying contribution to sea-level change. However, a few minor aspects of the analysis require further justification:
- The authors indicate in Section 2.1 that they compute the elastic loading displacements in a centre-of-solid Earth (CE) reference frame at each GPS site location. However, GPS displacement time series are in the centre-of-figure (CF) frame. Although the CE frame closely approximates the CF frame, it is important to acknowledge this potential mismatch in the manuscript.
- The authors also mention that SMB anomalies are bilinearly interpolated onto a common regular grid of 2 km resolution. It would be very complementary to the discussion to address the effect of interpolation, specifically smoothing of the signal versus using the native grid resolution.
- In line 170, it is also worth mentioning that different SMB models adopt different topography grids, which might potentially contribute to spatial coherence variability.