the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Towards the Bayesian calibration of a glacier surface energy balance model for unmonitored glaciers
Abstract. Building on Bayesian calibration techniques and leveraging high-resolution climate simulations, we test the current capabilities of calibrating a surface energy balance model for unmonitored glaciers using only globally available satellite observations at Hintereisferner. We developed a multi-objective Bayesian framework for the Coupled Snowpack and Ice surface energy and mass balance model in Python (COSIPY), calibrated with satellite-derived geodetic mass balance, transient snowline altitudes, and mean glacier albedo. The framework is evaluated with in-situ observations and weather station measurements. A Latin Hypercube Sampling ensemble was used to investigate the model's parameter behaviour, establish priors and enable a Markov Chain Monte Carlo calibration helped by a novel and computationally efficient COSIPY emulator. The multi-objective calibration successfully constrains parameter distributions, addressing the equifinality between accumulation and albedo parameters. However, tighter parameter constraints, combined with imperfect climate simulations as forcing, create a model solution that requires a compromise between the three observational targets. Model evaluation shows that the calibrated ensemble reproduces glacier-mean albedos and inter-annual mass-balance variability well, but exhibits a negative bias in mean annual mass balance and a delayed snowline rise. This apparent contradiction arises from both forcing and model limitations. Incoming longwave radiation is overestimated throughout the year, while a warm and humid bias in the meteorological input during the later melt season enhances the positive turbulent fluxes. Early season melt is delayed by underestimated incoming shortwave radiation, and by an overly slow albedo decay caused by too high albedo aging and firn albedo parameters. Such biases in the meteorological forcing data remain a major obstacle to applying surface energy balance models to unmonitored glaciers. The calibration framework presented in this study provides diagnostic tools that help identify shortcomings and compensation effects within the modelling chain, paving the way for correcting forcing biases within a Bayesian framework as more observations become available. The open-source tools developed here have the potential to lower the barrier to studying the atmospheric drivers of glacier change at unmonitored sites with explicit treatment of uncertainty.
- Preprint
(21883 KB) - Metadata XML
-
Supplement
(7219 KB) - BibTeX
- EndNote
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6249', Anonymous Referee #1, 07 Mar 2026
-
RC2: 'Comment on egusphere-2025-6249', Anonymous Referee #2, 15 May 2026
This study builds a Bayesian calibration technique to calibrate the surface energy and mass balance model COSIPY. The main goal of this study is to investigate a calibration approach enabling parameter tuning for unmonitored glaciers, relying on satellite-derived observational datasets only. Furthermore, sources of parameter equifinality and uncertainty are investigated. The calibration framework relies on Latin Hypercube Sampling to explore the model’s parameter behavior and establish priors for a Markov Chain Monte Carlo calibration.
The study is of high relevance for the energy balance modelling community, as calibration of complex energy balance models is a large challenge. The motivation and relevance of this study are well outlined in the introduction. The manuscript is overall well structured, and the figures are well presented. Overall, I would suggest minor revisions: I have two bigger comments and several specific comments which need addressing and revision before publishing the article.
Major comments
1. Description of methods
The applied methods and conducted methodological steps are pretty complex. Therefore, more details should be provided at several points, e.g.:
- The description of the statistical methods applied in section 2.4.1 & 2.4.2 (e.g., PCA, what are half-normal priors and why are they used). TC is a journal with a lot of readers from the glaciology community, which might not all be very familiar with these methods.
- The surrogate models: I miss a more detailed description of this approach. It is an important part of the calibration scheme. I would suggest including an entire subsection and explain the entire procedure in detail, including also the performance testing.
2. Parameters at edges of prior distribution
The posterior distributions of the model parameters (Fig. A1) are for some parameters very close to the prior’s edges. This is the case for ice and fresh snow albedo and albedo depth. For ice and fresh snow albedo, these values would be well within the literature range (Table1). Is this an indicator that the prior distribution was not well selected for these parameters? Could the model performance be improved by a prior distribution extended on these edges? It might be worth to test this, or at least address the issue more deeply in section 4.1.
Specific comments
L39ff: Highlight the advances of SEB models (e.g., physical basis, enable process understanding) to make it clearer to the reader why it is important to use them
L71: remove “to”
L87: Add Hintereisferner
Fig. 1: I suggest including terrain information in the overview map in the lower right corner.
L51ff: What is the temporal resolution of the inner domain, 2.2km data?
L161: What is the Earth System Grid Federation, and which data did you get from there?
Above, you state that the forcing data is from COSMO-CLM?L171ff: What is the resolution of your COSIPY simulations after distribution of the glacier surface?
L212: What is the rough frequency of SLA information? Is it daily, weekly, monthly?
Fig. 2: This figure is not fully clear to me. It is difficult to follow the lines and understand which element belongs to which part. Maybe you could try to use different background colors, or outline colors? I would also get rid of the color gradient in the background.
L261ff: I suggest rephrasing for readability, e.g.: “This error was set to zero during the accumulations season (...), since the model matches well the fully snow-covered winter conditions, and set to XY for the snowline and albedo in the ablation season (...).”
L302: Refer to Table 1
L305: Refer to Table 1
L314: Add one sentence what the PCA is used for and why it is needed. See major comment 1.
L337ff: Give more details on the surrogate model. Potentially add a complete subchapter. See major comment 1.
L349: What is meant by “posterior predictive checks”? Please add some details here.
L351: How do the results of Klug et al. (2018) fit to the Hugonnet (2021) data that you used for calibration? If these two datasets do not fit well together, the COSIPY data won’t fit well.
L370ff: I assume this whole paragraph refers to the AWS-assisted COSIPY simulations. If so, please mention it again at the beginning of the section to make it easier for the reader.
L378: Remove “analysis” as it is part of the abbreviation
L391: Due to the limitations of the Spearman rank correlation, would there be a better method for this analysis?
Fig.3: Is it really necessary to display the correlation between all parameters & biases here? I would suggest limiting the figure to parameters vs. observation biases.
L402f: Do the three observational datasets fit together well? If the observations do not fit together, the model won’t be able to satisfy all three constraints either. So the question here is: Does this difficulty really come from the model parameters, or from the observational datasets?
Fig. 4: Panel b: Why is one of the boxes pale?
I would not put the legend as subpanel f in the middle of the plot. I would remove the panel letter and put it at the beginning, end or completely on the right or left side of the plot.L415: The abbreviations for ESS, IAT and MCSE are all used only two times in the paper. I would just write them out.
Fig. 6: Panel e & f: There are dots in black and grey. What is the difference?
L455: R2 in the text is different to Fig. 6c.
L467ff: Do you trust the result from the glaciological mass balance here with more negative mass balance above 3325m? Could also be an artefact due to lack of observations above 3000m (L477)?
L469: Why is the simulated mass balance not getting more negative below 2550m? It should be warmer, less snow, … Where does this come from?
L474: Refer to Table 2
L475ff: If data above 3000 m is missing in 3 out of 8 years, how reliable is the WGMS curve in Fig. 7b? See comment to L467ff.
L480f: Are the correlations given somewhere in a figure or table? If not, it might be useful to add correlation and MAE to Table 2.
L483: Mention at the beginning of the section that you are comparing the COSIPY results from the posterior ensemble with the AWS-assisted ensemble, and that the latter is limited to the year 2003/04.
L493: Refer to Fig. 91
L494: Why is the glacier surface in May to June warmer than the air, making QH an energy sink? I would rather expect this in winter.
L512: Shouldn’t this be “overestimation of QLWnet”?
Discussion: I think, he manuscript would benefit from a comparison of the performance of the Bayesian calibration with other calibration approaches found in literature in the discussion. This way, it would be easier to understand if/how much improvement the presented method brings over more basic calibration schemes.
L536ff: It might be useful to repeat your posterior values for the comparison. At least, you should refer to Fig. A1 + the respective panel in all subsections so that the reader knows where to look up your values.
L560f: “This holds for many of our model parameters…” This is an important sentence, but more of a general discussion. Consider moving this sentence to the end of the section or only keep these considerations in section 4.3, where you also mention this issue (L636ff).
L572: Replace the word “worsened”, e.g. with “exacerbated” or “increased”
L650f: Where do we see this?
L690: Replace “too low” with “underestimated”
Citation: https://doi.org/10.5194/egusphere-2025-6249-RC2
Model code and software
Scripts used for the preparation of the manuscript titled "Towards the Bayesian calibration of a glacier surface energy balance model for unmonitored glaciers" Niklas Richter https://doi.org/10.5281/zenodo.17426864
Viewed
| HTML | XML | Total | Supplement | BibTeX | EndNote | |
|---|---|---|---|---|---|---|
| 1,177 | 2,405 | 98 | 3,680 | 291 | 101 | 101 |
- HTML: 1,177
- PDF: 2,405
- XML: 98
- Total: 3,680
- Supplement: 291
- BibTeX: 101
- EndNote: 101
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
In this study, the authors present the first Bayesian calibration framework for surface energy balance (SEB) modeling for unmonitored glaciers. The framework uses COSIPY calibrated against satellite-derived geodetic mass balance, transient snowline altitudes, and mean remote sensing glacier albedo. The authors employ a Latin Hypercube Sampling ensemble together with Markov Chain Monte Carlo (MCMC) simulations, supported by a novel surrogate model designed to emulate COSIPY outputs. Their analysis indicates that parameter equifinality primarily arises between the precipitation factor and albedo decay-related parameters. Model evaluation shows that the calibrated ensemble reproduces glacier-mean albedo and interannual mass-balance variability well, but shows a negative bias in mean annual mass balance and a delayed snowline rise.
Overall, this manuscript represents a very valuable contribution to the glacier modeling community. Bayesian calibration approaches for SEB modeling are long overdue, and I am pleased to see that the authors address this important and complex problem. The study represents an important first step toward more robust calibration of SEB models with uncertainty quantification and is highly relevant for advancing regional and global physics-based glacier modeling. The manuscript is very well written, the figures are clear and well presented, and the analysis is thorough. The authors also apply a wide range of carefully considered methods, analyses, and modeling tools. While Bayesian calibration itself is not my area of expertise, I am able to comment on the physics-based glacier modeling and machine learning components of the manuscript. There are some shortcomings that should be addressed before publication, but once these issues are resolved, I would support publication. Detailed comments and suggestions are provided below.
Major comments:
1) Geodetic mass balance calibration data
The main concern I have relates to how the geodetic mass balance data is used for calibration. In L 204–205, the authors state that the model is calibrated against the “annual average specific mass change rate” for the period 2000–2010 derived from the dataset of Hugonnet et al. (2021). The reported uncertainty is −1.0425 ± 0.261 m w.e. a⁻¹ (L 204). While the dataset of Hugonnet et al. (2021) indeed provides estimates for shorter subperiods than the full 2000–2019 interval, the signal-to-noise ratio decreases substantially as the aggregation period shortens. In practice, this means that for most glaciers, periods shorter than ~10 years are basically just noise. Most current global or regional glacier modeling studies use the Hugonnet dataset only as a single 2000–2019 average value (i.e., one data point per glacier), precisely because shorter time intervals tend to have low signal-to-noise ratios.
Are the authors using the 1-year subperiods for calibration? If so, I think it should be carefully checked whether these values can actually be used, since their uncertainties might be too large to provide meaningful information. One way to verify this would be to compare the geodetic mass balance rates with the uncertainties reported by Hugonnet et al. (2021) (which are also known to be potentially optimistic) for different aggregation periods (e.g., 1-year, 2-year, 5-year, and 10-year) and calculate the corresponding signal-to-noise ratios. For example, Table 2 suggests that the observed mass balance values (though from WGMS and Klug et al., 2018 and not from Hugonnet et al.) exceed the uncertainty range (−1.0425 ± 0.261 m w.e. a⁻¹) in four (?) out of eight years. What implications would the use of longer aggregation periods (e.g., 5-year or 10-year averages from Hugonnet et al., 2021) have for the Bayesian calibration framework? I am aware that the primary focus of the manuscript is on the framework for Bayesian calibration. However, it is still very important that the calibration data are used with a clear understanding of their limitations and uncertainties.
2) Climatic input data
The authors use COSMO as the climate forcing, produced as part of the CORDEX initiative with a grid spacing of 2.2 km. They state that these simulations are based on ERA-Interim (~80 km, L 156), which has been superseded by ERA5 (~31 km) for nearly a decade. ERA5 includes several updates compared to ERA-Interim that are relevant for glacier studies. Has the COSMO dataset been specifically evaluated over glacierized terrain? I am asking because regional climate model performance can be quite sensitive to the choice of physical parameterizations, and models optimized for other land surface types do not necessarily perform well in cryospheric environments.
I am also wondering why this dataset was chosen instead of, for example, ERA5-Land (~9 km) or ERA5 itself. Both have been shown to provide reliable forcing for SEB modeling studies, and higher spatial resolution does not necessarily translate into better forcing data. For example, Draeger et al. (2025) found that dynamical downscaling did not improve SEB model forcing for a sample of glaciers in Canada compared to ERA5, and that ERA5 even performed better than the higher-resolution ERA5-Land in their case. I also assume that the study period is limited to 1999–2010 (L155) because of the temporal availability of the COSMO dataset.
I am not suggesting that the authors redo the analysis with a different climate forcing dataset, but I do think it would be important to more clearly justify the choice of COSMO in this context, particularly since the manuscript puts a strong emphasis on uncertainties related to the forcing data in the discussion and conclusions.
3) Katabatic winds
In L170 the authors mention that a logarithmic wind profile is used. However, logarithmic profiles are known to poorly represent wind speeds under katabatic conditions and can lead to substantial underestimation of near-surface wind speeds over glaciers (L 598) For example, Shannon et al. (2019) addressed this issue by applying a calibration-dependent multiplicative factor (ranging from 1 to 4) to the 10 m wind speed from reanalysis data.
I wonder whether wind speeds during katabatic conditions might be systematically underestimated in the present setup, which would lead to underestimated turbulent fluxes. However, in L515 the authors mention an overestimation of turbulent fluxes in summer. Could the authors provide more detail on how this can be explained?
4) Machine learning surrogate model
Using a machine learning surrogate model to avoid the computationally expensive COSIPY simulations is an interesting and valuable contribution. However, since this component is a key part of the modeling chain, I think the manuscript would benefit from providing more details.
For example, I am wondering how a test set was constructed (not mentioned in L339). It would also be helpful to include some technical details about the machine learning models (this could be placed in the SI), such as the number of epochs, batch size, loss function, optimizer, and the number of neurons in each layer. It would also help the reader if the input features and model outputs were stated more clearly. It is partly implied in L337, but there is already a lot of information in that section, which makes it somewhat difficult to follow. I might have missed it, but it is also unclear to me why the forcing data (except precipitation) are kept constant (L344).
5) Clarity and readability
The analysis is quite complex and involves several methodological steps, which at times makes it difficult for the reader to follow the technical details. For example, the PCA analysis is only briefly introduced in L314, and for me there are not enough details to clearly understand how it was implemented. It is also not entirely clear to me how the effect of the observed data on individual parameter values is isolated from the K-means clusters (L319).
I recognize that the manuscript is already quite long and that some additional details are provided in the SI. Nevertheless, I would encourage the authors to include a few brief, intuitive explanations of the more complex parts of the analysis in the main text. This would make the methodology much more accessible, particularly for readers in the glaciology community who may not be familiar with all the statistical techniques used.
Minor comments:
L39: To set the stage, I would briefly introduce SEB models here. It would help to mention that they are more physically based than simpler approaches but require substantially more input variables. It may also be useful to clarify the role of calibration parameters in SEB models, which differs from TI models. Unlike TI models, where the melt factor is an empirical parameter inherent to the model itself, many calibration parameters in SEB models are used to account for limitations in the forcing data (e.g., downscaling or unresolved processes).
L75: It would be helpful to provide a bit more detail on the Bayesian calibration frameworks introduced by Rounce et al. (2020) and Sjursen et al. (2023, 2025), as these are the only existing applications of Bayesian calibration in glacier modeling so far.
L71: Typo
L83: I would clarify here how (and with which data) the simulations are evaluated. This is mentioned at the end of the paragraph below, but it might fit better directly after the three main points.
Figure 1: I believe the Copernicus WorldDEM-30 (https://doi.org/10.5270/ESA-c5d3d65) should also be cited in the references.
L156: I am a bit confused about the grid spacings mentioned. ERA-Interim has a resolution of ~79 km, but in L156 it is referred to as “12 km”. Could the authors please clarify this?
L169: Could you provide a few brief details on the analysis of Jennings et al. (2018) to clarify why it is used as a justification here?
L175: Could you please provide details on the “large spacing between vertically stored levels in the COSMO simulation at standard pressure levels”?
L176–179: I am not sure I fully follow this part. Does this mean that, in the end, all 3x3 grid cells were used to calculate lapse rates, regardless of land surface type?
L195: Please provide a short description of HORZYZON. Also, please cite the original publication, as required when using the code (Steger et al., 2022).
L 229: I might have missed it, but it is unclear to me how you distinguish between alpha_ice, alpha_firn, and alpha_snow based on the Landsat-based pixel data. How does this vary temporally and spatially?
L 254-258: Am I understanding correctly that “COSIPY model output” here does not only refer to mass balance, but also to other model outputs, such as the albedo simulated by the COSIPY albedo model (Eq. 5)? If so, the wording might be confusing.
L304: It would be helpful to already name LHS1 and LHS2 here (as defined in Table 1).
Figure 7: In the caption, it might improve readability to explicitly mention “ENS MEDIAN” and “95% CI”.
L 472: Typo
L510: Could you expand on what is meant by the “misrepresentation of surface albedo”? What do you think is the main issue here, for example, the simple decay-type albedo parameterization, limitations in the Landsat data (e.g., cloud cover leading to irregular observations or the 16-day revisit time), or something else?
L 542: Brackets appear twice
L545–L547: It might be useful to briefly mention here that glacier albedo has been observed to decrease with climate change (e.g., Williamson et al., 2025). This would also highlight that capturing interannual variability in the albedo simulations is important, not only intra-annual variability.
Section 4.3: I find this future work section very helpful and thorough.
L629: The mismatches between observed and simulated mass balance gradients are attributed here to linear lapse rates, but other factors could also contribute, so the wording may be somewhat misleading.
L640: I am wondering whether you applied any temporally-guided averaging of the Landsat observations (e.g., monthly or multi-month averages). This might help here.
L649: This is a very interesting observation: the “impact of parameter uncertainty is small compared to that of the forcing data.” I wonder whether there is a simple test or short analysis that could help quantify this effect.
References:
Draeger, C., Radić, V., White, R. H., and Tessema, M. A.: Evaluation of reanalysis data and dynamical downscaling for surface energy balance modeling at mountain glaciers in western Canada, The Cryosphere, 18, 17–42, https://doi.org/10.5194/tc-18-17-2024, 2024.
Steger, C. R., Steger, B. and Schär, C. (2022): HORAYZON v1.2: an efficient and flexible ray-tracing algorithm to compute horizon and sky view factor, Geosci. Model Dev., 15, 6817–6840, https://doi.org/10.5194/gmd-15-6817-2022
Shannon, S., Smith, R., Wiltshire, A., Payne, T., Huss, M., Betts, R., Caesar, J., Koutroulis, A., Jones, D., and Harrison, S. (2019). Global glacier volume projections under high-end climate change scenarios. The Cryosphere, 13(1):325–350.
Williamson, S.N., Marshall, S.J. & Menounos, B. Temperature mediated albedo decline portends acceleration of North American glacier mass loss. Commun Earth Environ 6, 555 (2025). https://doi.org/10.1038/s43247-025-02503-x