This work is distributed under the Creative Commons Attribution 4.0 License.
Machine learning improves seasonal mass balance prediction for unmonitored glaciers
Abstract. Glacier evolution models based on temperature-index approaches are commonly used to assess hydrological impacts of glacier changes. However, in large-scale applications, these models lack calibration frameworks that efficiently leverage sparse high-resolution observations, limiting their ability to resolve seasonal mass changes. Machine learning approaches can potentially address this limitation by learning relationships from sparse data that are transferable in space and time, including to unmonitored glaciers. Here, we present the Mass Balance Machine (MBM), a data-driven mass balance model based on the XGBoost architecture, designed to provide accurate and high spatio-temporal resolution regional-scale reconstructions of glacier mass balance. We trained and tested MBM using a dataset of approximately 4000 seasonal and annual point mass balance measurements from 32 glaciers across heterogeneous climate settings in mainland Norway, spanning from 1962 to 2021. To assess the advantage of MBM's generalisation capabilities, we compared its predictions on independent test glaciers at various spatio-temporal scales with those of regional-scale simulations from three glacier evolution models. MBM successfully predicted annual and seasonal point mass balance on the test glaciers (RMSE of 0.59–1.00 m w.e. and bias of -0.01–0.04 m w.e.). On seasonal mass balance, MBM outperformed the other models across spatial scales, reducing RMSE by up to 46 % and 25 % on glacier-wide winter and summer mass balance, respectively. Our results demonstrate the capability of machine learning models to generalise across glaciers and climatic settings from relatively sparse mass balance data, highlighting their potential for a wide range of applications.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1206', Anonymous Referee #1, 08 May 2025
The paper by Sjursen et al. introduces the Mass Balance Machine (MBM), a machine learning-based model built on XGBoost, to improve seasonal and annual glacier mass balance predictions across Norway. Using ~4000 in-situ seasonal and annual point measurements from 32 glaciers between 1962 and 2021, the authors demonstrate that the model can generalize well across unmonitored glaciers with diverse climatic settings. MBM outperforms traditional temperature-index glacier evolution models (GloGEM, OGGM, and PyGEM), particularly in predicting seasonal mass balance, reducing RMSE by up to 46% (winter) and 25% (summer). The model performance is robust across multiple spatial and temporal scales, showing strong potential for enhancing hydrological predictions and climate impact assessments in glacierized regions.
I think the MBM is a very promising addition to the traditional glacier evolution models. However, at first instance after reading the manuscript I was questioning to what extent the comparison between MBM and the other models is fair because they are based on different datasets (glaciological versus geodetic) that exhibit very different characteristics. See for instance the recent papers by the GlaMBIE team (2025) and Dussaillant (preprint) who compare and combine different mass balance data sources. It seems obvious that when comparing to data of type A (which model A is trained with) model A outperforms model B (which is calibrated with data of type B). I was wondering to what extent the authors are comparing models instead of differences between datasets.
Nevertheless, I believe that the fact that the MBM can be trained with the glaciological data and still predict mass balances for unseen glaciers is its key advantage compared to traditional models. I would recommend that the authors emphasize this more and not jump to “straightforward” conclusions too quickly (such as: the MBM is better at seasonal predictions. Yes, it is, but it is also the only model that has seen seasonal data). In addition, I would like to see more support for the selection of features and feature importance.
The manuscript is very well written, and the language is of a high standard. Occasionally, the readability is somewhat reduced by excessive sentence length and accumulation of complex terminology. This particularly applies to the introduction, see an example below. The analysis is well described, and the figures are of high quality.
All in all, I deem the manuscript fit for publication after a major revision. The suggested changes require minimal additional analyses and some textual considerations. Please consider the more detailed list of suggestions below.
Abstract
L9: To assess the advantage MBM’s generalization capabilities, --> To assess the advantage of MBM’s generalization capabilities,
1. Introduction
L39-42: Despite significant efforts … unmonitored glaciers. This is one such example of a rather long and complex sentence that reduces readability.
L67: the fact that point mass balance measurements from glaciological surveys consist of stake measurements hasn’t been introduced yet. It is a minor detail, but rewording “each individual stake” to e.g. “each individual mass balance stake” would improve clarity.
L72: The term ‘generalising’ has been used throughout the manuscript but at this point it was unclear to me what you mean by “Generalising from seasonal and annual point mass balance measurements”. I now understand that you refer to the generalisation of distributed measurements on different glaciers (spatially), but here the emphasis seems to be on the seasonal versus annual time scales.
2. Mass balance dataset and study area
L94: To reduce potential confusion regarding the numbers 4170 vs 3910/3929/3751, you may change “4170 stake locations” to “4170 unique stake locations”.
3. The Mass Balance Machine (MBM)
L116: including the design of an independent test dataset.
3.2 Model targets and features
Feature selection, collinearity, and feature importance: I support your choice of refraining from using climate derivates such as snow depth and snow cover, but I still wonder how you came to this exact choice of climate features. Sensible and latent heat fluxes also depend on other meteorological variables, such as temperature and humidity. Why didn’t you for instance use humidity directly? Have you assessed the collinearity within your feature space? I suggest either including a collinearity assessment in your paper (appendix) or including a statement that this is not relevant or negligible depending on your findings. What made you decide to use net thermal radiation but downward solar radiation? Considering variables like the albedo makes sense based on physical relevance, but how meaningful is the albedo at the 9 km resolution of ERA5-land? In addition, I suspect many point measurements to be inside a single ERA5-land grid cell causing nearest neighbor interpolation to result in non-unique features.
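The suggested collinearity assessment could be sketched as below: pairwise correlations plus variance inflation factors (VIF) over the feature matrix. Feature names (`t2m`, `sshf`, `tp`) and the synthetic data are illustrative placeholders, not the manuscript's actual features.

```python
# Illustrative collinearity check: Pearson correlations and VIF on synthetic
# features. In a VIF, feature j is regressed on all other features;
# VIF_j = 1 / (1 - R^2_j), with values >> 1 indicating collinearity.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
t2m = rng.normal(0, 1, n)                       # "temperature" (placeholder)
features = pd.DataFrame({
    "t2m": t2m,
    "sshf": 0.8 * t2m + rng.normal(0, 0.5, n),  # flux correlated with t2m
    "tp": rng.normal(0, 1, n),                  # independent "precipitation"
})

corr = features.corr()  # pairwise Pearson correlation matrix

def vif(df):
    """Variance inflation factor for each column via ordinary least squares."""
    out = {}
    for col in df.columns:
        X = np.column_stack([np.ones(len(df)), df.drop(columns=col).to_numpy()])
        y = df[col].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return out

print(corr.round(2))
print({k: round(v, 2) for k, v in vif(features).items()})
```

Here the correlated pair (`t2m`, `sshf`) yields elevated VIFs, while the independent feature stays near 1.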
Since you haven’t assessed feature importance in your study (or at least not presented in this manuscript), I suggest including more information on your reasoning and considerations in the selection of climate features. To my knowledge, XGBoost returns feature importance of variables, and it would be feasible to include this analysis in the paper.
L163-166: It is unclear to me how your model learns to predict monthly variability in mass balance. How can you be sure that the monthly predictions make sense? Since there is never any overlap in your seasonal mass balance measurements, couldn’t equifinality still play a role?
3.3 Model training and testing
While in L191-192 you state that “The performance evaluation of MBM on the test dataset thus reflects the model’s ability to predict mass balance on glaciers without mass balance observations”, you did make sure that the distributions of both targets and features in the train and test datasets are similar. Is this fair? It is no surprise to me that your model can predict the mass balance on unseen glaciers ‘as long as they exist in the same distribution…’ In reality, you cannot be sure that the target of an unseen glacier fits into the distribution of targets in your training dataset; you could only know this for the features.
L215-216: If I understood correctly, the location of the stake measurements is not constant throughout the years (since you mention 4170 stake locations, but only up to 200 annual mass balance measurements per year). I assume that this reflects the displacement of a stake due to glacier flow? This usually being only a small displacement, I do not expect the topographic features to vary greatly through time. Therefore, by splitting the data in the 5-fold cross validation only based on time, I expect this to reduce the apparent importance of the topographic features. Have you considered this? Would this affect the hyperparameter tuning?
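The concern above can be made concrete with a grouped cross-validation sketch: splitting on years (as I read the manuscript's 5-fold scheme) versus splitting on glaciers. All data below are synthetic placeholders.

```python
# Illustrative contrast between year-grouped and glacier-grouped CV folds.
# When grouping by year, the same glaciers (and hence near-identical
# topographic features) appear in both train and validation folds.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 400
years = rng.integers(1962, 2022, size=n)       # survey year per stake reading
glacier_ids = rng.integers(0, 32, size=n)      # glacier per stake reading
X = np.zeros((n, 1))                           # placeholder features

gkf = GroupKFold(n_splits=5)

shared_counts = []
for tr, va in gkf.split(X, groups=years):
    # years never leak across the split...
    assert set(years[tr]).isdisjoint(years[va])
    # ...but glaciers do: count glaciers present on both sides
    shared_counts.append(len(set(glacier_ids[tr]) & set(glacier_ids[va])))
print(shared_counts)

# grouping by glacier instead keeps each glacier's stakes in a single fold
for tr, va in gkf.split(X, groups=glacier_ids):
    assert set(glacier_ids[tr]).isdisjoint(glacier_ids[va])
```

If topographic features hardly vary within a glacier over time, the year-grouped split never forces the model to predict from unseen topography, which could indeed bias hyperparameter tuning toward climate features.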
L231: how is the R2 metric computed? Why are you comparing four different metrics but not the MSE that was used in cross-validation?
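For reference, the link between the MSE used in cross-validation and a reported R² is direct, assuming the usual coefficient-of-determination definition; a minimal sketch with made-up numbers:

```python
# R^2 = 1 - SS_res/SS_tot = 1 - MSE / Var(y_true), since both numerator and
# denominator are divided by the same n. Values here are arbitrary examples.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.2, -1.1, 0.5, -0.3])
y_pred = np.array([0.1, -0.9, 0.6, -0.4])

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
assert np.isclose(r2, 1 - mse / y_true.var())  # population variance (ddof=0)
print(round(r2, 3))
```

So if R² follows this standard definition, reporting it alongside MSE/RMSE is redundant up to the variance of the observations, which is worth stating explicitly.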
4. Mass balance model comparison
L247: Unclear what “these glaciers” refers to: the whole test dataset, 11 of the 14 glaciers or the three glaciers referred to in brackets.
L252-L253: is the spatial resolution in table 2 the width/height of the elevation bands? I suggest referring to this more explicitly. From what I understand, GloGEM and OGGM use a fixed vertical spacing (elevation) while PyGEM uses a horizontal spacing (distance).
I am wondering to what extent the resolution of these elevation bands can explain the differences in performance of the different models. How does the point elevation at the mass balance stakes compare to, e.g., the average elevation of the model elevation bands? For instance, if for whatever reason or by coincidence the stakes are typically located at the higher end of the elevation bands, this would explain the model underestimating the mass balance.
Table 2: Include Tcorr in the list of parameters for GloGEM and include the annotation e there. In caption: e only included if no match is found with other parameters within predefined bounds.
5.2 Model comparison on different spatio-temporal scales
L297-300: This sentence is confusing and the word ‘glacier-wide’ is often repeated. Glacier-wide mass balances are compared on different time scales. You evaluate glacier-wide predictions using seasonal and annual glacier-wide observations from glaciological records AND you evaluate decadal predictions using glacier-wide glaciological and geodetic observations. Reword to:
“Glacier-wide mass balances are compared in Sect. 5.2.3 on monthly to decadal time scales. We evaluate seasonal and annual predictions using observations from glaciological records (Kjollmoen et al., 2024), and decadal predictions using glaciological and geodetic (Andreassen et al., 2016, 2020; Hugonnet et al., 2021) observations.”
Figure 6: measured --> observed point mass balance
L330-331: In contrast to the glacier evolution models, which exhibit too-linear gradients, it seems that the MBM can predict unlikely variability in the gradients. See for instance the knickpoints at higher elevation in Figure 7a and c. These do not seem to correspond to the observations (there is no data point at this elevation). Can you explain the occurrence of such knickpoints?
Figure 7: the almost vertical lines in 7f demonstrate the equifinality issue with the glacier evolution models being calibrated with glacier-wide 20-year average geodetic data and no way of knowing whether there is a shallow or steep mass balance gradient.
L371-372: This is a fair point, but the opposite is also true. The predictions by MBM correspond better to the glaciological observations because they are trained using this data. Even though you test the model on unseen glaciers, you still train the model using data with similar variability, while the glacier evolution models are calibrated with a 20-year average and will never learn the interannual variability. This could be emphasized more.
L373-375: Please consider the uncertainties of the geodetic data. I suspect the over- or underestimation of the models to still be within the 95% confidence bound of the geodetic data.
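The suggested check is simple to apply, assuming a reported 1-sigma uncertainty for the geodetic estimates; the numbers below are illustrative, not taken from the manuscript.

```python
# Is a model's decadal mass balance estimate inside the 95% confidence bound
# of a geodetic observation with 1-sigma uncertainty? Assumes a Gaussian
# error model, so the bound is +/- 1.96 * sigma.
def within_95ci(model_mb, geodetic_mb, geodetic_sigma):
    """True if the model estimate lies inside geodetic_mb +/- 1.96*sigma."""
    return abs(model_mb - geodetic_mb) <= 1.96 * geodetic_sigma

# e.g. model: -0.55 m w.e. a^-1 vs geodetic: -0.70 +/- 0.15 m w.e. a^-1
print(within_95ci(-0.55, -0.70, 0.15))  # |diff| = 0.15 < 1.96*0.15 = 0.294
```

If most model-geodetic differences pass this test, "over- or underestimation" may not be statistically distinguishable from geodetic uncertainty.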
Discussion
L394: How can you be sure that MBM effectively downscales the meteorological data instead of relying on the high-resolution topographic features? Is there any way to support this statement? A feature importance analysis may have provided more insights in this. Alternatively, although this is most probably not within the scope of this manuscript, one could have compared the performance of MBM with coarse meteorological data + elevation difference to already downscaled meteorological data. Or you could have explicitly trained the MBM to downscale climate data using some high-resolution climate variable as an additional target. It may have been that elevation difference “appears” to be important because it is one of the few variables that are actually unique for each stake location. Without any support, I question whether you can make the statement that MBM effectively downscales. Especially with regards to Figure 11.
L414: I think it is important to distinguish between and not confuse two different assets of your model: 1) it can predict mass balances for unmonitored glaciers while the glacier evolution models need calibration data for every single glacier, and 2) it is trained with seasonal and annual data while the glacier evolution models were only provided with a single 20-year average value. I think the first point is the big advantage of the MBM and this should be highlighted more, while the second point is an artifact of the first. Because regular models need data for every glacier, they cannot be calibrated with the higher temporal resolution data because this is only available for a limited number of glaciers.
L451-453: It is unclear what you mean. How does the steep terrain influencing the tongue affect the more negative mass balance for steep and south-facing slopes?
L463: In my opinion, it is not necessarily a bad thing to assess the capability of your model in ‘extrapolating’ to adjacent glaciers. It would be interesting to include a little more of your findings regarding the ability to extrapolate in relation to the distance away from the nearest ‘seen’ glacier.
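The distance-based analysis suggested above could look like the following sketch: compute each test glacier's distance to the nearest training glacier and relate it to its prediction error. Coordinates, glacier counts, and per-glacier errors are synthetic placeholders.

```python
# Illustrative extrapolation-distance analysis: nearest-"seen"-glacier distance
# per test glacier, correlated against a per-glacier error metric.
import numpy as np

rng = np.random.default_rng(0)
train_xy = rng.uniform(0, 500, size=(18, 2))  # projected coordinates, km
test_xy = rng.uniform(0, 500, size=(14, 2))
test_rmse = rng.uniform(0.5, 1.0, size=14)    # per-glacier RMSE, m w.e.

# pairwise Euclidean distances (14 x 18), then minimum over training glaciers
d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=-1)
dist_to_nearest = d.min(axis=1)

# correlation between extrapolation distance and error (meaningless on this
# synthetic data; with the real errors it would quantify the effect)
r = float(np.corrcoef(dist_to_nearest, test_rmse)[0, 1])
print(dist_to_nearest.round(1), round(r, 2))
```

A weak correlation would support the claim that MBM generalizes rather than merely interpolates between nearby monitored glaciers.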
Citation: https://doi.org/10.5194/egusphere-2025-1206-RC1
AC1: 'Reply on RC1', Kamilla Hauknes Sjursen, 30 Jun 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-AC1-supplement.pdf
RC2: 'Comment on egusphere-2025-1206', Brian Kyanjo, 19 May 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-RC2-supplement.pdf
AC2: 'Reply on RC2', Kamilla Hauknes Sjursen, 30 Jun 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-AC2-supplement.pdf