This work is distributed under the Creative Commons Attribution 4.0 License.
Machine learning improves seasonal mass balance prediction for unmonitored glaciers
Abstract. Glacier evolution models based on temperature-index approaches are commonly used to assess hydrological impacts of glacier changes. However, in large-scale applications, these models lack calibration frameworks that efficiently leverage sparse high-resolution observations, limiting their ability to resolve seasonal mass changes. Machine learning approaches can potentially address this limitation by learning relationships from sparse data that are transferable in space and time, including to unmonitored glaciers. Here, we present the Mass Balance Machine (MBM), a data-driven mass balance model based on the XGBoost architecture, designed to provide accurate and high spatio-temporal resolution regional-scale reconstructions of glacier mass balance. We trained and tested MBM using a dataset of approximately 4000 seasonal and annual point mass balance measurements from 32 glaciers across heterogeneous climate settings in mainland Norway, spanning from 1962 to 2021. To assess the advantage of MBM's generalisation capabilities, we compared its predictions on independent test glaciers at various spatio-temporal scales with those of regional-scale simulations from three glacier evolution models. MBM successfully predicted annual and seasonal point mass balance on the test glaciers (RMSE of 0.59–1.00 m w.e. and bias of -0.01–0.04 m w.e.). On seasonal mass balance, MBM outperformed the other models across spatial scales, reducing RMSE by up to 46 % and 25 % on glacier-wide winter and summer mass balance, respectively. Our results demonstrate the capability of machine learning models to generalise across glaciers and climatic settings from relatively sparse mass balance data, highlighting their potential for a wide range of applications.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1206', Anonymous Referee #1, 08 May 2025
The paper by Sjursen et al. introduces the Mass Balance Machine (MBM), a machine learning-based model built on XGBoost, to improve seasonal and annual glacier mass balance predictions across Norway. Using ~4000 in-situ seasonal and annual point measurements from 32 glaciers between 1962 and 2021, the authors demonstrate that the model can generalize well across unmonitored glaciers with diverse climatic settings. MBM outperforms traditional temperature-index glacier evolution models (GloGEM, OGGM, and PyGEM), particularly in predicting seasonal mass balance, reducing RMSE by up to 46% (winter) and 25% (summer). The model performance is robust across multiple spatial and temporal scales, showing strong potential for enhancing hydrological predictions and climate impact assessments in glacierized regions.
I think the MBM is a very promising addition to the traditional glacier evolution models. However, at first instance after reading the manuscript I was questioning to what extent the comparison between MBM and the other models is fair because they are based on different datasets (glaciological versus geodetic) that exhibit very different characteristics. See for instance the recent papers by the GlaMBIE team (2025) and Dussaillant (preprint) who compare and combine different mass balance data sources. It seems obvious that when comparing to data of type A (which model A is trained with) model A outperforms model B (which is calibrated with data of type B). I was wondering to what extent the authors are comparing models instead of differences between datasets.
Nevertheless, I believe that the fact that the MBM can be trained with the glaciological data and still predict mass balances for unseen glaciers is its key advantage compared to traditional models. I would recommend that the authors emphasize this more and not jump to “straightforward” conclusions too quickly (such as: the MBM is better at seasonal predictions. Yes, it is, but it is also the only model that has seen seasonal data). In addition, I would like to see more support for the selection of features and feature importance.
The manuscript is very well written, and the language is of a high standard. Occasionally, the readability is somewhat reduced by excessive sentence length and accumulation of complex terminology. This particularly applies to the introduction, see an example below. The analysis is well described, and the figures are of high quality.
All in all, I deem the manuscript fit for publication after a major revision. The suggested changes require minimal additional analyses and some textual considerations. Please consider the more detailed list of suggestions below.
Abstract
L9: To assess the advantage MBM’s generalization capabilities, --> To assess the advantage of MBM’s generalization capabilities,
1. Introduction
L39-42: Despite significant efforts … unmonitored glaciers. This is one such example of a rather long and complex sentence that reduces readability.
L67: the fact that point mass balance measurements from glaciological surveys consist of stake measurements hasn’t been introduced yet. It is a minor detail, but rewording “each individual stake” to e.g. “each individual mass balance stake” would improve clarity.
L72: The term ‘generalising’ has been used throughout the manuscript but at this point it was unclear to me what you mean by “Generalising from seasonal and annual point mass balance measurements”. I now understand that you refer to the generalisation of distributed measurements on different glaciers (spatially), but here the emphasis seems to be on the seasonal versus annual time scales.
2. Mass balance dataset and study area
L94: To reduce potential confusion regarding the numbers 4170 vs 3910/3929/3751, you may change “4170 stake locations” to “4170 unique stake locations”.
3. The Mass Balance Machine (MBM)
L116: including the design of an independent test dataset.
3.2 Model targets and features
Feature selection, collinearity, and feature importance: I support your choice of refraining from using climate derivates such as snow depth and snow cover, but I still wonder how you came to this exact choice of climate features. Sensible and latent heat fluxes also depend on other meteorological variables, such as temperature and humidity. Why didn’t you for instance use humidity directly? Have you assessed the collinearity within your feature space? I suggest either including a collinearity assessment in your paper (appendix) or including a statement that this is not relevant or negligible depending on your findings. What made you decide to use net thermal radiation but downward solar radiation? Considering variables like the albedo makes sense based on physical relevance, but how meaningful is the albedo at the 9 km resolution of ERA5-land? In addition, I suspect many point measurements to be inside a single ERA5-land grid cell causing nearest neighbor interpolation to result in non-unique features.
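The suggested collinearity assessment could be sketched as below: pairwise correlations plus variance inflation factors (VIF) over the feature matrix. Feature names (`t2m`, `sshf`, `tp`) and the synthetic data are illustrative placeholders, not the manuscript's actual features.

```python
# Illustrative collinearity check: Pearson correlations and VIF on synthetic
# features. In a VIF, feature j is regressed on all other features;
# VIF_j = 1 / (1 - R^2_j), with values >> 1 indicating collinearity.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
t2m = rng.normal(0, 1, n)                       # "temperature" (placeholder)
features = pd.DataFrame({
    "t2m": t2m,
    "sshf": 0.8 * t2m + rng.normal(0, 0.5, n),  # flux correlated with t2m
    "tp": rng.normal(0, 1, n),                  # independent "precipitation"
})

corr = features.corr()  # pairwise Pearson correlation matrix

def vif(df):
    """Variance inflation factor for each column via ordinary least squares."""
    out = {}
    for col in df.columns:
        X = np.column_stack([np.ones(len(df)), df.drop(columns=col).to_numpy()])
        y = df[col].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return out

print(corr.round(2))
print({k: round(v, 2) for k, v in vif(features).items()})
```

Here the correlated pair (`t2m`, `sshf`) yields elevated VIFs, while the independent feature stays near 1.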
Since you haven’t assessed feature importance in your study (or at least not presented in this manuscript), I suggest including more information on your reasoning and considerations in the selection of climate features. To my knowledge, XGBoost returns feature importance of variables, and it would be feasible to include this analysis in the paper.
L163-166: It is unclear to me how your model learns to predict monthly variability in mass balance. How can you be sure that the monthly predictions make sense? Since there is never any overlap in your seasonal mass balance measurements, couldn’t equifinality still play a role?
3.3 Model training and testing
While in L191-192 you state that “The performance evaluation of MBM on the test dataset thus reflects the model’s ability to predict mass balance on glaciers without mass balance observations”, you did make sure that the distributions of both targets and features in the train and test datasets are similar. Is this fair? It is no surprise to me that your model can predict the mass balance on unseen glaciers ‘as long as they exist in the same distribution…’ In reality, you cannot be sure that the target of an unseen glacier fits into the distribution of targets in your training dataset; you could only know this for the features.
L215-216: If I understood correctly, the location of the stake measurements is not constant throughout the years (since you mention 4170 stake locations, but only up to 200 annual mass balance measurements per year). I assume that this reflects the displacement of a stake due to glacier flow? This usually being only a small displacement, I do not expect the topographic features to vary greatly through time. Therefore, by splitting the data in the 5-fold cross validation only based on time, I expect this to reduce the apparent importance of the topographic features. Have you considered this? Would this affect the hyperparameter tuning?
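The concern above can be made concrete with a grouped cross-validation sketch: splitting on years (as I read the manuscript's 5-fold scheme) versus splitting on glaciers. All data below are synthetic placeholders.

```python
# Illustrative contrast between year-grouped and glacier-grouped CV folds.
# When grouping by year, the same glaciers (and hence near-identical
# topographic features) appear in both train and validation folds.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 400
years = rng.integers(1962, 2022, size=n)       # survey year per stake reading
glacier_ids = rng.integers(0, 32, size=n)      # glacier per stake reading
X = np.zeros((n, 1))                           # placeholder features

gkf = GroupKFold(n_splits=5)

shared_counts = []
for tr, va in gkf.split(X, groups=years):
    # years never leak across the split...
    assert set(years[tr]).isdisjoint(years[va])
    # ...but glaciers do: count glaciers present on both sides
    shared_counts.append(len(set(glacier_ids[tr]) & set(glacier_ids[va])))
print(shared_counts)

# grouping by glacier instead keeps each glacier's stakes in a single fold
for tr, va in gkf.split(X, groups=glacier_ids):
    assert set(glacier_ids[tr]).isdisjoint(glacier_ids[va])
```

If topographic features hardly vary within a glacier over time, the year-grouped split never forces the model to predict from unseen topography, which could indeed bias hyperparameter tuning toward climate features.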
L231: how is the R2 metric computed? Why are you comparing four different metrics but not the MSE that was used in cross-validation?
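For reference, the link between the MSE used in cross-validation and a reported R² is direct, assuming the usual coefficient-of-determination definition; a minimal sketch with made-up numbers:

```python
# R^2 = 1 - SS_res/SS_tot = 1 - MSE / Var(y_true), since both numerator and
# denominator are divided by the same n. Values here are arbitrary examples.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.2, -1.1, 0.5, -0.3])
y_pred = np.array([0.1, -0.9, 0.6, -0.4])

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
assert np.isclose(r2, 1 - mse / y_true.var())  # population variance (ddof=0)
print(round(r2, 3))
```

So if R² follows this standard definition, reporting it alongside MSE/RMSE is redundant up to the variance of the observations, which is worth stating explicitly.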
4. Mass balance model comparison
L247: Unclear what “these glaciers” refers to: the whole test dataset, 11 of the 14 glaciers or the three glaciers referred to in brackets.
L252-L253: is the spatial resolution in table 2 the width/height of the elevation bands? I suggest referring to this more explicitly. From what I understand, GloGEM and OGGM use a fixed vertical spacing (elevation) while PyGEM uses a horizontal spacing (distance).
I am wondering to what extent the resolution of these elevation bands can explain the differences in performance of the different models. How does the point elevation at the mass balance stakes compare to, e.g., the average elevation of the model elevation bands? For instance, if for whatever reason or by coincidence the stakes are typically located at the higher end of the elevation bands, this would explain the model underestimating the mass balance.
Table 2: Include Tcorr in the list of parameters for GloGEM and include the annotation e there. In caption: e only included if no match is found with other parameters within predefined bounds.
5.2 Model comparison on different spatio-temporal scales
L297-300: This sentence is confusing and the word ‘glacier-wide’ is often repeated. Glacier-wide mass balances are compared on different time scales. You evaluate glacier-wide predictions using seasonal and annual glacier-wide observations from glaciological records AND you evaluate decadal predictions using glacier-wide glaciological and geodetic observations. Reword to:
“Glacier-wide mass balances are compared in Sect. 5.2.3 on monthly to decadal time scales. We evaluate seasonal and annual predictions using observations from glaciological records (Kjollmoen et al., 2024), and decadal predictions using glaciological and geodetic (Andreassen et al., 2016, 2020; Hugonnet et al., 2021) observations.”
Figure 6: measured --> observed point mass balance
L330-331: In contrast to the glacier evolution models, which exhibit too-linear gradients, it seems that the MBM can predict unlikely variability in the gradients. See for instance the knickpoints at higher elevation in Figure 7a and c. These do not seem to correspond to the observations (there is no data point at this elevation). Can you explain the occurrence of such knickpoints?
Figure 7: the almost vertical lines in 7f demonstrate the equifinality issue with the glacier evolution models being calibrated with glacier-wide 20-year average geodetic data and no way of knowing whether there is a shallow or steep mass balance gradient.
L371-372: This is a fair point, but the opposite is also true. The predictions by MBM correspond better to the glaciological observations because they are trained using this data. Even though you test the model on unseen glaciers, you still train the model using data with similar variability, while the glacier evolution models are calibrated with a 20-year average and will never learn the interannual variability. This could be emphasized more.
L373-375: Please consider the uncertainties of the geodetic data. I suspect the over- or underestimation of the models to still be within the 95% confidence bound of the geodetic data.
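The suggested check is simple to apply, assuming a reported 1-sigma uncertainty for the geodetic estimates; the numbers below are illustrative, not taken from the manuscript.

```python
# Is a model's decadal mass balance estimate inside the 95% confidence bound
# of a geodetic observation with 1-sigma uncertainty? Assumes a Gaussian
# error model, so the bound is +/- 1.96 * sigma.
def within_95ci(model_mb, geodetic_mb, geodetic_sigma):
    """True if the model estimate lies inside geodetic_mb +/- 1.96*sigma."""
    return abs(model_mb - geodetic_mb) <= 1.96 * geodetic_sigma

# e.g. model: -0.55 m w.e. a^-1 vs geodetic: -0.70 +/- 0.15 m w.e. a^-1
print(within_95ci(-0.55, -0.70, 0.15))  # |diff| = 0.15 < 1.96*0.15 = 0.294
```

If most model-geodetic differences pass this test, "over- or underestimation" may not be statistically distinguishable from geodetic uncertainty.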
Discussion
L394: How can you be sure that MBM effectively downscales the meteorological data instead of relying on the high-resolution topographic features? Is there any way to support this statement? A feature importance analysis may have provided more insights in this. Alternatively, although this is most probably not within the scope of this manuscript, one could have compared the performance of MBM with coarse meteorological data + elevation difference to already downscaled meteorological data. Or you could have explicitly trained the MBM to downscale climate data using some high-resolution climate variable as an additional target. It may have been that elevation difference “appears” to be important because it is one of the few variables that are actually unique for each stake location. Without any support, I question whether you can make the statement that MBM effectively downscales. Especially with regards to Figure 11.
L414: I think it is important to distinguish between and not confuse two different assets of your model: 1) it can predict mass balances for unmonitored glaciers while the glacier evolution models need calibration data for every single glacier, and 2) it is trained with seasonal and annual data while the glacier evolution models were only provided with a single 20-year average value. I think the first point is the big advantage of the MBM and this should be highlighted more, while the second point is an artifact of the first. Because regular models need data for every glacier, they cannot be calibrated with the higher temporal resolution data because this is only available for a limited number of glaciers.
L451-453: It is unclear what you mean. How does the steep terrain influencing the tongue affect the more negative mass balance for steep and south-facing slopes?
L463: In my opinion, it is not necessarily a bad thing to assess the capability of your model in ‘extrapolating’ to adjacent glaciers. It would be interesting to include a little more of your findings regarding the ability to extrapolate in relation to the distance away from the nearest ‘seen’ glacier.
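The distance-based analysis suggested above could look like the following sketch: compute each test glacier's distance to the nearest training glacier and relate it to its prediction error. Coordinates, glacier counts, and per-glacier errors are synthetic placeholders.

```python
# Illustrative extrapolation-distance analysis: nearest-"seen"-glacier distance
# per test glacier, correlated against a per-glacier error metric.
import numpy as np

rng = np.random.default_rng(0)
train_xy = rng.uniform(0, 500, size=(18, 2))  # projected coordinates, km
test_xy = rng.uniform(0, 500, size=(14, 2))
test_rmse = rng.uniform(0.5, 1.0, size=14)    # per-glacier RMSE, m w.e.

# pairwise Euclidean distances (14 x 18), then minimum over training glaciers
d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=-1)
dist_to_nearest = d.min(axis=1)

# correlation between extrapolation distance and error (meaningless on this
# synthetic data; with the real errors it would quantify the effect)
r = float(np.corrcoef(dist_to_nearest, test_rmse)[0, 1])
print(dist_to_nearest.round(1), round(r, 2))
```

A weak correlation would support the claim that MBM generalizes rather than merely interpolates between nearby monitored glaciers.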
Citation: https://doi.org/10.5194/egusphere-2025-1206-RC1
AC1: 'Reply on RC1', Kamilla Hauknes Sjursen, 30 Jun 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-AC1-supplement.pdf
RC2: 'Comment on egusphere-2025-1206', Brian Kyanjo, 19 May 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-RC2-supplement.pdf
AC2: 'Reply on RC2', Kamilla Hauknes Sjursen, 30 Jun 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1206/egusphere-2025-1206-AC2-supplement.pdf