This work is distributed under the Creative Commons Attribution 4.0 License.
Modelling glacier-wide annual mass balance of continental-type glaciers in China using a deep neural network
Abstract. Glacier mass balance is crucial for climate and hydrological research. Although data-driven techniques have advanced mass balance estimation, their reliance on comprehensive and reliable training datasets still limits their practical application. In this study, a lightweight feed-forward fully connected neural network (FF-FCNN) was developed to simulate glacier-wide annual mass balance using multi-temporal meteorological variables from ERA5-Land, MODIS-derived summer mean albedo, and topographical attributes from ASTGTM_003 as input features, with 180 glaciological observations from ten continental-type glaciers in China as reference data. To mitigate overfitting in the “small-sample, high-dimensional” scenario, key meteorological variables were selected using Pearson correlation analysis combined with the Random Forest (RF) algorithm, and several strategies including Gaussian noise injection, L1 regularization, and early stopping were incorporated into the model architecture. Two training dataset construction strategies were evaluated to address temporal inconsistencies in albedo data, and results from both demonstrated that the FF-FCNN effectively avoids overfitting and maintains stable and reliable performance. Under the reduced-sample strategy, the FF-FCNN significantly outperformed the Random Forest model (R² = 0.82, RMSE = 0.19 m w.e., MAE = 0.15 m w.e.). Spatial and temporal cross-validations further confirmed the robustness and generalization capability of the proposed model. Although the dynamic loss-based weighting strategy enhanced the model’s ability to capture pronounced interannual variability in glacier mass balance, reproducing extreme values remains challenging under severely limited sample conditions. Overall, the proposed framework provides a feasible pathway for estimating regional glacier mass balance in high-altitude and cold regions where observations are scarce.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2026-333', Marijn van der Meer, 23 Mar 2026
Summary

This study develops a lightweight feed-forward fully connected neural network (FF-FCNN) to simulate glacier-wide annual mass balance for ten continental-type glaciers in China. The model is trained on ERA5-Land meteorological variables, MODIS-derived summer-mean albedo, and topographic attributes, using 180 glaciological observations as reference data. Feature selection is performed using Pearson correlation analysis combined with Random Forest importance ranking, and the model is evaluated against a Random Forest benchmark using random, spatial, and temporal cross-validation schemes.

Writing quality

The manuscript is generally readable but would benefit from careful editing. Several acronyms are defined more than once, and a number of passages repeat information already stated earlier in the text. Some material also appears in sections where it does not belong, as flagged in the commented text, and would read more naturally if reorganised.

Minor concern: data scarcity in High Mountain Asia

The authors are tackling a genuinely difficult problem. Glacier mass balance observations in High Mountain Asia are extremely sparse, discontinuous, and unevenly distributed in both space and time, and the need for scalable modeling approaches is real and well-motivated. However, the very scarcity of data that motivates this study also fundamentally limits what a purely data-driven approach can achieve here. With only 109–180 samples spanning ten glaciers across highly heterogeneous climatic settings, a deep learning model faces an inherently ill-constrained problem. The authors invest considerable effort in mitigating overfitting through architectural choices, but the core issue is that the available data may simply be insufficient to reliably train and evaluate a model of this complexity. Alternative approaches, such as physically constrained models, transfer learning from better-observed regions, or hybrid physical-statistical frameworks, may be better suited to this data regime, and the authors should more explicitly acknowledge and engage with this fundamental limitation rather than treating it primarily as an engineering problem to be solved through regularization.

Minor concern: missing reference to the Mass Balance Machine framework

In the discussion, the authors claim that their approach is novel in training a machine learning model with energy-balance-type variables (radiation, turbulent fluxes, albedo), contrasting it with prior work they characterise as relying primarily on temperature and precipitation. However, this claim overlooks the Mass Balance Machine framework (Sjursen et al. 2025, van der Meer et al. 2026), which already adopts a broadly comparable philosophy of driving a data-driven model with largely the same energy-balance variables rather than simple temperature-index approaches. This work should be cited and discussed in the introduction, where the landscape of existing data-driven mass balance models is reviewed. As it stands, the authors overstate the novelty of their input feature design, and readers familiar with the literature will notice the omission. The discussion should be revised to situate the FF-FCNN more accurately relative to existing approaches that similarly go beyond temperature-index inputs.

Major concern: data leakage in feature selection

Wang & Zhang (2026) evaluate their FF-FCNN model using cross-validation, but feature selection was performed on the full dataset prior to any cross-validation split. Specifically, both the Pearson correlation filtering and the Random Forest importance ranking, which together reduced 271 candidate variables to a final set of 20 meteorological predictors, were conducted using all available samples, including those later designated as validation data in each cross-validation fold. This constitutes data leakage: the features chosen for model training were implicitly selected based on information from the validation folds, meaning the cross-validation no longer provides a truly independent assessment of generalization performance. The issue is compounded by the fact that the Random Forest used for importance ranking is itself a learned model, capable of capturing nonlinear patterns across the full dataset rather than just linear associations as in the Pearson step.

Furthermore, the authors evaluate two dataset construction strategies and select the better-performing one (the 109-sample reduced strategy) for all subsequent analysis. Without a fully held-out test set that is untouched by any model or strategy selection decision, this introduces an additional layer of implicit optimization; the reported results reflect the best outcome across tested configurations rather than an unbiased estimate of model performance. Combined with the feature selection leakage, this suggests the model is likely still overfitting to the available data despite the anti-overfitting measures incorporated into the architecture, and that the reported metrics meaningfully overestimate true generalization performance.

The correct approach would be to nest the entire feature selection pipeline inside the cross-validation and to evaluate all strategic choices on a fully independent holdout set. As implemented, the reported performance metrics are likely optimistic, though the degree of inflation is difficult to quantify without re-running the analysis with a properly nested procedure.

Recommendation: major revision

The data leakage issue identified above is not a minor methodological detail; it undermines the validity of the paper's central results. Because the reported performance metrics cannot be taken at face value, the entire analysis would need to be redone with a properly nested feature selection and cross-validation procedure before the paper's claims can be assessed. Combined with the concerns about the missing literature context and the fundamental limitations of applying deep learning in this data-scarce regime, I recommend a major revision.

Citation: https://doi.org/10.5194/egusphere-2026-333-RC1
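The nested procedure this concern calls for can be sketched in a few lines: wrap the Pearson filter and the Random Forest importance ranking in a scikit-learn `Pipeline`, so both are refit on the training portion of every fold and never see the validation samples. The selector class, thresholds, and synthetic data below are illustrative stand-ins, not the manuscript's actual pipeline.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

class PearsonRFSelector(BaseEstimator, TransformerMixin):
    """Keep features passing a Pearson |r| filter, then the top-k by RF importance."""
    def __init__(self, r_min=0.3, k=20):
        self.r_min, self.k = r_min, k
    def fit(self, X, y):
        r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        passed = np.where(np.abs(r) >= self.r_min)[0]
        rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:, passed], y)
        order = np.argsort(rf.feature_importances_)[::-1][: self.k]
        self.idx_ = passed[order]
        return self
    def transform(self, X):
        return X[:, self.idx_]

# Synthetic stand-in for the 109-sample, 271-predictor problem
rng = np.random.default_rng(0)
X = rng.normal(size=(109, 271))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=109)

pipe = Pipeline([("select", PearsonRFSelector()),
                 ("net", MLPRegressor(hidden_layer_sizes=(40, 20, 10, 5),
                                      max_iter=2000, random_state=0))])
# The selector is refit inside each fold, so validation samples never
# influence which features are kept.
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
```

Because `cross_val_score` clones and refits the whole pipeline per fold, the feature subset may legitimately differ from fold to fold; that variability is part of the generalization estimate rather than a leak.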
RC2: 'Comment on egusphere-2026-333', Anonymous Referee #2, 07 Apr 2026
General comments
This manuscript describes the use of a neural network to model glacier-wide mass balance at the annual scale. It considers various inputs including topographical features, MODIS albedo and meteorological features at the monthly scale through ERA5 Land. Given the very limited glaciological data, a feature selection analysis is applied to reduce the number of input features. The modeled glaciers are continental glaciers in China.
The contribution, although not novel in terms of approach, is timely as the community tries to develop machine-learning-based models that can learn correlations from proxy variables representing the glacier conditions. The challenge in this setup is to make a machine learning model work, and to validate its performance, in an extremely data-scarce regime.
The paper reads well and the reported results are promising. My main concern is the way the model is validated. In machine learning we typically divide the dataset into three subsets: training, validation, and test sets. The test set should be used only to report performance on a truly independent dataset and to make comparisons. According to the manuscript, this is not the case (see specific comments below). Another main limitation is the absence of comparison with other conventional models, which is essential to convince the glaciological community that these machine-learning-based models can properly capture the mass balance signal.
In my view, given these two flaws and the review criteria of The Cryosphere, the contribution is not robust enough.
Specific comments
It would be helpful for the reader to know how the glacier-wide mass balance measurements were acquired. One can guess this comes from glaciological measurements, extrapolated spatially with the glacier hypsometry but this is not clearly stated in the manuscript.
From the feature selection study it seems that the meteorological features of the accumulation period contribute marginally to the annual mass balance in comparison to the ablation ones (Fig 3). This makes sense since most of the annual measurements are negative (Fig 2). A study of the performance for positive annual mass balance would improve the confidence in the learned model, for example by including some Karakoram glaciers.
Taking as input features the longitude and latitude is questionable (L212). The whole point is to approximate the glacier-wide annual mass balance with a statistical regressor based on topographical, meteorological and albedo information. Why do the authors include the location of the glaciers? At best this does nothing, at worst this encodes spatial information into the model which can lead to overfitting.
The size of the neural network seems very large given the limited amount of data. Depending on whether the authors use biases (not stated in the manuscript), the number of parameters is around 2100 or 2200. In contrast there are 27 predictors and only 109 samples in the reduced dataset. This learning problem is prone to overfitting and special attention should be paid to the validation of the model. They perform cross-validation, which is the way to go. However, a grid search is also performed to tune the model architecture along with other hyper-parameters, and they converge to the 40, 20, 10, 5 neurons combination. According to the test performance metrics from Figure 6, which are the same as the ones reported in Figure 5 (validation metrics), there is no independent test set that was kept aside from this grid search. From a statistical point of view, performing a grid search on the same data used for the test is equivalent to using the test samples for training. The direct implication is that the reported performance values are optimistic and a more robust test should be performed. If the authors have the opportunity to revise their work I would strongly recommend keeping some glaciers aside in a test set used only for the final comparison, not in the cross-validation or the grid search.
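The held-out-glaciers protocol suggested above can be sketched with scikit-learn's group-aware splitters: reserve whole glaciers as a test set, run the grid search with glacier-wise inner cross-validation on the remainder, and score the held-out glaciers exactly once. The glacier IDs and data below are synthetic stand-ins; only the 27-40-20-10-5-1 architecture is taken from the manuscript.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, GridSearchCV, GroupKFold
from sklearn.neural_network import MLPRegressor

# Sanity check on the parameter count for a 27-40-20-10-5-1 network:
sizes = [27, 40, 20, 10, 5, 1]
with_bias = sum(a * b + b for a, b in zip(sizes, sizes[1:]))   # 2211
without_bias = sum(a * b for a, b in zip(sizes, sizes[1:]))    # 2135

# Synthetic stand-in for the 109-sample dataset over ten glaciers
rng = np.random.default_rng(1)
X = rng.normal(size=(109, 27))
y = rng.normal(size=109)
glacier_id = rng.integers(0, 10, size=109)

# 1) Reserve ~2 whole glaciers as an untouched test set.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(outer.split(X, y, groups=glacier_id))

# 2) Grid-search hyper-parameters with glacier-wise inner CV on the
#    training glaciers only; the test glaciers are never seen here.
search = GridSearchCV(
    MLPRegressor(max_iter=500, random_state=0),
    {"hidden_layer_sizes": [(40, 20, 10, 5), (20, 10)]},
    cv=GroupKFold(n_splits=4),
)
search.fit(X[train_idx], y[train_idx], groups=glacier_id[train_idx])

# 3) Report performance once, on the held-out glaciers.
test_score = search.score(X[test_idx], y[test_idx])
```

With random targets the test R² is meaningless here; the point is the data flow: no sample from a test glacier ever enters the grid search or the inner folds.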
The performance study across different years of section 3.3.2 and the effort of the authors to explain the conditions under which the model performs poorly is appreciated. The approach is interesting in terms of machine learning as this is at the limit of what ML can support given the very limited amount of data. Providing more details on how overfitting was mitigated (e.g. regularization strategies L232) would be relevant for this work.
However, a comparison with other mass balance models would also be welcome to assess the advantages, especially since the proposed approach still relies on the outputs of energy-balance models (ERA5) for some of the variables (see technical corrections). I would recommend at the very least comparing the approach with a temperature-index model. Finally, the positioning of this work is ambiguous. It claims to be the first to develop a machine learning model that uses “more sophisticated energy-mass balance models” and that “this approach significantly improves predictive performance”. For the first claim, [3] already developed a similar approach, although point-wise and at the monthly scale, with an extensive comparison to other models. The second claim is simply not supported by any comparison.
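The temperature-index baseline suggested above is cheap to implement, which is part of why the comparison matters. A minimal sketch follows; the degree-day factor, precipitation factor, and snow/rain threshold are illustrative placeholders, and in practice all three would be calibrated against the glaciological observations.

```python
import numpy as np

def annual_mass_balance(t_monthly, p_monthly, ddf=0.0045, c_prec=1.0, t_snow=1.0):
    """Glacier-wide annual balance (m w.e.) from monthly mean temperature (degC)
    and monthly precipitation (m w.e.), via a classical degree-day scheme."""
    t = np.asarray(t_monthly, float)
    p = np.asarray(p_monthly, float)
    days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    melt = ddf * np.maximum(t, 0.0) * days          # positive-degree-day melt
    accum = c_prec * np.where(t < t_snow, p, 0.0)   # solid precipitation only
    return float(accum.sum() - melt.sum())

# Example: a cold, dry continental setting (made-up monthly values)
t = [-15, -12, -8, -2, 3, 8, 11, 10, 5, -2, -9, -13]
p = [0.01, 0.01, 0.02, 0.03, 0.05, 0.06, 0.07, 0.06, 0.04, 0.02, 0.01, 0.01]
b = annual_mass_balance(t, p)
```

Even this two-parameter baseline, fitted to the same 109 observations, would give readers a floor against which the FF-FCNN's claimed improvement could actually be measured.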
Technical corrections:
- L44: “they capture only discrete temporal snapshots and are therefore limited in resolving the continuous evolution of glaciers”: Not true; technically many remote sensing methods can provide data at a frequency higher than annual. For example, [1] (the Hugonnet et al. 2021 dataset) relies on a continuous time series through interpolation, CryoSat-2 provides monthly revisits, and ICESat-2 has a 91-day repeat cycle [2].
- L152: Snow evaporation, snow density and temperature of snow layer are all the outputs of a re-analysis model which incorporates snow packing and energy model components. I would make it clear since the authors make the distinction with energy-balance models (L57-59) but their model still relies on these energy-balance models.
- Imprecision L220: The optimizer is not part of the neural network, this is an algorithm to solve the learning problem.
- L230: What is the search space of the grid search? This should be given for reproducibility.
- L245: Vague statement “improving computational efficiency and training stability”: remove computational efficiency.
- L246: A reference should be given about the “loss-based dynamic weighting strategy”, or it should be explained. This seems like an important component of the approach but it is not explained.
- Fig 5a and 5b: A semilogy plot would be better to assess the convergence and the absence of overfitting.
- L336 “Although the introduction of a dynamic loss-based weighting strategy enhances [...], the improvement remains limited under extremely small sample conditions.”: It is not clear where the authors want to go with this, and it contradicts L246. If in the end it is not something that helps the training, I would suggest removing any mention of that dynamic loss-based weighting.
- L355 on AUC: A reference would be helpful for non statistician glaciologists.
- L431-434: The proposed work is not novel in that sense. For example [3] already developed a ML model that uses “more sophisticated energy-mass balance models”. They did a complete comparison with other mass balance models.
- L434 “This approach significantly improves predictive performance”: This is not supported by any experiment of the manuscript as no comparison at all was performed.
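Regarding the undocumented “loss-based dynamic weighting strategy” flagged at L246: the manuscript never defines it, so the sketch below is purely an assumption about one common reading, namely per-sample loss weights recomputed every epoch in proportion to the current errors, so badly fitted (often extreme) years count more. A simple linear model stands in for the network.

```python
import numpy as np

# Synthetic noiseless linear problem standing in for the regression task
rng = np.random.default_rng(2)
X = rng.normal(size=(109, 5))
true_w = rng.normal(size=5)
y = X @ true_w

w = np.zeros(5)
n = len(y)
weights = np.full(n, 1.0 / n)        # start from uniform sample weights
for epoch in range(300):
    resid = X @ w - y
    w -= 0.05 * X.T @ (weights * resid)   # descent step on the weighted squared error
    # Dynamic reweighting (assumed scheme): blend uniform weights with the
    # normalized per-sample squared errors so poorly fitted samples count more.
    sq = resid ** 2
    if sq.sum() > 0:
        weights = 0.5 / n + 0.5 * sq / sq.sum()

final_mse = float(np.mean((X @ w - y) ** 2))
```

If the authors' scheme resembles this, spelling it out (including how the weights are normalized and how often they are refreshed) would resolve the apparent contradiction between L246 and L336.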
References
[1] Hugonnet, R., McNabb, R., Berthier, E., Menounos, B., Nuth, C., Girod, L., ... & Kääb, A. (2021). Accelerated global glacier mass loss in the early twenty-first century. Nature, 592(7856), 726-731.
[2] Berthier, E., Floricioiu, D., Gardner, A. S., Gourmelen, N., Jakob, L., Paul, F., ... & Zemp, M. (2023). Measuring glacier mass changes from space—a review. Reports on Progress in Physics, 86(3), 036801.
[3] Sjursen, K. H., Bolibar, J., Van Der Meer, M., Andreassen, L. M., Biesheuvel, J. P., Dunse, T., ... & Tober, B. (2025). Machine learning improves seasonal mass balance prediction for unmonitored glaciers. The Cryosphere, 19(11), 5801-5826.

Citation: https://doi.org/10.5194/egusphere-2026-333-RC2