An ensemble machine learning approach for filling voids in surface elevation change maps over glacier surfaces
Abstract. Glacier mass balance assessments in mountainous regions often rely on digital elevation models (DEMs) to estimate surface elevation change. However, these DEMs are prone to spatial data voids, particularly during historical reconstructions using older imagery. These voids, which are most common in glacier accumulation zones, introduce uncertainty into estimates of glacier mass balance and surface elevation change. Traditional void-filling methods, such as constant and hypsometric interpolation, have limitations in capturing spatial variability in elevation change. This study introduces a machine-learning-based approach using gradient-boosted tree regression (XGBoost) to estimate glacier surface-elevation change across voids. High Mountain Asia (HMA) is an ideal study area for assessing the accuracy of different void-filling approaches across glaciers with varying morphology and climatic settings. We compare XGBoost predictions to traditional void-filling methods across the Western and Eastern Himalayas using a dataset of DEM-derived elevation changes. Results indicate that XGBoost consistently outperforms simpler methods, reducing root mean square error (RMSE) and mean absolute error (MAE) while improving alignment with observed elevation changes. The study highlights the advantages of integrating multiple glaciological and topographic predictors, demonstrating the potential of machine learning to improve assessments of glacier mass balance and elevation change. Future research should explore additional predictors, such as climate data, to further enhance predictive accuracy.
Review of “An ensemble machine learning approach for filling voids in surface elevation change maps over glacier surfaces”, Markovsky et al., submitted to “The Cryosphere”
by Romain Hugonnet, University of Alaska Fairbanks
General comment:
This study by Markovsky and co-authors explores a valuable topic for the glaciological community. In order to provide constrained estimates of past glacier mass/volume change, we need benchmarked void-filling approaches for glacier elevation changes. This is especially important as voids are increasingly common when using historical archives, which hold the potential to unlock decades of past glacier change. To this end, the work explores the performance of an existing machine-learning approach for void-filling using exclusively topographical variables, with the aim of improving on the simple binning approaches currently used.
The study is clear, well-structured and reads well. However, I think important issues regarding its presentation and the reliability of its statistical analysis need to be addressed.
Firstly, the core finding (which is that there is only a marginal improvement in prediction using the machine-learning approach) is somewhat misrepresented, especially in the abstract (but less in the discussion/conclusion). If it is the authors’ intention to publish negative/neutral results, I think it is OK (and an important part of the scientific process), but that would need to be conveyed directly.
Secondly, I believe that the current statistical analysis has major limitations that have not been fully identified or discussed by the authors, namely: 1/ The test data is itself very noisy, so measurement errors are mixed with prediction errors, preventing a reliable statistical validation, 2/ This type of machine-learning approach is known to suffer from error autocorrelation and training regionalization, and can thus be poorly suited to providing uncertainty estimates, which is not discussed.
Finally, while the text reads well, I found that it greatly lacked diversity in the scope of its discussion and was somewhat biased in its scientific references (keeping to only 20 citations and omitting highly relevant work).
Major comments:
1/ Describing the prediction improvement (or lack thereof) accurately
From a statistical viewpoint, the improvement in prediction compared to the widely-used hypsometric method is clearly marginal:
- Per-pixel: RMSE 0.379 vs 0.328, i.e. barely 15%,
- Glacier-wide: Basically no difference (one region slightly worse, one region slightly better).
This “neutral finding” is not conveyed accurately by the authors, especially in their abstract. Compounded with the fact that the per-pixel data is noisy and thus does not necessarily represent true elevation changes (see next comment for details), the statistical significance of these results is quite limited.
While I suspect this could be partly due to the noisy test data, it might be that the prediction performance is also incompressible when using (almost) only topographical variables. If so, that would be an interesting finding in itself: The hypsometric method is largely sufficient when using only topographical characteristics, especially for glacier-wide estimates (what is currently used for model calibration). I know the authors put effort into developing a new prediction approach for this study, and thus a conclusion conveying only a narrow improvement is difficult to put forward, but negative/neutral conclusions are not a bad thing in research and should be clearly reported.
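As a quick check of the per-pixel figure quoted above, the following minimal sketch computes the relative difference between the two RMSE values in both directions. It assumes that the larger value (0.379) belongs to the hypsometric baseline and the smaller one (0.328) to the machine-learning prediction; the variable and function names are illustrative only.

```python
# RMSE values quoted in this review (units as in the manuscript).
# Assumption: the larger value is the hypsometric baseline.
rmse_hypsometric = 0.379
rmse_ml = 0.328


def relative_reduction(reference: float, other: float) -> float:
    """Reduction from `reference` to `other`, as a fraction of `reference`."""
    return (reference - other) / reference


# Reduction expressed relative to the baseline (larger) RMSE.
reduction_vs_baseline = relative_reduction(rmse_hypsometric, rmse_ml)

# Same absolute difference expressed relative to the smaller RMSE.
difference_vs_ml = (rmse_hypsometric - rmse_ml) / rmse_ml

print(f"relative to baseline: {reduction_vs_baseline:.1%}")  # 13.5%
print(f"relative to ML RMSE:  {difference_vs_ml:.1%}")       # 15.5%
```

Depending on the reference chosen, the difference is roughly 13 to 16%, which is consistent with the "barely 15%" stated above.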
2/ Poor statistical validation due to noisy test data
As the core data for their entire analysis, the authors use elevation change estimates from Shean et al. (2020), which are derived mostly from ASTER DEMs known to be very noisy (Girod et al., 2017). In High Mountain Asia in particular, where many accumulation areas are extremely bright, ASTER cannot resolve high elevations reliably. This means that the input data of the authors (used for training/validation) is itself affected by measurement errors often higher than the elevation change signal itself, especially at the pixel scale. Therefore, this data is poorly suited to studying potential improvements in per-pixel prediction in hypsometric gap-filling. In Shean et al. (2020) (or Hugonnet et al. (2021), who performed a similar analysis with more validation and interpretation regarding errors), it is only by spatially aggregating many pixels that random errors cancel out (depending on their spatial autocorrelation) and that reliable glacier-wide estimates can eventually be derived with ASTER. This is also why it was less of an issue for McNabb et al. (2019), who were mostly concerned with glacier-wide estimates.
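To illustrate why aggregation helps, and why spatial autocorrelation limits how much it helps, here is a minimal simulation sketch. All numbers (per-pixel error, pixel count, block size) are hypothetical. Block-constant noise is only a crude stand-in for real spatial autocorrelation, which would normally be characterised with variograms (e.g., Rolstad et al., 2009; Hugonnet et al., 2022).

```python
import numpy as np

rng = np.random.default_rng(0)

sigma_pixel = 10.0  # hypothetical per-pixel random error (m)
n_pixels = 2500     # hypothetical number of pixels on one glacier
n_trials = 1000     # number of simulated noise realisations

# Case 1: uncorrelated noise. The error of the glacier-wide mean
# shrinks as sigma / sqrt(N).
noise = rng.normal(0.0, sigma_pixel, size=(n_trials, n_pixels))
empirical_se = noise.mean(axis=1).std()
theoretical_se = sigma_pixel / np.sqrt(n_pixels)

# Case 2: noise that is identical within blocks of `block` pixels.
# Only N / block samples are independent, so the mean is noisier.
block = 25
n_eff = n_pixels // block
block_noise = rng.normal(0.0, sigma_pixel, size=(n_trials, n_eff))
correlated = np.repeat(block_noise, block, axis=1)
empirical_se_corr = correlated.mean(axis=1).std()
theoretical_se_corr = sigma_pixel / np.sqrt(n_eff)

print(f"uncorrelated: {empirical_se:.3f} (theory {theoretical_se:.3f})")
print(f"correlated:   {empirical_se_corr:.3f} (theory {theoretical_se_corr:.3f})")
```

In this toy setting the glacier-wide error is several times larger with correlated noise than with independent noise, even though the per-pixel error is the same in both cases.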
To address this issue of test data, I think the authors have several options:
- Ideally, I would recommend using only high-resolution elevation changes, either from local surveys (lidar, aerial) or from high-resolution DEMs such as those of the Pléiades Glacier Observatory that are distributed at various sites globally (Berthier et al., 2024).
- Otherwise, as a “drop-in replacement” for the same region, the authors could potentially still use ASTER elevation change products, but would need to filter pixels with very high uncertainty relative to the signal. For this, the authors need a predicted uncertainty at the pixel level, which is not available from Shean et al. (2020). Hugonnet et al. (2021) provide uncertainty products based on a validated empirical framework, where the per-pixel uncertainty varies with slope and quality of stereo-correlation (Hugonnet et al., 2022), and is propagated during the temporal fit, with validation against high-precision measurements. However, using this data, the authors should expect to have to remove a large part of the dataset including many of the accumulation areas they focused on, or to partition the relative per-pixel errors due to input error and prediction error (more difficult)…
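The filtering suggested in the second option could look like the sketch below. It assumes two co-registered arrays, an elevation change map and a per-pixel 1-sigma uncertainty map. The function name `mask_low_confidence` and the threshold `max_ratio` are hypothetical, and a sensible threshold would have to be chosen by the authors.

```python
import numpy as np


def mask_low_confidence(dh, sigma_dh, max_ratio=1.0):
    """Set pixels to NaN where sigma_dh / |dh| exceeds `max_ratio`.

    dh       : array of elevation change values (NaN where already void)
    sigma_dh : array of per-pixel 1-sigma uncertainties, same shape as dh
    max_ratio: hypothetical threshold on uncertainty relative to signal
    """
    dh = np.asarray(dh, dtype=float)
    sigma_dh = np.asarray(sigma_dh, dtype=float)
    # Division by zero (dh == 0) gives inf and is masked out below.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = sigma_dh / np.abs(dh)
    keep = np.isfinite(ratio) & (ratio <= max_ratio)
    return np.where(keep, dh, np.nan)


# Toy example: only pixels with uncertainty below the signal survive.
dh = np.array([[-2.0, -0.1], [0.5, np.nan]])
sigma = np.array([[0.5, 1.0], [0.4, 0.3]])
filtered = mask_low_confidence(dh, sigma)
print(filtered)
```

Note that a filter on relative uncertainty preferentially removes pixels with small elevation change, so the retained sample is not representative of the whole glacier. That trade-off would need to be discussed alongside the results.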
3/ Relevance of the machine-learning approach and its validation
While the authors mostly praise the (potential) advantage of their approach, they fail to discuss known limitations. Many machine-learning methods have been shown to underperform in specific applications in geoscience, in particular for variables prone to error autocorrelation or subject to difficult regionalization during training (e.g. review by Hoffimann et al., 2021). Glacier elevation changes have errors that are highly autocorrelated, whether from noise in the DEMs (e.g., Rolstad et al., 2009; Hugonnet et al., 2022), or simply by adding error during temporal prediction, so the first limitation is highly relevant and potentially quite limiting here. Regionalization is also an issue here, given that elevation changes vary significantly from region to region (polar ice caps, alpine glaciers, tidewater glaciers), but also because the authors chose to only focus on upper-area voids (while voids can exist everywhere due to acquisition swath, see my line-by-line comment later) and chose a fixed relative size (37% of the accumulation area, defined as upper 50%).
In particular, providing reliable uncertainty estimates is something that this type of machine-learning approach can struggle with (by significantly overfitting the autocorrelated data), contrary to other machine-learning approaches (such as Gaussian Processes). As the reported improvement in prediction is marginal compared to hypsometric methods, I would argue that improving our estimate of the uncertainty in the prediction is currently as important (if not more) as further improving the prediction itself, which is a topic that was covered slightly in Seehaus et al. (2020). However, this topic is omitted in the present manuscript.
All of these limitations should be thoroughly discussed, and the analysis expanded accordingly (e.g., using a varying size of void and not a fixed 37%).
Additionally, concerning the validation:
- Per-pixel accuracy analysis: RMSE and MAE are both poor metrics here, as they mix random and systematic errors. Consider reporting primarily the mean (or median) and standard deviation (or NMAD) of residuals, which capture the two independently, as well as the metric used to optimize/learn.
- Glacier-wide analysis: Good inclusion by the authors, because glacier-wide accuracy is the most important output for total mass change. However, this analysis is very size-dependent (as mentioned above, errors cancel out over the glacier based on area), so the authors should not group glaciers of all sizes together, and rather study the performance depending on glacier size. Currently, the errors are probably entirely driven by those of tiny glaciers.
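The residual statistics suggested for the per-pixel analysis can be computed as in this minimal sketch. The function name `residual_stats` is illustrative. NMAD is taken here as 1.4826 times the median absolute deviation from the median, which equals the standard deviation for normally distributed residuals.

```python
import numpy as np


def residual_stats(predicted, observed):
    """Summarise residuals (predicted - observed), ignoring NaN pixels."""
    res = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    res = res[np.isfinite(res)]
    median = np.median(res)
    return {
        "mean": res.mean(),                                  # systematic error
        "median": median,                                    # robust systematic error
        "std": res.std(ddof=1),                              # random error
        "nmad": 1.4826 * np.median(np.abs(res - median)),    # robust random error
        "rmse": np.sqrt(np.mean(res**2)),                    # mixes both
    }


# Toy example: a biased but precise prediction.
stats = residual_stats([0.4, 0.5, 0.6], [0.0, 0.0, 0.0])
for name, value in stats.items():
    print(f"{name:>6}: {value:.3f}")
```

In this toy case the RMSE is about 0.5 and almost entirely reflects the bias, while the standard deviation and NMAD (about 0.1 and 0.15) show that the random scatter is small. RMSE alone cannot separate the two.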
4/ Biased references
The authors repeatedly cite a few references for very different aspects of their study, omitting other relevant studies in the literature, sometimes even those at the origin of a given method. A couple of examples:
- McNabb et al. (2019) is used for the gap-filling methods, without citing original references,
- Shean et al. (2020) is used for most statements about HMA and remote sensing, even when not especially relevant,
- Maurer et al. (2019) is used for everything related to historical imagery and DEM processing, even for methods widely used in much earlier and more generic studies.
The authors should diversify their citations, and find the original references for a given method or processing step (sometimes cited within the study they cite). I have included some of these references below in line-by-line comments, but I didn’t elaborate on all, and there are many more to address across the manuscript.
Line-by-line comments:
23: The reference to the old Bamber & Rivera review feels a bit specific, given that the end statement is about density. For density, cite for example Huss (2013), which is the most widely used reference. To add a more recent review including DEM differencing, cite for example Berthier et al. (2023).
26-31: In the whole section, it is not explained that the voids “predominant in accumulation areas” and later described as “common in historical images” are directly due to limits during stereophotogrammetry (this key term almost never appears in the manuscript) performed on optical imagery (thus including historical archives) to generate DEMs. This needs to be clarified. But beyond this, voids also exist in all large-scale (= satellite) DEMs simply because of fixed-width satellite swaths during acquisition, no matter the instrument (optical, radar).
60: Shean et al. (2020) is clearly not the right citation for this statement… There are extensive reviews, inventories or other studies better suited to describing HMA glaciers as a whole.
132: How is the artificial void grown from the seed? I assume you use a flood-filling (or seed-filling) algorithm with 4/8-pixel direction? If yes, describe which and include the appropriate reference, such as Newman et al. (1979).
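For reference, the kind of seed-grown void I have in mind could be sketched as below. This is not the authors' method, which the manuscript does not describe. It is a plain breadth-first flood fill with 4- or 8-pixel connectivity that stops at a target pixel count, and the name `grow_void` and its parameters are hypothetical.

```python
from collections import deque

import numpy as np


def grow_void(mask, seed, target_pixels, connectivity=4):
    """Grow a void from `seed` inside `mask` by breadth-first flood fill.

    mask          : 2D bool array, True where the void may grow (e.g. glacier)
    seed          : (row, col) tuple of the start pixel, inside the mask
    target_pixels : stop once the void contains this many pixels (>= 1)
    connectivity  : 4 (edge neighbours) or 8 (edge and corner neighbours)
    """
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    if target_pixels < 1:
        raise ValueError("target_pixels must be at least 1")

    mask = np.asarray(mask, dtype=bool)
    seed = (int(seed[0]), int(seed[1]))
    if not mask[seed]:
        raise ValueError("seed must lie inside the mask")

    void = np.zeros_like(mask)
    void[seed] = True
    count = 1
    queue = deque([seed])
    while queue and count < target_pixels:
        r, c = queue.popleft()
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
            if inside and mask[nr, nc] and not void[nr, nc]:
                void[nr, nc] = True
                count += 1
                queue.append((nr, nc))
                if count >= target_pixels:
                    break
    return void


# Toy example: a 5-pixel void grown from the centre of a 5x5 glacier mask.
glacier = np.ones((5, 5), dtype=bool)
void = grow_void(glacier, seed=(2, 2), target_pixels=5)
print(void.astype(int))
```

The choice of connectivity and of the stopping rule changes the shape of the resulting void, which is why it matters that the manuscript states which variant was used.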
145-154: The first occurrence of hypsometric void filling in glaciology is, to my knowledge, Arendt et al. (2002), and the elevation dependency was described in detail long before the citations mentioned (Jakob et al., 2021, or McGrath et al., 2017; which can be removed), especially for spatial extrapolation. See for instance Huss (2012).
179-185: Those components are also called “Northness” and “Eastness”.
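For clarity, the usual definition of these two terms is sketched below. It assumes that aspect is given in degrees clockwise from north and that the components in the manuscript are the cosine and sine of aspect; the function name `aspect_components` is illustrative.

```python
import numpy as np


def aspect_components(aspect_deg):
    """Return (northness, eastness) for aspect in degrees clockwise from north."""
    aspect_rad = np.deg2rad(np.asarray(aspect_deg, dtype=float))
    northness = np.cos(aspect_rad)  # +1 facing north, -1 facing south
    eastness = np.sin(aspect_rad)   # +1 facing east, -1 facing west
    return northness, eastness


# Toy example: slopes facing north, east, south and west.
northness, eastness = aspect_components([0.0, 90.0, 180.0, 270.0])
print(np.round(northness, 3))
print(np.round(eastness, 3))
```

Splitting aspect this way avoids the discontinuity between 359 and 0 degrees, which is presumably why the authors use the two components as predictors.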
Fig. 3: Add a colorbar for the density, even if it is a linear scale?
Table 4: There’s probably an error in the reported value of the STD of Western Himalaya/Hypsometric (it is an order of magnitude above all other STDs).
New references from this review
Arendt, A. et al. (2002). Rapid Wastage of Alaska Glaciers and Their Contribution to Rising Sea Level. Science, 297, 382-386. https://doi.org/10.1126/science.1072497
Berthier, E., Floricioiu, D., Gardner, A. S., Gourmelen, N., Jakob, L., Paul, F., Treichler, D., Wouters, B., Belart, J. M. C., Dehecq, A., Dussaillant, I., Hugonnet, R., Kääb, A., Krieger, L., Pálsson, F., & Zemp, M. (2023). Measuring glacier mass changes from space - a review. Reports on Progress in Physics, 86(3). https://doi.org/10.1088/1361-6633/acaf8e
Girod, L., Nuth, C., Kääb, A., McNabb, R., & Galland, O. (2017). MMASTER: Improved ASTER DEMs for Elevation Change Monitoring. Remote Sensing, 9(7), 704. https://doi.org/10.3390/rs9070704
Hoffimann, J., Zortea, M., de Carvalho, B., & Zadrozny, B. (2021). Geostatistical Learning: Challenges and Opportunities. Frontiers in Applied Mathematics and Statistics, 7. https://doi.org/10.3389/fams.2021.689393
Huss, M. (2012). Extrapolating glacier mass balance to the mountain-range scale: the European Alps 1900–2100. The Cryosphere, 6, 713–727. https://doi.org/10.5194/tc-6-713-2012
Huss, M. (2013). Density assumptions for converting geodetic glacier volume change to mass change. The Cryosphere, 7(3), 877–887. https://doi.org/10.5194/tc-7-877-2013
Hugonnet, R., Brun, F., Berthier, E., Dehecq, A., Mannerfelt, E. S., Eckert, N., & Farinotti, D. (2022). Uncertainty Analysis of Digital Elevation Models by Spatial Inference From Stable Terrain. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 6456–6472. https://doi.org/10.1109/JSTARS.2022.3188922
Hugonnet, R., McNabb, R., Berthier, E., Menounos, B., Nuth, C., Girod, L., Farinotti, D., Huss, M., Dussaillant, I., Brun, F., & Kääb, A. (2021). Accelerated global glacier mass loss in the early twenty-first century. Nature, 592(7856), 726–731. https://doi.org/10.1038/s41586-021-03436-z
Newman, W. M., & Sproull, R. F. (1979). Principles of Interactive Computer Graphics (2nd ed.). McGraw-Hill, p. 253. ISBN 978-0-07-046338-7.
Rolstad, C., Haug, T., & Denby, B. (2009). Spatially integrated geodetic glacier mass balance and its uncertainty based on geostatistical analysis: Application to the western Svartisen ice cap, Norway. Journal of Glaciology, 55(192), 666–680. https://doi.org/10.3189/002214309789470950