the Creative Commons Attribution 4.0 License.
Calibrating calving parameterizations using graph neural network emulators: Application to Helheim Glacier, East Greenland
Abstract. Calving is responsible for the retreat, acceleration, and thinning of numerous tidewater glaciers in Greenland. An accurate representation of this process in ice sheet numerical models is critical in order to better predict the future response of the ice sheet to climate change. While traditional numerical models have succeeded in simulating ice dynamics and calving under specific parameterized conditions, the computational demand of these models makes it difficult to efficiently fine-tune these parameterizations, adding to the overall uncertainty in future sea level rise. Here, we develop various standard Graph Neural Network (GNN) architectures, including graph convolutional network (GCN), graph attention network (GAT), and equivariant graph convolutional network (EGCN), to construct surrogate models of finite-element simulations from the Ice-sheet and Sea-level System Model. GNNs are particularly well suited for this problem as they naturally capture the representation of unstructured meshes used by finite-element models. When these GNNs are trained with the simulation results of Helheim Glacier, Greenland, for different calving stress thresholds, they successfully reproduce the evolution of ice velocity, ice thickness, and ice front migration between 2007 and 2020. GNNs show better fidelity than convolutional neural networks (CNN) particularly near the boundaries of fast ice streams, and EGCN outperforms the others by preserving the equivariance of graph structures. By using the GPU-based GNN emulators, which are 260–560 times faster than the numerical simulations, we determine the optimal range of the calving threshold that minimizes the misfit between the modeled and observed ice fronts.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-1620', Anonymous Referee #1, 17 Jul 2024
  - AC1: 'Reply on RC1', YoungHyun Koo, 02 Sep 2024
- RC2: 'Comment on egusphere-2024-1620', Anonymous Referee #2, 14 Aug 2024
This is a review of the pre-print by Koo et al. in The Cryosphere titled "Calibrating calving parameterizations using graph neural network emulators: Application to Helheim Glacier, East Greenland". This study describes the use of Graph Neural Networks (GNNs) of various types to emulate the behavior of ISSM, a finite-element ice sheet model. There is also extensive comparison of the use of GNN to more traditional FCNs, which require input data to be on a uniform, rectangular grid. Once the NN is trained, it can be used to predict ice thickness, velocity and terminus position (through an ice mask) at a subsequent time step based on these fields at the prior time step and a parameter governing calving behavior in a model parameterization (sigma_max).
The most novel aspect of this study is the use of GNNs to emulate a finite-element ice sheet model. The study makes a good case for why this type of NN makes sense for emulating a model on a non-uniform mesh, though I'm not sure the way in which its accuracy was compared to FCNs is completely fair. In that sense, I found the most compelling potential advance of this study to be the development of a general purpose ice-sheet-model emulator (like IGM, but with some advantages). I am less convinced that we have necessarily learned much about calving from the use of this new method. I explain my major issues in this regard below, followed by a list of smaller suggestions.
Major points:
1. As I read through the study, I found myself unclear about the scientific use of this methodological advance. The GNN will emulate ISSM at high accuracy and significantly lower computational cost. What questions will that help us to answer that aren't possible to answer with conventional methods? This is a particularly important question since this is submitted to The Cryosphere, a disciplinary journal, as opposed to a more methods-oriented journal like GMD or JGR:MLC.
Once I got to the end, I saw that the main application this new emulator was used for was essentially something like transient parameter estimation (using the low cost of the NN to enable an exhaustive grid search for sigma_max at each time step). But then the result of this application didn't make physical sense. The calving front retreats while sigma_max increases, which is sort of opposite what should happen *if* calving drives retreat (which it may not). The text pushes off the explanation on "other processes" without much investigation of whether the methods may be at fault, or other potential explanations. Ultimately, this is a challenge of using completely data-driven ML without further investigation of the latent space of the NN - the emulator is a black box, so it is challenging to diagnose what is happening in it that causes this counter-intuitive result.
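The per-time-step grid search described in this point can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_emulator`, the candidate values, and the scalar "front position" misfit are all hypothetical stand-ins for the trained GNN and the observed ice fronts.

```python
from typing import Callable, List, Sequence


def grid_search_sigma_max(
    emulator: Callable[[float, int], float],
    observed_front: Sequence[float],
    candidates: Sequence[float],
) -> List[float]:
    """For each time step, return the candidate sigma_max whose emulated
    front position has the smallest absolute misfit to the observation."""
    best = []
    for t, obs in enumerate(observed_front):
        misfits = [abs(emulator(s, t) - obs) for s in candidates]
        best.append(candidates[misfits.index(min(misfits))])
    return best


def toy_emulator(sigma_max: float, t: int) -> float:
    # Hypothetical stand-in for the emulator: a lower threshold means
    # more calving and therefore a faster retreat of the front.
    return 10.0 - (t + 1) * (1.0 - sigma_max)


candidates = [0.5, 0.75, 1.0]  # hypothetical threshold values
observed = [toy_emulator(0.75, t) for t in range(4)]  # synthetic "observations"
best_per_step = grid_search_sigma_max(toy_emulator, observed, candidates)
```

Because every time step is fitted independently, nothing in this procedure constrains the recovered sigma_max to vary in a physically sensible way, which is one reason a counter-intuitive trend can emerge.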
2. The study, as it stands, has not convinced me that the GNNs trained as they were in this study, generalize at all outside of the very limited training data. The test data is completely within the interior of the limited parameter/state space on which the GNNs are trained. If I simply used linear interpolation to generalize from the training data to the test data, how accurate would that be in comparison? It would certainly be computationally cheap.
More importantly, the GNNs have not been tested on any cases that are out of the temporal or spatial sample of the training data. If the aim is to narrowly train the model to do a really good job learning what Helheim did from 2007 to 2020, that's OK, but state that narrow expectation explicitly. There are places in the study where you say that these GNNs could be used to replace an ice sheet model more generally, or in future simulations, but you haven't really shown the ability of the GNNs to do that, since they haven't been tested outside of this very narrow place and time period.
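The linear-interpolation baseline raised in this point could look something like the sketch below. It assumes each training run is stored as a list of nodal values keyed by its sigma_max; the data layout, the function names, and the toy numbers are hypothetical and chosen only to make the comparison concrete.

```python
import math
from typing import Dict, List, Sequence


def interpolate_field(train: Dict[float, List[float]], sigma: float) -> List[float]:
    """Linearly interpolate a nodal field between the two training runs
    whose sigma_max values bracket the requested one."""
    keys = sorted(train)
    if len(keys) < 2:
        raise ValueError("need at least two training runs")
    if not keys[0] <= sigma <= keys[-1]:
        raise ValueError("sigma is outside the training range; no extrapolation")
    for lo, hi in zip(keys, keys[1:]):
        if lo <= sigma <= hi:
            w = (sigma - lo) / (hi - lo)
            return [(1 - w) * a + w * b for a, b in zip(train[lo], train[hi])]
    raise RuntimeError("unreachable for a sorted, bracketing key list")


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root-mean-square error between two equally long fields."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Toy fields (e.g. ice thickness at two mesh nodes) for two training runs.
train_runs = {0.5: [100.0, 200.0], 1.0: [200.0, 400.0]}
held_out_truth = [160.0, 300.0]  # hypothetical held-out run at sigma_max = 0.75
baseline_pred = interpolate_field(train_runs, 0.75)
baseline_rmse = rmse(baseline_pred, held_out_truth)
```

Reporting the emulator error next to `baseline_rmse` for each held-out run would show how much the GNN adds beyond interpolation within the sampled parameter range.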
3. The accuracy metrics and differences therein are not very convincing. Interpreting a difference between 0.997 R value and 0.999 is not good statistics, particularly without assessing significance of these statistics on the training data. Similarly, I'm not sure how different a calving front accuracy of 98.6% vs. 99.4% is. I'm guessing both are significant at some very high level and so reading much into the difference beyond that isn't very meaningful. What happens if you drop some of the training data? Does the accuracy degrade? This is a common way to determine whether the NN has learned anything about the underlying dynamics of the system vs. acting as a fancy interpolator of the training data.
Additionally, the way that you train and then assess the accuracy of the FCN does not provide a fair comparison to the GNNs. By interpolating from the finite-element mesh to a uniform rectangular mesh, you've done two things: lowered the resolution of the training data in the finest parts of the grid and inflated the relative weight of the coarse parts of the grid by increasing the number of grid points in these areas. The places with the finest resolution in ISSM are also places where velocity is the highest and where the ice mask is changing (i.e. near the terminus), which will tend to make errors more important. Effectively, after interpolating you have given the FCN worse training data than the GNNs. The least you can do is interpolate the FCN training data onto a uniform grid with resolution equal to the finest resolution in the ISSM mesh. Additionally, using some knowledge about where errors are likely to be the largest, you can apply weights in the FCN training loss function which are proportional to the finite-element grid resolution. In that way, you will be "fixing" the mis-weighting that has occurred by interpolating the training data that you then assess accuracy on.
I get that in some sense your whole point is that FCNs are not natural fits for finite-element training data, but with the relatively minor differences in accuracy you find, it's hard to discern whether this is due to the NN being superior at capturing the data vs. the training data just being different due to interpolation artifacts. These are very different claims.
None of this changes the fact that GNNs are likely to be much more efficient at natively training and then running on the finite element mesh. I believe your case that they are computationally superior, but I'm not sure I see much difference (or a fair comparison of differences) in the accuracy. My suggestion is simply to focus on the fact that emulating finite-element models (which most modern ice sheet models are) is more natural using GNNs since it doesn't require interpolation and that the computational advantage of GNNs over FCNs is massive. The GNNs do a great job accurately emulating the model by any objective measure, so emphasize this.
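One possible reading of the resolution-weighted FCN loss suggested in this point is sketched below. It assumes that "proportional to the grid resolution" means weighting each rectangular grid cell by the inverse of the local finite-element size, so that finely meshed areas near the terminus count more. That interpretation, and the names `resolution_weights`, `weighted_mse`, and `element_size_m`, are assumptions made for illustration.

```python
from typing import List, Sequence


def resolution_weights(element_size_m: Sequence[float]) -> List[float]:
    """Weights proportional to 1 / local element size, normalized to sum to 1."""
    inverse = [1.0 / h for h in element_size_m]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_mse(
    pred: Sequence[float], target: Sequence[float], weights: Sequence[float]
) -> float:
    """Mean squared error in which each grid cell contributes according to its weight."""
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))


# Two grid cells: one in a finely meshed area, one in a coarsely meshed area.
element_size_m = [500.0, 2000.0]  # hypothetical local element sizes in metres
weights = resolution_weights(element_size_m)

# The same unit error costs more in the finely meshed cell than in the coarse one.
loss_fine = weighted_mse([1.0, 0.0], [0.0, 0.0], weights)
loss_coarse = weighted_mse([0.0, 1.0], [0.0, 0.0], weights)
```

In a real training loop the same weights would multiply the per-pixel squared error before reduction; the pure-Python version here only shows the arithmetic.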
Minor suggestions:
L1: Increasing calving has been linked to the retreat
L3: have been used to simulate ice
L10: reproduce the observed evolution
L22: total ice sheet mass loss
L28,30: optimal in what sense?
L35: as a boundary condition in numerical
L41: necessitate using high-performance
L56: the training of emulators
L60: outlet glaciers in Greenland
L79: The migration rate of the ice front
L82: ice front migration rate (velocity is confusing here because it could refer to other things)
L87: VM has not been defined as an acronym
L87: correlates with weaker ice
L89: many observational studies have found tensile strength as low as 100 kPa (Vaughan 1993 is a particularly well known paper), so I'm not sure where this lower bound is coming from.
L91: is important to accurately reproducing observed glacier evolution
L117: CNN cannot represent finite-element ice sheet models on their native grid
L122: focused on calibrating calving parameterizations using
L135: each transient simulation generates a total of 261 outputs between.
L136: calibrated and held constant
L136-140: the use of semicolons here is a bit challenging to read. Why not just write these as separate sentences?
L146: adjacency matrices?
L151: we compare to remote-sensing
L201: you aren't the first to develop an EGCN - you train a NN architecture that has previously been described in other papers
L242: don't you mean validation instead of testing on this line?
L243: Related to point #2 above - it seems that you have chosen test cases non-randomly, and I wonder what would happen if you chose 0.7 as a test case instead (with 0.7 not in the training/validation data)?
L271: remarkable in what sense? This is related to point #3 above - what is your benchmark that you are comparing to? Significance at 0.95 or above?
L305-315: it could be made clearer here that when tested on the exact same hardware, GNNs are faster. Comparing wall time on two different processors or a different number of processors is not a fair comparison.
L350: I'm not sure I buy this argument partly because enough information hasn't been provided. Was ocean frontal melt included in the ISSM simulations? Do we know if melange increased at Helheim over these years? If so, it would presumably have an influence on calving rate, which could be captured effectively through sigma_max...This gets to the point above that this discussion here is entirely too brief and doesn't engage with any prior work on Helheim and its recent changes. If this paper is to be appropriate for TC, instead of say, GMD, then that discussion would be needed.
L358: this begs the question: what would happen if you interpolated all the training data onto a rectangular grid, and then used that to train both the FCNs and the GNNs? This would be a fairer comparison than what you have now.
L364: CNN->FCN
L373: why should GNNs be trained with numerical simulations?
L385: with 13-year transient simulations of Helheim
L391: how are these emulators promising for parameterizing future behavior? They provide no way of constraining sigma_max without observations and you haven't demonstrated that they can extrapolate outside the temporal sample of the training data. Perhaps they could be used to do uncertainty quantification since they enable cheap MCMC sampling of parameter space.
Citation: https://doi.org/10.5194/egusphere-2024-1620-RC2
- AC2: 'Reply on RC2', YoungHyun Koo, 02 Sep 2024