ISEFlow: A Flow-Based Neural Network Emulator for Improved Sea Level Projections and Uncertainty Quantification

Van Katwyk, Peter; Fox-Kemper, Baylor; Nowicki, Sophie; Seroussi, Hélène; Bergen, Karianne J.

doi:10.5194/egusphere-2025-4914

Preprints

https://doi.org/10.5194/egusphere-2025-4914

Preprints

01 Dec 2025

| 01 Dec 2025

Status: this preprint is open for discussion and under review for The Cryosphere (TC).

ISEFlow: A Flow-Based Neural Network Emulator for Improved Sea Level Projections and Uncertainty Quantification

Peter Van Katwyk, Baylor Fox-Kemper, Sophie Nowicki, Hélène Seroussi, and Karianne J. Bergen

Editorial note: this preprint is an update of https://doi.org/10.5194/egusphere-2025-870. The original version had been posted on EGUsphere for submission to the journal Geoscientific Model Development. After interactive public discussion the preprint was archived since the authors decided to stop the peer review and substantially revise their manuscript for submission to the journal The Cryosphere.

Abstract. Ice sheets are the primary contributors to global sea level rise, yet projecting their future contributions remains challenging due to the complex, nonlinear processes governing their dynamics and uncertainties in future climate scenarios. This study introduces ISEFlow, a neural network-based emulator of the ISMIP6 ice sheet model ensemble designed to accurately and efficiently predict sea level contributions from both ice sheets while quantifying the sources of projection uncertainty. By integrating a normalizing flow architecture to capture data coverage uncertainty and a deep ensemble of LSTM models to assess emulator uncertainty, ISEFlow separates uncertainties arising from training data from those inherent to the emulator. Compared to existing emulators such as Emulandice and LARMIP, ISEFlow achieves substantially lower mean squared error and improved distribution approximation while maintaining faster inference times. This study investigates the drivers of increased accuracy and emission scenario distinction and finds that the inclusion of all available climate forcings, ice sheet model characteristics, and higher spatial resolution significantly enhances predictive accuracy and the ability to capture the effects of varying emissions scenarios compared to other emulators. We include a detailed analysis of importance of input variables using Shapley Additive Explanations, and highlight both the climate forcings and model characteristics that have the largest impact on sea level projections. ISEFlow offers a computationally efficient tool for generating accurate sea level projections, supporting climate risk assessments and informing policy decisions.

Received: 03 Oct 2025 – Discussion started: 01 Dec 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 3945 KB)

Please read the editorial note first before accessing the preprint.
Preprint (3945 KB)

Download & links

Please read the editorial note first before accessing the preprint.

Peter Van Katwyk, Baylor Fox-Kemper, Sophie Nowicki, Hélène Seroussi, and Karianne J. Bergen

Status: open (extended)

Post a comment Subscribe to comment alert

RC1: 'Comment on egusphere-2025-4914', Christopher Smith, 05 Jan 2026 reply

This paper introduces ISEFlow, a machine-learning based emulator for determining the sea-level rise components from mass lost from the Greenland and Antarctic ice sheets.
This problem is a rich field for emulator development, since ice sheet dynamics are difficult to model. It is an important problem given the contributions to sea level rise that the GrIS and AIS are likely to make in the 21^st century and beyond.
In my opinion this is a useful model that will add to the rapidly growing toolbox for modelling sea level rise and its components. From what I can tell, ISEFlow can emulate ISMIP models well, which I believe were from a small number of SSP/RCP projections. What would be very useful indeed would be the ability to produce projections for emissions scenarios not run in CMIP/ISMIP models. Is this possible at the moment in ISEFlow?

The remaining comments are mostly minor.
Figure 4: I think the SLR anomaly axis has the wrong sign, by tracing the origin of this figure back to IPCC (we have mass loss, so SLR anomaly should increase). I also think it would be nice to be consistent with the majority of literature and switch the colours of RCP2.6 and RCP8.5.
In all cases where comparative metrics are used such as MSE, MAE, KLD, JSD, etc. do they have units? I would be surprised if MSE and MEA didn’t.
Line 5: LSTM – introduce acronym
Line 25: “climate projections”: if I was being pedantic, the references in this sentence are pertaining to projections of sea level rise from ice sheet loss rather than the whole climate. Many examples of climate projection models are given in Romero-Prieto et al., accepted (https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2691/). Or you could say ice sheet/cryosphere emulations to be specific.
Line 66: “accurately … projects future sea level”: again pedantic, it’s probably not correct to suggest that this model accurately projects the future sea level since we don’t have this observation; would it be better to say it accurately emulates the sea level rise components from ISMIP models?
Line 88: Coupled (not Climate) Model Intercomparison Project
Line 89: “yearly-averaged atmospheric and oceanic forcing anomalies”: it would be nice to have a list of these forcings, if it isn’t too long
Line 97: signpost to figures A1 and A2 on the ISMIP6 regions somewhere in this paragraph.
Line 115: 635 projections in the training set, 136 projections in the validation set. How were these numbers decided? And what makes up the total of 771 projections? Presumably this is some number of ice sheet models taking forcing data from a number of CMIP models under some number of scenarios?
Line 141: “another model”: “other models”?
Line 200: 256 GB RAM, I assume
Lines 189-190: This is a very opaque sentence for somebody not versed in machine learning.
Line 284, related to my first question: This paragraph reports that including variables beyond temperature improves emulations, which isn’t surprising. (Can you confirm whether this is global mean temperature or local temperature)? However, the mainstream climate emulators would generally give you only global mean surface temperature from emissions scenarios, which would allow a user to produce climate projections from any emissions scenario and not just the ones run by CMIP/ISMIP models. Therefore, can SLR projections from the GrIS and AIS components be produced from ISEFlow which are “good enough”, even if not ideal? Figure 5, if I interpret it correctly, seems to suggest so. This would really help to find a valuable use case for this model by piggybacking off GMST projections by emulators.
Line 301: D statistic of 0.158. Is this good? I have no feeling of what a good value is. Are there units here?
Caption to figure 5: you can drop the word “carbon” to be more general and accurate.

Reply

Citation: https://doi.org/10.5194/egusphere-2025-4914-RC1
RC2: 'Comment on egusphere-2025-4914', Denis Felikson, 08 Feb 2026 reply

General summary:

This paper presents a new machine-learning emulator called ISEFlow for ice sheet model projections. The emulator is trained with simulations from the ISMIP6 ensemble and the paper documents the computational performance and quality of the results, including a thorough comparison with the emulator used in Assessment Report 6, emulandice. This study is an excellent forward step in the development of ice sheet emulators, which are critical components in sea level projections.
There are several issues that should be addressed in the revision and these are detailed below.
Major comments:

1.) Additional discussion is needed on the topic of distinguishing between scenarios.

a.) The introduction needs additional text to clearly define the metrics that will be used to judge whether ISEFlow performs better in terms of distinguishing between scenarios.

b.) The text in Section 3.3 uses the KS statistic and claims that ISEFlow is able to distinguish between scenarios better than emulandice. However, Table 4 shows that emulandice has larger KS values than ISEFlow for both ice sheets. My understanding is that a larger value for emulandice indicates that it produces distributions that are further apart than for ISEFlow. This seems to contradict the claim that ISEFlow is better able to distinguish between scenarios and this must be addressed.

c.) I also wonder about whether this lack of ability to distinguish between scenarios is actually being caused by the underlying ISMIP6 ensemble and not by the emulators themselves. I suggest adding text on this in the Results and Discussion sections and clearly stating whether the ISMIP6 ensemble can distinguish between scenarios, maybe by showing the KS values for that ensemble (if it is valid to use that statistic for that ensemble).

d.) Please also add some discussion on whether the differences in the KS D statistics in Table 3 are statistically significant and what values of KS indicate a "strong" ability to distinguish between distributions (as stated in Section 3.3). Qualitatively, it's not immediately obvious from the box-and-whisker plots in Fig 4a that ISEFlow can distinguish between the two scenarios and it's not clear whether a KS value of 0.16 indicates that these two distributions strongly differ.
2.) Throughout the Results section, there are results reported for the NN emulator and the GP emulator (e.g., line 299) and these are seemingly referring to different architectures of ISEFlow. Although the Methods section (paragraph beginning on line 170) discusses different ISEFlow architectures, it doesn't describe a "GP" architecture for ISEFlow. I suggest adding text to the Methods section that provides more detail about the GP architecture and describes the differences between the NN architecture and the GP architecture for ISEFlow.
3.) The utility of the "classification model" is not clear to me. The results from this model (lines 344-354) are framed as showing the ability of ISEFlow to distinguish between emissions scenarios. However, in my mind, although the emulator architecture is the same, this "classification model" is a completely distinct emulator from the regular ISEFlow that is used to predict SLE. Two major issues should be addressed:

a.) It is not clear how these results demonstrate that ISEFlow can distinguish between scenarios, given that this is a completely different set of inputs, outputs, and a retrained model. Text should be added (either in Methods or elsewhere) to address this.

b.) Given (a), the utility of these results is not clear. I am perhaps oversimplifying but these results are showing that, for example, surface air temperature over the GrIS and the AIS is a strong predictor of scenario. In other words, the emulator is able to correctly identify the scenario given the temperature. This is a valid result but what does this information tell us about the climate models, ice sheet models, or the ISEFlow emulator? There should be text added to the Discussion section to address this.
4.) Throughout the manuscript, "emulator uncertainty" is the term used to refer to the uncertainty on the output of the emulator. Does this capture both the uncertainty introduced by the emulation process and the uncertainty within the underlying input data (i.e., the ISMIP6 ensemble itself)? If so, I suggest explicitly stating this somewhere (possibly in the paragraph on lines 129-139).
5.) The use of LARMIP in the paper is a bit unclear to me and it stems from the "training time" reported in Table 1. Am I interpreting correctly that it takes 20 minutes to generate 20,000 random samples using the LARMIP process (but with ISMIP6 AIS GSAT as inputs instead of what the LARMIP paper used)? Or was there something else introduced as part of the process that caused it to take a long time? I think that I'm just surprised that generating 20,000 samples takes so long but I've also not thought too deeply about the computations involved in LARMIP. I just wanted to check that I'm understanding this part of the paper correctly. If my interpretation is incorrect, please add a clarification to the text to explain how the LARMIP emulator was used and how it differs from what was done in the original LARMIP paper.
6.) Text should be added to the introduction to the paragraph on lines 75-84 to specify how this study will determine whether the emulator is "learning correct physical principles". The Results section uses a SHAP analysis of the emulator inputs and compares this against previous studies that have used other methods to partition uncertainty. This should be explained in the introduction for added clarity.
7.) The input variables used for training ISEFlow (for all of the different trainings done in this paper) need to be more clear. This can be done in the main text or supplement (or appendix by adding to the existing Table A1 and A2).

a.) Provide a table that lists all input variables used to train the emulator that produced the results in Table 1.

b.) Provide a table that lists the input variables used for the "All", "Temperature", and "SMB" versions of the emulator shown in Table 3.

Minor comments:
Line 21: Here, you cite DeConto and Pollard (2016) in the explanation of sources of uncertainty for ice sheet projections. I suggest expanding on this just a bit to make it clear that one major source of uncertainty is MICI, which reflects "deep uncertainty," and this is separate from the other sources of uncertainty (climate forcings, ice sheet model parameters, initial state, etc.).
Line 24: Change "climate simulations" to "ice sheet simulations"
Line 25: Similarly, change "climate projections" to "ice sheet projections"
Line 26: I suggest changing "small-scale" and "large" to "computationally efficient" and "computationally expensive," however I defer to the authors on this. I'm suggesting this because "small-scale" and "large" can be used to refer to spatial scales but I think what's actually being discussed here is the computational scale.
Line 28: Change "climate model" to a more general term like "physical model"
Line 29: Change "climate system" to "physical system"
Lines 46-51: In addition to ISMIP6 and LARMIP, AR6 used the results from the Structured Expert Judgement (SEJ) process to produce the high-end, deeply uncertain projections for the ice sheets. You can call that out here (this is related to my first minor comment above). This doesn't need any sort of detailed explanation - it's not your responsibility in this paper to explain all of these nuances - but I think it's worth mentioning for completeness.
Line 55: Specify that these fixed parameters resulted from choices made in the ensemble design of ISMIP6 and not limitations imposed by Emulandice itself.
Lines 114-120: This paragraph could be clarified by stating the total number of projections available for each ice sheet from ISMIP6. Additionally, please clarify what is meant by the word "full" when stating "136 full projections."
Lines 129-139: To obtain the "true" uncertainty, should the "data coverage uncertainty" and "emulator uncertainty" be combined? I suggest adding some text on whether this is the case.
Line 158: Should this read: "areas where the data is highly variably or missing"?
Line 185: Specify what additional data is needed from ISMIP6 and how it should be combined with the output from the LARMIP function to produce SLE values.
Line 233: There's a typo here: "that was help out of the original ISMIP6 ensemble". Please rephrase.
Line 237: Please clarify what an "individual projection" is. Is it the projection for one ISMIP6 experiment (i.e., one emissions scenario, one climate model forcing, and one set of model parameters) or something else?
Line 240: How are MSE and MAE calculated? Is the error calculated as the difference between (1) emulated and (2) mean ISMIP6 SLE at each time step? Or just at 2100? Please explicitly state how these metrics are calculated.
Table 1: Add units to all reported values in the table, either within the table or in the header. If MSE and MAE (or other metrics) are reported in units of mm SLE, please make sure to include the "SLE".
Line 283: I suggest adding "the emulator would need to be re-trained and" before "the rankings are likely to differ."
Line 284: I suggest replacing the word "additional" with "individual climate"
Line 289: Would it be a bit more specific to replace "offers more information for modeling SLE" with "is a better predictor of SLE"?
Lines 295-316: I suggest moving and combining the text in these paragraphs with what's in Section 3.3.
Line 315: The phrase "is insufficient for producing accurate future sea level projections" should be changed to something like "results in less accurate sea level projections." Although the Temp- or SMB-only results are less accurate, it hasn't been shown that they are insufficient.
Line 317: Add text to clarify that the results shown in Figure 3 are for the "All" set of training inputs.
Figure 4: There are a couple of things that are unclear with the ISMIP6 time series that shown:

a.) It would be useful to show the RCP 2.6 and RCP 8.5 time ISMIP6 projections in two different colors. This can probably be accomplished by using the same blue/orange colors but keeping the thinner lines for ISMIP6.

b.) It's not clear whether the AIS ISMIP6 projections all start at 0 SLE in 2015. Some of the lines look like they're offset from zero. These lines should be shifted or, if they do start at 0 but then "jump" instantly to another value in 2016, this should be mentioned in the caption.
Line 390: Change "widely used emulators" to "previous emulators such as emulandice"
Line 394: I suggest changing the phrase "capture the correct underlying process" to something like "capture the physical processes that have been shown by previous studies to be the dominant drivers of future mass change".
Line 412: In this paragraph, you should mention that emulandice was also previously used to fill in the different SSPs that weren't modeled by ISMIP6.
Line 434: This sentence states that ISEFlow can be a "valuable as a tool in designing ISMIP7 configurations." Please add a sentence or two to describe how the outputs of ISEFlow could be used in ISMIP7 design.
Lines 438-440: The text states that ISEFlow can be used to predict ice sheet model output for any CMIP7 forcing. This reads like it is restating what was written in the paragraph on lines 412-420. If this is the case, this duplicate text here should be removed. If I am misunderstanding and this text is introducing a different idea, it should be rewritten to be more clear.
Line 477: Change "contributions" to "contribution"

Reply

Citation: https://doi.org/10.5194/egusphere-2025-4914-RC2

Peter Van Katwyk, Baylor Fox-Kemper, Sophie Nowicki, Hélène Seroussi, and Karianne J. Bergen

Viewed

Total article views: 399 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
260	115	24	399	26	35

HTML: 260
PDF: 115
XML: 24
Total: 399
BibTeX: 26
EndNote: 35

Views and downloads (calculated since 01 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	131	45	14	190
Jan 2026	62	38	4	104
Feb 2026	67	32	6	105

Cumulative views and downloads (calculated since 01 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	131	45	14	190
Jan 2026	62	38	4	104
Feb 2026	67	32	6	105

Viewed (geographical distribution)

Total article views: 370 (including HTML, PDF, and XML) Thereof 370 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 01 Mar 2026

Download

Please read the editorial note first before accessing the preprint.

Preprint (3945 KB)
Metadata XML

Short summary

We developed ISEFlow, a new climate emulator model that predicts how melting ice sheets in Greenland and Antarctica will contribute to future sea levels. Unlike past tools, it uses advanced machine learning to capture complex ice processes, distinguish between different greenhouse gas scenarios, and provide clearer estimates of uncertainty. This makes sea level projections more accurate and reliable, helping scientists and policymakers better plan for climate risks.


Total:	0
HTML:	0
PDF:	0
XML:	0