A framework for assessing and understanding sources of error in Earth System Model emulation

Womack, Christopher B.; Flierl, Glenn; Bouabid, Shahine; Souza, Andre N.; Giani, Paolo; Eastham, Sebastian D.; Selin, Noelle E.

doi:10.5194/egusphere-2025-3792

Preprints

https://doi.org/10.5194/egusphere-2025-3792


19 Aug 2025



Abstract. Full-scale Earth system models are too computationally expensive to keep pace with the growing demand for climate projections across a large range of emissions pathways. Climate emulators, reduced-order models that reproduce the output of full-scale models, are poised to fill this niche. However, the large number of emulation techniques available and the lack of a comprehensive theoretical basis for understanding their relative strengths and weaknesses compromise fundamental methodological comparisons. Here, we present a theoretical framework that connects disparate emulation techniques, using it to analyze sources of emulator error, focusing on memory effects, hidden variables, system noise, and nonlinearities. This framework includes popular emulation techniques such as pattern scaling and response functions, relating them to less commonly used methods, such as Dynamic Mode Decomposition and the Fluctuation-Dissipation Theorem (FDT). To support our theoretical contributions, we provide practical implementation details for each technique, evaluating performance across a series of experiments designed to highlight different potential sources of error. We find that response-function-based emulators outperform other techniques, particularly pattern scaling, across all scenarios tested. We additionally outline potential advantages of incorporating statistical mechanics into climate emulation through the use of the FDT, though this technique requires greater computational resources and non-standard scenarios for training. Results highlight the relative utility of each technique discussed, along with the importance of designing future scenarios for Earth system models with emulation in mind, suggesting that large-ensemble experiments utilizing the FDT could benefit climate modeling and impacts communities.
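To make the abstract's contrast between pattern scaling and response functions concrete, here is a minimal sketch under loudly stated assumptions. A hypothetical two-box energy balance model with made-up parameters stands in for an ESM. The "pattern scaling" analogue is a memoryless linear fit of temperature to forcing, and the response-function emulator convolves forcing with the discrete impulse response derived from an abrupt-step training run. None of the names, parameters, or scenarios come from the paper; the sketch only illustrates why an emulator with memory can track a scenario in which forcing rises and then falls, while a memoryless one cannot.

```python
import numpy as np

# Hypothetical two-box parameters (illustrative only, not from the paper)
C, C_D = 8.0, 100.0      # surface / deep-ocean heat capacities
LAM, GAMMA = 1.2, 0.7    # feedback and heat-exchange coefficients

def two_box(forcing, dt=1.0):
    """Forward-Euler integration of a linear two-box model; returns surface T."""
    t_s, t_d = 0.0, 0.0
    out = np.empty(len(forcing))
    for i, f in enumerate(forcing):
        d_ts = (f - LAM * t_s - GAMMA * (t_s - t_d)) / C
        d_td = GAMMA * (t_s - t_d) / C_D
        t_s, t_d = t_s + dt * d_ts, t_d + dt * d_td
        out[i] = t_s
    return out

n = 200
# Training run: abrupt forcing step (analogous to an abrupt-CO2 experiment)
f_step = np.full(n, 4.0)
t_step = two_box(f_step)

# Response-function emulator: impulse response = increments of the unit step response
g = np.diff(t_step / 4.0, prepend=0.0)

def emulate_rf(forcing):
    # Discrete convolution: T[i] = sum_j g[i - j] * F[j]
    return np.convolve(forcing, g)[: len(forcing)]

# Memoryless emulator T ~ k * F, least-squares fit on the same training run
k = np.dot(f_step, t_step) / np.dot(f_step, f_step)

def emulate_memoryless(forcing):
    return k * forcing

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Test scenario: forcing rises and then falls back toward zero
f_test = 4.0 * np.sin(np.pi * np.arange(n) / n)
t_true = two_box(f_test)

rf_err = rmse(emulate_rf(f_test), t_true)
ml_err = rmse(emulate_memoryless(f_test), t_true)
print(f"response function RMSE: {rf_err:.2e}, memoryless RMSE: {ml_err:.3f}")
```

Because this toy model is linear and time-invariant, the convolution reproduces it to round-off, while the memoryless fit misses the lagged warming stored in the deep box. In a real ESM, noise, nonlinearity, and hidden variables would make even the response function approximate, which is the kind of error the framework sets out to analyze.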

Received: 04 Aug 2025 – Discussion started: 19 Aug 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: closed

RC1:
'Comment on egusphere-2025-3792', Anonymous Referee #1, 23 Sep 2025

The authors present a framework for comparing emulation techniques. They do so by showing the theoretical connections between several existing emulation methods and relating them to two types of linear operators. These operators are shown to explain the same information about the system, demonstrating a link among all methods considered. The authors then test these methods’ abilities to predict four forcing response scenarios in four simplified toy models of either the climate system or the Lorenz convection approximation. Response function methods outperform both pattern scaling and attempts to directly estimate the linear operator in these example tests. The discussion around modeled results in the various tests is thorough and the connections to a common set of linear operators will likely be useful when considering how different emulators might perform. I have experience with pattern scaling, FDT, and ridge regression (which is how the deconvolution method has been practically implemented), though less so with much of the emulator-specific background cited here. As such, I will limit my comments to how this work fits with understanding ESMs more broadly.

Specific comments:
My main comment covers the goal and applicability of this work. I understand that the intent of the paper is to establish a “framework”, by which the authors mean the ability to frame each of these emulators as a variation or simplification on the paired linear response operators Fokker-Planck/Koopman. What is less clear to me is how directly the link can be made to “sources of error in Earth System Model emulation”. Generally, I understand if this paper is laying the groundwork for ESM testing, but in that case I felt that the writing did not make that intention clear. As presented, it reads as offering a tool that is directly applicable to evaluating emulators with respect to ESMs. The tests get at particular challenges in ESMs: memory effects, hidden variables, noise, and nonlinearities. However, the reader does not see the actual interaction between these methods and errors in ESM emulation.

530: While the 2- and 3-box models are frequent approximations to the climate system, they lack many of the physical mechanisms that make the climate system difficult to model. The parameters in these models are fit to ESMs, so are themselves simplified estimates of the actual behavior. I felt that the link between ability to emulate these examples and the ability to emulate ESMs deserved more discussion. I would have found this conceptually more useful than the level of technical detail included for the linear operators and each emulation model in the main text.

846: “This framework currently relies on simple experiments, and further work is needed to determine if operator-based methods like EDMD can be practically realized to emulate nonlinear processes in full-scale climate models.”: this sentence to me suggests that the step of showing that this framework is useful for ESMs is left to future work. I can see that there is some value in being able to connect the different models through a common framework in the way the authors use it to diagnose differences in the toy model. This may be more in line with a proof of concept for the framework rather than demonstrating how the framework applies to ESMs. However, if the goal is for this framework to be used by others and applied to ESMs, this seems like an important step to include. This may just be a framing issue.

Figure 4: If the results suggest that directly estimating response operators is the most prone to error, does this challenge the response operator framework as the most useful common link for the different emulation methods? This seems to suggest the Koopman operator is not the most useful simplification of the climate system.

Minor technical:
42: “Impulse response (response/Green’s function) methods”: this wording is confusing. How is “response” an example of “impulse response”?

Citation: https://doi.org/10.5194/egusphere-2025-3792-RC1
- AC1: 'Reply on RC1', Christopher Womack, 28 Oct 2025
  
  Please see the attached pdf file.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3792-AC1
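Referee #1 notes that the deconvolution (response-function) approach has in practice been implemented with ridge regression. The following is a minimal sketch of that idea, assuming a single forcing/response pair, a finite number of lags, and synthetic data generated from a known impulse response. The response is written as a lower-triangular Toeplitz matrix of past forcing multiplied by the unknown impulse response, and the ridge penalty `alpha` stabilizes the inversion. The helper names (`forcing_matrix`, `ridge_deconvolve`), the lag count, the forcing, and `alpha` are illustrative assumptions, not taken from the paper or the review.

```python
import numpy as np

def forcing_matrix(forcing, n_lags):
    """Lower-triangular Toeplitz design matrix with X[i, k] = F[i - k] for i >= k."""
    n = len(forcing)
    x = np.zeros((n, n_lags))
    for k in range(n_lags):
        x[k:, k] = forcing[: n - k]
    return x

def ridge_deconvolve(forcing, response, n_lags, alpha):
    """Impulse response g minimizing ||X g - y||^2 + alpha * ||g||^2."""
    x = forcing_matrix(forcing, n_lags)
    lhs = x.T @ x + alpha * np.eye(n_lags)
    return np.linalg.solve(lhs, x.T @ response)

# Synthetic test: known two-timescale impulse response (hypothetical values)
rng = np.random.default_rng(0)
n, n_lags = 300, 100
lags = np.arange(n_lags)
g_true = 0.3 * np.exp(-lags / 4.0) + 0.01 * np.exp(-lags / 200.0)

# Forcing: slow ramp plus random variability, so that every lag is constrained
forcing = 0.02 * np.arange(n) + rng.normal(0.0, 1.0, n)
response = forcing_matrix(forcing, n_lags) @ g_true + rng.normal(0.0, 0.01, n)

g_hat = ridge_deconvolve(forcing, response, n_lags, alpha=0.1)
rel_err = np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
print(f"relative error in recovered impulse response: {rel_err:.3f}")
```

A larger `alpha` trades variance for bias toward zero. With noisier output, shorter runs, or smoother forcing than this synthetic case, the choices of `alpha` and `n_lags` become much more consequential.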
RC2:
'Comment on egusphere-2025-3792', Anonymous Referee #2, 24 Sep 2025

My comments are attached.

Citation: https://doi.org/10.5194/egusphere-2025-3792-RC2
- AC2: 'Reply on RC2', Christopher Womack, 28 Oct 2025
  
  Please see the attached pdf file.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3792-AC2


Viewed

Total article views: 1,519 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,234	248	37	1,519	41	38


Views and downloads (calculated since 19 Aug 2025)

Month	HTML	PDF	XML	Total
Aug 2025	219	58	4	281
Sep 2025	812	22	8	842
Oct 2025	104	44	14	162
Nov 2025	54	60	5	119
Dec 2025	35	49	6	90
Jan 2026	10	15	0	25


Viewed (geographical distribution)

Total article views: 1,509 (including HTML, PDF, and XML). Of these, 1,509 have a defined geography and 0 are of unknown origin.


Latest update: 09 Jan 2026

Short summary

Climate emulators allow for rapid projections without the computational costs associated with full-scale climate models. Here, we outline a framework to compare a variety of emulation techniques both theoretically and practically through a series of stress tests that expose common sources of emulator error. Our results help clarify which emulators are best suited for different tasks and show how future climate scenarios can be used to support emulator design.

