A framework for assessing and understanding sources of error in Earth System Model emulation
Abstract. Full-scale Earth system models are too computationally expensive to keep pace with the growing demand for climate projections across a large range of emissions pathways. Climate emulators, reduced-order models that reproduce the output of full-scale models, are poised to fill this niche. However, the large number of available emulation techniques and the lack of a comprehensive theoretical basis for understanding their relative strengths and weaknesses compromise fundamental methodological comparisons. Here, we present a theoretical framework that connects disparate emulation techniques and use it to analyze sources of emulator error, focusing on memory effects, hidden variables, system noise, and nonlinearities. This framework includes popular emulation techniques such as pattern scaling and response functions and relates them to less commonly used methods such as Dynamic Mode Decomposition and the Fluctuation Dissipation Theorem (FDT). To support our theoretical contributions, we provide practical implementation details for each technique and evaluate performance across a series of experiments designed to highlight different potential sources of error. We find that response function-based emulators outperform other techniques, particularly pattern scaling, across all scenarios tested. We additionally outline potential advantages of incorporating statistical mechanics into climate emulation through the use of the FDT, though this technique requires greater computational resources and non-standard scenarios for training. The results highlight the relative utility of each technique discussed and the importance of designing future scenarios for Earth system models with emulation in mind, suggesting that large-ensemble experiments utilizing the FDT could benefit the climate modeling and impacts communities.
The authors present a framework for comparing emulation techniques. They do so by showing the theoretical connections between several existing emulation methods and relating them to two types of linear operators. These operators are shown to encode the same information about the system, demonstrating a link among all the methods considered. The authors then test how well these methods predict the response to four forcing scenarios in four simplified toy models of either the climate system or the Lorenz convection model. In these example tests, response function methods outperform both pattern scaling and attempts to estimate the linear operator directly. The discussion of the modeled results in the various tests is thorough, and the connections to a common set of linear operators will likely be useful when considering how different emulators might perform. I have experience with pattern scaling, FDT, and ridge regression (which is how the deconvolution method has been implemented in practice), but less with much of the emulator-specific background cited here. As such, I will limit my comments to how this work fits with understanding ESMs more broadly.
Specific comments:
My main comment concerns the goal and applicability of this work. I understand that the intent of the paper is to establish a “framework”, by which the authors mean the ability to frame each of these emulators as a variation on, or simplification of, the paired linear response operators (Fokker-Planck/Koopman). What is less clear to me is how directly the link can be made to “sources of error in Earth System Model emulation”. I would understand if this paper is meant to lay the groundwork for ESM testing, but in that case the writing does not make that intention clear. As presented, it reads as offering a tool that is directly applicable to evaluating emulators with respect to ESMs. The tests address particular challenges in ESMs: memory effects, hidden variables, noise, and nonlinearities. However, the reader does not see how these methods actually interact with errors in ESM emulation.
530: While 2- and 3-box models are frequent approximations to the climate system, they lack many of the physical mechanisms that make the climate system difficult to model. The parameters in these models are fit to ESMs, so they are themselves simplified estimates of the actual behavior. I felt that the link between the ability to emulate these examples and the ability to emulate ESMs deserved more discussion. I would have found this conceptually more useful than the level of technical detail included in the main text for the linear operators and each emulation model.
846: “This framework currently relies on simple experiments, and further work is needed to determine if operator-based methods like EDMD can be practically realized to emulate nonlinear processes in full-scale climate models.”: to me, this sentence suggests that showing this framework is useful for ESMs is left to future work. I can see some value in being able to connect the different models through a common framework, in the way the authors use it to diagnose differences in the toy models. This may be more in line with a proof of concept for the framework than a demonstration of how the framework applies to ESMs. However, if the goal is for others to use this framework and apply it to ESMs, this seems like an important step to include. This may just be a framing issue.
Figure 4: If the results suggest that directly estimating response operators is the approach most prone to error, does this challenge the response operator framework as the most useful common link among the different emulation methods? This seems to suggest that the Koopman operator is not the most useful simplification of the climate system.
Minor technical comments:
42: “Impulse response (response/Green’s function) methods”: this wording is confusing. How is “response” an example of “impulse response”?
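For reference, the standard sense in which these terms coincide can be shown in a few lines. This is a minimal sketch, assuming a hypothetical linear one-box model C dT/dt = F − λT stepped with forward Euler, with arbitrary parameter values and a ramp forcing that are not taken from the manuscript. The impulse response (Green's function) is the model output after a unit forcing pulse, and a response-function emulator predicts the output for any forcing history by convolving that history with the Green's function.

```python
import numpy as np


def simulate_one_box(forcing, lam=1.2, heat_cap=8.0, dt=1.0):
    """Hypothetical linear one-box model C dT/dt = F - lam*T, forward Euler, T(0) = 0."""
    temp = np.zeros(len(forcing))
    for t in range(1, len(forcing)):
        temp[t] = temp[t - 1] + dt * (forcing[t - 1] - lam * temp[t - 1]) / heat_cap
    return temp


n = 200

# Impulse response (Green's function): the response to a unit forcing pulse at t = 0.
pulse = np.zeros(n)
pulse[0] = 1.0
green = simulate_one_box(pulse)

# Response-function emulator: convolve the forcing history with the Green's function,
# keeping only the first n samples (the causal part that overlaps the simulation).
forcing = np.linspace(0.0, 4.0, n)  # hypothetical ramp forcing
emulated = np.convolve(forcing, green)[:n]

# Direct simulation of the same forcing, for comparison.
direct = simulate_one_box(forcing)

print("max |emulated - direct| =", np.max(np.abs(emulated - direct)))
```

Because this toy model is linear and time-invariant, the convolution reproduces the direct simulation to rounding error. In that setting, “impulse response” and “response function” refer to the same object rather than one being an example of the other.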