the Creative Commons Attribution 4.0 License.
New classes of climate model emulators to improve paleoclimate reconstructions
Abstract. Reconstructing spatial climate variability from proxy records requires forward models (“emulators”) that capture the dynamical structure of the climate system while remaining computationally efficient. Traditional emulators based on Empirical Orthogonal Functions and Linear Inverse Models (LIMs) face inherent limitations due to linearity and variance-based dimensionality reduction. Here we develop and evaluate a hierarchy of CMIP-class climate model emulators that integrate autoencoder-based dimensionality reduction with nonlinear prediction architectures, including Reservoir Computing (RC) and Recurrent Neural Networks (RNNs). Using a comprehensive experimental protocol applied to the IPSL-CM6A-LR model and 52 CMIP6 models, we show that combining an autoencoder with RC (AERCn) provides the most robust performance across time scales and dynamical regimes when data are plentiful. The AERCn configuration captures nonlinear features of ENSO and Atlantic Multidecadal Variability, maintains high spatial reconstruction skill, and generalizes across distinct climate model structures. When training data are scarce, a multimodel pre-trained AERNN provides a data-efficient and competitive alternative. These properties make the proposed architectures particularly well suited for integration into Particle Filter and Ensemble Kalman Filter PDA frameworks. Our results highlight the importance of predictability-oriented dimensionality reduction and nonlinear dynamical memory for emulator design, and they provide a scalable path toward improved reconstructions of climate variability over the Common Era.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-1337', Anonymous Referee #1, 25 Apr 2026
- RC2: 'Comment on egusphere-2026-1337', Anonymous Referee #2, 27 May 2026
Review of the manuscript "New classes of climate model emulators to improve paleoclimate reconstructions"
by Auguste Gaudin and Myriam Khodri
submitted for publication in Geoscientific Model Development
General
The authors use different setups, combining classical and novel emulator techniques, to reconstruct climate indices for past periods, specifically those related to ENSO. To set up the emulators, the CMIP6 suite of Earth System models covering the past two millennia is used, with a specific focus on the IPSL model. The manuscript is clearly and concisely written, and the methodological steps are described in detail, including formal descriptions that allow the results to be reproduced.
I suggest publication of the manuscript after the modifications and clarifications listed below are addressed.
Specific
Title
A conceptual question is whether the study presents a full climate model emulator. In its present form, statistical models are presented for key parameters or indices of climate modes of variability. This is very important in itself, but it should be reflected in the title.
Abstract
The abstract is well written concerning the main methodological issues. Still, it would be helpful to provide more concrete information on how the emulator outperforms traditional concepts, along with some critical comments on the new emulators.
1 Introduction
ll 33ff: The authors should already mention here the two conceptually different approaches within the data assimilation process: the online approach, which is typically used for present-day applications, and the offline approach, which is used for paleo applications.
l. 75: It is true that individual EOFs do not necessarily represent physically meaningful processes. However, used in combination, the EOFs (together with their principal components) still represent a very large part of the full state-space vector, including potential nonlinear effects. The authors should therefore point out that what matters is how the EOF concept is introduced and used within a prediction (or reconstruction) approach.
2 Emulator building blocks
l. 131: The term “filtering” is misleading. In fact, the eigenvectors are just a re-ordering of the original covariance matrix according to variance. The truncation, and hence what is defined as “noise”, is a somewhat subjective decision. For instance, in cases where it is important that the original fields can be reconstructed from the EOFs alone, the addition of higher-index (and more “noisy”) EOFs is very important. This allows a more realistic representation of the original field, which is important for the generation and prediction of extreme events.
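The effect of the truncation choice can be illustrated with a minimal sketch. It uses synthetic random data rather than the manuscript's fields, and the function name is hypothetical; it only shows that the relative reconstruction error shrinks as higher-index EOFs are retained.

```python
import numpy as np

def eof_reconstruction_error(anomalies, n_modes):
    """Relative error when rebuilding a (time, space) field from n_modes EOFs."""
    # The SVD of the anomaly matrix yields PCs (u * s) and EOF patterns (rows of vt)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    truncated = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    return np.linalg.norm(anomalies - truncated) / np.linalg.norm(anomalies)

# Synthetic stand-in for an anomaly field: 200 time steps, 30 grid points
rng = np.random.default_rng(0)
field = rng.standard_normal((200, 30))
field -= field.mean(axis=0)

errors = [eof_reconstruction_error(field, k) for k in (1, 5, 15, 30)]
print(errors)
```

With all 30 modes retained the field is recovered up to rounding, so where to truncate is a trade-off rather than a separation of signal from noise.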
l. 121: Usually the eigenvectors are not based on standardized variables. In the standard approach, area-weighted anomaly fields are used. Standardized fields are only used when different variables or units enter a joint EOF analysis. Otherwise the spatial EOF patterns might be substantially different, because in the normalized case all grid points have the same variance by definition. If standardized variables are used to calculate the covariance matrix, the authors should explain why the variables are standardized prior to the calculation of the EOFs.
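A small synthetic example may help to show how much standardization can change the leading pattern. This is a sketch under stated assumptions: the data, latitudes, and function name are invented for illustration, and the square-root-of-cosine-latitude weighting is the common convention, not necessarily the one used in the manuscript.

```python
import numpy as np

def leading_eof(data, lat_deg, standardize=False):
    """Leading EOF of a (time, space) field with one latitude per grid point."""
    anomalies = data - data.mean(axis=0)
    if standardize:
        # Correlation-based EOFs: every grid point is scaled to unit variance
        anomalies = anomalies / anomalies.std(axis=0)
    # sqrt(cos(lat)) weights make the implied covariance matrix area-weighted
    weights = np.sqrt(np.cos(np.deg2rad(lat_deg)))
    _, _, vt = np.linalg.svd(anomalies * weights, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n_time = 500
shared = rng.standard_normal(n_time)
data = np.column_stack(
    [10.0 * rng.standard_normal(n_time)]  # point 0: large, independent variance
    + [shared + 0.1 * rng.standard_normal(n_time) for _ in range(4)]  # coherent points
)
lat = np.array([0.0, 10.0, 20.0, 30.0, 40.0])

eof_cov = leading_eof(data, lat)                    # anomaly (covariance) based
eof_std = leading_eof(data, lat, standardize=True)  # standardized (correlation) based
print(np.round(eof_cov, 2), np.round(eof_std, 2))
```

In this toy case the covariance-based EOF is dominated by the single high-variance point, whereas the standardized EOF picks out the four coherent points.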
l. 132: Here it should be noted whether a spatial or temporal prediction is meant. Again, EOFs can still reproduce nonlinear effects, depending on the algorithm that is eventually used for prediction (e.g. the analog method (Zorita and von Storch, 1999) or any other nonlinear method). The real challenge is how the information contained within the EOFs is used for setting up such a model and whether there is any predictive skill in the temporal structure (i.e. red noise or low-frequency variability).
This difference should be better elaborated: for the temporal prediction, it is not the (spatial) EOFs or eigenvectors per se that prevent a better prediction. It is rather the method that is used for temporal training/validation and eventual prediction of the process or variable under consideration. EOFs in this context are usually used for spatial rather than temporal dimensionality reduction.
l. 145: The authors state that for the AE, temporal information is already used implicitly when setting up the prediction model. This is not the case for standard EOFs. One way to use EOFs in this manner is the so-called “Extended EOFs”, where a sequence of EOF patterns can be used to set up an improved prediction scheme.
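For reference, the Extended EOF idea can be sketched as an ordinary EOF analysis applied to a lag-embedded field. The example below is a minimal illustration on a synthetic propagating wave; the function name and all parameters are hypothetical and not taken from the manuscript.

```python
import numpy as np

def extended_eofs(anomalies, n_lags):
    """EOFs of a lag-embedded (time, space) field.

    Returns patterns of shape (n_modes, n_lags, space) and the singular values.
    """
    n_time, n_space = anomalies.shape
    n_windows = n_time - n_lags + 1
    # Each row stacks n_lags consecutive maps into one long state vector
    embedded = np.stack(
        [anomalies[i:i + n_lags].ravel() for i in range(n_windows)]
    )
    embedded = embedded - embedded.mean(axis=0)
    _, s, vt = np.linalg.svd(embedded, full_matrices=False)
    return vt.reshape(-1, n_lags, n_space), s

# Synthetic propagating wave as a stand-in for an evolving climate mode
n_time, n_space, period, n_lags = 120, 16, 12, 6
t = np.arange(n_time)[:, None]
x = np.arange(n_space)[None, :]
wave = np.sin(2 * np.pi * (x / n_space - t / period))

patterns, singular_values = extended_eofs(wave, n_lags)
explained = singular_values**2 / np.sum(singular_values**2)
print(patterns.shape, explained[:2].sum())
```

Each extended pattern is a short sequence of maps, so the temporal evolution enters the basis itself; the propagating wave is captured by a single pair of modes.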
l. 169: Using white noise as the stochastic component is of course an option, especially for atmospheric variables with little or no memory. Is it also possible to include other noise terms, and when would this be advisable? Perhaps the algorithm could use some a priori information based on the training data (in this case tas?).
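One simple alternative to white noise is first-order autoregressive (red) noise, whose memory parameter could in principle be estimated from the training data. The following is only a sketch of that noise model; the function name and parameter values are hypothetical, and it does not reproduce the stochastic term used in the manuscript.

```python
import numpy as np

def ar1_noise(n_steps, phi, sigma=1.0, seed=0):
    """Red noise x[t] = phi * x[t-1] + eps[t]; phi = 0 reduces to white noise."""
    rng = np.random.default_rng(seed)
    # Innovations are scaled so that the stationary variance equals sigma**2
    eps = rng.standard_normal(n_steps) * sigma * np.sqrt(1.0 - phi**2)
    x = np.empty(n_steps)
    x[0] = rng.standard_normal() * sigma
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + eps[t]
    return x

red = ar1_noise(20000, phi=0.7)
white = ar1_noise(20000, phi=0.0)

# The sample lag-1 correlation should be close to the prescribed phi
r1_red = float(np.corrcoef(red[:-1], red[1:])[0, 1])
r1_white = float(np.corrcoef(white[:-1], white[1:])[0, 1])
print(r1_red, r1_white)
```

Because the innovations are rescaled, the red and white series have the same variance and differ only in memory, which makes sensitivity tests with different values of phi directly comparable.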
4 Datasets and experimental protocol
l 262: Since ENSO is a target variable of the manuscript, I suggest including a figure with the frequency spectrum of observed ENSO (based on ERA/NCEP or similar) together with that of IPSL, including the projection of the Nino3 index on the tropical SSTs.
l. 304: Could you state how the results might deviate when using other sources of noise (e.g. red noise with different memory)?
ll 345 ff: The authors could also use the Brier skill score (Wilks, 2010), which includes both correlation and variance information.
5.1 Dimension reduction
ll. 380 ff: The fact that the variance patterns show this specific structure might be related to the variables being normalized to unit variance before entering the dimension reduction. Patterns in original units might show a substantially different structure.
The comparison between EOF and AE could of course be carried out. I just wonder what is really learned, given the fundamentally different construction and interpretation of the individual results or patterns. I suggest shortening the entire section and summarizing the most important conclusions without being too speculative about potential (physical) connections.
5.2 GCM emulator deterministic prediction skills
ll 425 ff: The authors should also take into account the potential effect of overfitting and the deterioration of skill when using a large number of predictors. In this case it is advisable to use metrics related to the concept of the adjusted R2 (Heinzl and Mittlböck, 2002), which takes into account the number of predictors used. The difference between the original and the adjusted R2 will be especially large when predictors are collinear, i.e. not statistically independent. In addition, standard measures of statistical significance (e.g. using Monte Carlo or bootstrap methods) should be provided.
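To illustrate the point, the sketch below uses the textbook adjusted R2 for ordinary least squares, 1 - (1 - R2)(n - 1)/(n - p - 1), with n samples and p predictors. This form is an assumption made for illustration and may differ from the variants discussed in the cited reference; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """In-sample R^2 and adjusted R^2 of a least-squares fit with intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)
    # Penalize the number of predictors p relative to the sample size n
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted

rng = np.random.default_rng(1)
n = 60
signal = rng.standard_normal((n, 1))
y = signal[:, 0] + rng.standard_normal(n)
noise_predictors = rng.standard_normal((n, 30))  # carry no information about y

r2_small, adj_small = r2_and_adjusted(y, signal)
r2_large, adj_large = r2_and_adjusted(y, np.column_stack([signal, noise_predictors]))
print(r2_small, adj_small, r2_large, adj_large)
```

Adding the 30 uninformative predictors raises the in-sample R2, while the gap between R2 and its adjusted value widens sharply, which is the spurious skill gain in question.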
ll 487 ff: Again, the authors should address the impact of an increasing number of predictors on the real and potentially spurious increase in the performance metric.
ll 490 ff: I suggest also including running (Pearson) correlations r with a global level of statistical significance (based on Monte Carlo or bootstrap methods).
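One possible reading of a global significance level for running correlations is a Monte Carlo threshold on the largest absolute correlation found in any window. The sketch below follows that reading as an assumption; the circular-shift surrogates, window length, and function names are illustrative choices and not a prescribed procedure.

```python
import numpy as np

def running_correlation(x, y, window):
    """Pearson correlation of x and y in sliding windows of fixed length."""
    return np.array([
        np.corrcoef(x[i:i + window], y[i:i + window])[0, 1]
        for i in range(len(x) - window + 1)
    ])

def global_threshold(x, y, window, n_surrogates=200, level=0.95, seed=0):
    """Monte Carlo quantile of max |r| over all windows under independence."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_surrogates)
    for k in range(n_surrogates):
        # A circular shift keeps the autocorrelation of y but breaks its link to x
        shift = rng.integers(window, len(y) - window)
        surrogate = np.roll(y, shift)
        maxima[k] = np.abs(running_correlation(x, surrogate, window)).max()
    return float(np.quantile(maxima, level))

# Synthetic pair of correlated series standing in for target and emulated index
rng = np.random.default_rng(42)
x = rng.standard_normal(300)
y = x + 0.5 * rng.standard_normal(300)

r_running = running_correlation(x, y, window=50)
threshold = global_threshold(x, y, window=50, n_surrogates=100)
print(r_running.mean(), threshold)
```

Taking the maximum over windows accounts for the many overlapping windows being tested at once, which a per-window significance level would not.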
5.2.2. CMIP6-class models
ll 539ff: A suggestion here is to estimate the lag-1 autocorrelation of the individual PC1/PC2, which should also be a good indicator of the differences between the CMIP6 models over the tropical Pacific.
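Such a diagnostic is cheap to compute. The sketch below uses two synthetic fields as hypothetical stand-ins for two models with different memory; the function names and parameter values are invented for illustration.

```python
import numpy as np

def leading_pc(anomalies):
    """Leading principal component of a (time, space) field."""
    centered = anomalies - anomalies.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a 1-D series."""
    centered = series - series.mean()
    return float(centered[:-1] @ centered[1:] / (centered @ centered))

def synthetic_model(phi, n_time=2000, n_space=20, seed=0):
    """Hypothetical stand-in for one model: an AR(1) mode times a fixed pattern."""
    rng = np.random.default_rng(seed)
    series = np.zeros(n_time)
    for t in range(1, n_time):
        series[t] = phi * series[t - 1] + rng.standard_normal()
    pattern = rng.standard_normal(n_space)
    return np.outer(series, pattern) + 0.1 * rng.standard_normal((n_time, n_space))

r1_persistent = lag1_autocorrelation(leading_pc(synthetic_model(0.9)))
r1_weak = lag1_autocorrelation(leading_pc(synthetic_model(0.3, seed=1)))
print(r1_persistent, r1_weak)
```

The sign ambiguity of the leading PC does not matter here, because the autocorrelation is unchanged when the series is multiplied by -1.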
ll 545ff: I wonder whether nonlinearity and memory are orthogonal in a statistical sense, or whether they overlap.
6 Summary and Conclusions
ll 704 ff: The statement that EOF-based dimensionality reduction “prioritize variance rather than predictability” is, in my opinion, questionable. There is no contradiction between the two, because it is the setup of the statistical/emulator model using the temporal evolution that provides the capability for prediction.
Figures and Tables:
Table 1: I suggest being more specific about which type of “model” is referred to. The study uses both numerical CMIP6-type Earth System Models and statistical models/emulators for index reconstruction.
Figure 1: Which variable is used as a basis? This should be mentioned in the figure titles and captions.
Table 2: Upper and lower confidence bounds (statistical significance) should be provided for the correlations using bootstrap or Monte Carlo methods. (I assume that with 1300 degrees of freedom the effect of serial correlation on the nominal level of significance can be ignored in this context.)
Figure 4: Please also include confidence intervals for the correlations of the running indices. I also suggest including a companion figure for running correlations in the Appendix.
Figures 6 and 7: A short note in the caption would help clarify what the lines with the same color and different train sizes represent (I assume they relate to the different CMIP6 models?).
Figure 8: Please rescale the colorbar.
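The confidence bounds requested above for Table 2 and Figure 4 could be obtained with a percentile bootstrap along the following lines. This is a sketch on synthetic data; the function name, the block-resampling option, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def bootstrap_correlation_ci(x, y, n_boot=2000, block=1, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson r, resampling paired blocks.

    block=1 is the ordinary bootstrap; block > 1 keeps short-range serial
    correlation within each resampled block.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block))
    offsets = np.arange(block)
    samples = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + offsets).ravel()[:n]
        samples[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Synthetic series of length 1300 with a true correlation of 0.6
rng = np.random.default_rng(7)
x = rng.standard_normal(1300)
y = 0.6 * x + 0.8 * rng.standard_normal(1300)

r_hat = float(np.corrcoef(x, y)[0, 1])
lower, upper = bootstrap_correlation_ci(x, y, n_boot=1000)
print(r_hat, lower, upper)
```

If serial correlation turns out not to be negligible, setting block to a few decorrelation times is one way to widen the interval accordingly.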
Additional References:
Heinzl, H. and M. Mittlböck (2002): Adjusted R2 measures for the inverse Gaussian regression model. Computational Statistics, 17, 525-544. https://doi.org/10.1007/s001800200125
Wilks, D. S. (2010): Sampling distributions of the Brier score and Brier skill score under serial dependence. Quarterly Journal of the Royal Meteorological Society, 136, 2109-2118. https://doi.org/10.1002/qj.709
Zorita, E. and H. von Storch (1999): The analog method as a simple statistical downscaling technique: comparison with more complicated methods. Journal of Climate, 12, 2474-2489.
Citation: https://doi.org/10.5194/egusphere-2026-1337-RC2
This study analyzes the limitations of the LIM-EOF model in traditional paleoclimate reconstruction methods. To address these issues, an improved climate model simulator is proposed, targeting the structural deficiencies of the traditional method with three specific improvements. The study evaluates whether these innovative measures enhance the simulation accuracy of large-scale climate variability, improve predictive capability, and reduce simulation errors in extreme events. The research is detailed, thoroughly elaborating on the dimensionality reduction algorithm and prediction model, describing the architecture and implementation of the simulator, and using extensive datasets and climate models to evaluate its multifaceted performance. Simulations based on the CMIP6 model suite demonstrate that the proposed improvements significantly outperform the traditional LIM-EOF model in terms of prediction accuracy, dynamic performance, avoidance of error accumulation, etc. In addition to a detailed analysis of the advantages of these improvements, the study also discusses certain limitations of the simulator under specific conditions, while proposing directions for further research. The work contributes notably to enhancing the accuracy of paleoclimate reconstruction and holds strong exploratory significance for advancing various research methods in the field.
The following technical questions need your answers: