This work is distributed under the Creative Commons Attribution 4.0 License.
ML-IAM v1.0: Emulating Integrated Assessment Models With Machine Learning
Abstract. Integrated Assessment Models (IAMs) are essential tools for projecting future environmental variables under diverse environmental, economic, and technological scenarios. However, their computational intensity limits accessibility and application scope. We present ML-IAM v1.0, the first machine learning emulator trained on the IPCC AR6 Scenarios Database to replicate IAM functionality across diverse model families. Our best-performing model, XGBoost, achieves an R² of 0.97 against original IAM data, outperforming the more complex models Long Short-Term Memory (LSTM) and Temporal Fusion Transformer (TFT). ML-IAM v1.0 generates results for 2,000 scenarios in 50 seconds and can produce predictions for any IAM family. This enables rapid exploration of climate scenarios, complementing traditional IAMs with efficient, scalable computation.
Status: open (until 06 Mar 2026)
- RC1: 'Comment on egusphere-2025-5305', Anonymous Referee #1, 23 Jan 2026
- CEC1: 'Comment on egusphere-2025-5305 - No compliance with the policy of the journal', Juan Antonio Añel, 11 Feb 2026
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your data in a web page that does not comply with the requirements of the journal. Namely, for the AR6 data the Zenodo repository does not contain it, but links to an external site: "The data is available for download at the AR6 Scenario Explorer hosted by IIASA."
The GMD review process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The 'Code and Data Availability' section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-5305-CEC1
- CC1: 'Reply on CEC1', Haewon McJeon, 13 Feb 2026
Dear Editor,
Thank you for bringing this issue to our attention. We have investigated the matter carefully.
The IPCC AR6 Scenario Database is not our own data; rather, it is third-party raw data we used to train our model. This database is published under the Creative Commons Attribution 4.0 International License by IIASA. However, IIASA's terms of use explicitly state that:
"... it is not permitted to republish (i.e., make available for download or otherwise distribute) a substantial portion (or the whole) of the scenario ensemble data without written permission from IIASA."
As our study uses the full AR6 Scenario Database, we are not permitted to re-host the data independently on Zenodo without written permission from the data owners, which would constitute a license violation.
While it is indeed unfortunate that the data is under restricted access at Zenodo, we would like to clarify that the AR6 Scenario Database is, in fact, fully accessible for direct download without registration requirements. Specifically, reviewers and readers can access and download the complete dataset by:
- Visiting: https://data.ece.iiasa.ac.at/ar6/#/downloads
- Selecting "login as guest" (no account registration required)
- Selecting “AR6_Scenarios_Database…” v1.1 files with the release year (email address needed to receive the download link)
Furthermore, the AR6 Scenario Database is a versioned, institutionally maintained dataset. The specific version used in our study (v1.1) is permanently identified and accessible at the above URL. We believe this constitutes a persistently accessible version of the data in the sense required by GMD policy, hosted by IIASA (International Institute for Applied Systems Analysis), a well-established international research institution with a more than 50-year history of providing long-term archival support.
The database is also formally cited via its Zenodo DOI (https://doi.org/10.5281/zenodo.7197970, Byers et al., 2022), which provides a persistent identifier for the exact version we used.
We propose to update the "Code and Data Availability" section as follows:
---
Code and data availability. The source code for ML-IAM v1.0 is permanently archived on Zenodo at https://doi.org/10.5281/zenodo.17390678 (Shin et al., 2025b). The supporting data files (base year mappings and input/output variable classifications) are archived separately at
https://doi.org/10.5281/zenodo.17390113 (Shin et al., 2025a). The code is also available on GitHub at https://github.com/YenShin1891/ml-iam.
The IPCC AR6 Scenario Database (Byers et al., 2022) is available for direct download at https://data.ece.iiasa.ac.at/ar6/#/downloads (requires email address; accessible with or without account creation). The Zenodo record at https://doi.org/10.5281/zenodo.7197970 (restricted access) provides the formal citation and DOI.
---
We believe this updated section, together with the explanation above, demonstrates that the AR6 Scenario Database is accessible to reviewers and the community in a manner consistent with GMD policy.
Please let us know if further clarification is needed.
Best regards,
Haewon McJeon
Associate Professor, Korea Advanced Institute of Science and Technology
Citation: https://doi.org/10.5194/egusphere-2025-5305-CC1
- CEC2: 'Reply on CC1', Juan Antonio Añel, 13 Feb 2026
Dear authors,
First, I would like to point out that we cannot accept IIASA as a server to host the assets related to manuscripts submitted to GMD, as it is not a trusted repository for the purposes of the scientific method. Namely, it does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist). It is unfortunate that the authors of the mentioned dataset are not sharing it without restrictions, which compromises compliance with the scientific method, the provenance of materials, and the replicability of works based on them. However, as you are not the direct authors of the dataset, cannot take any action to share the data, and have already performed the work, we can consider the private Zenodo repository shared by IIASA as sufficient. Please remove any link to the IIASA site from your manuscript, as it does not serve the work's compliance with the scientific method, and keep only the Zenodo repository.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-5305-CEC2
- CC2: 'Reply on CEC2', Haewon McJeon, 13 Feb 2026
Dear Editor,
Thank you for your understanding and for providing a path forward.
We will proceed as requested: removing all IIASA links from the manuscript and pointing exclusively to the Zenodo repository for the related assets.
Best regards,
Haewon McJeon
Citation: https://doi.org/10.5194/egusphere-2025-5305-CC2
- RC2: 'Comment on egusphere-2025-5305', Anonymous Referee #2, 28 Feb 2026
Main comments
The paper presents an attempt to develop an emulator of Integrated Assessment Models (IAMs). As the authors wrote, while emulation is extensively used in climate science, its application to IAMs is relatively nascent. The paper is relevant, interesting, and has the potential to contribute to the literature; however, I have significant reservations and doubts.
IAM emulation is a rapidly evolving field, a topic currently being undertaken by several research groups. The authors failed to discuss Xiong et al. (2025), which appears to be the first systematic attempt to emulate multiple IAMs. The Xiong study relies on the ENGAGE dataset (Riahi et al., 2021), one of the richest IAM datasets included in the AR6 scenario database. The Xiong study directly addresses “the core challenge of predictive emulation across diverse IAM families” mentioned on Line 49.
I suggest that the authors integrate the Xiong study into the discussion (e.g., starting at Line 42) and provide a comparative analysis of their approach versus the Xiong approach, particularly regarding underlying data, methodology, target variables, performance, reliability, and potential applications. I also bring the authors’ attention to Xiong and Tanaka (2025), which further applies Xiong’s emIAM approach to scenario extension beyond 2100. Given these existing works, the authors must avoid overstating the novelty and articulate their unique contribution more clearly.
Another issue is that it is currently unclear to me how the authors’ ML-IAM can be effectively utilized in practice. The introduction focuses heavily on technical challenges and lacks scientific motivation. The authors claim several potential applications at the end of the paper. For example, “Researchers can now optimize for multiple targets simultaneously—such as achieving specific temperature goals while maximizing sustainable development outcomes—through grid searches across millions of parameter combinations that would be computationally prohibitive with traditional IAMs” (Line 268). On Line 274, the authors state “Future extensions could enable ML-IAM to learn specific dynamics from individual models—such as COVID-19 impacts or rapid technological transitions captured by some IAMs but not others—and propagate these patterns across model families, potentially enriching the scenario landscape beyond what any single IAM provides.” However, it is not obvious how the ML-IAM, given the input variables in Table A1, supports these goals.
IAMs typically produce a least-cost emission pathway for a given carbon budget by optimization. Can the proposed IAM emulator be used in the same way as the original IAMs? The authors are well positioned to demonstrate selected applications. I highly recommend including a few concrete examples of applications as a proof-of-concept. This would provide necessary evidence that the emulator works as intended, especially outside of the range of training data. Such demonstrations would enhance the case as the current work has only limited validations.
Finally, the paper is highly technical and does not seem to consider the journal's broad audience. Many ML terms are introduced with citations but lack conceptual explanations. I raise several such examples in my detailed comments, but my comments are far from exhaustive. I suggest the authors move highly technical specifications to the Appendix and provide clearer, intuitive explanations for non-specialists in the main text to ensure the work is accessible to the wider community.
Detailed comments
Line 27: With the exception of FaIR, the models mentioned here have a long history as “simple climate model” or “reduced-complexity climate model” (Romero-Prieto et al., 2026). They directly represent physical and biogeochemical processes, without relying on ML techniques. A clear distinction should be made between these physical emulators and the ML-based emulators discussed in this paper.
Line 42: See my major comments regarding the omission of relevant literature (Xiong et al., 2025).
Line 51: Also see my major comments on the need for scientific motivation in the introduction.
Line 65: This statement raises concerns about what the emulator actually captures, given that IAMs are highly diverse and behave differently. How does the emulator handle "inter-model spread" and "intra-model spread"?
Line 82: This work addresses different gases and regions, but not different sectors. Please provide a rationale for not including different sectors.
Line 85: I highly recommend moving Table A1 to the main paper, as input and output are essential information for describing the emulator.
Line 92: Please provide conceptual explanations for “mixed-effects modeling.”
Line 100: Harmonization typically influences near-term data (Gidden et al., 2019). Please explain how harmonization effects were treated in the emulator.
Line 114: The definition of these terms can be made much earlier in the paper, at the first instance of the appearance of these terms.
Line 122: Please provide conceptual explanations for “tabular regression.”
Line 124: Many IAMs (e.g., REMIND-MAgPIE and MESSAGE) are optimization-based and produce pathways based on constraints such as a carbon budget. Please describe how the IAM emulator treats these optimization targets.
Line 141: This sub-section is highly technical (see my major comment).
Line 157: See my comment above.
Line 180: See my comment above.
Line 198: See my comment above.
Line 211: For better readability, Figure 3 should be moved to the section where it is actually discussed (not Section 2.4).
Line 229: Please define the “original IAM projections” in the Figure 4 caption. Which models or model families are presented in the figure? I suggest showing results for marker IAMs, such as REMIND-MAgPIE, IMAGE, and MESSAGE to more clearly demonstrate reproducibility.
Line 229: The orange shaded zone in the left panel shows a large discrepancy between the original and reproduced pathways. If I understood correctly, the original scenario is a zero-emission scenario without net negative emissions, while the reproduced scenario shows net negative emissions (or vice versa). The emulator seems to confuse these two types of pathways. I wonder how ML-IAM distinguishes between these two fundamentally different pathways, which can occur under the same remaining carbon budget.
Line 243: Please define “missingness indicators.”
Line 266: See my major comment on the practical utility of the ML IAM.
References
Gidden, M. J., Riahi, K., Smith, S. J., Fujimori, S., Luderer, G., Kriegler, E., . . . Takahashi, K. (2019). Global emissions pathways under different socioeconomic scenarios for use in CMIP6: a dataset of harmonized emissions trajectories through the end of the century. Geosci. Model Dev., 12(4), 1443-1475. doi:10.5194/gmd-12-1443-2019
Riahi, K., Bertram, C., Huppmann, D., Rogelj, J., Bosetti, V., Cabardos, A.-M., . . . Zakeri, B. (2021). Cost and attainability of meeting stringent climate targets without overshoot. Nature Climate Change, 11(12), 1063-1069. doi:10.1038/s41558-021-01215-2
Romero-Prieto, A., Mathison, C., & Smith, C. (2026). Review of climate simulation by Simple Climate Models. Geosci. Model Dev., 19(1), 115-165. doi:10.5194/gmd-19-115-2026
Xiong, W., Tanaka, K., Ciais, P., Johansson, D. J. A., & Lehtveer, M. (2025). emIAM v1.0: an emulator for integrated assessment models using marginal abatement cost curves. Geosci. Model Dev., 18(5), 1575-1612. doi:10.5194/gmd-18-1575-2025
Xiong, W., & Tanaka, K. (2025). Extending Integrated Assessment Model scenarios until 2150 using an emulation approach. arXiv preprint, arXiv:2512.06026. doi:10.48550/arXiv.2512.06026
Citation: https://doi.org/10.5194/egusphere-2025-5305-RC2
Summary
This paper presents ML-IAM v1.0, a machine learning emulator capable of replicating the outputs of diverse Integrated Assessment Models (IAMs). The authors compare three architectures and show that the tree-based XGBoost model achieves high accuracy while reducing computational runtimes from hours to seconds. This manuscript makes a solid contribution to the literature. It addresses a critical computational bottleneck in the field and provides a practical, open-source tool that facilitates rapid scenario exploration. The manuscript is well written, well structured, and uses a sound approach to separating exogenous and endogenous variables. Furthermore, the paper shares both data and code and provides extensive implementation details, improving reproducibility.
However, while the tool itself is valuable for the modeling community, the machine learning methodology used to benchmark the models is weak. The conclusion that tree-based models inherently outperform transformers and other models for this task is not fully supported by the experimental setup. The chosen experiments hinder the deep learning baselines through restrictive hyperparameters and simplistic imputation. The "failures" of the machine learning models should be framed as a result of specific configuration choices rather than an intrinsic incompatibility with IAM data.
Strengths
The primary strength of this work is its practical utility. Unlike previous emulators that focused on single models, this work successfully learns from 95 model families. The resulting tool allows for fast, model-agnostic scenario generation, which can be an important tool for researchers in the field. The inclusion of an interactive Emulation Viewer enhances the accessibility of the results. Additionally, the paper is highly reproducible. The separation of historical data from projections is handled rigorously, preventing the data leakage issues that often plague time-series emulation papers. Finally, the paper is very clearly written and follows a coherent structure.
Methodological Weaknesses and Limitations
Machine Learning Benchmarking
Despite the success of the XGBoost implementation, the machine learning benchmarking requires critical contextualization. From a machine learning perspective, the baselines are very limited and arbitrary. Specifically, the hyperparameter search space for the Temporal Fusion Transformer (TFT) listed in Table C1 is restrictive, exploring layer dimensions of only [16, 32, 64]. These layers are arguably too small to capture complex dependencies in a transformer architecture. Given the dataset characteristics, the transformer comparison offers limited generalizable insights for the ML community. Additionally, TFT received only 20 search iterations compared to 50 for XGBoost, with validation split rather than 5-fold cross-validation. Each hyperparameter sweep for each model is different and minimises a different metric. Consequently, the poor performance of the TFT is likely an artifact of this inconsistent configuration and limited search rather than any evidence that transformers are not suitable for IAM emulation. The manuscript should avoid broad claims that tree-based models outperform DL approaches in general, and instead clarify that they outperformed DL as configured in this specific study. Besides, non-DL algorithms could also be tested as baselines. I advise the authors to explicitly mention that this ML comparison is more illustrative than exhaustive and that, while XGBoost provides good results, significant additional research is needed to fully understand how ML can best be applied for IAM emulation and interpolation.
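A consistent comparison of the kind the reviewer asks for could use one search harness for all models: the same number of random configurations and the same validation metric. The sketch below is purely illustrative; `toy_objective`, the search space, and the 50-iteration budget are placeholders standing in for "train a candidate model and return its validation RMSE", not the manuscript's actual setup.

```python
import random

def random_search(fit_eval, space, n_iter=50, seed=0):
    """Draw the same number of configurations for every model and
    minimise the same validation metric (here assumed to be RMSE)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_iter):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        score = fit_eval(cfg)  # placeholder for: fit model, return val RMSE
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical objective standing in for an actual train/validate cycle.
def toy_objective(cfg):
    return (cfg["lr"] - 0.1) ** 2 + (cfg["depth"] - 6) ** 2

space = {"lr": [0.01, 0.05, 0.1, 0.3], "depth": [2, 4, 6, 8]}
cfg, score = random_search(toy_objective, space, n_iter=50)
```

Running every model through the same `random_search` call (same `n_iter`, same metric, same folds) would remove the budget and metric asymmetries the review points out.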
Furthermore, the data imputation strategy introduces a potential bias against the neural network models. While XGBoost handles sparsity natively, the authors employed variable-specific median imputation for the LSTM and TFT models. IAM scenarios may rely on distinct, internally consistent narratives where variables deviate intentionally from the median. Using median imputation likely suppresses the signal in these scenarios, disproportionately penalizing the neural networks, further weakening the claim that transformers are ill-suited for this task. This limitation should be explicitly acknowledged in the text as a confounding factor in the model comparison. Additional imputation techniques should have been studied.
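One low-cost alternative to plain median imputation, sketched below under the assumption that inputs arrive as row-wise feature lists with `None` for missing entries (not the authors' actual pipeline), is to append a per-column missingness indicator so a neural network can at least distinguish imputed values from observed ones:

```python
from statistics import median

def impute_with_indicator(rows):
    """Column-wise median imputation plus a 0/1 missingness flag per
    column, so a downstream network can tell imputed entries apart."""
    n_cols = len(rows[0])
    meds = []
    for j in range(n_cols):
        vals = [r[j] for r in rows if r[j] is not None]
        meds.append(median(vals) if vals else 0.0)
    out = []
    for r in rows:
        filled = [meds[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1.0 if r[j] is None else 0.0 for j in range(n_cols)]
        out.append(filled + flags)  # original features, then indicators
    return out

rows = [[1.0, None], [3.0, 10.0], [None, 30.0]]
aug = impute_with_indicator(rows)
# aug[0] == [1.0, 20.0, 0.0, 1.0]: value 20.0 is the column median,
# and the trailing 1.0 flags it as imputed.
```

This is close in spirit to the "missingness indicators" mentioned at Line 243, and would make the neural baselines' information content more comparable to XGBoost's native sparsity handling.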
Clarification of "Emulation" Scope
The paper positions ML-IAM as an "emulator" of IAMs, but the authors should more carefully distinguish what the model can and cannot do. Since ML-IAM is trained on existing IAM outputs, it is essentially interpolating within the space of scenarios already generated by the IAM community. The critical question is whether ML-IAM can generate meaningful predictions for scenario configurations that fall outside the training distribution. For example: novel policy combinations, extreme GDP trajectories, or technology cost assumptions not represented in the training data. The paper would benefit from explicit out-of-distribution testing to characterize generalization limits, or, at least, a clear statement that the emulator's validity is bounded by the scenario space present in the training data. Relatedly, the claim that the method generates predictions "for any IAM family" requires clarification: it can only produce predictions styled after IAM families present in the training data.
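The out-of-distribution test suggested above can be made concrete by holding out the extremes of a driver variable rather than splitting at random. The following hypothetical sketch uses a one-dimensional least-squares fit on toy quadratic "scenario" data; the data, the 1-D model, and the split threshold are all invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def rmse(pairs, a, b):
    return (sum((y - (a * x + b)) ** 2 for x, y in pairs) / len(pairs)) ** 0.5

# Toy scenarios: y is mildly nonlinear in x, so extrapolation degrades.
data = [(float(x), 0.1 * x * x + x) for x in range(21)]   # x = 0..20
train = [(x, y) for x, y in data if x <= 15]              # "in-distribution"
ood = [(x, y) for x, y in data if x > 15]                 # held-out extremes
a, b = fit_line([x for x, _ in train], [y for _, y in train])
in_err = rmse(train, a, b)
ood_err = rmse(ood, a, b)
assert ood_err > in_err  # error grows outside the training range
```

Reporting the analogue of `ood_err` versus `in_err` for, say, the most extreme GDP or carbon-price trajectories would characterise the generalisation limits the review asks about.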
Interpretability Analysis
The analysis using SHAP values is somewhat over-interpreted. While SHAP is a useful tool for feature attribution, it is not a complete Explainable AI (XAI) framework, and attributions for deep networks can be unstable. The discussion implies causal relationships, but they may simply be correlations identified by the XAI tool. This analysis raises questions about what the model is actually learning: underlying climate-economy dynamics, or pattern-matching based on which IAMs report which variables? I advise a more cautious interpretation of SHAP values, particularly those of deep learning methods. Complementary approaches like sensitivity analysis or partial dependence plots would provide more robust insights into what these models are learning. I request that the final manuscript at least acknowledge the limitations of using SHAP more candidly.
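Partial dependence, one of the complementary approaches mentioned above, is simple enough to sketch from scratch: sweep one feature over a grid while holding the others at their observed values, and average the predictions. The `model` below is a hypothetical linear surrogate, not anything from the manuscript:

```python
def partial_dependence(model, rows, feature_idx, grid):
    """Mean prediction over the dataset with feature `feature_idx`
    forced to each grid value (1-D partial dependence curve)."""
    curve = []
    for g in grid:
        preds = []
        for r in rows:
            r2 = list(r)
            r2[feature_idx] = g  # override only the swept feature
            preds.append(model(r2))
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical surrogate whose output is linear in both features.
model = lambda r: 2.0 * r[0] + 0.5 * r[1]
rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(model, rows, 0, [0.0, 1.0, 2.0])
# For this linear toy the curve recovers slope 2: [1.5, 3.5, 5.5]
```

Unlike per-instance SHAP attributions, the resulting curve shows the average marginal effect of a feature, which is easier to sanity-check against known climate-economy relationships.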
Physical and Structural Limitations
Physical Consistency
The evaluation relies primarily on correlation and RMSE metrics. However, IAM outputs must satisfy physical and economic constraints (e.g., energy balance, non-negative emissions for certain sectors, plausible relationships between GDP growth and energy demand). The manuscript does not report whether ML-IAM predictions satisfy basic physical constraints, whether there are scenarios where the emulator produces physically implausible outputs, or how predictions behave at trajectory endpoints where extrapolation errors may compound. I recommend adding analysis of physical consistency, perhaps including examples of failure cases. This is partly mentioned in future work (PINNs) but explicit analysis on this would benefit the paper.
Regional Independence
The decision to treat regions independently creates a model that ignores inter-regional interactions, such as trade flows, carbon leakage, and energy market equilibrium. While this assumption is necessary for computational tractability, it significantly limits the emulator's validity.
Uncertainty Quantification
The paper notes that ML-IAM enables uncertainty quantification via Monte Carlo sampling (line 267), but the emulator's own uncertainty is underexplored. ML-IAM provides point predictions without uncertainty estimates. For policy-relevant applications, users need to understand prediction confidence and separation between epistemic and aleatoric uncertainty. The authors should at least discuss how uncertainty could be incorporated in future work (e.g., quantile regression, ensemble approaches, or Bayesian methods).
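Of the options listed above, a bootstrap ensemble is perhaps the cheapest to retrofit onto a point-prediction emulator: refit on resampled training data and read an epistemic uncertainty band off the spread of predictions. The sketch below uses a toy 1-D linear fit as a stand-in for the emulator; the data, query point, and 200-member ensemble are illustrative assumptions only:

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def bootstrap_interval(data, x_query, n_boot=200, seed=0):
    """Refit on bootstrap resamples; the spread of predictions at
    x_query gives a simple epistemic uncertainty estimate."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        a, b = fit_line([x for x, _ in sample], [y for _, y in sample])
        preds.append(a * x_query + b)
    preds.sort()
    return preds[int(0.05 * n_boot)], preds[int(0.95 * n_boot)]

# Noisy toy data around y = 2x, with deterministic per-point noise.
data = [(float(x), 2.0 * x + random.Random(x).gauss(0, 1)) for x in range(20)]
lo, hi = bootstrap_interval(data, x_query=10.0)
```

The same resampling loop around XGBoost fits (or quantile-loss variants of it) would give users the prediction bands this review argues policy applications need.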
Specific Comments and Technical Corrections
Recommendation: Accept subject to minor revisions, focused on significant textual clarifications of limitations. The revisions requested are primarily about removing claims regarding model comparison generality and explicitly stating the boundaries of the emulator's applicability.