https://doi.org/10.48550/arXiv.2605.06944
01 Jun 2026
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

AIMIP Phase 1: systematic evaluations of AI weather and climate models

Brian Henn, Christopher S. Bretherton, Nikolay Koldunov, Christian Lessig, Maria J. Molina, Troy Arcomano, Oliver Watt-Meyer, Guillaume Couairon, Renu Singh, Robert Brunstein, Yana Hasson, Antonia Jost, Noah Brenowitz, Peter Manshausen, Nathaniel Cresswell-Clay, Dale Durran, Kyle Joseph Chen Hall, Janni Yuval, Dmitrii Kochkov, Stephan Hoyer, and Ignacio Lopez-Gomez

Abstract. We present the AI weather and climate model intercomparison project (AIMIP), Phase 1. Drawing from the rich tradition of intercomparisons in climate model development, we specify a common experiment, output data format, and training constraints (namely, training against historical reanalysis data) for AIMIP Phase 1 models. We aim to identify differences in modeling frameworks and AI architectural choices that influence model behavior, and to build trust in AI weather and climate models through open data and evaluation. AIMIP Phase 1 models must simulate the atmosphere given specified historical sea surface temperatures over 1979–2024. We evaluate the models' performance using five major evaluation criteria: biases, trends, response to El Niño-related sea surface temperature anomalies, temporal variability, and out-of-sample generalization tests. We find that the AI models are able to simulate the historical climate and response to forcing as well as a conventional physically based model, but some AI models underestimate historical warming trends, and their predictions diverge in the out-of-sample generalization tests. We describe the AIMIP Phase 1 dataset, which is publicly available for additional evaluations.


Status: open (until 27 Jul 2026)

Latest update: 02 Jun 2026
Short summary
AIMIP (AI Model Intercomparison Project) is a community effort to rigorously evaluate AI weather and climate models, which simulate Earth's climate with extraordinary efficiency compared to traditional systems. Phase 1 is an atmosphere-only standardized experiment, showing that AI models are competitive on average historical patterns but may struggle with long-term warming trends and generalizing to unseen scenarios. The AIMIP Phase 1 dataset is publicly available for open model evaluation.