https://doi.org/10.48550/arXiv.2604.06567
19 Jun 2026
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

A PMP-inspired Evaluation Framework for Assessing Deep-Learning Earth System Models

Giuliana Pallotta, Shiheng Duan, Celine Bonfils, Jiwoo Lee, Seth Goodnight, and Paul Ullrich

Abstract. In recent years, Deep-Learning Earth System Models (DL-ESMs) have emerged as promising, computationally efficient complements to traditional Earth system models. Here, we present an evaluation framework for testing DL-ESMs from an Earth system model-development perspective using standardized diagnostics from the PCMDI Metrics Package (PMP). This framework allows DL-ESMs, including Ai2’s ACE2 and Google’s NeuralGCM, to be assessed with metrics that quantify their ability to reproduce climatology, major modes of variability, monsoon behavior, and precipitation variability relative to observational reference datasets and CMIP-class benchmarks. By evaluating DL-ESMs with tools commonly used for traditional models, we extend their assessment beyond short-range forecast skill and toward longer-timescale, Earth system–relevant applications. The results identify encouraging strengths in several large-scale fields and modes of variability while also highlighting persistent challenges in precipitation, tropical variability, and long-run stability for some model versions. This evaluation is a critical step toward building trust in DL-ESMs, guiding future model development, and clarifying their fitness for Earth system science applications.

Status: open (until 14 Aug 2026)

Latest update: 20 Jun 2026
Short summary
Deep-learning Earth system models (DL-ESMs) offer a computationally efficient alternative to traditional Earth system models, but their suitability for Earth system applications requires rigorous evaluation. We apply the PCMDI Metrics Package to assess ACE2 and NeuralGCM using climatology, variability, monsoon, and precipitation diagnostics, providing one of the first systematic Earth system model evaluations of DL-ESMs.