New framework for benchmarking decadal predictions leveraging the PCMDI Metric Package with interactive visualization

Choi, Jung; Lee, Jiwoo; Chang, Kristin; Ullrich, Paul A.; Gleckler, Peter J.; Jun, Sang-Yoon

doi:10.5194/egusphere-2026-958

Preprints

https://doi.org/10.5194/egusphere-2026-958

Preprints

04 Mar 2026

| 04 Mar 2026

New framework for benchmarking decadal predictions leveraging the PCMDI Metric Package with interactive visualization

Jung Choi, Jiwoo Lee, Kristin Chang, Paul A. Ullrich, Peter J. Gleckler, and Sang-Yoon Jun

Abstract. Reliable climate predictions across multiple timescales are increasingly critical as climate-related risks continue to rise. With the growing number and diversity of climate prediction systems, systematic intercomparison has become essential. Here, we present a comprehensive evaluation framework based on the PCMDI Metric Package to assess the performance of multiple decadal climate prediction systems. Unlike uninitialized simulations, initialized predictions exhibit bias and predictive skill that evolve with forecast lead time. To address this, we introduce (1) model-by-lead-time portrait plots, which efficiently summarize metrics of global temperature, precipitation, and Arctic/Antarctic sea-ice extent, and (2) an HTML-based interactive visualization platform that provides detailed regional and seasonal diagnostics of model bias, skill scores, and ensemble spread for each model and lead time. Comparisons with uninitialized simulations further quantify the relative impacts of initialization and external forcing on prediction skill. The proposed framework provides a scalable and transparent approach for multi-model climate prediction assessments and can be readily extended to a wide range of operational and research forecasting systems.

Received: 20 Feb 2026 – Discussion started: 04 Mar 2026

Competing interests: At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Jung Choi, Jiwoo Lee, Kristin Chang, Paul A. Ullrich, Peter J. Gleckler, and Sang-Yoon Jun

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-958', Anonymous Referee #1, 31 Mar 2026

This manuscript presents a multi-model evaluation framework for initialized decadal climate prediction, implemented within the PCMDI Metrics Package, featuring two diagnostic tools: (1) a model-by-lead-time portrait plot and (2) an interactive HTML visualization platform, to assess how biases and prediction skill evolve with forecast lead time across temperature, precipitation, and sea ice. Application of the framework reveals that temperature biases drift toward each model's climatology over time, precipitation biases reflect systematic model physics errors, and sea-ice skill degrades rapidly with lead time while remaining closely coupled to temperature prediction quality. Overall, this work offers great practical value to the decadal predictability community; the publicly accessible diagnostics provide a useful reference that researchers can readily draw on for their own work. It is completely understandable that a single manuscript could not provide detailed mechanistic explanations for every systematic bias identified, and I expect this framework will serve as a springboard for many future studies in this area.
Minor comments:
1. The authors state in Lines 149–150 that forecasts are expected to drift toward the model's biased climatology as lead time increases. However, Figure 1 shows that tropical TAS biases in most models actually decrease with lead time rather than grow, which is somewhat counterintuitive, since one would expect small biases at short lead times that would then amplify as the forecast drifts toward the model climatology. Could the authors comment on this?
2. Line 160: It would be helpful to include a brief discussion of the physical reasons behind the tendency for models to exhibit a wet bias in the tropics and a dry bias in the mid-latitudes, rather than relying solely on the brief attribution to the summer ITCZ at Line 196.
3. Figure 1: Comparing TAS and PR results, TAS biases appear to evolve with lead time while PR biases remain largely constant across all initializations. A short comment on why precipitation biases are more stable with lead time than temperature biases would be great.

Citation: https://doi.org/10.5194/egusphere-2026-958-RC1
- AC1: 'Reply on RC1', Jiwoo Lee, 19 May 2026
  
  We appreciate the reviewer's comments. Please find the attached PDF for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2026-958-AC1
RC2:
'Comment on egusphere-2026-958', Anonymous Referee #2, 08 Apr 2026

This manuscript presents a framework and results from a new benchmarking tool for decadal prediction experiments that is based on the PCMDI metrics package. The new benchmarking tool fills a critical gap in the ecosystem of standardized benchmarking software by being engineered to handle aspects of prediction datasets, such as forecast lead time. With this tool, there is a significant opportunity to understand how initialization impacts seasonal prediction, as well as how quickly long-term mean biases in climate models emerge in shorter decadal-scale experiments. The initial application of the tool to DCPP experiments shows several interesting themes. Long-term mean biases in temperature and precipitation tend to emerge during the decadal simulations for the three primary climate variables considered in this study: surface air temperature, precipitation, and sea ice extent. Trend biases also emerge in decadal predictions in some key processes, including wintertime mid-latitude precipitation in the Northern Hemisphere. The decadal prediction models also struggle to represent seasonal trends in Antarctic sea ice extent, but the results of the benchmarking analysis suggest that biases are reduced at shorter timescales, benefiting from the model initialization. Overall, the results suggest that much more work is needed to improve the fidelity of precipitation simulation. The manuscript is well written, and the results demonstrate the novelty and utility of this new benchmarking approach. After some revision, this manuscript and benchmarking tools will be important contributions to the field.
Major Comments:
1. The manuscript is a bit light in some key areas for context. The second-to-last paragraph of the introduction falls short of demonstrating what software and approaches already exist, and why this approach is novel. I would recommend a more thorough review of past efforts and existing tools to better frame the new work. Similarly, benchmarking diagnostics differ distinctly from process-based diagnostics, where the latter are more suited towards understanding why the models perform the way that they do. This is an important caveat that needs to be covered in both the introduction and the conclusions. Benchmarking diagnostics are sometimes suggestive of possible reasons for biases in climate variables, but are not definitive.
2. The introduction mentions the importance of model drift in the introduction (third-to-last paragraph), but it is largely ignored throughout the rest of the manuscript. The methods section would benefit from a more in-depth discussion of model drifts in decadal simulations and how the benchmarking tool handles them.

3. Statistical significance is largely absent from the benchmarking metrics. The benchmarking tool would benefit greatly by incorporating measures of statistical significance, for example, by shading/stippling on portrait plots and maps.

Minor Comments:
Lines 104-105: The regional approach is fine, but the comment about regridding errors is curious. Ocean model grid remapping is not an intractable problem.
Section 2.4: This section on the display interface should be expanded. More details on how it is done, why it is important, and the design choices made would be helpful.
Figures 1,2,5,6,9,12,13: The “LY” labels might be better suited along the bottom of the plot. They are a bit lost with the plot titles.
Figure 10: It is difficult to see the yellow contours in these panels.

Citation: https://doi.org/10.5194/egusphere-2026-958-RC2
- AC2: 'Reply on RC2', Jiwoo Lee, 19 May 2026
  
  We appreciate the reviewer's comments. Please find the attached PDF for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2026-958-AC2

Jung Choi, Jiwoo Lee, Kristin Chang, Paul A. Ullrich, Peter J. Gleckler, and Sang-Yoon Jun

Viewed

Total article views: 1,186 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
701	419	66	1,186	74	94

HTML: 701
PDF: 419
XML: 66
Total: 1,186
BibTeX: 74
EndNote: 94

Views and downloads (calculated since 04 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	588	242	56	886
Apr 2026	78	98	6	182
May 2026	35	79	4	118

Cumulative views and downloads (calculated since 04 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	588	242	56	886
Apr 2026	78	98	6	182
May 2026	35	79	4	118

Viewed (geographical distribution)

Total article views: 1,181 (including HTML, PDF, and XML) Thereof 1,181 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 30 May 2026

Short summary

As climate risks grow, society needs reliable predictions for the coming years and decades. We developed a framework to collectively compare climate prediction systems and examine their performances on global temperature, rainfall, and sea ice. As a complementary to traditional analyses, our new framework offers tracking evolution of model performance in simulation time, helping scientists and stakeholders better understand strengths and limits of decadal climate prediction.


Total:	0
HTML:	0
PDF:	0
XML:	0