A Multi-Criteria Framework for CORDEX-CORE2 GCM Selection

Ashfaq, Moetasim; Coppola, Erika; Lennard, Chris; Teichmann, Claas; Rastogi, Deeksha; Massoud, Elias; Buonomo, Erasmo; Im, Eun-Soon; Zittis, George; Evans, Jason P.; Fernandez, Jesus; Evans, Katherine J.; Silva, Maria Leidinice da; Adinolfi, Marianna; Bukovsky, Melissa; Rocha, Rosmeri Porfirio da; Hasson, Shabeh ul; Solman, Silvina A.; Sobolowski, Stefan; Das, Sushant; Brands, Swen; Cavazos, Tereza; Ngo-Duc, Thanh; Gao, Xuejie

doi:10.5194/egusphere-2026-2649

Preprints

https://doi.org/10.5194/egusphere-2026-2649


21 May 2026


Abstract. We present a structured multi-criteria framework for the sub-selection of CMIP6 global climate models (GCMs) to support CORDEX-CORE2 dynamical downscaling. The framework integrates five key criteria: historical performance, model independence, regional temperature sensitivity, precipitation spread, and data availability, and is designed to identify a single, consistent subset of GCMs across all CORDEX domains to improve the comparability and interpretability of regional projections. A total of 45 GCMs are evaluated over the historical period (1981–2014), with 31 models further assessed for projected changes over 2015–2100. Application of the framework shows that model performance is systematically higher for large-scale circulation and thermodynamic fields than for precipitation seasonality and monsoon-related processes, which remain a dominant source of uncertainty across regions. Despite the diversity of climates represented across CORDEX domains, model rankings are broadly consistent, with top-performing models exhibiting stable performance across both tropical and extratropical regions, while lower-ranked models show more pervasive deficiencies rather than region-specific weaknesses. Sensitivity analyses demonstrate that rankings are largely insensitive to the choice of aggregation method but depend strongly on the breadth of evaluation metrics, with robust and reproducible rankings emerging only when a large fraction of the full metric suite is retained. Assessment of model independence reveals substantial clustering within the ensemble, indicating that many models share similar performance characteristics, while a smaller subset provides distinct and complementary information. Regional temperature sensitivity exhibits a coherent ordering across domains, suggesting that differences in projected warming are primarily governed by intrinsic model characteristics rather than region-specific effects. 
In contrast, precipitation spread shows strong regional variability, with both the magnitude and temporal structure of precipitation change differing widely across models. The relationship between precipitation and warming further highlights that, in some regions, precipitation responses scale with temperature, while in others they are dominated by circulation variability. By combining these criteria with data availability constraints, the framework identifies a reduced set of models that retains key aspects of performance, diversity, and projected change. This approach provides a transparent and reproducible basis for GCM selection within CORDEX-CORE2 and offers a generalizable strategy for coordinated regional climate modeling efforts.

How to cite. Ashfaq, M., Coppola, E., Lennard, C., Teichmann, C., Rastogi, D., Massoud, E., Buonomo, E., Im, E.-S., Zittis, G., Evans, J. P., Fernandez, J., Evans, K. J., Silva, M. L. D., Adinolfi, M., Bukovsky, M., Rocha, R. P. D., Hasson, S. U., Solman, S. A., Sobolowski, S., Das, S., Brands, S., Cavazos, T., Ngo-Duc, T., and Gao, X.: A Multi-Criteria Framework for CORDEX-CORE2 GCM Selection, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2026-2649, 2026.

Received: 08 May 2026 – Discussion started: 21 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

CEC1:
'Comment on egusphere-2026-2649 - No compliance with the policy of the journal', Juan Antonio Añel, 21 Jun 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, your work uses multiple models, yet the Code and Data Availability section of your manuscript does not provide a repository containing their code. Also, for the analysis code you state that it "is currently being prepared for archiving in a public repository and will be made openly available upon publication of the manuscript." Because of these issues, your manuscript should never have been accepted for Discussions or peer review in GMD. The journal's policy is clear that all the code and data necessary to perform and replicate the work presented in a manuscript must be published openly and without restrictions before the manuscript is submitted to the journal.
Moreover, you cite the Copernicus Climate Data Store for access to the ERA5 data, which we cannot accept as a repository for the data. You must store the specific ERA5 data used in your work in a repository that we can accept.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, the 'Code and Data Availability' section of your manuscript must also be modified to cite the new repository locations, with corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2026-2649-CEC1
- AC1:
  'Reply on CEC1', Moetasim Ashfaq, 22 Jun 2026
  Dear Dr. Añel,
  Thank you for bringing this issue to our attention and for clarifying the requirements of the GMD Code and Data Policy. We take the matter seriously and are treating it as our immediate priority.
  We would first like to clarify the role of the CMIP6 models in our study. Our work does not involve developing, modifying, or running any CMIP6 climate model code. We use publicly available CMIP6 model output only for evaluation and analysis. The CMIP6 source codes are therefore not part of the workflow needed to reproduce our analyses. The original modeling centers maintain and distribute these codes through their own channels. We will, however, provide full citations and dataset identifiers (DOIs/version information from the ESGF records) for the exact CMIP6 output we analyzed. This ensures the provenance of these data is fully traceable.
  We acknowledge that our analysis code and generated datasets were not archived in a public repository at the time of submission. We recognize that this does not comply with the policy. To address this, we will deposit the following in a GMD-accepted repository (e.g. Zenodo):
  - The analysis code used in this study;
  - The datasets generated and used for the analysis, including the specific ERA5 subset used; and
  - Full citations and permanent identifiers (DOIs/version information) for the specific CMIP6 model output analyzed.
  
  We expect to complete this within this week. We will reply to this comment with the repository links and DOIs as soon as they are available.
  We apologize for this oversight. We appreciate the opportunity to bring the manuscript into full compliance with the journal's requirements.
  Sincerely,
  
  Moetasim Ashfaq
  
  Corresponding author
  
  Citation: https://doi.org/10.5194/egusphere-2026-2649-AC1
RC1:
'Comment on egusphere-2026-2649', Anonymous Referee #1, 24 Jun 2026
This study develops a multi-criteria, cross-domain GCM selection framework for CORDEX-CORE2 dynamical downscaling. The work systematically evaluates CMIP6 models, establishes an evaluation indicator system adapted to regional climate characteristics, and integrates future projections under the SSP3-7.0 scenario. It provides a transparent, reproducible, and standardized methodology for selecting a GCM candidate pool that balances simulation credibility, ensemble diversity, and practical feasibility. This is undoubtedly valuable research with a solid methodological design and clear application potential. However, given the complexity of the framework, the explanations of some key processes remain insufficient, and the presentation of several results could be improved. I recommend the following revisions before publication:
Line 144 states that the framework is built on the five criteria proposed by Sobolowski et al. (2025). Please explain why these five dimensions were selected as screening criteria, and why other potentially important dimensions (e.g., extreme climate simulation capability, model resolution, initial-condition ensemble members) were not included.

What is the rationale for selecting the evaluation metrics listed in Table 2? Why are only partial metric categories listed in Line 192?

The African domain includes the largest number of metrics across all domains, which L240 attributes to its “diverse climate regime”. However, an alternative interpretation is that limited research or data availability in this region has not yet supported the condensation of more efficient indicators. Could the regional disparity in metric count indirectly undermine the fairness of cross-domain rankings?

All current evaluation metrics are based on monthly mean states, with no extreme climate indicators included. For end users of climate impact research, the ability to simulate extreme events is a core requirement. It is recommended to add usage guidance for this user group, or to explicitly acknowledge this as a limitation.

Why was the exponential penalty function chosen for the normalization of bias and RMSE? Will it artificially compress performance differences among high-scoring models? Additionally, for models with comparable total scores, does this scoring criterion favor models with balanced overall performance over those with outstanding performance in specific aspects?
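The compression question can be made concrete with a minimal sketch. It assumes a penalty of the form score = exp(-|error| / scale); the manuscript's exact function and scaling are not reproduced in this comment, so `exp_penalty_score`, the `scale` parameter, and the RMSE values below are hypothetical.

```python
import math

def exp_penalty_score(error: float, scale: float) -> float:
    """Map an error magnitude to a score in (0, 1].

    Assumed form: score = exp(-|error| / scale). This is a hypothetical
    stand-in; the manuscript's actual penalty function may differ.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.exp(-abs(error) / scale)

# Hypothetical RMSE values for five models (arbitrary units).
rmse = [0.1, 0.2, 0.4, 1.0, 2.0]
scores = [exp_penalty_score(e, scale=1.0) for e in rmse]

for e, s in zip(rmse, scores):
    print(f"RMSE={e:.1f} -> score={s:.3f}")
```

Under this assumed form, equal absolute differences in error give equal score ratios, so models with small errors can receive scores that differ only slightly: here errors of 0.1 and 0.2 map to scores of about 0.905 and 0.819.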

Most general evaluation metrics adopt the standard JJA and DJF seasons for regional consistency. However, for regions where the wet and dry seasons do not align with these seasons (as in some tropical regions), their representativeness may be limited.

In Figure 2b, the z-axis values and the color mapping appear to represent the same quantity, which may be redundant. Why not simplify the plot to a 2D scatter plot that uses color only, which would be more readable than a 3D plot?

Why were these four aggregation methods tested? Why were other rank aggregation methods (e.g., Borda count, Copeland method) not adopted? And what are the statistical advantages and disadvantages of Method 2 and the default Method 3?
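For reference, the two alternative rank aggregation methods named above can be sketched as follows. This is a minimal illustration of the standard Borda count and one common variant of the Copeland method (pairwise wins minus pairwise losses), not the aggregation used in the manuscript; the model labels and per-metric rankings are placeholders.

```python
from itertools import combinations

def borda_count(rankings):
    """Borda count: position p (0-based) in an n-model ranking earns n - 1 - p points.

    rankings: list of lists, each ordering the same models from best to worst.
    """
    models = rankings[0]
    n = len(models)
    points = {m: 0 for m in models}
    for ranking in rankings:
        for pos, m in enumerate(ranking):
            points[m] += n - 1 - pos
    return points

def copeland_scores(rankings):
    """Copeland method (wins-minus-losses variant) over all model pairs."""
    models = rankings[0]
    position = [{m: i for i, m in enumerate(r)} for r in rankings]
    score = {m: 0 for m in models}
    for a, b in combinations(models, 2):
        a_wins = sum(1 for p in position if p[a] < p[b])
        b_wins = len(rankings) - a_wins
        if a_wins > b_wins:
            score[a] += 1
            score[b] -= 1
        elif b_wins > a_wins:
            score[b] += 1
            score[a] -= 1
    return score

# Hypothetical rankings of four placeholder models under three metrics.
per_metric = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["A", "C", "B", "D"],
]
print(borda_count(per_metric))
print(copeland_scores(per_metric))
```

Both methods use only the ordering of models within each metric, so they discard score magnitudes; this is the main contrast with score-averaging approaches.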

It is recommended to add a technical flowchart in the Methods section, clearly illustrating the full calculation workflow from single-metric scoring, indicator weight calculation, to aggregation.

"Figure 3b" referenced in Line 426 does not appear in the text. Please check and correct it.

The design of Figure 4 is well conceived, but the comprehensive scores of regions other than NAM are relatively close, making them hard to compare. It is recommended to sort regions clockwise by their median or mean scores. In addition to the median line, a dashed line for the mean value may provide richer statistical information.

Metric weights across regions show clear common patterns in Figure 5. For example, PR amplitude and seasonality have low weights in all domains. It might therefore be helpful to group metrics with similar physical meanings together so that readers can compare across regions.

Instead of general good models, it could be helpful to additionally identify "region-specialized" models that perform outstandingly in a single domain but have lower global rankings. This would be highly valuable for users focusing only on a single region.

What is the quantitative basis for the claim in Line 528 that some models show high performance inconsistency across regions? It is recommended to add a final column to Figure 7 showing the SD or range to visually represent the “inconsistency”. In addition, CNRM-CM6-1 shows large performance variations across both aggregation methods and regions; does this carry clear physical implications?
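The suggested SD and range columns could be computed along the following lines. This is a minimal sketch with placeholder model names and made-up per-domain scores; none of the values come from the manuscript, and the helper `inconsistency` is hypothetical.

```python
import statistics

# Hypothetical per-domain scores (0 to 1) for two placeholder models.
regional_scores = {
    "model_A": [0.82, 0.80, 0.79, 0.83],
    "model_B": [0.90, 0.55, 0.70, 0.40],
}

def inconsistency(scores):
    """Return (population SD, range) of one model's scores across domains."""
    return statistics.pstdev(scores), max(scores) - min(scores)

summary = {model: inconsistency(vals) for model, vals in regional_scores.items()}
for model, (sd, spread) in summary.items():
    print(f"{model}: SD={sd:.3f}, range={spread:.2f}")
```

Either statistic gives a single number per model that can be shown next to the regional scores; the range is more sensitive to a single outlying domain than the SD.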

The figure reference in Line 544 appears to be incorrect. Please verify and correct it.

The threshold stated in Line 570 is “75%”, while the corresponding figure shows “70%”. Please make this consistent.

The study concludes that more than 75% of metrics are required to obtain stable rankings. Is it possible to identify a few metrics that can yield approximately consistent stable rankings?
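One way to probe this question is to compare the full-suite ranking with rankings built from random metric subsets of a given size. The sketch below is an assumption-laden illustration, not the manuscript's sensitivity analysis: it uses synthetic scores, ranks models by their unweighted mean score, and measures agreement with the Spearman rank correlation. The function names are hypothetical.

```python
import random

def rank_models(scores, metric_idx):
    """Rank models (0 = best) by mean score over the chosen metric indices."""
    idx = list(metric_idx)
    means = {m: sum(v[i] for i in idx) / len(idx) for m, v in scores.items()}
    ordered = sorted(means, key=lambda m: (-means[m], m))  # ties broken by name
    return {m: r for r, m in enumerate(ordered)}

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same models."""
    n = len(rank_a)
    d2 = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def subset_stability(scores, fraction, n_draws=200, seed=0):
    """Mean Spearman correlation between full-suite and random-subset rankings."""
    rng = random.Random(seed)
    n_metrics = len(next(iter(scores.values())))
    k = max(1, round(fraction * n_metrics))
    full = rank_models(scores, range(n_metrics))
    total = 0.0
    for _ in range(n_draws):
        # Sort the sampled indices so the summation order is reproducible.
        subset = sorted(rng.sample(range(n_metrics), k))
        total += spearman(full, rank_models(scores, subset))
    return total / n_draws

# Synthetic scores: 8 placeholder models, 20 metrics each (not manuscript data).
data_rng = random.Random(42)
scores = {f"model_{i}": [data_rng.random() for _ in range(20)] for i in range(8)}

for frac in (0.25, 0.5, 0.75, 1.0):
    print(frac, round(subset_stability(scores, frac), 3))
```

Replacing the random draws with a greedy search over specific metrics would address the second part of the question, namely whether a small fixed set of metrics reproduces the full-suite ranking.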

What is the rationale for setting the threshold at 3%?

Line 649 describes the warming trajectory of some models as "steady"; what quantitative analysis supports this conclusion? If it is based on Figure 11b, it is recommended to cite this figure when it is first introduced, rather than deferring the reference to Line 666. Similarly, is the "trend" mentioned in Line 715 supported by quantitative calculation? It may be helpful to add the relevant statistics directly to Figure 11.

The WDI is mathematically equivalent to the sum, but this equivalence is not explicitly stated in the paper. This may lead readers to mistakenly regard WDI as an entirely new independent diagnostic. In addition, why choose WDI instead of using the sum or even the mean as in the previous temperature analysis?

The physical mechanism findings from Section 3.7 should be translated into more actionable guidelines. For example, for thermodynamically dominated regions such as EAS and WAS, covering low, medium, and high-warming tiers in the selected ensemble may naturally capture the main uncertainty in precipitation. But for circulation-dominated regions, the diversity of precipitation responses needs to be preserved separately.

The two leftmost columns of Figure 14 are redundant. The figure could be simplified by merging the models in the upper half into a single gray branch, while the models in the lower half flow into the different warming groups. The colors of the wet/dry category labels on the far right are not distinguishable, and because the streamlines already indicate their categories by color, the labels may not be necessary.

There still should be a concluding section or even paragraph at the end of the paper, simply summarizing the GCM selection into clear step-by-step guidance. Even though a single fixed recommendation list is not required, an example of a standard subset would be greatly helpful.
Citation: https://doi.org/10.5194/egusphere-2026-2649-RC1
RC2:
'Comment on egusphere-2026-2649', Subimal Ghosh, 26 Jun 2026
This manuscript presents a timely, comprehensive, and transparent framework for selecting CMIP6 GCMs for CORDEX-CORE2 dynamical downscaling. While the individual evaluation criteria are not novel, their systematic integration into a reproducible multi-criteria framework is a significant contribution. The rigorous assessment of model performance, independence, metric weighting, and ranking sensitivity makes this work particularly valuable ahead of IPCC AR7. The manuscript is technically sound, well-organised, and clearly written. I particularly commend the authors for their transparent evaluation strategy, which provides an objective and robust foundation for future coordinated regional climate modeling efforts. I suggest minor revision following some discussions on the following points:
Process-based evaluation: Consider briefly discussing if future work on this framework could incorporate process-based evaluation (e.g., monsoon dynamics, land–atmosphere coupling, ENSO, storm tracks) in addition to reproducing observed climatology.

"Right evaluation for the right reason": A brief acknowledgement (if true) that good historical performance may sometimes arise from compensating errors or unrealistic process representations.

Historical performance versus future credibility: It would be useful to note that strong historical performance does not necessarily guarantee more credible future projections, as future responses also depend on the representation of climate feedbacks and other emergent processes.

Emergent constraints: The authors may briefly comment on whether future work on this framework could benefit from emergent constraints or other physically based approaches linking present-day model performance to future climate projections.
Citation: https://doi.org/10.5194/egusphere-2026-2649-RC2




Short summary

Producing reliable regional climate projections requires carefully selecting which global climate models drive them. We developed a transparent and reproducible framework to identify a balanced, representative subset of models for the Coordinated Regional Climate Downscaling Experiment, a major international effort to generate high-resolution climate projections worldwide. While designed for this initiative, the framework is broadly applicable to other model sub-selection efforts.

