the Creative Commons Attribution 4.0 License.
Code accessibility and code quality across phases of the models of the Coupled Model Intercomparison Project
Abstract. This study extends previous research on CMIP5 models to investigate the reproducibility of climate models within the Coupled Model Intercomparison Project (CMIP). It evaluates the accessibility of the source code of CMIP models across all phases, emphasizing the need for public repositories to ensure transparency regarding model input, output, and usage rights, along with an analysis of licenses for compliance with scientific standards. A central focus of the research is the assessment of code quality against best practices. In addition, the study examines the historical evolution of computational and code quality across the phases of CMIP, highlighting progress and improving traceability to support scientific reproducibility. We provide insights for future research and propose solutions and tools designed to improve replicability and enhance project lifecycles; these apply not only to CMIP but also to broader scientific contexts.
Competing interests: Juan A. Añel, co-author of this paper, is Executive Editor of the journal.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-6108', Anonymous Referee #1, 24 Apr 2026
AC1: 'Reply on RC1', Michael García Rodríguez, 04 May 2026
We would like to thank the reviewer for the careful reading of the manuscript and for the constructive comments. We greatly appreciate the positive assessment of the relevance of this work for understanding the accessibility, reproducibility, and software practices of CMIP climate models. Below we respond to each point in detail.
Comment 1
We thank the reviewer for this comment and agree that further clarification is useful.
A detailed and comprehensive description of the metrics, methodology, and scoring system implemented in FortranAnalyser is provided in a dedicated article published in Software Impacts (García-Rodríguez et al., 2024). In that work, the reader can find the full technical explanation of what FortranAnalyser evaluates, including the complete set of static metrics, how they are computed, and how they are aggregated into a final normalized score ranging from 0 to 10. For this reason, the present manuscript includes only a high-level description and cites that companion paper for methodological details.
With regard to the interpretation of the score, we agree that the expression “relatively high” requires additional context. This wording is not meant to imply a high absolute level of software quality in a general software engineering sense. Rather, it is used strictly in a comparative, empirical sense, relative to the distribution of scores obtained for the set of accessible CMIP models analyzed in this study.
Specifically, although the theoretical scale of FortranAnalyser spans from 0 to 10, the scores obtained for CMIP models fall within a much narrower empirical range between 2.7 and 4.3, despite covering several CMIP phases. Within this observed range, a value of 4.014 lies in the upper part of the distribution and is therefore relatively high with respect to other operational CMIP models, including many from more recent phases. At the same time, the fact that no model approaches even half of the theoretical maximum score underlines one of the central conclusions of the study: namely, that there remains substantial room for improvement in the structural quality and maintainability of climate model code, even among the best-performing cases.
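As a rough illustration of this relative reading, the position of 4.014 can be computed both within the observed range and on the full theoretical scale. The sketch below is minimal and uses only the values quoted in this reply; the function name `relative_position` and the use of min-max rescaling are assumptions made for illustration and are not part of FortranAnalyser's scoring.

```python
def relative_position(score: float, low: float, high: float) -> float:
    """Min-max position of `score` within [low, high], as a fraction from 0 to 1."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (score - low) / (high - low)


# Values quoted in the reply above.
EMPIRICAL_LOW, EMPIRICAL_HIGH = 2.7, 4.3        # observed range for CMIP models
THEORETICAL_LOW, THEORETICAL_HIGH = 0.0, 10.0   # nominal FortranAnalyser scale
SCORE = 4.014

within_observed = relative_position(SCORE, EMPIRICAL_LOW, EMPIRICAL_HIGH)
within_scale = relative_position(SCORE, THEORETICAL_LOW, THEORETICAL_HIGH)

print(f"Position within observed CMIP range: {within_observed:.2f}")
print(f"Position on the full 0-10 scale:     {within_scale:.2f}")
```

Under these assumptions the score sits at roughly 0.82 of the observed range but only about 0.40 of the theoretical scale, which is the contrast the reply describes.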
We will revise the manuscript to better emphasize this relative interpretation and to explicitly clarify that the FortranAnalyser score should not be understood as an absolute measure of software excellence.
Comment 2
We thank the reviewer for pointing this out. The overlap in the annexed table is a formatting issue introduced during the compilation of the manuscript. We will carefully revise the layout and ensure that all tables are correctly rendered and fully legible in the revised version.
Comment 3
We appreciate the reviewer’s attention to this detail. We will revise the references section and ensure that all “last accessed” notes are consistently written in English, in accordance with the journal’s language standards.
Once again, we thank the reviewer for the constructive feedback, which will help us improve the clarity and presentation of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-6108-AC1
RC2: 'Comment on egusphere-2025-6108', Anonymous Referee #2, 20 May 2026
This manuscript introduces a systematic evaluation of code accessibility and code quality across the CMIP phases. The authors tried to access the code for as many climate models as possible that participated in all past CMIP phases. After discussing the code accessibility, programming languages and code quality were examined. This study is very useful to explain the status quo, but also to identify needs for future CMIP phases.
I have three comments:
- At the beginning of Section 3 (L.89/99), I am confused about the different numbers. First, it is written about "59 out of the 262 models", and then "18 out of 121". What is the difference? Are the second numbers only about the subset of coupled models? But then in L.114, there are "59 out of the 121 couples models" mentioned. How is this connected? Maybe you can make that clearer.
- In Section 4 about the programming languages, it would be very interesting to add an outlook or expectation about changes when hybrid models are added to the pool of CMIP models.
- In Section 5, it would be helpful to have some examples of lacks in the code quality which are found by the analysis tool. And which parts or topics should be focused on primarily by the modelling groups to improve their model code?
Citation: https://doi.org/10.5194/egusphere-2025-6108-RC2
AC2: 'Reply on RC2', Michael García Rodríguez, 04 Jun 2026
We thank the reviewer for the time and effort dedicated to reading and assessing our manuscript. We greatly appreciate the constructive comments provided. Below, we provide detailed responses to each of the comments raised.
Comment 1
At the beginning of Section 3 (L.89/99), I am confused about the different numbers. First, it is written about "59 out of the 262 models", and then "18 out of 121". What is the difference? Are the second numbers only about the subset of coupled models? But then in L.114, there are "59 out of the 121 couples models" mentioned. How is this connected? Maybe you can make that clearer.
Response
In order to eliminate the confusion regarding the numbering of models when referring to coupled models and individual models, Section 3 has been revised to clarify and explain this numbering scheme. Specifically, lines 89 through 95 have been rewritten:
"Following the attempts made, successful access was obtained for 59 of the 262 individual models identified across all CMIP phases (see Figure 1 and Table A1). Here, the total of 262 refers to all individual models considered in this study. Within this broader set, a subset of 121 corresponds specifically to coupled climate models, which represent the core systems used in CMIP experiments.
Out of these coupled models, 18 were successfully recovered: one from CMIP3, ten from CMIP5, and seven from CMIP6. Thus, the 59 accessible individual models include these 18 coupled models as a subset. For CMIP1 and CMIP2, no model could be recovered, as their code had not been preserved over time. CMIP7 models were not included in this study because this is the current phase of the project, and active development and usage are still ongoing."
Comment 2
In Section 4 about the programming languages, it would be very interesting to add an outlook or expectation about changes when hybrid models are added to the pool of CMIP models.
Response
Section 4 now includes the following outlook on the changes expected when hybrid models are added to the pool of CMIP models (lines 149-162):
"The emergence of hybrid modelling approaches, combining physics implemented in code with machine learning or artificial intelligence techniques, is expected to further increase this heterogeneity. Although such approaches have not yet been integrated within the CMIP phases analysed in this study, their incorporation is foreseeable in future developments. These hybrid models may introduce new programming ecosystems alongside established high-performance languages like Fortran. From a software engineering perspective, this transition may significantly impact code structure, dependencies, and development workflows. In particular, machine learning components introduce additional layers of complexity related not only to the implemented code, but also to training data, calibration procedures, and evolving model configurations. This raises important challenges for reproducibility and traceability, as the behaviour of such components may depend on factors beyond the static source code itself. Consequently, evaluating future CMIP models may require extending current approaches to software quality assessment. Traditional static code analysis techniques, such as those employed in this study, may need to be complemented with methodologies capable of addressing data provenance, model training processes, and the interpretability of learned parameterisations. In this context, hybrid models could be more appropriately analysed as systems composed of heterogeneous components, combining tools specialised for a given programming language with tools specialised in analysing machine learning modelling paradigms in order to obtain an integrated assessment of the full system."
Comment 3
In Section 5, it would be helpful to have some examples of lacks in the code quality which are found by the analysis tool. And which parts or topics should be focused on primarily by the modelling groups to improve their model code?
Response
In the manuscript, we will briefly include representative examples of typical issues detected by the static analysis, such as overly large and poorly modularised routines, high structural complexity, inconsistent variable declarations, and limited documentation practices. We will also clarify the main areas where improvement efforts could be most effective, particularly in terms of modularisation, coding standards, documentation, and the integration of analysis tools into development workflows.
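To make the kinds of issues listed above more concrete, the following is a minimal sketch of how three of them (routine length, a missing `implicit none`, and sparse comments) could be flagged with plain regular expressions. It is not FortranAnalyser's implementation or scoring; the function name `check_routines`, the threshold `MAX_ROUTINE_LINES`, and the sample Fortran are all hypothetical, and the parser deliberately ignores cases such as typed function prefixes or nested routines.

```python
import re

# Hypothetical threshold, chosen only for illustration.
MAX_ROUTINE_LINES = 50

ROUTINE_START = re.compile(r"^\s*(?:subroutine|function)\s+(\w+)", re.IGNORECASE)
ROUTINE_END = re.compile(r"^\s*end\s*(?:subroutine|function)\b", re.IGNORECASE)
IMPLICIT_NONE = re.compile(r"^\s*implicit\s+none\b", re.IGNORECASE)


def check_routines(source: str, max_lines: int = MAX_ROUTINE_LINES) -> list[dict]:
    """Return one report per routine found in free-form Fortran `source`."""
    reports = []
    current = None  # report for the routine currently being scanned, if any
    for line in source.splitlines():
        if current is None:
            match = ROUTINE_START.match(line)
            if match:
                current = {"name": match.group(1), "lines": 1,
                           "implicit_none": False, "comments": 0}
            continue
        current["lines"] += 1
        if IMPLICIT_NONE.match(line):
            current["implicit_none"] = True
        if line.lstrip().startswith("!"):
            current["comments"] += 1
        if ROUTINE_END.match(line):
            current["too_long"] = current["lines"] > max_lines
            reports.append(current)
            current = None
    return reports


# Hypothetical sample: one undocumented routine without `implicit none`,
# one short documented routine with it.
SAMPLE = """\
subroutine update_state(n, x)
  integer :: n
  real :: x(n)
  x = x + 1.0
end subroutine update_state

subroutine documented(n)
  ! Increment counter n by one.
  implicit none
  integer, intent(inout) :: n
  n = n + 1
end subroutine documented
"""

reports = check_routines(SAMPLE, max_lines=5)
for report in reports:
    print(report)
```

A check of this kind could run in a continuous-integration job so that oversized or undeclared routines are reported as part of the development workflow.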
At the same time, we note that a much more detailed discussion of these aspects is already available in our recent publication dedicated to the FortranAnalyser tool (García-Rodríguez et al., 2024), where both hypothetical examples and real-case improvements are presented and analysed in depth. To avoid redundancy while still addressing the reviewer’s request, we will include a concise summary in Section 5 together with a clearer reference to that work, where readers can find a more comprehensive treatment of the identified code quality issues and possible solutions.
Citation: https://doi.org/10.5194/egusphere-2025-6108-AC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 621 | 254 | 64 | 939 | 56 | 65 |
Text of RC1 (Anonymous Referee #1, 24 Apr 2026):
In this paper, the authors study the accessibility, understandability, and licensing of climate models used across all CMIP exercises. They found that 18 models out of 121 were accessible, with one being from CMIP3, ten from CMIP5, and seven from CMIP6. Then, with the accessible models, the authors performed code analysis. They found out that all but one were written in Fortran, the exception being NICAM.09. Finally, for the models written in Fortran, they scored the code using the FortranAnalyser, a static code analysis tool that grades Fortran code from zero to ten. Their findings show that the quality of the code has increased with the newer phases of CMIP.
This work is of great importance as it provides an overview of the accessibility of the climate models used in CMIP. Equally important is its discussion regarding the reproducibility of these experiments, which are extremely consequential for advancing our understanding of climate.
I have three critiques for the authors.
1) In section 5, "Code Analysis," I would like to have a clarification, in general terms, of what the FortranAnalyser analyzes to score the models. Moreover, if possible, give an intuition of what a high score code would be. This is because the authors claim that a score of 4.014 is "relatively high," but I have no reference to compare this value with.
2) In the annexed table, where all the models are present, there is overlapped text (page 18, page 20).
3) In the references, the last accessed text is written in Spanish.