the Creative Commons Attribution 4.0 License.
A case for open communication of bugs in climate models, made with ICON version 2024.01
Abstract. Climate models are not just numerical representations of scientific knowledge; they are also human-written software programs. As such, they contain coding mistakes, which may look mundane but can affect the results of interconnected and complex models in unforeseen ways. These bugs are underacknowledged in the climate science community.
We describe a sea ice bug in the coupled atmosphere-ocean-sea ice model ICON and its history. The bug was caused by a logical flag that was set incorrectly, such that the ocean did not experience friction from sea ice and thus the surface velocity did not slow down, especially in the presence of ocean eddies. While describing the bug and its effects, we also give an example of visual and concise bug communication. In addition, we conceptualize this bug as representing a novel species of resolution-dependent bugs. These are long-standing bugs that are discovered during the transition to high-resolution climate models due to features that are resolved at the kilometer scale. This case study serves to illustrate the value of open documentation of bugs in climate models and to encourage our community to adopt a similar approach.
Withdrawal notice
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2024-3493', David Bailey, 16 Dec 2024
This manuscript makes the case that coding bugs happen in climate and Earth system models and that there should be a more open process. I agree that bugs happen, but fervently disagree that we are somehow "hiding" bugs from the user community. For example, there are many code revision tools, like GitHub and even Subversion before it. These are used for systematic software testing and robustness. One example is the CICE Consortium.
We have a public issues page here:
https://github.com/CICE-Consortium/CICE/issues
Then, as we issue pull requests back to the code, we have to go through a series of regression tests checking whether the answers have changed against a previous version. If there are answer changes, we must run a quality control suite to check whether the results are statistically different from those of the previous version. You can see the process here:
https://github.com/CICE-Consortium/CICE/pull/965
So, yes, bugs happen, but they are corrected as soon as they are found and clearly documented during the pull request process, and the changes are also added to the release notes for each new version. Most big modeling centers use this type of process, and it is completely open. So, to me, this is a one-off case for the authors' particular model, and I do not believe it would really be of interest to the greater GMD community.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC1
AC1: 'Reply on RC1', Ulrike Proske, 13 Jan 2025
Dear Reviewer,
Thank you for your review and the constructive feedback. We appreciate your acknowledgment that coding bugs occur in climate and Earth system models and your emphasis on the importance of systematic testing and robust processes for addressing them.
We would like to address a few points to clarify our paper's objectives and scope:
1. Scope and Acknowledgment of Current Practices
We do not claim that the climate science community is "hiding" bugs. Instead, we argue that the process of documenting and analyzing bugs, particularly those with nuanced effects like resolution dependence, deserves greater attention in the literature, also considering the effect on previously published results. We acknowledge the exemplary practices in repositories such as the CICE Consortium's and their public documentation. ICON has a similar GitLab repository, which, however, is not yet public. We are happy to add a specific reference to your cited process as an example of robust bug tracking and resolution.
2. Novelty of the Case Study
The bug we discuss is notable for its resolution-dependent nature, which emerged during the transition to high-resolution models. Such bugs are likely to become more common as the field advances, highlighting the importance of examining their impacts in depth. We believe this makes our findings relevant beyond the specific model studied.
3. Broader Implications
While we agree that many large modeling centers already have robust workflows, our goal is to advocate for universal adoption of transparent bug documentation practices and to demonstrate their value through our case study, ensuring that lessons from individual cases benefit the wider climate modeling community.
We hope these clarifications address your concerns and demonstrate the broader relevance of our work. We are open to revising the manuscript to explicitly reference the good practices you have highlighted, ensuring a balanced discussion that reflects the diversity of approaches in the community.
Thank you again for your review and helpful suggestions.
Citation: https://doi.org/10.5194/egusphere-2024-3493-AC1
RC2: 'Comment on egusphere-2024-3493', Anonymous Referee #2, 26 Jan 2025
The paper is a case study of the detection, identification, and code change for a settings clash in the open-source Earth system model ICON. Starting with the title, the paper opts for "open communication of bugs", and uses the case study to advocate for it. One of the possible strategies proposed is to define a new paper type in a journal like GMD in which bug reports could be communicated.
While the title is indeed intriguing and the topic is relevant to GMD, I do not consider the paper convincing. In a way, I would categorize it as a counterexample, both in terms of software engineering and in terms of manuscript composition and scope choice. The latter is of course highly subjective, but some numbers speak for themselves: 110 literature references are cited (with numerous inconsistencies; see below), including 13 inline quotes, to tell the story of a single "bug", while advocating that the paper is "a convincing example of the value and feasibility of open bug documentation". In my opinion, engaging the journal format and peer-review workflows for documenting bugs in the form of research papers is neither effective nor generally feasible. The long timelines of peer review, the significant cost of open-access publication, the detachment of the paper reviews from actual code development, and the lack of added value from peer review at a stage where the bug has already been addressed all contradict the needs and aims of bug reporting and communication. Peer-reviewing code using tools integrated into the code development systems, and thus tightly integrating the relevant discussions with the record of code changes, is likely more relevant and effective.
In terms of software engineering, the paper unfortunately does not represent a best-practice example. The fix to the "bug" in question changes the default value of a settings flag and makes the code emit warnings in cases in which user settings could lead to a misconfiguration.
The "fix" does not include any automated test, and there seems to be no guarantee introduced into the project that further codebase developments would not incur regressions should the original behavior be reinstated. The "fix" presented seems to be an example of a patch that would be automatically rejected by any code-coverage workflow due to having no test coverage. Furthermore, according to the text in section 3.2.3, part of the fix is expected to be removed in the future, but this is not communicated in any way to the user or developer in the patch (e.g., through a deprecation logic). In the spirit of the paper's rich literature references, let me refer here to Wilson et al. 2014, "Best Practices for Scientific Computing" (https://doi.org/10.1371/journal.pbio.1001745), and highlight the "Turn bugs into test cases" rule summarized there as "... writing tests that trigger a bug that has been found in the code and (once fixed) will prevent the bug from reappearing unnoticed".
Despite the paper describing a single particular "bug" in the codebase, and advocating for clarity in the communication of the relevant software updates, the way the bug is communicated in the ICON codebase does not seem to provide a best-practice example, either. The changes to the files/lines of code in question are visible in the public code repository (https://gitlab.dkrz.de/icon/icon-model/-/commit/7016082e7bfb6faf6dffd8af0ebc0a2d77867004) but are not linked in any way to a bug report or discussion. The relevant commit is vaguely labeled as "Release export ICON 2024.07 based on tag tags/icon-2024.07" and features changes to over 1000 different files, with a bulk portion of 167 items in the RELEASE_NOTES.md file (including what I guess is the relevant entry: "ICON-Ocean: Bugfix: Enable sea-ice drag on ocean as default", which clearly does not mention the interesting and important resolution dependence).
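The "turn bugs into test cases" rule cited above could, for this class of settings-clash bug, look roughly like the following minimal Python sketch. `configure_sea_ice` and its flags are hypothetical stand-ins for the actual ICON namelist handling, which is written in Fortran and would use its own test infrastructure:

```python
# Minimal "turn bugs into test cases" sketch for a settings-clash bug.
# configure_sea_ice and its flags are hypothetical stand-ins for the
# actual ICON namelist handling.
import warnings

def configure_sea_ice(ice_dyn, stress_ice_zero=None):
    """Post-fix behavior: ice-ocean stress stays enabled by default,
    and a physically dubious combination triggers a warning."""
    if stress_ice_zero is None:
        stress_ice_zero = False  # corrected default: ice drag enabled
    if ice_dyn != 0 and stress_ice_zero:
        warnings.warn("sea-ice dynamics enabled but ice-ocean stress "
                      "set to zero; the ocean will feel no ice drag")
    return stress_ice_zero

def test_ice_drag_not_silently_disabled():
    # Regression guard: enabling dynamics must not zero the stress.
    assert configure_sea_ice(ice_dyn=1) is False

def test_misconfiguration_warns():
    # The dubious combination must at least be reported to the user.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        configure_sea_ice(ice_dyn=1, stress_ice_zero=True)
    assert len(caught) == 1

test_ice_drag_not_silently_disabled()
test_misconfiguration_warns()
```

Such a test, run routinely with coverage reporting, is what would keep the original behavior from being silently reinstated by later development.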
The commit seems to announce ICON version 2024.07, yet the paper does not link to it, referring instead to a custom Zenodo-archived tarball labeled "2.6.6-nextgems_prefinal-ngc4008".
In the Code & Data availability section, the reference to "that release" of ICON is unclear - which release? More importantly, trying to access the URL provided for the bug report, the user is prompted with a GitLab message: "You need to sign in or sign up before continuing. All users with an active DKRZ account can use this resource." There is no indication how to sign up. Browsing through the docs.dkrz.de website, one can learn that requesting an account involves provision of personal data (name, phone, physical address are obligatory fields, while the form even asks for "country of citizenship" and "country of residence"), but the docs also openly state that "we accept only email addresses from institutions known to us". Despite the paper title encouraging open communication of bugs, in fact the relevant development history is not openly accessible.
I would argue that what the paper quotes in the introduction, that "climate scientists learn programming by doing", is actually exemplified by the paper itself. Despite the software industry having established communication methods for vulnerabilities, established methodologies for annotating and linking code changes to bug reports and development discussions, and established methods for ensuring that bugs do not reappear in further developments, neither the story of the paper nor the actual bugfix follows them.
How would a user of the new code link the warning message to the openly communicated bug? How could a user of the old code tell from the release notes whether her/his usage is affected? How would a developer know that a given line is related to the bug fix, and how would they demonstrate its effects? How would a maintainer be assured that the openly communicated bug is not reintroduced through subsequent developments? Answers to all of the above questions are provided if development is carried out using industry-standard workflows and tools enabling: automated linking of code changes with code reviews, automated generation of linked release notes, automated execution of regression tests coupled with code-coverage analysis that aids code reviews, and versioned documentation clearly indicating changes and deprecations in user-defined settings. There is no reason to reinvent these, and no reason to adapt the scientific journal format to achieve what already works well with tools and platforms developed for software maintenance and development-history tracking. A public issue-tracking system with linked commits, discussion, release-notes entries, and warning messages would better serve the purpose of openly communicating the bug, including its resolution-dependence characteristic.
In the paper, the references are provided in an inconsistent format: journal names abbreviated or not, volume numbers given or not, page numbers given or not, some names spelled in ALL CAPS, book ISBNs given or not. There are numerous entries without a DOI and with missing or partial bibliographic data:
- Avizienis 2004: missing DOI (10.1109/TDSC.2004.2), missing authors
- Baker et al. 2015: missing DOI (10.5194/gmd-8-2829-2015)
- Baker et al. 2016: missing DOI (10.5194/gmd-9-2391-2016)
- Bethke et al. 2021: missing DOI (10.5194/gmd-14-7073-2021)
- Blanchard & Dũng 2024: missing permanent URL (e.g., https://web.archive.org/web/*/https://undonecs.sciencesconf.org/data/Undonecs_2024_abstract_30.pdf)
- Carver et al. 2007: missing DOI (10.1109/ICSE.2007.77), misspelled conference name
- Depaz 2023: missing permanent URL (https://theses.fr/2023PA030084)
- Easterbrook & Johns 2009: missing DOI (10.1109/MCSE.2009.193)
- Gutjahr & Mehlmann 2024: bogus DOI link (https://doi.org/https://doi.org/...)
- Hatton 1997: missing DOI (10.1007/978-1-5041-2940-4_2)
- Heymann & Hundebol 2017: missing DOI (10.4324/9781315406282)
- Heymann et al. 2017: ditto
- Hibler 1979: bogus DOI link (https://doi.org/https://doi.org/...)
- Hook 2009: missing permanent URL (http://hdl.handle.net/1974/1765)
- Hunke & Dukowicz 1997: bogus DOI link (https://doi.org/https://doi.org/...)
- Jöckel et al. 2016: missing DOI (10.5194/gmd-9-1153-2016), wrong version id in the title
- Kelly & Sanders 2008: missing permanent URL (e.g., https://web.archive.org/web/*/http://se4science.org/workshops/secse08/Papers/Kelly.pdf)
- Kendall et al. 2008: missing DOI (10.1109/MS.2008.86), title shortened, missing journal name
- Kimmritz et al. 2017: bogus DOI link (https://doi.org/https://doi.org/...)
- Korkin et al. 2022: missing DOI (10.1016/j.cpc.2021.108198)
- Korn et al. 2022: bogus DOI link (https://doi.org/https://doi.org/...)
- Kreyscher et al. 2000: bogus DOI link (https://doi.org/https://doi.org/...)
- Livingstone 2003: missing DOI (10.7208/9780226487243)
- Luther et al. 1988: missing DOI (10.1175/1520-0477-69.1.40)
- Mastroianni 2022: missing everything
- Miller 2006: missing DOI (10.1126/science.314.5807.1856)
- Mioduszewski et al. 2018: missing DOI (10.1175/JCLI-D-18-0109.1)
- Müller et al. 2024: submitted works not available to the public are not permitted in GMD (https://www.geoscientific-model-development.net/submission.html#references)
- Newman 2016: missing DOI (10.1007/978-3-319-47286-7_18)
- Pipitone & Easterbrook 2012: missing DOI (10.5194/gmd-5-1009-2012)
- Polanyi & Nye 2015: should it be just Polanyi? (Nye is only credited for foreword to the reprint of this 1958 book)
- Rackow: title and year need update, missing journal name, missing DOI (10.5194/gmd-18-33-2025)
- Rahman et al. 2023: missing DOI (10.1145/3626313)
- Segal 2008: missing permanent URL (https://oro.open.ac.uk/17673/)
- Semtner et al. 1976: bogus DOI link (https://doi.org/https://doi.org/...)
- Shackley 2001: missing DOI (10.7551/mitpress/1789.003.0007)
- Storer 2017: title shortened, missing DOI (10.1145/3084225)
- Tallapragada 2014: missing DOI (10.1175/MWR-D-13-00010.1)
- Tucker et al. 2022: missing DOI (10.5194/gmd-15-1413-2022)
- Wilson 2006: missing a permanent URL (e.g., https://proquest.com/openview/e5bd547f61b53a78523d61490cecbc28/1 ?)
- Winsberg 2018: missing DOI (10.1017/9781108164290)
All figures in the paper are included as raster rather than vector graphics, which is inadequate for journal submission.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC2
AC2: 'Reply on RC2', Ulrike Proske, 17 Feb 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-3493/egusphere-2024-3493-AC2-supplement.pdf
RC3: 'Comment on egusphere-2024-3493', Anonymous Referee #3, 27 Jan 2025
Summary: This paper is an experience report regarding a specific bug in the ICON climate model. The bug resulted from a patch that enforced a consistency constraint between two configuration options: if ice_dyn was nonzero (sea-ice dynamics enabled), then the ice stress on the ocean would be disabled (stress_ice_zero). This resulted in the simulation of physically infeasible behavior, manifested as holes in the predicted sea ice.
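The consistency constraint summarized above can be sketched as follows. This is a minimal Python illustration of the pre-fix behavior; the flag names `ice_dyn` and `stress_ice_zero` follow the review text, while the actual ICON logic lives in Fortran namelist handling and is more involved:

```python
# Hypothetical sketch of the pre-fix configuration clash: the flag
# names follow the review text, not the actual ICON source.

def configure_sea_ice(ice_dyn, stress_ice_zero=None):
    """Derive the effective ice-ocean coupling settings from user flags."""
    if stress_ice_zero is None:
        # Buggy behavior: enabling sea-ice dynamics (ice_dyn != 0)
        # silently disabled the ice stress on the ocean, so the ocean
        # surface felt no friction from the ice cover.
        stress_ice_zero = ice_dyn != 0
    return {"ice_dyn": ice_dyn, "stress_ice_zero": stress_ice_zero}

# With sea-ice dynamics enabled, the buggy default switches the
# ice-ocean stress off: the physically infeasible combination.
print(configure_sea_ice(ice_dyn=1)["stress_ice_zero"])  # True
```

The fix described in the paper amounts to flipping this default and warning when the user requests the inconsistent combination.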
As an experience report, the paper may be useful to those who seek to understand how bugs in climate systems might arise and how they might be found. I applaud the goal of analyzing the situation thoroughly and communicating with others with the goal of preventing similar bugs from recurring. I appreciate the effort in the paper to convey the physical details to readers who, like myself, have a background in software rather than geoscience. However, the paper lacks a systematic approach. Robert Yin's book on case studies could be a valuable resource if the authors want to use a more structured method: identify a research question, select cases according to criteria, collect data in a planned and consistent manner, and make inferences from the data to address the research question (these slides may be helpful in thinking about this: https://users.ece.utexas.edu/~perry/work/papers/DP-04-icse-tut-slides.pdf). As is, the paper tells part of the story, not giving sufficient depth to enable others to know how they can do better.
The title, which suggests that the paper will argue that open communication of bugs would benefit the community, is not sufficiently justified by the content. I of course agree with the premise, but making the case would require discussing more than just one bug. The content of the paper serves as open communication itself rather than arguing for the broad adoption of open communication. An analysis of many bugs in climate models could be much more compelling. Are there many bugs like this one? If so, it may be worth researching how to detect and prevent this whole class of bugs.
Turning to the specifics of the experience report, it seems to me that there are critical missed opportunities here to understand why the bug came into existence in the first place. Without this information, it is hard to know what to learn. The paper says "The code structure suggests that the second check was created by duplicating the first, and subsequently adapted incompletely." Was the version control history not studied to see whether this was the case? Were the developers who wrote that code not interviewed? Is it possible that there are other instances of similar bugs in the codebase in which incompatible options can be enabled? What software engineering processes were used at the time, what processes are used now, and what changes were put into place after this analysis was completed? What kind of code reviews were done at the time the bug was introduced?
Next, I think a thorough examination would consider what could have been done (and what will now be done, considering this analysis) to prevent this bug from arising in the future. Regression tests? Assertions? Code reviews? Property-based testing regarding, for example, physical constraints like conservation of momentum? Formal verification of adherence to physical constraints? Without this, I worry that the community will miss opportunities to learn from the experience.
One more suggestion: in software engineering, bugs are often prioritized and addressed in order of priority. It seems like perhaps that was not done here: investigating this bug waited until a convenient moment arose to fix it. This, too, is an opportunity to reconsider process. What kinds of activities were higher priority for the team than this bug, and why? Was that a correct judgment, or, in retrospect, what should have been done differently to increase the priority of this bug?
Overall, I think there are significant opportunities here to learn from this experience. A more structured approach to this work (e.g. what were the research questions?) and a deeper analysis could make this article much more valuable.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC3
EC1: 'Comment on egusphere-2024-3493', Juan Antonio Añel, 30 Jan 2025
Dear authors,
Unfortunately, after the comments by the reviewers, I do not envisage the publication of your paper in Geoscientific Model Development, and therefore, I discourage the submission of a revised manuscript. I am sorry that we cannot be more positive on this occasion.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-3493-EC1
AC3: 'Reply on EC1', Ulrike Proske, 17 Feb 2025
Dear Juan A. Añel,
Thank you for sharing your preliminary assessment. Please let us know if, based on our comments, you reconsider and would be open to receiving a revised manuscript.
Best regards,
The author team
Citation: https://doi.org/10.5194/egusphere-2024-3493-AC3
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2024-3493', David Bailey, 16 Dec 2024
This manuscript makes the case that coding bugs happen in climate and earth system models and that there should be a more open process. I agree that bugs happen, but fervently disagree that we are somehow "hiding" bugs from the user community. For example, there are many code revision tools like github and even subversion before it. These are used for systematic software testing and robustness. One example is the CICE Consortium.
We have a public issues page here:
https://github.com/CICE-Consortium/CICE/issues
Then as we issue pull requests back to the code we have to go through a series of regression tests checking if the answers have changed against a previous version. If there are answer changes we must run a quality control suite to check if the results are statistically different from the previous version. You can see the process here:
https://github.com/CICE-Consortium/CICE/pull/965
So, yes bugs happen, but they are corrected as soon as found and clearly documented during the pull request process and then the changes are also added to the release notes for each new version. Most big modeling centers are using this type of process and it is completely open. So, to me this is a one off case for the authors particular model and I do not believe this would really be of interested to the greater GMD community.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC1 -
AC1: 'Reply on RC1', Ulrike Proske, 13 Jan 2025
Dear Reviewer,
Thank you for your review and the constructive feedback. We appreciate your acknowledgment that coding bugs occur in climate and Earth system models and your emphasis on the importance of systematic testing and robust processes for addressing them.
We would like to address a few points to clarify our paper's objectives and scope:
-
Scope and Acknowledgment of Current Practices We do not claim that the climate science community is "hiding" bugs. Instead, we argue that the process of documenting and analyzing bugs—particularly those with nuanced effects like resolution-dependence—deserves greater attention in the literature, also considering the effect on previously published results. We acknowledge the exemplary practices in repositories such as the CICE Consortium and their public documentation. ICON has a similar gitlab repository, which is however not yet public. We are happy to add a specific reference to your cited process as an example of robust bug tracking and resolution.
-
Novelty of the Case Study The bug we discuss is notable due to its resolution-dependent nature, which emerged during the transition to high-resolution models. Such bugs are likely to become more common as the field advances, highlighting the importance of examining their impacts in depth. We believe this makes our findings relevant beyond the specific model studied.
-
Broader Implications While we agree that many large modeling centers already have robust workflows, our goal is to advocate for universal adoption of transparent bug documentation practices and to demonstrate their value through our case study – to ensure that lessons from individual cases benefit the wider climate modeling community.
We hope these clarifications address your concerns and demonstrate the broader relevance of our work. We are open to revising the manuscript to explicitly reference the good practices you have highlighted, ensuring a balanced discussion that reflects the diversity of approaches in the community.
Thank you again for your review and helpful suggestions.
Citation: https://doi.org/10.5194/egusphere-2024-3493-AC1 -
-
AC1: 'Reply on RC1', Ulrike Proske, 13 Jan 2025
-
RC2: 'Comment on egusphere-2024-3493', Anonymous Referee #2, 26 Jan 2025
The paper is a case-study of detection, identification and code-change for a settings clash in the open-source Earth system model ICON. Starting with the title, the paper opts for "open communication of bugs", and uses the case study to advocate for it. One of possible strategies proposed is to define a new paper type in a journal like GMD, where bug reports could be communicated.
While, the title is indeed intriguing, and the topic is relevant to GMD, I do not consider the paper convincing. In a way, I would categorize it as a counterexample - both in terms of software engineering, as well as manuscript composition and scope choice. The latter is of course highly subjective, but some numbers speak for themselves: 110 literature references cited (but with numerous inconsistencies - see below), including 13 inline quotes, to tell a story of a single "bug", and advocating that the paper is "a convincing example of the value and feasibility of open bug documentation". In my opinion, engaging journal format and peer-review workflows for documenting bugs in form of research papers is neither effective, nor generally feasible. The long timelines of peer review, the significant cost of open-access publications, the detachment of the paper reviews from actual code development, the lack of added value from peer review at the stage where a bug was already addressed, all contradict the needs and aims of bug reporting and communication. Peer-reviewing code, using tools integrated into the code development systems, and thus tightly integrating the relevant discussions with the record of code changes, is what is likely more relevant and effective.
In terms of software engineering, unfortunately the paper is not representing a best-practice example. The fix to the "bug" in question changes a default value for a settings flag, and makes the code emit warnings in cases in which user settings could lead to a misconfiguration.
The "fix" does not include any automated test, and there seem to be no guarantee introduced into the project that further codebase developments would not incur regressions if the original behavior would be reinstated. The "fix" presented seems to be an example of a patch that would be automatically rejected by any code-coverage workflow due to having no test coverage. Furthermore, according to the text in section 3.2.3, part of the fix is expected to be removed in future, but this is not communicated in any way to the user or developer in the patch (e.g., through a deprecation logic). In the spirit of the paper's rich literature references, let me refer here to Wilson et al. 2014 "Best Practices for Scientific Computing" (https://doi.org/10.1371/journal.pbio.1001745), and highlight the "Turn bugs into test cases" rule summarized there with "... writing tests that trigger a bug that has been found in the code and (once fixed) will prevent the bug from reappearing unnoticed".Despite the paper describing a single particular "bug" in the codebase, and advocating for clarity in communication of the relevant software updates, the way the bug is communicated in the ICON codebase does not seem to provide a best-practice examples, either. The changes in the files/lines of code in question are visible in the public code repository (https://gitlab.dkrz.de/icon/icon-model/-/commit/7016082e7bfb6faf6dffd8af0ebc0a2d77867004) but are not linked in any way to a bug report or discussion. The relevant commit is vaguely labeled as "Release export ICON 2024.07 based on tag tags/icon-2024.07" and features changes to over 1000 different files with a bulk portion of 167 items in the RELEASE_NOTES.md file (including what I guess is a relevant entry: "ICON-Ocean: Bugfix: Enable sea-ice drag on ocean as default" - but clearly not mentioning the interesting and important resolution dependence). 
The commit seems to announce ICON version 2024.07, while the paper does not link to it, but rather refers to custom Zenodo-archived tarball labeled "2.6.6-nextgems_prefinal-ngc4008".
In the Code & Data availability section, the reference to "that release" of ICON is unclear - which release? More importantly, trying to access the URL provided for the bug report, the user is prompted with a GitLab message: "You need to sign in or sign up before continuing. All users with an active DKRZ account can use this resource." There is no indication how to sign up. Browsing through the docs.dkrz.de website, one can learn that requesting an account involves provision of personal data (name, phone, physical address are obligatory fields, while the form even asks for "country of citizenship" and "country of residence"), but the docs also openly state that "we accept only email addresses from institutions known to us". Despite the paper title encouraging open communication of bugs, in fact the relevant development history is not openly accessible.
I would argue that what the paper quotes in the introduction: that "climate scientists learn programming by doing" is actually exemplified in the paper. Despite software industry having established communication methods for vulnerabilities, established methodologies for annotating and linking code changes to bug reports and development discussions, and established methods for ensuring bugs do not re-appear in further developments, neither the story of the paper, nor the actual bugfix follow it.
How a user of the new code would link the warning message with the openly communicated bug? How a user of the old code could tell from release notes if her/his usage is affected? How a developer would know that a given line is related to the bug fix, and how to demonstrate its effects? How a maintainer would be assured that the openly communicated bug is not reintroduced through subsequent developments? Answers to all of the above questions are provided if the development is carried out using industry-standard workflows, and using tools enabling: automated linking of code changes with code reviews, automated generation of linked release notes, automated execution of regression tests coupled with code-coverage analysis that helps in code reviews, and versioned documentation clearly indicating changes and deprecations in user-defined settings. There is no reason to reinvent. No reason to adapt scientific journal format to achieve what is working well with tools and platforms developed for software maintenance and development-history tracking. A public issue-tracking system with linked commit, discussion, release-notes entry and warning messages would better serve the purpose of openly communicating the bug. The resolution-dependence characteristic of the bug included.
In the paper, the references are provided with inconsistent format: journal names abbreviated or not, volume numbers given or not, page numbers given or not, some names are spelled in ALL CAPS, book ISBNs given or not. There are numerous entries without DOI provided and missing or partial bibliographic data:
- Avizienis 2004: missing DOI (10.1109/TDSC.2004.2), missing authors
- Baker et al. 2015: missing DOI (10.5194/gmd-8-2829-2015)
- Baker et al. 2016: missing DOI (10.5194/gmd-9-2391-2016)
- Bethke et al. 2021: missing DOI (10.5194/gmd-14-7073-2021)
- Blanchard & Dũng 2024: missing permanent URL (e.g., https://web.archive.org/web/*/https://undonecs.sciencesconf.org/data/Undonecs_2024_abstract_30.pdf)
- Carver et al. 2007: missing DOI (10.1109/ICSE.2007.77), misspelled conference name
- Depaz 2023: missing permanent URL (https://theses.fr/2023PA030084)
- Easterbrook & Johns 2009: missing DOI (10.1109/MCSE.2009.193)
- Gutjahr & Mehlmann 2024: bogus DOI link (https://doi.org/https://doi.org/...)
- Hatton 1997: missing DOI (10.1007/978-1-5041-2940-4_2)
- Heymann & Hundebol 2017: missing DOI (10.4324/9781315406282)
- Heymann et al. 2017: ditto
- Hibler 1979: bogus DOI link (https://doi.org/https://doi.org/...)
- Hook 2009: missing permanent URL (http://hdl.handle.net/1974/1765)
- Hunke & Dukowicz 1997: bogus DOI link (https://doi.org/https://doi.org/...)
- Jöckel et al. 2016: missing DOI (10.5194/gmd-9-1153-2016), wrong version id in the title
- Kelly & Sanders 2008: missing permanent URL (e.g., https://web.archive.org/web/*/http://se4science.org/workshops/secse08/Papers/Kelly.pdf)
- Kendall et al. 2008: missing DOI (10.1109/MS.2008.86), title shortened, missing journal name
- Kimmritz et al. 2017: bogus DOI link (https://doi.org/https://doi.org/...)
- Korkin et al. 2022: missing DOI (10.1016/j.cpc.2021.108198)
- Korn et al. 2022: bogus DOI link (https://doi.org/https://doi.org/...)
- Kreyscher et al. 2000: bogus DOI link (https://doi.org/https://doi.org/...)
- Livingstone 2003: missing DOI (10.7208/9780226487243)
- Luther et al. 1988: missing DOI (10.1175/1520-0477-69.1.40)
- Mastroianni 2022: missing everything
- Miller 2006: missing DOI (10.1126/science.314.5807.1856)
- Mioduszewski et al. 2018: missing DOI (10.1175/JCLI-D-18-0109.1)
- Müller et al. 2024: submitted works not available to the public are not permitted in GMD (https://www.geoscientific-model-development.net/submission.html#references)
- Newman 2016: missing DOI (10.1007/978-3-319-47286-7_18)
- Pipitone & Easterbrook 2012: missing DOI (10.5194/gmd-5-1009-2012)
- Polanyi & Nye 2015: should it be just Polanyi? (Nye is only credited for foreword to the reprint of this 1958 book)
- Rackow: title and year need update, missing journal name, missing DOI (10.5194/gmd-18-33-2025)
- Rahman et al. 2023: missing DOI (10.1145/3626313)
- Segal 2008: missing permanent URL (https://oro.open.ac.uk/17673/)
- Semtner et al. 1976: bogus DOI link (https://doi.org/https://doi.org/...)
- Shackley 2001: missing DOI (10.7551/mitpress/1789.003.0007)
- Storer 2017: title shortened, missing DOI (10.1145/3084225)
- Tallapragada 2014: missing DOI (10.1175/MWR-D-13-00010.1)
- Tucker et al. 2022: missing DOI (10.5194/gmd-15-1413-2022)
- Wilson 2006: missing a permanent URL (e.g., https://proquest.com/openview/e5bd547f61b53a78523d61490cecbc28/1 ?)
- Winsberg 2018: missing DOI (10.1017/9781108164290)
All figures in the paper are included as raster rather than vector graphics, which is inadequate for journal submission.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC2 -
AC2: 'Reply on RC2', Ulrike Proske, 17 Feb 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-3493/egusphere-2024-3493-AC2-supplement.pdf
-
RC3: 'Comment on egusphere-2024-3493', Anonymous Referee #3, 27 Jan 2025
Summary: This paper is an experience report regarding a specific bug in the ICON climate model. The bug resulted from a patch that enforced a consistency constraint between two configuration options: if ice_dyn was nonzero (sea dynamics enabled) then ice stress on ocean would be disabled (stress_ice_zero). This resulted in simulating physically infeasible behavior, which was manifested in holes in predicted sea ice.
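The flag logic described above can be sketched as follows. This is a hypothetical Python illustration only: the actual ICON code is Fortran, the names `ice_dyn` and `stress_ice_zero` are taken from the review text rather than the real namelist, and whether the stress should be zeroed when dynamics are off is an assumption.

```python
def set_stress_ice_zero_buggy(ice_dyn: int) -> bool:
    """Buggy variant: enabling sea-ice dynamics (ice_dyn != 0)
    incorrectly zeroes the ice stress on the ocean, so the ocean
    surface feels no friction from the ice."""
    return ice_dyn != 0  # bug: the condition is inverted

def set_stress_ice_zero_fixed(ice_dyn: int) -> bool:
    """Fixed variant (assumed): stress is only zeroed when
    ice dynamics are switched off entirely."""
    return ice_dyn == 0
```

With dynamics enabled (`ice_dyn = 1`), the buggy variant returns `True` (no ice-ocean stress, yielding the unphysically fast surface velocities), while the fixed variant returns `False`.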
As an experience report, the paper may be useful to those who seek to understand how bugs in climate systems might arise and how they might be found. I applaud the goal of analyzing the situation thoroughly and communicating with others with the goal of preventing similar bugs from recurring. I appreciate the effort in the paper to convey the physical details to readers who, like myself, have a background in software rather than geoscience. However, the paper lacks a systematic approach. Robert Yin's book on case studies could be a valuable resource if the authors want to use a more structured method: identify a research question, select cases according to criteria, collect data in a planned and consistent manner, and make inferences from the data to address the research question (these slides may be helpful in thinking about this: https://users.ece.utexas.edu/~perry/work/papers/DP-04-icse-tut-slides.pdf). As is, the paper tells part of the story, not giving sufficient depth to enable others to know how they can do better.
The title, which suggests that the paper will argue that open communication of bugs would benefit the community, is not sufficiently justified by the content. I of course agree with the premise, but making the case would require discussing more than just one bug. The content of the paper serves as open communication itself rather than arguing for the broad adoption of open communication. An analysis of many bugs in climate models could be much more compelling. Are there many bugs like this one? If so, it may be worth doing research into how to detect and prevent the whole class of bugs.
Turning to the specifics of the experience report, it seems to me that there are critical missed opportunities here to understand why the bug came into existence in the first place. Without this information, it is hard to know what to learn. The paper says "The code structure suggests that the second check was created by duplicating the first, and subsequently adapted incompletely." Was the version control history not studied to see whether this was the case? Were the developers who wrote that code not interviewed? Is it possible that there are other instances of similar bugs in the codebase in which incompatible options can be enabled? What software engineering processes were used at the time, what processes are used now, and what changes were put into place after this analysis was completed? What kind of code reviews were done at the time the bug was introduced?
Next, I think a thorough examination would consider what could have been done (and what will now be done, considering this analysis) to prevent this bug from arising in the future. Regression tests? Assertions? Code reviews? Property-based testing regarding, for example, physical constraints like conservation of momentum? Formal verification of adherence to physical constraints? Without this, I worry that the community will miss opportunities to learn from the experience.
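As a minimal, hypothetical illustration of the kind of configuration assertion suggested above: a check that fails fast at model start-up when the two options are inconsistent. The names `ice_dyn` and `stress_ice_zero` follow the wording of this review, not actual ICON namelist parameters.

```python
def check_ice_stress_config(ice_dyn: int, stress_ice_zero: bool) -> None:
    """Raise at start-up if the configuration is physically inconsistent."""
    if ice_dyn != 0 and stress_ice_zero:
        raise ValueError(
            "inconsistent configuration: sea-ice dynamics are enabled "
            "(ice_dyn != 0) but the ice stress on the ocean is zeroed"
        )
```

Such an assertion, combined with a regression test that exercises both settings, would have turned this silent long-standing bug into an immediate, visible failure.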
One more suggestion: in software engineering, bugs are often prioritized and addressed in order of priority. It seems like perhaps that was not done here: investigating this bug waited until a convenient moment arose to fix it. This, too, is an opportunity to reconsider process. What kinds of activities were higher priority for the team than this bug, and why? Was that a correct judgment, or, in retrospect, what should have been done differently to increase the priority of this bug?
Overall, I think there are significant opportunities here to learn from this experience. A more structured approach to this work (e.g. what were the research questions?) and a deeper analysis could make this article much more valuable.
Citation: https://doi.org/10.5194/egusphere-2024-3493-RC3 -
EC1: 'Comment on egusphere-2024-3493', Juan Antonio Añel, 30 Jan 2025
Dear authors,
Unfortunately, after the comments by the reviewers, I do not envisage the publication of your paper in Geoscientific Model Development, and therefore, I discourage the submission of a revised manuscript. I am sorry that we cannot be more positive on this occasion.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-3493-EC1 -
AC3: 'Reply on EC1', Ulrike Proske, 17 Feb 2025
Dear Juan A. Añel,
Thank you for sharing your preliminary assessment. Please let us know if, based on our comments, you reconsider and would be open to receive a revised manuscript.
Best regards,
The author team
Citation: https://doi.org/10.5194/egusphere-2024-3493-AC3
-
Data sets
Data and scripts for the publication "A case for open communication of bugs in climate models" Ulrike Proske et al. https://doi.org/10.5281/zenodo.14220611
Model code and software
ICON release 2024.01 The ICON partnership (DWD, MPI-M, DKRZ, KIT, C2SM) https://doi.org/10.35089/WDCC/IconRelease01
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
703 | 94 | 15 | 812 | 11 | 11 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
Germany | 1 | 256 | 32 |
United States of America | 2 | 221 | 27 |
United Kingdom | 3 | 54 | 6 |
France | 4 | 37 | 4 |
Spain | 5 | 24 | 3 |