the Creative Commons Attribution 4.0 License.
Best practices in software development for robust and reproducible geoscientific models based on insights from the Global Carbon Project models
Abstract. Computational models play an increasingly vital role in scientific research by numerically simulating processes that cannot be solved analytically. Such models are fundamental in geosciences and offer critical insights into the impacts of global change on the Earth system today and in the future. Beyond their value as research tools, models are also software products and should therefore adhere to established software engineering standards. However, scientists are rarely trained as software developers, which can lead to deficiencies in software quality such as unreadable, inefficient, or erroneous code. The complexity of these models, coupled with their integration into broader workflows, also often makes it highly challenging to reproduce results, evaluate processes, and build upon them.
In this paper, we review the current practices within the development processes of the state-of-the-art land surface models used by the Global Carbon Project. By combining the experience of modelers from the respective research groups with the expertise of professional software engineers, we bridge the gap between software development and scientific modeling to outline key principles and tools for improving software quality in research. We explore four main areas: 1) model testing and validation, 2) scientific, technical, and user documentation, 3) version control, continuous integration, and code review, and 4) the portability and reproducibility of workflows.
Our review of current models reveals that while modeling communities are incorporating many of the suggested practices, significant room for improvement remains in areas such as automated testing, documentation, and reproducible workflows. For instance, there is limited adoption of automated documentation and testing, and provision of reproducible workflow pipelines remains an exception. This highlights the need to identify and promote essential software engineering practices within the scientific community. Nonetheless, we also discuss numerous examples of practices within the community that can serve as guidelines for other models and could even help streamline processes within the entire community.
We conclude with an open-source example implementation of these principles built around the LPJ-GUESS model, showcasing portable and reproducible data flows, a continuous integration setup, and web-based visualizations. This example may serve as a practical resource for model developers, users, and all scientists engaged in scientific programming.
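To make two of the practices named above concrete (automated testing and traceable, reproducible runs), the following is a minimal sketch and not code from LPJ-GUESS or the showcase repository. The one-pool model `toy_carbon_pool`, the helper `config_fingerprint`, and all parameter values are hypothetical and chosen only for illustration.

```python
import hashlib
import json
import math


def toy_carbon_pool(initial_c, npp, turnover, years):
    """Hypothetical one-pool carbon model: C[t+1] = C[t] + NPP - turnover * C[t]."""
    c = initial_c
    trajectory = [c]
    for _ in range(years):
        c = c + npp - turnover * c
        trajectory.append(c)
    return trajectory


def config_fingerprint(config):
    """Hash a run configuration so outputs can be tied to the exact settings used."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def test_steady_state():
    """Automated check: the pool should approach the analytical equilibrium NPP / turnover."""
    config = {"initial_c": 0.0, "npp": 0.5, "turnover": 0.05, "years": 500}
    final_c = toy_carbon_pool(**config)[-1]
    assert math.isclose(final_c, config["npp"] / config["turnover"], rel_tol=1e-6)
```

A test of this kind could be run by a test runner such as pytest on every commit as part of continuous integration, and the fingerprint could be stored alongside model output so that a result can be traced back to the configuration that produced it.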
Competing interests: Co-author Sam Rabin is on the editorial board of GMD.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 28 Jul 2025)
Model code and software
Model workflow showcase. Konstantin Gregor. https://doi.org/10.5281/zenodo.15191116