rsofun v5.0: A model-data integration framework for simulating ecosystem processes
Abstract. Mechanistic vegetation models serve to estimate terrestrial carbon fluxes and climate impacts on ecosystems across diverse biotic and abiotic conditions. Systematically informing them with data is key for enhancing their predictive accuracy and estimating uncertainty. Here we present the Simulating Optimal FUNctioning {rsofun} R package, providing a computationally efficient and parallelizable implementation of the P-model for site-scale simulations of ecosystem photosynthesis, complemented with functionalities for Bayesian model-data integration and estimation of parameters and uncertainty. We describe a use case to demonstrate the package functionalities for modelling ecosystem gross CO2 uptake at one flux measurement site, including model sensitivity analysis, Bayesian parameter calibration, and prediction uncertainty estimation. {rsofun} lowers the bar of entry to ecosystem modelling and model-data integration and serves as an open-access resource for model development and dissemination.
Status: final response (author comments only)
-
CEC1: 'Comment on egusphere-2025-1260 - No compliance with the policy of the journal and stop of peer-review', Juan Antonio Añel, 09 Apr 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you have not published with your manuscript the data used for your work. For example, you state you use "ecosystem flux measurements taken at one site" and Fluxnet2015 data. However, you do not provide such data in a repository (see our policy). Also, I have not found information about the output data produced for the work that you present here.
Additionally, you point to a site that does not comply with our policy for the publication of calibration diagnostics: https://geco178 bern.github.io/rsofun/articles/sensitivity_analysis.html. You should have included this information in the Code and Data Availability section. Moreover, this is a Git site, which is not acceptable according to our policy, but also, the link is broken and does not provide any information. This is exactly the proof of why such sites are not acceptable for scientific publication. The site https://geco-bern.github.io/rsofun/articles/new_cost_function.html which you mention later works, but again, is not acceptable and should be listed in the Code and Data Availability section.
As the Topical Editor mentioned you after submitting your manuscript, it does not comply with our policy, and therefore it should not be in Discussions or under review. At this point, given the lack of critical information to replicate your work, namely the data (something which we failed to spot before), the sensible thing to do is to stop inviting reviewers for your manuscript, if it applies, until your manuscript is deemed in compliance with the policy of the journal.
To continue the evaluation of your work, you must reply to this comment with the information for the requested repositories for data, code and diagnostics, including their links and permanent identifiers (e.g. DOI), and a new text for the Code and Data Availability section in your manuscript that could substitute the existing one. After it, we will check it and will decide on how to proceed with your manuscript, namely if the review process can continue.
Please, note that if you do not comply with this request, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1260-CEC1 -
CC1: 'Reply on CEC1', Benjamin Stocker, 10 Apr 2025
Dear Editor
First of all, please accept our apologies that we hadn’t answered the comment by the Editor Carlos Sierra from 31 March. It had slipped our attention.
Second, we sincerely apologise for the lack of reader guidance in describing code and data availability. Reproducibility, open access, and transparency are central to our research, and we have designed the repository referred to in the code availability statement in accordance with these standards. Apparently, we have not sufficiently described the locations of the published code (also deposited in the permanent repository Zenodo) and of the data files of model forcing, evaluation, and output data. In fact, these points had all been covered by our published repository.
Thanks to your message, we have realised that code and data provision through an R package may obscure access and the “physical” location (in file form) of data provided along with the package. Indeed, data files were provided in our package in an R-specific format, and are thus not readable across platforms. To resolve this point and the point of a lack of reader guidance, we have now implemented the following changes in the published repository and propose to implement the following changes in the manuscript. Please advise us how to proceed. We are ready to submit a revised version of our manuscript to biorxiv.
Changes in the published repository
- We added a section to the README file providing details that correspond to the Data availability and the Code availability Sections.
- We added a human-readable CSV version of the model forcing and evaluation data files.
- To clarify the origin of these files, we added two data processing scripts data-raw/generate_pmodel_drivers.R and data-raw/generate_pmodel_drivers-csv.R as documentation of how the input data was generated based on the publicly available FLUXNET data.
- These changes were marked as model version v.5.0.1, and our published repository and the Zenodo deposit were updated accordingly (https://doi.org/10.5281/zenodo.15189864). Please note that the changes between v.5.0.0 and v.5.0.1 only concern these auxiliary files and do not affect the model outputs and results shown in the paper.
Proposed changes in the manuscript
- The Section on Code availability will be changed to:
The {rsofun} R package can be installed from CRAN (https://cran.r-project.org/package=rsofun) or directly from its source code on GitHub (publicly available at https://github.com/geco-bern/rsofun under an AGPLv3 licence). Versioned releases of the GitHub repository are deposited on Zenodo (https://doi.org/10.5281/zenodo.15189864*). Code to reproduce the analysis and plots presented here is contained in the repository (subdirectory ‘analysis/’) and is demonstrated on the model documentation website (https://geco-bern.github.io/rsofun/, article ‘Sensitivity analysis and calibration interpretation’).
- The Section Data availability will be changed to:
The model forcing and evaluation data is based on the publicly available FLUXNET2015 data for the site FR-Pue, prepared by FluxDataKit v3.4.2 (10.5281/zenodo.14808331), taken here as a subset of the originally published data for years 2007-2012. It is accessible through the {rsofun} R package and contained as part of the repository (subdirectory ‘data/’) as CSV and as files. Outputs of the analysis presented here are archived in the ‘analysis/paper_results_files/’ subfolder.
- URLs will be removed from Section 4.2
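For readers following along, a minimal sketch of the two installation routes named in the proposed Code availability text above. The package name and the GitHub repository are taken from that text; the calls themselves are standard R tooling, and the use of the 'remotes' helper package is an assumption, not a statement of the authors' workflow.

```r
# Route 1: released version from CRAN (package name taken from the proposed text)
install.packages("rsofun")

# Route 2: development version from the GitHub source
# (assumes the 'remotes' helper package; any GitHub-installer would do)
install.packages("remotes")
remotes::install_github("geco-bern/rsofun")
```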
Failure on our side to more clearly describe relevant information already at an earlier stage may have given the impression of a lack of transparency and open access. Specifically, the following information and data had already been published before, along with our initial submission, and address several of the points you raised:
- The code and data contained in the repository make all published analyses fully reproducible and are also deposited on Zenodo (https://doi.org/10.5281/zenodo.15189864). The link to Zenodo is provided in the README of the GitHub repository. Please accept our apologies that we forgot to include it in the Code and Data Availability statements.
- The URL of the GitHub repository is given in the Code Availability section of our manuscript and the address provided is not broken. The link you referred to must have accidentally included a line number that slipped in while copying text.
- All model forcing, evaluation, and output data is published along with the repository (subdirectory data/ of https://github.com/geco-bern/rsofun and corresponding location on Zenodo).
- All code used for the analysis presented in our manuscript is published along with the repository (subdirectory analysis/ of https://github.com/geco-bern/rsofun and corresponding location on Zenodo).
- The same code is also demonstrated and extensively documented by the vignettes published along with the repository (subdirectory vignettes/ of https://github.com/geco-bern/rsofun). These are the source code for the article ‘Sensitivity analysis and calibration interpretation’ on the model documentation website (https://geco-bern.github.io/rsofun/articles/sensitivity_analysis.html).
- Full reproducible workflows for several basic use cases of the model as implemented in the rsofun R package are extensively documented on the model description website https://geco-bern.github.io/rsofun/. We made great efforts in making these user-friendly and fully transparent.
In designing the repository, we paid great attention to making the code reproducible and reusable, but not primarily to making the data reusable and open access. This is one of the reasons why we initially decided to keep the data accessible just for demonstration purposes within the (open access) package, but not in CSV format.
Sincerely,
Benjamin Stocker
Citation: https://doi.org/10.5194/egusphere-2025-1260-CC1 -
CC2: 'Reply on CC1', Benjamin Stocker, 14 Apr 2025
We have now updated the version of the pre-print, published on biorxiv, with the extended information in the Data availability and the Code availability Sections. The DOI of the updated pre-print version is: https://doi.org/10.1101/2023.11.24.568574.
Kind regards,
Benjamin Stocker
Citation: https://doi.org/10.5194/egusphere-2025-1260-CC2 -
CEC2: 'Reply on CC2', Juan Antonio Añel, 14 Apr 2025
Dear authors,
Many thanks for your detailed reply, and the explanations provided in Zenodo about the code and data that you use in your work. However, there is an outstanding issue that needs to be clarified. Regarding your data you state "The model forcing and evaluation data is based on the publicly available FLUXNET2015 data for the site FR-Pue, prepared by FluxDataKit v3.4.2 (10.5281/zenodo.14808331)." The problem here is that you state "is based on". What is necessary here to reproduce your work is not to know the dataset from which you take the forcing and evaluation data, but the exact data that you use for it. In this regard, it is unclear if somebody trying to replicate your work will be able to obtain exactly the data that you use simply by accessing a much bigger dataset. Therefore, instead of linking to the full dataset from which you take your input data, you should share the exact input data that you have used, taken from that dataset.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1260-CEC2 -
CC3: 'Reply on CEC2', Benjamin Stocker, 15 Apr 2025
Dear Editor,
In our understanding, our reply (https://doi.org/10.5194/egusphere-2025-1260-CC1) and the information given in the Data availability Section address your points. Specifically, you write:
What is necessary here to reproduce your work is [...] to know [...] the exact data that you use for it.
In the Data availability Section, we write:
The model forcing and evaluation data [...] is accessible through the {rsofun} R package and contained as part of the repository (subdirectory ‘data/’) as CSV and as files.
The DOI of the repository containing these files is given in the Code availability Section. Further, in our reply (https://doi.org/10.5194/egusphere-2025-1260-CC1), we wrote:
- We added a human-readable CSV version of the model forcing and evaluation data files.
- To clarify the origin of these files, we added two data processing scripts data-raw/generate_pmodel_drivers.R and data-raw/generate_pmodel_drivers-csv.R as documentation of how the input data was generated based on the publicly available FLUXNET data.
You further write:
[...] if somebody trying to replicate your work will be able to obtain exactly the data that you use simply by accessing a much bigger dataset.
In the Data availability statement, we write:
The model forcing and evaluation data is based on the publicly available FLUXNET2015 data for the site FR-Pue, prepared by FluxDataKit v3.4.2 (10.5281/zenodo.14808331), taken here as a subset of the originally published data for years 2007-2012.
This information should be sufficient to reproduce the data preparation. Furthermore, in our reply (https://doi.org/10.5194/egusphere-2025-1260-CC1), we mention that we have added data generation scripts (data-raw/generate_pmodel_drivers.R and data-raw/generate_pmodel_drivers-csv.R) that do just that.
You finally write:
Therefore, instead of linking to the full dataset from which you take your input data, you should share the exact input data that you have used, taken from that dataset.
The exact input (and also evaluation) data we have used has been part of the published repository from the start. With our latest version update of the repository, we have added that data also in the form of CSV files, and we have added scripts for reproducing the data generation. All this information (except pointing to those scripts) is now provided in the revised Data availability Section. Please excuse us, but we are not sure what should still be changed at this point and which open points our replies and additions to the manuscript and the repository do not yet resolve. Please advise us. Thank you.
Beni Stocker
Citation: https://doi.org/10.5194/egusphere-2025-1260-CC3 -
CEC3: 'Reply on CC3', Juan Antonio Añel, 15 Apr 2025
Dear authors,
Many thanks for the explanation. The current wording of the Code and Data Availability section in your manuscript creates some confusion, and it is not as clear as your reply to my previous comment. It is now clear that it is possible to obtain the exact data used in your study through the software that you provide. I would recommend that in potential revised versions of your manuscript you avoid the expression "based on" and instead state in that section the information posted in your reply on how to obtain the FLUXNET input data.
With this information we can consider your manuscript in compliance with the Code and Data policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1260-CEC3 -
CC4: 'Reply on CEC3', Benjamin Stocker, 16 Apr 2025
Dear Editor,
Ok, thank you. From your reply I understand that no further changes to the manuscript or the published data and code are needed at this point and that the latest version of the manuscript on biorxiv (https://doi.org/10.1101/2023.11.24.568574) will be considered for the peer-review process at GMD.
Kind regards,
Benjamin Stocker
Citation: https://doi.org/10.5194/egusphere-2025-1260-CC4 -
CEC4: 'Reply on CC4', Juan Antonio Añel, 16 Apr 2025
Dear authors,
Yes, that is correct; no further changes to your manuscript are needed at this point.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1260-CEC4
-
RC1: 'Comment on egusphere-2025-1260', Anonymous Referee #1, 13 May 2025
General comments:
The manuscript by Paredes et al., titled "rsofun v5.0: A model-data integration framework for simulating ecosystem processes", describes an R package that is mainly built around the P-model for efficient simulations and model-data synthesis workflows. The authors present a common use case in which they calibrate the model with a single data stream using a Bayesian approach and propagate the calibrated parameter uncertainty to model outputs.
The objectives and the rationale of the study are clearly stated, and the manuscript is written in a clear and concise manner. I appreciate the efforts of the authors and developers for this development towards reproducible modelling results and a lower bar of entry to vegetation modelling.
While I would like to highlight that the reported development is in the interest of the GMD community, I found the writing style of the paper a bit different from typical GMD papers, in that many scientific and technical details are omitted with referrals to the vignettes. Below I comment on the manuscript parts that could be enriched. I ultimately defer to the editor's decision, but I suspect that the paper currently does not align well with the GMD "model description paper" category, which is expected to be comprehensive, detailed, complete and rigorous. Please consider supplementing the manuscript with deeper details and discussions.
Specific comments:
P2.L36-39: The strength of Bayesian approaches in combining information from multiple sources and scales could also be emphasized here. Also, van Oijen (2017) could be a nice addition to the citations.
P2.L59: Would the authors consider rephrasing this sentence? Currently, it reads as if a novel solution is about to be presented, whereas Bayesian calibration of a process-based vegetation model has been done more times than I can count. Perhaps try presenting it as an application or an implementation of an existing solution.
P3.L67: It's a pity that the paper doesn't present this more sophisticated, and perhaps more realistic and valuable setup.
Table 2: Could the parameters that were held fixed for the calibration be marked with a different character than asterisk (*) since soilm_thetastar and kc_jmax also have an asterisk in their symbols? Or the table caption could be modified to read "... marked with an asterisk in the last column"
P6.L34: Could you elaborate as to which functions are included in this set and why/how they were chosen?
P7.L44: As GPP is not a measured but a derived quantity, could you please mention the approach used to derive it? The FLUXNET2015 dataset is fairly well documented, but for the sake of completeness, please also add how the GPP values were gap-filled and aggregated to daily values (which I assume is the model time step).
P7.L61: Undermined how? Could you please elaborate on how the convergence was affected so that the readers can follow and apply the same logic in their applications when needed? Was there a strong correlation structure? Were the chains getting stuck? Was the result the same with different algorithms? Different chain lengths? The text also mentions before that kc_jmax and beta_unitcostratio (P3.L83) have previously been calibrated separately and fixed in this study, which was somewhat agreeable, but now saying that you decided to hold them constant because calibrating them with other model parameters undermined convergence is confusing. Please clarify and reconcile.
P7.L62: Also, in Table 2 (because they're fixed) no range to vary them was given for those parameters that were held constant in calibration. But I believe for them to appear in the sensitivity analysis (Fig 2) you varied them in some range. In fact, I can see that in the vignette these are specified, but please include those ranges also somewhere in the manuscript for clarity and reproducibility.
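A minimal sketch of the kind of convergence and correlation diagnostics asked for at P7.L61, based on the BayesianTools package that the calibration builds on (see the later comment on design choices). The object `out` is a hypothetical MCMC result from BayesianTools::runMCMC(), not an object produced in the manuscript.

```r
# Hedged sketch of standard MCMC diagnostics with BayesianTools.
# 'out' stands in for whatever sampler output the calibration produces.
library(BayesianTools)

gelmanDiagnostics(out)  # Gelman-Rubin scale reduction factors; values near 1 suggest convergence
tracePlot(out)          # visual check for chains getting stuck or drifting
correlationPlot(out)    # pairwise posterior correlations, e.g. between kphio_par_a and kphio_par_b
```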
Figure 2: Would it be possible to add the symbols to the figure? Because the paragraph right before (P7.L56-62), refers to these parameters with symbols while the figure refers to them with parameter names which requires the reader to go back to Table2 to do the mapping. Admittedly it's a small list, but I suspect it wouldn't be difficult to add symbols to the figure. Same goes for Figure 3.
P8.L75: 24K iterations sounds like an interesting choice. A more typical number would be, for example, 50K or 100K. Could you please explain in the text how you decided on this number (since the goal of the package and paper is to lower the bar and provide guidance to the audience)?
P8.L77-78: I really appreciate the vignettes but I feel like some of the results reported there should go together with the paper (e.g. in the supplement). I think vignettes are a lot more practical and ultimately more useful to the end users, and I wouldn't object if this was an open source scientific software journal but I expect GMD papers to be more complete. For example there is a kphio_par_a and kphio_par_b correlation discussion in the vignette which is completely missing from the paper.
P9.L87: (from here onwards, including Figure 4) I believe there is a mix-up in the way the "model error" term is used. Perhaps the authors meant "error uncertainty" instead of "model error"? It is true that the credible interval is solely concerned with the uncertainty in the model parameters. The predictive interval here, however, is concerned with the overall residual error between the model and the data. In other words, the way the error term was jointly fitted in the calibration makes it intractable to decompose this term into data and model error. It is also known to dominate and cause overestimation of predictive uncertainty. Please refer to the relevant literature and revise (e.g. van Oijen, 2017).
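For readers following this point, a minimal sketch of the distinction, assuming the Gaussian residual error model commonly used in such calibrations (the symbols f, theta and sigma are illustrative, not taken from the manuscript):

```latex
% Observation model assumed for the sketch: prediction plus one jointly fitted error term
\begin{align}
  y_i &= f(x_i, \theta) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \\
  \text{credible interval:} &\quad \text{spread of } f(x_i, \theta) \text{ under the posterior } p(\theta \mid y) \\
  \text{predictive interval:} &\quad \text{spread of } \tilde{y}_i \text{ under }
    p(\tilde{y}_i \mid y) = \int \mathcal{N}\!\left(\tilde{y}_i \mid f(x_i, \theta), \sigma^2\right)
    p(\theta, \sigma \mid y)\, \mathrm{d}\theta\, \mathrm{d}\sigma
\end{align}
```

Under this sketch the single term sigma lumps measurement error and model structural error together, which is why the two cannot be separated after the fact and why the predictive interval is typically much wider than the credible interval.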
Results - I was a bit surprised to see no quantitative reporting of the model improvement after the calibration. The only visual comparison (Figure 4) reports the posterior performance with no reference to prior performance, and only for one year. While showing a single year is useful for practical purposes, please consider providing results for all years (e.g. in the supplement):
- It would be interesting for the readers to see how performance in different years compare, also with quantitative metrics.
- Furthermore, even though measurements from years 2013 and 2014 were deemed problematic, it would be good to show what the calibrated model predicts for these out-of-sample years that the calibration did not see.
- Last, but not least, there was no mention of further posterior predictive checks/diagnostics. One can for example plot residuals against predictors and employ formal tests that measure where the observed data falls on the distribution of simulated data. Is there a pattern in the residuals? Were the correct distributional assumptions made? Comparing credible and the predictive intervals is only a (very) limited part of the story.
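A minimal sketch of the residual diagnostics described in the last point, using hypothetical vectors obs (observed daily GPP) and pred (corresponding posterior-mean predictions) rather than the package's actual output objects:

```r
# Hypothetical inputs: observed daily GPP and matching posterior-mean predictions.
resid <- obs - pred

# Residuals against predictions: a trend or funnel shape would indicate bias
# or non-constant variance.
plot(pred, resid, xlab = "Predicted GPP", ylab = "Residual")
abline(h = 0, lty = 2)

# Check of the residual distribution against the Gaussian assumption made in
# the likelihood.
qqnorm(resid)
qqline(resid)
```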
Discussion - Please consider enriching the discussion section with the following:
- Were the design choices adequate? I mean all the decisions from coupling the model with BayesianTools as a package to the prior and likelihood forms you selected?
- What does it take to transfer this calibration to another site/variable, multiple sites/variables?
- How do you recommend iteratively updating these results as new data becomes available (either more of the same type of observations or with new data sources)? Does your implementation allow re-reading its own outputs as inputs?
- What are the other limitations of this study? What are your outlooks?
- While I understand that it is not the main goal of the paper, the inexperienced readers could benefit from pointing to the literature that provides guidance on the peculiarities of model calibration, e.g. see MacBean et al. 2016, van Oijen 2017, Oberpriller et al. 2021, Cameron et al. 2022. Therefore, in addition to points above, please also consider adding some discussion along those lines since the section is rather thin at the moment.
P11.L13: "complementary observational constraints" such as?
P11.L16: Please elaborate some more. Is it a weakness of the study that FvCB parameters were kept constant? What was the reasoning? Is it recommended to do so? What are the expected challenges there? Are there future plans to address this?
MacBean et al. 2016 https://doi.org/10.5194/gmd-9-3569-2016
van Oijen 2017 https://doi.org/10.1007/s40725-017-0069-9
Oberpriller et al. 2021 https://doi.org/10.1111/ele.13728
Cameron et al. 2022 https://doi.org/10.1111/2041-210X.14002
Citation: https://doi.org/10.5194/egusphere-2025-1260-RC1 -
AC1: 'Reply on RC1', Fabian Bernhard, 17 Jun 2025
-
RC2: 'Comment on egusphere-2025-1260', Anonymous Referee #2, 21 May 2025
The authors present the Simulating Optimal FUNctioning {rsofun} R package, aiming to lower the bar of entry to ecosystem modelling and model-data integration for scientists. This R package provides the potential for efficient model parameterization and estimation of uncertainty. The authors tested the {rsofun} R package at the site level by applying parameterization to the P-model and presenting a corresponding case study. The results demonstrate that this tool can be used to calibrate photosynthesis-related parameters of the P-model at the site scale and to simulate GPP accordingly. The package shows potential to advance research in ecosystem modelling. However, further development is still required. The case study presentation lacks sufficient detail regarding key aspects of the tool, such as how model-data integration is implemented. Moreover, the core functionalities of the tool appear to rely heavily on other existing R packages. The authors are encouraged to clearly identify the unique contributions and core features developed specifically within this package, particularly whether it provides a generalized interface to invoke different ecosystem models. Overall, the {rsofun} R package is a promising tool to support ecosystem modelling studies, but it remains incomplete in its current form. Its potential for application at regional scales is still uncertain. Therefore, I recommend substantial revisions before the manuscript can be considered for acceptance.
Below are detailed comments:
[Line numbers]: Please use continuous line numbers.
[Page 1, Line 11]: Do site-scale simulations or parameterization need parallelizable computation? How about applying this package at a larger scale?
[Page 2, Line 58]: There is no table before this table, so please rename it as Tab. 1.
[Page 4, Line 08]: Was net radiation an output of the P-model? Also, it is not suitable to list soil temperature under the ecosystem water balance.
[Page 4, Line 14]: There is no information about the spin-up period or the corresponding years in Table 1; please move this sentence to the corresponding section.
[Page 7, Line 41]: Why did the authors select this site, and since the rsofun package is so computationally efficient, why not test at more sites that represent different conditions? What is the depth of the root-zone soil at this site?
[Page 7, Line 60]: If the authors mention an analysis, please give the result in the manuscript or supplementary materials.
[Page 7, Line 62]: When all parameters are used for model calibration, the multidimensional parameter space increases exponentially, potentially leading to model non-convergence. This also indicates that different values of c*, beta, tau and b0 can influence the parameterization outcomes. Meanwhile, the fixed parameters should be explicitly justified in terms of how they were determined.
[Page 8, Line 71]: Given that c* is the third most sensitive parameter, why was it not included in the calibration, while β0, which ranks second to last in sensitivity, was? The authors are requested to further clarify the rationale behind the parameter selection.
[Page 10, Line 90]: It is hard to recognise the light green band; please explain more about the model uncertainty and parameter uncertainty.
[Page 10, Line 99]: Is it a dark orange or a red line? The authors described it as a red line before.
Citation: https://doi.org/10.5194/egusphere-2025-1260-RC2 -
AC2: 'Reply on RC2', Fabian Bernhard, 17 Jun 2025