the Creative Commons Attribution 4.0 License.
A food crop yield emulator for integration in the compact Earth system model OSCAR (OSCAR-crop v1.0)
Abstract. This paper presents the development, validation, and preliminary application of a sub-national scale crop yield emulator to be integrated into the compact Earth system model OSCAR. The emulator simulates yields for four major food crops: maize, rice (two growing seasons), soybean, and wheat (spring and winter varieties), in alignment with the Agricultural Model Intercomparison and Improvement Project (AgMIP) and the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) framework. Key drivers include atmospheric CO2 concentration (represented as C), growing season temperature (T), water availability (W), and nitrogen fertilization (N). The emulator is trained on an ensemble of process-based crop model simulations from AgMIP’s Global Gridded Crop Model Intercomparison Projects (GGCMI), which is based on the ISIMIP Phase 3 protocol. The crop models used bias-corrected historical and future climate scenarios under fixed socioeconomic conditions, to estimate yield responses under various scenarios until the end of this century. Evaluation of the emulator against the crop model outputs demonstrates the emulator's ability to replicate complex model behavior with high fidelity. Additionally, the emulator-derived yield sensitivities to CO2 and temperature are consistent with those observed in field experiments, reinforcing its empirical robustness. Historical simulations incorporating time-varying nitrogen inputs show significantly improved agreement with FAO yield statistics, underscoring the emulator’s reliability over the historical period and its potential for future impact assessments. This study provides a computationally efficient yet empirically grounded tool for representing crop yield responses, bridging the gap between complex crop models and statistic models. The developed crop emulator facilitates probabilistic projections across large ensembles of climatic and socio-economic scenarios at policy-relevant, sub-national scales. 
Potential applications include integrated assessments of future food security under climate and land-use change, as well as evaluations of bioenergy with carbon capture and storage (BECCS) potential from crop residues.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: closed (peer review stopped)
RC2: 'Comment on egusphere-2025-4805', Eun-Sung Chung, 28 Jan 2026
This study designs, calibrates, and validates a crop yield emulator and integrates it as a module within OSCAR, a compact Earth system model (or simple climate model). The model design and intended use are clearly presented, featuring national resolution with sub-national resolution in six countries, annual temporal resolution, and two modes of operation, and thus fit well within the scope of Geoscientific Model Development. In addition, the calibration framework explicitly specifies a decomposition into response functions for four drivers (CO2, growing season temperature, precipitation or water availability, and nitrogen) and a selection procedure based on statistical criteria, while validation extends to comparisons against ISIMIP3b, ISIMIP3a, experimental evidence (including FACE, OTC, and warming experiments), and FAO yield statistics. Nevertheless, the authors should further strengthen the clarity of the model structure and assumptions, the completeness of the reproducibility package, and the independence and generalizability of the validation. I therefore provide the following major comments.
- The model decomposes the total yield response into the product of the responses to CO2, temperature, precipitation, and nitrogen, as in Eq. 6. While this structure offers advantages in computational efficiency and interpretability, it may fundamentally weaken interaction terms and nonlinear couplings. Did the authors assess how much interaction remains in the ISIMIP3b and ISIMIP3a data, for example via residual structure or systematic biases in particular regimes? If not, the manuscript would be more convincing if it included, even briefly, error distributions in regimes where interactions are expected to be large (such as high CO2, high temperature, and dry conditions), or a simple comparison of the trade-off between performance and complexity when allowing selective interaction terms instead of a purely multiplicative decomposition.
- The authors select functional forms using R² and BIC and explore a large number of combinations across regions, crops, irrigation, and drivers. Did the authors evaluate, via some form of cross-validation, how stable the selected functional forms are across scenarios (such as different SSPs) and time periods (near future versus end of century)? Although the statement that BIC helps suppress overfitting is reasonable, it is important to demonstrate robustness more clearly using an explicit train and validation split.
- The manuscript states that extreme-value regions represent only 0.02 percent of the dataset but can distort aggregated results, so masking is applied. At the same time, the parameters for those regions are retained, reflecting the original model, and the choice is left to the user. The authors also list the number of extreme regions for specific GGCM, crop, and irrigation combinations. In that case, could the authors provide quantitative examples, such as a sensitivity analysis, showing which regions drive the differences and by how much, and how global and regional results change depending on whether such regions are included or excluded?
- For the fully irrigated (firr) case, the manuscript sets the water stress term to 1. In reality, however, irrigation can be constrained by water availability, competition for water resources, and infrastructure limitations, so conditions are not always fully unconstrained. Should firr here be understood strictly as fully irrigated under the ISIMIP experimental design? If so, it would be helpful to emphasize more prominently in the Discussion the caveats for real-world applications, especially in regions projected to face future water scarcity.
- The authors explain that GGCM-based samples are insufficient to constrain nitrogen responses, so they rely on response functions derived from long-term field experiments. They also refit cereal responses using forms such as Michaelis-Menten, Mitscherlich, and George, selected using BIC and R², noting that limited samples lead to non-region-specific responses. How do the authors address potential biases arising from this hybrid structure, where climate responses come from GGCMs while nitrogen responses are taken from experimental meta-functions?
- The manuscript assumes that nitrogen is not a yield-determining factor for soybean and therefore sets the N response to a constant. However, in the FAO comparison, soybean yields tend to be overestimated, and the manuscript explicitly mentions this assumption as one possible cause. In which regions does the assumption that the soybean N response equals 1 cause the most pronounced problems? As a minimal alternative, the decision would be far more convincing if the authors tested a simple form, such as a weak saturating response with very small sensitivity or a simple upper and lower constraint depending on N input level, and compared performance.
- The manuscript reports that in-sample global correlations are mostly above 0.8, and that out-of-sample differences are generally within ±0.5 ton per hectare. However, the out-of-sample test uses different climate forcing while still effectively reproducing the source GGCM simulation space. Do the authors have plans or results for a structural independence test, for example leave-one-GGCM-out calibration and validation, to better assess generalizability?
- The authors state that they use a 5-year moving average in calibration to filter short-term variability, and they acknowledge that extreme event impacts are not considered and that only multi-year trends are represented. Could the authors clarify more explicitly the scope of applicability and the non-recommended uses, particularly what misunderstandings might arise if users apply this model to risk and extremes assessments such as heatwave damage?
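As an illustration of the residual check suggested in the first major comment, the following is a minimal sketch on synthetic data only. The response functions, their coefficients, and the heat-by-drought penalty are all hypothetical assumptions chosen for illustration; they are not the calibrated functions of the emulator, and nitrogen is omitted for brevity. The sketch compares a purely multiplicative emulator with a stand-in "crop model" that contains one interaction term, then looks at residuals in the regime where the interaction is expected to matter.

```python
import numpy as np

# All functional forms and coefficients below are illustrative assumptions,
# not the calibrated response functions of the emulator under review.
def r_co2(c):
    """Saturating CO2 response, equal to 1 at 400 ppm."""
    return 1.0 + 0.3 * (c - 400.0) / (c + 200.0)

def r_temp(dt):
    """Linear yield decline per kelvin of growing-season warming."""
    return 1.0 - 0.04 * dt

def r_water(dw):
    """Linear response to the relative change in water availability."""
    return 1.0 + 0.5 * dw

def emulate(c, dt, dw):
    """Purely multiplicative emulator: product of single-driver responses."""
    return r_co2(c) * r_temp(dt) * r_water(dw)

def synthetic_truth(c, dt, dw, k=0.05):
    """Stand-in 'crop model' with an extra heat-by-drought yield penalty."""
    penalty = k * np.clip(dt, 0.0, None) * np.clip(-dw, 0.0, None)
    return emulate(c, dt, dw) * (1.0 - penalty)

rng = np.random.default_rng(0)
n = 5000
c = rng.uniform(400.0, 900.0, n)   # CO2 concentration, ppm
dt = rng.uniform(0.0, 5.0, n)      # warming, K
dw = rng.uniform(-0.4, 0.4, n)     # relative change in water availability

# A positive residual means the multiplicative emulator overestimates yield.
resid = emulate(c, dt, dw) - synthetic_truth(c, dt, dw)
hot_dry = (dt > 3.0) & (dw < -0.2)

print("mean residual, hot and dry regime:", resid[hot_dry].mean())
print("mean residual, all other samples: ", resid[~hot_dry].mean())
```

In this toy setting the residuals concentrate in the hot and dry regime, which is the kind of regime-specific error distribution the comment asks the authors to report for the real ISIMIP3a and ISIMIP3b data.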
Minor comments
- In the Fig. 10 caption, "dotted bule lines" appears to be a typo and should read "blue".
Citation: https://doi.org/10.5194/egusphere-2025-4805-RC2
RC1: 'Comment on egusphere-2025-4805', Anonymous Referee #1, 21 Nov 2025
In this manuscript, the authors present a crop yield emulator of the Agricultural Model Intercomparison and Improvement Project (AgMIP) global gridded crop models (GGCMs). Their crop yield emulator was designed to be driven by CTWN output from OSCAR for novel scenarios, but was trained on and validated using publicly available model intercomparison data from GGCMs. The authors describe the model development and its validation against out-of-sample GGCM results as well as manipulative field experiments. There is a well-established need for crop yield emulators, and it is exciting to see this field growing. However, as it currently stands, I think revisions are needed to improve the clarity and readability of the manuscript. Line-by-line questions are included below, but one overarching comment is that it is not particularly clear whether and how the emulator is actually connected with or related to OSCAR.
Isn’t OSCAR now on version > V3 (Gasser et al. 2020)? How does OSCAR-crop v1.0 relate to it? Is the crop emulator standalone from OSCAR, which is why it is only on v1? Will it be included in a future OSCAR release? Some clarity on whether the emulator is a module/component of OSCAR or fully independent, that is, soft-coupled with OSCAR, would be helpful. Is there a specific version of OSCAR that the emulator is compatible with? Or is it also backwards compatible with the V1 OSCAR release? These sorts of details, and the possibility of coupling the crop emulator with other RCMs, would be helpful to readers and potential users. Were any OSCAR-driven emulation results included in the manuscript?
Please see below for some of the questions/comments that came to mind as I was reading.
L26: “to estimate yield responses under various scenarios” Is this under various future climate scenarios? Or are socioeconomic conditions also part of the prediction process?
L33: “bridging the gap between complex crop models and statistic models” What do the authors mean by this gap?
L43-48: In the passage starting with “In contrast” and ending with “(Folberth et al., 2025)”, as it currently reads and given where manuscripts are cited, it seems like the only other existing crop yield emulator is the one documented in Abramoff et al., 2023, but other crop yield emulators exist. The authors should cite more than one other crop yield emulator in this section or clarify why this emulator is so unique among them.
L56-60: In this section of text, the authors have been discussing a mix of crop emulators and crop yield simulations generated by the complex crop models. The second half of the paragraph is hard to follow because it is unclear what type of model is being discussed. For example, with the sentence “Despite the wide range of outcomes due to different model structures, parameterization schemes, calibration processes and input data quality (Folberth et al., 2019;Müller et al., 2024), these projections exhibit reduced uncertainty for rice and soybean and enhanced robustness for maize and wheat (Jägermeyr et al., 2021)”, are the authors referring to the complex crop models participating in GGCMI Phase 3 as having a wide range of outcomes? Or are these the national crop yield emulators that the authors are building on in this work by developing a sub-national crop emulator?
L74: “It emulates crop yields at a national level for most countries, with sub-national outputs in six large-area countries (Australia, Brazil, Canada, China, Russia, and the USA).” How many regions are there in total? Is this enough to be considered a sub-national modeling capability, as described in L60?
L99: “The input variables provided in the repository” What do you mean by the input variables? Are you referring to the ISIMIP data that the crop emulator will use as inputs, or are these data included in the emulator repository for emulator users?
Equations (3 & 4): Where do the weights between the regional climate and crop-specific/regional growing season crop come from? Is OSCAR producing the growing-season regional temperatures and precipitation?
L145: Why would the concatenation matter if doing global-to-regional linear pattern scaling?
L175: Is a consistent functional form used across crop types? Or is it the best emulator per region × crop type?
Figure 3: Is the y-axis RCCO2? Or is it RC? It might be helpful to include that labeling on the y-axis.
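The point behind the L145 question can be illustrated with a minimal sketch on synthetic data. The pattern value, the warming ranges, and the helper `fit_pattern` are hypothetical and are not taken from the manuscript or from OSCAR. The sketch fits a single regional scaling factor by least squares through the origin, separately per scenario and on the concatenated series.

```python
import numpy as np

def fit_pattern(global_dt, regional_dt):
    """Least-squares slope through the origin: regional_dt ~ pattern * global_dt."""
    g = np.asarray(global_dt, dtype=float)
    r = np.asarray(regional_dt, dtype=float)
    return float(g @ r / (g @ g))

# Synthetic global warming series for a low and a high scenario (K).
g_low = np.linspace(0.1, 2.0, 50)
g_high = np.linspace(0.1, 5.0, 50)
g_all = np.concatenate([g_low, g_high])

# Case 1: strictly linear regional response (hypothetical pattern of 1.4 K/K).
pattern = 1.4
p_low = fit_pattern(g_low, pattern * g_low)
p_high = fit_pattern(g_high, pattern * g_high)
p_all = fit_pattern(g_all, pattern * g_all)

# Case 2: mild curvature added to the regional response (assumed, for contrast).
def curved(g):
    return pattern * g + 0.05 * g**2

q_low = fit_pattern(g_low, curved(g_low))
q_high = fit_pattern(g_high, curved(g_high))
q_all = fit_pattern(g_all, curved(g_all))

print("linear case:", p_low, p_high, p_all)
print("curved case:", q_low, q_high, q_all)
```

Under a strictly linear relationship the fitted pattern is the same whichever scenarios are used, so concatenation should be immaterial. Once the regional response is even mildly nonlinear, the fitted pattern depends on the warming range sampled, and the choice of concatenated scenarios starts to matter.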
Equation 9, which subtracts the piControl precipitation from that of the climate scenario, appears inconsistent with the relative precipitation changes described above.
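The distinction behind this comment can be written out explicitly. The symbols below are assumptions introduced here for illustration and are not the manuscript's notation: $P_{\mathrm{scen}}$ is growing-season precipitation in the climate scenario and $P_{\mathrm{piC}}$ is the corresponding pre-industrial control value.

```latex
\Delta P_{\mathrm{abs}} = P_{\mathrm{scen}} - P_{\mathrm{piC}},
\qquad
\Delta P_{\mathrm{rel}} = \frac{P_{\mathrm{scen}} - P_{\mathrm{piC}}}{P_{\mathrm{piC}}}
```

For example, with $P_{\mathrm{piC}} = 500$ mm and $P_{\mathrm{scen}} = 550$ mm, the absolute change is 50 mm while the relative change is 0.10. A response function calibrated on one of these quantities cannot be driven by the other without rescaling, so the text and the equation should use the same definition.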
~L340: For the N fertilizer effect, could the authors clarify whether they are conducting field experiments following van Grinsven et al. (2022) or using data published from that field experiment? Given the data limitations and the assumptions that had to be made, is it necessary to include this term in the emulator?
In Figure 9: What do the symbol markers indicate? Is it the distribution of the global average of the sub-national absolute differences? Or is it the sub-national absolute differences?
Section 5.1 is difficult to follow. Are the authors comparing emulator results with experimental observations? Is the emulator being used to predict the experimental change in yield response? Or are the field experiment results being incorporated into the emulator by “ground[ing] the emulator’s projections in real-world experimental evidence”? Furthermore, it is not entirely clear how the MC relates to the observational/field experiments.
Citation: https://doi.org/10.5194/egusphere-2025-4805-RC1