This work is distributed under the Creative Commons Attribution 4.0 License.
CROMES v1.0: A flexible CROp Model Emulator Suite for climate impact assessment
Abstract. Global gridded crop models (GGCMs) are simulation tools designed for global, spatially explicit estimation of crop productivity and associated externalities. Key areas for their application are climate impact and adaptation studies. As GGCMs are typically computationally costly and require comprehensive data pre- and post-processing, GGCM emulators are gaining increasing popularity. Earlier emulators have typically been published pre-trained on synthetic weather and management combinations. Here, we present a novel computational pipeline, the CROp Model Emulator Suite (CROMES) v1.0, which serves to flexibly train GGCM emulators on data commonly available from GGCM simulations. Essentially, CROMES consists of modules to (1) process climate data from daily-resolution netCDF files into (sub-)growing-season aggregates as climate features, (2) combine various feature types (climate, soil, crop management), (3) train emulators using machine-learning algorithms, and (4) produce predictions. As an exemplary application, we use CROMES to train emulators on simulations for rainfed maize from the GGCM EPIC-IIASA and climate projections from a single GCM and subsequently test their skill in predicting crop yields for unseen climate projections from other GCMs. Depending on the training and target data, the regression statistics between GGCM simulations and predictions across all points in time and space are in the ranges R2 = 0.97 to 0.98, slope = 0.99 to 1.01, and intercept = -0.06 to +0.06. The RMSE ranges between 0.49 and 0.65 t ha-1. Spatially, patterns are evident, with the lowest performance in (semi-)arid regions, where the aggregation of weather data may result in higher information loss and where permanent crop growth limitations may hamper evaluation statistics as well. The gain in computational speed for predictions is more than an order of magnitude, with the time required to produce target features and subsequent predictions at about 30 min on common hardware. We expect CROMES to be of utility in covering uncertainty in climate impact projections more comprehensively, in evaluating adaptation options, and in spatio-temporal assessments of crop productivity.
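To make the four-module workflow concrete, the following minimal Python sketch illustrates the general idea (it is not the CROMES implementation): daily netCDF climate data are reduced to growing-season features, merged with static soil features, and used to train a machine-learning emulator on GGCM yields, so that predictions for an unseen climate projection only require the cheap feature extraction and a model call. File names, variable names, the fixed season window, and the choice of regressor are illustrative assumptions.

```python
# Minimal sketch of a CROMES-like workflow (illustrative only, not the CROMES code).
# Assumed inputs: daily netCDF files with variables "tas" and "pr", plus CSV files
# with static soil features and EPIC-IIASA yields keyed by lat/lon (all hypothetical).
import xarray as xr
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

def season_features(nc_path, start, end):
    """Module (1): aggregate daily weather to growing-season climate features."""
    ds = xr.open_dataset(nc_path).sel(time=slice(start, end))
    feats = xr.Dataset({
        "tas_mean": ds["tas"].mean("time"),   # mean growing-season temperature
        "tas_max":  ds["tas"].max("time"),    # simple heat-extreme proxy
        "pr_sum":   ds["pr"].sum("time"),     # growing-season precipitation sum
    })
    return feats.to_dataframe().dropna().reset_index()

# Module (2): combine climate, soil, and yield data into one training table.
train = (season_features("gcm_A_daily.nc", "1981-05-01", "1981-09-30")
         .merge(pd.read_csv("soil_static.csv"), on=["lat", "lon"])
         .merge(pd.read_csv("epic_yields_gcm_A.csv"), on=["lat", "lon"]))

# Module (3): train the emulator (gradient boosting chosen only for illustration).
X = train.drop(columns=["lat", "lon", "yield_t_ha"])
model = HistGradientBoostingRegressor(max_iter=500).fit(X, train["yield_t_ha"])

# Module (4): predict yields for an unseen climate projection; only feature
# extraction and model.predict() are needed, which is where the speed-up lies.
target = (season_features("gcm_B_daily.nc", "2041-05-01", "2041-09-30")
          .merge(pd.read_csv("soil_static.csv"), on=["lat", "lon"]))
yield_pred = model.predict(target.drop(columns=["lat", "lon"]))
```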
Competing interests: At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2025-862', Anonymous Referee #1, 14 May 2025
This article presents an emulator for global gridded crop models. Although emulators are now commonly used in climate impact assessment studies in lieu of full crop models, the novelty in this article originates from the methodology and not necessarily the approach itself. The authors demonstrate a faster and more efficient pipeline to extract climate features and train the emulators, which yielded an advantage of at least an order of magnitude over the gridded crop models. Although the computational advantage is not explicitly compared to other emulators, the authors sufficiently discuss prior studies and acknowledge that it is not straightforward to do such benchmarking comparisons due to differences in crop models, inputs required, and algorithms being used. The article is well written and is sufficiently detailed to allow for reproducible results with its supplementary code and data, and meets the quality standards for publication.
Citation: https://doi.org/10.5194/egusphere-2025-862-RC1
RC2: 'Comment on egusphere-2025-862', Jonathan Richetti, 23 Jun 2025
The manuscript presents gridded crop growth model emulators for climate change projection analysis. It appears to achieve a significant improvement in computational performance without major losses in accuracy. However, there are a few issues that are not clear to me, and I would encourage the authors to address them prior to publication.
To really nail the flow of this work: overall, the paragraphs lack a concluding ('so what?') sentence at the end. For example, the first paragraph is missing a closing sentence for the idea that GGCMs are computationally demanding but there is hope with emulators, something like: “This high computational cost of GGCMs hinders more comprehensive scenario analysis and prevents the quick adoption of new climatic datasets, which can be addressed with emulators.” You do this in the third paragraph; I would recommend that the authors do it in all paragraphs. This is essential in the introduction and discussion sections.
There are a few typos; double-check with a word processor or something. E.g., Line 44: you mean 1 km, not 1k. Line 157 has ‘is’ doubled (“PET is (see sect. 2.4.3 for details) is used…”), and in line 201 the ‘i’ should be italic.
Clarity is needed on the training and evaluation of the emulators. In Section 2.6, RMSE of yield? So, is it 0.447 t/ha? What is the cut for the climate? Was the 4-fold CV randomly sampled, or were the folds defined based on time or region? The following section (2.7) states that the evaluation is performed across all individual locations and years. I think that means there is data leakage between the training of the emulators and their evaluation. I understand that there are computational limits to this study, but it is not clear to me how this was actually done. There is evidence that one can draw a lucky strike on the splits, and that agricultural problems need specific cross-validation strategies depending on the use case (https://doi.org/10.1016/j.compag.2023.107642, https://doi.org/10.1007/s11119-024-10212-2, https://doi.org/10.1175/AIES-D-23-0026.1). I would expect the models to be trained, and the 4-fold CV for hyperparameterisation to be performed, with the 1980-2014 data, and the evaluation to be performed with the future projections.
Looking at the results, it is expected that the ML will have a nearly perfect fit with the training data, and if you’re using GCM data to train future scenarios, what is the value of the emulators? You’ve already run the GCM on that scenario! In this sense, section 3.1 should be supplementary materials.
If we were to use CROMES in a scenario the ML never saw, how would it perform? Because then we can say: look, CROMES allows us to run x times more scenarios in the same time GCMs would take to run one, and this performance boost does not reduce the quality of the information.
After looking back at the M&M, I found one tiny sentence stating that training and evaluation of the MLs are done on different GCMs. Make this more obvious and clear; having to go back and forth with the M&M means this is not clear (or that I'm dumb, which might be true as well, but let's make it easier for the readers).
I would recommend having a section “Training, variable importance, and evaluation” where you clearly state the different experiments used to assess the MLs' performance. When you say that the evaluation was performed with the GCMs not used in the training, is it a CV scheme where you leave one GCM out, or is there one set of GCMs used for training and another set for evaluation?
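As an illustration of the distinction drawn here (not code from the manuscript), a leave-one-GCM-out evaluation can be expressed with a grouped cross-validation splitter, whereas a fixed split simply holds out a set of GCMs; the data frame, file name, and column names below are hypothetical.

```python
# Illustrative sketch of the two evaluation schemes contrasted above (hypothetical data).
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneGroupOut

df = pd.read_csv("emulator_samples.csv")   # hypothetical: one row per grid cell, year, and GCM
X, y, groups = df.drop(columns=["yield_t_ha", "gcm"]), df["yield_t_ha"], df["gcm"]

# Scheme A: leave-one-GCM-out cross-validation -- each GCM serves once as the test set.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = HistGradientBoostingRegressor().fit(X.iloc[train_idx], y.iloc[train_idx])
    rmse = mean_squared_error(y.iloc[test_idx], model.predict(X.iloc[test_idx])) ** 0.5
    print(f"held-out GCM: {groups.iloc[test_idx].iat[0]}, RMSE = {rmse:.3f} t/ha")

# Scheme B: fixed split -- one GCM (or set of GCMs) for training, the rest for evaluation.
train_df = df[df["gcm"] == "GFDL-ESM4"]    # hypothetical choice of training GCM
eval_df = df[df["gcm"] != "GFDL-ESM4"]
```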
It is good that you made a comment on the negative yields and the fragility of R2 with the large number of points.
OK, I think the authors could concentrate on highlighting where CROMES performs worst, with bigger versions of Figures 4 and 5; currently, you can barely see any difference. All the rest could go to the supplementary materials. The authors want to state something like: CROMES helps with GCMs; the worst case is this and this, here and there, and everything else is better (see supplementary materials). This way the important results can be clearly observed and the key points made without losing any of the detail. As is, it is a bit hard to follow what the key findings of the section are, as opposed to the Feature Selection section, where it is much easier to understand that the most important variables for the emulators are x, y, and z. The same goes for the computational performance.
The section on computational demand could be one paragraph concentrating on the performance; there is a lot of text already present in the M&M.
The discussion is a bit shallow.
First paragraph, concluding that low-yield areas matter less, I would say they might matter more, as the food security of those areas is more compromised! They might be of less economic importance on a global scale, but those are the areas that might suffer more with decreasing food security due to climate change!
Second paragraph, starting on line 535: so what? What is the conclusion of all the various studies in relation to this? Sweet et al. (2023) state that different CV schemes impact the outcome. What does that mean in this study? I think you need to talk about what you have chosen and how that impacts the performance, i.e., why do yours look so good?
Third paragraph: why do you start talking about what is not in scope instead of stating what you found and its implications? CROMES incorporates phenology like all the crop models; are these shown as key important variables? What does that mean? Why should we care?
Last paragraph, ok. Why do I want to make quicker GCM simulations? What is the actual benefit for the community?
Minor comments:
Line 145: I would expect times for this in the results.
Figure 1. Make the text in the boxes bigger. There's no point having them if they're hard or impossible to read.
Line 219: from seed/planting to emergence is 100 °C days? Do you mean 100 GDD (growing degree-days)? (See the GDD sketch after these minor comments.)
Line 224: What does the cut-off mean? 21 days after planting, the maize will mature? I would expect, even for a short-season maize, to take at least 30 days to reach reproductive stages. Looking at Figure 2, the cut-off is not after planting. Please clarify this.
Line 345: How much is sufficient N? This is needed for the replicability of the study; as you highlighted, it is different from Jägermeyr et al. (2021).
Line 496: you bother to state what GPU stands for here, but not for the crazy acronyms of the GCMs. I think it is more likely that an average reader is aware of what a GPU is than of what UKESM1-0-LL is. Furthermore, most of this paragraph is redundant.
Figure 7: Being pedantic, the colour scheme could be more harmonious: warm tones for EPIC and cold tones for CROMES? Add a bit of space between the CROMES and EPIC tasks and the netCDF-to-binary conversion, which is the same for both.
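Regarding the emergence-threshold question above (Line 219 comment), a minimal sketch of how a growing-degree-day threshold can be accumulated from daily mean temperature; the base temperature of 8 °C and the 100 GDD threshold are illustrative assumptions, not values taken from the manuscript.

```python
# Minimal sketch of growing-degree-day (GDD) accumulation to an emergence threshold.
# Base temperature and threshold are illustrative assumptions.
def days_to_emergence(daily_tmean, base_temp=8.0, threshold=100.0):
    """Return the day after planting on which cumulative GDD reaches the threshold."""
    gdd = 0.0
    for day, t in enumerate(daily_tmean, start=1):
        gdd += max(t - base_temp, 0.0)   # negative contributions are truncated to zero
        if gdd >= threshold:
            return day
    return None                          # threshold not reached within the record

# Example: a constant 18 degC mean adds 10 GDD per day, so emergence falls on day 10.
print(days_to_emergence([18.0] * 30))
```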
Citation: https://doi.org/10.5194/egusphere-2025-862-RC2
AC1: 'Combined author response for egusphere-2025-862', Christian Folberth, 21 Jul 2025
Data sets
Sample data for training EPIC-IIASA global gridded crop model emulators, Christian Folberth et al., https://doi.org/10.5281/zenodo.14894075
Model code and software
CROMES v1.0: A flexible CROp Model Emulator Suite for climate impact assessment - Frozen code repository and example for training EPIC-IIASA global gridded crop model emulators, Christian Folberth et al., https://doi.org/10.5281/zenodo.14901127