This work is distributed under the Creative Commons Attribution 4.0 License.
Meta-modelling of carbon fluxes from crop and grassland multi-model outputs
Abstract. We evaluated four stacking-based meta-models – Multiple Linear Regression, Random Forest, XGBoost, and XGBoost with environmental covariates (XGB+) – against the multi-model median (MMM) and best individual process-based models for gross primary production (GPP), ecosystem respiration (RECO) and net ecosystem exchange (NEE) at two cropland and two grassland sites. All meta-models were associated with improved RMSE, bias and correlation, with explained variance gains of ~10–38.5 % over MMM, largest for RECO in croplands and smallest for NEE in grasslands. Bias was nearly eliminated except at one cropland site. SHAP analysis showed that diverse individual models, not always the top performers, contributed most, and that temperature – especially for RECO in croplands and NEE in grasslands – was the dominant environmental driver, while precipitation had minor effects. These findings highlight the predictive and diagnostic advantages of stacking-based approaches over equal-weight MMM, with potential applications across agroecosystem, Earth system and environmental model ensembles.
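As a hedged illustration of the stacking idea summarised above (a sketch only, not the authors' exact pipeline), the snippet below trains an XGBoost meta-model on the individual process-model outputs plus meteorological covariates (the XGB+ variant) and compares it with the equal-weight multi-model median; the file name and column names (site_daily_fluxes.csv, model_*, tair, precip, gpp_obs) are hypothetical.

```python
# Illustrative stacking sketch with assumed column names (not the authors' exact pipeline).
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("site_daily_fluxes.csv", parse_dates=["date"])   # hypothetical daily site table

model_cols = [c for c in df.columns if c.startswith("model_")]     # process-model GPP outputs
met_cols = ["tair", "precip"]                                      # environmental covariates (XGB+)

X, y = df[model_cols + met_cols], df["gpp_obs"]
mmm = df[model_cols].median(axis=1)                                # equal-weight baseline (MMM)

# Random 70/30 split, as described in the manuscript (see the referee
# comments below for time-aware alternatives).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

meta = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
meta.fit(X_train, y_train)

print("MMM  RMSE:", np.sqrt(mean_squared_error(y_test, mmm.loc[X_test.index])))
print("XGB+ RMSE:", np.sqrt(mean_squared_error(y_test, meta.predict(X_test))))
```

The MMM baseline requires no training, so any gain of the fitted meta-model over it has to be demonstrated on held-out data.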
Status: open (until 05 Feb 2026)
CEC1: 'Comment on egusphere-2025-4920 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Dec 2025
AC1: 'Reply on CEC1', Nándor Fodor, 08 Dec 2025
Dear Juan A. Añel,
We apologise for this oversight with respect to the Code and Data Policy requirements. Following your request, the code and data related to the article have now been uploaded to the Zenodo repository: https://doi.org/10.5281/zenodo.17849931
Thank you for your patience. Sincerely yours,
Nándor FODOR
Citation: https://doi.org/10.5194/egusphere-2025-4920-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 10 Dec 2025
Dear authors,
Thanks for addressing this issue so quickly. I have checked the repository, and we can now consider the current version of your manuscript in compliance with the code policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4920-CEC2
RC1: 'Comment on egusphere-2025-4920', Anonymous Referee #1, 23 Dec 2025
This manuscript evaluates machine learning-based ensemble approaches (stacking meta-models) for improving carbon flux predictions in agricultural systems. The authors compare Multiple Linear Regression, Random Forest, XGBoost, and XGBoost with environmental covariates (XGB+) against traditional multi-model median (MMM) approaches across four sites (two croplands, two grasslands). The study reports substantial improvements and uses SHAP analysis to provide interpretability. While the topic is relevant and the interpretability focus is commendable, fundamental methodological flaws in the validation strategy undermine the reliability of the results. The inappropriate use of random train/test splits on temporally autocorrelated data, combined with limited site coverage and missing analysis of temporal structure, prevents acceptance in the current form.
Major comments
1. Validation strategy: The authors use random 70/30 train/test splits (lines 238-244) on daily time-series carbon flux data without any consideration of temporal autocorrelation structure. This approach raises potentially serious issues: daily carbon fluxes exhibit strong temporal autocorrelation due to weather persistence, phenological continuity, and soil-moisture memory, so randomly splitting such data risks placing closely correlated days in both the training and test sets and inflating apparent skill.
2. Inconsistent NEE handling: It is stated that NEE is modelled independently from GPP and RECO, but no justification is provided. More importantly, this can introduce inconsistencies among the variables (see the identity noted below).
3. Persistent bias at C1: At this site, all models produce persistent biases, yet no explanation is provided; it is difficult to accept this simply as an exception.
Overall, the methodological questions are too substantial to set aside; thus, only a few minor comments are given here.
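For reference regarding comment 2, the consistency relationship at stake is the standard flux convention (GPP taken as positive uptake; positive NEE denoting a net source to the atmosphere): NEE = RECO − GPP. When NEE is modelled independently, this identity is no longer guaranteed to hold across the meta-model outputs.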
Minor comments
1. Lines 190-192: Provide quantitative justification for excluding the Indian site ("relatively poor temporal coverage" is vague)
2. Line 244: Specify: "randomly" - with what seed? Same split across meta-models?
3. Line 245: Specify complete XGBoost hyperparameter tuning procedure
4. Line 275: "site-level observations" - clarify this means temporal, not spatial, validation
5. Lines 295-296: Provide specific methodological justification for independent NEE modeling or change approach
6. Tables 3-5: Add statistical significance indicators comparing meta-models to MMM
7. Figure 11: Add legend explaining color intensity mapping and improve interpretability
Citation: https://doi.org/10.5194/egusphere-2025-4920-RC1
RC2: 'Comment on egusphere-2025-4920', Anonymous Referee #2, 03 Jan 2026
The manuscript proposes a stacking meta-modelling framework that combines outputs from multiple process-based ecosystem models to predict GPP, RECO and NEE, using advanced machine-learning approaches in comparison with the classical multi-model median (MMM). The idea is interesting, because a stacking approach can improve predictive skill and can also provide diagnostic insight into model strengths and weaknesses.
In its current form, however, I think the manuscript requires major revisions. The most important issues are: a) a lack of conceptual clarity and consistent terminology (the narrative is currently difficult to follow for a broad audience); b) a weak validation strategy for time-series data (I am not completely sure that I have understood it correctly, but a random split on daily time series risks inflating the performance of the stacked meta-model because of the strong temporal autocorrelation); and c) I am not fully convinced that temperature and precipitation can be used effectively in the stacking process, because these variables are already used as (very important) driving variables in all the models that are part of the ensemble.
I think that with a clearer framing and a time-aware evaluation protocol (or a robustness analysis demonstrating that the conclusions hold), the paper could become a relevant contribution to the creation of high-quality simulations.
Recommendation: Major revisions.
My main concern is (or at least I have a strong a priori expectation) that the validation on daily time series must be time-aware, and that random splitting is not adequate on its own. If training/validation is done via random 70/30 splits on daily data (sorry if I am wrong; I am not entirely sure the validation is done this way), the validation set is not independent because of strong temporal autocorrelation and seasonality. This can inflate performance and may mislead readers about generalisation to new periods.
So, I invite the authors to replace (or complement) random splitting with a time-aware strategy: blocked cross-validation, contiguous hold-out blocks, or similar. Because hyperparameters are tuned, please ensure the protocol avoids optimistic bias.
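As an illustration of such a time-aware protocol, a minimal sketch using contiguous, chronologically ordered folds is given below (file and column names are hypothetical, mirroring the stacking sketch above):

```python
# Illustrative time-aware validation sketch: contiguous, chronologically ordered
# folds instead of a random 70/30 split, so test days are never interleaved with
# (and strongly autocorrelated to) training days.
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

df = pd.read_csv("site_daily_fluxes.csv", parse_dates=["date"]).sort_values("date")  # hypothetical table
model_cols = [c for c in df.columns if c.startswith("model_")]                       # ensemble member outputs
X, y = df[model_cols + ["tair", "precip"]], df["gpp_obs"]

rmses = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Each fold trains only on days that precede the contiguous held-out block.
    meta = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    meta.fit(X.iloc[train_idx], y.iloc[train_idx])
    rmses.append(np.sqrt(mean_squared_error(y.iloc[test_idx], meta.predict(X.iloc[test_idx]))))

print("blocked-CV RMSE:", np.mean(rmses))
# Any hyperparameter tuning should be nested inside each training block
# (e.g. an inner TimeSeriesSplit) to avoid optimistic bias.
```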
The interpretation of "Stacking+met", in my opinion, needs a stronger rationale. Adding meteorological covariates may be useful, but the process models already embed meteorological forcing internally. I would like to understand whether Stacking+met is still primarily an aggregation method or whether it becomes a broader statistical correction that partially bypasses process constraints. Please explain what "Stacking+met" represents conceptually: regime-dependent weighting? residual correction? hybrid modelling? Even if the question seems purely theoretical, I think it is very relevant for interpretation: is the method improving the combination of models, or learning a shortcut from meteorology that compensates for shared structural biases? Where possible, it would be very interesting to examine possible improvements in specific regimes (e.g., dry/wet, warm/cold, extremes).
In my opinion, modelling NEE independently requires justification. If NEE is modelled separately rather than derived as RECO − GPP, this breaks a key consistency relationship that many readers will expect. There is therefore a need to clearly justify the methodological reasons for independent NEE modelling, discussing the trade-off between improved NEE estimation and the loss of consistency among GPP, RECO and NEE.
There is also a need to clarify concepts and use consistent terminology (ensemble / stacking / meta-modelling). In my opinion, the Introduction and Materials and Methods mix related concepts in a way that makes it hard to understand what exactly is being proposed and what is new relative to existing practice. Early in the Introduction, please provide a single, clear definition of the baseline (MMM) and of what stacking means here (there is currently confusion between techniques that are themselves ensembles, e.g. RF, and the construction of a model stack built from such techniques). This would then allow a uniform and simple naming scheme throughout.
Please consider removing or shortening conceptual digressions that are not needed for the core message (e.g., broad "no free lunch" statements), unless directly tied to the study design and results, or move the detailed mathematical support to the supplementary material.
Please also consider these minor comments: 1) consider a clearer separation into process-model ensemble generation, meta-modelling approaches, validation protocol, and interpretability and diagnostics; 2) be consistent with acronyms and naming: define each acronym once and use it consistently throughout (MMM, stacking, XGB, etc.); 3) when using SHAP, specify exactly which dataset it refers to (site/flux/season, random selection) — see the illustrative sketch below; 4) ensure that the validation scheme is explicit in figure captions.
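Regarding minor comment 3, a minimal sketch of how the SHAP attribution could be tied to an explicitly stated evaluation subset (illustrative only; it assumes the fitted meta-model `meta`, the feature table `X` and the held-out indices `test_idx` from the sketches above):

```python
# Illustrative SHAP sketch: attributions are computed for the fitted XGBoost
# meta-model on a clearly identified subset (here, the last held-out block),
# so readers know exactly which data the values refer to.
import numpy as np
import shap

explainer = shap.TreeExplainer(meta)                    # 'meta' from the validation sketch above
shap_values = explainer.shap_values(X.iloc[test_idx])   # held-out block only

# Mean absolute SHAP value per feature: the contribution of each ensemble
# member and each meteorological covariate to the meta-model prediction.
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```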
Citation: https://doi.org/10.5194/egusphere-2025-4920-RC2
Data sets
Experimental and simulated data for crop and grassland production and carbon-nitrogen fluxes G. Bellochi et al. https://doi.org/10.7910/DVN/5TO4HE
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 265 | 102 | 40 | 407 | 20 | 18 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please include in the new repository the relevant primary input and output data used to perform your work.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor