the Creative Commons Attribution 4.0 License.
Meta-modelling of carbon fluxes from crop and grassland multi-model outputs
Abstract. We evaluated four stacking-based meta-models – Multiple Linear Regression, Random Forest, XGBoost, and XGBoost with environmental covariates (XGB+) – against the multi-model median (MMM) and best individual process-based models for gross primary production (GPP), ecosystem respiration (RECO) and net ecosystem exchange (NEE) at two cropland and two grassland sites. All meta-models were associated with improved RMSE, bias and correlation, with explained variance gains of ~10–38.5 % over MMM, largest for RECO in croplands and smallest for NEE in grasslands. Bias was nearly eliminated except at one cropland site. SHAP analysis showed that diverse individual models, not always the top performers, contributed most, and that temperature – especially for RECO in croplands and NEE in grasslands – was the dominant environmental driver, while precipitation had minor effects. These findings highlight the predictive and diagnostic advantages of stacking-based approaches over equal-weight MMM, with potential applications across agroecosystem, Earth system and environmental model ensembles.
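The contrast between an equal-weight multi-model median and a stacked meta-model can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the three "process models" and their biases are invented for illustration, and only the simplest of the four meta-models (a multiple linear regression fitted by ordinary least squares) is shown. All function names are hypothetical.

```python
import random
import statistics

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear_stack(preds, obs):
    """Least-squares weights (intercept first) mapping model outputs to observations."""
    rows = [[1.0] + list(p) for p in preds]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict_stack(weights, preds):
    """Apply the fitted intercept and per-model weights."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], p)) for p in preds]

def rmse(pred, obs):
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

# Synthetic "observed" flux and three hypothetical models with different
# systematic errors (assumed values, chosen only to make the point visible).
rng = random.Random(0)
truth = [5.0 + 3.0 * rng.random() for _ in range(200)]
preds = [
    (1.3 * t + rng.gauss(0, 0.3),        # overestimates
     t - 1.0 + rng.gauss(0, 0.3),        # constant negative bias
     0.8 * t + 0.5 + rng.gauss(0, 0.3))  # damped response
    for t in truth
]

# Fit the meta-model on the first 140 samples, evaluate both on the last 60.
train, test = slice(0, 140), slice(140, 200)
weights = fit_linear_stack(preds[train], truth[train])

mmm = [statistics.median(p) for p in preds[test]]
stacked = predict_stack(weights, preds[test])

rmse_mmm = rmse(mmm, truth[test])
rmse_stack = rmse(stacked, truth[test])
print(f"RMSE multi-model median: {rmse_mmm:.3f}")
print(f"RMSE linear stack:       {rmse_stack:.3f}")
```

Because the median cannot correct errors shared by most ensemble members, a fitted combination can reduce both bias and RMSE in a setting like this one. The sample order here carries no temporal structure, so the simple index split is adequate for the sketch; real daily flux series would need more care.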
Status: open (until 05 Feb 2026)
-
CEC1: 'Comment on egusphere-2025-4920 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Dec 2025
-
AC1: 'Reply on CEC1', Nándor Fodor, 08 Dec 2025
Dear Juan A. Añel,
We apologise for overlooking this requirement of the Code and Data Policy. Following your request, the code and data related to the article have now been uploaded to the Zenodo repository: https://doi.org/10.5281/zenodo.17849931
Thank you for your patience. Sincerely yours,
Nándor FODOR
Citation: https://doi.org/10.5194/egusphere-2025-4920-AC1
-
CEC2: 'Reply on AC1', Juan Antonio Añel, 10 Dec 2025
Dear authors,
Thanks for addressing this issue so quickly. I have checked the repository, and we can now consider the current version of your manuscript to be in compliance with the code policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4920-CEC2
-
RC1: 'Comment on egusphere-2025-4920', Anonymous Referee #1, 23 Dec 2025
This manuscript evaluates machine learning-based ensemble approaches (stacking meta-models) for improving carbon flux predictions in agricultural systems. The authors compare Multiple Linear Regression, Random Forest, XGBoost, and XGBoost with environmental covariates (XGB+) against traditional multi-model median (MMM) approaches across four sites (two croplands, two grasslands). The study reports substantial improvements and uses SHAP analysis to provide interpretability. While the topic is relevant and the interpretability focus is commendable, fundamental methodological flaws in the validation strategy undermine the reliability of the results. The inappropriate use of random train/test splits on temporally autocorrelated data, combined with limited site coverage and missing analysis of temporal structure, prevent acceptance in the current form.
Major comments
1. Validation strategy: The authors use random 70/30 train/test splits (lines 238-244) on daily time-series carbon flux data without considering the temporal autocorrelation structure. This is potentially a serious problem. Daily carbon fluxes exhibit strong temporal autocorrelation due to weather persistence, phenological continuity, and soil moisture memory. Splitting such data at random risks leaking information between the training and test sets.
2. Inconsistent NEE handling: The manuscript states that NEE is modeled independently from GPP and RECO, but no justification is provided. More importantly, this could introduce inconsistencies among the variables.
3. Persistent bias at C1: At this site, all models produce persistent biases, yet no reason is given. It is difficult to accept this as an exception.
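The concern in comment 1 can be illustrated with a small synthetic experiment. The sketch below is not based on the manuscript's data or code: it generates an AR(1) series as a stand-in for an autocorrelated daily flux, and it uses a deliberately naive "predictor" that copies the value of the nearest training day. The AR coefficient, the seeds, and all function names are assumptions made for illustration.

```python
import random

def make_ar1(n, phi=0.95, seed=1):
    """Synthetic autocorrelated daily series: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def random_split(n, test_frac=0.3, seed=42):
    """Random 70/30 split of day indices with a fixed, reportable seed."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def blocked_split(n, test_frac=0.3):
    """Temporal split: the last 30 % of days form one contiguous test block."""
    cut = n - int(n * test_frac)
    return list(range(cut)), list(range(cut, n))

def nearest_in_time_rmse(series, train_idx, test_idx):
    """RMSE when each test day is 'predicted' by the closest training day.

    The predictor has no skill beyond temporal memory, so any low error
    it achieves reflects leakage from neighbouring training days.
    """
    sq_err = 0.0
    for t in test_idx:
        nearest = min(train_idx, key=lambda i: abs(i - t))
        sq_err += (series[t] - series[nearest]) ** 2
    return (sq_err / len(test_idx)) ** 0.5

n_days = 1000
series = make_ar1(n_days)

train_r, test_r = random_split(n_days)
train_b, test_b = blocked_split(n_days)

rmse_random = nearest_in_time_rmse(series, train_r, test_r)
rmse_blocked = nearest_in_time_rmse(series, train_b, test_b)
print(f"random split  RMSE: {rmse_random:.2f}")
print(f"blocked split RMSE: {rmse_blocked:.2f}")
```

Under the random split almost every test day has a training day one or two steps away, so even this memory-only predictor appears accurate. Under the blocked split that shortcut disappears, which is why blocked or year-wise hold-outs are commonly preferred for autocorrelated series.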
Overall, the methodological questions seem too important. Thus, only a few minor comments are given here.
Minor comments
1. Line 190-192: Provide quantitative justification for excluding Indian site ("relatively poor temporal coverage" is vague)
2. Line 244: Specify: "randomly" - with what seed? Same split across meta-models?
3. Line 245: Specify complete XGBoost hyperparameter tuning procedure
4. Line 275: "site-level observations" - clarify this means temporal, not spatial, validation
5. Lines 295-296: Provide specific methodological justification for independent NEE modeling or change approach
6. Tables 3-5: Add statistical significance indicators comparing meta-models to MMM
7. Figure 11: Add legend explaining color intensity mapping and improve interpretability
Citation: https://doi.org/10.5194/egusphere-2025-4920-RC1
Data sets
Experimental and simulated data for crop and grassland production and carbon-nitrogen fluxes G. Bellochi et al. https://doi.org/10.7910/DVN/5TO4HE
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 227 | 68 | 33 | 328 | 20 | 18 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please include in the new repository the relevant primary input and output data used to perform your work.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor