the Creative Commons Attribution 4.0 License.
Towards resolving poor performance of mechanistic soil organic carbon models
Abstract. The accuracy of soil organic carbon (SOC) models and their ability to capture the relationship between SOC and environmental variables are critical for reducing uncertainties in future projection of soil carbon balance. In this study, we evaluate the performance of two state-of-the-art mechanistic SOC models, the vertically resolved MIcrobial-MIneral Carbon Stabilisation (MIMICS) and Microbial Explicit Soil Carbon (MES-C) model, against a machine learning (ML) approach. By applying multiple interpretable ML methods, we find that the poorer performance of the two mechanistic models is associated both with the missing of key variables, and the underrepresentation of the role of existing variables. Soil cation exchange capacity (CEC) is identified as an important predictor missing from mechanistic models, and soil texture is given more importance in models compared to observations. Although the overall relationships between SOC and individual predictors are reasonably captured, the varying sensitivity across entire predictor range is not replicated by mechanistic models, most notably for net primary production (NPP). Observations exhibit a nonlinear relationship between NPP and SOC while models show a simplistic positive trend. Additionally, MES-C largely diminishes interacting effects of variable pairs, whereas MIMICS produces mismatches relating to the interactions between NPP and both soil temperature and moisture. Mechanistic models also fail to reproduce the interactions among soil moisture, soil texture, and soil pH, hindering our understanding on SOC stabilisation and destabilisation processes. Our study highlights the importance in improving the representation of environmental variables in mechanistic models to achieve a more accurate projection of SOC under future climate conditions.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Biogeosciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 14 Aug 2025)
CC1: 'Comment on egusphere-2025-2545: Machine learning versus “mechanistic” modelling of soil carbon dynamics: Are current comparison attempts meaningful?', Philippe C. Baveye, 13 Jul 2025
Machine learning versus “mechanistic” modelling of soil carbon dynamics: Are current comparison attempts meaningful?
Philippe C. Baveye1*
1Saint Loup Research Institute, 7 rue des chênes, La Grande Romelière, 79600 Saint Loup Lamairé, France.
The manuscript that Wang et al. have submitted for possible publication in Biogeosciences raises a number of questions about the soundness of the comparison that these authors carry out between machine learning and the more traditional, “mechanistic” modelling of soil carbon dynamics.
First, a word of caution is in order about the authors’ use of the qualifier “mechanistic”. This term conveys the notion that the models are based on detailed descriptions of actual mechanisms of processes taking place in soils. Formally, that may seem to be the case, but one could argue that for macroscopic or ecosystem-scale models like Century, RothC, and the like, it hides assumptions that may or may not correspond to reality. Take the decomposition of litter pools and SOC pools in both MIMICS and MES-C as an example. The authors consider that this decomposition follows a Michaelis-Menten kinetic equation. Heße et al. (2009), among others, showed some time ago that if one assumes that a Monod-type equation (mathematically identical to Michaelis-Menten) describes microbial growth at the microscale in a natural porous medium, the corresponding equation at the macroscopic scale, over soil volumes that encompass many microorganisms and pores, does not in general follow Monod-type kinetics. Only in special cases can one write a Monod-type equation at the macroscopic scale as an approximation. In soils, computer simulations using microscale models suggest that there is no evidence that a Monod formalism holds in general at the macroscale, even as an approximation (e.g., Falconer et al., 2015; Mbé et al., 2022). The same goes for most “mechanistic” descriptions included in MIMICS and MES-C. Unless there is clear experimental evidence that the “mechanistic” descriptions of processes at the macroscopic scale are reliable, or there is microscopic evidence and a robust upscaling approach that produces macroscopic descriptions of processes, macroscopic models based on so-called “mechanistic” descriptions are in fact entirely empirical and should be treated as such.
From that standpoint, following Baveye’s (2023) discussion of Schimel’s (2023) analysis, the label of “mechanistic” should be used with significant caution when referring to ecosystem-scale models of soil organic matter dynamics at this stage, since experimental data at the landscape- or ecosystem scales in support of some of the mechanisms included in the models are lacking and since the upscaling hurdle still needs to be addressed satisfactorily.
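The upscaling point made above can be illustrated with a minimal numerical sketch. It is not the derivation of Heße et al. (2009), and it ignores transport entirely: it only assumes that a Monod rate holds locally and that local substrate concentrations vary within the soil volume. The parameter values (`vmax`, `k`) and the log-normal spread of concentrations are hypothetical choices made for illustration.

```python
import random

def monod(s, vmax=1.0, k=1.0):
    """Monod / Michaelis-Menten rate for a local substrate concentration s."""
    return vmax * s / (k + s)

def upscaled_vs_naive(concentrations):
    """Return (mean of the local rates, Monod rate at the mean concentration)."""
    n = len(concentrations)
    mean_rate = sum(monod(s) for s in concentrations) / n
    mean_conc = sum(concentrations) / n
    return mean_rate, monod(mean_conc)

rng = random.Random(0)
# Hypothetical heterogeneous microscale concentrations (log-normal spread).
patchy = [rng.lognormvariate(0.0, 1.5) for _ in range(10_000)]
# Perfectly mixed reference case: every location has the same concentration.
uniform = [1.0] * 10_000

print("patchy :", upscaled_vs_naive(patchy))
print("uniform:", upscaled_vs_naive(uniform))
```

Because the Monod rate is a concave function of concentration, the volume-averaged rate in the patchy case falls below the rate obtained by inserting the mean concentration into the same equation. The two agree only in the well-mixed case, which is one way to see why a microscale Monod law need not carry over unchanged to the macroscale.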
Beyond this point of terminology, a first question that the manuscript by Wang et al. (2025) raises concerns the specific choice of MIMICS and MES-C. Neither in this manuscript, nor in the previous one by Wang et al. (2024), is this choice justified. One would have expected the authors to first review all available “mechanistic” models, determine their respective pros and cons, and on the basis of this state-of-the-art review, choose one or two of the models for comparison with machine learning. The fact that this eminently reasonable approach was apparently not adopted raises the question of the reasons why MIMICS was selected and subsequently modified into MES-C. In this respect, it is hard to avoid thinking that MIMICS was chosen because it requires very few input data. In Table 1 of the manuscript by Wang et al. (2025), it appears that it requires knowledge only of the soil temperature, soil moisture content, soil clay content, and net primary production. That is a puzzlingly limited set of data, which one could easily argue would be insufficient to account for the complexity of soil organic carbon dynamics. Nevertheless, if the objective that is pursued is a comparison with an approach based on machine learning, one perhaps does not have the luxury of using a more sophisticated process-based model, requiring a larger input data set. Indeed, machine learning demands very large amounts of data, as is evinced by the 37,691 SOC profiles considered by Wang et al. (2025). Any comparison attempt would rapidly become unmanageable if for each one of these profiles, in order to run process-based models, one needed more soil data than what can be readily retrieved from available soil databases.
The downside of this is that, by choosing a simple, let alone simplistic, “mechanistic” model in what it is difficult not to view as a “strawman” strategy, one predetermines the ultimate outcome of the comparison, since the odds are good that this model will exhibit “poor performance”, for example, as noted by Wang et al. (2025), by showing “a simplistic positive trend” between the net primary production (NPP) and SOC, whereas in reality the observed relationship is nonlinear.
The next question concerns the soundness of comparing models that have different numbers of parameters, or involve different numbers of variables. Many years ago, modellers used to say that with 4 parameters, one could draw an elephant. With 5 parameters, one could make it wag its tail. The message was that a model with more degrees of freedom should always be expected to fit experimental data better than one with fewer degrees of freedom. From this perspective, it is not surprising that in Table 2 of Wang et al. (2025), RFenv, which involves 13 variables, leads to significantly better performance metrics than RFimp and MES-C, which involve only 6 variables (in this respect, the fact that MIMICs, involving only 4 variables, has poorer performance metrics than MES-C is perplexing). That does not mean that the additional 7 variables considered by RFenv relative to RFimp and MES-C are necessarily meaningful. Several authors have indeed demonstrated that statistically stronger patterns can emerge from spatial databases to which one has deliberately added utterly irrelevant information, for example, a painting or the photograph of a colleague (Fourcade et al., 2018; Behrens and Viscarra Rossel, 2020). In this context, it is not at all clear that machine learning, applied to large data sets, can always be useful in “guiding future model development”, as Wang et al. (2025) suggest. Thus when these authors consider that the CEC of soils, which is taken into account by the machine learning model RFenv and has a high PVI (Permutation Variable Importance), should be taken into account in future work on “mechanistic” models, it is unclear at this point how reliable this conclusion is, based as it is only on statistical considerations.
One might argue that the careful analysis by Julien and Tessier (2021) of the molecular and microscale processes involved in the fate of SOC in soils, which leads to the conclusion that CEC is an important factor controlling it, is a far more convincing indication that there may be something there worth investigating further.
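The degrees-of-freedom argument above can be sketched numerically. The example below is not the random forest setup of Wang et al. (2025): it uses ordinary least squares on synthetic data, reports in-sample fit only, and all names and sample sizes are hypothetical. It simply shows that appending predictors made of pure noise still raises the in-sample R².

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """In-sample R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(1)
n = 60
signal = [rng.gauss(0, 1) for _ in range(n)]                      # one relevant predictor
y = [2.0 * s + rng.gauss(0, 1) for s in signal]                   # synthetic response
noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(12)]  # irrelevant predictors

# Fit with 0, 6 and 12 irrelevant predictors added to the relevant one.
r2_by_size = [r_squared([signal] + noise[:k], y) for k in (0, 6, 12)]
print(r2_by_size)
```

The fit statistic improves with each batch of noise columns even though none of them carries information about the response, which is why a better score for the model with more variables does not by itself show that the extra variables are meaningful.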
An additional, even more basic question that the research of Wang et al. (2025) raises has to do with whether or not it is in fact meaningful to try to compare the so-called “mechanistic” modelling approach with one based on machine learning, insofar as the dynamics of SOC is concerned. The very large dataset assembled by Wang et al. (2025) evinces one of the key characteristics of machine learning algorithms, and more generally of data-driven and artificial intelligence approaches, namely that they require massive amounts of data (e.g., Baveye, 2022; Wadoux, 2025; Minasny and McBratney, 2025). When dealing with field soil properties, like the fate of their SOC, it is customary, in order to obtain the requisite amount of data to apply machine learning algorithms, to work with extended geographical areas. In such cases, the outcomes of the application of machine learning may be useful to address issues relevant at these large scales, but they may be inadequate to address issues at smaller spatial scales, like in a watershed, ecoregion, ecosystem, agricultural field, or even at specific locations in a field where an experiment is taking place. At these much smaller spatial scales, models attempting to describe soil processes explicitly and in detail appear far more appropriate.
In this context, those who still read the literature may recall a piece of research that had some notoriety 40 years ago. Using a large climatic database, Folkoff et al. (1981) produced a map of the pH of the surface horizon of soils within the conterminous United States that was based solely on climatic conditions. Although the outcome of their research had undeniable intellectual interest, it proved to be of little practical use, and as a result their article has been seldom cited since 1981. Knowing that the climate correlates well with soil pH does not help farmers manage the acidity in their field or governmental agencies assess, e.g., the local impact of acid rain on agricultural yields or environmental degradation. For those practical situations, a different kind of model of acidity-related soil processes is needed, taking into account the spatial heterogeneity of soils. Occasionally, the larger-scale, statistical observations may be useful. Worrall (2001), using observations of pesticide occurrence in 303 boreholes across 12 states in the midwest of the U.S., showed that the molecular topology of the pesticide molecules themselves is, in and of itself, a sufficient basis to discriminate globally between polluting and non-polluting pesticide compounds, regardless of the spatial variation of the soils through which the pesticides transit. On that basis, one might come up with nation-wide or regional regulations allowing the use of some pesticides and banning others (Baveye and Laba, 2015), but it would not be possible to predict whether and how fast a given pesticide compound applied to a given agricultural field will reach the underlying groundwater.
At the moment, a tremendous amount of hype is associated with machine learning, data-driven research, and artificial intelligence in the soil science and biogeosciences communities. There is therefore a good chance that the manuscript by Wang et al. (2025), whose title clearly advertises a bias in favour of these approaches, will be well received by at least a portion of researchers in these disciplines. The risk in this respect is that ill-inspired comparisons would further fuel the drive to accumulate massive amounts of potentially meaningless data to feed data-hungry algorithms. There may be limited uses for the latter, but it would be unwise, I think, to put too much emphasis on them, or to rely on them too closely to improve the process-based models that could be useful to us in practice. For the development of these process-based models, what we need is not massive amounts of data that are haphazardly assembled, but data that actually make sense.
References
Baveye, P. C. (2022). “Data‐driven” versus “question‐driven” soil research. European Journal of Soil Science, 73(1), e13159.
Baveye, P. C. (2023). Ecosystem-scale modelling of soil carbon dynamics: time for a radical shift of perspective?. Soil Biology and Biochemistry, 184, 109112.
Baveye, P. C., & Laba, M. (2015). Moving away from the geostatistical lamppost: Why, where, and how does the spatial heterogeneity of soils matter?. Ecological Modelling, 298, 24-38.
Behrens, T., & Viscarra Rossel, R. A. (2020). On the interpretability of predictors in spatial data science: The information horizon. Scientific Reports, 10(1), 16737.
Falconer, R. E., Battaia, G., Schmidt, S., Baveye, P., Chenu, C., & Otten, W. (2015). Microscale heterogeneity explains experimental variability and non-linearity in soil organic matter mineralisation. PloS one, 10(5), e0123774.
Folkoff, M. E., Meentemeyer, V., & Box, E. O. (1981). Climatic control of soil acidity. Physical Geography, 2(2), 116-124.
Fourcade, Y., Besnard, A. G., & Secondi, J. (2018). Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics. Global Ecology and Biogeography, 27(2), 245-256.
Heße, F., Radu, F. A., Thullner, M., & Attinger, S. (2009). Upscaling of the advection–diffusion–reaction equation with Monod reaction. Advances in water resources, 32(8), 1336-1351.
Julien, J. L., & Tessier, D. (2021). Rôles du pH, de la CEC effective et des cations échangeables sur la stabilité structurale et l’affinité pour l’eau du sol. Etude et Gestion des sols, 28, 159-179.
Mbé, B., Monga, O., Pot, V., Otten, W., Hecht, F., Raynaud, X., ... & Garnier, P. (2022). Scenario modelling of carbon mineralization in 3D soil architecture at the microscale: Toward an accessibility coefficient of organic matter for bacteria. European Journal of Soil Science, 73(1), e13144.
Minasny, B., & McBratney, A. B. (2025). Machine Learning and Artificial Intelligence Applications in Soil Science. European Journal of Soil Science, 76(2), e70093.
Schimel, J. (2023). Modeling ecosystem-scale carbon dynamics in soil: the microbial dimension. Soil Biology and Biochemistry, 178, 108948.
Shi, Z., Crowell, S., Luo, Y., & Moore, B. (2018). Model structures amplify uncertainty in predicted soil carbon responses to climate change. Nature Communications, 9(1), 1–11. https://doi.org/10.1038/s41467-018-04526-9
Wadoux, A. M. C. (2025). Artificial intelligence in soil science. European Journal of Soil Science, 76(2), e70080.
Wang, L., Abramowitz, G., Wang, Y. P., Pitman, A., & Viscarra Rossel, R. A. (2024). An ensemble estimate of Australian soil organic carbon using machine learning and process-based modelling. Soil, 10(2), 619-636.
Wang, L., Abramowitz, G., Wang, Y. P., Pitman, A., Ciais, P., & Goll, D. S. (2025). Towards resolving poor performance of mechanistic soil organic carbon models. EGUsphere, 2025, 1-32.
Citation: https://doi.org/10.5194/egusphere-2025-2545-CC1
CC2: 'Minor erratum on CC1', Philippe C. Baveye, 13 Jul 2025
In the 4th paragraph, where the text reads "In this respect, the fact that MIMICs, involving only 4 variables, has poorer performance metrics than MES-C is perplexing", it should read instead "In this respect, the fact that MIMICs, involving only 4 variables, has performance metrics that are comparable to those of MES-C is perplexing"
Citation: https://doi.org/10.5194/egusphere-2025-2545-CC2