the Creative Commons Attribution 4.0 License.
Beyond behavioural models: equifinality and overparameterisation undermine confidence in predictions by soil organic matter models
Abstract. The complexity of soil organic matter models is often not supported by sufficient data for parameter optimisation, resulting in the calibration of more parameters than can be reliably optimised with the available data. This results in equifinality, the phenomenon that multiple parameter sets generate behavioural models, i.e., similarly well-performing models that cannot be ruled out. As such trade-offs between model complexity and data availability are often overlooked for soil organic matter models, the aim of this study is to assess how equifinality affects the variability of predictions made by behavioural soil organic matter models. The results show that the number of identifiable parameters, those that do not compensate for one another, increases with the number of calibration constraints, but remains limited to five even under the most data-rich conditions. Furthermore, the sizes of the particulate organic matter (POM) and mineral-associated organic matter (MAOM) pools can only be accurately simulated when data on these pool sizes are used, while the turnover rate of MAOM is reliably simulated only when Δ14C data for MAOM are provided. Regardless of the type of mathematical equations used (e.g., absolute vs. relative Michaelis-Menten kinetics) or the number of optimised parameters, the tested models were able to correctly reproduce the measurements at steady state. However, different model structures led to divergent predictions upon a doubling of organic matter inputs, and the variation in the response of the behavioural models was up to eight times larger for overparameterised models than for models in which only identifiable parameters were optimised. Our results emphasise the necessity of optimising only identifiable model parameters to avoid hidden uncertainty in model predictions.
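The equifinality described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the two-pool structure and all parameter values below are hypothetical. Two parameter sets that yield identical steady-state SOC stocks, and so cannot be distinguished by steady-state calibration data, nevertheless diverge during the transient that follows a doubling of inputs.

```python
# Hypothetical two-pool linear SOM model (not the manuscript's model):
#   dC1/dt = I - k1*C1
#   dC2/dt = a*k1*C1 - k2*C2
# Steady state: C1* = I/k1, C2* = a*I/k2, so only the ratio a/k2 is
# constrained by steady-state SOC: the pair (a, k2) is unidentifiable.

def simulate(I, k1, a, k2, c1, c2, years, dt=0.01):
    """Forward-Euler integration of the two-pool model."""
    for _ in range(round(years / dt)):
        dc1 = I - k1 * c1
        dc2 = a * k1 * c1 - k2 * c2
        c1 += dc1 * dt
        c2 += dc2 * dt
    return c1, c2

I, k1 = 100.0, 0.5                           # illustrative input and fast-pool rate
sets = {"A": (0.3, 0.01), "B": (0.6, 0.02)}  # (transfer a, slow-pool rate k2); a/k2 equal

# Both parameter sets give exactly the same steady-state SOC ...
steady = {name: I / k1 + a * I / k2 for name, (a, k2) in sets.items()}

# ... but after a doubling of inputs their transient trajectories diverge.
transient = {}
for name, (a, k2) in sets.items():
    c1, c2 = I / k1, a * I / k2              # start from the shared steady state
    transient[name] = sum(simulate(2 * I, k1, a, k2, c1, c2, years=10))
```

After ten simulated years the two "behavioural" parameterisations differ by roughly 200 g C, despite being indistinguishable at steady state.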
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6297', Anonymous Referee #1, 13 Apr 2026
- RC2: 'Comment on egusphere-2025-6297', Anonymous Referee #2, 21 Apr 2026
This manuscript addresses a fundamental and timely issue in soil organic matter (SOM) modelling: the mismatch between model complexity and data availability, which leads to parameter non-identifiability, equifinality, and ultimately large uncertainty in model predictions under perturbations. The study is conceptually important and provides a clear demonstration that models calibrated to steady-state observations can produce substantially divergent predictions when subjected to changes in carbon inputs. This has important implications for the reliability of current ecosystem and Earth system models.
Below I provide specific comments.
Line 6-8: The statement that the number of identifiable parameters “remained limited to five even under the most data-rich conditions” may be somewhat overstated. The “most data-rich” scenario considered here still represents a relatively limited set of observations, and the result is likely dependent on the specific model structure and data types used. This limitation should be more clearly acknowledged.
Line 35: “Three concepts that are central to achieving this balance are identifiability, equifinality and overparameterisation.” The phrasing “achieving this balance” may be misleading, as identifiability, equifinality and overparameterisation are conceptual tools to describe the problem rather than mechanisms to achieve the balance. This could be clarified.
Line 135-141: If these “measurements” are artificially generated, the assumptions should be justified with references.
Line 136: “In this study is, artificial SOC data were …. ”. “is” should be removed?
Line 140-141: MAOC should be defined at first occurrence in the main text.
Figure 1: The differences between the rhizosphere and SOM models are unclear, and the abbreviations (MAOC, DOM, POM, MIC) are not fully defined.
Figure 2: The occurrence of infinite values (“inf”) is not explained. It would be helpful to clarify under which conditions these values arise and how they should be interpreted. In addition, the “inf” values are currently plotted at the boundary of the figure, which is not ideal for interpretation. A clearer representation would improve readability. The same comment applies to Figure 4.
Line 400: “This difference is due to the formulation of the rate modifiers for depolymerisation”. If I understand correctly, the different non-linear equations lead to differences in decomposition rates through the “rate modifier” part. However, these differences ultimately reflect variations in the effective decomposition rates or residence times of carbon in different pools under increased C inputs. While the current explanation in terms of rate modifiers is mathematically correct, it would be more intuitive to present or discuss the corresponding decomposition rates or residence times of the pools.
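The reviewer's point can be made concrete with a small sketch. The kinetic forms are generic forward ("absolute") vs. reverse ("relative") Michaelis-Menten; `vmax`, `km`, and the pool sizes are hypothetical, not taken from the manuscript. The effective per-unit-substrate decomposition rate (flux divided by C) falls under forward kinetics when C doubles, but is unchanged under reverse kinetics as long as microbial biomass M is held fixed.

```python
# Illustrative values only; not the manuscript's parameterisation.

def forward_mm(C, M, vmax=10.0, km=50.0):
    # Forward ("absolute") MM: flux saturates in substrate C.
    return vmax * M * C / (km + C)

def reverse_mm(C, M, vmax=10.0, km=5.0):
    # Reverse ("relative") MM: flux saturates in microbial biomass M.
    return vmax * C * M / (km + M)

C, M = 100.0, 2.0
for name, f in [("forward", forward_mm), ("reverse", reverse_mm)]:
    k_before = f(C, M) / C            # effective rate (yr-1) before the change
    k_after = f(2 * C, M) / (2 * C)   # effective rate after C doubles, M fixed
    print(name, k_before, k_after)
```

Presenting the results in these terms (effective rates, or their inverses, residence times) would make the divergence between model structures under increased inputs easier to interpret, as the reviewer suggests.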
Figure 3: the figure lacks sufficient clarity to distinguish between MIC and full parameter model results. In particular, the large differences in prediction range (42× for RM1 and 7× for RM2) are not readily apparent.
Section 3.2.1: It is unclear which figures support the results described in this section, as no figure is explicitly cited. Given that this is part of the Results section, appropriate figure references should be provided to support the statements. In addition, Figure 4 is not clearly introduced or discussed in the Results section before the manuscript proceeds to Figure 5. This disrupts the logical flow of the results. Please ensure that all figures are properly introduced and described in sequence.
Figure 5: The color scheme is difficult to interpret, particularly because MAOC and MIC appear very similar, and the meaning of some colors (e.g., dark orange) is unclear. This also makes it difficult to support the statement in Lines 427–429 that SOC can be dominated by either POC or MAOC (Fig. 5b), as the contribution of POC is not clearly visible. Improving the color scheme and legend would enhance clarity and consistency between text and figure.
Citation: https://doi.org/10.5194/egusphere-2025-6297-RC2
Model code and software
Van de Broek and Six, R codes with data Marijn Van de Broek https://doi.org/10.5281/zenodo.17974745
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 430 | 228 | 39 | 697 | 110 | 49 | 68 |
This manuscript discusses the issues of parameter identifiability and equifinality in the context of soil organic matter (SOM) decomposition models, and uses multiple model formulations and parameter optimizations to estimate variability across model structures and parameterizations and demonstrate how much predictive uncertainty may remain after models have been optimized using steady-state values.
I thought this was a very informative paper and a very useful study for highlighting challenges with modeling soil carbon cycle processes. The introduction explains the complex issues of parameter identifiability and related uncertainties clearly and makes a good case for paying more attention to these issues in soil models. The model simulations and approaches are well described and provide a clear demonstration of the key concepts of the paper, including how multiple model structures and parameter values can provide similar accuracy with respect to steady-state values while diverging when the steady state changes, and demonstrating how unidentifiable parameter pairs manifest in these types of models. Overall, the paper does a great job making an important argument and backs it up with strong modeling results.
I have a few specific comments:
Line 84-86: This is true for non-microbial models as well
Line 106: Is there a missing word after "decades"? Maybe "unless Δ14 data..."
Line 122: Typo in "assess"
Line 129: Provide a reference here for the DEzs algorithm
Line 166: The SOM model actually does keep track of the DOM pool size
Line 182: The calibration procedure for the OC inputs was not explained clearly. If they were calibrated, does this make them essentially another parameter of the model? I think having a precise value for carbon inputs is actually quite optimistic for comparison with "real" field data, since estimating litter inputs (especially for root and root exudation fluxes) is very difficult
Line 296: Following on my previous comment, I think the normal situation with real measurements is actually worse than this, because you typically don't have accurate knowledge of total litter inputs. In a steady state system, knowing the inputs is equivalent to having soil heterotrophic respiration measurements, assuming there is no net leaching of DOM. And heterotrophic respiration is difficult to measure accurately if there are living roots. So, if anything this setup might be optimistic compared to a real study where carbon pool data is available.
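The mass-balance argument in this comment can be verified with a short sketch, using a hypothetical two-pool model (dC1/dt = I − k1·C1; dC2/dt = a·k1·C1 − k2·C2; all values illustrative): at steady state, with no leaching term, heterotrophic respiration equals total C inputs exactly, so knowing the inputs carries the same information as knowing Rh.

```python
# Hypothetical two-pool model; parameter values are illustrative only.

def respiration(I, k1, a, k2):
    # Steady-state pools: C1* = I/k1, C2* = a*I/k2.
    c1, c2 = I / k1, a * I / k2
    # Heterotrophic respiration = C lost at each decomposition step:
    # (1 - a) of the fast-pool flux, plus the entire slow-pool flux.
    return (1 - a) * k1 * c1 + k2 * c2

# With no DOM leaching, respiration balances inputs exactly at steady state.
print(respiration(100.0, 0.5, 0.3, 0.01))
```

The printed value equals the input I, independent of the rate constants, which is the equivalence the reviewer describes.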
Line 361: Could say explicitly here that the identifiable parameters were picked based on the identifiability analysis for each model structure, which is implied but wasn't clear until I looked at the supplementary tables
I had trouble figuring out how the values for the non-optimized parameters were picked, since they had to be fixed but the premise of the study is that the values are not well constrained. I guess the values come from the deterministic parameter calibration? I found that part a bit confusing
Line 455: Are the predictions more reliable? Or is the uncertainty underestimated? If the values of the non-optimized parameters are unknown but need to be fixed, this is adding a hidden uncertainty to the model (as mentioned in the Discussion), so I'm not sure it is actually more reliable. In Figure 5a and 5b, there is certainly a narrower distribution of predictions in the IPM approach, but this results from fixing the value of some unknown parameters. Couldn't this approach just as easily lead to a narrow but wrong result? So, perhaps 5b is a more accurate depiction of the actual predictive uncertainty in this situation where data is very limited, unless there are other constraints on the parameter values.
Line 461: This alone is INsufficient
Line 463: Did it reduce the uncertainty, or underestimate the uncertainty?
Section 4.2: I think this paragraph makes an important point very clearly
Line 521-522: I really like the "hidden uncertainty" phrasing here, which makes an important point about how to interpret these results
Line 533: Again, is it correct to say that uncertainty is reduced? I'm not sure this section is even making a point about reducing uncertainty. I think it mostly makes a strong argument that identifiability is an important analysis to do. I think part of the challenge here for the community is that identifiability analysis identifies a problem but does not really provide a solution for reducing uncertainty
Line 567: This is the big challenge, right? Because if the unidentifiable parameter values could be well constrained with observations or experiments they would not need to be optimized in the first place. In that sense, the identifiability issue is the beginning of the conversation, not the end of it