the Creative Commons Attribution 4.0 License.
A hybrid framework for the spin-up and initialization of distributed coupled ecohydrological-biogeochemical models
Abstract. Accurate initialization is a critical step in fully distributed ecohydrological and soil biogeochemical modeling applications, yet it is often hindered by the computational cost of achieving steady-state conditions across large spatial domains. This study presents a novel initialization framework that combines a flux-tracking 1D spin-up with a random forest (RF) algorithm to efficiently generate spatially heterogeneous and topography-informed initial conditions accounting for lateral fluxes of water, carbon, and nutrients. The framework first performs a limited number of 1D simulations to obtain steady-state conditions in a subset of representative cells, then uses RF to extrapolate these results across the catchment. Applied to T&C-BG-2D, a fully coupled distributed ecohydrological-soil biogeochemical model, the scheme reconstructs spatial variability of soil carbon and nutrient patterns while reducing computational demands by up to 90 % compared to a fully distributed spin-up procedure. A sensitivity analysis across multiple simulation scenarios reveals that the number of tracked cells required, varying from 20 % to 40 % of total domain grid cells, depends on the catchment’s spatial complexity and the environmental covariates embedded in the RF predictors. The framework developed here can be easily applied to other spatially distributed models and across diverse catchments, enabling large-scale distributed ecohydrological-biogeochemical model initializations under constrained computational budgets.
Status: open (until 04 Mar 2026)
-
CEC1: 'Comment on egusphere-2025-4796 - No compliance with the policy of the journal', Juan Antonio Añel, 24 Dec 2025
-
AC1: 'Reply on CEC1', Sara Bonetti, 30 Dec 2025
Dear Prof. Dr. Juan A. Añel,
Thank you for your feedback and the detailed instructions on the GMD Code and Data Policy. We have now carefully revised the Code and Data Availability section of the manuscript to comply with these requirements. The main revisions are summarized below:
- For the T&C-BG-2D model, we have archived the exact model code used in this study in a persistent Zenodo repository (https://doi.org/10.5281/zenodo.18084473), thereby ensuring long-term availability, proper version control, and reproducibility.
- The existing Zenodo repository (https://doi.org/10.5281/zenodo.17213868) already contained all configuration files and spin-up codes for the T&C-BG-2D simulations. In the revised version, we have extended this repository to additionally include the exact code and configuration files used for the plot-scale simulations (T&C-BG) presented in this manuscript. A new version of the Zenodo repository has been created, and the README file has been updated to clearly document the contents and usage of these files.
- We used publicly available meteorological datasets provided by WSL as the original data source. We understand your concern regarding long-term availability of this data and the reproducibility of our results. Although according to the Envidat policy (https://www.envidat.ch/#/about/policies), deposited data are preserved as long as Envidat exists and arbitrary removal of material is prohibited, we further ensured reproducibility by archiving the actual forcing data used in the model simulations. Specifically, the meteorological data were pre-processed to generate the input file ‘Data_Erlenbach_run.mat’, which is the actual forcing used by the model. This file is also archived in the Zenodo repository (https://doi.org/10.5281/zenodo.17213868). As a result, any potential future changes in the original data source will not affect the reproducibility of the simulations presented in this manuscript. We have clarified this in the revised Code and Data availability section.
The revised Code and Data Availability section now reads as follows: The T&C-BG-2D model code is available in Zenodo (Lian et al., 2025a). The accompanying initialization procedures and setup files for both the two-dimensional and plot-scale simulations, as well as processed meteorological forcing used by the model and derived from publicly available datasets (Stähli, 2018), are also archived in Zenodo (Lian et al., 2025c).
We thank you for your help, and remain at your disposal for any further clarifications and questions.
Kind regards,
Taiqi Lian and Sara Bonetti (on behalf of all co-authors)
References
Lian, T., Fatichi, S., and Bonetti, S.: TeC_BG_2D Ecohydrological Model: V1.0.0, https://doi.org/10.5281/ZENODO.18084473, 2025a.
Lian, T., Zhang, Z., Paschalis, A., and Bonetti, S.: TeC_BG_2D_Spin_up: V2.0.0, https://doi.org/10.5281/ZENODO.17213868, 2025c.
Stähli, M.: Longterm hydrological observatory Alptal (central Switzerland), https://doi.org/10.16904/envidat.380, 2018.
Citation: https://doi.org/10.5194/egusphere-2025-4796-AC1 -
CEC2: 'Reply on AC1', Juan Antonio Añel, 30 Dec 2025
Dear authors,
Many thanks for your reply. The code that you have provided is in the M language; as this language usually needs an interpreter, it would be good if you clarified in the Code and Data Availability section which interpreter you used (e.g. GNU Octave) and its version, to ensure the replicability of the work. It could be that you instead used a proprietary interpreter (e.g. MATLAB), which does not ensure compatibility of code between its versions and has numerous documented bugs that could affect the computations performed. Knowing the exact version can therefore be useful in the future.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4796-CEC2 -
AC2: 'Reply on CEC2', Sara Bonetti, 31 Dec 2025
Dear Editor,
Thank you for pointing this out. All simulations are based on MATLAB R2024b - we will clarify this in the Code and Data Availability Section.
Best,
Sara Bonetti
Citation: https://doi.org/10.5194/egusphere-2025-4796-AC2
RC1: 'Comment on egusphere-2025-4796', Anonymous Referee #1, 30 Jan 2026
Review
A hybrid framework for the spin-up and initialization of distributed coupled ecohydrological-biogeochemical models
By Lian et al.
The study by Taiqi Lian et al. proposes a novel framework to reduce the computational requirements of initialising a gridded ecohydrological model with lateral transport, using a mix of coupled and uncoupled simulations and leveraging machine learning. The use of machine learning in process-model spin-up is a timely and important objective. To my knowledge, comparable approaches have not yet been applied to process models with interdependent pixels.
However, critical gaps in the methodology, the lack of a demonstration of the robustness of the RF predictions, and a study design which fails to isolate the impact of the respective spin-up components and assumptions on the results (and thus fails to address the hypotheses) are major shortcomings. In addition, the authors did not demonstrate that the new approach is accurate enough for typical model applications (on the contrary, Fig. 7 seems to indicate that the underlying strategy of mixed uncoupled/coupled simulation is already quite inaccurate). As a consequence, it is not possible to assess whether the new approach (combining a mix of coupled/uncoupled simulations with machine learning) actually works sufficiently well.
Specific major comments:
The main aim of the new approach is to reach a stable state at a reduced computational time compared to conventional spin-up procedures. The authors did not demonstrate (1) that steady states are reached and (2) that the new approach for spin-up leads to biases in fluxes which are acceptable for the typical application field of the model. This can be achieved by providing steady-state criteria and conducting a test for a typical model application (e.g. the transient response of C and fluxes under climate change). The authors should also give more information on the computational savings.
The robustness of the random forest predictions is not demonstrated. The authors should provide results from the testing and validation of the RF, and deploy interpretable machine learning to support their predictions by demonstrating that the relationships between SOC (SON) and the predictors align with existing evidence.
The experimental design does not allow the effect of soil properties, climate, etc. to be disentangled from the effect of vertical transport. As the deployment of RF for spinning up a model with vertical transport is the main novelty of this study, I see this as a major shortcoming, and suggest that an additional simulation be performed which differs from the benchmark case only in the omission of vertical transport.
The methodology lacks information on (1) the calculation of the computational time savings from the new approach, (2) the random forest (training/validation results, treatment of categorical variables, pixel selection, etc.), and (3) the steady-state criteria.
Novelty needs to be more clearly defined. The approach of using a RF for spinup of biogeochemical cycles in a land surface model has been proposed, applied and tested before in Sun, Yan, et al. "Machine learning for accelerating process‐based computation of land biogeochemical cycles." Global Change Biology 29.11 (2023): 3221-3234.
Minor
Two plot-scale simulations were performed for the vegetation-soil combinations. It is unclear how they can capture the variation in SOC due to climate. Fig. 2b suggests that only two combinations were simulated, for a single location. Thus neither variation in climate (e.g. temperature/elevation) nor in soil texture is accounted for, and additional sensitivity simulations are therefore performed to disentangle the respective effects. This is an approximation, as drivers interact; the discussion falls short of fully reflecting this.
The conservation of mass is of critical importance in the field of hydrological and biogeochemical modelling. I didn’t find any statement concerning mass conservation in the literature on T&C-BG. The authors should indicate the basic principles underlying the model.
The steady-state criterion of ‘trends of [...] pools changed by less than 1%’ (Line 218) makes no sense to me. Did you mean that the pools changed by less than 1%? Steady state is reached not when the trends have stabilised (changes of less than 1%) but when they are negligible. Usually one computes the linear trend over the given period and uses a threshold of e.g. 1% per year to detect a steady state.
The investigation of the number of pixels required for RF training (section 3.2) is not very informative. It is not clear how pixels were selected nor what motivates this analysis of the different scenarios. Approaches like k-means clustering are available in order to guide the sampling.
The labelling and the design of the different catchment scenarios can be improved, e.g. the random soil texture case is not only random but spans a much wider predictor space.
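The trend-based steady-state check described in the comment on Line 218 can be illustrated with a minimal sketch. It assumes annual pool values over a 9-year forcing period and a threshold of 1% of the mean pool size per year; the function names and the synthetic series are hypothetical and are not taken from the T&C-BG-2D code.

```python
from statistics import mean


def relative_trend_per_year(years, pool):
    """Least-squares slope of pool size against time, expressed as a
    fraction of the mean pool size per year."""
    t_bar = mean(years)
    p_bar = mean(pool)
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (p - p_bar) for t, p in zip(years, pool))
    slope = sxy / sxx  # pool units per year
    return slope / p_bar


def is_steady(years, pool, threshold=0.01):
    """True when the absolute relative trend is below the threshold
    (assumed here: 1% of the mean pool per year)."""
    return abs(relative_trend_per_year(years, pool)) < threshold


# Hypothetical annual SOC values for one grid cell over 9 years.
years = list(range(9))
drifting = [100.0 + 3.0 * t for t in years]  # still accumulating carbon
stable = [100.0 + 0.1 * t for t in years]    # nearly flat

print(is_steady(years, drifting))  # trend of about 2.7% per year
print(is_steady(years, stable))    # trend of about 0.1% per year
```

In a distributed setting the same check would be applied per cell and per pool, and the threshold would need to be justified for the slowest pools.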
Specific comments
Line 7-8: provide quantification of the degree to which variability and pattern are captured
Line 11: ‘easily applied’ is too vague. It would be better to list the conditions that need to be met (e.g. model characteristics) for the spin-up approach to be applicable.
Line 52-54: there is at least one study which actually deployed RF for model spinup in this context which could be added: Sun, Yan, et al. "Machine learning for accelerating process‐based computation of land biogeochemical cycles." Global Change Biology 29.11 (2023): 3221-3234.
Line 74: The H1 is trivial, not sure it is needed.
Line 81: ‘this is the first study to address initialization in a fully distributed coupled ecohydrological-soil biogeochemical model that mechanistically simulates coupled water, vegetation, and soil biogeochemical dynamics’. That sentence does not make it clear whether the model’s pixels are interdependent. Thus the claim is invalid (see Sun et al., 2023).
Line 100: What about soil P and K? How do they affect, and how are they affected by, vegetation? E.g. does soil fertility control plant growth?
Line 131 continued: Soil texture variation was not accounted for as you state later (L 137/138). I would suggest revising this section to be frank about this from the start.
Line 144: ‘may not fully equilibrate’ is misleading. Given the slow turnover of soil organic matter, it is certain they won’t fully equilibrate within 9 years.
Line 145 continued: ‘In the 2D simulation in Fig. 2c, ideally, the entire available forcing period should be used’, which is what you did, as you wrote earlier. There is no need to demonstrate that one can do worse than one did.
Line 160: How were categorical variables treated in the RF? Isn’t clay content accounted for besides sand?
Line 161: Be clearer about the approach for selecting representative pixels. Did you use a formalized approach like k-means clustering?
Line 171: The sequence of removing predictors matters. There are formalized approaches to account for this, like recursive feature elimination and predictor importance ranking. I would suggest using partial dependence plots or Shapley values in order to investigate the relationship between pools and predictors. RF is prone to overparameterization, and assessment of these relationships can help to build trust in the black-box RF.
Figure 2 is not very clear. Panels (g) and (f) indicate that RF was used, but there is a pathway from (d) to (f), so no RF is involved there. Do not use ‘model’ without specifying which model is meant.
Line 200: specify the criteria for ‘satisfactory performance’ (e.g. % variance explained)
Line 211: The decoupling approach of Krinner et al. (2025) was developed for model components which are coupled one-way, i.e. there is no feedback of the state of component 2 (soil) on the state of component 1 (vegetation). Is this the case in T&C-BG, which has nutrient cycles (so that soil fertility affects vegetation processes, which in turn affect soil fertility)?
Figure 4: It is not clear why time series (of coefficients of variation) are analyzed. Does one expect large variations in SOC over the course of 9 years? Why not aggregate over time?
Figure 4: The discussion of the underlying reason for the much higher CV of random texture (but also much wider parameter space) compared to other experiments (and the implications) is not very clear.
Line 278: This is speculation. To provide evidence that your RF captures the relationship between SOC and texture you should show their relationship using interpretable machine learning techniques like partial dependence plots.
Line 300-320, Figure 5F: Is it relevant to analyse the bias in the 0% simulations? I would assume it is common sense in the spatial modelling community (which is the prime audience for this article). I would prefer to see more discussion about the effect of the number of training data on the relationships in figure 5&6.
Section 3.4: The direction of the soil - vegetation coupling is expected to be strongly site dependent. It is not very informative as it is likely very case specific.
Figure 7: What is the purpose of the 3 x 9-year-long simulations? One cannot expect 27 years to be sufficient to equilibrate a coupled soil-forest model. Can one rule out that the steady state is independent of the initial state? If not, it is speculation that the 3 x 9-year-long simulation, if continued, would reach the same state as the coupled spin-up (T&C-BG).
Line 344: I cannot follow the logic here. A 10-20 % bias in average ‘steady-state’ SOC stocks when using the mixed uncoupled - coupled spinup strategy can - depending on the composition of SOC among pools - lead to C fluxes which are potentially larger than the impact on C fluxes from changing environmental conditions. The authors should demonstrate that such biases lead to C fluxes which are negligible compared to the typical C cycle response investigated with this type of model.
Line 361: Where is the reduction in computation time shown, and how was it estimated? I would assume it depends on the machine (e.g. the number of processors used).
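The k-means-guided sampling suggested in the comments on Section 3.2 and Line 161 could look roughly like the sketch below. It is only an illustration: the three covariates (elevation, slope, sand fraction), the synthetic 200-cell domain, and all function names are assumptions, and the manuscript's actual pixel-selection procedure is not described in this discussion. Each covariate is standardized, cells are clustered with a plain Lloyd's k-means, and the cell closest to each centroid is proposed for tracking.

```python
import random


def standardize(rows):
    """Scale each covariate column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; initial centroids are k randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
                   for j, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def representative_cells(covariates, k, seed=0):
    """Indices of the cells closest to each k-means centroid."""
    z = standardize(covariates)
    centroids = kmeans(z, k, seed=seed)
    picked = {min(range(len(z)), key=lambda i: sq_dist(z[i], c)) for c in centroids}
    return sorted(picked)


# Hypothetical domain: 200 cells with elevation (m), slope (deg), sand fraction.
rng = random.Random(1)
cells = [[rng.uniform(1000, 1600), rng.uniform(0, 40), rng.uniform(0.2, 0.8)]
         for _ in range(200)]
tracked = representative_cells(cells, k=40)  # at most 20 % of the cells
print(len(tracked), tracked[:5])
```

Because two centroids can share the same nearest cell, the number of selected cells may be slightly smaller than `k`; a production version would also need to handle categorical covariates such as vegetation type.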
Citation: https://doi.org/10.5194/egusphere-2025-4796-RC1
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 187 | 84 | 17 | 288 | 35 | 11 | 8 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Your manuscript fails to comply regarding several issues.
First, to access the T&C-BG-2D model code you cite a paper, not a repository, and this paper points to a GitHub site to access the code. However, GitHub is not a suitable repository for scientific publication, as it does not allow one to identify the exact version of the code used and, in addition, such code can be deleted. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. For the T&C model code, for which you link a site hosted on codeocean.com, something similar happens, and we cannot accept codeocean as a storage solution for the assets necessary to replicate your work.
In addition, you have archived the data used and produced in your work in the WSL data portal; however, the WSL data portal does not fulfil GMD’s requirements for a persistent data archive because:
* It does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).
* It does not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The ‘Code and Data Availability’ section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor