the Creative Commons Attribution 4.0 License.
Constraining boreal carbon allocation and turnover by assimilating forest growth dynamics in a differentiable framework
Abstract. Terrestrial biosphere models (TBMs) exhibit substantial uncertainty in simulating the land carbon sink, in particular at interannual to decadal time scales, partly because parameters that govern carbon allocation and biomass turnover rates are weakly constrained. Typical model-data fusion approaches, which rely heavily on high-frequency (minutes to days) observations related to fluxes (e.g., gross primary production and leaf area index), often struggle to constrain the slow turnover processes that govern long-term biomass accumulation over multiple years. Here, we employ DifferLand, a JAX-based differentiable TBM, to jointly assimilate satellite-derived boreal forest biomass growth trajectories and high-frequency observations. Calibrations using only high-frequency fluxes reproduce short-term dynamics but yield large biases in mature forest biomass (RMSE = 138.7 Mg ha⁻¹), while incorporating a single-year biomass stock constraint only partially reduces the error (RMSE = 87.5 Mg ha⁻¹) and provides limited constraints on the allocation patterns. In contrast, incorporating our biomass growth curves reduces the biomass RMSE to 11.3 Mg ha⁻¹ (a 91.9 % reduction) without degrading fits to high-frequency fluxes. Our findings reveal that while single-year biomass stock data provide some constraints on biomass residence times, the full growth trajectory is essential to simultaneously constrain carbon allocation and turnover. The retrieved parameters indicate that, compared to calibrations using only high-frequency flux observations, boreal forests sustain biomass primarily through longer wood carbon residence times (74.1 % lower wood turnover rates) rather than higher allocation to wood. Attribution analyses further show that climate conditions are the dominant driver of wood turnover, with a sharp increase when the temperature of the coldest month exceeds −20 °C.
Our study demonstrates the importance of assimilating slow ecological trajectories to improve long-term predictions of carbon storage and highlights the potential acceleration of the sensitivity of the boreal carbon sink to warming under future climate change.
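As a quick consistency check of the figures above, the relative RMSE reductions follow directly from the three reported values. The short Python snippet below computes them (variable and function names are illustrative); the single-year reduction of about 36.9 % is implied by, but not stated in, the abstract.

```python
# RMSE values (Mg ha-1) as reported in the abstract
rmse_baseline = 138.7      # high-frequency fluxes only
rmse_single_year = 87.5    # plus a single-year biomass stock
rmse_growth_curve = 11.3   # plus the biomass growth curves


def percent_reduction(before: float, after: float) -> float:
    """Relative RMSE reduction with respect to the baseline, in percent."""
    return 100.0 * (before - after) / before


print(f"single-year stock: {percent_reduction(rmse_baseline, rmse_single_year):.1f} %")   # 36.9 %
print(f"growth curve:      {percent_reduction(rmse_baseline, rmse_growth_curve):.1f} %")  # 91.9 %
```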
Status: open (until 03 Jul 2026)
- RC1: 'Comment on egusphere-2026-2241', Thomas Smallman, 29 May 2026
- RC2: 'Comment on egusphere-2026-2241', Toni Viskari, 01 Jun 2026
This is the review for the manuscript “Constraining boreal carbon allocation and turnover by assimilating forest growth dynamics in a differentiable framework” by Wu et al., where they have applied biomass time series data to constrain forest growth.
It is probably easiest if I start with my recommendation, which is to reject with encouragement to resubmit. While the foundation of the work is solid, at least as far as I understood it (as I will explain further below), I had fundamental issues with the work here, to the degree that I feel the experimental setup needs to be completely reconsidered, which is, for me, beyond major revisions.
There were multiple parts of the Methodology section where either major parts were missing or I did not quite understand what was actually being done, for example the calibration of the initial state or how the growth time series was in effect used as calibration data points. Furthermore, there were also parts of the validation work I thoroughly disagreed with, such as also using the growth time series itself for validation.
I’ve done my best to explain these issues in my line-by-line comments, hopefully to an understandable degree.
Additionally, the novelty of the work needs to be better conveyed, as there have been previous studies using biomass time series measurements to calibrate forest models. It did feel like the argument was partially built on DifferLand, but even that has been done before, as explained in the references of this work, and the application itself is not unique.
Apologies if my review comes across as harsh. I do still genuinely recommend resubmission after clarifying and reworking the experimental setups.
Line-by-line comments:
Abstract:
Line 18: “Terrestrial biosphere models (TBMs) exhibit substantial…”
For ease-of-reading, I would suggest splitting this sentence into two.
Line 20: “Typical model-data fusion approaches…”
I would change the word “approaches” to “implementations”, as the fusion method itself does not really care whether higher- or lower-frequency data are used.
Introduction:
Line 38: This is a very minor comment, but I would use a verb other than “regulate” when discussing ecosystems as a part of the global carbon cycle. To regulate something implies active participation in directing events towards a desired outcome, which does not apply here.
Line 40: “…terrestrial carbon and water fluxes and assess how…” -> “terrestrial carbon and water fluxes as well as assess how...”
Line 43: “These uncertainties primarily stem from differences in how models represent key biogeochemical and biophysical processes, such as photosynthesis and respiration, carbon allocation, vegetation dynamics, disturbance, and soil carbon turnover, and in how these processes respond to climate and CO₂ (Canadell et al., 2021).”
While technically correct, this sentence essentially states that ecosystem model uncertainties stem from how the different dynamics are represented in the model. This is true, but somewhat of a moot point, since obviously they do. Thus, this sentence should either be sharpened to better reflect the work in this manuscript or simply removed.
Line 48: “…and together these uncertainties represent a major barrier to reliable future climate projections (Kaufhold et al., 2025).”
Do the uncertainties here refer only to the ecosystem-related uncertainties, or to the combined uncertainties of the ecosystem and physical climate sensitivity? This is a bit unclear in the current framing.
Line 49: “A primary uncertainty in TBMs lies in the post-photosynthetic processes, especially the parameterization of carbon allocation and biomass turnover (i.e., carbon residence time) (Bloom et al., 2016).”
The implication of this sentence is that the model structure is the same across all TBMs and only the parameterizations of the equations vary, which is naturally not the case. This sentence could actually be removed altogether, as it is to some extent repeated a little later in the paragraph.
Line 54: “TBMs typically rely on Plant Functional Types (PFTs) to prescribe these parameters, but this PFT-based classification exhibits a limited ability to account for the observed spatial heterogeneity in carbon allocation and residence times (Bloom et al., 2016).”
Again, this statement is technically correct, although I would argue it oversimplifies the discussion of the PFT approach. The work in this manuscript, however, does very little to address the core challenge of the PFT approach, so I am not certain of the benefit of highlighting the PFT issue here in this manner.
Line 57: “Using model-data fusion methods to constrain parameters faces the challenge of equifinality, where different parameter combinations can fit the observational data equally well (Famiglietti et al., 2021).”
True, but once more not something addressed in the manuscript.
If you are parameterizing with the same set of data and can find multiple parameter sets that produce similar fits, that is equifinality. But once you start changing the data used to constrain the models, it is no longer a question of equifinality.
Line 64: “Indeed, signals originating from large, slow-turnover carbon pools (such as woody biomass) are difficult to detect in high-frequency flux data (Braswell et al., 2005).”
It would be helpful if the explanation here touched on why high-frequency data is relied on so much.
Additionally, the start of the paragraph mentions the use of both satellite and flux data, so I would remove the word “flux” from this sentence.
Line 74: “Unlike high-frequency flux data and biomass snapshots, biomass growth curves capture the integrated outcomes of long-term carbon allocation and turnover processes over decades (Xu et al., 2026; Zhou et al., 2015)”
Is there any known reason why this data hasn’t been used before in model calibration?
Line 76: “A key challenge in TBM parameter calibration is equifinality…”
This line repeats the preceding paragraph.
Line 82: “…thereby reducing equifinality in post-photosynthetic parameter estimates.”
Now I am repeating myself from before, but once you add a new data source as a constraint, then it is not reducing equifinality with regard to calibrations done without that dataset.
Line 91: “To achieve this, we employed DifferLand, an advanced, JAX-based differentiable TBM (Fang & Gentine, 2024). Using the differentiable DALEC implementation provided within DifferLand, automatic differentiation computes gradients of the multi-source mismatch with respect to model parameters, enabling efficient gradient-based calibration against heterogeneous constraints that combine fast fluxes with slow carbon stocks/trajectories (Fang & Gentine, 2024; Lin, 2024; Shi et al., 2024).”
First, there is no explanation here of what JAX and DALEC are. Second, stating that DifferLand is a differentiable TBM that uses a differentiable DALEC implementation in which automatic differentiation computes gradients is rather convoluted to read. Third, it is not clear what “differentiable” means in this context.
Materials and Methods:
Line 105: “By leveraging automatic differentiation to compute…”
There should be some basic explanation of what automatic differentiation is, since it is presented here as the central approach.
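For readers unfamiliar with the terminology: a model is “differentiable” when the gradient of a loss with respect to its parameters can be obtained by automatic differentiation (AD), which applies the chain rule through every operation in the code instead of approximating derivatives by finite differences. The sketch below uses JAX's `jax.grad` on a hypothetical one-pool wood model (names, values and structure are illustrative assumptions); it is not the DALEC implementation in DifferLand.

```python
import jax
import jax.numpy as jnp

# Hypothetical one-pool wood model, NOT the DALEC equations used in DifferLand:
# each year a fraction `alloc` of a fixed NPP enters the pool and a fraction
# `turnover` of the pool is lost.
NPP = 5.0  # Mg C ha-1 yr-1, illustrative value only


def simulate(params, n_years=100):
    alloc, turnover = params[0], params[1]
    c = 0.0  # stand-replacing disturbance: the pool starts empty
    trajectory = []
    for _ in range(n_years):
        c = c + alloc * NPP - turnover * c
        trajectory.append(c)
    return jnp.stack(trajectory)


def loss(params, obs):
    # mean squared mismatch between simulated and "observed" trajectories
    return jnp.mean((simulate(params) - obs) ** 2)


# synthetic observations generated with known parameters
true_params = jnp.array([0.3, 0.02])
obs = simulate(true_params)

# jax.grad builds a function returning d(loss)/d(params), obtained by
# reverse-mode automatic differentiation through all 100 time steps
grad_fn = jax.grad(loss)

guess = jnp.array([0.2, 0.05])
print(grad_fn(guess, obs))        # non-zero away from the true parameters
print(grad_fn(true_params, obs))  # (numerically) zero: the synthetic data are reproduced
```

A gradient-based optimizer then repeatedly moves the parameters against this gradient; the same mechanism applies whether the loss compares fluxes, stocks or whole trajectories.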
Line 107: “…along with data assimilation (using observation operators between model variables and observations).”
I do not quite understand what this part is attempting to state: joint calibration of parameters is itself data assimilation, so what is the additional data assimilation here? Especially since the part in parentheses about observation operators is not data assimilation; it is just a tool that allows different data types to be compared with the model during data assimilation.
Line 108: “Calibrating ecological processes across long timescales is particularly challenging, as error signals from slow-turnover pools are often difficult to attribute to the fast physiological drivers that govern them.”
This is true in a sense, but there are multiple reasons for this. Even when ignoring the time-invariant assumption, we do not generally have driver data accurate enough across the whole time series to properly constrain the model parameters in question. For example, we do not know how much radiation was available to the growing forest at certain crucial times, a problem further complicated by how models represent the internal distribution of radiation within the canopy.
DifferLand, as far as I can tell, does not address any of those issues, nor does it change the fact that the DALEC model structure itself is relatively simplistic, which further complicates representing the division of vegetation carbon between different pools with sufficient accuracy. There is nothing wrong with either of those challenges, by the way, as they are just issues ecosystem modelers have to deal with, but they do make the blanket statements presented here about how DifferLand solves the issue a bit confusing.
Line 119: “Carbon enters the ecosystem through gross primary productivity (GPP)…”
This paragraph would benefit from a more structured explanation of the model dynamics, either by introducing equations or by using a list to explain the different steps.
Also, I do not quite understand the use of NPP in this paragraph. To put it simply, NPP is a flux that can be roughly estimated from flux tower measurements and is often used, quite understandably, to indicate how much carbon remains within the plant itself. Here, however, NPP is referred to simply as the amount of carbon distributed among the carbon pools after respiration, which is not really the case.
This is not to be pedantic, but the distinction matters a lot from a mechanistic point of view, as the respired carbon comes from the pools within the plant and is affected by the traits of the plant itself. For example, the larger the plant, the larger the respiration. More importantly, respiration still occurs at night, when there is no GPP, and under stress conditions respiration can exceed GPP, resulting in a negative NPP. Thus, referring to NPP as it is here is a bit distracting, as it does not reflect the dynamics of the system.
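To make the NPP point concrete: if autotrophic respiration is a fixed fraction of GPP (the quoted text later mentions fauto, “the fraction of GPP allocated to autotrophic respiration”), NPP can never be negative, whereas respiration drawn from living biomass can exceed GPP at night or under stress. The Python sketch below contrasts the two views; the function names, coefficients (f_auto, r_maint, r_growth) and values are hypothetical and in arbitrary units.

```python
# Two hypothetical ways of computing NPP for one time step (arbitrary units).
# Neither is claimed to be the DifferLand/DALEC code; they only contrast the
# "fixed fraction of GPP" view with respiration drawn from living biomass.


def npp_fixed_fraction(gpp, f_auto=0.45):
    """Autotrophic respiration as a fixed fraction f_auto of GPP."""
    return (1.0 - f_auto) * gpp


def npp_biomass_based(gpp, biomass, r_maint=0.02, r_growth=0.25):
    """Maintenance respiration scales with living biomass and occurs even when
    GPP is zero; growth respiration is a fraction of any remaining surplus."""
    surplus = gpp - r_maint * biomass
    growth_resp = r_growth * surplus if surplus > 0 else 0.0
    return surplus - growth_resp


for label, gpp in [("night", 0.0), ("stress day", 1.0), ("good day", 10.0)]:
    print(label, npp_fixed_fraction(gpp), npp_biomass_based(gpp, biomass=100.0))
```

With the fixed fraction, NPP is zero at night and positive whenever GPP is positive; with biomass-based maintenance respiration, NPP becomes negative at night and on the low-GPP day.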
Line 129: “In total, the model comprises 26 tunable physical parameters (Table S1) governing these ecohydrological processes.”
This table should be a part of the main manuscript with all the prior assumptions included.
Line 130: “For the subsequent analysis, the "allocation to leaf" is calculated as the sum of allocation fractions to the labile and foliar pools since they both support canopy development.”
Are the other labile pools root extracts or fine roots? Also, why not simply use the allocation to the foliar pool as the allocation to leaf?
Line 145: “We assimilated multiple earth observation data…”
I would add “sets” so that “data” reads “data sets”.
Regarding this paragraph in general, I would recommend changing it to either a list or a table format to make all the details easier to process. A table would be ideal, as it gives you the opportunity to provide additional information about resolution and time steps.
Line 165: “2.2.3 Forest Biomass Constraints for Slow Processes”
Based on this section, is fire the only disturbance used to set the age curve, with no consideration of logging, large-scale drought mortality or pest damage?
This is mainly to check whether all such data were filtered out and only locations that had forest fires were used here. If that is the case, it needs to be more explicitly explained and justified in the manuscript.
More importantly, based on the explanations here, it is utterly unclear how these data are used for calibration. Are they sampled at certain time intervals? Is it just the end point? This part should not be as obscure as it is.
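Whether the curve is sampled at intervals or used only at its end point matters, because an end point alone cannot separate allocation from turnover. Assuming a hypothetical single wood pool regrowing from zero after a fire (not the DALEC structure; all values illustrative), two parameter sets can reach exactly the same 100-year biomass while following clearly different trajectories:

```python
import math


def biomass(t, alloc_npp, k):
    """Analytical one-pool solution of dB/dt = alloc_npp - k*B with B(0) = 0."""
    return alloc_npp / k * (1.0 - math.exp(-k * t))


T_END = 100  # years since the stand-replacing disturbance

# Parameter set A: lower wood input, slower turnover (illustrative values)
a_alloc, a_k = 2.0, 0.01
b_end = biomass(T_END, a_alloc, a_k)

# Parameter set B: faster turnover, with wood input chosen so that B(100) matches A
b_k = 0.02
b_alloc = b_end * b_k / (1.0 - math.exp(-b_k * T_END))

for t in (10, 30, 60, 100):
    print(t, round(biomass(t, a_alloc, a_k), 1), round(biomass(t, b_alloc, b_k), 1))
```

Only observations at intermediate ages (here, around 10 to 60 years) distinguish the two sets.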
Line 178: “This conversion was performed using spatially explicit Root-to-Shoot (R:S) ratios derived from a global map of root biomass, which was generated by a random forest model trained on extensive field measurements (Huang et al., 2021).”
Does the conversion assume that this relationship stays the same regardless of forest maturity? It is fine if it does; I would simply recommend adding a sentence here acknowledging that assumption.
Line 182: “2.2.4 Data for Parameter Priors”
In Section 2.1, it is established that the work here calibrates 26 parameters. It is not clear for which of those parameters the data discussed in this section are used as constraints, or how the remaining parameters are then constrained: prior information, expert opinion, or something else?
Line 199: “We specifically extracted the 1985–2020 mean areal fractions of boreal needleleaf evergreen (ENF), boreal broadleaf summergreen (DBF), and boreal needleleaf deciduous (DNF).”
I raised this question before, but when calculating these fractions, is the assumption here that there are no other disturbances except fire affecting these systems? If so, what is it based on?
Line 202: “2.2.6 Independent Validation Data”
In the following section, the manuscript states that it uses parts of the time series for independent validation, and this particular dataset is not mentioned. This is especially confusing in the Results section, where the validation dataset discussed is somewhat confusing in itself and this dataset is not referred to as validation data.
Also, I do not quite understand the implementation here: why would the plot-level data be aggregated to grid level if they are used for validation? Why not run the model at plot level? And what exactly does the aggregation mean in this context?
Line 211: “In other words, within the differentiable framework, initial conditions can be treated as parameters and we can compute the gradients/sensitivity of the model to those initial conditions.”
I did not quite understand this statement, as initial conditions can be treated as parameters by every calibration method. Thus, this is not something enabled only by the differentiable framework, as implied here.
The challenge, and the reason it is not often done, is that it introduces a massive equifinality/uncertainty issue into the results, and this possibility is not tested for at all in the resulting analysis.
Furthermore, it remains completely unclear to me at what point in time the initial state is calibrated here. I understand that it refers to the model pools, but at what time? Since you are modeling forest regrowth after a fire, does that not start from zero? If this is the initial state at the start of the validation period, which is my concern, this would render essentially everything in that part of the work somewhat pointless.
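The equifinality concern can be illustrated with a hypothetical one-pool model (not DALEC; all values illustrative): if the initial state is free and the calibration window covers a mature stand whose biomass is roughly flat, any turnover rate fits equally well once the initial stock and the wood input are adjusted to match.

```python
def run(c0, alloc_npp, k, n_years=12):
    """Discrete one-pool update c <- c + alloc_npp - k*c, returning each year's stock."""
    c, out = c0, []
    for _ in range(n_years):
        c = c + alloc_npp - k * c
        out.append(c)
    return out


B_OBS = 150.0  # hypothetical, roughly constant mature-stand biomass over a 12-year window

for k in (0.005, 0.02, 0.05):
    # free initial state c0 = B_OBS and wood input alloc_npp = k * B_OBS keep the pool at B_OBS
    traj = run(B_OBS, k * B_OBS, k)
    print(k, max(abs(c - B_OBS) for c in traj))  # zero misfit (up to rounding) for every k
```

A tenfold range of turnover rates is therefore indistinguishable in such a window.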
Line 214: “To capture the long-term legacy effects of carbon turnover and accumulation, we designed a 107-year continuous simulation experiment (growth-curve-constrained experiment) for each grid cell.”
As I write this, I have read this section multiple times and still have no real grasp of how the work here uses the growth curve as a constraint, especially since a few paragraphs later it is stated that all experiments use the observational data from 2001 to 2012.
Are you repeating the initial growth period with the different parameter sets? In that case, your calibration period would not be the one specified later, as it would cover a longer time window. What is the actual measurement used as a constraint here, and how is it implemented in the calibration?
Line 215: “The meteorological drivers from 1985 to 2020 (36 years) were cycled three times, with the final year (2020) excluded from the last cycle to align the simulation end date (2019) with the extent of the observational record.”
As I understand the logic behind cycling meteorological drivers, and as I have implemented it myself, one chooses a period to repeat under the assumption that the general climate averages remain the same, so that the cycling only adds interannual variability (noise) to the system. When doing this, the periods chosen have usually been from the latter half of the 20th century.
Here, though, the chosen period includes the trends due to climate change. So when cycling like this, the work is actually restarting a trend three times while also simulating the growth of the forest in a much warmer environment than the one in which it actually grew, especially since the northern latitudes where these forests are located have been much more strongly affected by climate change.
If I remember correctly, the CRU-JRA dataset also covers earlier periods. So why implement the cycling in this manner? Why not cycle a different climate period from the one you then use for the actual simulation runs?
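The effect described above can be seen by reconstructing the cycled driver sequence from the quoted text (1985–2020, three cycles, 2020 dropped from the last) and superimposing a hypothetical linear warming trend of 0.05 °C per year (an illustrative assumption, not a value from the manuscript). The run indeed has 107 years, and the trend resets abruptly twice:

```python
# Driver years as described in the quoted text: 1985-2020, cycled three times,
# with 2020 dropped from the last cycle so that the run ends in 2019.
years = list(range(1985, 2021))         # 36 driver years
cycled = years + years + years[:-1]     # calendar year used at each simulation step

print(len(cycled), cycled[-1])          # 107 2019

# Hypothetical warming trend of 0.05 degC per year relative to 1985 (illustrative only)
anomaly = [0.05 * (y - 1985) for y in cycled]

# Year-to-year changes: two abrupt drops where the trend restarts at a new cycle
steps = [b - a for a, b in zip(anomaly, anomaly[1:])]
resets = [i + 2 for i, s in enumerate(steps) if s < 0]  # 1-based simulation year after the drop
print(resets, round(min(steps), 2))     # [37, 73] -1.75
```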
Line 219: “To quantify the specific impact of assimilating forest biomass growth dynamics, we conducted two additional experiments, a baseline experiment and a single-year biomass-constrained experiment”
How do you initialize the forest state for these two experiments? That would have a drastic impact on the results.
Line 227: “For all experimental setups, the calibration process utilized observational data from the period 2001–2012 (corresponding to the simulation years 89–100 for the 107-year run, and years 1–12 for the 19-year run), while data from 2013–2019 were reserved for independent validation.”
Literally at the end of the previous paragraph, it is stated that the biomass data used for the comparison experiment are from the year 2020. While I can imagine that there is simply a gap in the calibration data where the model is run for the total biomass, this feels like a really clumsy way to do it.
Why was the validation not done by setting aside individual sites to use purely for validation? That would also have given you information on how well the calibration results here can be generalized to the PFTs in question.
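A site-level hold-out of the kind suggested above can be as simple as reserving whole grid cells before calibration. The helper below is a hypothetical sketch (the cell identifiers, the 20 % fraction and the seed are illustrative); a grouped k-fold over cells would extend the same idea.

```python
import random


def leave_sites_out_split(site_ids, holdout_fraction=0.2, seed=0):
    """Randomly hold out whole sites (grid cells) for validation.

    No data from a held-out site are used in calibration, so the validation
    tests transfer to unseen locations rather than in-sample fit.
    """
    ids = sorted(set(site_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_holdout = max(1, round(holdout_fraction * len(ids)))
    holdout = set(ids[:n_holdout])
    calibration = [s for s in ids if s not in holdout]
    return calibration, sorted(holdout)


cal, val = leave_sites_out_split([f"cell_{i:03d}" for i in range(50)])
print(len(cal), len(val))  # 40 10
```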
Line 229: “By aligning the calibration period with simulation years 89–100, we assume that after approximately 90 years of recovery following the stand-replacing disturbance…”
This would be true only if you assumed that every stand burned down in the year 1911. Is the implication here that your calibration data do not include any stands in varying states of regrowth, but that all are already recovered and mature?
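Under the cycling scheme quoted earlier (three cycles of the 36-year 1985–2020 drivers, the third starting at simulation year 73), the simulation-year indices in the quoted text can be checked directly. They confirm that every grid cell is calibrated at the same modelled stand age, regardless of when it actually burned, and that the validation years 2013–2019 are simply the last seven years of the run. The helper name is illustrative.

```python
FIRST_DRIVER_YEAR = 1985
CYCLE_LENGTH = 36                           # 1985-2020
THIRD_CYCLE_START = 2 * CYCLE_LENGTH + 1    # simulation year 73


def simulation_year(calendar_year: int) -> int:
    """1-based simulation year at which a calendar year's drivers are used in the third cycle."""
    return THIRD_CYCLE_START + (calendar_year - FIRST_DRIVER_YEAR)


print(simulation_year(2001), simulation_year(2012))  # 89 100  (calibration)
print(simulation_year(2013), simulation_year(2019))  # 101 107 (validation)
```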
Line 254: “By applying the physical parameters retrieved from the baseline and the single-year biomass-constrained experiments to this same initial state from the growth-curve-constrained experiment and comparing the result against the growth curve-constrained experiment, we isolated the divergence in 100-year biomass accumulations caused solely by the differences in physical parameters.”
Again, for what year is the initial state estimated?
Furthermore, this setup is fundamentally flawed. The initial state and the parameters that produced it are inherently linked. If you run the model with an initial state that is not in balance with the model parameters, that will affect how the system actually develops over the given time frame. Hence, the results here do not really give an isolated perspective on the impact of the physical parameters alone.
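The point can be illustrated with the analytical solution of a hypothetical single slow pool, dC/dt = I − kC, which relaxes towards I/k on a time scale of 1/k (values illustrative, not from the manuscript). Running the same parameters from two different starting states gives different 100-year values, so part of the divergence between experiments comes from the initial state rather than from the parameters:

```python
import math


def pool(t, c0, inflow, k):
    """Analytical solution of dC/dt = inflow - k*C: relaxation from c0 toward inflow/k."""
    c_eq = inflow / k
    return c_eq + (c0 - c_eq) * math.exp(-k * t)


# hypothetical parameters from two calibrations of the same slow pool
inflow_a, k_a = 1.0, 0.005   # equilibrium 200
inflow_b, k_b = 1.0, 0.02    # equilibrium 50

c0_from_a = inflow_a / k_a   # initial state consistent with calibration A
c0_from_b = inflow_b / k_b   # initial state consistent with calibration B

# Same parameters (B), different initial states
print(round(pool(100, c0_from_a, inflow_b, k_b), 1))  # 70.3: still relaxing from A's state
print(round(pool(100, c0_from_b, inflow_b, k_b), 1))  # 50.0: already in balance
```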
Line 286: “The observational loss (Jobs) quantifies the mismatch between model simulations and data using the negative Normalized Nash-Sutcliffe Efficiency (NNSE). The NNSE transforms the standard error metric onto a bounded range of (0, 1], preventing variables with large absolute magnitudes from dominating the optimization gradient.”
Generally, when calculating cost functions, the observational uncertainties scale the different types of measurements to make them comparable with each other. While the approach here is not inherently wrong at all, it remained unclear to me why it was necessary in this particular situation.
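For reference, the NNSE in the quoted text is conventionally defined as NNSE = 1/(2 − NSE), with NSE = 1 − Σ(sᵢ − oᵢ)²/Σ(oᵢ − ō)². The Python sketch below (function names are illustrative) contrasts it with the uncertainty-weighted cost mentioned above: NNSE normalizes each data stream by the variance of its own observations, whereas the weighted cost uses stated observational uncertainties σ.

```python
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency. Undefined when the observations have zero
    variance, e.g. a single value."""
    mean_obs = fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)


def nnse(sim, obs):
    """Normalized NSE, 1 / (2 - NSE): maps NSE in (-inf, 1] onto (0, 1]."""
    return 1.0 / (2.0 - nse(sim, obs))


def weighted_cost(sim, obs, sigma):
    """Conventional Gaussian misfit: residuals scaled by observation uncertainty."""
    return 0.5 * sum(((s - o) / sg) ** 2 for s, o, sg in zip(sim, obs, sigma))


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.0, 2.5, 4.5]
print(round(nnse(sim, obs), 3))  # 0.87
# Changing units (x100) leaves NNSE unchanged; the weighted cost needs sigma in matching units
print(round(nnse([100 * s for s in sim], [100 * o for o in obs]), 3))
print(weighted_cost(sim, obs, sigma=[0.5] * 4))  # 1.5
```

Both approaches remove the dependence on units; they differ in whether the scaling comes from the observed variability or from the measurement uncertainty.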
Line 293: “These priors specifically target the canopy efficiency (ce), LCMA and the fraction of GPP allocated to autotrophic respiration (fauto).”
Why use priors for only these few parameters instead of for all the calibrated parameters?
Also, are these scaled in the same manner as the observations?
Line 298: “Finally, we imposed ecological and dynamical constraints (EDCs) to ensure biologically realistic process rates.”
I understand the constraints themselves, but not how they are implemented in the cost function.
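In MCMC-based frameworks such as CARDAMOM, EDCs are typically applied as accept/reject rules, which provide no usable gradient; in a gradient-based setting, a smooth penalty added to the cost is one common alternative. The sketch below contrasts the two under stated assumptions: the example constraint (fine-root turnover faster than wood turnover), the weight and the function names are hypothetical, and whether DifferLand uses either form is not stated in the quoted text.

```python
import math

# Illustrative EDC: fine-root turnover faster than wood turnover. The constraint
# and the weight below are hypothetical examples, not the manuscript's EDC list.


def edc_hard(k_root, k_wood):
    """Rejection-style EDC: infinite cost when violated. Its gradient is zero or
    undefined, so it gives a gradient-based optimizer no direction to follow."""
    return 0.0 if k_root > k_wood else math.inf


def edc_soft(k_root, k_wood, weight=100.0):
    """Squared-hinge penalty: zero when satisfied and growing smoothly with the
    size of the violation, so it can be differentiated and added to the cost."""
    violation = max(0.0, k_wood - k_root)
    return weight * violation ** 2


def total_cost(j_obs, j_prior, k_root, k_wood):
    """Observation term + prior term + soft EDC penalty."""
    return j_obs + j_prior + edc_soft(k_root, k_wood)


print(edc_soft(0.5, 0.1), edc_soft(0.1, 0.2), edc_hard(0.1, 0.2))
```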
Line 312: “We developed separate machine learning models to predict the calibrated values of key process parameters, specifically leaf lifespan, carbon turnover rates for wood and fine roots, and NPP allocation fractions to leaf, wood, and fine roots.”
I do not understand the purpose of this: are these not the parameters you are calibrating in this work? So why develop separate machine learning models to then predict them? Apparently this allows you to project them to different environments, but you are already using those environments for the calibration?
Line 313: “To attribute the spatial heterogeneity of the calibrated parameters to environmental drivers…”
While the sentence is grammatically correct, I would suggest writing it out a bit more, as the current phrasing is initially a bit confusing. Additionally, you use the word “attribute” twice in this sentence.
Results:
Line 330: “The baseline experiment, which constrained the DALEC model solely using high-frequency flux and LAI observations…”
Because there has been no explanation of how the model state has been initialized for the different experiments, it is quite impossible to evaluate the performance here. This is especially so since the validation window is apparently only about seven years (2013–2019) in a slow-developing ecosystem with no disturbances, at least if I understood your implementation correctly, which essentially means that your results will be heavily affected by the initial state.
Line 335: “Consequently, the baseline simulation yielded a poor fit with the 100-year biomass derived from the forest growth curves (R² = 0.02), with a large RMSE of 138.7 Mg ha⁻¹.”
Alright, first of all, this is neither of the validation datasets introduced in the Methodology section.
Second, and even more crucially, is the whole point of the work introduced here not to calibrate with the forest growth curves? If that is the case, is the criterion for improvement that the two comparison experiments did not match this dataset as well as the experiment in which that very same data was used for calibration?
If that is not what was done here, then this part is very misleading. But if that is what was done, then this comparison is fundamentally flawed.
Line 347: “This constraint largely improved the spatial performance of the model simulation, increasing the R² to 0.99 and reducing the RMSE to 11.3 Mg ha⁻¹.”
I am going to repeat myself here somewhat, but this is simply, for me, really crucial to get across.
Based on what was explained above, the implication is that performance is evaluated against the same data that were used for calibration, in which case an R² value that high is to be expected. The other explanation would be that the initial states mentioned above are for the start of the validation period, in which case, since you are apparently simulating mature forests, almost no change is expected over such a short period of time.
With that stated, and I want to stress this again here, if you are getting an R² value of 0.99 with the model and datasets used here, that should have been an immediate sign that something was severely wrong with your test setup.
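The concern can be illustrated with a deliberately trivial case, assuming a hypothetical one-pool regrowth model with a single free turnover rate per grid cell and a shared wood input (all values illustrative): when each cell is calibrated against its own 100-year biomass value, evaluating against that same value returns R² ≈ 1 by construction, irrespective of whether the fitted turnover rates are ecologically correct.

```python
import math

T = 100           # stand age (years) at which biomass is compared
ALLOC_NPP = 3.0   # hypothetical wood input, Mg ha-1 yr-1, shared by all cells


def biomass_at(k):
    """One-pool regrowth from zero: B(T) = ALLOC_NPP / k * (1 - exp(-k T))."""
    return ALLOC_NPP / k * (1.0 - math.exp(-k * T))


def fit_k(target, lo=1e-6, hi=1.0, iters=100):
    """Bisection: B(T) decreases monotonically with k, so each target has one k."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if biomass_at(mid) > target:
            lo = mid  # simulated biomass too high -> need faster turnover
        else:
            hi = mid
    return 0.5 * (lo + hi)


def r_squared(sim, obs):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for s, o in zip(sim, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# hypothetical per-cell 100-year biomass targets (Mg ha-1)
obs = [45.0, 80.0, 120.0, 160.0, 210.0]
sim = [biomass_at(fit_k(b)) for b in obs]
print(round(r_squared(sim, obs), 4))  # 1.0: calibrated and evaluated on the same targets
```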
Line 385: “The gradient boosting models captured ecologically consistent signals from environmental drivers…”
Ecologically consistent based on what? Also, what are the sequential R² values in reference to?
Line 418: “For the fraction of NPP allocated to wood, VPD emerged as the most important predictor, followed by annual mean temperature and solar radiation (Fig. 4b)”
This would perhaps benefit from being analysed together with, and in more depth than, the previous paragraphs, since in your current results wood turnover, in essence mortality, is best explained by solar radiation, while NPP allocation, and thus new growth, is driven most by VPD.
This is not to question the results; rather, their combination is a bit surprising, as together they determine the growth and decline of the forest.
Line 462: “Although both frameworks perform grid-level parameter retrieval, they differ critically in the observational constraints used for boreal ecosystems.“
Is CARDAMOM also calibrating the initial state? If not, then it is important to highlight this further difference in the process.
Discussion:
Line 490: “4. Discussion”
This was a challenging section for me to comment on, as there was, for me, so much vagueness and so many questionable comparisons in the Methodology/Results sections. Because of that, it is difficult to truly evaluate the claims and analysis.
I do not write this to be dismissive of this section, but rather to suggest that it should be gone through again after the revisions to the previous sections, to see whether it is still in line with them. I have also made some comments about the general structure that I feel would improve the manuscript.
Line 491: “A major uncertainty in terrestrial biosphere modeling arises from equifinality, where different parameter combinations yield similar fits to observational data (Bloom et al., 2016; Famiglietti et al., 2021).”
Apologies for the repetition, but this work has not really dealt with equifinality at all. Different data are used for calibration, and the resulting calibrations then perform differently against the validation dataset. That is fine, but it is distracting that the Discussion section starts with this statement.
Line 493: “In our baseline experiment, constrained solely by high-frequency observations (GPP, RECO, ET, and LAI), the model reproduced seasonal flux dynamics well but showed large biases in long-term biomass stocks.”
I would recommend reworking the whole first paragraph of the discussion as for some reason, the focus here is on the comparison experiments instead of the actual new research presented in this work.
Line 498: “In our baseline experiment…”
This is the second time in this paragraph that a sentence is started with these words.
Line 508: “To constrain these slower processes…”
Both this and the previous paragraph read more like a repetition of the Methods and Results than a Discussion. I would recommend starting this section by immediately connecting the central parts of this research to the larger context.
Line 532: “However, by optimizing parameters against the full accumulation trajectory, DifferLand retrieves effective mean parameters that robustly represent the long-term carbon balance of the ecosystem.”
But this is not a question of using DifferLand, as you used it in all the experiments. You are simply using more data, so would any calibration method not produce similar results?
Line 545: “Beyond forest biomass growth curves, additional long-term constraints could further reduce equifinality, particularly for slow soil carbon pools which were not directly constrained in this study.”
I would argue one cannot reduce equifinality, either it exists for the system or it doesn’t.
Line 569: “The increasing fire frequency (Magney & Pierrat, 2025) and drought-induced mortality (Peng et al., 2011) are already reducing carbon residence times.”
And harvests?
Line 576: “This implies that future ESM projections relying on fixed turnover rates may underestimate the risk of a rapid release of stored carbon as these bioclimatic barriers are removed.”
But since your approach produces fixed turnover rates based on the historical growth rate, how would it help with this issue?
Citation: https://doi.org/10.5194/egusphere-2026-2241-RC2
Model code and software
Supplementary data and code for "Constraining boreal carbon allocation and turnover by assimilating forest growth dynamics in a differentiable framework" Jincheng Wu https://zenodo.org/records/19135427
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 215 | 50 | 8 | 273 | 22 | 17 | 15 |
Reviewers: T. Luke Smallman, David T. Milodowski, Tim Green
The manuscript is a study using an intermediate complexity model of the terrestrial carbon cycle and constraining it at pixel-scale with a variety of data covering different components of the carbon cycle. The text is well written, with a good flow. However, we do have significant comments to make on this manuscript – and hope the authors recognise them as intended to be constructive.
The key conclusion appears to be that time series information on wood stocks are needed to constrain carbon allocation to, and then residence time of, the wood pool. However, this conclusion is not new. The need for time series information on wood stocks has been proposed and demonstrated at site level more than a decade ago with EO time series of woody biomass used in regional and global scale analyses (further details later). Possibly we have misunderstood and the novelty the specific method being used? In which case, the choice of using a constructed chronosequence is an interesting one but not justified in the existing text.
The authors’ goal is valuable, namely constraining the slow(er) carbon pools in order to understand current carbon cycling. This understanding would enhance estimates of future responses. We strongly support the use of data assimilation (DA) approaches, which provide local calibration, reflect realistic spatial variability in biodiversity / functional traits, and avoid simplistic plant functional type approaches.
Overall, the paper has significant weaknesses and needs to be revised. We have several significant concerns.
Our first concern relates to the novelty of the analysis presented. There is a significant body of published literature on DA for carbon cycle modelling that has been ignored by the current study. This literature includes recognition and demonstration of the importance of constraining the allocation of productivity and turnover rates for the long-lived carbon pools.
Fox et al. (2009) presented results from a series of data assimilation experiments using a version of DALEC and multiple DA algorithms, showing that observations of “fast” C-cycle processes (NEE, LAI) provide little constraint on the allocation to, and turnover of, the long-lived wood C pool, regardless of retrieval algorithm. Previously, Williams et al. (2005) identified the need for long-term measurements of woody biomass to improve constraints on turnover rates and residence times.
Smallman et al. (2017) presented results from a series of DA experiments using the CARDAMOM DA framework, clearly showing that assimilation of repeated woody biomass observations constrains decadal ecosystem carbon cycle uncertainty in aggrading forests. This study compared high information content analyses using local observations, including repeat observations of woody biomass over time, with retrievals using progressively sparser remotely sensed information (repeated, single, and no woody biomass observations). The results demonstrated how the inverse model representation of C cycling varies with different levels of assimilated biomass information. Assimilation of repeated biomass observations reduced uncertainty and/or bias in all ecosystem C pools, not just wood, compared to analyses using single or no stock information.
Considering larger spatial domains, Smallman et al. (2021) explicitly quantified the covariance structure between uncertainty in wood dynamics and fast / slow observables, also demonstrating clearly that fast processes are weakly connected to the turnover dynamics of wood and soil.
George-Chacon et al. (2023) assimilated biomass data collected across forest regrowth chronosequences to help constrain the calibration of wood C dynamics and thus improve confidence in model forecasts. A chronosequence approach was used here due to the complex fine spatial scale disturbance patterns inherent in the local management regime.
Moreover, the use of repeat biomass observations is now commonplace in DA studies for C cycle modelling (e.g. Ge et al., 2018; Bilir et al., 2025; Williams et al., 2025; Friedlingstein et al., 2025).
We do not dispute the finding that assimilating observations on biomass accumulation leads to improved constraints on woody turnover rates and residence times, which are otherwise poorly characterised by observations of fast processes. However, given the abundance of existing literature it is unclear what new knowledge we have gained from the experiments presented.
Secondly, the experimental design choices have not been adequately justified. As a result, we are not confident that the analysis is robust or makes the best use of the observations, though we recognise this manuscript is submitted to GMD. It would help to make clearer which element is the ‘new development’: the framework itself, the choice of data constraints, or both? The chosen cost function does not account for observation uncertainty in the calibration; neglecting uncertainty is the most serious concern regarding the robustness of the study. Likewise, the validation uses highly uncertain data, with potentially poor scale matching between the analysis and the data, and the implications of this are not discussed. The largest challenge for the validation is, again, that the respective uncertainties are neglected.
A key aspect of the methodology presented is the construction of chronosequences to provide information on long-term dynamics of the wood C pool. At present it is very difficult to evaluate the approach with the information provided. In general, there is a need for a much clearer description of the methods used to generate the local chronosequences, and of how model fit to the “observed” chronosequence is assessed in the cost function. There needs to be a clear assessment of the uncertainty associated with the methods, and a detailed interrogation of the underlying assumptions.
There are a number of challenges to constructing a chronosequence from 100 m data that is representative of each 0.5° pixel. Using recovery from detected fire disturbances seems challenging for many boreal forests, particularly in Europe, where the forests are intensively managed and fire activity is rare because fires are rapidly suppressed when they do occur. How does the suggested approach account for multi-modal disturbance and the impact of management on woody turnover and biomass stocks?
The approach of estimating "old-growth" biomass from the local 90th percentile does not allow for environmental gradients that can cause old-growth biomass to vary, and may therefore produce biased estimates when aggregated to the model pixel level. It is also unclear how such an approach applies in parts of the boreal domain where a large majority of forest stands are in managed systems.
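To make the percentile concern concrete, the following is a minimal sketch with entirely hypothetical numbers (nothing here is taken from the manuscript): a pixel containing two sub-regions on a productivity gradient, each with stands spanning the same age range but with different true old-growth biomass, and an assumed monomolecular growth curve. The helper names (`percentile`, `stand_biomass`) are illustrative only.

```python
import math

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def stand_biomass(bmax, ages, tau=40.0):
    """Hypothetical monomolecular growth curve B(t) = Bmax * (1 - exp(-t / tau))."""
    return [bmax * (1.0 - math.exp(-t / tau)) for t in ages]

# Hypothetical 0.5-degree pixel with two sub-regions on a productivity gradient.
# Both contain stands aged 0-149 years but differ in true old-growth biomass (Mg/ha).
ages = range(150)
low_prod = stand_biomass(80.0, ages)     # e.g. a colder, less productive sub-region
high_prod = stand_biomass(200.0, ages)   # e.g. a warmer, more productive sub-region

p90_low = percentile(low_prod, 90)
p90_high = percentile(high_prod, 90)
p90_pooled = percentile(low_prod + high_prod, 90)   # one value for the whole pixel

print(f"90th percentile, low-productivity stands:  {p90_low:.1f} Mg/ha")
print(f"90th percentile, high-productivity stands: {p90_high:.1f} Mg/ha")
print(f"90th percentile, pooled over the pixel:    {p90_pooled:.1f} Mg/ha")
```

In this toy case the pooled 90th percentile lies close to the productive sub-region's value and is more than twice the old-growth biomass of the less productive one, so a single pixel-level value misrepresents both.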
Relatedly, it would also be beneficial to understand how the information content of the constructed chronosequences compares with assimilating the time series of AGB data available from the full ESA-CCI dataset. This might be a more informative experiment than those provided, but it would depend on robust uncertainty propagation through the chronosequences into the analysis. Such a comparison is needed to justify why chronosequences are required over existing datasets, given the additional layers of assumptions and complexity needed to derive them.
The authors’ scale-appropriate corroboration is against Bloom et al. (2016), but its value is unclear. Firstly, for the global analysis presented by Bloom et al., there was no biomass observation constraint for the Boreal zone, as global aboveground biomass data at the time were limited to the tropics. The lack of observational constraint on woody biomass in the Bloom et al. (2016) analysis means that the parameters relevant to allocation and turnover carry high uncertainties, which propagate into estimates of residence time and other aspects of the C cycle. Secondly, this analysis is no longer the state-of-the-art representation of the C cycle produced with the CARDAMOM framework. There have been three more recent Boreal / Arctic focussed CARDAMOM studies, which include at least one (López-Blanco et al., 2019) and sometimes repeated (Hugelius et al., 2024; Varney et al., 2026, from TRENDY v13 and v14 respectively) biomass observations. CARDAMOM analyses have been contributed to the TRENDY model intercomparison across several years, using different combinations of repeat biomass constraints. These are all freely available to download (e.g. TRENDY v14, https://datashare.ed.ac.uk/handle/10283/9174) and would provide a much more meaningful comparison for your current analysis.
Finally, the comparison with Bloom et al., (2016) lacks a comparison of the uncertainties in either analysis. CARDAMOM produces ensemble analyses for the carbon cycle, with the spread of the ensemble for a given state variable characterising the uncertainty propagated through the calibration into the diagnostic analysis. When benchmarking, it is important to account for these uncertainties to understand whether differences between approaches are significant.
Detailed comments below follow a section-by-section approach.
Introduction
L44-45: “These uncertainties primarily stem from differences in how models represent key biogeochemical and biophysical processes…”. This statement directly contradicts the conclusions of Famiglietti et al. (2021), cited in the preceding sentence, which showed that parameter uncertainty was a dominant source of uncertainty; this is the underpinning argument for why predictive skill has not improved with added complexity. Likewise, Smallman et al. (2021) showed that parametric uncertainty alone across Brazil was sufficient to generate a forecast spread equal in size to Earth System Model ensembles. This section could be framed better around the contrasting narratives, e.g. the traditional land surface model community, which argues for increasing complexity, versus the data assimilation community, which stresses a need for process complexity to be linked to and balanced by observational constraints.
L62-64: Given this statement, isn’t your later result, that high-frequency observations are not able to inform slow pools, invalid / redundant? It has already been established, and you do not present a hypothesis as to why this might not be the case. See Fox et al. (2009), Williams et al. (2009) and Smallman et al. (2017), which made the same arguments earlier based on in-situ observations, and also Smallman et al. (2021), which explicitly quantified the covariance structure between uncertainty in wood dynamics and fast / slow observables, likewise showing that fast processes are weakly connected.
L69-71: The choice to create chronosequences of woody biomass from a single Earth Observation map of aboveground biomass has not been justified. Why not use the more than decade-long time series of woody biomass maps that now exist? The EO maps are characterised by very large pixel-level uncertainties. How were these uncertainties from the source dataset propagated into the chronosequence, and how are the uncertainties in the chronosequence characterised and propagated into the calibration? It would be good to see some examples of the chronosequence data and their calibrated curves / statistical uncertainties.
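One way the requested uncertainty propagation could be done is by Monte Carlo: perturb each chronosequence point within its stated error, refit the growth curve, and report the spread of the fitted parameters. The sketch below assumes a hypothetical monomolecular curve B(t) = Bmax(1 - exp(-kt)), independent Gaussian errors of 30 % per point, and synthetic data; none of these choices come from the manuscript, and the function names are illustrative.

```python
import math
import random
import statistics

def fit_growth_curve(ages, agb, sigma):
    """Weighted least-squares fit of the hypothetical curve B(t) = Bmax * (1 - exp(-k t)).

    k is found by grid search (0.005-0.200 per year); for each k the optimal
    Bmax has a closed form, so no external optimiser is needed.
    """
    w = [1.0 / s**2 for s in sigma]          # inverse-variance weights
    best = None
    for i in range(5, 201):
        k = i / 1000.0
        g = [1.0 - math.exp(-k * t) for t in ages]
        bmax = (sum(wi * gi * yi for wi, gi, yi in zip(w, g, agb))
                / sum(wi * gi * gi for wi, gi in zip(w, g)))
        cost = sum(wi * (yi - bmax * gi) ** 2 for wi, gi, yi in zip(w, g, agb))
        if best is None or cost < best[0]:
            best = (cost, bmax, k)
    return best[1], best[2]

def monte_carlo_fits(ages, agb, sigma, n_draws=100, seed=0):
    """Refit the curve to AGB values perturbed within their (assumed Gaussian) errors."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_draws):
        perturbed = [rng.gauss(y, s) for y, s in zip(agb, sigma)]
        fits.append(fit_growth_curve(ages, perturbed, sigma))
    return fits

# Hypothetical chronosequence: stand age (years since disturbance) vs mean AGB (Mg/ha),
# with an assumed 30 % per-point uncertainty standing in for EO map errors.
true_bmax, true_k = 150.0, 0.03
ages = list(range(5, 105, 5))
agb = [true_bmax * (1.0 - math.exp(-true_k * t)) for t in ages]
sigma = [0.3 * y for y in agb]

fits = monte_carlo_fits(ages, agb, sigma)
bmax_q = statistics.quantiles([b for b, _ in fits], n=20)   # 5 %, 10 %, ..., 95 %
k_q = statistics.quantiles([k for _, k in fits], n=20)
print(f"Bmax 5-95 %: {bmax_q[0]:.0f}-{bmax_q[-1]:.0f} Mg/ha")
print(f"k    5-95 %: {k_q[0]:.3f}-{k_q[-1]:.3f} per year")
```

The resulting parameter ranges, and their covariance, could then be carried into the calibration instead of a single best-fit curve.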
L83-85: It is unclear whether this statement is correct. Constraint on woody pools and their dynamics is valuable across all forests, but arguably the biggest concern for Boreal forests is the carbon stored in their soils, something you state later in L87.
L87-89: Unclear: are you arguing that the soils are largely static, so that the net C balance is driven by biomass change? I think this is disputable. For example, recent RECCAP2 papers focused on the permafrost domain (e.g., Hugelius et al., 2024) show soil heterotrophic respiration to be roughly half the size of photosynthesis, suggesting a large role for the soil.
L114-118: The carbon cycle core may be DALEC2 (Bloom & Williams, 2015), but I think you are missing the reference that includes the plant-available and plant-unavailable water concept (Yang et al., 2022).
Methods
The methods are insufficient to understand the analysis design and implementation. The analysis seems to neglect uncertainties in each dataset used in the calibration, validation and corroboration activities. This is especially concerning because the uncertainties associated with these different datasets will vary substantially. The cost function used (Normalised Nash-Sutcliffe Efficiency) penalises observations away from the mean of the observations. This is not compatible with seasonal or long-term datasets, as the seasonality or long-term trend in the chronosequence is informative. Relatedly, the construction of the chronosequences themselves is not described adequately for the reader to understand what assumptions have been made or their implications. Presumably biases towards the mean will penalise extremes? How do you include temporally static priors such as LCMA in this cost function? Similarly, how is the leaf economic spectrum implemented? Can you justify neglecting the observations’ uncertainties?
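To illustrate the point about observation uncertainty, the sketch below contrasts NNSE, taken here as 1 / (2 - NSE) (the usual normalisation; we assume this is the form used), with a Gaussian (chi-square) misfit in which each residual is scaled by its observation uncertainty. The data are hypothetical. NNSE returns the same value whether the observations are precise or highly uncertain, whereas the uncertainty-weighted cost changes by a factor of 100 when the errors grow tenfold. The optional `prior` argument shows how a temporally static constraint such as LCMA could enter such a cost as one extra term; this is one common choice, not a description of the authors' implementation.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST (SST = squared deviations of obs from their mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def nnse(obs, sim):
    """Normalised NSE in (0, 1]; note there is no observation-uncertainty term."""
    return 1.0 / (2.0 - nse(obs, sim))

def chi2_cost(obs, sim, sigma, prior=None):
    """Gaussian misfit with residuals scaled by observation uncertainty.

    `prior` is an optional (value, prior_estimate, prior_sigma) tuple, showing how a
    temporally static constraint (e.g. a hypothetical LCMA prior) adds one term.
    """
    cost = sum(((o - s) / e) ** 2 for o, s, e in zip(obs, sim, sigma))
    if prior is not None:
        x, x_prior, sigma_prior = prior
        cost += ((x - x_prior) / sigma_prior) ** 2
    return cost

obs = [10.0, 40.0, 70.0, 95.0, 110.0]   # hypothetical biomass along a growth curve
sim = [15.0, 38.0, 65.0, 100.0, 108.0]
tight = [2.0] * 5    # precise observations
loose = [20.0] * 5   # observations ten times more uncertain

print(nnse(obs, sim))                                   # the same whatever the errors are
print(chi2_cost(obs, sim, tight), chi2_cost(obs, sim, loose))
```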
L128: Throughout use CEDA acronym. Should be CDEA
L136: There are several CARDAMOM-DALEC analyses conducted using the TRENDY intercomparison forcings and contributed to the Global Carbon Budget papers. These outputs are in the public domain and would match your analysis more closely. See references section below for information.
L145-155: GPP and Reco come from different datasets; have you checked whether their implied NEE is realistic / consistent with independent estimates? This is important, as it seems you are neglecting the uncertainties. More questions / comments on uncertainty follow later.
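As a sketch of why this consistency check matters, the uncertainty of the implied NEE follows from standard first-order propagation for a difference, Var(Reco - GPP) = σ_GPP² + σ_Reco² - 2ρσ_GPPσ_Reco. The annual totals and uncertainties below are hypothetical, and the example assumes uncorrelated errors (ρ = 0) between the two products; the sign convention (negative NEE = net uptake) is also an assumption.

```python
import math

def implied_nee(gpp, reco, sigma_gpp, sigma_reco, rho=0.0):
    """NEE = Reco - GPP (negative = net uptake) with first-order error propagation.

    rho is the assumed correlation between the GPP and Reco errors; products from
    different datasets are treated here as independent (rho = 0) by default.
    """
    nee = reco - gpp
    var = sigma_gpp**2 + sigma_reco**2 - 2.0 * rho * sigma_gpp * sigma_reco
    return nee, math.sqrt(var)

# Hypothetical annual totals for one pixel (g C m-2 yr-1).
nee, sigma_nee = implied_nee(gpp=800.0, reco=750.0, sigma_gpp=120.0, sigma_reco=110.0)
print(f"implied NEE = {nee:.0f} +/- {sigma_nee:.0f} g C m-2 yr-1")
```

Because NEE is a small difference between two large fluxes, its propagated uncertainty in this example is larger than its magnitude.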
L156: Have you evaluated the LAI dataset for appropriate LAI seasonality across the evergreen forest areas? Most EO LAI datasets show an unrealistically strong seasonal cycle (i.e. implying a leaf mean residence time of < 1 year in areas where it should be 3-5 years) (Green et al., 2026).
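The link between LAI seasonality and leaf residence time can be made explicit with a simple bound. Assuming steady state across years, LAI proportional to leaf carbon (constant LCMA), and that every seasonal decline reflects real leaf loss, at least (max LAI - min LAI) must be shed each year, so the leaf mean residence time cannot exceed mean LAI / (max LAI - min LAI). The monthly series and the function name are hypothetical.

```python
def max_leaf_mrt(lai_series):
    """Upper bound on leaf mean residence time (years) implied by one year of LAI.

    Assumes steady state across years, LAI proportional to leaf carbon (constant
    LCMA), and that every seasonal decline is real leaf loss. Annual litterfall is
    then at least (max - min), so MRT = mean stock / annual loss <= mean / (max - min).
    """
    amplitude = max(lai_series) - min(lai_series)
    if amplitude == 0:
        return float("inf")   # no seasonal signal: this argument gives no upper bound
    return (sum(lai_series) / len(lai_series)) / amplitude

# Hypothetical monthly EO LAI for an evergreen needleleaf pixel (m2 m-2).
lai = [1.0, 1.0, 1.2, 1.8, 2.6, 3.2, 3.4, 3.2, 2.6, 1.8, 1.2, 1.0]
print(f"leaf MRT <= {max_leaf_mrt(lai):.2f} years")
```

A seasonal amplitude of 2.4 around a mean LAI of 2.0 caps the residence time at about 0.83 years, far below the 3-5 years expected for these forests.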
L162-163: A 50 Mg/ha value seems a very low threshold (i.e., ~25 Mg C/ha) for mature forests. Can you provide a reference to justify it? Can you also show examples of the chronosequences? Can you justify why you have not assimilated the actual EO biomass across the 20-year period?
L203-206: These independent data will carry their own, probably significant uncertainties. How are you accounting for these in your evaluation?
L214-218: See the earlier comment asking for justification of why a chronosequence was chosen over assimilating repeat biomass time series in the contemporary period.
L240-242, L252-254: A 5-member ensemble is not a robust characterisation of uncertainty. This uncertainty is likely to be a gross underestimate as it has had no observational uncertainty propagated into it. Finally, this uncertainty is not presented anywhere in the paper that I can see.
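Even in an idealised case, a 5-member ensemble gives a poorly determined spread. The sketch below assumes the members are independent draws from a Gaussian with unit standard deviation (ensemble members from optimiser restarts generally are not) and estimates, by simulation, the 5-95 % range of the sample standard deviation for 5 versus 100 members. The function name is illustrative.

```python
import random
import statistics

def sd_sampling_interval(n_members, n_reps=2000, seed=1):
    """5-95 % range of the sample SD from n_members draws of a unit-variance Gaussian.

    Idealised: members are independent draws from the true distribution (true SD = 1),
    which is generally not the case for optimiser restarts.
    """
    rng = random.Random(seed)
    sds = [statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n_members)])
           for _ in range(n_reps)]
    q = statistics.quantiles(sds, n=20)   # 19 cut points: 5 %, 10 %, ..., 95 %
    return q[0], q[-1]

for n in (5, 100):
    lo, hi = sd_sampling_interval(n)
    print(f"{n:>3} members: sample SD between {lo:.2f} and {hi:.2f} x true SD")
```

With 5 members the estimated spread can plausibly be anywhere from less than half to roughly one and a half times the true value, and this is before accounting for the missing observational uncertainty.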
L268-282: It is unclear why you need a pre-calibration if you are assimilating GPP and LAI information in the final contemporary period. Also, as the same datasets are used in both, are there consequences for the outputs?
L306-308: As wood supplies carbon to soil, this soil constraint imposes an indirect constraint on the wood also.
L307-311: Can you provide more information on how you imposed the leaf economic constraint within the EDCs?
L320: Does the 5-subsets refer to the 5-ensemble member or a resampling of the medians in space?
Results
How well do you fit to your assimilated time series and prior information? These statistics would provide valuable information on the success of the calibration.
Do your different calibrations imply differences in the net carbon balance over the contemporary period? i.e., does the information propagate away from wood alone to other pools in the C cycle?
You refer to changes in e.g. NPP allocation or residence times, or their statistical fits but I cannot see the actual values in the text. It would help to quantify these changes. This information should then be included in the discussion with comparisons to the existing literature.
At various points you express NPP allocation as fractions (i.e. 0-1) or percentages (0-100). This is confusing. In Figure 2 you refer to NPP fractions but present numbers in the range 0-100.
Foliage, wood and root residence times are not consistently presented. Sometimes as turnover fraction per year, sometimes as mean residence times in years. Be consistent across all to improve clarity.
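For consistency, turnover parameters can be converted to mean residence times (MRT) in years. Assuming first-order turnover, MRT = 1/k for a rate constant k in yr-1; when turnover is given as a fraction f lost per time step, the equivalent rate is k = -ln(1 - f) per step, which approaches f only for small f. The function names below are illustrative.

```python
import math

DAYS_PER_YEAR = 365.25

def mrt_from_rate(k_per_year):
    """Mean residence time (years) of a first-order pool with rate constant k (yr-1)."""
    return 1.0 / k_per_year

def mrt_from_annual_fraction(f_year):
    """MRT (years) when turnover is given as the fraction of the pool lost per year."""
    return -1.0 / math.log(1.0 - f_year)

def mrt_from_daily_fraction(f_day):
    """MRT (years) when turnover is given as the fraction of the pool lost per day."""
    return -1.0 / (DAYS_PER_YEAR * math.log(1.0 - f_day))

print(mrt_from_annual_fraction(0.02))   # close to, but not exactly, 1 / 0.02 = 50 years
print(mrt_from_daily_fraction(1.0e-4))  # roughly 27 years
```

For small fractions MRT ≈ 1/f, but the exact conversion diverges from that approximation as f grows.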
L311-333: Are these two sentences compatible? They do not appear to be.
L333-338: Can you show the actual EO data, example chronosequences and / or a domain-averaged chronosequence somewhere?
L342: A longer mean residence time?
L349: This is not “Notable”, it is the expected result. That observations on fast fluxes do not inform slow pools is an underpinning hypothesis noted in your introduction and evidenced elsewhere in existing literature. The lack of coupling has already been reported numerous times in the literature using DALEC (e.g., Fox et al., 2009, Williams et al., 2009, Smallman et al., 2017, Smallman et al., 2021).
L362-369: Quantify the reported numbers and their variability, not just the statistical differences. Furthermore, elsewhere you report R2 but here you report R (R2 equivalent 0.33). Be consistent to improve clarity.
Fig 2 (and most others). You need to show the actual estimates / evaluation data somewhere too.
L388: Leaf life span has a prior provided with it. How well was it fitted? Does the analysis simply return the prior, or has it added information?
L389: Wood and root dynamics. Not clear exactly what you mean here. The turnover or allocation parameters or both?
L406: “The fraction of DBF…” is this not just the inverse of the ENF fraction? You are simulating forest areas only, therefore should ENF and DBF not sum to 1?
L460-484: The authors corroborate / benchmark their analysis against a global reanalysis produced with CARDAMOM (Bloom et al., 2016). This is not an informative benchmark for several reasons outlined towards the start of this document.
Discussion
L512-513: Again, not “notable”, it is expected.
L525: Uncertainties have not been reported.
Supplementary materials
How can the Normalised Nash-Sutcliffe Efficiency be compatible with the growth curve assimilation? Surely this biases the fit towards the mean and penalises extremes? The same issue applies to all the assimilated data.
Fig S2. Example plot(s) showing an actual regrowth curve with the (considerable) uncertainties in the ESA CCI dataset should be given.
Fig S3. This plot proves my earlier point about the very large uncertainties inherent in ecological data and analyses. The uncertainty needs to be accounted for.
Fig S5. Why is there no equivalent plot for the slow datasets and / or the prior information? Could this information be presented in the text, if not as a map?
Fig S6. Add the actual values in at least one of the columns, e.g. the growth-curve constrained calibration.
Fig S7. This evaluation is of little value unless you consider the uncertainties in both the analysis and the observations – assuming they are available?
Fig S8. Why is residence time not consistently presented? Present all in years, consistent with the wider literature.
References
Bilir et al., (2025). Satellite‐constrained reanalysis reveals CO2 versus climate process compensation across the global land carbon sink. AGU Advances, 6, e2025AV001689. https://doi.org/10.1029/2025AV001689
Famiglietti et al., (2021). Optimal model complexity for terrestrial carbon cycle prediction. Biogeosciences, 18(8), 2727-2754. https://doi.org/10.5194/bg-18-2727-2021
Fox et al., (2009). The REFLEX project: Comparing different algorithms and implementations for the inversion of a terrestrial ecosystem model against eddy covariance data. Agricultural and Forest Meteorology, 149(10), 1597-1615. https://doi.org/10.1016/j.agrformet.2009.05.002
Friedlingstein et al., (2025). Global Carbon Budget 2024. Earth System Science Data, 17(3), 965–1039. https://doi.org/10.5194/essd-17-965-2025
Friedlingstein et al., (2026). Global Carbon Budget 2025. Earth System Science Data, 18(5), 3211-3288. https://doi.org/10.5194/essd-18-3211-2026
Ge et al., (2018). Underestimated ecosystem carbon turnover time and sequestration under the steady state assumption: a perspective from long‐term data assimilation. Global Change Biology. https://doi.org/10.1111/gcb.14547
Green et al., (2026). Spurious seasonality of Earth observation LAI across three northern evergreen needleleaf forests: Implications for analyses of the carbon cycle. EGUsphere, 2026, 1-32. https://doi.org/10.5194/egusphere-2026-2222
Hugelius et al., (2024). Permafrost Region Greenhouse Gas Budgets Suggest a Weak CO2 Sink and CH4 and N2O Sources, But Magnitudes Differ Between Top‐Down and Bottom‐Up Methods. Global Biogeochemical Cycles, 38(10), Article e2023GB007969. https://doi.org/10.1029/2023GB007969
López-Blanco et al., (2019). Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system. Earth System Dynamics, 10(2), 233-255. https://doi.org/10.5194/esd-10-233-2019
Smallman et al., (2017). Assimilation of repeated woody biomass observations constrains decadal ecosystem carbon cycle uncertainty in aggrading forests. Journal of Geophysical Research: Biogeosciences. https://doi.org/10.1002/2016JG003520
Smallman et al., (2021). Parameter uncertainty dominates C cycle forecast errors over most of Brazil for the 21st Century. Earth System Dynamics, 12, 1191–1237. https://doi.org/10.5194/esd-12-1191-2021
Varney et al., (2026). Northern high latitudes could become a net carbon source below 2°C global warming. EGUsphere [preprint]. https://doi.org/10.5194/egusphere-2025-6075
Williams et al., (2009). Improving land surface models with FLUXNET data. Biogeosciences, 6(7), 1341-1359. https://doi.org/10.5194/bg-6-1341-2009
Williams et al., (2025). Precipitation-fire-functional interactions control biomass stocks and carbon exchanges across the world’s largest savanna. Biogeosciences, 22(6), 1597–1614. https://doi.org/10.5194/bg-22-1597-2025
Yang et al., (2022). CARDAMOM-FluxVal version 1.0: a FLUXNET-based validation system for CARDAMOM carbon and water flux estimates. Geoscientific Model Development, 15, 1789–1802. https://doi.org/10.5194/gmd-15-1789-2022