the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Regularisation of the 4DEnVar Data Assimilation method for Calibration of Land Surface Models
Abstract. The four-dimensional ensemble variational (4DEnVar) data assimilation method is an attractive choice for land surface model calibration due to its ease of use, speed, and its circumvention of tangent linear model and adjoint calculations. However, in certain circumstances, this method may prove ineffective when implemented with a highly non-linear model, leading to out-of-range or non-physical posterior parameter values. To address this, the 4DEnVar cost function can be adapted through the introduction of a hyperparameter, which inflates the weight of the background term. In this study, we explore the anticipated challenges of applying 4DEnVar with in situ eddy-covariance flux measurements and present some explanations of the expected behaviour. We show that, when using a hyperparameter to regularise the optimisation, 4DEnVar is able to successfully calibrate two complex land surface models, JULES and ORCHIDEE, with comparable accuracy in its ability to produce model runs with an improved match to the observations. To our knowledge, this is the first study of its kind to compare parameter calibration of two different land surface models in the same experimental setting. In addition to the aforementioned benefits of using 4DEnVar, we show that the method also exhibits considerable versatility, not only with regard to the land surface model to which it is applied, but also in terms of parameter set selection and ensemble size.
- Preprint
(818 KB) - Metadata XML
- BibTeX
- EndNote
Status: open (extended)
-
RC1: 'Comment on egusphere-2026-132', Anonymous Referee #1, 29 Apr 2026
reply
-
AC1: 'Reply on RC1', Natalie Douglas, 14 May 2026
reply
We thank the referee for the comments given. Our replies to each comment are below:
- L. 74-76: This sounds like a result or conclusion. In the introduction, it should be rephrased as a research question to be addressed.
We have changed:
‘As we demonstrate in this study, when not set up with sufficient care, the method is capable of retrieving unlikely (out of an expected range) or even impossible (e.g. implausibly negative) parameter values, which can result in failed model runs using posterior parameters. This is due to a control variable transform of the original 4DVar cost function that it implements, affording the method limited restriction in its search for optimal and physically valid parameters (see section 2.2.3). In this article, we examine these behaviours and describe the key changes necessary to enable the assimilation of data that was not previously possible, giving special attention to the nuances of 4DEnVar’.To:
‘When not set up with sufficient care, the method is capable of retrieving unlikely (out of an expected range) or even impossible (e.g. implausibly negative) parameter values, which can result in failed model runs using posterior parameters. This behaviour arises from the control variable transform of the original 4DVar cost function that it implements, which poses limited constraints on the search for physically valid solutions (see section 2.2.3). This raises the question of under what conditions such behaviour occurs, and how it can be mitigated when applying 4DEnVar to complex land surface models. In this study, we therefore investigate these behaviours and explore the key modifications required to enable robust assimilation of real observations, with particular attention to the role of regularisation within the 4DEnVar framework- L. 76-78: Results should be interpreted in the Discussion section, not the Introduction. This sentence could be replaced with a research question that needs answering.
See above.
- L. 126 (Table 1): Replace "N/A" by "-" or by "Unitless". I am surprised that a key parameter such as rooting depth has not been included in the analysis. In what way does each model describe soil moisture stress?
We have replaced ‘N/A’ in Table 1 with ‘-‘. In JULES, rooting depth is not described as a single parameter, rather, root water uptake is governed by a prescribed root distribution profile across soil layers. Soil moisture stress is calculated as a function of soil moisture availability weighted by this root distribution. We chose to keep the parameter set small for JULES, focusing on photosynthetic parameter calibration in JULES, rather than hydrological parameter estimation, and so we did not include these aspects in the calibration. However, for ORCHIDEE, the hum_cst parameter speaks directly to root distribution and is included in the calibration. In ORCHIDEE, soil moisture stress is represented through its effects on stomatal conductance and photosynthesis, mediated by soil hydraulic properties and root water uptake.
We have changed ‘Soil hydraulic and thermal properties are determined from soil moisture and soil texture, with textures classified into 12 categories based on the USDA soil map…’ to ‘Soil hydraulic and thermal properties are determined from soil moisture, soil texture, and the vertical distribution of roots, with textures classified into 12 categories based on the USDA soil map…’ to aid clarity.
- L. 127 (Section 2.2): Does this method account for errors in the atmospheric variables that the models use as input?
Errors in the atmospheric variables are not explicitly represented in the 4DEnVar method, but they are implicitly included in the model-observation error term represented by the R-matrix. We have changed ‘...and a fixed diagonal ${\bf R}$-matrix with standard deviation of 0.35 (gC m$^{-2}$ d$^{-1}$) for both LSMs representing the joint model-observation error.’ to ‘...and a fixed diagonal ${\bf R}$-matrix with standard deviation of 0.35 (gC m$^{-2}$ d$^{-1}$) for both LSMs representing the joint model-observation error (including uncertainties in the atmospheric driving variables).’- L. 130: "observations given the target" is unclear. Do you mean "given the value of the target"?
Yes, we intended to mean “given the value of the target variables”. We have changed ‘given the target’ to ‘given the value of the target variables’.
- L. 260: The reason why this Fluxnet site has been chosen over another should be explained. Are soil moisture deficits observed at this site?
We have changed ‘. Both models are run offline at the Harvard Forest site (FLUXNET2015 Dataset, 2015) using the meteorological forcing data provided by the FluxNet network for this site (Pastorello et al., 2020). For both models, we assimilate GPP observations derived from Net Ecosystem Exchange eddy covariance tower measurements.’ to ‘. Both models are run offline at the Harvard Forest site (FLUXNET2015 Dataset, 2015) using the meteorological forcing data provided by the FluxNet network for this site (Pastorello et al., 2020). The Harvard Forest site was selected because it provides a long, continuous, and well-characterised eddy-covariance and meteorological record and experiences seasonal soil moisture limitations during summer periods, making it suitable for both photosynthetic and hydrological parameter calibration. For both models, we assimilate GPP estimates derived from eddy-covariance Net Ecosystem Exchange (NEE) measurements using standard FLUXNET partitioning methods that separate ecosystem respiration and gross primary productivity.’ in order to address this comment and the next one.
- L. 262: 'GPP observations' are used. It should be made clear that GPP is not directly observed. I suppose the GPP values you use are derived from NEE observations and an ecosystem respiration model?
See previous comment.
- L. 327 (Fig. 5): RMSD values should have units. Year 2010 is interesting because both calibrated models are wrong. Why is there such a large delay in peak GPP for the two models?
We have added units to the RMSD values in Figure 5 which shows the validation period. Both models were calibrated against years where peak GPP occurred in July, whereas the 2010 observations exhibit an earlier peak. This discrepancy likely reflects interannual variability in phenology or environmental conditions that is not fully represented in the calibrated models or forcing data. We have changed ‘Notably, the 2010 observations exhibit a monthly maximum GPP that is offset by one month compared to the simulations from both models. In this case, the models have been calibrated with observations where GPP peaks are happening in July. In 2010, however, whether down to observational error, changes in meteorological forcing data or a genuine early peak in photosynthesis, the models are unable to reach this peak seen in observations when run with the optimised parameter values’ to ‘Notably, the 2010 observations exhibit a monthly maximum GPP that is offset by one month compared to the posterior simulations from both models. In this case, the models have been calibrated against observations where peak GPP occurred in July and the optimised parameter sets therefore favour this seasonal timing. The mismatch in 2010 likely reflects interannual variability in environmental or phenological conditions that is not fully captured by the calibrated models or forcing data, leading both models to delay the onset and peak of photosynthetic activity for this year.
- L. 386 (Fig. A1): Figure A1 is not mentioned in the text. The current caption of Figure A1 includes an interpretation. This interpretation should be incorporated into the main text. The reasons why the L-curve technique cannot be used to select gamma should be clearly explained.
We have now referred to Figure A1 in the text and have removed the interpretation from the Figure A1 caption. We have provided a more coherent explanation in the main text by changing ‘However, we note that h (xb) + Y′bw∗ serves as a linear approximation to h (x∗) and so these norms are calculated without the need for posterior
model runs. Consequently, any γ chosen this way does not translate well to the calculation of genuine posterior residual norms. As an example to illustrate this point, one can simply consider the case where γ = 1 that results in implausible parameter values - the residual norm can be calculated in w-space but it is not possible to calculate the true residual norm as the model can not be run on the corresponding parameters in x-space. A deterioration of the L-curve, when plotting the prior norm against the
residual norm using the linear approximation versus using the posterior model run, can be seen in Appendix A.’ to ‘However, h (xb) + Y′bw∗ only serves as a linear approximation to h (x∗) meaning that these residual norms are evaluated in the transformed control-variable space rather than from genuine posterior model simulations. Consequently, standard regularisation selection techniques such as the L-curve criterion can become misleading in the context of 4DEnVar. In particular, the approximate residual norms may suggest improved agreement with observations even when the corresponding posterior parameter values are physically implausible or produce degraded posterior model runs. Figure A1 demonstrates this behaviour for the JULES experiment: the L-curve constructed using the linear approximation differs substantially from that obtained using posterior model runs.
Citation: https://doi.org/10.5194/egusphere-2026-132-AC1 -
AC2: 'Edit to Reply on AC1', Natalie Douglas, 14 May 2026
reply
'rooting depth is not described as a single parameter, rather, root water uptake is governed..' should say 'rooting depth is described as a single parameter but root water uptake is governed...'
Citation: https://doi.org/10.5194/egusphere-2026-132-AC2
-
AC2: 'Edit to Reply on AC1', Natalie Douglas, 14 May 2026
reply
-
AC1: 'Reply on RC1', Natalie Douglas, 14 May 2026
reply
Model code and software
Code for processing JULES model runs Natalie Douglas https://doi.org/10.5281/zenodo.19001565
Code for ORCHIDEE model Simon Beylat https://doi.org/10.14768/c68bc728-da71-4383-84df-dcde31d9c006
Code for processing ORCHIDEE model runs Simon Beylat https://doi.org/10.5281/zenodo.14609416
Viewed
| HTML | XML | Total | BibTeX | EndNote | |
|---|---|---|---|---|---|
| 486 | 151 | 41 | 678 | 54 | 56 |
- HTML: 486
- PDF: 151
- XML: 41
- Total: 678
- BibTeX: 54
- EndNote: 56
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
Review of
Regularisation of the 4DEnVar Data Assimilation method for Calibration of Land Surface Models.
by Douglas et al.
General comments:
This methodological paper focuses on improving a land surface model calibration data assimilation scheme. The authors attempt to enhance an existing method by incorporating a tunable parameter (gamma), which can be adjusted to optimise the efficiency of model parameter retrieval. The performance of the new method is evaluated using data from a Fluxnet forest site in the context of GPP (photosynthesis) observation assimilation. The method is applied to two different models: JULES and ORCHIDEE. Photosynthesis parameter values are retrieved for both models. In the case of ORCHIDEE, the retrieval also includes phenology and hydrology parameters. This well-written paper could be accepted after minor revisions.
Particular comment:
- L. 74-76: This sounds like a result or conclusion. In the introduction, it should be rephrased as a research question to be addressed.
- L. 76-78: Results should be interpreted in the Discussion section, not the Introduction. This sentence could be replaced with a research question that needs answering.
- L. 126 (Table 1): Replace "N/A" by "-" or by "Unitless". I am surprised that a key parameter such as rooting depth has not been included in the analysis. In what way does each model describe soil moisture stress?
- L. 127 (Section 2.2): Does this method account for errors in the atmospheric variables that the models use as input?
- L. 130: "observations given the target" is unclear. Do you mean "given the value of the target"?
- L. 260: The reason why this Fluxnet site has been chosen over another should be explained. Are soil moisture deficits observed at this site?
- L. 262: 'GPP observations' are used. It should be made clear that GPP is not directly observed. I suppose the GPP values you use are derived from NEE observations and an ecosystem respiration model?
- L. 327 (Fig. 5): RMSD values should have units. Year 2010 is interesting because both calibrated models are wrong. Why is there such a large delay in peak GPP for the two models?
- L. 386 (Fig. A1): Figure A1 is not mentioned in the text. The current caption of Figure A1 includes an interpretation. This interpretation should be incorporated into the main text. The reasons why the L-curve technique cannot be used to select gamma should be clearly explained.