the Creative Commons Attribution 4.0 License.
Solving calibration and reanalysis challenges of ocean biogeochemical dynamics with neural schemes: a 1D vertical model case-study
Abstract. Numerous studies in climate and ocean sciences have highlighted the crucial role of ocean biogeochemical (BGC) models in studying and monitoring the global carbon cycle. Despite major advances due to both modelling and observation efforts, the quantification and reduction of the uncertainties in ocean BGC processes remain a key challenge. These difficulties arise primarily from the scarcity of observational datasets and the considerable uncertainties in ocean physics. Current ocean physics reanalyses still struggle to accurately represent the ocean’s complex dynamics, particularly at small scales, which play a critical role in driving biogeochemical cycles. Consequently, the performance of operational ocean data assimilation (DA) systems remains limited when applied to BGC dynamics, both for model calibration and reanalysis applications.
Here, we explore machine learning approaches to address these challenges. To this end, we develop an Observing System Simulation Experiment (OSSE) framework for 1D ocean BGC dynamics, designed for both training and benchmarking purposes. We rely on a differentiable programming code of a 1D Nitrate-Ammonium-Phytoplankton-Zooplankton-Detritus (NNPZD) ocean BGC model forced by solar irradiance and vertical mixing. The proposed OSSE incorporates location-dependent uncertainties in physical forcings and considers realistic configurations of in situ observing systems. Based on these OSSEs, we design numerical experiments addressing the calibration of model parameters, the reconstruction of 1D ocean BGC dynamics, and the reduction of the uncertainties in the physical forcings. For calibration and inversion, we investigate a model-based variational DA scheme, an end-to-end deep learning scheme and their hybrid combination. Our results demonstrate the potential of learning-based schemes to substantially reduce calibration uncertainties and improve physical forcing estimates. When coupled with a variational DA scheme, this approach yields enhanced reconstructions of ocean BGC dynamics. Sensitivity analyses with respect to forcing uncertainties and observing system configurations provide insights into how these findings could be extended to real-world ocean BGC modelling and monitoring.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6078', Julien Brajard, 17 Mar 2026
- AC1: 'Reply on RC1', Jean Littaye, 27 May 2026
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6078/egusphere-2025-6078-AC1-supplement.pdf
- RC2: 'Comment on egusphere-2025-6078', Deep S. Banerjee, 11 Apr 2026
General comments
This paper addresses calibration and reanalysis challenges in ocean biogeochemical dynamics using neural networks in a 1D vertical model framework. The preprint is generally well-motivated and the topic is timely, especially at the interface of data assimilation, ocean modelling, and biogeochemistry. I find the manuscript promising overall. However, I have a few scientific concerns, mainly regarding the consistency of the DA formulation, the interpretation of the “weak-constrained 4DVar” setup, the apparent contradiction between the closed-system claim and the authors' nitrate-nudging argument, and whether the 2-year spin-up is sufficient to ensure a repeatable annual cycle and negligible year-to-year drift in the nitrogen inventory.
Specific comments:
(1) Section 3.1, page 9, lines 218-225: “the considered forcings are U* since exact forcings U are not available”. The problem is posed in terms of noisy forcings, and the target operator is meant to co-estimate parameters, states, and corrected forcings. But in Section 3.2 (page 11, lines 265-275), the authors state that the DA scheme treats the forcings as error-free (“..the considered forcings is assumed to be error-free..”) and that forcing uncertainty is pushed into the observation-error covariance. This is a major conceptual mismatch, as the whole point of the study is calibration under uncertain physical forcing. The authors should clarify this.
(2) Section 3.2, page 10, lines 233–240: “This weakly constrained 4Dvar scheme (Fablet et al., 2021b; Frerix et al., 2021; Trémolet, 2007) seeks to identify an optimal set of BGC parameters..” and “This error is handled by dividing the studied time series into sub-windows..”: The authors implemented a “weak-constraint 4DVar” via sub-window consistency or a model error penalty (tau_DA = 10^-3), which is not a standard weak-constraint 4DVar in the true sense, with explicit model-error or forcing-error control. This approach is acceptable, but the authors need to justify whether it should be called weak-constraint 4DVar in the “standard sense” or given a more appropriate name.
(3) Section 2.1, page 4, lines 100–104: “This Neumann condition guarantees a closed model with no outflows…”, whereas Section 2.1, page 5, lines 115–125 states: “the sinking of detritus leads to the depletion of nitrogen from the system over time… an additional term is introduced… .” The authors state that the Neumann condition gives a closed model with no outflows, but then say that detritus sinking depletes nitrogen from the system and that nitrate nudging is needed to compensate. The authors need to clarify this: is the system closed or not?
(4) Section 2.2, page 7, lines 143–144: “Each simulation includes a 2-year period of spin-up with a constant nitrogen concentration as initial condition.” The authors should explicitly demonstrate that the chosen spin-up period is sufficient. For example, Bianchi et al. (2023) performed a 650-year spin-up in a 1D nitrogen cycle model and reported that steady state was achieved only after about 100 years under their setup. This does not necessarily mean that the present study also requires such a long spin-up, as the model configuration is different. However, the authors should show whether the final annual cycle is repeatable and whether the vertically integrated nitrogen inventory exhibits negligible year-to-year drift by the end of spin-up. This is important because biogeochemical model skill can depend on spin-up duration and residual dependence on initial conditions, and equilibration times are sensitive to the chosen boundary and/or restoring formulation.
(5) Section 3.3, page 12, lines 305-310: The authors state that the input tensor has size (Nbatch*Nch*NT*NT) where NT=240, but the link between the 120-day analysis period, the 35 vertical levels and this 240*240 representation is not clear to me and needs more explanation.
(6) Section 2.3, page 7, lines 151–165: The authors represent uncertainty as a spatial shift of the forcing, but in a real scenario reanalysis error includes amplitude bias, timing bias, and vertical profile error due to mixing. The authors should discuss these limitations properly, which will strengthen the paper.
(7) Section 4.4, paragraph 1, page 20, around lines 423–429: The authors introduce a hybrid method here, which, from a reader's perspective, comes quite late. I suggest that the exact role of the hybrid relative to the two other methods be stated earlier in the methods, especially what is and what is not re-estimated in the DA stage.
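The spin-up check requested in comment (4) could be sketched along the following lines. This is a minimal Python illustration and not the authors' code: the helper names, the daily time step, the layer thicknesses and the synthetic series (a total-nitrogen field relaxing towards equilibrium with a seasonal cycle) are all assumptions made for the example. It computes the vertically integrated nitrogen inventory, its relative year-to-year drift, and the mismatch between the last two annual cycles.

```python
import math

def column_inventory(profile, dz):
    """Vertically integrate one concentration profile over layers of thickness dz."""
    return sum(c * h for c, h in zip(profile, dz))

def annual_means(series, steps_per_year):
    """Average a time series over each complete year."""
    n_years = len(series) // steps_per_year
    return [
        sum(series[y * steps_per_year:(y + 1) * steps_per_year]) / steps_per_year
        for y in range(n_years)
    ]

def relative_drift(means):
    """Relative change of the annual mean from one year to the next."""
    return [(b - a) / a for a, b in zip(means, means[1:])]

def cycle_rmsd(series, steps_per_year):
    """RMS difference between the last two annual cycles, relative to the last-year mean."""
    last = series[-steps_per_year:]
    prev = series[-2 * steps_per_year:-steps_per_year]
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(last, prev)) / steps_per_year)
    return rms / (sum(last) / steps_per_year)

# Synthetic stand-in for model output (NOT the NNPZD model): total nitrogen
# relaxing towards equilibrium with a one-year e-folding time, plus a seasonal cycle.
steps_per_year = 365            # assumed daily output
dz = [10.0, 10.0, 10.0]         # hypothetical layer thicknesses (m)
total_n = [
    [(1.0 + 0.5 * math.exp(-t / 365.0))
     * (1.0 + 0.1 * math.sin(2.0 * math.pi * t / 365.0))] * len(dz)
    for t in range(6 * steps_per_year)
]

inventory = [column_inventory(p, dz) for p in total_n]
drifts = relative_drift(annual_means(inventory, steps_per_year))
mismatch = cycle_rmsd(inventory, steps_per_year)
print("year-to-year relative drift:", drifts)
print("relative mismatch of last two cycles:", mismatch)
```

Applied to the actual spin-up output, a drift and a cycle mismatch that both fall below a stated tolerance by the end of year 2 would support the chosen spin-up length; the tolerance itself is a choice the authors would need to justify.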
Technical corrections:
(1) Table 1: lambda is 0.05 d^-1, whereas in the model description (Section 2.1, paragraph 2, page 5, around lines 120–123) lambda is set to 1 d^-1 during February and 0 otherwise. I suppose one of the two values is wrong. Clarification needed.
(2) Figure 5 caption: “leaning-based scheme” should be learning-based scheme.
(3) Fig. 7 caption: “30-day state sampling” appears twice, in both red and brown distributions.
(4) Section 3.2, page 11, line 273: “the considered forcings is assumed” should be “…forcings are assumed”.
Citation: https://doi.org/10.5194/egusphere-2025-6078-RC2
- AC2: 'Reply on RC2', Jean Littaye, 27 May 2026
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6078/egusphere-2025-6078-AC2-supplement.pdf
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,101 | 955 | 107 | 2,163 | 85 | 95 |
The article presents a relevant contribution to state-parameter estimation for BGC models. I recommend publication after the comments below have been addressed:
General comment:
My impression is that the comparison between 4DVar and UNET is unfair. 4DVar assumes perfect forcing and imperfect model dynamics, while in the experiments, the model is actually perfect and the forcing is not. On the other hand, UNET is able to correct for the forcing error. As a consequence, it is possible that 4DVar overcorrects parameters and state variables to compensate for the forcing error, which could explain the worse results. I acknowledge that the flexibility of UNET, which does not rely on strong assumptions, is an advantage, but it would be good to include at least one fair comparison, for example, a case with no forcing error (even if forcing uncertainty is central to the work). Another option would be to rewrite the 4DVar to explicitly account for forcing error (essentially replacing the model‑error term with a forcing‑error term). I understand that this may be too much work for a revised version, but at the very least, the differing assumptions between the algorithms should be more clearly highlighted before the discussion section. The inconvenience of the 4DVar algorithm is acknowledged in the discussion, but it remains difficult to understand the reason for the improvement: is it the optimisation scheme itself (4DVar vs UNET) or the fact that forcing uncertainty is accounted for?
Specific comments:
L5: The complexity of the biogeochemical processes themselves is also a major source of uncertainty.
L6: ocean data assimilation of the physics or the BGC observations?
L26–41: This introduction is very general. I would prefer a more direct presentation. One sentence about what BGC models do could be enough, followed immediately by L42.
L42–55: Same remark: this could be synthesised in a few lines, reminding the reader that observations are scarce, incomplete at the surface due to clouds, lack high-resolution physics information, and are very limited in the subsurface.
L7; L80: The term BGC dynamics is not very clear in this context. While BGC processes are dynamic, the reanalysis reconstructs the state, not the dynamics themselves.
L110: The G term -> The G factor?
Eq. 1 and Eq. 2: These are two time-dependent equations, but it is unclear how they are linked. Are NO3, NH4, etc. defined over the vertical?
Eq. 3: how is U linked to Kz?
L203: The variables have time and depth dimensions, but in the previous paragraph (L158), U had no depth dimension.
Eq. 5: The cost function is not introduced in the text and appears suddenly.
L247: It is difficult to understand that M^(i) integrates from time 0 to time i. This could be clarified. Perhaps start with M_theta and then introduce the composition of several M_theta.
L252: Equation is not numbered. There seems to be an inconsistency in the Delta notation. From line 238, Delta appears to be a time value, while in L251 it appears to be an index.
Eq. 7: The role of the factor tau_DA is unclear. In traditional DA, the weighting between the background term, the model‑error term, and the observation term is handled by the covariance matrices B and R.
L268: Why is this needed since the cost is computed in the observation space?
L299: The phrasing "no regard to optimisation" seems too strong. You could say that it is not fully optimised, since the focus is on demonstrating the potential of the method.
Figure 5: I do not understand the metric for the parameter error. In section 3.4, it is said that the Normalized Mean Square Error is used (a positive value), but I see negative values in panel d.
L253: compared to a mean value of 0.99 and a minimum at 0.68: please specify that this refers to the UNET.
L359: This definition could be moved to section 3.4.
Section 4.2 and 4.3: Please remind the reader which algorithm is used.
L427: This third hybrid algorithm arrives suddenly. Why is it not introduced in the methodology section? The justification is unclear except for the improved a posteriori performance.
Figure 10: It is not clear how the standard deviation is computed. The caption says this is one sample at the beginning, and that there is an ensemble of 10 members at the end. Please explain this clearly, or recall in the main text how the ensemble is computed.
L450: It would be interesting to see improvement over non-observed variables (other variables or future variables), because in this case a cubic-spline interpolation might already give a reasonable result.
L491: I do not see why the forcing could not be added as a control term in the 4DVar loss.
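To make the comments on Eq. 7 and L491 concrete, a generic formulation is sketched below. It is an illustration under stated assumptions and not the cost function of the manuscript: only M_theta, U*, B and R appear in the comments above, while x_b, theta_b, H_k, eta_k, delta U_k, Q_k, B_theta and B_U are notation introduced here. The first expression is the usual weak-constraint 4D-Var cost with a model-error control eta_k weighted by its covariance Q_k. The second replaces the model-error term with a forcing correction delta U_k added to the noisy forcing U*, which is the option suggested in the general comment.

```latex
% Weak-constraint 4D-Var with model-error control (generic textbook form)
J(x_0, \eta_{1:K}) =
    \tfrac{1}{2} (x_0 - x_b)^{\top} B^{-1} (x_0 - x_b)
  + \tfrac{1}{2} \sum_{k=0}^{K} \bigl(y_k - H_k(x_k)\bigr)^{\top} R_k^{-1} \bigl(y_k - H_k(x_k)\bigr)
  + \tfrac{1}{2} \sum_{k=1}^{K} \eta_k^{\top} Q_k^{-1} \eta_k ,
\qquad x_k = M_{\theta}(x_{k-1}, U^{*}_k) + \eta_k .

% Variant with the forcing correction as a control variable (perfect model)
J(x_0, \theta, \delta U_{1:K}) =
    \tfrac{1}{2} (x_0 - x_b)^{\top} B^{-1} (x_0 - x_b)
  + \tfrac{1}{2} (\theta - \theta_b)^{\top} B_{\theta}^{-1} (\theta - \theta_b)
  + \tfrac{1}{2} \sum_{k=0}^{K} \bigl(y_k - H_k(x_k)\bigr)^{\top} R_k^{-1} \bigl(y_k - H_k(x_k)\bigr)
  + \tfrac{1}{2} \sum_{k=1}^{K} \delta U_k^{\top} B_U^{-1} \, \delta U_k ,
\qquad x_k = M_{\theta}(x_{k-1}, U^{*}_k + \delta U_k) .
```

In both forms the relative weight of each term is set by the covariance matrices, so a scalar factor such as tau_DA would correspond to a particular (for instance diagonal and homogeneous) choice of Q_k or B_U.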
Section 5.2
The problem of accounting for various observation errors is not discussed. One advantage of 4DVar is the ability to handle evolving observing settings with changing observation density and varying error characteristics. Is it possible to add this in the training strategy? How would the algorithms react if observation error changes at inference time?
BGC models and observing systems evolve constantly. The training approach requires generating a large set of simulations for training data. How adaptive is the method if the model or observing system changes? Would retraining be required for every evolution?
Section 5.3: Could you give more details on how the emulator fits in your framework? Would you have an emulator of the physics only? Could you comment on the relative computational cost of physics models versus BGC models?