Solving calibration and reanalysis challenges of ocean biogeochemical dynamics with neural schemes: a 1D vertical model case-study

Littaye, Jean; Memery, Laurent; Fablet, Ronan

doi:10.5194/egusphere-2025-6078

Preprints

https://doi.org/10.5194/egusphere-2025-6078

19 Jan 2026

Abstract. Numerous studies in climate and ocean sciences have highlighted the crucial role of ocean biogeochemical (BGC) models in studying and monitoring the global carbon cycle. Despite major advances due to both modelling and observation efforts, the quantification and reduction of the uncertainties in ocean BGC processes remain a key challenge. These difficulties arise primarily from the scarcity of observational datasets and the considerable uncertainties in ocean physics. Current ocean physics reanalyses still struggle to accurately represent the ocean’s complex dynamics, particularly at small scales, which play a critical role in driving biogeochemical cycles. Consequently, the performance of operational ocean data assimilation (DA) systems remains limited when applied to BGC dynamics, both for model calibration and reanalysis applications.

Here, we explore machine learning approaches to address these challenges. To this end, we develop an Observing System Simulation Experiment (OSSE) framework for 1D ocean BGC dynamics, designed for both training and benchmarking purposes. We rely on a differentiable implementation of a 1D Nitrate-Ammonium-Phytoplankton-Zooplankton-Detritus (NNPZD) ocean BGC model forced by solar irradiance and vertical mixing. The proposed OSSE incorporates location-dependent uncertainties in physical forcings and considers realistic configurations of in situ observing systems. Based on these OSSEs, we design numerical experiments addressing the calibration of model parameters, the reconstruction of 1D ocean BGC dynamics, and the reduction of the uncertainties in the physical forcings. For calibration and inversion, we investigate a model-based variational DA scheme, an end-to-end deep learning scheme and their hybrid combination. Our results demonstrate the potential of learning-based schemes to substantially reduce calibration uncertainties and improve physical forcing estimates. When coupled with a variational DA scheme, this approach yields enhanced reconstructions of ocean BGC dynamics. Sensitivity analyses with respect to forcing uncertainties and observing system configurations provide insights into how these findings could be extended to real-world ocean BGC modelling and monitoring.

Received: 15 Dec 2025 – Discussion started: 19 Jan 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-6078', Julien Brajard, 17 Mar 2026

The article presents a relevant contribution to state-parameter estimation for BGC models. I recommend publication after the comments below are addressed:
General comment:

My impression is that the comparison between 4DVar and UNET is unfair. 4DVar assumes perfect forcing and imperfect model dynamics, while in the experiments, the model is actually perfect and the forcing is not. On the other hand, UNET is able to correct for the forcing error. As a consequence, it is possible that 4DVar overcorrects parameters and state variables to compensate for the forcing error, which could explain the worse results. I acknowledge that the flexibility of UNET, which does not rely on strong assumptions, is an advantage, but it would be good to include at least one fair comparison, for example, a case with no forcing error (even if forcing uncertainty is central to the work). Another option would be to rewrite the 4DVar to explicitly account for forcing error (essentially replacing the model-error term with a forcing-error term). I understand that this may be too much work for a revised version, but at the very least, the differing assumptions between the algorithms should be more clearly highlighted before the discussion section. The inconvenience of the 4DVar algorithm is acknowledged in the discussion, but it remains difficult to understand the reason for the improvement: is it the optimisation scheme itself (4DVar vs UNET) or the fact that forcing uncertainty is accounted for?
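
As an illustration of the second option above, a 4DVar cost with the forcing error as an explicit control could be written as in the sketch below. The notation is an assumption and is not the manuscript's: $\delta U$ is a correction added to the available forcing $U^*$, $\theta_b$ and $x_b$ are background values, $H_k$ is the observation operator, and $B_\theta$, $B_x$, $Q_U$ and $R$ are error covariances.

```latex
J(\theta, x_0, \delta U) =
  \sum_{k=0}^{K} \left\| y_k - H_k(x_k) \right\|^2_{R^{-1}}
  + \left\| \theta - \theta_b \right\|^2_{B_\theta^{-1}}
  + \left\| x_0 - x_b \right\|^2_{B_x^{-1}}
  + \left\| \delta U \right\|^2_{Q_U^{-1}},
\qquad
x_{k+1} = M_\theta\!\left(x_k,\; U^*_k + \delta U_k\right).
```

In this form the model acts as a strong constraint and the forcing-error penalty takes the place of the model-error term; fixing $\delta U = 0$ recovers a formulation that treats $U^*$ as error-free.
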
Specific comments:
L5: The complexity of the biogeochemical processes themselves is also a major source of uncertainty.

L6: ocean data assimilation of the physics or the BGC observations?

L26–41: This introduction is very general. I would prefer a more direct presentation. One sentence about what BGC models do could be enough, followed immediately by L42.

L42–55: Same remark: this could be synthesised in a few lines, reminding the reader that observations are scarce, incomplete at the surface due to clouds, lack high-resolution physics information, and are very limited in the subsurface.

L7; L80: The term BGC dynamics is not very clear in this context. While BGC processes are dynamic, the reanalysis reconstructs the state, not the dynamics themselves.

L110: The G term -> The G factor?

Eq. 1 and Eq. 2: These are two time-dependent equations, but it is unclear how they are linked. Are NO3, NH4, etc. defined over the vertical?

Eq. 3: how is U linked to Kz?

L203: The variables have time and depth dimensions, but in the previous paragraph (L158), U had no depth dimension.

Eq. 5: The cost function is not introduced in the text and appears suddenly.

L247: It is difficult to understand that M^(i) integrates from time 0 to time i. This could be clarified. Perhaps start with M_theta and then introduce the composition of several M_theta.

L252: Equation is not numbered. There seems to be an inconsistency in the Delta notation. From line 238, Delta appears to be a time value, while in L251 it appears to be an index.

Eq. 7: The role of the factor tau_DA is unclear. In traditional DA, the weighting between the background term, the model‑error term, and the observation term is handled by the covariance matrices B and R.

L268: Why is this needed since the cost is computed in the observation space?

L299: The phrasing "no regard to optimisation" seems too strong. You could say that it is not fully optimised, since the focus is on demonstrating the potential of the method.

Figure 5: I do not understand the metric for the parameter error. In section 3.4, it is said that the Normalized Mean Square Error is used (a positive value), but I see negative values in panel d.

L253: compared to a mean value of 0.99 and a minimum at 0.68: please specify that this refers to the UNET.

L359: This definition could be moved to section 3.4.

Section 4.2 and 4.3: Please remind the reader which algorithm is used.

L427: This third hybrid algorithm arrives suddenly. Why is it not introduced in the methodology section? The justification is unclear except for the improved a posteriori performance.

Figure 10: It is not clear how the standard deviation is computed. The caption says this is one sample at the beginning, and that there is an ensemble of 10 members at the end. Please explain this clearly, or recall in the main text how the ensemble is computed.

L450: It would be interesting to see improvement over non-observed variables (other variables or future variables), because in this case a cubic-spline interpolation might already give a reasonable result.

L491: I do not see why the forcing could not be added as a control term in the 4DVar loss.

Section 5.2
The problem of accounting for various observation errors is not discussed. One advantage of 4DVar is the ability to handle evolving observing settings with changing observation density and varying error characteristics. Is it possible to add this in the training strategy? How would the algorithms react if observation error changes at inference time?

BGC models and observing systems evolve constantly. The training approach requires generating a large set of simulations for training data. How adaptive is the method if the model or observing system changes? Would retraining be required for every evolution?
Section 5.3: Could you give more details on how the emulator fits in your framework? Would you have an emulator of the physics only? Could you comment on the relative computational cost of physics models versus BGC models?
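
The sign question raised above for Figure 5 can be checked with a small sketch. It assumes one common definition of the Normalized Mean Square Error (squared error normalized by the mean squared truth); the manuscript's exact normalization is not reproduced here, and the function names and numbers are hypothetical. Under any such definition the metric is non-negative, whereas a signed relative error can be negative.

```python
import numpy as np

def nmse(estimate, truth):
    """Mean squared error normalized by the mean squared truth (assumed definition)."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimate - truth) ** 2) / np.mean(truth ** 2))

def signed_relative_error(estimate, truth):
    """Mean of (estimate - truth) / truth; keeps the sign of the bias."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimate - truth) / truth))

# Hypothetical parameter values: every estimate is 10 % too low.
truth = np.array([1.0, 2.0, 4.0])
estimate = 0.9 * truth

print(nmse(estimate, truth))                   # non-negative by construction
print(signed_relative_error(estimate, truth))  # negative for an underestimate
```

If panel d of a figure shows negative values, the plotted quantity is more likely a signed error of this second kind than an NMSE.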

Citation: https://doi.org/10.5194/egusphere-2025-6078-RC1
- AC1: 'Reply on RC1', Jean Littaye, 27 May 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6078/egusphere-2025-6078-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-6078-AC1
RC2:
'Comment on egusphere-2025-6078', Deep S. Banerjee, 11 Apr 2026

General comments
This paper addresses calibration and reanalysis challenges in ocean biogeochemical dynamics using neural networks in a 1D vertical model framework. The preprint is generally well-motivated and the topic is timely, especially at the interface of data assimilation, ocean modelling, and biogeochemistry. I find the manuscript promising overall. However, I have a few scientific concerns, mainly regarding the consistency of the DA formulation, the interpretation of the “weak-constrained 4DVar” setup, the clarification of the closed-system versus nitrate-nudging argument, and whether the 2-year spin-up is sufficient to ensure a repeatable annual cycle and negligible year-to-year drift in the nitrogen inventory.

Specific comments:
(1) Section 3.1, page 9, lines 218–225: “the considered forcings are U* since exact forcings U are not available”. The problem is posed with noisy forcings, and the target operator is meant to co-estimate parameters, states, and corrected forcings. But in Section 3.2 (page 11, lines 265–275), the authors state that the DA treats the forcings as error-free (“..the considered forcings is assumed to be error-free..”) and that forcing uncertainty is pushed into the observation-error covariance. This is a major conceptual mismatch, as the whole point is calibration under uncertain physical forcing. The authors should clarify this.
(2) Section 3.2, page 10, lines 233–240: “This weakly constrained 4Dvar scheme (Fablet et al., 2021b; Frerix et al., 2021; Trémolet, 2007) seeks to identify an optimal set of BGC parameters..” and “This error is handled by dividing the studied time series into sub-windows..”: The authors implemented a “weak-constraint 4DVar” via sub-window consistency or a model error penalty (tau_DA = 10^-3), which is not a standard weak-constraint 4DVar in the true sense, with explicit model-error or forcing-error control. This approach is acceptable, but the authors need to justify whether they want to call it weak-constraint 4DVar in the “standard sense” or use a more appropriate name.
(3) Section 2.1, page 4, lines 100–104: “This Neumann condition guarantees a closed model with no outflows…”, whereas Section 2.1, page 5, lines 115–125 states: “the sinking of detritus leads to the depletion of nitrogen from the system over time… an additional term is introduced… .” The authors state that the Neumann conditions give a closed model with no outflows, but then say that detritus sinking depletes nitrogen from the system and that nitrate nudging is needed to compensate. The authors need to clarify this: is the system closed or not?
(4) Section 2.2, page 7, lines 143–144: “Each simulation includes a 2-year period of spin-up with a constant nitrogen concentration as initial condition.” The authors should explicitly demonstrate that the chosen spin-up period is sufficient. For example, Bianchi et al. (2023) performed a 650-year spin-up in a 1D nitrogen cycle model and reported that steady state was achieved only after about 100 years under their setup. This does not necessarily mean that the present study also requires such a long spin-up, as the model configuration is different. However, the authors should show whether the final annual cycle is repeatable and whether the vertically integrated nitrogen inventory exhibits negligible year-to-year drift by the end of spin-up. This is important because biogeochemical model skill can depend on spin-up duration and residual dependence on initial conditions, and equilibration times are sensitive to the chosen boundary and/or restoring formulation.
(5) Section 3.3, page 12, lines 305–310: The authors state that the input tensor has size (Nbatch*Nch*NT*NT) where NT=240, but the link between the 120-day analysis period, the 35 vertical levels and this 240*240 representation is not clear to me and needs more explanation.
(6) Section 2.3, page 7, lines 151–165: The authors represented uncertainty as spatial shifting of forcing, but in a real scenario, reanalysis error includes amplitude bias, timing bias, and vertical profile error due to mixing. The author should discuss these limitations properly, which will strengthen the paper.
(7) Section 4.4, paragraph 1: page 20, around lines 423–429: The authors introduced a hybrid method here, which, as a reader, appears quite late. I suggest the exact role of the hybrid relative to the two other methods should be stated earlier in the methods, especially what is and what is not re-estimated in the DA stage.
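
Comments (3) and (4) can be made concrete with a minimal sketch, under stated assumptions: an explicit finite-volume diffusion step with zero-flux (Neumann) boundaries, which conserves the vertically integrated inventory, and a helper that reports the relative year-to-year drift of that inventory. The grid (35 levels as quoted in comment (5), with an assumed 5 m spacing), the diffusivity, the time step, and all function names are hypothetical and are not taken from the manuscript's model.

```python
import numpy as np

def diffuse_no_flux(c, kz, dz, dt):
    """One explicit diffusion step with zero-flux top and bottom boundaries.

    c  : (nz,) concentrations in the grid cells
    kz : (nz-1,) diffusivity at the interior cell interfaces
    """
    interior_flux = -kz * np.diff(c) / dz                 # Fickian flux between cells
    flux = np.concatenate(([0.0], interior_flux, [0.0]))  # Neumann: nothing crosses the ends
    return c - dt * np.diff(flux) / dz

def inventory(c, dz):
    """Vertically integrated concentration."""
    return float(np.sum(c) * dz)

def relative_annual_drift(inventory_by_year):
    """Relative change of the inventory from each year to the next."""
    inv = np.asarray(inventory_by_year, dtype=float)
    return np.diff(inv) / inv[:-1]

# Hypothetical setup: 35 levels, 5 m spacing, 1-hour step, Kz = 1e-4 m^2/s.
nz, dz, dt = 35, 5.0, 3600.0
kz = np.full(nz - 1, 1e-4)
c0 = np.linspace(1.0, 10.0, nz)      # arbitrary initial profile

c = c0.copy()
for _ in range(24 * 365):            # one year of hourly steps
    c = diffuse_no_flux(c, kz, dz, dt)

print(inventory(c0, dz), inventory(c, dz))           # equal up to round-off
print(relative_annual_drift([100.0, 99.0, 98.01]))   # a 1 % loss per year
```

A sinking term with an open bottom boundary would add a non-zero flux at the last interface, in which case the inventory is no longer conserved by the transport alone; this is the distinction comment (3) asks the authors to spell out, and the drift helper is the kind of diagnostic comment (4) asks to be reported at the end of spin-up.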

Technical corrections:
(1) Table 1: lambda is 0.05 d^-1, whereas in the model description (Section 2.1, paragraph 2: page 5, around lines 120–123), lambda is set to 1 d^-1 during Feb and 0 otherwise. One of them is wrong and the other is correct, I suppose. Clarifications needed.
(2) Figure 5 caption: “leaning-based scheme” should be learning-based scheme.
(3) Fig. 7 caption: “30-day state sampling” appears twice, in both red and brown distributions.
(4) Section 3.2, page 11, line 273: “the considered forcings are assumed” should be “…forcings are assumed”.

Citation: https://doi.org/10.5194/egusphere-2025-6078-RC2
- AC2: 'Reply on RC2', Jean Littaye, 27 May 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6078/egusphere-2025-6078-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-6078-AC2

Viewed

Total article views: 2,206 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,128	968	110	2,206	85	95

Views and downloads (calculated since 19 Jan 2026)

Month	HTML	PDF	XML	Total
Jan 2026	430	245	50	725
Feb 2026	254	208	26	488
Mar 2026	311	354	22	687
Apr 2026	77	106	5	188
May 2026	19	35	4	58
Jun 2026	10	7	0	17
Jul 2026	27	13	3	43

Viewed (geographical distribution)

Total article views: 2,173 (including HTML, PDF, and XML). Thereof 2,173 with geography defined and 0 with unknown origin.

Latest update: 01 Aug 2026

Short summary

A realistic representation of ocean carbon exchanges through a biogeochemical (BGC) model depends heavily on its parameterisation. However, this calibration is often hindered by an inaccurate representation of small-scale ocean physical dynamics, which is common in physical reanalyses. Here, a novel learning-based method enables a robust estimation of BGC states and parameters, and correction of physical forcing, despite physical forcing uncertainties and sparse observations.

