Solving calibration and reanalysis challenges of ocean biogeochemical dynamics with neural schemes: a 1D vertical model case-study
Abstract. Numerous studies in climate and ocean sciences have highlighted the crucial role of ocean biogeochemical (BGC) models in studying and monitoring the global carbon cycle. Despite major advances due to both modelling and observation efforts, the quantification and reduction of the uncertainties in ocean BGC processes remain a key challenge. These difficulties arise primarily from the scarcity of observational datasets and the considerable uncertainties in ocean physics. Current ocean physics reanalyses still struggle to accurately represent the ocean’s complex dynamics, particularly at small scales, which play a critical role in driving biogeochemical cycles. Consequently, the performance of operational ocean data assimilation (DA) systems remains limited when applied to BGC dynamics, both for model calibration and reanalysis applications.
Here, we explore machine learning approaches to address these challenges. To this end, we develop an Observing System Simulation Experiment (OSSE) framework for 1D ocean BGC dynamics, designed for both training and benchmarking purposes. We rely on a differentiable programming code of a 1D Nitrate-Ammonium-Phytoplankton-Zooplankton-Detritus (NNPZD) ocean BGC model forced by solar irradiance and vertical mixing. The proposed OSSE incorporates location-dependent uncertainties in physical forcings and considers realistic configurations of in situ observing systems. Based on these OSSEs, we design numerical experiments addressing the calibration of model parameters, the reconstruction of 1D ocean BGC dynamics, and the reduction of the uncertainties in the physical forcings. For calibration and inversion, we investigate a model-based variational DA scheme, an end-to-end deep learning scheme, and their hybrid combination. Our results demonstrate the potential of learning-based schemes to substantially reduce calibration uncertainties and improve physical forcing estimates. When coupled with a variational DA scheme, this approach yields enhanced reconstructions of ocean BGC dynamics. Sensitivity analyses with respect to forcing uncertainties and observing system configurations provide insights into how these findings could be extended to real-world ocean BGC modelling and monitoring.
The article presents a relevant contribution to state-parameter estimation for BGC models. I recommend publication after the comments below are addressed:
General comment:
My impression is that the comparison between 4DVar and UNET is
unfair. 4DVar assumes perfect forcing and imperfect model dynamics,
while in the experiments, the model is actually perfect and the forcing
is not. On the other hand, UNET is able to correct for the forcing
error. As a consequence, it is possible that 4DVar overcorrects
parameters and state variables to compensate for the forcing error,
which could explain the worse results. I acknowledge that the
flexibility of UNET, which does not rely on strong assumptions, is an
advantage, but it would be good to include at least one fair
comparison, for example, a case with no forcing error (even if forcing
uncertainty is central to the work). Another option would be to
rewrite the 4DVar to explicitly account for forcing error (essentially
replacing the model‑error term with a forcing‑error term). I
understand that this may be too much work for a revised version, but
at the very least, the differing assumptions between the algorithms
should be more clearly highlighted before the discussion section. The
inconvenience of the 4DVar algorithm is acknowledged in the
discussion, but it remains difficult to understand the reason for the
improvement: is it the optimisation scheme itself (4DVar vs UNET) or
the fact that forcing uncertainty is accounted for?
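To make the suggested modification concrete, a forcing-error formulation of the 4DVar cost could look as follows (a hedged sketch with generic notation; the symbols $x_0$, $x_b$, $u$, $u_b$, $\mathcal{M}$, $H$, $y_i$ and the covariances $B$, $Q$, $R$ are not taken from the manuscript):

```latex
J(x_0, u) = (x_0 - x_b)^\top B^{-1} (x_0 - x_b)
          + (u - u_b)^\top Q^{-1} (u - u_b)
          + \sum_i \left( H x_i - y_i \right)^\top R^{-1} \left( H x_i - y_i \right),
\qquad x_i = \mathcal{M}(x_{i-1}, u).
```

Here the forcing $u$ replaces the usual model-error increments as the control variable, which would let 4DVar account for the forcing uncertainty explicitly.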
Specific comments:
L5: The complexity of the biogeochemical processes themselves is also a major source of uncertainty.
L6: does "ocean data assimilation" refer to assimilation of the physical observations or of the BGC observations?
L26–41: This introduction is very general. I would prefer a more direct presentation. One sentence about what BGC models do could be enough, followed immediately by L42.
L42–55: Same remark: this could be synthesised in a few lines, reminding the reader that observations are scarce, incomplete at the surface due to clouds, lack high-resolution physics information, and are very limited in the subsurface.
L7; L80: The term BGC dynamics is not very clear in this context. While BGC processes are dynamic, the reanalysis reconstructs the state, not the dynamics themselves.
L110: The G term -> The G factor?
Eq. 1 and Eq. 2: These are two time-dependent equations, but it is unclear how they are linked. Are NO3, NH4, etc. defined over the vertical?
Eq. 3: How is U linked to Kz?
L203: The variables have time and depth dimensions, but in the previous paragraph (L158), U had no depth dimension.
Eq. 5: The cost function is not introduced in the text and appears suddenly.
L247: It is difficult to understand that M^(i) integrates from time 0 to time i. This could be clarified. Perhaps start with M_theta and then introduce the composition of several M_theta.
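One possible way to phrase this (the notation below is assumed, not taken from the manuscript): first define the one-step model $M_\theta$, then introduce the $i$-step integration as a composition:

```latex
x^{(i)} = M_\theta\!\left(x^{(i-1)}\right),
\qquad
\mathcal{M}^{(i)} = \underbrace{M_\theta \circ \cdots \circ M_\theta}_{i \text{ times}},
\qquad
x^{(i)} = \mathcal{M}^{(i)}\!\left(x^{(0)}\right).
```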
L252: The equation is not numbered. There also seems to be an inconsistency in the Delta notation: from L238, Delta appears to be a time value, while in L251 it appears to be an index.
Eq. 7: The role of the factor tau_DA is unclear. In traditional DA, the weighting between the background term, the model‑error term, and the observation term is handled by the covariance matrices B and R.
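For reference, in a standard strong-constraint 4DVar the relative weighting of the terms is carried entirely by the covariance matrices (generic notation, not the manuscript's):

```latex
J(x_0) = \tfrac{1}{2} (x_0 - x_b)^\top B^{-1} (x_0 - x_b)
       + \tfrac{1}{2} \sum_i \left( H_i x_i - y_i \right)^\top R_i^{-1} \left( H_i x_i - y_i \right).
```

An additional scalar factor such as $\tau_{DA}$ would then be redundant unless it is meant to rescale one of these covariances; if so, this should be stated.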
L268: Why is this needed, since the cost is computed in the observation space?
L299: The phrasing "no regard to optimisation" seems too strong. You could say that it is not fully optimised, since the focus is on demonstrating the potential of the method.
Figure 5: I do not understand the metric for the parameter error. In section 3.4, it is said that the Normalized Mean Square Error is used (a positive value), but I see negative values in panel d.
L253: "compared to a mean value of 0.99 and a minimum at 0.68": please specify that this refers to the UNET.
L359: This definition could be moved to section 3.4.
Section 4.2 and 4.3: Please remind the reader which algorithm is used.
L427: This third hybrid algorithm arrives suddenly. Why is it not
introduced in the methodology section? The justification is unclear
except for the improved a posteriori performance.
Figure 10: It is not clear how the standard deviation is computed. The caption mentions a single sample at the beginning and an ensemble of 10 members at the end. Please explain this clearly, or recall in the main text how the ensemble is generated.
L450: It would be interesting to see improvement over non-observed variables (other variables or future variables), because in this case a cubic-spline interpolation might already give a reasonable result.
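To illustrate this point: for a well-observed variable, a naive spline baseline is only a few lines, so it is a natural reference against which to judge the reconstruction of the observed fields (a sketch on synthetic data; the profile, grid, and sampling below are illustrative and not from the manuscript):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic "truth": an exponentially decaying tracer profile over depth.
depths = np.linspace(0.0, 200.0, 201)        # fine vertical grid (m)
true_profile = np.exp(-depths / 50.0)        # illustrative tracer profile

# Sparse "observations": one level every 20 m, noise-free for simplicity.
obs_depths = depths[::20]
obs_values = true_profile[::20]

# Cubic-spline reconstruction of the full profile from the sparse levels.
spline = CubicSpline(obs_depths, obs_values)
reconstruction = spline(depths)

rmse = np.sqrt(np.mean((reconstruction - true_profile) ** 2))
```

On smooth, densely observed profiles such a baseline already achieves small errors, which is why the added value of the learned scheme is easier to demonstrate on non-observed variables or forecast lead times.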
L491: I do not see why the forcing could not be added as a control term in the 4DVar loss.
Section 5.2
The problem of accounting for varying observation errors is not discussed. One advantage of 4DVar is its ability to handle evolving observing-system settings, with changing observation density and varying error characteristics. Could this be incorporated into the training strategy? How would the algorithms react if the observation error changes at inference time?
BGC models and observing systems evolve constantly. The training approach requires generating a large set of simulations for training data. How adaptive is the method if the model or observing system changes? Would retraining be required for every evolution?
Section 5.3: Could you give more details on how the emulator fits in your framework? Would you have an emulator of the physics only? Could you comment on the relative computational cost of physics models versus BGC models?