Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).
Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)
Malik J. Jordan,Emily F. Klee,Peter E. Hamlington,Nicole S. Lovenduski,and Kyle E. Niemeyer
Abstract. Modeling biogeochemical processes in ocean fluid dynamics simulations is computationally expensive, necessitating efficient model reduction techniques. Large-scale biophysical simulations, such as high-resolution large-eddy simulations (LES) of the upper ocean, require significant computing resources to capture small-scale turbulent processes while also resolving the evolution of reactive biogeochemical tracers. However, the complexity of existing biogeochemical models, such as the Biogeochemical Flux Model (BFM) which resolves 56 state variables, leads to unfeasibly high computational costs when represented in detailed LES. To address this, we applied model reduction techniques from the field of combustion to systematically reduce the complexity of the BFM while maintaining high fidelity. Specifically, we developed a modified version of the Directed Relation Graph with Error Propagation method and applied it to a 50-state-variable BFM. By analyzing 24 reduction scenarios, we produced five reduced models containing between 1 and 36 state variables capable of accurately capturing trends in concentration of the target fields. The results demonstrate the effectiveness of this reduction approach in preserving key biogeochemical dynamics while significantly reducing model size and complexity, paving the way for more efficient high-resolution ocean biogeochemical simulations.
Received: 18 Jun 2025 – Discussion started: 05 Aug 2025
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
This manuscript addresses the important and computationally challenging problem of reducing the complexity of ocean biogeochemical models. The authors adapt a model reduction technique from combustion science (DRGEP) and apply it to the 50-state-variable Biogeochemical Flux Model (BFM), generating a suite of smaller models. The topic is timely and of significant interest to the ocean modeling community.
However, while the goal is laudable, the manuscript in its current form suffers from several fundamental scientific flaws in its methodology, interpretation of results, and the substantiation of its core claims. The proposed "modified DRGEP" method is presented without sufficient theoretical justification and appears ad hoc. Its application requires manual interventions that undermine its claim of being an "automatic" process. Furthermore, the study's conclusions are drawn from a very narrow set of environmental conditions, leading to overstated claims of generalizability. The results include spurious findings, such as a single-variable oxygen model, which stem from a misapplication of the reduction technique. Finally, the central motivation of the paper—improving computational efficiency—is never quantitatively demonstrated.
Due to these significant issues, the manuscript is not suitable for publication in its current state. A major revision is required to address the methodological weaknesses and to provide a more rigorous validation and analysis of the results.
Specific Comments
The original DRGEP method is rooted in the principles of chemical kinetics. This manuscript replaces the mechanistically-derived interaction coefficient with an empirical "error matrix". This approach is problematic because: (i) The method for calculating the modified direct interaction coefficient—by removing state variables one by one and normalizing by the maximum row-wise error—is not theoretically justified. Why is this specific normalization chosen? It could introduce biases, and its relationship to the propagation of error through a coupled, non-linear system is unclear. (ii) The authors should provide a much more robust defense of this methodological choice, perhaps by testing it on a simpler, known system to demonstrate its validity before applying it to a complex model like the BFM.
The authors report that they were required to manually retain ammonium, all phytoplankton chlorophyll constituents, and in one case nitrate, to prevent the model from failing. The justification is that these variables play roles "not completely captured using the interaction coefficients". This is a critical admission that the proposed method is flawed and fails on its own to identify indisputably essential components of the ecosystem. An objective and truly automatic method should not require such a priori expert knowledge to ensure a biologically sensible outcome.
The entire analysis is based on a single set of climatological forcing data from the BATS site. The conclusion that 14 state variables (including all mesozooplankton, silicate, and semi-refractory DOM) can be removed is therefore only valid for this specific location and condition, yet it is presented as a major finding of the paper. The removal of entire functional groups like diatoms and mesozooplankton makes the reduced models fundamentally unsuitable for application in many other oceanic regimes (e.g., upwelling systems, polar oceans) or for studying key ecosystem processes like trophic transfer and carbon export.
The Oxygen3 scenario produced a model with only a single state variable, oxygen, leading the authors to conclude that peak oxygen concentration is driven more by air-sea flux than biology. This conclusion is scientifically unsound. The BFM's own equations show that oxygen is directly consumed and produced by biological processes. The reduction algorithm has simply identified the largest term in the budget at the surface and erroneously concluded that all other coupled terms are unimportant. This result should be presented as a failure of the method under these conditions, not as a valid scientific insight.
The manuscript is motivated entirely by the need to reduce the high computational cost of biogeochemical models. However, the paper presents zero data to support the claim that the reduced models are more efficient. The authors must provide a quantitative analysis of the computational speed-up for each reduced model versus the full BFM50. Without this, the primary justification for the work is missing.
The reduction algorithm is guided by errors in average concentration, peak concentration, and time of peak. These metrics can easily mask significant errors in the model's dynamic behavior. They do not constrain process rates (e.g., NPP, grazing) or the duration of events (e.g., a bloom). A model could match a peak value but have completely unrealistic fluxes, making it unsuitable for most scientific applications. The authors should evaluate the reduced models against a wider set of process-based metrics.
The study's entire methodology is built upon the "modified direct interaction coefficient," which is derived from a "percent error" in the instantaneous rate of change, dCA/dt​​ (Eq. 6). This metric is fundamentally unstable for stiff, non-linear systems like biogeochemical models. For state variables that are near quasi-steady state, the denominator, ∣dCA/dt​​∣original​, will approach zero. This will cause the ratio to become numerically explosive, massively amplifying any minor numerical noise or transient fluctuation. The authors do not describe any form of regularization, time-averaging, or thresholding to handle this critical instability, which means the foundation of their graph reduction is unreliable.
After calculating the error matrix, the authors normalize each row by its maximum value to create coefficients bounded between [0,1] (Eq. 7). This step erases the absolute magnitude of the sensitivities. Consequently, a state variable that is highly sensitive to multiple perturbations becomes indistinguishable from a variable that is only weakly sensitive to all perturbations. Applying a single, global cutoff threshold, ϵ, to these normalized—and no longer comparable—values has no consistent physical or biogeochemical meaning, likely leading to the erroneous pruning or retention of state variables.
The tables summarizing the reduction scenarios (e.g., Tables 1-5) contain a column labeled "% Error". However, the values reported in this column are not percentages but are given in absolute physical units (e.g., mg Chl−a m−3, mmol N m−3). This is a significant inconsistency, not a minor typographical error. It creates a fundamental ambiguity in interpreting the results and undermines the credibility of the automated reduction algorithm, which is described as terminating when a "percent error" exceeds a user-specified limit.
Malik J. Jordan,Emily F. Klee,Peter E. Hamlington,Nicole S. Lovenduski,and Kyle E. Niemeyer
Data sets
Data, plotting scripts, and figures for "Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)"Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer https://doi.org/10.5281/zenodo.16624062
We developed a method to simplify complex ocean biogeochemical models so they can run faster in computer simulations without losing important details. By adapting techniques from combustion science, we created smaller versions of a large ocean model that still accurately represent key changes in ocean biology and chemistry. This work helps make detailed ocean simulations more efficient, supporting better understanding of ocean health and climate.
We developed a method to simplify complex ocean biogeochemical models so they can run faster in...
This manuscript addresses the important and computationally challenging problem of reducing the complexity of ocean biogeochemical models. The authors adapt a model reduction technique from combustion science (DRGEP) and apply it to the 50-state-variable Biogeochemical Flux Model (BFM), generating a suite of smaller models. The topic is timely and of significant interest to the ocean modeling community.
However, while the goal is laudable, the manuscript in its current form suffers from several fundamental scientific flaws in its methodology, interpretation of results, and the substantiation of its core claims. The proposed "modified DRGEP" method is presented without sufficient theoretical justification and appears ad hoc. Its application requires manual interventions that undermine its claim of being an "automatic" process. Furthermore, the study's conclusions are drawn from a very narrow set of environmental conditions, leading to overstated claims of generalizability. The results include spurious findings, such as a single-variable oxygen model, which stem from a misapplication of the reduction technique. Finally, the central motivation of the paper—improving computational efficiency—is never quantitatively demonstrated.
Due to these significant issues, the manuscript is not suitable for publication in its current state. A major revision is required to address the methodological weaknesses and to provide a more rigorous validation and analysis of the results.
Specific Comments
The original DRGEP method is rooted in the principles of chemical kinetics. This manuscript replaces the mechanistically-derived interaction coefficient with an empirical "error matrix". This approach is problematic because: (i) The method for calculating the modified direct interaction coefficient—by removing state variables one by one and normalizing by the maximum row-wise error—is not theoretically justified. Why is this specific normalization chosen? It could introduce biases, and its relationship to the propagation of error through a coupled, non-linear system is unclear. (ii) The authors should provide a much more robust defense of this methodological choice, perhaps by testing it on a simpler, known system to demonstrate its validity before applying it to a complex model like the BFM.
Â