Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)

Jordan, Malik J.; Klee, Emily F.; Hamlington, Peter E.; Lovenduski, Nicole S.; Niemeyer, Kyle E.

doi:10.5194/egusphere-2025-2901

Preprints

https://doi.org/10.5194/egusphere-2025-2901

Preprints

05 Aug 2025

| 05 Aug 2025

Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)

Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer

Abstract. Modeling biogeochemical processes in ocean fluid dynamics simulations is computationally expensive, necessitating efficient model reduction techniques. Large-scale biophysical simulations, such as high-resolution large-eddy simulations (LES) of the upper ocean, require significant computing resources to capture small-scale turbulent processes while also resolving the evolution of reactive biogeochemical tracers. However, the complexity of existing biogeochemical models, such as the Biogeochemical Flux Model (BFM) which resolves 56 state variables, leads to unfeasibly high computational costs when represented in detailed LES. To address this, we applied model reduction techniques from the field of combustion to systematically reduce the complexity of the BFM while maintaining high fidelity. Specifically, we developed a modified version of the Directed Relation Graph with Error Propagation method and applied it to a 50-state-variable BFM. By analyzing 24 reduction scenarios, we produced five reduced models containing between 1 and 36 state variables capable of accurately capturing trends in concentration of the target fields. The results demonstrate the effectiveness of this reduction approach in preserving key biogeochemical dynamics while significantly reducing model size and complexity, paving the way for more efficient high-resolution ocean biogeochemical simulations.

Received: 18 Jun 2025 – Discussion started: 05 Aug 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer

Status: closed

RC1:
'Comment on egusphere-2025-2901', Anonymous Referee #1, 29 Aug 2025
This manuscript addresses the important and computationally challenging problem of reducing the complexity of ocean biogeochemical models. The authors adapt a model reduction technique from combustion science (DRGEP) and apply it to the 50-state-variable Biogeochemical Flux Model (BFM), generating a suite of smaller models. The topic is timely and of significant interest to the ocean modeling community.
However, while the goal is laudable, the manuscript in its current form suffers from several fundamental scientific flaws in its methodology, interpretation of results, and the substantiation of its core claims. The proposed "modified DRGEP" method is presented without sufficient theoretical justification and appears ad hoc. Its application requires manual interventions that undermine its claim of being an "automatic" process. Furthermore, the study's conclusions are drawn from a very narrow set of environmental conditions, leading to overstated claims of generalizability. The results include spurious findings, such as a single-variable oxygen model, which stem from a misapplication of the reduction technique. Finally, the central motivation of the paper—improving computational efficiency—is never quantitatively demonstrated.
Due to these significant issues, the manuscript is not suitable for publication in its current state. A major revision is required to address the methodological weaknesses and to provide a more rigorous validation and analysis of the results.
Specific Comments

The original DRGEP method is rooted in the principles of chemical kinetics. This manuscript replaces the mechanistically-derived interaction coefficient with an empirical "error matrix". This approach is problematic because: (i) The method for calculating the modified direct interaction coefficient—by removing state variables one by one and normalizing by the maximum row-wise error—is not theoretically justified. Why is this specific normalization chosen? It could introduce biases, and its relationship to the propagation of error through a coupled, non-linear system is unclear. (ii) The authors should provide a much more robust defense of this methodological choice, perhaps by testing it on a simpler, known system to demonstrate its validity before applying it to a complex model like the BFM.

The authors report that they were required to manually retain ammonium, all phytoplankton chlorophyll constituents, and in one case nitrate, to prevent the model from failing. The justification is that these variables play roles "not completely captured using the interaction coefficients". This is a critical admission that the proposed method is flawed and fails on its own to identify indisputably essential components of the ecosystem. An objective and truly automatic method should not require such a priori expert knowledge to ensure a biologically sensible outcome.

The entire analysis is based on a single set of climatological forcing data from the BATS site. The conclusion that 14 state variables (including all mesozooplankton, silicate, and semi-refractory DOM) can be removed is therefore only valid for this specific location and condition, yet it is presented as a major finding of the paper. The removal of entire functional groups like diatoms and mesozooplankton makes the reduced models fundamentally unsuitable for application in many other oceanic regimes (e.g., upwelling systems, polar oceans) or for studying key ecosystem processes like trophic transfer and carbon export.

The Oxygen3 scenario produced a model with only a single state variable, oxygen, leading the authors to conclude that peak oxygen concentration is driven more by air-sea flux than biology. This conclusion is scientifically unsound. The BFM's own equations show that oxygen is directly consumed and produced by biological processes. The reduction algorithm has simply identified the largest term in the budget at the surface and erroneously concluded that all other coupled terms are unimportant. This result should be presented as a failure of the method under these conditions, not as a valid scientific insight.

The manuscript is motivated entirely by the need to reduce the high computational cost of biogeochemical models. However, the paper presents zero data to support the claim that the reduced models are more efficient. The authors must provide a quantitative analysis of the computational speed-up for each reduced model versus the full BFM50. Without this, the primary justification for the work is missing.

The reduction algorithm is guided by errors in average concentration, peak concentration, and time of peak. These metrics can easily mask significant errors in the model's dynamic behavior. They do not constrain process rates (e.g., NPP, grazing) or the duration of events (e.g., a bloom). A model could match a peak value but have completely unrealistic fluxes, making it unsuitable for most scientific applications. The authors should evaluate the reduced models against a wider set of process-based metrics.

The study's entire methodology is built upon the "modified direct interaction coefficient," which is derived from a "percent error" in the instantaneous rate of change, dCA/dt (Eq. 6). This metric is fundamentally unstable for stiff, non-linear systems like biogeochemical models. For state variables that are near quasi-steady state, the denominator, ∣dCA/dt∣original, will approach zero. This will cause the ratio to become numerically explosive, massively amplifying any minor numerical noise or transient fluctuation. The authors do not describe any form of regularization, time-averaging, or thresholding to handle this critical instability, which means the foundation of their graph reduction is unreliable.

After calculating the error matrix, the authors normalize each row by its maximum value to create coefficients bounded between [0,1] (Eq. 7). This step erases the absolute magnitude of the sensitivities. Consequently, a state variable that is highly sensitive to multiple perturbations becomes indistinguishable from a variable that is only weakly sensitive to all perturbations. Applying a single, global cutoff threshold, ϵ, to these normalized—and no longer comparable—values has no consistent physical or biogeochemical meaning, likely leading to the erroneous pruning or retention of state variables.

The tables summarizing the reduction scenarios (e.g., Tables 1-5) contain a column labeled "% Error". However, the values reported in this column are not percentages but are given in absolute physical units (e.g., mg Chl−a m−3, mmol N m−3). This is a significant inconsistency, not a minor typographical error. It creates a fundamental ambiguity in interpreting the results and undermines the credibility of the automated reduction algorithm, which is described as terminating when a "percent error" exceeds a user-specified limit.
Citation: https://doi.org/10.5194/egusphere-2025-2901-RC1
- AC1: 'Reply on RC1', Kyle Niemeyer, 15 Nov 2025
  
  Thank you for your feedback. The edits we will make in response to your comments will strengthen the manuscript. We hope that our additions and revisions satisfactorily addressed these concerns, and clarified the goals of the manuscript. Please see attached our detailed responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2901-AC1
RC2:
'Comment on egusphere-2025-2901', Anonymous Referee #2, 07 Oct 2025

GENERAL COMMENTS
The manuscript titled "Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)" provides a comprehensive examination of simplifying the complexity of ocean biogeochemical models using the DRGEP method. The authors demonstrate how they reduced the 50-state-variable BFM model to a smaller set of variables while preserving essential system behavior. By testing various scenarios involving living organic matter, nutrients, oxygen, and carbon, the study successfully achieves model reduction without significant loss of information. The authors further suggest that this reduction approach could be applied to other biogeochemical models, potentially leading to substantial decreases in computational costs for highly complex systems.
While the manuscript effectively describes the methodology and results of the reduction process, several key issues should be addressed before publication. In particular, the study should more clearly articulate its primary research objective and emphasize the novel contributions of this work to strengthen its scientific impact.
MAJOR COMMENTS
1. The authors have effectively conducted a reduction of the ocean biogeochemical model, particularly for the Princeton Ocean Model coupled with the BFM biogeochemical module. In the abstract, the authors mention that large-eddy simulations (LES) require high computational resources. As I understand the manuscript, the reduction of variables aims to enable the implementation of a simplified biogeochemical model within LES that explicitly resolves turbulence. However, the manuscript lacks a clear statement of the main objective behind reducing the ocean biogeochemical variables. Is the goal to develop a 1D turbulence-resolving LES or to reduce model complexity in climate-scale Earth System Models? It would be helpful for the authors to clarify in the abstract what type of model development—LES or large-scale climate modeling—the variable reduction and computational efficiency are intended to support.
2. The authors have conducted tests using ocean biogeochemical datasets at the BATS site and compared the full version of the 1D BFM with its reduced versions across various biogeochemical variables. However, first and foremost, the full-version BFM simulation should be directly compared with the BATS observations. Even though the BFM development team may have demonstrated this in previous studies, a fundamental step in model validation is to evaluate the baseline biases between the observations and the full-variable model and understand the reduced-variable results based on these biases.
In addition, it is unclear what specific changes have been made between the full-variable configuration (BFM50) and the reduced-variable simulations (e.g., BFM23, BFM36). Therefore, I strongly recommend that the authors clearly present the differences between the full and reduced versions in Figures 5, 6, 7, and related supplementary figures. For example, in Section 4.1.2, the authors note oversaturation of nitrate and phosphate and underestimation of PON and DIC; however, these differences warrant closer examination and clearer comparison.
3. Although the reduction of BGC variables is well demonstrated using the BATS dataset, I am concerned that this reduction may lead to overfitting specific to the BATS conditions. This raises an important question regarding the sensitivity of the reduction method to physical environmental processes, including vertical diapycnal mixing, seasonal entrainment, and isopycnal mixing. While there are limited sites with long-term biogeochemical time-series, I recommend performing a similar analysis using the HOTS dataset. Alternatively, applying the reduced-variable model to other locations for validation would strengthen the robustness of the reduced BATS-based model and demonstrate its potential applicability for future model developments.
MINOR COMMENTS
20: Recommend to add the most recent ocean biogeochemical models such as MARBL, PISCES.

Citation: https://doi.org/10.5194/egusphere-2025-2901-RC2
- AC2: 'Reply on RC2', Kyle Niemeyer, 15 Nov 2025
  
  Thank you for your positive feedback, and helping us improve our communication around the objectives and contributions. Please see our detailed responses in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2901-AC2

Status: closed

RC1:
'Comment on egusphere-2025-2901', Anonymous Referee #1, 29 Aug 2025
This manuscript addresses the important and computationally challenging problem of reducing the complexity of ocean biogeochemical models. The authors adapt a model reduction technique from combustion science (DRGEP) and apply it to the 50-state-variable Biogeochemical Flux Model (BFM), generating a suite of smaller models. The topic is timely and of significant interest to the ocean modeling community.
However, while the goal is laudable, the manuscript in its current form suffers from several fundamental scientific flaws in its methodology, interpretation of results, and the substantiation of its core claims. The proposed "modified DRGEP" method is presented without sufficient theoretical justification and appears ad hoc. Its application requires manual interventions that undermine its claim of being an "automatic" process. Furthermore, the study's conclusions are drawn from a very narrow set of environmental conditions, leading to overstated claims of generalizability. The results include spurious findings, such as a single-variable oxygen model, which stem from a misapplication of the reduction technique. Finally, the central motivation of the paper—improving computational efficiency—is never quantitatively demonstrated.
Due to these significant issues, the manuscript is not suitable for publication in its current state. A major revision is required to address the methodological weaknesses and to provide a more rigorous validation and analysis of the results.
Specific Comments

The original DRGEP method is rooted in the principles of chemical kinetics. This manuscript replaces the mechanistically-derived interaction coefficient with an empirical "error matrix". This approach is problematic because: (i) The method for calculating the modified direct interaction coefficient—by removing state variables one by one and normalizing by the maximum row-wise error—is not theoretically justified. Why is this specific normalization chosen? It could introduce biases, and its relationship to the propagation of error through a coupled, non-linear system is unclear. (ii) The authors should provide a much more robust defense of this methodological choice, perhaps by testing it on a simpler, known system to demonstrate its validity before applying it to a complex model like the BFM.

The authors report that they were required to manually retain ammonium, all phytoplankton chlorophyll constituents, and in one case nitrate, to prevent the model from failing. The justification is that these variables play roles "not completely captured using the interaction coefficients". This is a critical admission that the proposed method is flawed and fails on its own to identify indisputably essential components of the ecosystem. An objective and truly automatic method should not require such a priori expert knowledge to ensure a biologically sensible outcome.

The entire analysis is based on a single set of climatological forcing data from the BATS site. The conclusion that 14 state variables (including all mesozooplankton, silicate, and semi-refractory DOM) can be removed is therefore only valid for this specific location and condition, yet it is presented as a major finding of the paper. The removal of entire functional groups like diatoms and mesozooplankton makes the reduced models fundamentally unsuitable for application in many other oceanic regimes (e.g., upwelling systems, polar oceans) or for studying key ecosystem processes like trophic transfer and carbon export.

The Oxygen3 scenario produced a model with only a single state variable, oxygen, leading the authors to conclude that peak oxygen concentration is driven more by air-sea flux than biology. This conclusion is scientifically unsound. The BFM's own equations show that oxygen is directly consumed and produced by biological processes. The reduction algorithm has simply identified the largest term in the budget at the surface and erroneously concluded that all other coupled terms are unimportant. This result should be presented as a failure of the method under these conditions, not as a valid scientific insight.

The manuscript is motivated entirely by the need to reduce the high computational cost of biogeochemical models. However, the paper presents zero data to support the claim that the reduced models are more efficient. The authors must provide a quantitative analysis of the computational speed-up for each reduced model versus the full BFM50. Without this, the primary justification for the work is missing.

The reduction algorithm is guided by errors in average concentration, peak concentration, and time of peak. These metrics can easily mask significant errors in the model's dynamic behavior. They do not constrain process rates (e.g., NPP, grazing) or the duration of events (e.g., a bloom). A model could match a peak value but have completely unrealistic fluxes, making it unsuitable for most scientific applications. The authors should evaluate the reduced models against a wider set of process-based metrics.

The study's entire methodology is built upon the "modified direct interaction coefficient," which is derived from a "percent error" in the instantaneous rate of change, dCA/dt (Eq. 6). This metric is fundamentally unstable for stiff, non-linear systems like biogeochemical models. For state variables that are near quasi-steady state, the denominator, ∣dCA/dt∣original, will approach zero. This will cause the ratio to become numerically explosive, massively amplifying any minor numerical noise or transient fluctuation. The authors do not describe any form of regularization, time-averaging, or thresholding to handle this critical instability, which means the foundation of their graph reduction is unreliable.

After calculating the error matrix, the authors normalize each row by its maximum value to create coefficients bounded between [0,1] (Eq. 7). This step erases the absolute magnitude of the sensitivities. Consequently, a state variable that is highly sensitive to multiple perturbations becomes indistinguishable from a variable that is only weakly sensitive to all perturbations. Applying a single, global cutoff threshold, ϵ, to these normalized—and no longer comparable—values has no consistent physical or biogeochemical meaning, likely leading to the erroneous pruning or retention of state variables.

The tables summarizing the reduction scenarios (e.g., Tables 1-5) contain a column labeled "% Error". However, the values reported in this column are not percentages but are given in absolute physical units (e.g., mg Chl−a m−3, mmol N m−3). This is a significant inconsistency, not a minor typographical error. It creates a fundamental ambiguity in interpreting the results and undermines the credibility of the automated reduction algorithm, which is described as terminating when a "percent error" exceeds a user-specified limit.
Citation: https://doi.org/10.5194/egusphere-2025-2901-RC1
- AC1: 'Reply on RC1', Kyle Niemeyer, 15 Nov 2025
  
  Thank you for your feedback. The edits we will make in response to your comments will strengthen the manuscript. We hope that our additions and revisions satisfactorily addressed these concerns, and clarified the goals of the manuscript. Please see attached our detailed responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2901-AC1
RC2:
'Comment on egusphere-2025-2901', Anonymous Referee #2, 07 Oct 2025

GENERAL COMMENTS
The manuscript titled "Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)" provides a comprehensive examination of simplifying the complexity of ocean biogeochemical models using the DRGEP method. The authors demonstrate how they reduced the 50-state-variable BFM model to a smaller set of variables while preserving essential system behavior. By testing various scenarios involving living organic matter, nutrients, oxygen, and carbon, the study successfully achieves model reduction without significant loss of information. The authors further suggest that this reduction approach could be applied to other biogeochemical models, potentially leading to substantial decreases in computational costs for highly complex systems.
While the manuscript effectively describes the methodology and results of the reduction process, several key issues should be addressed before publication. In particular, the study should more clearly articulate its primary research objective and emphasize the novel contributions of this work to strengthen its scientific impact.
MAJOR COMMENTS
1. The authors have effectively conducted a reduction of the ocean biogeochemical model, particularly for the Princeton Ocean Model coupled with the BFM biogeochemical module. In the abstract, the authors mention that large-eddy simulations (LES) require high computational resources. As I understand the manuscript, the reduction of variables aims to enable the implementation of a simplified biogeochemical model within LES that explicitly resolves turbulence. However, the manuscript lacks a clear statement of the main objective behind reducing the ocean biogeochemical variables. Is the goal to develop a 1D turbulence-resolving LES or to reduce model complexity in climate-scale Earth System Models? It would be helpful for the authors to clarify in the abstract what type of model development—LES or large-scale climate modeling—the variable reduction and computational efficiency are intended to support.
2. The authors have conducted tests using ocean biogeochemical datasets at the BATS site and compared the full version of the 1D BFM with its reduced versions across various biogeochemical variables. However, first and foremost, the full-version BFM simulation should be directly compared with the BATS observations. Even though the BFM development team may have demonstrated this in previous studies, a fundamental step in model validation is to evaluate the baseline biases between the observations and the full-variable model and understand the reduced-variable results based on these biases.
In addition, it is unclear what specific changes have been made between the full-variable configuration (BFM50) and the reduced-variable simulations (e.g., BFM23, BFM36). Therefore, I strongly recommend that the authors clearly present the differences between the full and reduced versions in Figures 5, 6, 7, and related supplementary figures. For example, in Section 4.1.2, the authors note oversaturation of nitrate and phosphate and underestimation of PON and DIC; however, these differences warrant closer examination and clearer comparison.
3. Although the reduction of BGC variables is well demonstrated using the BATS dataset, I am concerned that this reduction may lead to overfitting specific to the BATS conditions. This raises an important question regarding the sensitivity of the reduction method to physical environmental processes, including vertical diapycnal mixing, seasonal entrainment, and isopycnal mixing. While there are limited sites with long-term biogeochemical time-series, I recommend performing a similar analysis using the HOTS dataset. Alternatively, applying the reduced-variable model to other locations for validation would strengthen the robustness of the reduced BATS-based model and demonstrate its potential applicability for future model developments.
MINOR COMMENTS
20: Recommend to add the most recent ocean biogeochemical models such as MARBL, PISCES.

Citation: https://doi.org/10.5194/egusphere-2025-2901-RC2
- AC2: 'Reply on RC2', Kyle Niemeyer, 15 Nov 2025
  
  Thank you for your positive feedback, and helping us improve our communication around the objectives and contributions. Please see our detailed responses in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2901-AC2

Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer

Data sets

Data, plotting scripts, and figures for "Automatic reduction of ocean biogeochemical models: a case study with BFM (v5.3)" Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer https://doi.org/10.5281/zenodo.16624062

Model code and software

pyPOM1D-reducedBFM Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer https://doi.org/10.5281/zenodo.10914329

Malik J. Jordan, Emily F. Klee, Peter E. Hamlington, Nicole S. Lovenduski, and Kyle E. Niemeyer

Viewed

Total article views: 2,139 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,973	123	43	2,139	43	41

HTML: 1,973
PDF: 123
XML: 43
Total: 2,139
BibTeX: 43
EndNote: 41

Views and downloads (calculated since 05 Aug 2025)

Month	HTML	PDF	XML	Total
Aug 2025	465	20	10	495
Sep 2025	1,263	10	6	1,279
Oct 2025	90	11	3	104
Nov 2025	43	19	9	71
Dec 2025	31	32	6	69
Jan 2026	64	23	8	95
Feb 2026	17	8	1	26

Cumulative views and downloads (calculated since 05 Aug 2025)

Month	HTML	PDF	XML	Total
Aug 2025	465	20	10	495
Sep 2025	1,263	10	6	1,279
Oct 2025	90	11	3	104
Nov 2025	43	19	9	71
Dec 2025	31	32	6	69
Jan 2026	64	23	8	95
Feb 2026	17	8	1	26

Viewed (geographical distribution)

Total article views: 2,130 (including HTML, PDF, and XML) Thereof 2,130 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 28 Feb 2026

Short summary

We developed a method to simplify complex ocean biogeochemical models so they can run faster in computer simulations without losing important details. By adapting techniques from combustion science, we created smaller versions of a large ocean model that still accurately represent key changes in ocean biology and chemistry. This work helps make detailed ocean simulations more efficient, supporting better understanding of ocean health and climate.


Total:	0
HTML:	0
PDF:	0
XML:	0