This work is distributed under the Creative Commons Attribution 4.0 License.
Optimisation of the World Ocean Model of Biogeochemistry and Trophic-dynamics (WOMBAT) using surrogate machine learning methods
Abstract. The introduction of new processes in biogeochemical models brings new model parameters that must be set. Optimisation of the model parameters is crucial to ensure model performance based on process representation, rather than poor parameter values. However, for most biogeochemical models, standard optimisation techniques are not viable due to computational cost. Typically, (tens of) thousands of simulations are required to accurately estimate optimal parameter values of complex non-linear models. To overcome this persistent challenge, we apply surrogate machine learning methods to optimise the model parameters of a new version of the World Ocean Model of Biogeochemistry and Trophic dynamics (WOMBAT), which we call WOMBAT-lite. WOMBAT-lite has undergone numerous updates described herein with many new model parameters to prescribe. A computationally inexpensive surrogate machine learning model based on Gaussian Process Regression was trained on a set of 512 simulations with WOMBAT-lite. These simulations explored model fidelity to 8 observation-based target datasets by varying 26 uncertain parameters across their a priori ranges. The surrogate model, trained on these 512 simulations, facilitated a global sensitivity analysis to identify the most important parameters and facilitated Bayesian parameter optimisation. Our approach returned optimal posterior distributions of 13 important parameters that, when input to WOMBAT-lite, ensured excellent fidelity to the target datasets. This process improved the representation of chlorophyll-a concentrations, air-sea carbon dioxide fluxes and patterns of phytoplankton nutrient limitation. We present an optimal parameter set for use by the modelling community. Overall, we show that surrogate-based calibration can deliver optimal parameter values for the biogeochemical components of earth system models and can improve the simulation of key processes in the global carbon cycle.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-4026', Anonymous Referee #1, 17 Mar 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-4026/egusphere-2024-4026-RC1-supplement.pdf
RC2: 'Comment on egusphere-2024-4026', Anonymous Referee #2, 25 Mar 2025
The authors present a machine-learning-based approach that emulates the output of an NPZD-style global ocean biogeochemical model, or more specifically the model-data misfit as quantified by a cost function, in order to estimate model parameters. The approach is interesting and appears to work well in a relatively high-dimensional parameter space; however, some of the implementation details are not described well, and a key model shortcoming is not emphasized enough.
general comments
One aspect of the study that is not emphasized enough in the current version of the manuscript is the use of a single phytoplankton and a single zooplankton tracer/variable in a global ocean model. Phytoplankton growth rates and other key parameters vary greatly between different phytoplankton groups, for example those that dominate coastal ecosystems in comparison to phytoplankton more commonly found in the open ocean. Thus, in some way, this study aims to do the impossible: fit the parameters of a single phytoplankton variable to a global dataset which is based on a complex spatially and temporally varying phytoplankton population. Here, I am not arguing that such a study should not be conducted -- the use of a machine learning surrogate model is very interesting -- but many of the shortcomings of the model identified in the study may be due to the very simple plankton representation in the model. For example, the manuscript describes underestimates of net primary production and issues in bloom phenology, and provides possible reasons for these model shortcomings, but the use of a single phytoplankton variable is not among them. Yet, a single phytoplankton model is unlikely to provide good primary production and bloom phenology estimates globally.
One more example: the authors highlight the "difficulty in reproducing the seasonality of air-sea CO2 exchange in the Southern Ocean" (l. 654) as a chief issue of the model. There are many studies that emphasize the unique phytoplankton composition in the Southern Ocean and the role of Southern Ocean diatoms in modulating carbon export, for example "Influence of diatom diversity on the ocean biological carbon pump" (Tréguer et al., 2018, DOI: 10.1038/s41561-017-0028-x). Thus, it does not seem surprising that a single phytoplankton model, optimized on a global dataset, does not perform particularly well in the Southern Ocean.
A related aspect to the comments above is that model performance can easily be biased based on the observation locations and the amount of data from each location. That is, if more open-ocean than nearshore observations are included in the RMSE-based cost function used for parameter estimation, then the phytoplankton variable will be parameterized more like a general open-ocean species (likely adapted to low-nutrient conditions) whereas the parameter estimates may be quite different if mostly nearshore observations are used. In summary, I suggest emphasizing the drawbacks of simple biogeochemical models more. Yes, parameter estimation becomes easier with fewer plankton variables, but even optimized parameters cannot capture the complex plankton ecosystem on a global scale for simple NPZD-style models.
Building suitable and balanced cost functions for parameter estimation experiments is not easy and the authors include 8 data products in their cost function (Eq. 2), making it quite complex and interesting. Here modelers and readers like me might be interested in a bit more detail that can be obtained without additional model experiments: One question of interest is how "orthogonal" the different cost function parts are, i.e. which cost function parts/data products provide additional constraints on the parameters. Calculating the linear correlation of the cost function parts for the 512 simulations or visualizing them using a "scatter plot matrix" could be of interest and would show which cost function parts can be optimized together and which move the parameter estimates in different directions. Relatedly, it would be good to know if the optimal parameter values that were obtained perform well (have low values) for all cost function parts or if they perhaps do not fit certain datasets very well at all.
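For concreteness, this diagnostic needs no new model runs; a minimal sketch, assuming the eight cost-function parts have been saved for each of the 512 simulations (the file name and column labels below are hypothetical):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical array of shape (512, 8): one row per simulation, one
# column per cost-function part (data product) in Eq. 2.
parts = ["chl", "co2_flux", "npp", "lim_nutrient",
         "dcm_depth", "pco2", "poc_export", "fe_surface"]  # illustrative labels
costs = pd.DataFrame(np.load("cost_parts.npy"), columns=parts)

# Pairwise linear correlations: near-zero entries indicate "orthogonal"
# constraints; strongly negative entries flag parts that trade off and
# would pull the parameter estimates in different directions.
print(costs.corr(method="pearson").round(2))

# Scatter-plot matrix for a visual check of the same trade-offs.
pd.plotting.scatter_matrix(costs, figsize=(10, 10), alpha=0.5)
plt.savefig("cost_scatter_matrix.png", dpi=150)
```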
Although not made explicit, the simulations in this study appear to rely on a 9-year spin up (output from year 10 was used for model-data comparison). The authors state that "Continuing to run the model forward for 100 years post initialization showed some degradation in the performance" (l. 469). Parameter estimation relies on the model not drifting too much and on the parameter values having taken effect (i.e., there is no longer drift introduced by the change in parameter values). How did the authors ensure that the 9-year spin up was sufficient?
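A simple drift diagnostic would help here; below is one possibility, a minimal sketch assuming an annual-mean global metric was saved for each year of the run (file name hypothetical). A surface-based check like this says nothing about the deep ocean, which equilibrates over centuries, but it would at least show whether the perturbed parameters have stopped introducing drift in the fields entering the cost function.

```python
import numpy as np

# Hypothetical annual-mean global diagnostic (e.g. surface chlorophyll)
# for spin-up years 1-9 plus the analysis year 10; shape (10,).
annual_mean = np.load("global_mean_chl_annual.npy")

# Linear trend over the last five years, compared with the interannual
# variability: a trend small relative to the variability suggests the
# parameter perturbation has largely equilibrated in this field.
years = np.arange(len(annual_mean))
trend = np.polyfit(years[-5:], annual_mean[-5:], 1)[0]
iav = annual_mean[-5:].std()
print(f"trend = {trend:.3e} per year, interannual std = {iav:.3e}")
```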
One of the metrics used in the cost function is primary limiting nutrient data, but there is not much information about its use and how model-data misfit is quantified. The model seems to include carbon and iron, all other elemental quotas/ratios are considered to be fixed. Fig. 3g shows nitrogen, iron and phosphorus as limiting nutrients in the data, how are these compared to model output? How is the RMSE computed? Some more information is needed.
Several aspects of study design and implementation details are not described well and can lead to confusion. For example, the generation of 512 parameter samples is mentioned in several places independently: "We undertook 512 simulations that each sampled randomly from predefined ranges of 24 key parameters related to the ecosystem component of the model" (l. 189). Then, later: "Initially, a Quasi-Monte Carlo Sobol sequence is applied to generate 512 parameter samples using the Uncertainty Quantification Python Laboratory package" (l. 263). Here, it is not clear if the 512 parameter samples are the same ones mentioned before; these methods sections need to be connected better. Furthermore, the text mentions that 512 parameter samples are drawn initially and then a sample size of 512 was selected based on the sample size sensitivity experiments (l. 267). Was it sheer luck that the initial number of experiments was also the right number of experiments confirmed in later tests?
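As an aside for readers unfamiliar with Quasi-Monte Carlo designs, the construction is straightforward; the sketch below uses SciPy's qmc module rather than the Uncertainty Quantification Python Laboratory package used by the authors, and the bounds are placeholders rather than the paper's a priori ranges:

```python
import numpy as np
from scipy.stats import qmc

# 26 uncertain parameters; unit bounds stand in for the a priori ranges.
n_params = 26
lower = np.zeros(n_params)
upper = np.ones(n_params)

# Scrambled Sobol sequence: 2**9 = 512 points that fill the
# 26-dimensional unit hypercube far more evenly than random sampling.
sampler = qmc.Sobol(d=n_params, scramble=True, seed=0)
unit_samples = sampler.random_base2(m=9)          # shape (512, 26)
samples = qmc.scale(unit_samples, lower, upper)   # map to parameter ranges
```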
specific comments
L 12: "Optimisation of the model parameters is crucial to ensure model performance based on process representation, rather than poor parameter values.": This sentence is not very clear, please rephrase.
L 19: Isn't the size of the training dataset (512) somewhat in conflict with the previous claim that "(tens of) thousands of simulations are required to accurately estimate optimal parameter values" (l 14). Perhaps it would be useful to add some qualifiers that this large number of simulations would be required by naive approaches like grid search and for a certain number of estimated parameters.
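For illustration: even a coarse grid search with just three levels per parameter would require 3^13 ≈ 1.6 × 10^6 simulations for the 13 parameters ultimately optimised, and 3^26 ≈ 2.5 × 10^12 for all 26, so the contrast with the 512 training runs used here is worth spelling out.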
L 22: What are "optimal posterior distributions"? From a Bayesian technique, one would expect to obtain (samples from) the posterior distribution, or perhaps maximum a posteriori (MAP) point estimates. But there is no "optimal" posterior distribution.
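To illustrate the distinction: given samples from the posterior (e.g. produced by MCMC over the surrogate), one can report the samples themselves, summary statistics, or a MAP point estimate, but none of these is an "optimal" distribution. A minimal sketch, with a hypothetical sample file:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical posterior samples for a single parameter.
samples = np.load("posterior_samples_param1.npy")

# The posterior is summarised, not optimised: e.g. a credible interval
# and a MAP point estimate from a kernel density fit.
lo, hi = np.percentile(samples, [2.5, 97.5])
kde = gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 1000)
map_estimate = grid[np.argmax(kde(grid))]
print(f"95% CI = [{lo:.3f}, {hi:.3f}], MAP = {map_estimate:.3f}")
```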
L 66: "However, even if our understanding and observational network were complete, there exist many tuneable and potentially inter-dependent parameters that control many target outcomes...": True, but with complete understanding, we would know the relevant rates that these parameters aim to represent. We would not even need models if our understanding and observational network were complete. I would suggest rephrasing the issue and avoiding the fictional scenario of "complete" knowledge.
L 74: Does the "a priori ranges" imply that the prior distributions are uniform? As a first-time reader familiar with Bayesian techniques, I would have expected the term "prior estimates" here or a brief explanation of the "a priori ranges".
L 118: "We have developed a new ocean biogeochemical model called WOMBAT-lite": This statement is a bit confusing to the reader, as a previous sentence appears to suggest that WOMBAT-lite has been applied in previous studies: "Although WOMBAT-lite has few tracers and is computationally efficient, making it viable for high-resolution configurations (Kiss et al., 2020; Matear et al., 2015; Menviel and Spence, 2024; Oke et al., 2013)" (l 90). I would suggest clarifying.
L 118: Fig. 2 is referenced before Fig. 1.
L 119: It would be useful to many readers to add a brief description of Sobol sensitivity analyses or at least a reference.
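Briefly, for other readers: the first-order Sobol index of a parameter is the fraction of output variance explained by that parameter alone, while the total-effect index also includes its interactions with the other parameters. A toy sketch using the SALib package (not necessarily the authors' tool), with a two-parameter stand-in for the surrogate model:

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Toy two-parameter problem; the study's setup would list all 26
# parameters and their a priori ranges instead.
problem = {"num_vars": 2,
           "names": ["mu_max", "g_max"],
           "bounds": [[0.1, 2.0], [0.05, 1.0]]}

X = saltelli.sample(problem, 1024)      # Saltelli-extended Sobol design
Y = X[:, 0] * np.exp(-X[:, 1])          # stand-in for the surrogate output
Si = sobol.analyze(problem, Y)
print("first-order:", Si["S1"], "total-effect:", Si["ST"])
```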
L 145: How is cell size included in the model using just one phytoplankton tracer?
L 174: The connection between CbPM, VGPM and the data used for model evaluation needs a better explanation.
L 189: Were samples obtained "randomly" or via a more sophisticated sampling strategy like Latin hypercube sampling? How were the parameter ranges selected?
L 288: How does the max-min scaling work? Are max and min values taken from the 512 samples, and are outliers considered in the normalization (for example those in the depth of the DCM, Fig. S2 d)?
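To make the ambiguity concrete, here are two variants the authors might mean, sketched for a single target field (the file name is hypothetical); the second is less sensitive to outlier simulations such as those in the depth of the DCM:

```python
import numpy as np

# Hypothetical RMSE values for one target field across the 512
# simulations; shape (512,).
rmse = np.load("rmse_chl.npy")

# Plain max-min scaling over the ensemble: a single outlier simulation
# stretches the denominator and compresses all other scores toward zero.
nrmse = (rmse - rmse.min()) / (rmse.max() - rmse.min())

# An outlier-robust variant scales by an interpercentile range instead.
p5, p95 = np.percentile(rmse, [5, 95])
nrmse_robust = np.clip((rmse - p5) / (p95 - p5), 0.0, 1.0)
```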
Eq. 2: I would recommend placing the superscript into parentheses "(n)" to avoid confusing it for an exponent.
L 301: The "p(z)" here is not very helpful without providing additional context.
Fig. 4: It is unclear how the primary limiting nutrient is turned into a number and how the misfit is quantified.
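For instance, one plausible (purely illustrative) choice for such a categorical field is a mismatch fraction rather than an RMSE:

```python
import numpy as np

# Hypothetical categorical maps of the primary limiting nutrient on a
# common grid, e.g. 0 = nitrogen, 1 = iron, 2 = phosphorus, NaN = land.
model_lim = np.load("model_limiting_nutrient.npy")
obs_lim = np.load("obs_limiting_nutrient.npy")

# A categorical field has no natural RMSE; one option is the fraction
# of ocean cells where model and observations disagree (0 = perfect).
ocean = ~np.isnan(obs_lim)
mismatch = np.mean(model_lim[ocean] != obs_lim[ocean])
print(f"fraction of cells with the wrong limiting nutrient: {mismatch:.2f}")
```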
Fig. 5: What kind, if any, normalization was applied to the RMSE values here? Not using any would make the values dependent on the scale and units of the properties/target field shown. Using max-min (previously denoted as NRMSE) would make it dependent on the variability that can be introduced by any changes in the parameter values -- which might vary greatly between properties.
L 444: The units here would raise fewer eyebrows by changing the "m^6 (mmol C)^-2" to "(mmol C m^-3)^-2".
L 450: "The fact that our optimisation always chose the lowest values (near 0.01 day-1) suggests that the proportion of the community that is stressed is considerably lower than we assumed.": I would suggest being careful in generalizing from the "optimal" value of a single global phytoplankton tracer to properties of the full plankton community.
Citation: https://doi.org/10.5194/egusphere-2024-4026-RC2
Model code and software
WOMBAT-lite code Pearse J. Buchanan https://github.com/pearseb/WOMBAT_dev
Interactive computing environment
Analysis of sections 3.1 and 3.4 Pearse J. Buchanan https://github.com/pearseb/WOMBAT-lite_optimisation_analysis
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
196 | 49 | 6 | 251 | 15 | 6 | 5
Viewed (geographical distribution)
Country | Rank | Views | %
---|---|---|---|
United States of America | 1 | 85 | 30 |
Australia | 2 | 72 | 25 |
China | 3 | 16 | 5 |
United Kingdom | 4 | 14 | 5 |
Germany | 5 | 11 | 3 |