the Creative Commons Attribution 4.0 License.
Complexity in Biogeochemical Models: Consequences for the Biological Carbon Pump
Abstract. Ocean biogeochemical models underpin projections of future marine ecosystem change, including anticipated shifts in the biological carbon pump (BCP) and broader biogeochemical cycles. However, their outputs remain highly sensitive to model complexity and parameterisation choices. Here, we evaluate five configurations of the Pelagic Interaction Scheme for Carbon and Ecosystem Studies (PISCES) to quantify intramodel variability in net primary productivity (NPP), carbon export (Cexp), and export efficiency (e-ratio) over the 21st century under the high emissions RCP8.5 scenario. The tested PISCES configurations differed from the standard model through distinct modifications to phytoplankton growth processes, but are forced by identical physical variables, representing an ensemble opportunity. All configurations resolve NPP and Cexp within the range of remote-sensing variability. The more complex Quota-based configurations produce 15–21 (10–18) Pg C yr⁻¹ more NPP than the simpler Monod-quota models in the reference (future) period, but this increase, driven by elevated small phytoplankton biomass, does not enhance Cexp, yielding lower e-ratios (0.14–0.17) than in the Monod-quota configurations (~0.25). The introduction of a picophytoplankton functional type (PFT) emerges as one of the most influential parameterisation choices. It drives opposing future NPP responses between 30–60° N/S, an increase in the Monod-quota configurations versus a decline in the Quota-based ones, as well as contrasting latitudinal trends in Cexp within the same region. Other parameterisations, such as a low-iron scheme, an added diazotroph PFT, and explicit manganese cycling, exert more modest, regionally confined effects under high emissions scenarios, influencing NPP and Cexp primarily at biome scales rather than driving large-scale divergence in model behaviour.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-6505', John Dunne, 18 Jan 2026
-
AC1: 'Reply on RC1', Jonathan Rogerson, 29 Jan 2026
Firstly, we thank the reviewer, John Dunne, for their insightful and positive comments on the manuscript. Aside from several minor corrections to wording and figure captions, two main points are raised. The first concerns an exploration of intramodel variability in the spatial patterns of nitrogen and phosphorus, as well as their projected future changes. The second, more limited point, relates to a discussion of the degree of variability resolved in our model ensemble relative to CMIP ensembles more generally. We address these comments and suggestions in detail below.
Response:
54 - Yes, we agree that giving a date or year is more suitable than simply the word ‘present’, as it adds to the long-term readability of the work.
162 - For this comment I am not exactly sure how to respond. In our methodology, we state that all the PISCES configurations were forced with identical physical outputs from the IPSL-CM5A-LR climate model and that, beyond the historical run, we used the high emissions RCP8.5 scenario. So we are using the scenarios that were used within CMIP5.
192 - In lines 185 – 190, we give a brief description of the P5M configuration and state that: “…incorporating manganese (Mn), following Hawco et al. (2022), which included its role in limiting phytoplankton productivity in the Southern Ocean, where observations show Mn as either a primary or co-limiting micronutrient alongside Fe…”. So in brief, Mn and Zn addition impacts phytoplankton growth by imposing an additional nutrient limitation term, alongside the other micro- and macronutrients already represented.
Fig. (1) - Ah, thank you for pointing this out. In our supplementary we show the biome spatial map with all the regions labelled. I will amend the caption then to the following:
Figure 1: Model and remote-sensing (RMT) estimates of (a) NPP and (b) Cexp, integrated over each RECCAP2 biome (refer to Supp. S1). Black bars indicate ±1 standard deviation across the remote-sensing ensemble.
The biome map was placed in the supplementary material solely to limit the number of figures in the main manuscript.
Fig. (2) - Your suggestion of adding the remote sensing (RMT) means is something we considered and I, as well as my fellow co-authors, can definitely see the value in doing so. However, we opted for a per biome breakdown when comparing NPP and Cexp of PISCES and RMT to champion the fact that the modifications to some of the configurations were highly regionally specific: e.g. diazotrophs in the subtropical gyres in P6Z or Mn and Zn impacts in the Southern Ocean for P5M.
Pertaining to one of your major comments, we adopt your idea when looking at the N and P representations in the different configurations.
As for the caption and the time ranges, we do state in the methodology: “For this study, we conducted our analysis of the BCP using averaged model outputs over two time windows, the ‘reference’ (1986-2005) and ‘future’ (2091-2100)”. We rigidly stick to this nomenclature throughout the manuscript and thus implicitly have the date ranges accounted for in the various tables and plots. I am sure personal preferences vary, but we hope this justification suffices.
Major comment #1: N and P representations
Please refer to the figure showing the latitudinal upper-ocean N and P inventories integrated over the upper 100 m for the present period, as well as the corresponding relative future changes for the five PISCES configurations. We also include data from the World Ocean Atlas 2023 (WOA23) for comparison. As all models were forced with identical physical forcing, intramodel variability in the simulated N and P inventories arises primarily from differences in biogeochemical formulations and, potentially, model tuning.
Addressing the latter first, we are unable to directly quantify the degree to which model tuning contributes to the spread in the reference period for N and P inventories across the different configurations. The inclusion of additional biogeochemical processes or parameters in the various PISCES configurations inevitably requires some degree of tuning, with the implicit assumption that individual modelling teams calibrate their configurations to best represent the present-day (reference) state. While I do not have explicit documentation of the tuning procedures employed for each configuration, correspondence with the respective lead authors indicates that the various PISCES configurations were tuned to reasonably reproduce present-day conditions.
Differences in biogeochemical parameterisations are likely the main cause for intramodel variability in the N and P inventories. In the figure, N and P exhibit consistent spatial patterns across all configurations for the reference period (panels a, b), as well as broadly similar future trends (panels c, d). For the reference period, N and P inventories are systematically higher in the Monod-quota configurations than in the Quota-based configurations; however, all configurations closely reproduce both the spatial patterns and magnitudes of the WOA23 data. Differences in the absolute magnitudes of N and P inventories, particularly between the standard and Quota-based versions, likely reflect the flexible stoichiometry inherent to the quota models and, potentially, the larger contribution of small phytoplankton biomass with higher nutrient affinities. Together, these factors result in slightly lower standing inventories of N and P in the quota-based configurations, owing to more efficient nutrient uptake.
Despite small differences in the magnitudes of the reference-period and future N and P inventories, all configurations simulate similar future shifts, with N and P inventories decreasing by 11.78 ± 1.85% and 12.39 ± 1.28%, respectively. The small standard deviations indicate that variability in N and P is similarly resolved across configurations. Consequently, variability in Cexp is more likely driven by differences in phytoplankton assemblages and their interactions with zooplankton, as discussed in the manuscript.
The intention is to add the figure to our supplementary section and to mention within the discussion section that N/P variability is not likely a driving factor of Cexp variability in our study, referencing of course the figure.
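The summary statistic quoted in this reply (ensemble mean ± standard deviation of the relative future change) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and the inventory values are hypothetical placeholders, and the use of the sample standard deviation (n − 1) is an assumption, since the reply does not state which estimator was used.

```python
from statistics import mean, stdev


def relative_change_percent(reference, future):
    """Relative change from the reference to the future period, in percent."""
    return 100.0 * (future - reference) / reference


def ensemble_summary(reference_by_config, future_by_config):
    """Mean and sample standard deviation of the relative change across configurations."""
    changes = [
        relative_change_percent(reference_by_config[name], future_by_config[name])
        for name in reference_by_config
    ]
    return mean(changes), stdev(changes)


# Hypothetical inventories in arbitrary units, NOT values from the manuscript.
reference = {"PST": 100.0, "PSF": 100.0, "P5Z": 100.0, "P6Z": 100.0, "P5M": 100.0}
future = {"PST": 90.0, "PSF": 88.0, "P5Z": 87.0, "P6Z": 89.0, "P5M": 86.0}

ensemble_mean, ensemble_sd = ensemble_summary(reference, future)
print(f"{ensemble_mean:.2f} +/- {ensemble_sd:.2f} %")
```

A small standard deviation relative to the mean change, as reported for N and P, indicates that the configurations agree on the sign and approximate size of the shift.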
I hope this explanation is satisfactory and addresses all your comments on the matter.
Major comment #2: Variability in CMIP models vs our ensemble
I fully agree with your comment. While the magnitude of variability for relative change in NPP and Cexp is similar for our ensemble vs CMIP5, it does not suggest that our ensemble captures the diversity in model complexity present within CMIP. As you mention, in CMIP, the physical models differ along with the manner in which N and P are represented, as well as other nutrients and functional types. What is interesting nonetheless is that through only modifications to the growth processes controlling phytoplankton, we create a similar spread to CMIP, underscoring if nothing else the sensitivity of these parameterisations and processes within biogeochemical models.
502-503 - Thank you for the paper recommendation. We will nuance this phrase and add the appropriate reference. As I understand from your comment, reduced export is a consequence, not the driver.
RC2: 'Comment on egusphere-2025-6505', Shengwei Liu, 20 Mar 2026
Overall
This study aims to evaluate the effect of more detailed phytoplankton representation on primary production and organic carbon export with five configurations of the PISCES model under current climate and a relatively “extreme” emission scenario. The results suggest that the addition of picophytoplankton PFT leads to large global intramodel differences in future trends and latitudinal patterns, while the influence of other factors is relatively confined. I find this manuscript well structured, readable, and overall informative. This is useful for modellers to know how to make the trade-off on complexity when trying to improve BGC models for next-gen ESMs. But I have identified a few issues that require attention/clarification before publication.
Major
The time-window choice should be justified. The two time windows for the model outputs are “reference” (1986-2005) and “future” (2091-2100), while the time window for the satellite observations is “reference” (1998-2005). Comparing the temporal average over different window lengths of 20, 10, and 8 years seems to be questionable, as the physical ocean circulation can have strong variability on decadal time scales (e.g. ENSO, AMOC, …).
As the authors have claimed for Figure 1, “all configurations produce for the reference period NPP and Cexp magnitudes that fall within the broad range of remote-sensing estimates, making it difficult to assess whether added complexity unequivocally improves model realism.”
- It is unclear whether the model–observation comparison in Figure 1 uses model output averaged over 1998–2005 or the broader 1986–2005 reference period. If the latter, then the time-window choice problem I mentioned above is also relevant here.
- It can be inferred from Table 2, Figures S2 and S3, or L 547-549 that there is large variability among the results of different NPP and Cexp algorithms. Is it possible to judge a priori which of them may be “structurally” biased and thus ruled out? The equations listed in Table S1 seem to be empirical. Do they still apply under future climate?
- The impact of the Quota-based configurations in the reference-state may be undervalued. Table 2 shows that P5Z, P6Z, and P5M fall within the ±1σ range of the remote-sensing ensemble for global integrated NPP, whereas PST and PSF fall below it. However, Fig. S7 indicates that this does not translate into a clear improvement in spatial skill, as the Quota-based models—especially P5Z and P6Z—appear to have excessive spatial variance and only modest spatial correlation relative to the remote-sensing ensemble mean. The manuscript should state this trade-off explicitly, e.g.: increased complexity may improve global NPP magnitude without improving, and possibly worsening, spatial pattern fidelity. This is more precise than the current broad statement that increasing complexity does not improve skill. I also encourage the authors to briefly discuss the origin of this discrepancy.
- How would other key BGC metrics like alkalinity (involved in e.g. nitrogen cycling and soft-tissue pump) be impacted in your model configurations? Will they be more helpful for the model-observation comparison?
The claim that the spread among these PISCES variants may explain a substantial portion of CMIP spread is probably too strong. An intramodel, identical physics comparison can show that parameterization choices matter a lot, but it cannot demonstrate that these same choices explain “a substantial portion” of CMIP spread, because CMIP differences also include circulation, physics–biogeochemistry coupling, ecosystem structure, numerics, tuning procedure, and forcing-response differences. I would advise the authors to recast this as a hypothesis or suggestive analogy, rather than an inference.
Minor
The standard PISCES version (Aumont et al., 2015) used here is also used in CNRM-ESM2-1 (CMIP6) and IPSL-CM6A-LR (CMIP6), but not in CMIP5. The RCP8.5 forcing is also used in CMIP5, but not in CMIP6 (where SSP is used). So it is unclear why the authors choose to compare with CMIP5 results in Table 2.
Please be consistent with your figure/table callout style: some are written as “Fig. (1)”, while others are written as “Fig. 1”.
L 217&225: “Epply” should be “Eppley”.
L 575: “RSP8.5” should be “RCP8.5”.
Citation: https://doi.org/10.5194/egusphere-2025-6505-RC2
AC2: 'Reply on RC2', Jonathan Rogerson, 24 Mar 2026
We thank Shengwei Liu for their careful review and constructive comments on our manuscript. In addition to several minor grammatical and typographical corrections, the reviewer raised two main points. The first concerns our methodology, particularly Fig. (1), with emphasis on the choice of time-frames for the model outputs and satellite products used in the model vs remote-sensing comparison. The reviewer also encouraged us to further refine our interpretation of the statistical analysis presented in Supp. (S8). The second comment, consistent with that of the first reviewer, relates to clarifying and improving our description when contrasting the variability captured within our PISCES-only ensemble and that of the broader CMIP (CMIP5) ensemble. We address all these comments in detail below.
Response
Major 1: Model vs remote-sensing
We thank the reviewer for their important comments regarding the choice of time-windows, and the request for a deeper interrogation of the statistical analysis presented in Supp. (S8), as well as other points of clarification, which we address below.
The model outputs used in this study were obtained from the respective modelling teams as temporally averaged products over predefined periods (1986-2005 for the historical/reference period and 2091-2100 for the future scenario). As such, the temporal windows could not be harmonised further across datasets. Furthermore, the satellite-based reference period (1998-2005) is constrained by data availability and represents the longest consistent observational record available for comparison. To clarify a point raised by the reviewer, in Fig. (1), model outputs are those averaged over 1986-2005, not 1998-2005. This does then introduce the time-window problem raised by the reviewer.
As explained by the reviewer, the use of different averaging periods (20, 10, and 8 years) may introduce some sensitivity to decadal variability (e.g. associated with ENSO or AMOC). As is typically done, we averaged over decadal to multi-decadal time-frames in order to reduce the influence of interannual variability and allow for a consistent comparison of large-scale patterns for NPP and Cexp. Our analysis focuses on large-scale spatial patterns and multi-model mean responses, which are less sensitive to the precise choice of averaging window. We have clarified this limitation in the revised manuscript and note that while decadal variability may contribute to uncertainty, it does not affect the main conclusions of the study.
We include the following in Section 2.2: “Although the averaging periods differ between model outputs and remote-sensing, the use of multi-year means reduces the influence of interannual variability and allows for a consistent comparison of large-scale patterns.”
_______________________
The reviewer comments on the variability present within the remote-sensing derived NPP and Cexp and asks whether there is a way to identify the most suitable algorithms, or to rule out certain ones, when comparing against the model outputs. We used a subsample of commonly used algorithms to compute Cexp (see Supp. S4); for a full review of algorithms, refer to Jönsson et al. (2023). Especially for Cexp, the variability observed reflects both parametric and structural differences in the algorithm formulations.
For simplicity, we confined our choice of Cexp algorithms to those that use empirical relationships with SST and NPP. Other Cexp algorithms exist that also leverage empirical relationships with MLD, chlorophyll and/or euphotic depth. To ensure the model comparison was not unduly influenced by the choice of a single remote-sensing product, we used multiple NPP and Cexp fields that captured the range of variability present within Doney et al. (2024). This approach reduced sensitivity to individual datasets and provided a more balanced baseline for comparing variability across model configurations. Furthermore, given the empirical and semi-empirical nature of these algorithms, they are typically calibrated against observations within a limited range of present-day conditions. As such, their direct applicability under future climate scenarios remains uncertain, particularly where environmental conditions may fall outside the range of the original calibration datasets.
In short, we do not exclude any algorithms a priori. Instead, we consider the spread across algorithms as a representation of structural uncertainty, and interpret the results in terms of ensemble behaviour rather than reliance on any single product.
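The ensemble-based comparison described here (treating the spread across remote-sensing algorithms as a ±1 standard deviation envelope and asking whether a model value falls inside it) can be sketched as below. This is a hedged illustration only: the function names and the NPP values are hypothetical, and the unweighted sample standard deviation is an assumption rather than the authors' documented procedure.

```python
from statistics import mean, stdev


def ensemble_range(product_values, n_sigma=1.0):
    """Return (lower, upper) bounds: ensemble mean -/+ n_sigma standard deviations."""
    centre = mean(product_values)
    spread = stdev(product_values)
    return centre - n_sigma * spread, centre + n_sigma * spread


def within_ensemble(model_value, product_values, n_sigma=1.0):
    """True if the model value lies inside the remote-sensing envelope."""
    lower, upper = ensemble_range(product_values, n_sigma)
    return lower <= model_value <= upper


# Hypothetical global NPP estimates from four algorithms, NOT values from the study.
remote_sensing_npp = [40.0, 50.0, 60.0, 50.0]

print(ensemble_range(remote_sensing_npp))
print(within_ensemble(45.0, remote_sensing_npp))
print(within_ensemble(35.0, remote_sensing_npp))
```

Because no product is excluded a priori, a structurally biased algorithm widens the envelope rather than shifting the verdict for any single model, which is the behaviour the reply describes.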
_______________________
I agree with the reviewer that the results in Supp. (S8) deserve fuller explanation and use. We did not want Supp. (S8) to be a major focus, but we appreciate the reviewer's contribution.
Within the discussion, we therefore intend to add the following suggested by the reviewer:
“Increased complexity may improve global NPP magnitude without improving, and possibly worsening, spatial pattern fidelity, while for Cexp there is little difference in skill across configurations”
And indeed, the added statement is more precise than the current broad statement we have regarding increasing complexity and model skill. Thank you for this.
Answering why these improvements/worsening in NPP occur is beyond the scope of the paper. But it is important to understand that Supp. (S8) is the ensemble mean of all the NPP and Cexp remote-sensing products used in the study. Referring back to Fig. (1) showing the range in remote-sensing for the different regions, the magnitude is captured well but we cannot infer spatial patterns. To do so would require us to individually assess the remote-sensing products against the different model configurations. This in itself would be an interesting study, and granted, would be a very valuable question to answer.
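The two spatial-skill measures the reviewer refers to (spatial correlation and spatial variance relative to the remote-sensing ensemble mean) can be computed along the following lines. This is a minimal sketch under stated assumptions: the fields are flattened to plain lists, every grid cell is weighted equally (a real analysis on a latitude-longitude grid would normally area-weight), and the function name and values are hypothetical.

```python
from math import sqrt


def pattern_statistics(model, reference):
    """Return (correlation, normalised standard deviation) of two flattened fields."""
    if len(model) != len(reference) or len(model) < 2:
        raise ValueError("fields must have the same length (at least 2)")
    n = len(model)
    model_mean = sum(model) / n
    reference_mean = sum(reference) / n
    model_anom = [value - model_mean for value in model]
    reference_anom = [value - reference_mean for value in reference]
    model_var = sum(a * a for a in model_anom) / n
    reference_var = sum(a * a for a in reference_anom) / n
    covariance = sum(a * b for a, b in zip(model_anom, reference_anom)) / n
    correlation = covariance / sqrt(model_var * reference_var)
    normalised_sd = sqrt(model_var / reference_var)
    return correlation, normalised_sd


# Hypothetical fields: the model reproduces the pattern but doubles its amplitude.
reference_field = [1.0, 2.0, 3.0, 4.0]
model_field = [2.0, 4.0, 6.0, 8.0]

correlation, normalised_sd = pattern_statistics(model_field, reference_field)
print(correlation, normalised_sd)
```

A normalised standard deviation well above 1 corresponds to the "excessive spatial variance" case, and it can occur even when the globally integrated magnitude agrees with the remote-sensing range.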
Major 2: CMIP variability
The comment about refining our interpretation and the rhetoric used when contrasting our ensemble results with those of CMIP was also echoed by the first reviewer. Having now had the opportunity to assimilate both reviewers' feedback, we amend the section present in the discussion as follows:
“These findings align with previous modelling studies (Bindoff et al., 2019), and fall within the variability of the CMIP5 ensemble (Bopp et al., 2013; Fu et al., 2016). The comparable magnitude of variability in NPP and Cexp (Tab. 2) indicates that differences in parameterisations among the selected PISCES configurations can generate a spread in results of similar order to that found across CMIP5 models (Séférian et al., 2020). While this does not imply that the present ensemble captures the full diversity of CMIP model structural differences, it nevertheless highlights the sensitivity of biogeochemical outputs to relatively subtle differences in the representation of phytoplankton growth processes and ecosystem complexity which do contribute to intermodel variability.”
The edits remove strong language claims and reframe our results on this topic as suggestive rather than demonstrative. We also slightly amend the wording in the conclusion to reflect this new nuance.
Minor corrections:
Tab 2 - We chose to compare our results with CMIP5 because the experimental framework used in this study is consistent with CMIP5 protocols, in particular the use of IPSL-CM5A-LR forcing and the RCP8.5 scenario. While the PISCES v2 biogeochemical model is also used within CMIP6 models, our simulations are not directly based on a CMIP6 experimental configuration (e.g. SSP-based forcing and associated coupled model setups). For this reason, CMIP5 provides the most consistent and directly comparable ensemble for contextualising our results.
We note that CMIP6 results and developments are discussed in the Introduction to provide broader context on recent model developments and findings related to NPP and Cexp.
Use of inconsistent figure and table referencing style in-text: No example is given for this, but to briefly explain: if a figure or table is used in-text within a sentence, it is written as “Fig. (x)”, whereas if it is just referenced, it becomes “(Fig. x)”. I hope this explanation addresses any confusion.
L 217 & 225: “Epply” should be “Eppley”. --- NOTED and DONE
L 575: “RSP8.5” should be “RCP8.5”. -- NOTED and DONE
Other Comments:
“How would other key BGC metrics like alkalinity (involved in e.g. nitrogen cycling and soft-tissue pump) be impacted in your model configurations? Will they be more helpful for the model-observation comparison?”
Alkalinity was not explored in the work, but we acknowledge that it could have been an additional facet to the study.
Citation: https://doi.org/10.5194/egusphere-2025-6505-AC2
The manuscript “Complexity in Biogeochemical Models: Consequences for the Biological Carbon Pump” by Rogerson et al. compares 5 biogeochemical formulations of the PISCES model in the same physical model to explore the implications of varying complexity of phytoplankton physiological assumptions on model representation of primary production and carbon export out of the euphotic zone. The experiment is well designed and executed and the manuscript is very well written. My only criticism is that the manuscript does not provide an analysis/comparison of the latitudinal structure of surface/mixed layer/euphotic zone nitrate and phosphate in the present day with observations and the change under the future projections. These patterns and their relative change seem key to explaining the differences between model representation of Cexp patterns and their change under climate warming and the degree to which the structural uncertainty described here captures the range in CMIP models in general.
Specific comments:
54 - “the present” is constantly changing. Better to use “2023”
162 - it would be helpful to add which configurations were used for CMIP5 and CMIP6 given their prominence in previous climate change intercomparisons as point of reference.
192 - It would be helpful to add a sentence on the impact of adding Mn and Zn modulation of the ecosystem.
Figure 1: The acronym definitions for each region should be provided.
Figure 2: it would be helpful to add the Model and Remote Sensing means and range estimates here. Also, the caption should provide the time ranges used for both the model averages and delta values.
317 - before moving on to the sensitivity to climate change, it is important to understand the differences in model representation of surface nitrate and phosphate as these are the most robust observational constraints and often the levers through which changes are manifest, particularly the latitudinal structure of the Southern Ocean and Equatorial Pacific high nutrient regions and the amount of residual phosphate in the subtropical north Pacific after nitrate exhaustion as a constraint on nitrogen fixation, both of which have been shown in previous CMIP comparisons to vary between models, particularly under changes in relative iron limitation. Alternatively, insofar as the nitrate and phosphate concentrations in all the models are similar (as perhaps suggested by the similarity of structure in Cexp in Figure 2), then that vastly simplifies the interpretation. It would also be important to add the degree to which all these configurations were tuned to represent the same average surface nitrate and phosphate concentrations. Moving to the sensitivity to climate change question, the delta nitrate and phosphate in the projections is also an important point of analysis and constraint.
399 - adding the suggested surface nitrate and phosphate analysis would improve the robustness of these conclusions dramatically.
443 - it would be interesting to know how the average euphotic zone or mixed layer ammonia values vary between the models and compare to observations as another constraint on the relative robustness of the different models.
498 - The CMIP models also have radically different representations of Cexp latitudinal structure through surface nitrate and phosphate and underlying physical model differences. I would be surprised if this present ensemble captured much of that variability in Séférian et al. (2020), but the degree to which it does capture the CMIP structural uncertainty should be explicit as contextualization of the discussion here.
502-503 - The logic in the assertion that “the overall decline in carbon export reduces the ocean’s capacity to sequester CO2 from the atmosphere.” seems reversed… The enhanced stratification increases the ocean’s sequestration of remineralized carbon and the associated nutrients, which then leads to reduced supply of nutrients to the surface and reduced Cexp as a consequence of the enhanced efficiency of the biological carbon pump. See https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023GB007859 for a detailed discussion.