the Creative Commons Attribution 4.0 License.
Uncertain current and future ocean deoxygenation due to internal climate variability and observational gaps
Abstract. Observed declines in oceanic oxygen (O2) over recent decades are subject to substantial uncertainty due to internal climate variability (ICV) and limited observational coverage. Here, we quantify how observational uncertainty affects the assessment of both historical and future ocean deoxygenation by combining multiple observational datasets with a large ensemble simulation of the Max Planck Institute Earth System Model (MPI-ESM). We find that observational biases in ICV can amplify global and regional O2 variability by 150 %–500 % in annual time series over the past 50 years. The combined effect of ICV and sampling bias can also introduce deviations of 5 %–25 % in estimated multi-decadal O2 trends. Moreover, time-dependent changes in observational coverage complicate the interpretation of historical O2 trends. Our results underscore the crucial need for a sustained, globally uniform ocean observing system to monitor long-term deoxygenation, assess its impact on marine ecosystems, and detect the anthropogenic signal in O2 trends. We further show that near-future trend detection will remain sensitive to ICV, and observational gaps may distort the detection of scenario-based projections of O2 trends, especially in the context of climate mitigation efforts.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3757', Anonymous Referee #1, 24 Nov 2025
- RC2: 'Comment on egusphere-2025-3757', Anonymous Referee #2, 10 Mar 2026
Overview:
This paper seeks to evaluate the extent to which sampling bias and internal climate variability (ICV) contribute to uncertainty in our understanding of ocean deoxygenation and oxygen variability over the past few decades. The authors approach this issue by sub-sampling a set of Earth system model ensemble members according to historical sampling patterns and comparing metrics from the sub-sampled fields to the full fields as well as to observation-derived data products. They use oxygen concentration at 300 meters depth to illustrate their results.
The authors show that sparse data coverage can lead to the conclusion that ICV is much greater than it is in reality. They also find differences in trends computed from sub-sampled versus full-field results, with a tendency for overestimation of the trend in data-sparse regions. The authors indicate that saturation changes alone are not the major driver of deoxygenation over recent decades. They also point out that ICV contributes to trends in addition to the forced response, especially over short timescales.
Finally, the authors test the impacts of three different scenarios for the distribution of biogeochemical (BGC) Argo floats over the next 20 years. They find that near-term deoxygenation might be overestimated without expansion of the BGC float array. The scenarios of planned expansion and of expansion followed by a sharp cut tend to capture near-term global oxygen trends more realistically.
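The sub-sampling comparison summarized in this overview can be illustrated with a minimal sketch. It is not the authors' code: the function names, the synthetic white-noise fields, and the 5 % random coverage are all assumptions chosen only to show how masking full model fields by an observational coverage mask changes the variability of the global-mean time series.

```python
import numpy as np

def area_weighted_mean(field, lat, mask=None):
    """Area-weighted mean of a (lat, lon) field over cells where mask is True."""
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    valid = np.isfinite(field)
    if mask is not None:
        valid &= mask
    return np.sum(field[valid] * weights[valid]) / np.sum(weights[valid])

def subsample_series(fields, lat, coverage):
    """Annual global means from full fields and from coverage-masked fields.

    fields:   (year, lat, lon) model output, e.g. O2 on one level
    coverage: (year, lat, lon) boolean, True where observations exist
    """
    full = np.array([area_weighted_mean(f, lat) for f in fields])
    sub = np.array([area_weighted_mean(f, lat, c)
                    for f, c in zip(fields, coverage)])
    return full, sub

# Hypothetical stand-in data: 50 "years" of spatially uncorrelated anomalies
# and a random 5 % coverage mask (both are assumptions, not real data).
rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 36)
fields = rng.normal(size=(50, 36, 72))
coverage = rng.random((50, 36, 72)) < 0.05

full, sub = subsample_series(fields, lat, coverage)
print("std of full-field global mean:", full.std())
print("std of sub-sampled global mean:", sub.std())
```

With real model output the anomalies are spatially correlated and the coverage is clustered, so the size of the effect will differ from this toy case; the sketch only shows the mechanics of the comparison.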
General comments:
I generally agree with the outcomes and conclusions of this work, but I am somewhat skeptical of the methodology and approach. I have concerns about the appropriateness of the dataset selected to represent observational coverage, the choice of a fixed depth level to illustrate the results, and the future-looking analysis.
I believe there should be more detailed discussion regarding the Ito et al. (2017) dataset, what their gaps in data coverage actually represent, and how this translates to the main conclusions of the paper. Ito17 objectively mapped their oxygen anomaly data using a Gaussian weight function with relatively long zonal and meridional length scales. So the “coverage” of the Ito17 dataset is upscaled from the actual observational coverage, and subsampling full fields of model ensemble members based on the Ito17 coverage therefore presents a rather optimistic view of observational coverage that assumes the simple length scales employed by Ito17 are adequate. At the same time, the efforts made by the referenced fully gap-filled datasets (Ito, 2022; Roach and Bindoff, 2023; Ito et al., 2024) to account for those observational gaps should not be ignored either. The results presented in Figures 1–4 and discussed in Section 3 represent a kind of intermediate state, whereby model ensemble members are subsampled according to unrealistically extended observational coverage, but no attempt is made to account for the gaps between those areas of coverage. I’m not sure how well this intermediate state represents studies that evaluate ocean deoxygenation trends and ICV. For this reason, I’m not fully convinced of the value of the results, although I generally agree with the points of discussion.
I understand the choice to focus the analysis on ~300 meters, as this is a depth with strong oxygen variability and relevance to marine ecosystems. However, it is also a depth with steep vertical oxygen gradients that depend strongly on ocean physics and the exact positions of water masses. This makes it challenging to compare observational datasets and model output in depth space, and makes trends very sensitive to ICV and to changes in observational coverage. Even if the authors would like to keep the focus on this 300 meter depth level, some supplemental analysis in potential density space and/or on other levels would be helpful.
The BGC-Argo analysis reads like it was tacked on late and is not a coherent piece of the paper. For one, the methodological information in Section 4.1 should be introduced earlier in the methods section. There is also a bit of a disconnect between the first part of the paper, where the sampling is based on the Ito17 data from shipboard observations alone, and the second part, where the sampling is based on simulated floats alone. More discussion of how these two observational networks might interact, and better yet inclusion of that interaction in the analysis, would be beneficial.
Line-by-line comments:
152: Vertical oxygen gradients are strong in the thermocline, so the 10-meter gap between observational and model comparisons is not trivial.
225: Was a consistent mask applied to compare the fully gap-filled observational products and model ensemble members? If not, discrepancies may come from different areas of the global ocean being included in the global means.
454: Given the average lifetime of a BGC Argo float, the “cut” scenario is not realistic. Even if deployments stopped immediately, the reduction would be a more gradual one.
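The point about line 454 can be made concrete with a simple cohort model. This is a sketch under stated assumptions, not a description of the manuscript's scenarios: the function name, the fixed lifetime, and the deployment rate are all hypothetical values chosen for illustration.

```python
def active_floats(deployments_per_year, lifetime_years, n_years, stop_year):
    """Count active floats per year, assuming every float lives exactly
    lifetime_years and deployments cease from stop_year onward."""
    counts = []
    for year in range(n_years):
        first_alive = max(0, year - lifetime_years + 1)
        alive = sum(
            deployments_per_year
            for cohort in range(first_alive, year + 1)
            if cohort < stop_year  # no new cohorts after deployments stop
        )
        counts.append(alive)
    return counts

# Hypothetical numbers: 100 floats per year, 5-year lifetime,
# deployments stop at the start of year 10.
counts = active_floats(deployments_per_year=100, lifetime_years=5,
                       n_years=15, stop_year=10)
print(counts)
```

In this toy model the array shrinks by one cohort per year after deployments stop, rather than dropping to zero at once, which is the gradual reduction the comment describes.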
Citation: https://doi.org/10.5194/egusphere-2025-3757-RC2
Viewed
- HTML: 394
- PDF: 278
- XML: 29
- Total: 701
- BibTeX: 25
- EndNote: 43
The authors have presented what is largely a methodological study focused on challenges in trend detection for ocean interior oxygen. Their analysis tools are clearly chosen, and their knowledge and judgment in choosing an ensemble and a range of observational resources is sound, as is their interpretation. I therefore believe that the manuscript is well-organized and well-presented, and overall I think it can meet the standard of Biogeosciences with major revisions. The reason for recommending major revisions is given below, and stems largely from what I think is an ill-advised choice of 300m as a horizon on which to assess observing system design.
Main Points:
At the core of the study's framing is what I see as a highly problematic decision: choosing a fixed depth horizon (300m) for evaluating trends in the ocean interior. Though it is technically fine to do this, this horizon should be expected to be characterized by elevated noise due to climate variations and wave activity in the ocean interior. Previous studies have tended to choose one of two approaches to deal with the resulting very low signal-to-noise ratio at this horizon: (a) Long et al. (2016) considered changes on isopycnal surfaces in the ocean interior, and (b) Bopp et al. (2013) considered layer integrals for the upper ocean. These choices were made precisely because the signal-to-noise ratio at 300m is unnecessarily low. In short, no one would choose a 300m horizon for detecting anthropogenic trends, so it’s perplexing that this fixed horizon was chosen for this study. Additionally, the authors do not provide any evidence that this depth horizon is of particular importance for any species or ecosystem community structure.
The core idea is that even with a perfect and gapless observing system for oxygen, natural variability at a fixed depth of 300m will be so large that it would take hundreds of years to detect a trend.
I believe that for this paper to be of interest to, and to find applications within, the broader community, including the ocean observations community, this problem must be addressed. One way to do this would be to choose a vertical layer spanning the upper ocean (again the approach of Bopp et al., 2013), if the problem is that the MPI model doesn’t include monthly mean output. A much more challenging but robust approach might be to do a more comprehensive analysis for density layers and then to map back to depth horizons based on the mean depth of isopycnal layers.
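The contrast between a fixed depth horizon and a layer average can be sketched with an idealized profile. Everything here is an assumption made for illustration: the tanh-shaped oxycline, the 20 m vertical displacement standing in for isopycnal heave, the 100–600 m layer bounds, and the function names. It is not the approach of either cited study.

```python
import numpy as np

def layer_mean(profile, depth, top, bottom, n=101):
    """Thickness-weighted mean of a profile between two depths,
    using linear interpolation and the trapezoidal rule."""
    z = np.linspace(top, bottom, n)
    vals = np.interp(z, depth, profile)
    integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(z))
    return integral / (bottom - top)

def idealized_o2(depth, heave=0.0):
    """Hypothetical O2 profile with an oxycline centred at 300 m + heave."""
    return 200.0 - 75.0 * (1.0 + np.tanh((depth - 300.0 - heave) / 50.0))

depth = np.arange(0.0, 1001.0, 10.0)
base = idealized_o2(depth)
displaced = idealized_o2(depth, heave=20.0)  # profile shifted 20 m downward

# Change seen at a fixed 300 m horizon versus in a 100-600 m layer mean.
delta_fixed = np.interp(300.0, depth, displaced) - np.interp(300.0, depth, base)
delta_layer = (layer_mean(displaced, depth, 100.0, 600.0)
               - layer_mean(base, depth, 100.0, 600.0))
print("change at 300 m:", delta_fixed)
print("change in layer mean:", delta_layer)
```

In this toy setting a purely adiabatic displacement of the profile produces a much larger apparent change at the fixed horizon than in the layer mean, which is the kind of noise the layer-integral approach is meant to reduce.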
More detailed points:
In the Abstract, the authors refer to “observational bias in ICV”; do they mean “undersampling of ICV”? This also applies to the first sentence of the second paragraph of the “Summary and Discussion” section.
It would help if the equations were numbered. For the second equation (for AOU), the authors should be clear about which terms are saved as annual means and which are not.
For the 4th paragraph of the “Summary and Discussion” (lines 550-561) it would be good if the authors could describe what distinguishes this study from previous studies.
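The question about annual means in the AOU equation matters because oxygen solubility is nonlinear in temperature and salinity. As a sketch, and assuming the manuscript uses the standard definition of apparent oxygen utilization (the overbar notation for annual means is introduced here for illustration only):

```latex
\mathrm{AOU} = \mathrm{O_2^{sat}}(T, S) - \mathrm{O_2},
\qquad
\overline{\mathrm{AOU}} = \overline{\mathrm{O_2^{sat}}(T, S)} - \overline{\mathrm{O_2}},
\qquad
\overline{\mathrm{O_2^{sat}}(T, S)} \neq \mathrm{O_2^{sat}}\!\left(\overline{T}, \overline{S}\right)
\ \text{in general.}
```

Whether the saturation term is computed from annual-mean temperature and salinity or averaged after being computed at higher frequency therefore changes the resulting AOU, which is why the distinction should be stated.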