This work is distributed under the Creative Commons Attribution 4.0 License.
Decadal biogeochemical predictions for the bottom marine environment of the Northeast U.S. Continental Shelf
Abstract. The Gulf of Maine and the surrounding Northeast U.S. Continental Shelf are experiencing rapid marine environmental change arising from complex regional dynamics that challenge near-term (1–10 years) predictive capabilities for valuable living marine resources. Here, using a high-resolution regional ocean model, we demonstrate skilful decadal forecasts of ocean bottom habitat characteristics including bottom temperature, dissolved oxygen (O2), pH and aragonite saturation state (Ωar). Bottom temperature and pH predictions show substantial skill driven primarily by radiatively forced warming and carbon uptake trends, while bottom O2 and Ωar predictions benefit more from initialization due to stronger internal variability. Retrospective forecasts successfully predicted observed historical changes in water masses and environmental properties, including recent cooling/freshening transitions driven by replacement of Warm Slope Water with Labrador Slope Water. This water mass variability also modulates biogeochemical conditions and ocean acidification buffering capacity, with our recent forecasts indicating that benefits from the expected respite from rapid warming might be tempered by challenges posed by rapid acidification. The demonstrated predictability of coupled physical-biogeochemical processes supports developing integrated prediction systems for climate-informed marine resource management.
Competing interests: At least one of the (co-)authors serves as editor for the special issue to which this paper belongs.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-481', Anonymous Referee #1, 20 Mar 2026
- RC2: 'Comment on egusphere-2026-481', Anonymous Referee #2, 04 May 2026
Overview
This paper presents and evaluates a prediction system for physical and biogeochemical variables in the Gulf of Maine, largely motivated by applications to fishery management. The authors carefully evaluate a sophisticated set of models and simulations to show that there is some predictive skill in the modeling framework, which is an important new accomplishment.
I struggled to review this paper because it was difficult to understand for two main reasons: (1) the modeling system is complex and not described in sufficient detail for all but the most sophisticated readers to understand and (2) many of the figures are difficult to read.
I believe the paper is an important contribution to ocean forecasting. The paper makes a clear distinction between forced and natural variability, which is valuable. I am very pleased to see inclusion of important biogeochemical variables (oxygen, pH, and aragonite saturation state) in the forecasting system. The explanations of the causes of variability are good but fall short in several cases for the biogeochemical variables.
Decadal predictions are not new with this system, as they were done in Ross et al. (2024) and Koul et al. (2024), but I think what is new in this paper is the inclusion of biogeochemical predictions. That point could be brought out a little more clearly; I was having a hard time figuring out exactly what was new in this study.
I think the problems I bring up are fairly straightforward to address and I have tried to be clear and constructive with my criticisms and suggestions.
One last point is that the title is a little misleading by including “Northeast U.S. Continental Shelf.” The paper is almost exclusively about the Gulf of Maine, so if a particular region is going to be mentioned in the title then it should be the Gulf of Maine. If the authors want the paper to reach a more general audience, they could end the title with “ … bottom marine environment of a coastal ecosystem.”
Methods
Most of my time was spent trying to figure out what the authors actually did. The authors present a fairly sophisticated modeling system, with two models (a global model and a regional model), each used in very different ways. Ensembles add another layer of complexity. If one is an expert in this sort of system and has read all of the papers by this research group, then reviewing this paper might be straightforward. Because I was not familiar with the prediction system, reviewing the paper took me about 15 hours. The authors talk about the real-world applications of this research, so it seems that they would like to target a more general audience. If that is the case, then I would suggest that the authors provide more clarity and motivation in their description of the methods. I do not know if it was the authors’ choice to place the methods at the end of the paper or if this is required by the journal, but doing so makes the paper more difficult to read. There should be clear links between motivation, methods, and results; placing the methods at the end breaks such linkages.
I struggled to figure out all of the simulations that were done in this study. The simulations go by many names, including historical, retrospective, forecasts, hindcasts, nudging, downscaled, and predictions. A difficulty is that words that seem to have similar meanings (like retrospective and historical) are used to describe very different simulations. I think it would be helpful if the authors used one descriptor for each type of simulation and stuck to that descriptor throughout the text.
I will summarize what I think the authors did. There are three types of simulations using the high-resolution (regional) model: (1) retrospective simulations, (2) retrospective forecasts (hindcasts), and (3) historical simulations. Detail on each follows.
(1) Retrospective simulations. These are simulations from 1960 to 2024, which use nudging at a time scale of 90 days towards the full vertical profiles of T and S in the EN4 1-degree gridded data set wherever the ocean bottom is deeper than 1000 m. (As an aside, it would be helpful to get an idea of where the 1000 m isobath is. I suggest that the authors place a map larger than Figure 1a in the paper and clearly indicate where the 1000 m isobath is, as well as the various regions described in the paper: GOM, MAB, SS, and GB.) These simulations are forced at the surface and open boundaries by the global atmosphere–ocean–sea-ice SPEAR model, which itself is nudged towards observed SST and reanalysis winds. This regional model simulation is called NWA12_EN4_NUDG in the authors’ nomenclature. It is confusing to me that these retrospective simulations are presented in terms of their ensemble mean. I think there are 10 members, so that would be a total of 65 years x 10 members = 650 years of simulation. I do not understand how the ensembles were created for retrospective simulations. If these simulations are forced by SPEAR, which has also been nudged towards observations, then how is a true ensemble being created? Also, it would seem difficult to generate a true ensemble if T and S are being restored towards observations. SPEAR must be free-running at some point in order to create the ensemble, but I cannot tell when the authors let go of the restoring of SPEAR towards observed SST and winds. The fact that the authors do not discuss the ensemble spread in Figures 1 and 2 adds to the confusion. I do not understand the point of creating an ensemble if only the ensemble mean is presented. Finally, I was surprised at the river input being climatological (lines 443–444). Is it well established that interannual variability in these shelf waters is independent of river input?
(2) Retrospective forecasts (hindcasts). These are a series of 10-year simulations that vary depending on the start year and the ensemble member. Every year starting in 1960 has a set of 10 ensemble members. I am uncertain as to what the last start year is. Is it 2024? Or is 2024 the last end year, in which case the last start year would be 2015? Looking at Koul et al. (2024), I think the last start year might be 2024. Either way, this set of simulations represents a lot of simulation time: 10 members x 10 years per simulation x 55 (or 65) start years = 5500 (or 6500) years. The authors call these simulations NWA12_HIND. The 10 ensemble members seem to be tied to the same 10 ensemble members in the retrospective simulations above, both for the regional and global models, but because I do not understand the ensemble design for the retrospective simulations, it is difficult for me to understand the ensemble design for the hindcasts. One last puzzle is related to the relaxation time scale north of 45 deg N, which "decreases sinusoidally to 0 at 45 deg N." Since sinusoids are not monotonic, I cannot see how one can have a sinusoidal decrease.
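On that last point: if the intended shape is a quarter-period cosine ramp (a common choice for nudging sponges), the decrease is in fact monotonic, and the authors may simply mean that. A minimal sketch of this hypothetical taper follows; the latitude at which the weight reaches 1 is my assumption, not something stated in the paper.

```python
import numpy as np

def cosine_taper(lat, lat_full=50.0, lat_zero=45.0):
    """Hypothetical nudging-strength taper: weight 1 at lat_full
    (assumed 50 deg N for illustration), falling monotonically to 0
    at lat_zero along a quarter cosine arc."""
    frac = np.clip((lat_full - lat) / (lat_full - lat_zero), 0.0, 1.0)
    return np.cos(0.5 * np.pi * frac)

lats = np.linspace(50, 45, 11)
w = cosine_taper(lats)
# w runs from 1 at 50 deg N down to 0 at 45 deg N, strictly decreasing
```

If something like this is what the authors implemented, replacing "sinusoidally" with "along a quarter cosine ramp" (or whatever the actual functional form is) would remove the ambiguity.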
(3) Historical simulations. The authors also call these "uninitialized," which I do not understand because every simulation has to be initialized. These simulations are forced by SPEAR, but apparently not the same configuration of SPEAR used in the first two sets of simulations (NWA12_NUDG and NWA12_HIND). This configuration of SPEAR is called SPEAR_HIST. In this case SPEAR does not seem to be nudged towards observations, though it is difficult for me to tell. Very little information is provided about SPEAR_HIST. The description is given in the Methods from lines 457 to 468 and seems incomplete to me. All that is stated is that "greenhouse gas concentrations and other radiative forcings" are included. No references are provided. 10 ensemble members of SPEAR are used to force the regional model, I think from 1850 to 2100. So that would be 250 years x 10 ensemble members = 2500 years of simulation. Is that correct? But only output from the time period 1965 to 2024 is used, so a total of 600 years? The corresponding regional model simulations are called NWA12_HIST.
Figures
Reviewing the paper was made challenging because many of the figures are difficult to read. The main problem is that the font sizes are too small. Though it is possible to use a computer to zoom in to the figures and see everything, one should be able to print out the paper (which is what I did) and have it be readable. Take Figure 1 as an example. The font sizes on all of the tick mark labels and legends are very small. I looked to see if EGU has any guidelines on font size in figures but could find none. I saw that AGU has a lower limit of 8 point, with 6 point for subscripts:
https://onlinelibrary.wiley.com/page/journal/21699402/homepage/graphics.htm
I doubt Figure 1 satisfies these reasonable criteria. All other figures are also difficult to read, with the exception of Figures 3 and S3–S6, which are very legible.
Also the map in Figure 1a is extremely small, maybe just slightly bigger than a postage stamp. It is also difficult to clearly see on panels 1b, 1d, and 1e when various events described in the text occur because these panels do not have horizontal axes just below them. A solution to this would be to include horizontal axes or, maybe better, to have thin vertical dashed lines on the years 1990, 2000, 2010, and 2020, which would have the advantage of more easily relating changes in one variable to another.
There is too much information on Figure 4. Here are some suggestions to make the figure more understandable (in addition to the font size recommendation made earlier): (1) In the caption, note that the solid black lines (NWA12_EN4_NUDG) are the same as the lines in Figure 1b, d, e, and f; (2) do not show individual ensemble members but rather some measure of the ensemble spread (e.g., maximum and minimum values or interquartile range); (3) remove overlapping hindcasts (for example, remove hindcasts starting in 2004 and 2022).
Results
It is encouraging that the retrospective simulations capture bottom temperature and salinity variability (lines 139–174, Figure 1b and c), but it is not clear how much of this signal comes from nudging the full water column towards observed T and S where the bottom depth is greater than 1000 m. Some discussion is needed.
For bottom oxygen in the retrospective simulations, the authors point out that the variations result from different water masses (advection, effectively), solubility, and biological oxygen demand (lines 179–182). Instead of this very general statement, I recommend that the authors attempt to distinguish causes of these variations by computing the saturation concentration ([O2]sat, a function of potential temperature and salinity) and the difference [O2] – [O2]sat. So [O2] = [O2]sat + [O2]other, where [O2]other represents any processes that drive [O2] away from saturation, which one expects to be dominated by respiration for bottom waters because surface waters are generally close to saturation. [O2]other is basically the negative of the apparent oxygen utilization (AOU = [O2]sat – [O2]).
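The decomposition suggested above is straightforward to compute from model output. Here is a sketch using the Garcia and Gordon (1992) solubility fit; the coefficients are quoted from memory and should be verified against the published tables before any quantitative use.

```python
import numpy as np

# Garcia & Gordon (1992) fit to the Benson & Krause O2 solubility data;
# returns umol/kg at 1 atm. Coefficient values are an assumption here and
# should be checked against the original paper.
A = [5.80871, 3.20291, 4.17887, 5.10006, -9.86643e-2, 3.80369]
B = [-7.01577e-3, -7.70028e-3, -1.13864e-2, -9.51519e-3]
C0 = -2.75915e-7

def o2_sat(T, S):
    """Saturation O2 concentration (umol/kg) at temperature T (deg C)
    and salinity S."""
    Ts = np.log((298.15 - T) / (273.15 + T))  # scaled temperature
    lnC = sum(a * Ts**i for i, a in enumerate(A))
    lnC += S * sum(b * Ts**i for i, b in enumerate(B)) + C0 * S**2
    return np.exp(lnC)

def aou(o2, T, S):
    """Apparent oxygen utilization, [O2]sat - [O2]; the negative of the
    [O2]other term defined in the text."""
    return o2_sat(T, S) - o2

# Example: for bottom water at 8 deg C and S = 34, an observed O2 below
# saturation gives a positive AOU, the respiration-dominated signal.
```

Plotting the [O2]sat and [O2]other time series separately would immediately show whether the bottom oxygen variability is driven by solubility (water mass T and S) or by respiration.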
Similarly, the authors should attempt to diagnose causes of variability in pH and CaCO3 saturation state (omega). The authors claim that the overall declining trend in these variables is consistent with the increase in atmospheric CO2 (lines 197–199), but they cannot say if that is the case quantitatively without further analysis. To understand pH and omega variations, it is best to first think about changes in the master variables DIC and alkalinity (Alk) because (unlike pH and omega) they will not change with temperature and salinity, will mix conservatively, and will also respond in straightforward, quantitatively consistent ways to processes such as gas exchange and respiration. pH and omega will decrease as DIC increases and as Alk decreases, which can be seen using equations of the full carbonate system or with some approximations.
For pH, it is best to think in terms of the hydrogen ion concentration, which, from the second dissociation step of CO2, is given by K2[HCO3]/[CO3], where HCO3 and CO3 are shorthand for the bicarbonate and carbonate ions. Then, using the approximations HCO3 ≈ 2DIC – Alk and CO3 ≈ Alk – DIC, you can see how H+ increases (pH decreases) as DIC increases and Alk decreases. You can also see how H+ might vary with T and S through variations in K2. So if the decline in pH is really due to anthropogenic CO2, then DIC should be rising roughly in line with expectations from the Revelle factor, and that DIC increase should be large enough to account for the pH decline.
The difference between Alk and DIC can be particularly illuminating for understanding acidification metrics, such as pH and omega. I suggest the authors look at the following paper and use its ideas for understanding pH and omega variability: Xue, L., & Cai, W.-J. (2020). Total alkalinity minus dissolved inorganic carbon as a proxy for deciphering ocean acidification mechanisms. Marine Chemistry, 222, 103791.
A similar approach should be used for omega, which is equal to [CO3][Ca]/Ksp, where [Ca] is the calcium ion concentration, which is roughly proportional to salinity except in very low salinity (S < 5 psu) water, and Ksp is the solubility product, which depends on temperature (mainly) but also salinity and pressure. The authors are arguing that omega is decreasing due to an increase in anthropogenic DIC (impacting CO3), but omega could also be changing due to changes in Ca and Ksp, or changes in CO3 resulting from other processes influencing DIC and Alk (advection, respiration, CaCO3 dissolution, etc.).
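To make the suggested decomposition concrete, here is a back-of-envelope sketch of pH and omega built from the HCO3 ≈ 2DIC – Alk and CO3 ≈ Alk – DIC approximations above. The constants (K2, the aragonite Ksp, the Ca–salinity ratio) are rough values assumed for roughly 25 deg C and S = 35 and are purely illustrative; for real diagnostics a full carbonate-system solver such as PyCO2SYS should be used.

```python
import math

# Back-of-envelope carbonate chemistry from the approximations in the text:
# HCO3 ~ 2*DIC - Alk, CO3 ~ Alk - DIC. Constants below are rough values
# assumed for ~25 deg C, S = 35; they vary strongly with T, S, and
# pressure, so treat this purely as an illustration.
K2 = 1.2e-9      # second dissociation constant of carbonic acid (mol/kg), assumed
KSP_AR = 6.5e-7  # aragonite solubility product (mol^2/kg^2), assumed

def approx_ph_omega(dic, alk, S=35.0):
    """dic, alk in mol/kg; returns (pH, aragonite saturation state)."""
    hco3 = 2.0 * dic - alk
    co3 = alk - dic
    h = K2 * hco3 / co3          # H+ from the second dissociation step
    ca = 0.01028 * S / 35.0      # Ca2+ roughly proportional to salinity
    return -math.log10(h), co3 * ca / KSP_AR

ph0, om0 = approx_ph_omega(2050e-6, 2300e-6)  # baseline water parcel
ph1, om1 = approx_ph_omega(2080e-6, 2300e-6)  # +30 umol/kg DIC, Alk fixed
# Both pH and omega decrease when DIC rises at constant alkalinity.
```

Even this crude version makes the key sensitivity explicit: raising DIC at fixed Alk lowers both pH and omega, which is the quantitative check I am asking the authors to perform with their model's DIC and Alk output.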
Some effort is made in Figure 2 to quantify the impact of natural variability on omega, which is well done. However, the causes of the long-term declines in pH and omega remain unclear. I see that the historical simulations in Figure 4 are used to investigate causes of long-term change. This seems backwards to me. Before removing the long-term trend in Figure 2, the reader should first see the long-term trend itself.
The authors state in lines 199–201 that omega varies more than pH, but I do not see how one can make a statement like that when comparing two different variables. If there were two time series of the same variable, then one could simply compare their standard deviations. Such a comparison for pH and omega seems flawed.
The result that omega and temperature are correlated on interannual time scales is a nice finding (Figure 2). The explanation of the relationship (lines 203–208), however, needs to be fleshed out. The authors talk about the response of the carbonate chemistry to warming and the increased alkalinity associated with WSW. These effects should be separated. Further, WSW probably has high DIC as well, which would work in the opposite direction on omega. This is why I would emphasize discussion of DIC and Alk in order to explain omega. The authors go on to make connections to pH (lines 207–209), but there is no solid basis for the explanations.
Figure 3 is a helpful figure. What’s missing, however, is the salinity plot. Showing the zero correlation line would also be helpful. In general, salinity seems to be dropped as a variable to discuss after Figure 1. Maybe the authors have a good reason for this, but I could not find one in the paper.
Figure 4 is interesting but very difficult to read, as noted above. The authors might consider adding a salinity panel. I am curious to know if there is some long-term trend due to climate change; I would expect freshening, given enhanced precipitation, runoff, and ice melt at high latitudes, the signal of which might be advected southward in the Labrador Current.
Figure 5 is helpful to show the skill of short-term predictions.
Minor comments
Typo: Should be “skillful” in line 24
Line 112: Koul et al. (2024) is cited as predicting biogeochemical variability, but this paper does not include biogeochemistry.
Figures S1 and S2. I recommend that the titles of these panels read “Mid-Atlantic Bight,” “Gulf of Maine,” etc. so that readers can easily understand what is being shown.
Line 434: It seems that Figure S1 should also be cited here.
Citation: https://doi.org/10.5194/egusphere-2026-481-RC2
Review for Koul et al. – Decadal biogeochemical predictions for the bottom marine environment of the Northeast U.S. Continental Shelf
This manuscript presents a decadal-scale forecast system for the Gulf of Maine, describing the skill of hindcast models that can be used to parse mechanisms useful for predicting changes in temperature and other water quality conditions that are important for marine biota. The baseline model performs favorably against multiple sources of regridded observational data and is a step towards operationalizing longer-term forecasts that would be important for fishery and ecosystem management purposes.
Overall, the paper is well written and does a good job describing skill assessments and limitations of multiple approaches. It would also benefit the paper to further describe the dataset error bars presented, specifically what they correspond to (spatio-temporal variability?) and how they were computed and propagated after regridding. After reading the paper, it is my understanding that substantial degradation in predictive skill is unavoidable after ~2 years of simulation into future conditions (Figure 4), inherently limiting the forecast window of this approach, since it will necessarily be forced to guess at realistic boundary conditions that influence the projected results. However, this was a bit difficult to discern from my read of the paper, and it ties together with some general confusion about differences between the NWA12_HIND and NWA12_HIST model simulations, particularly with respect to boundary conditions and how they are being used for forecast analyses. As the methods section reads, I am unsure whether the NWA12_HIST simulations are exactly equivalent to NWA12_HIND but simply missing an initialization period. If so, I remain confused about how the authors are considering the impacts of unavailable future boundary conditions on their model results, particularly given some of the lags found in Figure 5 when attempting to simulate accurate probabilistic predictions of multiple variables. I believe that many of these points, some of which are addressed in the detailed comments below, are not insurmountable and could be more thoroughly addressed by the authors before publication.
Detailed Comments
Line 105: Suggest striking “however” to make sentence flow better.
Figure 1: Suggest widening the aspect ratio of panel a to improve readability; it is a little difficult to see bathymetric detail such as the Northeast Channel. It may also be beneficial to include arrows showing the sources of the Warm Slope, Labrador Slope, and Gulf Stream water masses. What do the error bars correspond to? Just standard deviations over the regridded datasets? I am unsure about the implications of normalizing temperature to the variables in d-f, as it may imply a stronger direct relationship between unit changes in O2, pH, and omega and changes in temperature than the underlying mechanisms support. I am also a bit confused by the choice to invert temperature in panel d; wouldn't it be more beneficial to show that increases in oxygen anomalies are correlated with decreases in temperature and vice versa? As plotted now, a much more careful reading of the plot is required to confirm this.
Line 243-249: Are you still referring to Figure 3 in this paragraph? Or Figure 1?
Line 254-259: This explanation seems plausible, but as you note, the large increase in bottom oxygen concentrations in 2015 seems to also be driven by a large positive winter MLD anomaly (Fig. S5), and that does seem to match well with Fishbot data. Are there other plausible explanations for the rapid declines in prediction skill?
Line 292-293: Missing end quote, but also suggest rewording to “… ask about forecast reliability of anomalously warm or cool conditions.”
Line 303-323: Perhaps this is also addressed in the discussion, but it seems from looking at Figure 5 that the model is most limited by transition periods and its own internal variability that increases the persistence of ongoing trends. For all variables shown, there are numerous examples of high confidence projected by the model for the persistence of a trend, followed by an eventual transition 1-2 years later (or sometimes missed altogether as seen in the 2010s for pH and omega). I don’t think that this necessarily indicates a critical lack of model skill, but it does warrant some further elaboration and discussion here.
Line 377-379: I would agree that the model does largely predict past variability well for temperature and salinity, excluding a few outliers in the 2000s, and would be curious to know if the bottom temperature skill is comparable to the surface temperature skill referenced in Koul et al. (2024). It seems like a bit of a stretch to say that it captured past variations in pH and omega (Figure 1) given the relatively limited dataset.
Line 391: suggest replacing “however” with “therefore”
Line 393-398: This sentence is a bit of a weak conclusion to some impressive work in the paper. Suggest striking or finding other references that you can use to stress the importance of probabilistic forecasting on longer timescales.
Line 466: Please define or provide citation for SSP585.
Figure S5: Typo of "Fishbot" in the caption, and it would be helpful to define what the error bars shown correspond to.
General figure comments: Some parts of the figures can be a little difficult to read (e.g., axis labels in Figs. 1, 2, and 4; legend and skill metrics in Fig. 2). I recommend a quick reformat to make the text more visible and consistent among them.