the Creative Commons Attribution 4.0 License.
Estimating the AMOC from Argo Profiles with Machine Learning Trained on Ocean Simulations
Abstract. The Atlantic Meridional Overturning Circulation (AMOC) plays an important role in our climate system. Continuous monitoring is important and could be enhanced by combining all available information. Moored measuring arrays like RAPID divide the AMOC into near-surface contributions, western-boundary currents, and the deep ocean in the interior of the basin. For the deep-ocean component, moorings measure density, and the transport is calculated through geostrophy. These moored devices require a high maintenance effort. Existing reconstruction studies show success with near-surface variables on monthly time scales, but do not focus on the interior transport. On interannual to decadal time scales, the geostrophic contribution becomes important.
Argo floats could provide the required information about the geostrophic circulation, as they deliver hydrographic profiles continuously and cost-effectively. However, they are spatially unstructured and only report instantaneous values. Here we show that the geostrophic part of the AMOC can be reconstructed from Argo profiles in a data-driven way. To demonstrate this, we use the realistic and physically consistent high-resolution model VIKING20X. By simulating virtual Argo floats, we demonstrate that a learnable binning method for processing the spatially variable Argo float distribution is able to reconstruct the geostrophic part of the VIKING20X AMOC with up to 80 % explained variance and a mean error of less than one Sverdrup for the geostrophic transport. Using methods of explainable AI, we investigate the importance of our input components and find an increasing importance of the Argo profiles on seasonal and interannual timescales, which validates the usefulness of the Argo floats for the reconstruction. Our results demonstrate how an AMOC reconstruction from unstructured Argo profiles could replace estimates of the geostrophic deep-ocean component of the AMOC from the RAPID Array in the context of high-resolution ocean and climate models.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2782', Anonymous Referee #1, 18 Jul 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2782/egusphere-2025-2782-RC1-supplement.pdf
RC2: 'Comment on egusphere-2025-2782', David Smeed, 08 Sep 2025
This paper investigates an interesting approach to monitoring the AMOC that applies machine learning to derive information from Argo float profiles, and other data. I am not an expert in machine learning and my review is from the perspective of an oceanographer.
The authors find that in the model the machine learning technique can make accurate estimates of the AMOC. However, the amount of training data is much greater than is currently available from real observations. Thus, the only prospect for applying the method to estimate the real AMOC would be to train the method on model data. In the discussion, the authors suggest that models are not sufficiently realistic for this to be done now, though this is not analysed. The work is novel and interesting, but the presentation is not always easy to understand and some of the analysis seems to confuse different questions. I recommend a major revision.
There are two parts to the paper that I think need to be improved.
1) Section 4.3. This should be the most important part of the paper as it focuses on the contribution to the AMOC that depends upon the mooring measurements. However, I found this section confusing and the main variable under consideration was not clearly defined.
a) The "RAPID like AMOC", sometimes also referred to in the manuscript as the "geostrophic AMOC", which is the focus of the analysis in section 4.3, is not clearly defined. On line 132 the authors use the term "interior geostrophic transport"; I think this is the most accurate description and it would be better to use it throughout. Labelling it as "AMOC" is misleading.
b) The AMOC is usually defined as the maximum of the overturning stream function, so the text on line 251 should be "strength of the stream function at the grid box closest to 1000m". Then on line 255 "we also use an interior geostrophic transport time series".
c) Note too that the RAPID "upper mid-ocean time series" usually includes the western boundary wedge. Smeed et al. (2018) presented only the geostrophic part east of the mooring WB2 and referred to that as "gyre recirculation".
d) On line 536 it is stated that "the RAPID-like geostrophic AMOC, mainly represents the southward deeper return brach of the AMOC". I think this is incorrect, but the variable is not defined so I am not sure. Normally the southward deep transport should be equal to the AMOC.
e) When calculating geostrophic transport it is necessary to choose a reference velocity at one level. How was this done in this case? In the RAPID calculation this is done so that the total net transport is zero, so the reference velocity is also influenced by the Ekman and Florida Straits transports.
f) I did not understand why a reduced sampling near the surface to mimic the RAPID observations was done. Surely we want to know how well the ML reconstruction can estimate the actual geostrophic transport? How the missing data from the moorings affects the RAPID estimate is an interesting but separate question. The analysis is confounding two different things. For this paper it would be better to focus only on the ML technique.
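To make the reference-velocity question in point e) concrete, the following is a minimal sketch of a RAPID-style calculation, not the manuscript's method. It assumes an idealized flat-bottomed, rectangular basin, two invented boundary density profiles, a level of no motion at the deepest level, and a depth-uniform correction that closes the mass budget against prescribed Ekman and Florida Straits transports. The function name `compensated_transport_profile`, the constants, and all numerical values are hypothetical.

```python
import numpy as np

G = 9.81            # gravitational acceleration, m/s^2
RHO0 = 1025.0       # reference density, kg/m^3 (assumed value)
OMEGA = 7.2921e-5   # Earth's rotation rate, rad/s


def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def compensated_transport_profile(z, rho_west, rho_east, lat_deg,
                                  ekman_sv=0.0, florida_sv=0.0):
    """Basin-integrated meridional transport per unit depth (m^2/s).

    z runs from the bottom (negative) up to the surface (0), in metres.
    """
    f = 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))
    # Thermal wind, integrated zonally: dT/dz = -g / (rho0 f) * (rho_E - rho_W)
    shear = -G / (RHO0 * f) * (rho_east - rho_west)
    # Transport relative to a level of no motion at the deepest level
    steps = 0.5 * (shear[1:] + shear[:-1]) * np.diff(z)
    t_rel = np.concatenate(([0.0], np.cumsum(steps)))
    # Depth-uniform correction so that interior + Ekman + Florida Straits = 0
    external = (ekman_sv + florida_sv) * 1.0e6
    correction = -(trapezoid(t_rel, z) + external) / (z[-1] - z[0])
    return t_rel + correction


# Idealized (invented) boundary profiles; the west is lighter in the upper ocean.
z = np.linspace(-4000.0, 0.0, 81)
rho_east = 1028.0 - 2.0 * np.exp(z / 800.0)
rho_west = 1028.0 - 2.2 * np.exp(z / 800.0)
profile = compensated_transport_profile(z, rho_west, rho_east, 26.5,
                                        ekman_sv=3.0, florida_sv=31.0)
upper = z >= -1000.0
print("interior transport above 1000 m (Sv):",
      trapezoid(profile[upper], z[upper]) / 1.0e6)
```

In this sketch the correction term depends on the Ekman and Florida Straits inputs, so the resulting interior transport is not a function of the density profiles alone, which is why the choice of reference velocity needs to be stated.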
2) Section 4.2. "Importance of individual components for the AMOC reconstruction"
a) This section seems to be confusing two questions. The first question is what components of the circulation contribute most to AMOC variability and the second is which data is most useful for the ML reconstruction.
b) There are already quite a few papers that have discussed the first question. In particular, Moat et al. (2020) discuss how Ekman transport is important at short timescales and that at long time scales most variability is from the mid-ocean transport (see their Figure 2). So the results in Figure 7 do not seem surprising.
c) It would be much more interesting if the authors instead examined how much different data contributed to the interior geostrophic transport. Is the surface stress or the Florida Straits transport contributing to the skill in the reconstruction of this component?
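The second question, which input data carry the skill for a given target, can be illustrated with permutation importance. The sketch below is not the manuscript's explainable-AI method: it uses synthetic data, a plain least-squares model as a stand-in for the ML reconstruction, and hypothetical input names (`argo_density`, `wind_stress`, `florida_straits`). It measures how much the explained variance of a held-out target drops when one input is shuffled.

```python
import numpy as np


def r2_score(y, y_hat):
    """Explained variance (coefficient of determination)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_linear(X, y):
    """Least-squares fit with an intercept; stand-in for the ML model."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def permutation_importance(coef, X, y, rng, n_repeats=20):
    """Mean drop in R^2 when each input column is shuffled in turn."""
    base = r2_score(y, predict_linear(coef, X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops[j] += base - r2_score(y, predict_linear(coef, Xp))
    return drops / n_repeats


# Synthetic example: the target depends strongly on the first input only.
rng = np.random.default_rng(0)
names = ["argo_density", "wind_stress", "florida_straits"]  # hypothetical
X = rng.standard_normal((500, 3))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(500)

coef = fit_linear(X[:400], y[:400])           # fit on the first 400 samples
importance = permutation_importance(coef, X[400:], y[400:], rng)
for name, value in zip(names, importance):
    print(f"{name}: mean R^2 drop = {value:.3f}")
```

Applied to the interior geostrophic transport as the target, a diagnostic of this kind would answer the question in c) directly, independently of which physical component dominates the total AMOC variability.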
Other comments:
I found the paper quite long (740 lines excluding figures, tables, references and the abstract) and there are many places where the text could be shortened. E.g. in the introduction "Zilberman et al. (2020) grouped Argo profiles into 6°×6° cells in the Pacific to create a uniform coverage of Argo profiles which could be used for further computation" seems tangential and could be removed. Is it necessary to say (about Argo floats) "Data are transmitted through a satellite connection while the float drifts at the surface for a few hours."? Shortening the text will make the manuscript easier to read.
line 115 I do not understand "we also use positions of the RAPID moorings for information about the deeper layers."
line 155 "Figure ??"
line 193 please provide a citation for "graph data structure". Many readers, like me, will not be expert in the techniques of machine learning and so citations are particularly important. Similarly for "explainable AI (X-AI) techniques" on line 485
line 223. The statement "The reconstruction uses the concatenation of the density values from the Argo profiles for the upper 2000 meters and the derivation of the meridional velocity w.r.t. the depth computed with the RAPID mooring locations as information deeper than 2000 meters" is confusing. A concatenation of density and velocity seems odd.
Line 294 I do not understand what is meant by "For the virtual Argo profiles, the goal is to train an embedding (black box in Figure 2 B)) that maps a set of Argo profiles into a hidden space in where similar ocean states are near each other even though their spatial distribution of observations may be different." What is "an embedding"? I think "in where" should be "in which".
Table 1 In the last line I suppose "WS" should be "ZW"?
Line 354 is the naming of "test, validation, and training periods" standard? Often "test" and "validation" have similar meaning.
Citation: https://doi.org/10.5194/egusphere-2025-2782-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
405 | 78 | 11 | 494 | 11 | 23 |