Estimating the AMOC from Argo Profiles with Machine Learning Trained on Ocean Simulations

Wölker, Yannick; Rath, Willi; Renz, Matthias; Biastoch, Arne

doi:10.5194/egusphere-2025-2782

Preprints

23 Jun 2025

Abstract. The Atlantic Meridional Overturning Circulation (AMOC) plays an important role in our climate system; continuous monitoring is therefore important and could be enhanced by combining all available information. Moored measuring arrays like RAPID divide the AMOC into near-surface contributions, western-boundary currents, and the deep ocean in the interior of the basin. For the deep-ocean component, moorings measure density, and the transport is calculated through geostrophy. These moored devices come with a high maintenance effort. Existing reconstruction studies show success with near-surface variables on monthly time scales, but do not focus on the interior transport. On interannual to decadal time scales, the geostrophic contribution becomes important.

Argo floats could provide the required information about the geostrophic circulation, as they continuously and cost-effectively deliver hydrographic profiles. However, they are spatially unstructured and only report instantaneous values. Here we show that the geostrophic part of the AMOC can be reconstructed from Argo profiles with a data-driven approach. To demonstrate this, we use the realistic and physically consistent high-resolution model VIKING20X. By simulating virtual Argo floats, we demonstrate that a learnable binning method for processing the spatially variable Argo float distribution is able to reconstruct the geostrophic part of the VIKING20X AMOC with up to 80 % explained variance and a mean error of less than one Sverdrup for the geostrophic transport. Using methods of explainable AI, we investigate the importance of our input components and find an increasing importance of the Argo profiles on seasonal and interannual timescales, which validates the usefulness of the Argo floats for the reconstruction. Our results demonstrate how an AMOC reconstruction from unstructured Argo profiles could replace estimates of the geostrophic deep-ocean component of the AMOC from the RAPID Array in the context of high-resolution ocean and climate models.
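
The two skill measures quoted in the abstract, explained variance and mean error in Sverdrup, can be illustrated with a minimal sketch. The abstract does not state the exact definitions used, so the sketch assumes explained variance to be one minus the ratio of residual variance to target variance, and the mean error to be the mean absolute difference. The function names and the transport values are hypothetical and serve only as an illustration.

```python
from statistics import fmean, pvariance


def explained_variance(truth, estimate):
    """Fraction of the variance of `truth` captured by `estimate` (1.0 is perfect).

    Assumed definition: 1 - Var(truth - estimate) / Var(truth).
    """
    residual = [t - e for t, e in zip(truth, estimate)]
    return 1.0 - pvariance(residual) / pvariance(truth)


def mean_absolute_error(truth, estimate):
    """Mean absolute difference, in the units of the inputs (here Sverdrup)."""
    return fmean([abs(t - e) for t, e in zip(truth, estimate)])


# Made-up transport time series in Sv, not taken from the paper.
truth = [-16.0, -18.0, -14.0, -20.0, -17.0]
estimate = [-15.5, -17.0, -15.0, -19.0, -17.5]

ev = explained_variance(truth, estimate)
mae = mean_absolute_error(truth, estimate)
print(f"explained variance: {ev:.3f}, mean absolute error: {mae:.2f} Sv")
```

Under this assumed definition, a constant offset between estimate and truth lowers only the mean error and leaves the explained variance unchanged, which is one reason to report both.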

Received: 13 Jun 2025 – Discussion started: 23 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

18 Dec 2025

Highlight paper

Estimating the AMOC from Argo profiles with machine learning trained on ocean simulations

Yannick Wölker, Willi Rath, Matthias Renz, and Arne Biastoch

Ocean Sci., 21, 3541–3562, https://doi.org/10.5194/os-21-3541-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-2782', Anonymous Referee #1, 18 Jul 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2782/egusphere-2025-2782-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-2782-RC1
- AC1: 'Reply on RC1', Yannick Wölker, 17 Oct 2025
  
  Thank you for reviewing our manuscript. Please find a detailed response to your comments in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2782-AC1
RC2: 'Comment on egusphere-2025-2782', David Smeed, 08 Sep 2025

This paper investigates an interesting approach to monitoring the AMOC that applies machine learning to derive information from Argo float profiles, and other data. I am not an expert in machine learning and my review is from the perspective of an oceanographer.

The authors find that in the model the machine learning technique can make accurate estimates of the AMOC; however, the amount of training data is much greater than is currently available from real observations. Thus, the only prospect for applying the method to estimate the real AMOC would be to train the method on model data. In the discussion, the authors suggest that models are not sufficiently realistic for this to be done now, though this is not analysed. The work is novel and interesting, but the presentation is not always easily understandable and some of the analysis seems to confuse different questions. I recommend a major revision.

There are two parts to the paper that I think need to be improved.

1) Section 4.3. This should be the most important part of the paper as it focuses on the contribution to the AMOC that depends upon the mooring measurements. However, I found this section confusing and the main variable under consideration was not clearly defined.

a) The "RAPID like AMOC", sometimes also referred to in the manuscript as the "geostrophic AMOC", which is the focus of the analysis in section 4.3, is not clearly defined. On line 132 the authors use the term "interior geostrophic transport"; I think this is the most accurate description, and it would be better to use it throughout. Labelling it as "AMOC" is misleading.

b) The AMOC is usually defined as the maximum of the overturning stream function, so the text on line 251 should be "strength of the stream function at the grid box closest to 1000m". Then on line 255: "we also use an interior geostrophic transport time series".

c) Note too that the RAPID "upper mid-ocean time series" usually includes the western boundary wedge. Smeed et al. (2018) presented only the geostrophic part east of the mooring WB2 and referred to that as "gyre recirculation".

d) On line 536 it is stated that "the RAPID-like geostrophic AMOC, mainly represents the southward deeper return branch of the AMOC". I think this is incorrect, but the variable is not defined, so I am not sure. Normally the southward deep transport should be equal to the AMOC.

e) When calculating geostrophic transport it is necessary to choose a reference velocity at one level. How was this done in this case? In the RAPID calculation this is done so that the total net transport is zero, so the reference velocity is also influenced by the Ekman and Florida Straits transports.

f) I did not understand why the sampling near the surface was reduced to mimic the RAPID observations. Surely we want to know how well the ML reconstruction can estimate the actual geostrophic transport? How the missing data from the moorings affect the RAPID estimate is an interesting but separate question. The analysis is confounding two different things. For this paper it would be better to focus only on the ML technique.

2) Section 4.2. "Importance of individual components for the AMOC reconstruction"

a) This section seems to be confusing two questions. The first question is what components of the circulation contribute most to AMOC variability and the second is which data is most useful for the ML reconstruction.

b) There are already quite a few papers that have discussed the first question. In particular, Moat et al. (2020) discuss how Ekman transport is important at short time scales and that at long time scales most variability is from the mid-ocean transport (see their Figure 2). So the results in Figure 7 do not seem surprising.

c) It would be much more interesting if the authors instead examined how much different data contributed to the interior geostrophic transport. Is the surface stress or the Florida Straits transport contributing to the skill in the reconstruction of this component?
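
As an illustration of the reference-velocity question raised in comment 1e), the following sketch computes an interior geostrophic transport profile from two boundary density profiles and then applies a correction so that the interior, Ekman, and Florida Straits transports sum to zero. It is a simplified sketch under stated assumptions and neither the manuscript's method nor the RAPID processing: the basin is assumed to have the same width at every depth, the level of no motion is placed at the deepest sample, and all constants, names, and input values are hypothetical.

```python
G = 9.81       # gravitational acceleration, m s^-2
RHO0 = 1025.0  # reference density, kg m^-3
F = 6.4e-5     # Coriolis parameter near 26.5N, s^-1 (approximate)
SV = 1.0e6     # one Sverdrup in m^3 s^-1


def trapezoid(values, dz):
    """Integrate evenly spaced samples with the trapezoidal rule."""
    return sum(0.5 * (a + b) * dz for a, b in zip(values, values[1:]))


def compensated_transport(rho_west, rho_east, dz, ekman, florida):
    """Zonally integrated meridional transport per unit depth (m^2 s^-1).

    Profiles run from the deepest level (index 0) up to the surface,
    with a constant vertical spacing `dz` in metres.
    """
    # Thermal wind integrated across the basin:
    # dT/dz = -g / (rho0 * f) * (rho_east - rho_west), with z positive upward.
    shear = [-G / (RHO0 * F) * (e - w) for w, e in zip(rho_west, rho_east)]
    # Integrate upward from an assumed level of no motion at the deepest sample.
    relative = [0.0]
    for lower, upper in zip(shear, shear[1:]):
        relative.append(relative[-1] + 0.5 * (lower + upper) * dz)
    # Depth-uniform correction so that interior + Ekman + Florida Straits = 0.
    depth = dz * (len(relative) - 1)
    correction = -(trapezoid(relative, dz) + ekman + florida) / depth
    return [t + correction for t in relative]


# Made-up density profiles (kg m^-3) on five levels, 1000 m apart.
rho_west = [1027.90, 1027.70, 1027.30, 1026.50, 1025.00]
rho_east = [1027.90, 1027.72, 1027.35, 1026.60, 1025.00]
transport = compensated_transport(rho_west, rho_east, 1000.0, 3.0 * SV, 31.0 * SV)
net = trapezoid(transport, 1000.0) + 3.0 * SV + 31.0 * SV
print(f"net transport across the section: {net / SV:.6f} Sv")
```

Because the correction depends on the Ekman and Florida Straits transports, the resulting interior profile is not a function of the density profiles alone, which is the dependence the comment points out.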

Other comments:

I found the paper quite long (740 lines excluding figures, tables, references and the abstract) and there are many places where the text could be shortened. E.g. in the introduction "Zilberman et al. (2020) grouped Argo profiles into 6°×6° cells in the Pacific to create a uniform coverage of Argo profiles which could be used for further computation" seems tangential and could be removed. Is it necessary to say (about Argo floats) "Data are transmitted through a satellite connection while the float drifts at the surface for a few hours."? Shortening the text will make the manuscript easier to read.

line 115 I do not understand "we also use positions of the RAPID moorings for information about the deeper layers."

line 155 "Figure ??"

line 193 please provide a citation for "graph data structure". Many readers, like me, will not be expert in the techniques of machine learning and so citations are particularly important. Similarly for "explainable AI (X-AI) techniques" on line 485.

line 223. The statement "The reconstruction uses the concatenation of the density values from the Argo profiles for the upper 2000 meters and the derivation of the meridional velocity w.r.t. the depth computed with the RAPID mooring locations as information deeper than 2000 meters" is confusing. A concatenation of density and velocity seems odd.

Line 294 I do not understand what is meant by "For the virtual Argo profiles, the goal is to train an embedding (black box in Figure 2 B)) that maps a set of Argo profiles into a hidden space in where similar ocean states are near each other even though their spatial distribution of observations may be different." What is "an embedding"? I think "in where" should be "in which".

Table 1: in the last line I suppose "WS" should be "ZW"?

Line 354 is the naming of "test, validation, and training periods" standard? Often "test" and "validation" have similar meaning.

Citation: https://doi.org/10.5194/egusphere-2025-2782-RC2
- AC2: 'Reply on RC2', Yannick Wölker, 17 Oct 2025
  
  Thank you for reviewing our manuscript. Please find a detailed response to your comments in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2782-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Yannick Wölker on behalf of the Authors (17 Oct 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (08 Nov 2025) by Benjamin Rabe

RR by David Smeed (13 Nov 2025)

Suggestions for revision or reasons for rejection

Thank you to the authors for their responses to my previous comments. I find the revised manuscript much easier to read and understand. There is just one point that is not clear to me.

On L186 it is stated that: “Additionally, we provide the stream function below 2000 meters calculated from virtual RAPID moorings as an input to the reconstruction.” Presumably, this is the term “MO” in Table 1. It is not clear what data was used to calculate “the stream function”. I presume only the data below 2000 m was used to calculate the stream function?

Other minor comments:

There is an error in equation (A4). I think that “1000” in the last term on the right should be “0”.

Figure 4: the legend and the caption give different interpretations of the line colours. E.g. the caption says “VIKING20X AMOC is depicted in orange”, while the legend implies that the ground truth is green.

Line 46 “baroclinc” should be “barotropic”

Line 50. Change
“This study explores the concept of using passively drifting autonomous observing systems, such as simulated Argo floats from ocean models”
To
“This study explores the concept of using Argo floats”

Line 53-54. Change
“They have potential for the upper part of the RAPID moorings at different time scales, from a single 10-day dive cycle, to mesoscale times scales over 30 days, seasonal signals over 90 days, and even longer time scales.”
To
“They have the potential to replace the upper part of the RAPID moorings.”

Line 105 delete “on” before “wind”

Line 187: should it be “need not have” instead of “must not have”?
Line 229 “reconstruction”

In section 4.4 it may be useful to say that, even if the deep mooring data does not improve the AMOC estimate in these experiments, we do expect that it has useful information about the stream function at deeper levels.

RR by Anonymous Referee #1 (19 Nov 2025)

ED: Publish subject to minor revisions (review by editor) (27 Nov 2025) by Benjamin Rabe

AR by Yannick Wölker on behalf of the Authors (28 Nov 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (08 Dec 2025) by Benjamin Rabe

AR by Yannick Wölker on behalf of the Authors (10 Dec 2025): Manuscript

Viewed

Total article views: 1,006 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
823	158	25	1,006	21	35

Views and downloads (calculated since 23 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	88	19	2	109
Jul 2025	56	20	6	82
Aug 2025	121	21	0	142
Sep 2025	404	41	5	450
Oct 2025	62	27	6	95
Nov 2025	58	8	3	69
Dec 2025	34	19	3	56
Jan 2026	0
Feb 2026	0
Mar 2026	3	0	3
Apr 2026	0

Viewed (geographical distribution)

Total article views: 1,000 (including HTML, PDF, and XML) Thereof 1,000 with geography defined and 0 with unknown origin.

Latest update: 11 Apr 2026

Short summary

The Atlantic Meridional Overturning Circulation (AMOC) is a large current system that helps regulate Earth's climate. Monitoring the AMOC relies on fixed instruments anchored to the seafloor. This study explores in a high-resolution model whether data from Argo floats, autonomous drifters collecting hydrographic profiles, can be used to monitor the AMOC cost-effectively with the help of Machine Learning. Results suggest that Argo floats can extend AMOC monitoring beyond current fixed arrays.

