On the reconstruction of ocean interior variables: a feasibility data-driven study with simulated surface and water column observations

García-Espriu, Aina; González-Haro, Cristina; Aguilar-Gómez, Fernando

doi:10.5194/egusphere-2025-705

Preprints

https://doi.org/10.5194/egusphere-2025-705

24 Feb 2025

Abstract. This work uses data-driven approaches to study the feasibility of reconstructing ocean interior variables (temperature and salinity) from surface observations provided by satellites and interior observations provided by buoys. Feasibility is assessed with an Observing System Simulation Experiment (OSSE). The outputs of an ocean numerical model serve as the ground truth, and a realistic ocean observing system is simulated from them: the model surface stands in for satellite observations, and vertical profiles are extracted at the same locations as the real buoys. We implemented different models based on Random Forest Regressors and Long Short-Term Memory networks, which were trained with the simulated observations and validated against the complete numerical model output. Both approaches achieve high spatial and temporal correlation and describe the annual variability of the data accurately, with small biases.
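The OSSE workflow summarized above can be sketched in a few lines. This is a minimal sketch under loudly stated assumptions:

- An analytic temperature field stands in for the reanalysis "truth".
- Random positions stand in for real float locations.
- A per-depth linear least-squares fit replaces the Random Forest and LSTM models used in the study.

The function name `simulate_osse` and every constant are hypothetical. Only the three steps mirror the text: subsample the truth at float positions, train on the subsample, and evaluate against the full field.

```python
import numpy as np


def simulate_osse(seed=0, n_floats_per_day=10):
    """Toy OSSE: sample a synthetic truth at float positions, fit, evaluate on the full grid."""
    rng = np.random.default_rng(seed)
    nt, nz, ny, nx = 30, 5, 20, 40                  # time, depth, lat, lon (hypothetical sizes)
    depth = np.linspace(0.0, 1000.0, nz)            # metres, positive downward

    # Synthetic surface field standing in for a satellite SST product
    sst = 15.0 + 5.0 * rng.standard_normal((nt, ny, nx))
    # Synthetic "truth": surface anomaly decaying with depth (assumed 300 m scale) plus noise
    decay = np.exp(-depth / 300.0)
    truth = 4.0 + (sst[:, None, :, :] - 4.0) * decay[None, :, None, None]
    truth = truth + 0.1 * rng.standard_normal(truth.shape)

    # Step 1: subsample the truth at "float" positions (random here, real Argo positions in the study)
    t_idx = np.repeat(np.arange(nt), n_floats_per_day)
    y_idx = rng.integers(0, ny, t_idx.size)
    x_idx = rng.integers(0, nx, t_idx.size)
    x_train = sst[t_idx, y_idx, x_idx]              # surface predictor at each float, shape (n,)
    y_train = truth[t_idx, :, y_idx, x_idx]         # simulated profile at each float, shape (n, nz)

    # Step 2: fit intercept + slope independently for every depth level
    design = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)   # shape (2, nz)

    # Step 3: reconstruct the full 4-D field from the surface and compare with the truth
    pred = coef[0][None, :, None, None] + coef[1][None, :, None, None] * sst[:, None, :, :]
    resid = pred - truth
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((truth - truth.mean()) ** 2)
    return r2, resid.mean()
```

Calling `simulate_osse()` returns the pooled R² and mean bias of the toy reconstruction over the whole synthetic grid. A pooled R² can hide poor skill at individual depths, so a level-by-level evaluation is usually more informative.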

Received: 14 Feb 2025 – Discussion started: 24 Feb 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

24 Oct 2025

On the global reconstruction of ocean interior variables: a feasibility data-driven study with simulated surface and water column observations

Aina Garcia-Espriu, Cristina González-Haro, and Fernando Aguilar-Gómez

Ocean Sci., 21, 2579–2603, https://doi.org/10.5194/os-21-2579-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-705', Anonymous Referee #1, 05 Apr 2025

Summary
García-Espriu et al. conduct an observing system simulation experiment (OSSE) to evaluate the feasibility of reconstructing ocean interior temperature and salinity from in situ observational data and satellite observational data products. The authors leverage output from the CMEMS Global Ocean Ensemble Reanalysis product to conduct this experiment, and they subsample the product at times and locations where Argo float profiles are available. They then use these subsampled synthetic profiles to train machine learning models, which they apply to satellite products to reconstruct ocean interior properties, and they compare these reconstructions against the reanalysis “truth” to evaluate the skill of the reconstruction methods.
The authors find that the more complex versions of their random forest regression (RFRv2) model and Long Short-Term Memory (LSTMv2) network are able to reproduce ocean temperature with an R² of around 0.85 and salinity with an R² of around 0.95. They validate their models with synthetic profiles withheld from model training and by using a regional subsection of the reanalysis dataset. They report validation statistics spatially and by depth. They conclude that the RFRv2 model performed better in terms of the evaluation statistics against the test dataset, but the LSTMv2 model was better able to represent the data in terms of variability over time and space. The authors also use SHapley Additive exPlanations (SHAP) to interpret their trained models.
Overall, I support the approach this manuscript takes to the question of how in situ and satellite observing systems can be leveraged to reconstruct ocean interior properties. However, it falls short in its execution and in its interpretation of the analysis. Most importantly, the authors could attempt to remedy, or discuss more extensively, the shortcomings of the models in predicting ocean interior variables from primarily surface data. The results could also be better placed in the context of similar studies that reconstruct ocean interior properties from observational data.
General suggestions
One aspect that I think is missing from the manuscript is the contextualization of the authors’ results with similar methodologies that have been applied to map salinity and temperature from observations (a few of which are referenced in the introduction). Not all studies that reconstruct ocean interior properties from observations include a reanalysis-based evaluation of mapping accuracy, which is the focus of this manuscript. Many, however, report error statistics of their reconstructions evaluated against independent data. Su et al. (2018), for example, evaluate their reconstructed subsurface temperature anomalies using root mean squared error and R² as metrics, and the OSSE results reported here could be compared against theirs.
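The two metrics named here can be computed level by level. That is how a depth-resolved comparison with published reconstructions would typically be reported. This is a minimal sketch, assuming the truth and the reconstruction are NumPy arrays on the same grid, with depth along one axis. `depth_metrics` is a hypothetical helper. It also returns the mean bias, since the abstract reports biases.

```python
import numpy as np


def depth_metrics(truth, pred, depth_axis=1):
    """Return RMSE, mean bias (pred - truth) and R² for each depth level.

    truth, pred: arrays of identical shape, e.g. (time, depth, lat, lon).
    """
    n_levels = truth.shape[depth_axis]
    # Move depth to the front and flatten all other axes into samples
    t = np.moveaxis(truth, depth_axis, 0).reshape(n_levels, -1)
    p = np.moveaxis(pred, depth_axis, 0).reshape(n_levels, -1)
    err = p - t
    rmse = np.sqrt(np.mean(err ** 2, axis=1))
    bias = np.mean(err, axis=1)
    ss_res = np.sum(err ** 2, axis=1)
    ss_tot = np.sum((t - t.mean(axis=1, keepdims=True)) ** 2, axis=1)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, bias, r2
```

Because R² is normalized by the variance at each level, a small absolute RMSE at depth can still give a low R² where the natural variability is weak. This is worth keeping in mind when comparing depth-resolved scores across studies.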
In general, I was surprised to see such large disagreement with the test data at depth. Temperature and salinity should be more constant in space and time there, and therefore relatively easier to reconstruct than at the surface. Buongiorno Nardelli (2020), for example, obtains minimum errors for temperature and salinity at depth. In my opinion, this points to an aspect of the methodology that can be significantly improved. It is not particularly surprising that a model based primarily on surface characteristics would struggle to estimate temperature and salinity at 1000 meters. I suspect that a strategy of de-emphasizing the surface predictor datasets as depth increases might reduce these large offsets at depth. In any event, this is another instance where contextualization of the results of this OSSE would be helpful.
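One possible reading of the suggestion to de-emphasize surface predictors with depth is the following sketch. It blends the surface-driven prediction with a climatological profile, using a weight that decays exponentially with depth. The helper name, the exponential form and the 200 m e-folding scale `h_scale` are all hypothetical and would need tuning. This is an illustration of the idea, not the authors' method.

```python
import numpy as np


def blend_with_climatology(surface_pred, climatology, depth, h_scale=200.0):
    """Hypothetical depth-weighted blend of a surface-driven prediction and a climatology.

    surface_pred, climatology: arrays of shape (n_profiles, n_depth)
    depth: array of shape (n_depth,), metres, positive downward
    h_scale: assumed e-folding depth (m) controlling how fast surface information fades
    """
    w = np.exp(-np.asarray(depth, dtype=float) / h_scale)   # 1 at the surface, tends to 0 at depth
    return w * surface_pred + (1.0 - w) * climatology       # broadcasts over profiles
```

At the surface the blend returns the model prediction unchanged. At depths much larger than `h_scale` it returns the climatology, so any surface-driven error is damped there by construction.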
Lastly, the authors miss an opportunity to incorporate uncertainties into their experiment, or at least to discuss their implications. OSSEs present an opportunity to mimic real-world conditions; in reality, satellite observations are not perfect, nor are temperature and salinity measurements from profiling floats. Incorporating measurement uncertainty estimates in the analysis would be an important piece for answering the central question of how feasible it is to use satellite and in situ data to reconstruct ocean interior properties.
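Measurement error can be injected into an OSSE by perturbing the synthetic observations before training. This is a minimal sketch assuming independent Gaussian errors. The 1-sigma values below are illustrative assumptions: the Argo figures follow commonly quoted nominal sensor accuracies, and the satellite figures are placeholders. Spatially correlated or resolution-dependent satellite errors are ignored here.

```python
import numpy as np

# Illustrative 1-sigma errors (assumptions, not taken from the manuscript)
SIGMA = {
    "sat_sst": 0.5,    # K, placeholder for a satellite SST product
    "sat_sss": 0.2,    # PSU, placeholder for a satellite SSS product
    "argo_t": 0.002,   # degC, commonly quoted nominal Argo temperature accuracy
    "argo_s": 0.01,    # PSU, commonly quoted nominal Argo salinity accuracy
}


def perturb(field, sigma, rng):
    """Return a copy of `field` with independent Gaussian noise of standard deviation `sigma`."""
    field = np.asarray(field, dtype=float)
    return field + rng.normal(0.0, sigma, size=field.shape)


rng = np.random.default_rng(42)
sst_true = np.full((180, 360), 20.0)              # hypothetical noise-free surface field
sst_obs = perturb(sst_true, SIGMA["sat_sst"], rng)
```

Training on perturbed inputs and evaluating against the noise-free truth would then show how much skill survives a given level of observation error.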
Line-by-line comments
Abstract: I would suggest defining the simulated in situ measurement platforms as “Argo floats” or “profiling floats” rather than buoys in the abstract.
28: Presumably this should say “subsurface temperature and salinity”?
86: Awkward phrasing in reference to the equatorial region.
97: punctuation issue here
161-166: I’m not sure I understand the training and test split. Are you withholding some percentage of the dataset on a daily frequency (if so, what percentage?) for testing during model training? How does this differ from the ground truth dataset that is being used for evaluation?
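The distinction the referee asks about can be made explicit in code. The sketch below shows one unambiguous option: hold out whole days of simulated profiles, so that no test profile shares a date with the training set. The 20 % fraction and the helper name are hypothetical. This split is separate from the evaluation against the full gridded ground truth, which covers every grid point rather than only float positions.

```python
import numpy as np


def split_by_day(profile_days, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), holding out whole days of profiles."""
    rng = np.random.default_rng(seed)
    days = np.unique(profile_days)
    n_test = max(1, int(round(test_fraction * days.size)))
    test_days = rng.choice(days, size=n_test, replace=False)   # days withheld entirely
    test_mask = np.isin(profile_days, test_days)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

A purely random split by profile would let same-day, nearby profiles fall on both sides, which tends to give optimistic test scores. Splitting by day, or by region, avoids that leakage.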
173: It would be helpful to specify the metric you are referring to when discussing “accuracies”
239: What is meant by “it does not overlap with the training dataset”? There are no Argo profiles from 2008-2009 in this region?
272: should be “…each of them with their own…”

Citation: https://doi.org/10.5194/egusphere-2025-705-RC1
- AC1: 'Reply on RC1', Aina Garcia, 05 Aug 2025
  
  We have carefully addressed the comments of the reviewer in the following document. Answers are given in blue and reference the tracked-changes manuscript. In the tracked changes manuscript, the additions are marked in blue and the modifications (or moved text) in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-705-AC1
RC2:
'Comment on egusphere-2025-705', Anonymous Referee #2, 17 Apr 2025
Review of the paper "On the reconstruction of ocean interior variables: a feasibility data-driven study with simulated surface and water column observations"
by Aina García-Espriu, Cristina González-Haro, and Fernando Aguilar-Gómez
This study investigates the feasibility of reconstructing ocean interior variables, specifically temperature and salinity profiles, using AI-based algorithms applied to simulated satellite surface data and in-situ buoy observations. The study uses an Observing System Simulation Experiment (OSSE) based on outputs from a numerical ocean reanalysis model from the EU Copernicus Marine Service. Within it, the authors compare the performance of Random Forest Regressors (RFR) and Long Short-Term Memory (LSTM) networks. The results show that both models reasonably capture the spatial and temporal variability of ocean interior conditions (particularly for salinity). RFR offers higher accuracy in direct reconstructions, while LSTM demonstrates better extrapolation capabilities when compared with ground truth observations. The findings highlight the potential of data-driven approaches to enhance 4D ocean reconstruction and contribute to future digital twin ocean frameworks. They also identify current challenges in capturing vertical variability and reducing biases. Nevertheless, the study lacks some aspects that should be addressed, at least at the discussion level.
Major Points
I could not fully understand how surface information is synthesized from the numerically modelled Copernicus Marine Service data. To the best of my understanding, the aim is to provide insights into a potential 4D reconstruction that exploits satellite-based surface observations. In particular, the Authors state their intention to perform reconstructions at the spatial resolution provided by space-based microwave sensors. However, it seems that surface observations are extracted directly from the modelled surface data. To be consistent, the type and effective resolution of the satellite input data should be assessed, and the synthetic input data should be adjusted accordingly. For example, present-day satellite-based sea surface heights, currents and temperatures could differ significantly from the outputs of a hydrodynamic model. A discussion of how this could impact the results of the 4D reconstruction would be beneficial.
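One simple way to make the synthetic surface inputs respect a sensor's effective resolution is to smooth the model surface field before using it as a predictor. This is a minimal sketch, assuming a regular (lat, lon) grid and a boxcar block average as a crude footprint model. The helper name and the coarsening factor are hypothetical. A realistic footprint (for example Gaussian-weighted) plus instrument noise would be closer to real products.

```python
import numpy as np


def coarsen(field, factor):
    """Block-average a 2-D (lat, lon) field by an integer factor and map it back onto the fine grid.

    Rows/columns that do not fill a complete block are trimmed, so the output
    shape is (ny // factor * factor, nx // factor * factor). NaN cells (e.g. land)
    are ignored inside each block.
    """
    ny, nx = field.shape
    ny_c, nx_c = ny // factor, nx // factor
    trimmed = field[: ny_c * factor, : nx_c * factor]
    blocks = trimmed.reshape(ny_c, factor, nx_c, factor)
    coarse = np.nanmean(blocks, axis=(1, 3))                   # one mean per block
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
```

Using `coarsen(model_sst, factor)` as the training and prediction input, instead of the full-resolution model surface, would indicate how sensitive the reconstruction is to the coarser effective resolution of real satellite products.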

On the same note, I think the paper lacks a discussion of the capabilities of current and, more importantly, future Earth observation missions in the microwave band. This should cover how these missions could improve, e.g., sea surface temperature and salinity monitoring, and what impact they could have on the proposed ocean 4D reconstruction. I think this should be integrated into the discussion section, at least.

Minor Points
I was wondering whether the proposed reconstruction methodology can provide an uncertainty estimate, to verify whether the profiles shown in Figure 10 can be considered significantly different. Could the Authors briefly comment on that?
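For tree ensembles such as a random forest, a rough uncertainty estimate is the spread of the individual trees' predictions. In scikit-learn, these trees are exposed through the `estimators_` attribute of a fitted `RandomForestRegressor`. The sketch below illustrates the same idea without external dependencies. It builds a bootstrap ensemble of linear fits and uses the standard deviation across members as the spread. The linear model and the helper name are assumptions standing in for the study's RFR and LSTM models. Such a spread reflects sampling variability only, not model-structure error.

```python
import numpy as np


def bootstrap_spread(x_train, y_train, x_new, n_boot=200, seed=0):
    """Mean and standard deviation of predictions from linear fits on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = x_train.size
    design_new = np.column_stack([np.ones_like(x_new), x_new])
    preds = np.empty((n_boot, x_new.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                               # resample with replacement
        design = np.column_stack([np.ones(n), x_train[idx]])
        coef, *_ = np.linalg.lstsq(design, y_train[idx], rcond=None)
        preds[b] = design_new @ coef                              # one ensemble member
    return preds.mean(axis=0), preds.std(axis=0)
```

Two reconstructed profiles whose difference is small compared with such a spread at each depth could not be considered significantly different. The spread also grows where the model extrapolates beyond the training data.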

Have the Authors tried to compare the effective feature resolution of the reconstructed fields with that of the ground truth? Do you expect significant differences?
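A common way to compare the effective feature resolution of a reconstruction with its ground truth is to compare their wavenumber spectra. Where the reconstructed spectrum falls below the truth, small scales have been lost. This is a minimal sketch, assuming fields sampled on a regular grid along the last axis. The helper name is hypothetical, and the spectra are scaled only up to a constant factor, which cancels in their ratio.

```python
import numpy as np


def mean_power_spectrum(field, dx=1.0):
    """Average 1-D power spectrum along the last axis of `field`.

    All other axes are stacked into rows; each row has its mean removed and a
    Hann window applied before the FFT. Scaling is consistent between calls
    but not absolutely normalized.
    """
    f = np.asarray(field, dtype=float)
    n = f.shape[-1]
    rows = f.reshape(-1, n)                             # stack all other axes into rows
    window = np.hanning(n)
    anom = rows - rows.mean(axis=1, keepdims=True)      # remove each row's mean
    spec = np.abs(np.fft.rfft(anom * window, axis=1)) ** 2
    psd = spec.mean(axis=0) * dx / np.sum(window ** 2)
    freqs = np.fft.rfftfreq(n, d=dx)                    # cycles per unit of dx
    return freqs, psd
```

Plotting the ratio of the reconstructed to the true spectrum against `freqs`, for the same region and depth, shows the wavenumber below which the reconstruction retains the truth's variance.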

Could the Authors also provide a broad overview of which “real in-situ and satellite” data would be most suitable for their future applications?

Typos
In general, please always use Copernicus Marine Service instead of CMEMS when referring to data generated within the EU Copernicus Marine Service.

Line 88: earth-> Earth
Citation: https://doi.org/10.5194/egusphere-2025-705-RC2
- AC2: 'Reply on RC2', Aina Garcia, 05 Aug 2025
  
  We have carefully addressed the comments of the reviewer in the following document. Answers are given in blue and reference the tracked-changes manuscript. In the tracked changes manuscript, the additions are marked in blue and the modifications (or moved text) in green.
  
  Citation: https://doi.org/10.5194/egusphere-2025-705-AC2


Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Aina Garcia on behalf of the Authors (05 Aug 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (07 Aug 2025) by Bernadette Sloyan

RR by Anonymous Referee #1 (16 Aug 2025)

ED: Publish subject to minor revisions (review by editor) (27 Aug 2025) by Bernadette Sloyan

AR by Aina Garcia on behalf of the Authors (04 Sep 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (05 Sep 2025) by Bernadette Sloyan

AR by Aina Garcia on behalf of the Authors (05 Sep 2025)

Viewed

Total article views: 1,432 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
791	585	56	1,432	40	81


Views and downloads (calculated since 24 Feb 2025)

Month	HTML	PDF	XML	Total
Feb 2025	33	7	2	42
Mar 2025	30	4	1	35
Apr 2025	56	10	4	70
May 2025	20	12	3	35
Jun 2025	22	5	1	28
Jul 2025	18	6	1	25
Aug 2025	86	10	11	107
Sep 2025	358	7	3	368
Oct 2025	28	19	0	47
Nov 2025	15	59	2	76
Dec 2025	14	88	3	105
Jan 2026	25	67	12	104
Feb 2026	26	50	3	79
Mar 2026	29	59	3	91
Apr 2026	31	182	7	220
May 2026	0

Viewed (geographical distribution)

Total article views: 1,428 (including HTML, PDF, and XML), of which 1,428 have a defined geography and 0 are of unknown origin.

Latest update: 02 May 2026

Short summary

Ocean measurements currently rely on buoys for depth data and satellites for surface observations. We investigated combining these using data-driven approaches to reconstruct full 4D ocean profiles. Using an ocean model as ground truth, we simulated satellite surface data and Argo profiles and then applied machine learning to predict complete temperature and salinity profiles. The predictions were accurate, matched the simulation data and captured seasonal patterns.

