the Creative Commons Attribution 4.0 License.
Multi-satellite U-Net for high-resolution sea surface temperature reconstruction
Abstract. High-resolution sea surface temperature (SST) products are critical for understanding ocean dynamics at submesoscales (less than tens of kilometers) and their influence on upper ocean physics. While modern infrared (IR) radiometers measure SST at high (∼ 1 km) resolution, they cannot image through clouds, resulting in large gaps in remotely sensed SST. In this study, we address the challenge of reconstructing gap-free high-resolution SST by fusing complementary observations across sensors and time using machine learning (ML). We present the Multi-satellite U-Net for SST Estimation (MUSE), a residual U-Net that fuses two days (eight 6-hourly snapshots) of multi-satellite IR and microwave (MW) data into cloud-free SST, which is then mosaicked into global SST fields. The MUSE model is trained on 9 months of simulated cloudy SST from the MITgcm LLC4320 1/48° SST product, and evaluated on 2 months of held-out LLC4320 data and on out-of-distribution Level 3 satellite data. MUSE outperforms single-time and single-satellite baselines across error, correlation and coherence metrics, achieving a global reconstruction error of 0.035 °C on the simulated dataset. On the satellite dataset, MUSE produces results comparable to the state-of-the-art Level 4 MUR 0.01° product. Our results demonstrate the power of ML in synthesizing diverse satellite measurements, each with inherent limitations, into a submesoscale-resolving dataset that enables critical insights into ocean dynamics. Both our data fusion strategy and simulation-to-satellite paradigm can be generalized to other geophysical variables to produce high-resolution, observation-based Earth system fields.
Status: final response (author comments only)
-
CC1: 'Comment on egusphere-2025-4847', Claudia Fanelli, 22 Oct 2025
-
AC1: 'Reply on CC1', Ellin Zhao, 16 Jan 2026
Hello Dr. Fanelli: We apologize for the error, and we have corrected the text in our revisions. Thanks for pointing this out!
Citation: https://doi.org/10.5194/egusphere-2025-4847-AC1
-
RC1: 'Comment on egusphere-2025-4847', Anonymous Referee #1, 11 Dec 2025
The authors present a rigorous methodology to reconstruct sea surface temperature (SST) from data with gaps caused by clouds. Infrared (IR) sensors, which provide high-resolution, kilometre-scale measurements, cannot see through clouds, while microwave (MW) sensors can but have much lower resolution (~100 km). The methodology is based on modelled observations, representing IR and MW SST observations, sampled from a high-resolution (1/48°), high-frequency (1 h) simulation of the global ocean (LLC4320). These observations are used to train a U-Net-based model (MUSE) for gap filling, as well as for validation and testing. Moreover, the trained model is used to reconstruct satellite L3S observations to demonstrate its applicability to real observations.
The manuscript is well written. It is a timely contribution to the field, extending existing efforts to multi-sensor observations at global scales. I suggest a minor revision of the manuscript before acceptance for publication. Below are my suggestions, which I hope will help to improve the quality of this work.
General Comments
The authors mention foundation SST only once in the introduction and never in the main text. It is an important concept for SST, especially when linked to the numerical model SST. I suggest that they discuss what they mean by foundation SST, how it is linked to the first model layer of LLC4320, and how it is defined in the L3S and L4 products used. How does the diurnal cycle represented in the model's hourly outputs project onto the foundation SST?
Another point is that the study is clearly conducted for climate applications. However, using future data is a limitation for short-term forecasting applications such as marine heatwaves (MHW). It would be useful to discuss whether the distribution of data and the skill of MUSE would change if only past data were used as input. In other words, would the skill be similar if the last day of the input were reconstructed instead of the day in the middle?
Specific Comments
Fig. 1: No test or validation in boreal summer? Do you expect an impact of land distribution on the separation of the dataset?
L112 Is there a reason why clouds from forcing aren’t used instead of L3 product? Wouldn’t it be more consistent with the model SST during training?
L209. The middle timestep is maximally correlated to all input timesteps, so reconstructing the middle timestep is easier than reconstructing all input timesteps.
L218. Why the channel method does not learn enough deserves a more solid justification. Is it the “multivariate” nature of the channel approach that degrades the correlations? Is MW used only on the gaps or everywhere when used as a channel?
L292 "the SST RMSE is 0.037 °C, and the gradient SST RMSE is 0.012 °C km⁻¹"
Please discuss whether these errors are realistic. Also, in L337, the RMSE obtained with real observations instead of simulated ones is 3-5 times larger. Can the degradation of the skill with real observations be due to the mismatch between the foundation SST and the model SST? Would it be better if subskin SST were used? What is the first model depth?
L324. Please explain precisely how daily L3 SST becomes an input to a model that takes 8 timesteps.
L341. What are other ways of mitigating OOD beyond preprocessing? If there are ways, authors should justify why they haven’t used them.
Citation: https://doi.org/10.5194/egusphere-2025-4847-RC1
RC2: 'Comment on egusphere-2025-4847', Anonymous Referee #2, 06 Jan 2026
Review of “Multi-satellite U-Net for high-resolution sea surface temperature reconstruction”
The paper describes a neural network to reconstruct missing data in satellite images. The authors train a neural network on hydrodynamical model data and then apply the trained model to real satellite images. The approach taken is also able to combine IR and microwave data. The general approach is novel and interesting.
While reviewing this paper, I want to highlight the following major points:
A. The main claim of the authors seems to be that one can train with hydrodynamical model data alone and apply the trained neural network to real data. But to really make this claim, the authors should also train on satellite images and compare that reconstruction with the one from the model trained on hydrodynamical model data, so that the “sim-to-real” gap is assessed in a comparable setting (exactly the same domain and time interval).
B. The error estimate is also problematic. It seems that the method can only predict the error for a whole tile, while other techniques are able to provide an error per pixel.
However, the error variance is expected to be highly spatially dependent. This limitation is not mentioned in the manuscript. The paper also mentions that one could empirically deduce the RMSE from the cloud coverage and the standard deviation of overlapping tiles. However, this approach is not demonstrated and validated quantitatively in a statistical manner. It should also be noted that empirical correlation coefficients should not be deduced from the test dataset, as the test dataset should be independent.
It would also be important to check the realism of the error estimation on real data, not just simulated data.
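As a point of reference for the per-pixel question in point B, the spread between overlapping tile predictions can be turned into a pixel-wise map with very little code. The following is a minimal sketch, not the authors' implementation: the function name `mosaic_with_spread`, the tile layout and the plain averaging are all assumptions made for illustration. Whether such a spread actually tracks the true reconstruction error would still need the statistical validation requested above.

```python
import numpy as np

def mosaic_with_spread(tiles, origins, shape):
    """Average overlapping tiles and return the per-pixel spread.

    tiles   : list of 2D arrays (reconstructed tiles; hypothetical inputs)
    origins : list of (row, col) top-left corners of each tile in the mosaic
    shape   : (rows, cols) of the output mosaic
    """
    total = np.zeros(shape)
    total_sq = np.zeros(shape)
    count = np.zeros(shape)
    for tile, (r, c) in zip(tiles, origins):
        h, w = tile.shape
        total[r:r + h, c:c + w] += tile
        total_sq[r:r + h, c:c + w] += tile ** 2
        count[r:r + h, c:c + w] += 1
    # Pixels covered by no tile become NaN instead of raising warnings.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
        var = total_sq / count - mean ** 2
    std = np.sqrt(np.clip(var, 0.0, None))  # clip tiny negative round-off
    return mean, std, count

# Two 2x2 tiles that overlap in the middle column of a 2x3 mosaic.
tiles = [np.full((2, 2), 1.0), np.full((2, 2), 3.0)]
mean, std, count = mosaic_with_spread(tiles, [(0, 0), (0, 1)], (2, 3))
```

In this toy case the spread is non-zero only where the two tiles overlap and disagree, which is the kind of spatially varying signal a per-tile error estimate cannot express.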
C. The power spectrum analysis should be done on the real observations, instead of (or in addition to) the simulated model data. It is indeed an interesting approach to train a neural network on hydrodynamical model data, but the validation should focus on the observations (see also point B).
D. Comparisons with other techniques are not as straightforward as the authors would like them to be, as the reported RMS errors are not for exactly the same domains and the same input data.
Therefore I recommend major revision, but I would be happy to change my recommendation if the authors are willing to take my review into account.
Below are some other mostly minor comments for the revision.
47: “Zupančič Muc et al. (2025) build on MAESSTRO by using 3 daily IR SST snapshots for their Coarse Reconstruction with ITerative Refinement (CRITER) network that uses a MAE in the coarse reconstruction stage.”
Zupančič Muc et al. did compare their work to MAESSTRO, but I would not say that they based their work on MAESSTRO. Zupančič Muc et al. use a ViT.
Table 1: It is difficult to compare RMSE values from different regions and different datasets. There is too little context to compare the values.
Such a table would be sensible if a standardized dataset were used, but this is not the case.
117: “the satellite MW data is bi-linearly upsampled across time to match the 6-hourly temporal resolution of the L3S data.”
Do you use a 1d or 2d interpolation?
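To make the 1D versus 2D distinction concrete: if the upsampling is purely temporal, it reduces to a 1D linear interpolation along the time axis applied independently at every pixel. The sketch below shows that case only. It is an illustration under stated assumptions, not the manuscript's method; the function name `upsample_in_time` and the choice of hours as time units are hypothetical.

```python
import numpy as np

def upsample_in_time(fields, t_in, t_out):
    """Linearly interpolate a (time, y, x) stack along the time axis only.

    fields : array of shape (n_in, ny, nx), e.g. daily MW SST (assumed input)
    t_in   : increasing sample times of `fields`, in hours
    t_out  : requested output times, in hours
    """
    fields = np.asarray(fields, dtype=float)
    t_in = np.asarray(t_in, dtype=float)
    t_out = np.asarray(t_out, dtype=float)
    # Index of the left neighbour of every output time.
    left = np.searchsorted(t_in, t_out, side="right") - 1
    left = np.clip(left, 0, len(t_in) - 2)
    weight = (t_out - t_in[left]) / (t_in[left + 1] - t_in[left])
    # Hold the end values for output times outside the input range.
    weight = np.clip(weight, 0.0, 1.0)[:, None, None]
    return (1.0 - weight) * fields[left] + weight * fields[left + 1]

# Two daily fields, 24 h apart, upsampled to 6-hourly steps.
daily = np.stack([np.zeros((2, 2)), np.full((2, 2), 4.0)])
six_hourly = upsample_in_time(daily, [0.0, 24.0], [0.0, 6.0, 12.0, 18.0, 24.0])
```

No spatial neighbours enter the result here, so a method described as "bi-linear" would have to interpolate along a second axis as well, which is what the question asks the authors to clarify.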
equation 7: “We set γ1 = 1 and γ2 = 5 so that the SST and SST gradient losses have similar magnitudes during training.”
T and |\nabla_xy T| do not have the same units, so γ1 and γ2 cannot both be dimensionless at the same time, unless the gradient is computed as a finite difference without dividing by the resolution (but this is not the gradient as it is defined mathematically).
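The units issue can be checked numerically. The sketch below compares a physical gradient magnitude (°C km⁻¹) with a grid-index finite difference (°C per cell) on a synthetic field. The 2 km spacing and the linear test field are assumptions chosen for illustration and are not taken from the manuscript.

```python
import numpy as np

def gradient_magnitudes(sst, spacing_km):
    """Return (physical, index-space) gradient magnitudes of a 2D field.

    physical    : centred differences divided by the grid spacing, degC / km
    index-space : the same differences without the division, degC / cell
    """
    gy, gx = np.gradient(sst, spacing_km)  # same spacing on both axes
    iy, ix = np.gradient(sst)              # unit spacing, i.e. per grid cell
    return np.hypot(gx, gy), np.hypot(ix, iy)

spacing_km = 2.0                        # assumed grid spacing
_, col = np.mgrid[0:32, 0:32]
sst = 15.0 + 0.05 * col * spacing_km    # 0.05 degC / km zonal gradient
grad_phys, grad_index = gradient_magnitudes(sst, spacing_km)
```

The two quantities differ by exactly the grid spacing, so a weight that balances the SST term against the index-space difference is dimensionless, while the same weight applied to the physical gradient implicitly carries units of km.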
178: replace 1e^{-5} by 10^{-5} (and similar)
equation 8:
I guess that k is the norm of the wavenumber vector; please clarify.
It is also not very clear whether the CSD and PSD are computed on time series, 2D fields or 3D fields. Please also include a reference for the approach as you used it.
Reading the source code (assuming it is the file figure_scripts/plot_npo_sst.py) clarifies the approach. But it also shows that there is a lot more to it than what is mentioned in the manuscript: the use of a Hann window and linear detrending. It is also not mentioned that the analysis is done along the x and y axes independently and then averaged (i.e. no real 2D analysis). The MITgcm LLC4320 has a resolution of 1/48°, meaning that the resolution in km is different for the x and y axes. Can you clarify how this is taken into account?
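The ingredients listed above (linear detrending, Hann window, separate 1D spectra along x and y) can be sketched as follows. This is not the code in figure_scripts/plot_npo_sst.py. It is a minimal illustration that assumes a regular latitude-longitude patch, roughly 111.2 km per degree, and hypothetical function names. It also shows why the x and y wavenumber axes differ when the spacing in km differs.

```python
import numpy as np

def detrend_linear(rows):
    """Remove a least-squares straight line from each row."""
    n = rows.shape[-1]
    x = np.arange(n) - (n - 1) / 2.0   # centred, so mean(x) == 0
    mean = rows.mean(axis=-1, keepdims=True)
    slope = (rows * x).sum(axis=-1, keepdims=True) / (x ** 2).sum()
    return rows - mean - slope * x

def psd_along_last_axis(field, spacing_km):
    """Row-averaged one-sided PSD with linear detrend and Hann window."""
    n = field.shape[-1]
    window = np.hanning(n)
    tapered = detrend_linear(field) * window
    power = np.abs(np.fft.rfft(tapered, axis=-1)) ** 2
    psd = power * spacing_km / (window ** 2).sum()
    # Fold negative wavenumbers into the one-sided spectrum.
    if n % 2 == 0:
        psd[..., 1:-1] *= 2.0
    else:
        psd[..., 1:] *= 2.0
    k = np.fft.rfftfreq(n, d=spacing_km)   # cycles per km
    return k, psd.mean(axis=0)

def grid_spacing_km(lat_deg, res_deg=1.0 / 48.0, km_per_deg=111.2):
    """Approximate (dx, dy) in km on a regular lat-lon grid (assumption)."""
    dy = res_deg * km_per_deg
    dx = dy * np.cos(np.deg2rad(lat_deg))
    return dx, dy

dx, dy = grid_spacing_km(lat_deg=30.0)
ny, nx = 64, 128
wavelength_km = 16 * dx                      # 8 full cycles across the patch
x_km = np.arange(nx) * dx
field = np.tile(np.sin(2 * np.pi * x_km / wavelength_km), (ny, 1))

kx, psd_x = psd_along_last_axis(field, dx)    # spectrum along x
ky, psd_y = psd_along_last_axis(field.T, dy)  # spectrum along y
k_peak = kx[np.argmax(psd_x)]
```

Because `kx` and `ky` are sampled at different wavenumbers whenever `dx != dy`, averaging the two spectra bin by bin mixes different physical scales unless one of them is first interpolated onto a common wavenumber axis.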
252: “The cross-section also shows a slight oscillating artifact in the IR reconstruction”: What is the origin of those oscillations?
Figure 9: Please also add the spectrum of the original IR data to this figure. This is important information for assessing whether variance is potentially missing at small scales. Currently this figure only shows the difference between two reconstruction approaches.
Also, making this comparison with real satellite observations (as opposed to model data) would be much more useful (in addition or instead). Consider using an IR image with no (or very few) missing data, to which you add artificial clouds for the reconstruction.
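The artificial-cloud experiment suggested above amounts to hiding pixels in a nearly clear scene, reconstructing them, and scoring only the hidden pixels. A minimal sketch follows. The mean-fill `fill_with_mean` is a deliberately naive placeholder standing in for any reconstruction model, and all names and the toy field are hypothetical.

```python
import numpy as np

def add_artificial_clouds(clear_sst, cloud_mask):
    """Return a copy of the scene with masked pixels set to NaN."""
    cloudy = np.asarray(clear_sst, dtype=float).copy()
    cloudy[cloud_mask] = np.nan
    return cloudy

def fill_with_mean(cloudy):
    """Placeholder reconstruction: fill gaps with the mean of visible pixels."""
    filled = cloudy.copy()
    filled[np.isnan(filled)] = np.nanmean(cloudy)
    return filled

def masked_rmse(truth, recon, cloud_mask):
    """RMSE evaluated only on the pixels that were hidden."""
    diff = (recon - truth)[cloud_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

clear = np.arange(16, dtype=float).reshape(4, 4)   # toy cloud-free scene
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                              # artificial cloud
recon = fill_with_mean(add_artificial_clouds(clear, mask))
rmse = masked_rmse(clear, recon, mask)
```

In practice the mask would be taken from a real cloudy scene so that gap shapes are realistic, and the same masked scene would be passed to each method being compared.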
Table 3 and 4: For correlation of SST: please also provide the correlation with the seasonal cycle removed.
Citation: https://doi.org/10.5194/egusphere-2025-4847-RC2
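The request for a correlation with the seasonal cycle removed can be illustrated with a single-pixel time series. The sketch below subtracts a monthly climatology, estimated from the reference series, from both series before correlating. The synthetic data, the monthly binning and the function names are assumptions for illustration and do not come from the manuscript.

```python
import numpy as np

def monthly_climatology(series, months):
    """Mean of `series` for each calendar month index 0..11."""
    return np.array([series[months == m].mean() for m in range(12)])

def anomaly_correlation(reference, estimate, months):
    """Correlation after removing the reference monthly climatology."""
    clim = monthly_climatology(reference, months)
    ref_anom = reference - clim[months]
    est_anom = estimate - clim[months]
    return float(np.corrcoef(ref_anom, est_anom)[0, 1])

# Three synthetic years: a shared seasonal cycle plus independent noise.
rng = np.random.default_rng(0)
months = np.arange(36) % 12
seasonal = 5.0 * np.cos(2.0 * np.pi * months / 12.0)
reference = 15.0 + seasonal + 0.5 * rng.standard_normal(36)
estimate = 15.0 + seasonal + 0.5 * rng.standard_normal(36)

raw_r = float(np.corrcoef(reference, estimate)[0, 1])
anom_r = anomaly_correlation(reference, estimate, months)
```

Here the raw correlation is dominated by the shared seasonal cycle even though the two series have unrelated anomalies, which is why the deseasonalized value is the more informative skill measure.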
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 405 | 212 | 30 | 647 | 23 | 24 |
Dear authors,
I just want to point out that the work of Fanelli et al. (2024) does not include microwave (MW) SST data, but only products obtained from infrared measurements.
I hope that the authors will revise the manuscript accordingly.
Thank you.