the Creative Commons Attribution 4.0 License.
Sea surface salinity downscaling using deep generative diffusion models
Abstract. High-resolution satellite observations are essential for studying fine-scale ocean processes. We investigate diffusion models, a class of deep generative models, for improving the resolution of sea surface salinity (SSS) from coarse inputs and for reconstruction under noisy and incomplete observations. We train an unconditional prior on 1/12° reanalysis fields and condition the model at inference time on coarse SSS (1/3°) together with high-resolution (1/12°) sea surface temperature (SST) and sea surface height (SSH) as auxiliary variables. Conditioning is performed via a pseudo-inverse guidance approach, which steers sampling toward solutions that are both statistically consistent with the learned prior and compatible with the observations. We also introduce a simple gradient-enhancement procedure applied during inference to increase contrast while maintaining consistency with the conditioning constraints. Experiments in the Gulf Stream region compare models conditioned on SST only, on SSH only, and on both variables. Validation over the year 2020 uses root-mean-square error (RMSE), structural similarity (SSIM), gradient distributions, and temporal Fourier spectra. Conditioning on SST substantially improves accuracy relative to SSH alone; combining SST and SSH yields further gains and slightly outperforms a convolutional baseline. The gradient-enhanced sampler restores sharper fronts and increased weekly-daily variance at a small cost in pixel-wise scores. Overall, the results show that guided diffusion models can downscale SSS while recovering fine-scale structure, with SST providing the dominant small-scale constraint and SSH adding complementary mesoscale context. The framework is designed to extend naturally to satellite SSS products and future higher-resolution missions.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-1828', Anonymous Referee #1, 10 Jun 2026
- RC2: 'Comment on egusphere-2026-1828', Anonymous Referee #2, 17 Jun 2026
This manuscript presents a novel diffusion-based deep generative model for downscaling SSS from 1/3° to 1/12° resolution, using high-resolution SST and SSH as conditioning variables during inference. While the framework shows potential for extension to satellite SSS products, the methodological description lacks clarity. Furthermore, the proposed method demonstrates only marginal improvements over existing approaches. Therefore, I recommend a major revision prior to publication.
Major comments:
- As a reanalysis product incorporating data assimilation, how does the GLORYS dataset ensure dynamical consistency among SST, SSH, and SSS? Given that these variables are assimilated independently from diverse observational streams, the physical coherence between them may be compromised.
- The horizontal resolution of the GLORYS dataset (1/12°) is insufficient to resolve submesoscale and small-scale structures. Resolving submesoscale features typically requires resolutions of 1–2 km, while small-scale processes demand resolutions of hundreds of meters or higher. Consequently, it is questionable whether the proposed method can genuinely recover these fine-scale dynamics from such coarse input data.
- As shown in Table 2, the DIFF-SST-SSH-BEST configuration yields only a marginal 7% RMSE reduction compared to RESAC, while the DIFF-SST-SSH-GF variant performs worse, exhibiting a 12% higher RMSE. These inconsistent results raise concerns regarding the robustness and added value of the proposed methodology. Consequently, the model performance requires substantial improvement to justify its novelty.
- In general, the unknown variable should be denoted Y and the known variable X. The notation in equation (1) should be redefined accordingly so that the downscaling problem is stated clearly.
- Both the forward diffusion process and the denoising process are described in Section 3.2, which contradicts the section title.
Minor comments:
- Line 84: How can a 1/3° SSS field be obtained by applying 3×3 spatial averaging to the 1/12° native field? Should it be 4×4?
- Line 114, equation (2): Should the prediction be the high-resolution or the low-resolution field? Many variables in Section 3 (Method) should be rechecked to make clear which are low resolution and which are high resolution. The section is difficult to follow as written.
- Line 129: delete the second we.
- Line 228: What are “the actual desired ouput images”? It is not clear how DIFF-SST-SSH-BEST is computed.
- Table 1: Why are the 1/12° SST and SSH fields output?
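The arithmetic behind the Line 84 comment can be checked directly: on a regular grid, the ratio between 1/3° and 1/12° spacing is 4, so a non-overlapping block average from the fine grid to the coarse grid would use 4×4 windows. The following is a minimal NumPy sketch of that check only. The function name `block_mean` and the 8×8 test array are illustrative assumptions, not taken from the manuscript, and the authors' actual coarsening operator may differ.

```python
import numpy as np


def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field.

    Hypothetical helper for illustration; not the manuscript's operator.
    """
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("field dimensions must be divisible by factor")
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


fine_res = 1 / 12    # degrees, native grid spacing
coarse_res = 1 / 3   # degrees, target grid spacing
factor = round(coarse_res / fine_res)  # number of fine cells per coarse cell

# Small synthetic field standing in for a 1/12° SSS tile (assumed data).
fine = np.arange(8 * 8, dtype=float).reshape(8, 8)
coarse = block_mean(fine, factor)
```

Under the same assumption, a 3×3 window on a 1/12° grid would give 1/4° cells rather than 1/3°, unless the averaging is applied as a smoothing filter followed by subsampling, which the manuscript would need to state.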
Citation: https://doi.org/10.5194/egusphere-2026-1828-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 201 | 145 | 21 | 367 | 37 | 36 |
See comments in the attached document.