This work is distributed under the Creative Commons Attribution 4.0 License.
Attention-Driven and Multi-Scale Feature Integrated Approach for Earth Surface Temperature Data Reconstruction
Abstract. High-resolution observations are essential for studying surface temperatures characterized by complex variability, especially surface air temperature over the ocean, which is an important indicator of coupled air-sea changes. Because conventional observations of surface air temperature are scarce in these regions, high-resolution surface air temperature data retrieved from satellites have become the main source of information. However, data gaps caused by orbital spacing, cloud cover, sensor errors, and other sources of interference in polar-orbiting satellites pose a major challenge to estimating Earth surface temperature (EST). In this paper, we present ESTD-Net, a new deep learning model designed for surface temperature data reconstruction. ESTD-Net combines enhanced multi-head context attention with improved Transformer blocks to capture long-range pixel dependencies, strengthening the model's ability to focus on boundary regions. In addition, we integrate a convolutional U-Net to refine high-frequency details and leverage the texture-enhancement capability of convolutional neural networks (CNNs) to further improve the quality of the reconstructed images. The model is enhanced by two key innovations: (1) a weighted reconstruction loss, which prioritizes masked areas to ensure accurate reconstruction of missing data, and (2) gradient consistency regularization, which minimizes gradient differences between the real and reconstructed images to ensure structural coherence and consistency. Evaluations show that ESTD-Net outperforms existing methods in both pixel-level accuracy and perceptual quality. Our approach provides a robust and reliable solution for reconstructing surface temperature data.
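The loss design described above can be illustrated with a minimal sketch. The snippet below assumes PyTorch tensors of shape (B, 1, H, W), a binary mask in which 1 marks missing pixels, and illustrative weighting values (`w_mask`, `alpha`); it is not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a weighted reconstruction loss and
# a gradient consistency term; mask convention and weights are assumptions.
import torch
import torch.nn.functional as F

def weighted_reconstruction_loss(pred, target, mask, w_mask=10.0):
    # L1 error, up-weighted inside the masked (missing) region
    err = torch.abs(pred - target)
    return (w_mask * mask * err + (1.0 - mask) * err).mean()

def gradient_consistency(pred, target):
    # penalize differences between horizontal and vertical finite differences
    dx = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                   target[..., :, 1:] - target[..., :, :-1])
    dy = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                   target[..., 1:, :] - target[..., :-1, :])
    return dx + dy

def total_loss(pred, target, mask, alpha=0.001):
    # alpha weights the gradient term; the value here is illustrative
    return weighted_reconstruction_loss(pred, target, mask) \
        + alpha * gradient_consistency(pred, target)
```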
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-1980', Juan Antonio Añel, 22 Jun 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In your "Code and Data Availability" statement you say that the data used in your work is publicly available, and you list the main webpages of two portals. However, first, these are not valid repositories for long-term archival of the data used in your work, and second, the information that you provide is not enough to obtain the exact data that you have used in your study, both for training and for validation of your results.
Therefore, the current situation with your manuscript is irregular, as we cannot accept manuscripts in Discussions that do not comply with our policy. Please publish the data that you have used to produce your work in one of the appropriate repositories according to our policy, and reply as soon as possible to this comment with a modified 'Code and Data Availability' section for your manuscript, which must include the relevant information (link and handle or DOI) of the new repositories, and which you should include in a potentially revised manuscript.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1980-CEC1
AC1: 'Reply on CEC1', Yunjie Chen, 25 Jun 2025
Dear Editor,
Thank you very much for your feedback regarding the "Code and Data Availability" section of our manuscript.
We sincerely apologize for the initial oversight. In response to your comments, we have now uploaded the datasets used in our study to appropriate long-term data repositories in compliance with the journal's Code and Data Policy.
Specifically, we have made the following updates:
- The FengYun-3D (FY-3D) MWRI data used in our study is now archived at Zenodo with the following DOI: https://doi.org/10.5281/zenodo.15734212
- The ERA5 data used is archived at Zenodo as well, with DOI: https://doi.org/10.5281/zenodo.15734414
We hope this resolves the issue, and we sincerely appreciate your time and consideration. Please let us know if any further changes are required.
Sincerely,
Yunjie Chen
Citation: https://doi.org/10.5194/egusphere-2025-1980-AC1
RC1: 'Comment on egusphere-2025-1980', Anonymous Referee #1, 27 Jun 2025
This manuscript introduces a deep learning-based data restoration method, ESTD-Net, aimed at recovering surface temperature from high-resolution observations. The method is built upon advanced models and restores temperature data with pixel-level accuracy, enhancing the quality of the reconstructed images. The motivation of the manuscript is clear, and the experimental results show improvements. However, several issues still need to be addressed.
Major Comments:
- The ESTD-Net proposed in the manuscript is presented as addressing the time complexity of traditional Transformers in image restoration tasks, but the article provides no experimental comparison of training time against Transformer models.
- The manuscript presents the superiority of ESTD-Net in recovering high-resolution temperature data, but it lacks experimental comparisons with more recent state-of-the-art models for data recovery tasks, so the effectiveness of the proposed method is not fully demonstrated. Additional evidence could be provided by comparing against more advanced models.
- The overall model diagram and description are not clear enough. It is unclear how the Conv-U-Net in the second stage further improves the reconstruction accuracy. Additionally, the structure of the discriminator is unclear. Is it composed only of fully connected layers?
- The overall workflow of training and inference is not very intuitive.
- The manuscript directly reconstructs the processed brightness temperature data. Is the brightness temperature first converted into surface temperature and then validated against the ERA5 surface temperature? The description and rationale of the dataset need to be further explained.
Minor Comments:
- Regarding the mask-based contextual attention module proposed in Section 3.3.1, although a detailed calculation process is introduced, no corresponding figure is provided, which reduces the readability of the paper.
- The ablation experiments validate the impact of the individual loss terms on model performance, but there appear to be no corresponding experiments on the impact of the contextual attention and other modules, which leaves the contribution of the model architecture ambiguous.
Overall, this article contributes a new model architecture to the field of surface temperature restoration. By introducing improved modules such as the contextual attention mechanism, the model's edge-feature extraction ability is improved. I suggest that the authors resolve the aforementioned issues before the manuscript proceeds to publication.
Citation: https://doi.org/10.5194/egusphere-2025-1980-RC1
RC2: 'Comment on egusphere-2025-1980', Anonymous Referee #2, 12 Jul 2025
A. Summary
This manuscript presents ESTD-Net, a two-stage hybrid architecture for inpainting missing Earth Surface Temperature (EST) fields derived from MWRI/FY-3D satellite data. The first stage employs a modified Transformer (“boundary-aware” multi-head context attention, dynamic masks, no LayerNorm, concatenated residuals) to perform global reconstruction. The second stage refines outputs via a convolutional U-Net. Losses include a Weighted Reconstruction Loss, Gradient Consistency Regularization, and an adversarial GAN loss. Experiments on ERA5-simulated gaps compare ESTD-Net against inverse-distance weighting and a partial-conv U-Net, reporting improvements in MAE, RMSE, PSNR, and SSIM.
B. General Comments
1. High-Frequency Refinement by U-Net
Why do you claim that the convolutional U-Net in ESTD-Net can "refine" high-frequency details? What theoretical basis supports this? Could other network architectures achieve similar refinement? Which component specifically drives the refinement? Moreover, the U-Net design here is very crude—how should it be improved?
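To make the question concrete, the sketch below shows one common reading of "refinement", namely a residual correction predicted on top of the coarse stage-1 output. This is only an assumption about the design (the hypothetical RefineStage wrapper is mine, not the authors'), and the manuscript should state which mechanism actually drives the improvement.

```python
# Hypothetical sketch of a residual second-stage refinement, assuming the
# coarse output and mask are (B, 1, H, W) tensors; not the authors' design.
import torch
import torch.nn as nn

class RefineStage(nn.Module):
    def __init__(self, unet: nn.Module):
        super().__init__()
        self.unet = unet  # any encoder-decoder mapping 2 channels -> 1 channel

    def forward(self, coarse, mask):
        # condition on the coarse reconstruction and the mask, add a residual
        x = torch.cat([coarse, mask], dim=1)
        return coarse + self.unet(x)
```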
2. Missing Diffusion-Model Comparison
The absence of any diffusion-based baseline is a serious gap. By 2025, diffusion models aren't just "nice to have"; they have become the gold standard for high-fidelity inpainting. Showing that your hybrid Transformer-GAN outperforms a 2018 CNN is only a sanity check; it tells us nothing about where ESTD-Net sits relative to the true state of the art. I would strongly recommend benchmarking against at least one modern diffusion inpainting model, e.g., RePaint (CVPR 2022, https://arxiv.org/abs/2201.09865) or Palette (SIGGRAPH 2022, https://arxiv.org/abs/2111.05826).
You could also consult the state-of-the-art leaderboard for image inpainting (see https://paperswithcode.com/task/image-inpainting).
3. Limited Baselines and Ablation
The set of baselines is very limited, which makes it hard to highlight the proposed method’s advantages. Additionally, there is no comprehensive ablation study or stability analysis, and the choice of model hyperparameters is not discussed.
4. Lack of Domain-Specific Adaptation
The proposed method appears entirely generic—usable for standard image inpainting or sea-surface-temperature fields alike—without any targeted adaptation. It seems transplanted wholesale from computer vision, with no domain-specific modifications. Are surface-temperature gaps really analogous to arbitrary image holes? Authors should justify their design choices from a physical/meteorological standpoint.
C. Minor Comments
1. “Key Innovations” should be stated more objectively
The paper claims, "The model is augmented by two key innovations," yet both the "Weighted Reconstruction Loss" and the "Gradient Consistency Regularization" are commonplace in the image-inpainting literature. It is unclear whether these truly qualify as "innovations."
2. Definition of PSNR’s MAX Value
Why is the PSNR’s MAX defined as it is? How does this differ from standard computer-vision images? Does MAX relate to temperature units? Is it a constant or variable? Please explain the phrase on line 350: "the maximum possible pixel value of an image."
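To illustrate why the definition matters, the sketch below computes PSNR with a data-dependent MAX; the `data_max` choice is an assumption for illustration, not what the manuscript specifies.

```python
# Sketch only: PSNR for temperature fields depends directly on what MAX is
# taken to be; the fallback below (dynamic range of the reference field) is an
# assumption, not the paper's definition.
import numpy as np

def psnr(pred, target, data_max=None):
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    if data_max is None:
        # For 8-bit images MAX = 255; for temperatures (e.g. in kelvin) one
        # possible choice is the range of the reference field.
        data_max = float(target.max() - target.min())
    return 10.0 * np.log10(data_max ** 2 / mse)
```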
3. Edge-Case Temperature Variations
How does the proposed method perform in scenarios of rapid or high-amplitude temperature variation? These edge cases may reveal significant weaknesses.
4. Gradient-Consistency Implementation and Hyperparameter
Eq. (5)’s gradient-consistency term $L_{gp}$ may require second-order derivatives—how is this implemented in code and what is the computational cost? Why is $\alpha$ set to 0.001? Similar hyperparameter questions apply to the Gradient Consistency Regularization.
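For reference, if $L_{gp}$ is a WGAN-GP style gradient penalty (an assumption based on the subscript, not confirmed by the text), the second-order derivatives arise because the penalty is itself a function of autograd gradients, as in the sketch below; building the extra graph (create_graph=True) roughly doubles the cost of each discriminator update.

```python
# Illustrative WGAN-GP gradient penalty; shapes and usage are assumptions.
import torch

def gradient_penalty(discriminator, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_out = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True,   # enables backprop through the gradient itself
        retain_graph=True, only_inputs=True,
    )[0]
    grads = grads.reshape(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```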
5. Equation Numbering and Punctuation
The definition of $M_{ij}^\prime$ is missing an equation number—it should be Eq. (2). Furthermore, none of the equations ends with a comma or period, which is non-standard.
6. MTB Module Design Motivation
The motivation for the MTB module in Figure 3 is unclear. Why is the concatenation placed as shown? Why the specific sequence MCA–C–FC–MLP? Why eliminate Layer Normalization? It is not explained how these choices realize the authors’ stated goal at line 245: "To address these challenges…"
7. Definition of FC vs. MLP
In Eq. (2), what exactly is "FC"? A fully connected layer (e.g. PyTorch’s nn.Linear)? How does FC differ from MLP? If FC plus an activation is an MLP, why distinguish them? The authors provide no discussion.
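For clarity, the distinction I have in mind is illustrated below with hypothetical layer sizes: a single linear map versus linear layers separated by a non-linearity.

```python
# Hypothetical sizes, for illustration of the FC vs. MLP terminology only.
import torch.nn as nn

fc = nn.Linear(256, 256)        # a single fully connected layer, no activation

mlp = nn.Sequential(            # FC + activation + FC, i.e. a small MLP
    nn.Linear(256, 1024),
    nn.GELU(),
    nn.Linear(1024, 256),
)
```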
8. Adversarial Loss Stability
Eqs. (3) and (4) define the adversarial loss—yet is this formulation stable? Did the authors assess training convergence and robustness? In image tasks, these losses are notoriously unstable; more analysis is needed beyond Table 2 to support claims of "robust and reliable" performance.
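For reference, one stabilization that is standard in inpainting GANs is the hinge formulation (used, for example, in DeepFill-v2-style models); whether it corresponds to Eqs. (3) and (4) cannot be determined from the manuscript, so the sketch below is illustrative only.

```python
# Illustrative hinge adversarial losses; not the paper's Eqs. (3)-(4).
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # discriminator: push real scores above +1 and fake scores below -1
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # generator: raise the discriminator's score on generated samples
    return -d_fake.mean()
```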
D. Recommendation
Major Revision: The paper addresses an important interdisciplinary problem but currently lacks sufficient AI/ML rigor and contemporary benchmarking. Addressing the general comments—particularly theory behind U-Net refinement, domain-specific adaptations, comprehensive ablations, clarity on loss implementations, and inclusion of modern diffusion-model baselines—will be essential before the manuscript can be considered for acceptance.
Citation: https://doi.org/10.5194/egusphere-2025-1980-RC2
AC2: 'Comment on egusphere-2025-1980', Yunjie Chen, 29 Jul 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1980/egusphere-2025-1980-AC2-supplement.zip
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
810 | 74 | 23 | 907 | 13 | 34