Attention-Driven and Multi-Scale Feature Integrated Approach for Earth Surface Temperature Data Reconstruction

Zhang, Minghui; Chen, Yunjie; Yang, Fan; Qin, Zhengkun

doi:10.5194/egusphere-2025-1980

Preprints

https://doi.org/10.5194/egusphere-2025-1980

Preprints

27 May 2025

| 27 May 2025

Attention-Driven and Multi-Scale Feature Integrated Approach for Earth Surface Temperature Data Reconstruction

Minghui Zhang, Yunjie Chen, Fan Yang, and Zhengkun Qin

Abstract. High-resolution observations are essential for the study of surface temperatures characterized by complex changes, especially in the surface air temperature of the ocean region, which is an important indicator of coupled changes in sea and air. Because of the scarcity of conventional observations of surface atmospheric temperature in these areas, high-resolution surface atmospheric temperature data obtained from satellite inversion has become the main source of information. However, the lack of data due to such factors as orbital spacing, cloud volume, sensor errors and other interference of polar satellites poses a major challenge to the estimation of the Earth's surface temperature (EST). In this paper, we present ESTD-Net, a new model based on deep learning designed for surface temperature data repair. ESTD-Net combines enhanced multi-header context attention and improved transformer blocks to capture long-range pixel dependencies, improving the model's ability to focus on boundary areas. In addition, we have integrated a convolutional U-Net to optimize high-frequency details and leverage texture enhancements from convolutional neural networks (CNN) to further improve the quality of reconstructed images. The model was enhanced by two key innovations: (1) weighted reconstruction losses, which prioritized masking areas to ensure accurate reconstruction of missing data; (2) Gradient consistency regularizes to minimize gradient differences between real and reconstructed images to ensure structural coherence and consistency. The evaluation showed that ESTD-Net outperformed existing methods in terms of pixel-level accuracy and perceived quality. Our approach provides a robust and reliable solution for restoring surface temperature data.

Received: 27 Apr 2025 – Discussion started: 27 May 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Minghui Zhang, Yunjie Chen, Fan Yang, and Zhengkun Qin

Status: closed

CEC1:
'Comment on egusphere-2025-1980', Juan Antonio Añel, 22 Jun 2025

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
in your "Code and Data Availability" statement you say that the data used in your work is publicly available, and you list two main webpages of portals. However, first, these are not valid repositories for long-term archival of the data used in your work, and second, the information that you provide is not enough to obtain the exact data that you have used in your study both for training and validation of your results.
Therefore, the current situation with your manuscript is irregular, as we can not accept manuscripts in Discussions that do not comply with our policy. Please, publish the data that you have used to produce your work in one of the appropriate repositories according to our policy and reply as soon as possible to this comment with a modified 'Code and Data Availability' section for your manuscript, which must include the relevant information (link and handle or DOI) of the new repositories, and which you should include in a potentially reviewed manuscript.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-1980-CEC1
- AC1:
  'Reply on CEC1', Yunjie Chen, 25 Jun 2025
  Dear Editor,
  Thank you very much for your feedback regarding the "Code and Data Availability" section of our manuscript.
  We sincerely apologize for the initial oversight. In response to your comments, we have now uploaded the datasets used in our study to appropriate long-term data repositories in compliance with the journal's Code and Data Policy.
  Specifically, we have made the following updates:
  The FengYun-3D (FY-3D) MWRI data used in our study is now archived at Zenodo with the following DOI: https://doi.org/10.5281/zenodo.15734212
  
  The ERA5 data used is archived at Zenodo as well, with DOI: https://doi.org/10.5281/zenodo.15734414
  
  We hope this resolves the issue, and we sincerely appreciate your time and consideration. Please let us know if any further changes are required.
  Sincerely,
  
  Yunjie Chen
  
  Citation: https://doi.org/10.5194/egusphere-2025-1980-AC1
RC1:
'Comment on egusphere-2025-1980', Anonymous Referee #1, 27 Jun 2025
This manuscript introduces a deep learning-based data restoration method, ESTD-Net, aimed at recovering surface temperature from high-resolution observations. This method is built upon advanced models and restores temperature data with pixel-level accuracy, enhancing the quality of the reconstructed images. The manuscript writing motivation is clear, and the experimental results show improvements. However, there are still many issues that need to be further revised.
Major Comments:
The ESTD-Net proposed in the manuscript effectively solves the time complexity of traditional Transformers in image quality restoration tasks, but there is no experimental comparison with the training time of Transformer models presented in the article.

The manuscript presents the superiority of ESTD-Net in the task of recovering high-resolution temperature data, but lacks experimental comparisons with more recent extreme models for data recovery tasks, which cannot fully demonstrate the effectiveness of the proposed method. Additional evidence can be provided by comparing experimental results with more advanced models.

The overall model diagram and description are not clear enough. It is unclear how the Conv-U-Net in the second stage further improves the reconstruction accuracy. Additionally, the structure of the discriminator is unclear. Is it composed only of fully connected layers?

The overall workflow of training and inference is not very intuitive.

The manuscript directly reconstructs the processed brightness temperature data. Is the brightness temperature first processed into surface temperature and then verified with the surface temperature of ERA5? The description and rationality of the dataset need to be further explained.

Minor Comments:
Regarding the mask based contextual attention module proposed in section 3.3.1 of this manuscript, although a detailed calculation process is introduced, no corresponding figure is attached for explanation, which will reduce the readability of the paper.

This manuscript validated the impact of various loss factors in the loss function on model performance in the ablation experiment section, but it seems that there were no confirmatory experiments on the impact of contextual attention and other modules on the model, which may lead to ambiguity in the study of model architecture.

Overall, this article contributes a new method architecture in the field of surface temperature restoration. By introducing improved modules such as context attention mechanism, the overall edge feature extraction ability of the model is improved. I suggest that the authors resolve the aforementioned issues before proceeding with publication.
Citation: https://doi.org/10.5194/egusphere-2025-1980-RC1
RC2: 'Comment on egusphere-2025-1980', Anonymous Referee #2, 12 Jul 2025

A. Summary
This manuscript presents ESTD-Net, a two-stage hybrid architecture for inpainting missing Earth Surface Temperature (EST) fields derived from MWRI/FY-3D satellite data. The first stage employs a modified Transformer (“boundary-aware” multi-head context attention, dynamic masks, no LayerNorm, concatenated residuals) to perform global reconstruction. The second stage refines outputs via a convolutional U-Net. Losses include a Weighted Reconstruction Loss, Gradient Consistency Regularization, and an adversarial GAN loss. Experiments on ERA5-simulated gaps compare ESTD-Net against inverse-distance weighting and a partial-conv U-Net, reporting improvements in MAE, RMSE, PSNR, and SSIM.

B. General Comments
1. High-Frequency Refinement by U-Net
Why do you claim that the convolutional U-Net in ESTD-Net can "refine" high-frequency details? What theoretical basis supports this? Could other network architectures achieve similar refinement? Which component specifically drives the refinement? Moreover, the U-Net design here is very crude—how should it be improved?
2. Missing Diffusion-Model Comparison
The absence of any diffusion-based baseline is a serious gap. By 2025, diffusion models aren’t just "nice to have," they’ve become the gold standard for high-fidelity inpainting. Showing that your hybrid Transformer-GAN outperforms a 2018 CNN is only a sanity check; it tells us nothing about where ESTD-Net sits relative to the true state of the art. I would strongly recommend benchmarking against at least one modern diffusion inpainting model
e.g., RePaint in CVPR 2022 (see https://arxiv.org/abs/2201.09865) or Palette in SIGGRAPH 2022 (see https://arxiv.org/abs/2111.05826).
You could also consult the state-of-the-art leaderboard for image inpainting (see https://paperswithcode.com/task/image-inpainting).
3. Limited Baselines and Ablation
The set of baselines is very limited, which makes it hard to highlight the proposed method’s advantages. Additionally, there is no comprehensive ablation study or stability analysis, and the choice of model hyperparameters is not discussed.
4. Lack of Domain-Specific Adaptation
The proposed method appears entirely generic—usable for standard image inpainting or sea-surface-temperature fields alike—without any targeted adaptation. It seems transplanted wholesale from computer vision, with no domain-specific modifications. Are surface-temperature gaps really analogous to arbitrary image holes? Authors should justify their design choices from a physical/meteorological standpoint.

C. Minor Comments
1. “Key Innovations” should be stated more objectively
The paper claims, "The model is augmented by two key innovations," yet both the "Weighted Reconstruction Loss" and the "Gradient Consistency Regularization" are commonplace in the image-inpainting literature. It is unclear whether these truly qualify as "innovations."
2. Definition of PSNR’s MAX Value
Why is the PSNR’s MAX defined as it is? How does this differ from standard computer-vision images? Does MAX relate to temperature units? Is it a constant or variable? Please explain the phrase on line 350: "the maximum possible pixel value of an image.
3. Edge-Case Temperature Variations
How does the proposed method perform in scenarios of rapid or high-amplitude temperature variation? These edge cases may reveal significant weaknesses.
4. Gradient-Consistency Implementation and Hyperparameter
Eq. (5)’s gradient-consistency term $L_{gp}$ may require second-order derivatives—how is this implemented in code and what is the computational cost? Why is $\alpha$ set to 0.001? Similar hyperparameter questions apply to the Gradient Consistency Regularization.
5. Equation Numbering and Punctuation
The definition of $M_{ij}^\prime$ is missing an equation number—it should be Eq. (2). Furthermore, none of the equations ends with a comma or period, which is non-standard.
6. MTB Module Design Motivation
The motivation for the MTB module in Figure 3 is unclear. Why is the concatenation placed as shown? Why the specific sequence MCA–C–FC–MLP? Why eliminate Layer Normalization? It is not explained how these choices realize the authors’ stated goal at line 245: "To address these challenges…"
7. Definition of FC vs. MLP
In Eq. (2), what exactly is "FC"? A fully connected layer (e.g. PyTorch’s nn.Linear)? How does FC differ from MLP? If FC plus an activation is an MLP, why distinguish them? The authors provide no discussion.
8. Adversarial Loss Stability
Eqs. (3) and (4) define the adversarial loss—yet is this formulation stable? Did the authors assess training convergence and robustness? In image tasks, these losses are notoriously unstable; more analysis is needed beyond Table 2 to support claims of "robust and reliable" performance.

D. Recommendation
Major Revision: The paper addresses an important interdisciplinary problem but currently lacks sufficient AI/ML rigor and contemporary benchmarking. Addressing the general comments—particularly theory behind U-Net refinement, domain-specific adaptations, comprehensive ablations, clarity on loss implementations, and inclusion of modern diffusion-model baselines—will be essential before the manuscript can be considered for acceptance.

Citation: https://doi.org/10.5194/egusphere-2025-1980-RC2
AC2: 'Comment on egusphere-2025-1980', Yunjie Chen, 29 Jul 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1980/egusphere-2025-1980-AC2-supplement.zip

Citation: https://doi.org/10.5194/egusphere-2025-1980-AC2

Status: closed

CEC1:
'Comment on egusphere-2025-1980', Juan Antonio Añel, 22 Jun 2025

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
in your "Code and Data Availability" statement you say that the data used in your work is publicly available, and you list two main webpages of portals. However, first, these are not valid repositories for long-term archival of the data used in your work, and second, the information that you provide is not enough to obtain the exact data that you have used in your study both for training and validation of your results.
Therefore, the current situation with your manuscript is irregular, as we can not accept manuscripts in Discussions that do not comply with our policy. Please, publish the data that you have used to produce your work in one of the appropriate repositories according to our policy and reply as soon as possible to this comment with a modified 'Code and Data Availability' section for your manuscript, which must include the relevant information (link and handle or DOI) of the new repositories, and which you should include in a potentially reviewed manuscript.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-1980-CEC1
- AC1:
  'Reply on CEC1', Yunjie Chen, 25 Jun 2025
  Dear Editor,
  Thank you very much for your feedback regarding the "Code and Data Availability" section of our manuscript.
  We sincerely apologize for the initial oversight. In response to your comments, we have now uploaded the datasets used in our study to appropriate long-term data repositories in compliance with the journal's Code and Data Policy.
  Specifically, we have made the following updates:
  The FengYun-3D (FY-3D) MWRI data used in our study is now archived at Zenodo with the following DOI: https://doi.org/10.5281/zenodo.15734212
  
  The ERA5 data used is archived at Zenodo as well, with DOI: https://doi.org/10.5281/zenodo.15734414
  
  We hope this resolves the issue, and we sincerely appreciate your time and consideration. Please let us know if any further changes are required.
  Sincerely,
  
  Yunjie Chen
  
  Citation: https://doi.org/10.5194/egusphere-2025-1980-AC1
RC1:
'Comment on egusphere-2025-1980', Anonymous Referee #1, 27 Jun 2025
This manuscript introduces a deep learning-based data restoration method, ESTD-Net, aimed at recovering surface temperature from high-resolution observations. This method is built upon advanced models and restores temperature data with pixel-level accuracy, enhancing the quality of the reconstructed images. The manuscript writing motivation is clear, and the experimental results show improvements. However, there are still many issues that need to be further revised.
Major Comments:
The ESTD-Net proposed in the manuscript effectively solves the time complexity of traditional Transformers in image quality restoration tasks, but there is no experimental comparison with the training time of Transformer models presented in the article.

The manuscript presents the superiority of ESTD-Net in the task of recovering high-resolution temperature data, but lacks experimental comparisons with more recent extreme models for data recovery tasks, which cannot fully demonstrate the effectiveness of the proposed method. Additional evidence can be provided by comparing experimental results with more advanced models.

The overall model diagram and description are not clear enough. It is unclear how the Conv-U-Net in the second stage further improves the reconstruction accuracy. Additionally, the structure of the discriminator is unclear. Is it composed only of fully connected layers?

The overall workflow of training and inference is not very intuitive.

The manuscript directly reconstructs the processed brightness temperature data. Is the brightness temperature first processed into surface temperature and then verified with the surface temperature of ERA5? The description and rationality of the dataset need to be further explained.

Minor Comments:
Regarding the mask based contextual attention module proposed in section 3.3.1 of this manuscript, although a detailed calculation process is introduced, no corresponding figure is attached for explanation, which will reduce the readability of the paper.

This manuscript validated the impact of various loss factors in the loss function on model performance in the ablation experiment section, but it seems that there were no confirmatory experiments on the impact of contextual attention and other modules on the model, which may lead to ambiguity in the study of model architecture.

Overall, this article contributes a new method architecture in the field of surface temperature restoration. By introducing improved modules such as context attention mechanism, the overall edge feature extraction ability of the model is improved. I suggest that the authors resolve the aforementioned issues before proceeding with publication.
Citation: https://doi.org/10.5194/egusphere-2025-1980-RC1
RC2: 'Comment on egusphere-2025-1980', Anonymous Referee #2, 12 Jul 2025

A. Summary
This manuscript presents ESTD-Net, a two-stage hybrid architecture for inpainting missing Earth Surface Temperature (EST) fields derived from MWRI/FY-3D satellite data. The first stage employs a modified Transformer (“boundary-aware” multi-head context attention, dynamic masks, no LayerNorm, concatenated residuals) to perform global reconstruction. The second stage refines outputs via a convolutional U-Net. Losses include a Weighted Reconstruction Loss, Gradient Consistency Regularization, and an adversarial GAN loss. Experiments on ERA5-simulated gaps compare ESTD-Net against inverse-distance weighting and a partial-conv U-Net, reporting improvements in MAE, RMSE, PSNR, and SSIM.

B. General Comments
1. High-Frequency Refinement by U-Net
Why do you claim that the convolutional U-Net in ESTD-Net can "refine" high-frequency details? What theoretical basis supports this? Could other network architectures achieve similar refinement? Which component specifically drives the refinement? Moreover, the U-Net design here is very crude—how should it be improved?
2. Missing Diffusion-Model Comparison
The absence of any diffusion-based baseline is a serious gap. By 2025, diffusion models aren’t just "nice to have," they’ve become the gold standard for high-fidelity inpainting. Showing that your hybrid Transformer-GAN outperforms a 2018 CNN is only a sanity check; it tells us nothing about where ESTD-Net sits relative to the true state of the art. I would strongly recommend benchmarking against at least one modern diffusion inpainting model
e.g., RePaint in CVPR 2022 (see https://arxiv.org/abs/2201.09865) or Palette in SIGGRAPH 2022 (see https://arxiv.org/abs/2111.05826).
You could also consult the state-of-the-art leaderboard for image inpainting (see https://paperswithcode.com/task/image-inpainting).
3. Limited Baselines and Ablation
The set of baselines is very limited, which makes it hard to highlight the proposed method’s advantages. Additionally, there is no comprehensive ablation study or stability analysis, and the choice of model hyperparameters is not discussed.
4. Lack of Domain-Specific Adaptation
The proposed method appears entirely generic—usable for standard image inpainting or sea-surface-temperature fields alike—without any targeted adaptation. It seems transplanted wholesale from computer vision, with no domain-specific modifications. Are surface-temperature gaps really analogous to arbitrary image holes? Authors should justify their design choices from a physical/meteorological standpoint.

C. Minor Comments
1. “Key Innovations” should be stated more objectively
The paper claims, "The model is augmented by two key innovations," yet both the "Weighted Reconstruction Loss" and the "Gradient Consistency Regularization" are commonplace in the image-inpainting literature. It is unclear whether these truly qualify as "innovations."
2. Definition of PSNR’s MAX Value
Why is the PSNR’s MAX defined as it is? How does this differ from standard computer-vision images? Does MAX relate to temperature units? Is it a constant or variable? Please explain the phrase on line 350: "the maximum possible pixel value of an image.
3. Edge-Case Temperature Variations
How does the proposed method perform in scenarios of rapid or high-amplitude temperature variation? These edge cases may reveal significant weaknesses.
4. Gradient-Consistency Implementation and Hyperparameter
Eq. (5)’s gradient-consistency term $L_{gp}$ may require second-order derivatives—how is this implemented in code and what is the computational cost? Why is $\alpha$ set to 0.001? Similar hyperparameter questions apply to the Gradient Consistency Regularization.
5. Equation Numbering and Punctuation
The definition of $M_{ij}^\prime$ is missing an equation number—it should be Eq. (2). Furthermore, none of the equations ends with a comma or period, which is non-standard.
6. MTB Module Design Motivation
The motivation for the MTB module in Figure 3 is unclear. Why is the concatenation placed as shown? Why the specific sequence MCA–C–FC–MLP? Why eliminate Layer Normalization? It is not explained how these choices realize the authors’ stated goal at line 245: "To address these challenges…"
7. Definition of FC vs. MLP
In Eq. (2), what exactly is "FC"? A fully connected layer (e.g. PyTorch’s nn.Linear)? How does FC differ from MLP? If FC plus an activation is an MLP, why distinguish them? The authors provide no discussion.
8. Adversarial Loss Stability
Eqs. (3) and (4) define the adversarial loss—yet is this formulation stable? Did the authors assess training convergence and robustness? In image tasks, these losses are notoriously unstable; more analysis is needed beyond Table 2 to support claims of "robust and reliable" performance.

D. Recommendation
Major Revision: The paper addresses an important interdisciplinary problem but currently lacks sufficient AI/ML rigor and contemporary benchmarking. Addressing the general comments—particularly theory behind U-Net refinement, domain-specific adaptations, comprehensive ablations, clarity on loss implementations, and inclusion of modern diffusion-model baselines—will be essential before the manuscript can be considered for acceptance.

Citation: https://doi.org/10.5194/egusphere-2025-1980-RC2
AC2: 'Comment on egusphere-2025-1980', Yunjie Chen, 29 Jul 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1980/egusphere-2025-1980-AC2-supplement.zip

Citation: https://doi.org/10.5194/egusphere-2025-1980-AC2

Minghui Zhang, Yunjie Chen, Fan Yang, and Zhengkun Qin

Viewed

Total article views: 2,054 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,917	110	27	2,054	23	45

HTML: 1,917
PDF: 110
XML: 27
Total: 2,054
BibTeX: 23
EndNote: 45

Views and downloads (calculated since 27 May 2025)

Month	HTML	PDF	XML	Total
May 2025	55	8	4	67
Jun 2025	91	17	10	118
Jul 2025	59	17	7	83
Aug 2025	343	28	2	373
Sep 2025	1,283	11	2	1,296
Oct 2025	60	8	1	69
Nov 2025	26	21	1	48

Cumulative views and downloads (calculated since 27 May 2025)

Month	HTML	PDF	XML	Total
May 2025	55	8	4	67
Jun 2025	91	17	10	118
Jul 2025	59	17	7	83
Aug 2025	343	28	2	373
Sep 2025	1,283	11	2	1,296
Oct 2025	60	8	1	69
Nov 2025	26	21	1	48

Viewed (geographical distribution)

Total article views: 2,030 (including HTML, PDF, and XML) Thereof 2,030 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 18 Nov 2025

Short summary

Considering the key role of high-resolution surface observation temperature data in the study of surface atmospheric temperature in ocean regions, we propose a new two-stage deep learning model. The model is used to fill ocean surface temperature data missing from satellite observations due to the orbital clearance of polar satellites.


Total:	0
HTML:	0
PDF:	0
XML:	0