This work is distributed under the Creative Commons Attribution 4.0 License.
MAESSTRO: Masked Autoencoders for Sea Surface Temperature Reconstruction under Occlusion
Abstract. This study investigates the use of masked autoencoders (MAE) to fill gaps in high-resolution (1 km) sea surface temperature (SST) fields caused by cloud cover, which often produces missing data and/or blurry imagery in blended SST products. We demonstrate that MAE, a deep learning model, can efficiently learn the anisotropic nature of small-scale ocean fronts from numerical simulations and reconstruct artificially masked SST images. The MAE model is trained and evaluated on synthetic SST fields and tested on real satellite SST data from the VIIRS sensor on the Suomi NPP satellite. We show that an MAE model trained on numerical simulations can provide a computationally efficient alternative for filling gaps in satellite SST. MAE can reconstruct randomly occluded images with a root mean squared error (RMSE) of under 0.2 °C for masking ratios of up to 80 %, and it is exceptionally efficient, requiring roughly three orders of magnitude (a factor of about 5000) less computation time than traditional interpolation methods. The ability to reconstruct high-resolution SST fields under cloud cover has important implications for understanding and predicting global and regional climates, and for detecting the small-scale SST fronts that play a crucial role in the exchange of heat, carbon, and nutrients between the ocean surface and deeper layers. Our findings highlight the potential of deep learning models such as MAE to improve the accuracy and resolution of SST data at kilometer scales, and present a promising avenue for future research in small-scale ocean remote sensing.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1385', Anonymous Referee #1, 18 Oct 2023
The authors employ Masked Autoencoders (MAE) to tackle the challenge of filling gaps in high-resolution (1km) sea surface temperature (SST) fields arising from cloud cover. These gaps often lead to discontinuities in the SST data and produce blurry imagery in blended SST products. Their work demonstrates that the application of this machine learning method yields significantly superior results when compared to traditional optimal interpolation techniques.
The analysis is robust, and the results appear to possess credibility. The authors deserve commendation for their noteworthy contribution and the compelling nature of their paper. While the paper is generally well-written, there are a few minor improvements that could enhance its quality, as outlined below.
Figure 11: It would be beneficial to include additional realistic examples, perhaps at least one more, to facilitate a more in-depth discussion of differences between patterns. Consider showcasing results that incorporate realistic cloud cover scenarios. Moreover, evaluating the reconstruction over several days for the same area can help demonstrate the temporal consistency of the reconstructed fronts.
Discussion Section: The paper could be strengthened by presenting a reconstruction over an extensive area with continuous cloud cover. This would help in assessing the limits of the method over large contiguous gaps. Quantifying these limits can provide valuable insights.
Figures: It is recommended to recreate figures such as Figures 10, 2, and 7 with larger dimensions and include informative titles to improve their clarity and comprehensibility.
By implementing these minor adjustments, the paper can further enhance its impact and deliver a more comprehensive understanding of the authors' contributions.
- AC1: 'Reply on RC1', Jinbo Wang, 17 Jan 2024
RC2: 'Comment on egusphere-2023-1385', Anonymous Referee #2, 11 Dec 2023
This paper describes an application of a masked autoencoder (MAE) to modeled sea surface temperature in order to reconstruct missing observations. The performance of the MAE is compared to kriging and radial-basis cubic interpolation. It is shown that the MAE provides better accuracy than the other methods for the model data, in particular at small scales. Finally, the method is tested on a single SST image from VIIRS. The results are quite encouraging.
Major comments:
In general, the methodology section on the masked autoencoder does not provide enough detail for the typical audience of Ocean Science. Please include more information about the network architecture, tensor sizes, and all hyperparameters involved. Did you implement the MAE from scratch or did you adapt an existing implementation? In the latter case, please also reference the base implementation.
The missing SST data are not just random pixels; the gaps have a spatial extent (the size of clouds). It would be important to test the methods in this context; otherwise the validation results would be too optimistic. In the discussion the authors note this themselves, but they have not taken this problem into account (besides noting it as future work).
The size of the validation/test dataset is not always clear, or it is very small (a single image for the real SST case). Please use a large validation/test dataset to compute the error statistics. Of course, it is fine to show only a single or a few representative images in the manuscript.
The MAE decomposes the image into 4x4 patches: how does this work in practice when clouds do not occupy regular patches of size 4x4? If a single pixel among the 16 pixels of a 4x4 patch is missing, is the entire patch considered missing? This can quite significantly increase the amount of missing data.
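The inflation effect raised here can be sketched numerically. This is an illustrative calculation, not from the paper: the 128x128 tile and 4x4 patch sizes match those discussed in the review, while the 10 % cloud fraction and the independent-pixel assumption are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 128            # tile size (as discussed in the review)
P = 4                  # MAE patch size
cloud_fraction = 0.10  # hypothetical pixel-level missing fraction

# Random pixel-level cloud mask (True = missing). Real clouds are
# spatially correlated, which weakens the effect; independent pixels
# show the worst case.
pixel_mask = rng.random((H, W)) < cloud_fraction

# A 4x4 patch is dropped if ANY of its 16 pixels is missing.
patches = pixel_mask.reshape(H // P, P, W // P, P)
patch_mask = patches.any(axis=(1, 3))

pixel_frac = pixel_mask.mean()
patch_frac = patch_mask.mean()
print(f"missing pixels: {pixel_frac:.1%}")
print(f"masked patches: {patch_frac:.1%}")
```

For independently missing pixels at 10 %, the expected masked-patch fraction is 1 - 0.9^16, about 81 %, so patch-level masking can inflate a modest pixel-level gap dramatically; spatially coherent cloud masks reduce, but do not eliminate, this inflation.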
All figures: please make sure to always mention the units of the variable, in particular for the SST gradient.
I would propose major revision before publication in OS.
Specific comments:
Line 8: “It has exceptional efficiency, requiring three orders of magnitude (a factor of 5000) less time.” Compared to what?
Line 84: “To build a model for SST, using real satellite SST imagery as ground truth would be ideal. However, these images often contain noise and are susceptible to bias and errors. As an initial conceptual demonstration, this paper employs synthetic satellite sea surface temperature (SST) data derived from two high-resolution numerical simulations …” The motivation is not clear. Would you not expect the errors in models to be even larger than in satellite data? Can the masked autoencoder be trained on gappy data?
Section 2.3, line 115: “To resize the tiles, a random portion of the full tile is cropped, ranging from 20% to 100% of the original tile, before being resized to the final 128x128 dimensions using bicubic interpolation.” Does this mean that the neural network gets images which are not always at the same spatial resolution? Is the actual resolution provided to the neural network? If not, this can be a problem, as the energy/variance is not distributed uniformly across scales.
Figure 2: please clarify whether this image is from the training, validation or test dataset. (If the image shows the training dataset, please use an image from the validation or test dataset in addition to, or instead of, Figure 2.)
Line 144: “false-color RGB images of SST from the LLC4320 validation”: I don’t understand: is the SST image treated as a 3-channel RGB image rendered using some colorbar? SST is a scalar variable, so a single-channel tensor should be sufficient.
Line 152: “While the original MAE implementation He et al. (2022) uses the mean squared error (MSE) between the reconstructed and original pixel values, MAESTRO uses the root-mean-square error (RMSE) in order to recover the same units”: Why should this matter, as the minimum of the loss function is the same? Once we have the MSE, one can compute the RMSE by just applying the square root.
Line 164: “is the cross-spectral density along the x-axis (each row) …” Why consider only the x-axis? Can this metric be made rotation invariant?
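The loss-function point above rests on the square root being strictly increasing, so MSE and RMSE share the same minimizer. This can be verified directly on synthetic data; the target values and candidate grid below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.normal(size=1000)  # synthetic "pixel" targets

# Scan candidate constant predictions and evaluate both losses.
candidates = np.linspace(-1.0, 1.0, 201)
mse = np.array([np.mean((y_true - c) ** 2) for c in candidates])
rmse = np.sqrt(mse)

# sqrt is monotone on [0, inf), so the argmin is identical.
best = candidates[np.argmin(rmse)]
print(f"shared minimizer: {best:.3f}")
```

One nuance worth acknowledging: although the minimizer is identical, the gradient of RMSE is the MSE gradient rescaled by 1/(2·RMSE), so stochastic minibatch optimization dynamics can differ in practice even when the optimum does not.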
Section 3, “Evaluation on a sample SST tile from LLC2160”: How are the parameters involved in the kriging operation chosen, and which kriging variant is used (ordinary, simple, universal, …)? In particular, how is the variogram determined, and are the observations assumed to be noise-free? If not, what noise level is used?
Line 175: “radial-basis bicubic interpolation”: Can you give the equations of this interpolation method? Can it account for noise in the observations?
Line 186: “Kriging with linear and Gaussian variogram”: I do not understand what a linear variogram is. A variogram should tend to zero for small distances and to a constant value for large distances. Do you use a piecewise linear function for the variogram? If yes, how do you choose the threshold values? It is also not clear how the parameters of the Gaussian variogram were determined. Please provide more information.
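For context on this comment, the two variogram models in question can be written down explicitly. This is a generic textbook sketch, not the paper's implementation; the sill, range, slope, and nugget values are hypothetical. Note that an unbounded linear (power-law) variogram is admissible for intrinsically stationary fields with no finite sill, which may be what the authors mean:

```python
import numpy as np

def gaussian_variogram(h, sill=1.0, range_km=50.0, nugget=0.0):
    """Gaussian variogram: rises from the nugget at h = 0 and
    levels off at nugget + sill for lags well beyond the range."""
    return nugget + sill * (1.0 - np.exp(-(h / range_km) ** 2))

def linear_variogram(h, slope=0.02, nugget=0.0):
    """Linear variogram: grows without bound with lag h. Valid for
    intrinsically stationary fields, but it never reaches a sill,
    which is the reviewer's objection."""
    return nugget + slope * h

lags = np.linspace(0.0, 200.0, 5)  # lag distances in km
print(gaussian_variogram(lags))
print(linear_variogram(lags))
```

The contrast makes the reviewer's question concrete: the Gaussian model saturates at the sill, while the linear model keeps growing, so the two encode very different assumptions about large-scale SST variance.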
Table 2, “Evaluation metrics for the single-tile example shown in Figures 5 and 4 for different reconstruction methods”: What is the actual training/validation/test split of the dataset here? Please extend the evaluation metrics to the whole (unseen) test/validation data. Typically the test and validation sets are about 10 % (or more) of the size of the training dataset, to achieve robust error statistics.
Line 220: how are the missing pixels chosen for the “Global validation results on LLC2160”? Do the gaps have a realistic spatial extent?
Line 253: “cubic interpolation (the best-performing baseline)…”: It is surprising that cubic interpolation is the best-performing baseline. Most current techniques use optimal interpolation (similar to kriging). How much effort was placed in optimizing the kriging interpolation? Please keep the same name for the method as before, “radial-basis bicubic interpolation”, as an ordinary cubic interpolation does not involve a radial basis function.
Line 235: “SST gradient typically has a standard deviation of 0.1-0.3 °C” and Figure 9: I do not understand why the gradient of SST has the units °C rather than °C/km (or any other length scale).
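The units question can be made concrete with a finite-difference example. The toy SST values and the 1 km grid spacing below are hypothetical; the point is simply that dividing temperature differences by a distance yields °C per km:

```python
import numpy as np

# Toy SST field (°C) on a grid with 1 km spacing (hypothetical values):
# it increases by 0.5 °C per column and 0.2 °C per row.
dx = 1.0  # grid spacing in km
sst = np.array([[10.0, 10.5, 11.0],
                [10.2, 10.7, 11.2],
                [10.4, 10.9, 11.4]])

# np.gradient divides the temperature differences by the spacing,
# so the result carries units of °C per km, not °C.
dsst_dy, dsst_dx = np.gradient(sst, dx)
grad_mag = np.hypot(dsst_dx, dsst_dy)  # °C / km
print(grad_mag)
```

Halving the assumed spacing would double every gradient value, which is why a bare °C label is ambiguous for a gradient field.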
Line 269: “To substantiate our methodology, we tested it using real satellite Sea Surface Temperature (SST) data from the Suomi NPP Visible Infrared Imaging Radiometer Suite (VIIRS) in the California Current region, specifically at coordinates (35° N, 125° W) on January 16, 2021 (Figure 11, left).” To really validate a technique, it is not sufficient to take a single snapshot. One would need to provide error statistics over several images to obtain a robust estimate.
Line 275: “showing edge artifacts in 4x4 patches” Where do these patches come from? I think it is essential to discuss network architecture.
Line 309: “while traditional methods require approximately 24 hours”: can you be more specific about which traditional methods you are comparing to here?
Figure 11: please also show the clouded image that you used as input.
In general, as per OS policy, verify that the image is also accessible to people with vision deficiencies (https://www.ocean-science.net/submission.html). I am not sure if Figure 11 is ok.
(Very) minor comments on the references:
In general, please use DOIs when they are available.
ALVERA-AZCÁRATE: -> Alvera-Azcárate
Change https://doi.org/https://doi.org/10.1175/2007JCLI1824.1 -> https://doi.org/10.1175/2007JCLI1824.1 (and other links with the same issue)
“JPL/OBPG/RSMAS: GHRSST Level 2P Global Sea Surface Skin Temperature from the Visible and Infrared Imager/Radiometer Suite (VIIRS) on the Suomi-NPP satellite (GDS2). Ver. 2016.2., PO.DAAC, CA, USA. Dataset accessed [YYYY-MM-DD] at https://doi.org/10.5067/GHVRS-2PJ62, 2020.” Please provide year, month and day.
“Application of dincae to reconstruct the gaps in chlorophyll-a satellite observations in the south china sea and west philippine sea” -> “Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea”. Please check the capitalization of your references.
Citation: https://doi.org/10.5194/egusphere-2023-1385-RC2
AC2: 'Reply on RC2', Jinbo Wang, 17 Jan 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1385/egusphere-2023-1385-AC2-supplement.pdf
Edwin Goh, Alice R. Yepremyan, and Brian Wilson