This work is distributed under the Creative Commons Attribution 4.0 License.
Generation of super-resolution gap-free ocean colour satellite products using DINEOF
Abstract. In this work we present a super-resolution approach to derive high spatial and temporal resolution ocean colour satellite datasets. The technique is based on DINEOF (Data Interpolating Empirical Orthogonal Functions), a data-driven method that uses the spatio-temporal coherence of the analysed datasets to infer missing information. DINEOF is here used to effectively increase the spatial resolution of satellite data, and is applied to a combination of Sentinel-2 and Sentinel-3 datasets. The results show that DINEOF is able to transfer the spatial variability observed in the Sentinel-2 data into the Sentinel-3 data, while reconstructing information missing due to clouds and reducing the amount of noise in the initial dataset. In order to achieve this, both the Sentinel-2 and Sentinel-3 datasets have undergone the same preprocessing, including a comprehensive, region-independent, pixel-based automatic switching scheme for choosing the most appropriate atmospheric correction and ocean colour algorithm to derive the in-water products. The super-resolution DINEOF has been applied to two different variables (turbidity and chlorophyll) and two different domains (the Belgian coastal zone and the whole North Sea), and the submesoscale variability of turbidity along the Belgian coastal zone has been studied.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-1268', Anonymous Referee #1, 07 Aug 2024
This is the review of the manuscript “Generation of super-resolution gap-free ocean colour satellite products using DINEOF” by Aida Alvera-Azcárate and co-authors.
The manuscript is written in clear English, addresses a very interesting topic, and I surely recommend it for publication in EGUsphere. However, I suggest the authors go through my comments below, which are sincerely meant to improve the quality and readability of a work whose scientific value I recognize.
At the end of section 3 – after the introduction, data and methodology – the reader expects to have a clear idea of what has been done, why and how. Unfortunately this is not the case here, and I strongly suggest the authors reshape the information in a more coherent way to make the reading smoother. For example, lines 198-204 in the results are purely methodological, and a reader like myself would expect them to appear somewhere in section 3. This would surely make it easier to go through the paper without having to re-read the entire paper at any given time. There are other such examples that prevent the reader from fully grasping what has been done by the authors: please address this carefully; the quality of the work will surely improve.
The second important comment that I would like the authors to take into consideration is that the results are presented as a series of case studies which, on one side, provide the reader with effective and impactful examples but, on the other, lack a quantitative and statistically robust evaluation of the outcomes of the analyses. I am personally aware that this is not the case, but the authors still need to prove it in a more robust and convincing way.
As clearly stated in the acknowledgements, this work was developed in the context of the Copernicus Marine Service MultiRes project, and the way the results are presented is typical of a project final report. Personally, I have nothing against a large number of figures within a paper; still, twenty figures, for the amount of information they convey, are definitely too many. Perhaps the authors may consider condensing some of them.
Below is the detailed list of comments.
Section 1 – Introduction
The problem is well posed and the line of reasoning clear. I would have expected the authors to choose the study area in such a way as to allow a comparison with the neural network approaches they mention in the introduction, or at least to use such a comparison in one of the many examples they show. This would lend robustness to the approach without the need to implement the other techniques, which may require a considerable effort.
Line 24 – the sentence should read (?) “Super-resolution approaches aim at increasing the spatial resolution of geophysical datasets and have been developed …”
Section 2.1 – Study area
The area of study is well characterized and provides a useful background for all those unfamiliar with the complex dynamic system of the Belgian waters.
Section 2.2 – Satellite data
This section is a concise overview of the far-from-trivial preprocessing approach; very useful.
One important piece of information that is missing, or too scattered, is the space-time resolution of the various datasets used in the work, along with their temporal coverage. The spatial domain is well depicted by Figure 1. For example, it is not clear why the validation of the Rrs product against in situ data is performed from 2019 to 2022 (not including 2023 and 2024 data) while the generation of the super-resolution data only covers 2020.
Figure 2 caption – please, substitute “atmospheric correct algorithms” with “atmospheric correction algorithms”
Section 2.2.1 – Remote sensing reflectance and pixel classification
Since this is a crucial element of the S2 & S3 preprocessing, it would be very useful for the reader if the authors could provide a concise overview of the C2RCC to ACOLITE/DSF pixel-based switching, which is fully described in Van der Zande et al. (2023).
Line 106 – from the text it appears that the IDEPIX software is applied right after the two atmospheric correction schemes, but in Figure 2 they sit at the same level, as if the two steps were run simultaneously: this is confusing.
In this or in the next section I would have expected to find relevant information about the resolution used for the two sensors.
Section 2.2.2 – Turbidity and Suspended Particulate Matter
If I understand correctly, the algorithms used to generate SPM and TUR are an updated version of Nechad et al. (2010) that accounts for the “switching single band algorithms” developed by Novoa et al. (2017). Still, the way this paragraph is written is not very clear, and I suggest rephrasing it for better readability.
Figure 3 – How do the two maps quantitatively compare? A scatterplot between the two would provide the reader with a better means to interpret and compare the two products. This figure could then also be cited at lines 103-104 when talking about compatibility between the two sensor products. Please make the numbers on the colorbar larger; they are almost unreadable.
Line 123 – what is the different temporal scale represented by the panels of Figure 3? From the caption it seems that both refer to the 5th April, 2020.
Section 2.3 – Multi-sensor chlorophyll data
Line 126 – please spell out CMEMS as Copernicus Marine Environment Monitoring Service (even in parentheses is fine).
The text suggests that cmems_obs-oc_atl_bgc-plankton_my_l3-multi-1km_P1D only covers the period February-October 2022. Please rephrase this sentence.
Section 2.4 – In situ data
How many in situ-satellite Rrs matchups were extracted in the period 2019-2022? Why were 2023 and the first six months of 2024 not included in the analysis?
Section 3.2 – Generation of super-resolution data
It is not entirely clear why the authors only use data from 2020.
Figure 4 caption – please, substitute spare with square or box.
Section 4.1 – Super-resolution data
The analysis presented in this section is effective but very qualitative; some hints on how to make it more quantitative at a reasonable cost are provided below.
Lines 202-204 – The only two optional … – what is the range of variability in the results associated with the settings of these two parameters?
Line 210 – isn’t there also a peak in January?
Line 210 – please substitute “apaprent” with “apparent”
Figures 6, 7, 8 – even if these figures do provide a means to interpret the overall results and the added value of the super-resolution data, this entire analysis lacks robustness as it only refers to single case studies, from which inferring a general rule might be difficult. An important piece of missing information is the data density around the specific examples, that is, how many observations, both high and low resolution, are present in the previous and following days? This, along with the temporal distance between observations and interpolated data, would help explain where the smaller scales present in the super-resolution data come from. Probably, a more effective way to evaluate the outcome of the DINEOF interpolation would have been to randomly (or regularly along the time series) remove some days of data (both high and low resolution) from the initial time series and use them for a more robust and statistically significant comparison involving a larger number of observations, as sketched below.
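For illustration, a minimal Python sketch of the withheld-data validation I have in mind, using synthetic stand-in data and a hypothetical `run_dineof` placeholder for the actual reconstruction step:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a merged S2/S3 turbidity series (time, lat, lon),
# with NaNs marking cloud gaps; the real DINEOF input would replace this.
obs = rng.lognormal(mean=1.0, sigma=0.5, size=(210, 60, 80))
obs[rng.random(obs.shape) < 0.4] = np.nan        # ~40% missing, as under clouds

# Withhold a few percent of the valid pixels (or whole days) for validation
valid = np.flatnonzero(np.isfinite(obs))
held_out = rng.choice(valid, size=int(0.03 * valid.size), replace=False)
obs_cv = obs.copy()
obs_cv.flat[held_out] = np.nan

# run_dineof is a hypothetical placeholder for the actual DINEOF run
reconstruction = run_dineof(obs_cv)

# Score the reconstruction against the withheld truth only
err = reconstruction.flat[held_out] - obs.flat[held_out]
rmse = float(np.sqrt(np.mean(err**2)))
print(f"cross-validation RMSE over {held_out.size} withheld pixels: {rmse:.3f}")
```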
Section 4.2 – Validation
Figures 10, 11 and 12 all have to do with the validation of satellite Rrs (both from S2 and S3) against in situ measurements using hypernet data. There is however some inconsistency between the statistics in figures 10 and 11 and those reported in figure 12. Please verify that the numbers in the figures are correct.
Caption of figure 11 – please substitute “Ostend” with “Oostende” or the other way around, consistently with the rest of the manuscript.
Figure 12 – I would expect a greater degree of spectral consistency between the rmse and mape plots; its absence could depend on an uneven distribution of the relative error characterized by long tails: perhaps the median rather than the mean relative error would provide values more in line with the respective rmse (see the toy example below).
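As a toy illustration (synthetic numbers, not the authors' data) of how long tails inflate the mean relative error while leaving the median nearly untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

# Long-tailed absolute relative errors, as often seen in Rrs validation
rel_err = rng.lognormal(mean=-2.0, sigma=1.2, size=500)   # fractional errors

mape = 100 * np.mean(rel_err)     # mean absolute percentage error
mdape = 100 * np.median(rel_err)  # median absolute percentage error

print(f"MAPE:  {mape:.1f}%")      # inflated by the tail (~28% here)
print(f"MdAPE: {mdape:.1f}%")     # robust central value (~14% here)
```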
Line 282 – “very well” should be backed up by some statistics.
Figure 13 – the figure and associated discussion have some potential, which should however be supported by some statistics; otherwise the entire paragraph is too qualitative and the reader might see it as merely speculative.
To my understanding, Figures 13 and 14 contain the same data; Figure 13 and the associated discussion are too qualitative. On the other hand, Figure 14, if supported by associated statistics, is more robust. Perhaps the dots in Figure 14 could be coloured according to time (months or seasons), condensing the information and reducing the number of figures.
Section 4.3 – Scale assessment
Figure 15 – showing the comparison on a selected transect gives a good idea of the outcome of the analysis, which unfortunately falls short of statistical robustness. Perhaps a relative error map, instead of the very similar maps (in which it is almost impossible to find differences), would better complement the transect view. This comment also applies to figures 6, 7, 8 (perhaps not crucial because of the cloudiness), 9 and 16.
Lines 318-319 – this sentence is perfectly in line with my previous comment about figure 15. I believe it mostly has to do with the way these results are presented. Another aspect I would suggest taking into consideration is to condense the information as much as possible, avoiding specific case studies which provide the reader with useful insights but are difficult to use to derive general rules.
Line 325 – is there any reference or figure to back up this sentence? And perhaps with the drawback of reducing spatial variability by smoothing the field (as mentioned at line 300).
Section 5 – Submesoscale variability in the Belgian Coastal Zone
This section is a bit controversial: on one side it is presented as the right and expected application for the super-resolution data, but on the other it is soon discovered that the data are not suitable to answer the question because of the low temporal resolution. I am not fully sure that this section actually adds value to the work, at least in the way it is presented.
Lines 358-359 – even if it is somehow intuitive to assume that during boreal summer river outflows are at their minimum in Europe, it would also be preferable to have a reference to back up this sentence.
Section 6 – Conclusions
Lines 362-366 – as mentioned in the previous paragraph, the authors should also stress here the importance of high temporal variability which, unfortunately, is still not covered by the ocean colour sensors currently in orbit.
Line 385 – please substitute “omre” with “more”
Citation: https://doi.org/10.5194/egusphere-2024-1268-RC1
RC2: 'Comment on egusphere-2024-1268', Anonymous Referee #2, 06 Sep 2024
The manuscript by Alvera-Azcárate et al. explores the use of the DINEOF technique on satellite ocean colour data gathered from a coastal area between the North Sea and the English Channel. This technique was applied to achieve a gap-free, interpolated product with enhanced spatial resolution through the combination of Sentinel-2 and Sentinel-3 data. The topic is highly relevant, and the paper could contribute significantly to the field of multi-resolution products. The manuscript is well-organized, reasonably clear, and the quality of the English is good.
However, I have a few suggestions and comments that could improve the clarity and readability of the paper, address some issues/inconsistencies, and should be considered before publication:
Section 2.2: It is unclear what the satellite resolutions are and whether Sentinel-3A and Sentinel-3B (or Sentinel-2A and 2B) are treated as merged or separate sensors.
Lines 101–104: A brief summary of the switching method employed would be helpful.
Figure 2 caption: Replace “atmospheric correct algorithms” with “atmospheric correction algorithms.”
Section 2.3: Why are multi-sensor satellite data described in a different section from “satellite data”? Aren’t they also satellite data? Additionally, at this point in the manuscript, the purpose of the multi-sensor data is unclear, and it is not helpful to have to jump between different pages and sections to understand how these data are being used in the study. I recommend merging this section as a subsection of 2.2 (like the other satellite products) and explaining earlier in the manuscript how these data support the study’s goals.
Please provide a link or reference for the product "cmems_obs-oc_atl_bgc-plankton_my_l3-multi-1km_P1D" mentioned in the paper. This dataset typically covers from September 1997 until 8–10 days before the present day. The authors should clarify why only the data from 1 February 2022 to 1 November 2022 were chosen and why this particular period was selected.
Lines 149–150: "The matchup validation protocol described by Bailey and Werdell (2006) was applied to remove erroneous matchups from the analysis." A brief summary of the criteria used to remove erroneous matchups would be useful.
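For instance, a simplified Python sketch of the kind of criteria I mean (the thresholds below are illustrative examples only, not necessarily those of Bailey and Werdell, 2006):

```python
import numpy as np

def keep_matchup(box, dt_hours, max_dt=3.0, min_valid_frac=0.5, max_cv=0.15):
    """Illustrative matchup filter; all thresholds are example values."""
    if abs(dt_hours) > max_dt:                    # in situ / overpass too far apart
        return False, np.nan
    valid = box[np.isfinite(box)]
    if valid.size / box.size < min_valid_frac:    # too few valid pixels in the box
        return False, np.nan
    mu, sd = valid.mean(), valid.std()
    kept = valid[np.abs(valid - mu) <= 1.5 * sd]  # filtered mean (trim outliers)
    if kept.std() / kept.mean() > max_cv:         # scene too heterogeneous
        return False, np.nan
    return True, kept.mean()
```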
Lines 150–151: "Macro-pixels of 3x3 60m pixels for Sentinel-2/MSI and 3x3 300m for Sentinel-3/OLCI were extracted from the L2 products." Given the different resolutions of these sensors, using the same 3x3 macro-pixel extraction leads to different spatial coverage, affecting spatial variability in macro-pixel extraction, especially in coastal zones. I recommend adjusting the number of pixels for macro-pixel extraction to account for each sensor's resolution to obtain comparable macro-area extractions for matchup analysis.
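The required adjustment is simple arithmetic, assuming the stated 60 m and 300 m resolutions:

```python
# A 3x3 box of 300 m OLCI pixels spans 900 m x 900 m; to cover a comparable
# area with 60 m MSI pixels one would need a 15x15 box.
olci_box_m = 3 * 300             # 900 m per side
msi_pixels = olci_box_m // 60    # 15 MSI pixels per side
print(f"equivalent MSI box: {msi_pixels}x{msi_pixels} pixels")
```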
Section 3.2: The authors mention the different spatial resolutions of the satellites, which may cause differences between the datasets. A quantitative analysis (using bias, RMSE, etc.) of these differences, when data from both Sentinel-2 and Sentinel-3 overlap, would be very useful.
Line 197: Sentinel-2 data are available from 2017 and Sentinel-3 from 2016. Why did the authors only use data from 18 January 2020 to 17 December 2020? Why not a period such as August 2021 to March 2022 or any other period when data from both Sentinel-2 and Sentinel-3 are available?
What is the final output resolution? Is it 60m or 100m, as indicated in the caption of Figure 6 (for the first time in the manuscript)?
Section 4.1
Line 210: Regarding Figure 4, I notice a peak in January, which is the highest point in the entire series. Why has this not been considered?
Figure 6: Geographical coordinates in the maps are expressed in degrees and minutes, but in the bottom plot, latitude is in decimal degrees. Consistent units should be used across all figures (this applies to all figures where geographical coordinates are displayed). Also, the bottom plot shows latitude increasing from south to north, while in Figures 18, 19, and 20, latitude increases from north to south. Please ensure consistency in the presentation of geographical coordinates.
Section 4.2
Figures 10 and 11: It would be helpful to provide definitions and formulas for each of the metrics used.
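For reference, these are the standard formulations I would expect (with $x_i$ the in situ values, $y_i$ the satellite values, and $N$ the number of matchups); whether they match the authors' exact definitions should be stated in the paper:

```latex
\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}(y_i - x_i), \qquad
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i)^2}, \qquad
\mathrm{cRMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl[(y_i - \bar{y}) - (x_i - \bar{x})\bigr]^2}, \qquad
\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\frac{\lvert y_i - x_i \rvert}{x_i}.
```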
Line 263: The authors refer to band 492, but Figure 10 shows the 490nm plot.
Line 267: The authors mention 179 matchup points for Sentinel-3, but Figure 11 shows matchups ranging between 168 and 179 points. Could this discrepancy be related to the criteria used for match-up quality flagging? Consider eliminating entire spectra when at least one band has quality issues.
Line 270: The authors refer to bands 492 and 709, but Figure 11 shows 490nm and 704nm plots.
Line 273: The authors discuss RMSE for Figure 12, but Figures 10 and 11 show RMSD or cRMSD. Please clarify. Additionally, the metrics in Figure 12 (slope, RMSE/RMSD, and MAPE) differ significantly from those in Figures 10 and 11. Figure 12 should present metrics in a comparable way to demonstrate consistency across the bands. Why are the metrics different?
Figure 13: The DINEOF (blue line) should have values for every day in 2020. The RT1 (green line) does not have daily values due to quality control or other reasons, but representing it as a continuous line makes it hard to distinguish actual data points. I suggest adding markers on the green line where RT1 measurements exist. I recommend the authors also indicate the number of valid RT1 data points in 2020. Furthermore, why does the green line only span from January to August 2020? Where are the data from September to December 2020, given that RT1 data should be available from 2019 to 2023?
Lines 286–289: In this section, just before Figure 14, the authors repeat the technique used for the matchup analysis. However, I assume that the same method was also applied to Figure 13. If this is correct, I recommend moving this paragraph to precede the description of Figure 13. If not, please clarify the method used for the analysis in Figure 13.
Figure 14: Is MAPD the same as MAPE? Also, as described in the text and shown in the figure, with DINEOF the number of matchup points increases from 67 to 90 for 2020. Therefore, can I assume that the maximum number of matchup points available from RT1 is 90? Please clarify this number of in-situ matchup points.
Line 294–295: The authors, referring to Figure 14, mention an underestimation of DINEOF data for high TUR values. Could this underestimation be the same as the one observed in Figure 13 for January and February, possibly due to high cloud cover as indicated in the text a few lines above?
Section 4.3: I assume that the plots in Figures 15, 16, and 17 are in log scale, but this isn’t explicitly mentioned. Since the authors note that "the spatial distribution of chlorophyll is similar in all figures,” it would be helpful to include percentage difference maps for each analyzed day between "DINEOF Super Resolution" and "DINEOF reference." Additionally, a performance analysis of "DINEOF Super Resolution" over the entire dataset adopted (February 2022–November 2022) using percentage difference maps, scatterplots, or density plots would provide a more comprehensive evaluation than focusing on just 2–3 single days. Expanding the dataset used for analysis is recommended to ensure a more robust evaluation.
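Such a percentage difference map is trivially cheap to produce; a minimal Python sketch with synthetic stand-in fields (hypothetical data, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for one day's chlorophyll fields (mg m^-3):
# "DINEOF reference" vs "DINEOF Super Resolution"
ref = rng.lognormal(mean=0.5, sigma=0.6, size=(200, 300))
sr = ref * rng.normal(loc=1.0, scale=0.1, size=ref.shape)

pct_diff = 100.0 * (sr - ref) / ref    # percentage difference map
print(f"median |diff|: {np.median(np.abs(pct_diff)):.1f}%")
```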
Section 5: In my opinion, the usefulness of this analysis is unclear. While the authors attempt to apply the super-resolution interpolated data, this example may not be the most suitable due to high temporal variability, as acknowledged by the authors. It would be more useful if the authors could present the performance of the super-resolution interpolated data over all 210 days selected for 2020. This would help validate the positive results observed for the 2–3 days analyzed in the previous sections and demonstrate the potential utility of this technique for operational contexts or long-term application. In addition, it is not mentioned within the section that the data in all the figures are logarithmic. Moreover, including all the DINEOF data in the bottom panels of Figures 18 and 19 seems unnecessary, as superimposing data from other days makes it harder to interpret the trends for the specific day being analyzed.
Citation: https://doi.org/10.5194/egusphere-2024-1268-RC2