This work is distributed under the Creative Commons Attribution 4.0 License.
Pan-Arctic Sea Ice Concentration from SAR and Passive Microwave
Abstract. Arctic sea ice monitoring is a fundamental prerequisite for anticipating and mitigating the impacts of climate change. Satellite-based sea ice observations have been subject to intense attention over the last few decades, with passive microwave (PMW) radiometers being the primary sensors for retrieving pan-Arctic sea ice concentration, albeit with coarse spatial resolutions of a few or even tens of kilometers. Space-borne Synthetic Aperture Radar (SAR) missions, such as Sentinel-1, provide dual-polarized C-band images with <100 meter spatial resolution, which are particularly well-suited for retrieving high-resolution sea ice information. In recent years, deep learning-based vision methodologies have emerged with promising results for SAR-based sea ice concentration retrievals. Despite recent advancements, most contributions focus on regional or local applications without empirical studies on the generalization of the algorithms to the pan-Arctic region. Furthermore, many contributions omit uncertainty quantification from the retrieval methodologies, which is a prerequisite for the integration of automated SAR-based sea ice products into the workflows of the national ice services, or for the assimilation into numerical ocean-sea-ice coupled forecast models. Here, we present ASIP (Automated Sea Ice Products): a new and comprehensive deep learning-based methodology to retrieve high-resolution sea ice concentration with accompanying well-calibrated uncertainties from Sentinel-1 SAR and Advanced Microwave Scanning Radiometer 2 (AMSR2) passive microwave observations at a pan-Arctic scale for all seasons. We compiled a vast matched dataset of Sentinel-1 HH/HV imagery and AMSR2 brightness temperatures to train ASIP with regional ice charts as labels. ASIP achieves an R2-score of 95 % against a held-out test dataset of regional ice charts. In a comparative study against pan-Arctic ice charts and PMW-based sea ice products, we show that ASIP generalizes well to the pan-Arctic region. Additionally, the comparison reveals that ASIP consistently produces relatively higher sea ice concentration than the PMW-based sea ice product, with mean biases ranging from 1.45 % to 8.55 %, and that the discrepancies are primarily attributed to disparities in the marginal ice zone.
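For readers who want the reported skill metric made concrete, here is a minimal sketch of how an R2-score between retrieved SIC and ice-chart SIC is conventionally computed; the arrays and values are hypothetical, not taken from the paper:

```python
import numpy as np

def r2_score(sic_reference, sic_retrieved):
    """Coefficient of determination between ice-chart SIC (reference)
    and retrieved SIC, both expressed in percent."""
    ss_res = np.sum((sic_reference - sic_retrieved) ** 2)         # residual sum of squares
    ss_tot = np.sum((sic_reference - sic_reference.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical example: retrievals close to the chart values give R2 near 1.
chart = np.array([0.0, 30.0, 60.0, 90.0, 100.0])
retrieved = np.array([2.0, 28.0, 63.0, 88.0, 97.0])
print(r2_score(chart, retrieved))  # ~0.996
```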
Status: closed
RC1: 'Comment on egusphere-2024-178', Anonymous Referee #1, 27 Mar 2024
Review for
Pan-Arctic Sea Ice Concentration from SAR and Passive Microwave
by
Tore Wulf, Jørgen Buus-Hinkler, Suman Singha, Hoyeon Shi, and Matilde Brandt Kreiner

General comments:
Considering the need for wide-coverage regular monitoring of environmental and climatological changes in the Arctic, this paper provides an interesting contribution, dealing with automated retrieval of sea ice concentration (SIC) at higher spatial resolution. The key points are: (1) the introduction of a deep learning-based SIC retrieval with improved spatial resolution and with associated calibrated uncertainties, and (2) the use of a substantially extended training and validation data set of Sentinel-1 images collocated with AMSR-2 data which covers the whole periphery of Greenland, the Canadian Archipelago, and parts of the Labrador Sea, and contains more than 5000 samples acquired from 2018 to 2021.
The retrieval method is based on an ensemble of convolutional neural networks (ConvNets) and includes investigations of re-calibration strategies and metrics for quantifying mis-calibration. Proper calibration of the retrieval method guarantees that the confidence scores provided by the model reflect its predictive uncertainty, which the end-user needs in order to directly assess the reliability of the SIC information (that is how I understood it).
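As background for the calibration discussion, here is a minimal sketch of how an ensemble's averaged class probabilities yield a per-pixel confidence score and an uncertainty measure; this illustrates the general deep-ensemble technique, not necessarily the authors' exact implementation:

```python
import numpy as np

# Hypothetical softmax outputs of 3 ensemble members over 11 SIC classes
# (0 %, 10 %, ..., 100 %) for a single pixel.
rng = np.random.default_rng(0)
member_probs = rng.dirichlet(np.ones(11), size=3)   # shape: (members, classes)

# Deep-ensemble prediction: average the members' probability vectors.
mean_probs = member_probs.mean(axis=0)

prediction = mean_probs.argmax()   # predicted SIC class index
confidence = mean_probs.max()      # confidence score of the predicted class
entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))  # predictive uncertainty

print(prediction, confidence, entropy)
```

Calibration then asks whether, over many pixels, a confidence of, say, 0.8 actually corresponds to the predicted class being correct about 80 % of the time.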
The paper is well structured, and the text well formulated, although in parts it is too detailed without first stating the main questions. In my opinion the paper should definitely be accepted, with consideration of the suggestions and comments provided below.
I have two main issues.

(1) Methodology and Future Work:
Subsections 3.3.1 to 3.3.3, and Sect. 3.4 present many details. I had some difficulty following the use of the individual recalibration strategies and mis-calibration metrics. Apparently all of the former are combined with all of the latter? Which then helps to decide which ConvNets configuration is finally used (lines 381-383 in Sect. 4.1)? I recommend providing a graphical presentation of the workflow described in the sections mentioned above. This will help the reader to understand the overall structure of the methodology before digging into the many details provided in the present text. In addition, it would be helpful if the authors formulated the motivating questions, which in my understanding are roughly: Which are the optimal recalibration strategies? Which are the optimal metrics to decide?
In the sub-section “Future work“, it would be helpful for the reader if the overall topic of this part were introduced with one or two sentences; in my understanding, it is the discussion of two alternative methods: multi-parameter retrieval (lines 489 to 525) and self-supervised learning (starting line 525). These two alternatives should also be separated visually in the text formatting.

(2) Figs. 8-12 and corresponding text (lines 395-447):
With the given figures, judgements regarding the similarity between the ASIP results and the ice charts shown for comparison are possible only on a subjective basis. Even the authors themselves use only vague formulations: “resemble … to a significant extent“ (lines 398-399) and “fairly similar“ (line 440). In my (subjective) opinion, the differences between the ASIP results and the data shown for comparison are relatively large in some regions. In particular, the results shown on the pan-Arctic scale are difficult to assess, even with zooming (here in particular Figs. 7 and 8, since the interpretation of the SAR images in terms of SIC will be difficult for most readers). I recommend keeping Fig. 7 as an example of the incomplete Sentinel-1 coverage in comparison to the final ASIP SIC results, and replacing Fig. 8 with regional (scale of a Sentinel-1 EW scene) examples that show cases of higher and lower uncertainties, with possible explanations in the text for the latter. Figures 9 and 10 could be combined into a single figure, choosing the three most interesting cases, including the bottom row of Fig. 10 (H-L), for which zoom-ins are shown in Figs. 11-12. I would also like to see the ASIP uncertainty maps in Figs. 9-12, which are more important for judging possible problems than the difference to the OSI-SAF SIC. Because of the uncertainties in the OSI-SAF data, the difference maps do not help to judge the accuracy of ASIP. If possible and if they want, the authors could provide more figures as supplementary material.
Minor questions (some of my questions are related to my lack of knowledge of deep learning terminology):
lines 176-177: Are single NIC charts composed of observational data acquired at different days, or of data from just one fixed day which may be from up to 5 days prior to production of the ice chart?
lines 198-199: “The Sentinel-1 HH/HV bands and the AMSR2 brightness temperatures are standardized prior to training“ - what does “standardize“ mean?

Fig. 3: what is the meaning of HxWxC, HxWx(RC)? Although the other abbreviations are explained in the text, I recommend repeating the explanations in the figure caption, which makes it easier for the reader to understand the graph without jumping back and forth between text and figure. For readers like me who are not familiar with deep learning terminology, it would be helpful to explain (or replace) the "Conv1x1", which probably means mapping an input pixel to an output pixel without considering the pixels around it (so in fact there is no spatial convolution).
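To make both questions concrete, a minimal sketch (assuming “standardize“ means the usual z-score normalization; all array names are illustrative): a Conv1x1 is a linear map across the C channels applied at each pixel independently, so no neighbouring pixels enter the computation:

```python
import numpy as np

# Standardization: per band, subtract the mean and divide by the standard
# deviation, so each input has zero mean and unit variance before training.
hv_band = np.random.rand(256, 256)   # hypothetical Sentinel-1 HV backscatter
hv_standardized = (hv_band - hv_band.mean()) / hv_band.std()

# Conv1x1: maps H x W x C features to H x W x C_out by mixing channels at
# every pixel independently; the "kernel" is just a C x C_out matrix.
H, W, C, C_out = 256, 256, 16, 8
features = np.random.rand(H, W, C)
weights = np.random.rand(C, C_out)
conv1x1_out = np.einsum('hwc,co->hwo', features, weights)
print(conv1x1_out.shape)             # (256, 256, 8)
```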
Equation (1): What is parameter “y“ in words?
Equation (2): How is the accuracy determined? What means “support of bin Bm“?
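For context, if Eq. (2) follows the standard definition of the expected calibration error (an assumption on my part, suggested by the binning terminology), both terms have simple answers:

```latex
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,
\bigl|\operatorname{acc}(B_m) - \operatorname{conf}(B_m)\bigr|,
\qquad
\operatorname{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m}
\mathbf{1}\bigl[\hat{y}_i = y_i\bigr]
```

Here the support |B_m| is the number of predictions whose confidence falls into bin B_m, n is the total number of predictions, acc(B_m) is the fraction of the bin's predictions that match the label, and conf(B_m) is the mean confidence within the bin.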
line 276: what does “hold-out“ validation mean?
line 288: what is a “one-hot“ encoded label?
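For reference, a one-hot encoded label is a vector with a 1 at the index of the true class and 0 elsewhere; a minimal illustration with hypothetical SIC classes:

```python
import numpy as np

num_classes = 11   # e.g. SIC classes 0 %, 10 %, ..., 100 %
label = 3          # true class index (here: 30 % SIC)
one_hot = np.eye(num_classes)[label]
print(one_hot)     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```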
line 330: …we set “the“ bin support…
line 350: I do not see much sense in considering NIC charts with a time difference of up to 12 days relative to the actual observations. They can definitely not be used for judging the quality of the ASIP results. But this is not a critical point. The authors could better explain why they included this comparison.
lines 369-370: “…introduced the during…“?
line 388: please give a range for “intermediate SIC“
Figs. 6-10: The identification scheme of the single plates (A,B,C…) in the figures, as used in the text, should be explained in at least the caption for Fig. 6, the other figures may refer to it.
lines 412-414: The increase of the ocean backscatter due to wind does not change the absolute level of backscattering from the sea ice (or did I misunderstand this sentence?). What changes is the intensity contrast between ice and water, which is probably not easy to account for in the training of the ConvNets without additional information about wind conditions.
Figs. 9 and 10: ASIP SIC - OSI SAF SIC: The range for showing the difference values should be chosen smaller, e.g., excluding the negative differences, which do not occur in the maps (with a corresponding hint in the figure caption instead?).
lines 509-512: “Allowing the ConvNets to learn the location-dependent seasonal variation in sea ice conditions, either by including the location and the time of the year as additional input features or by some other mechanism, we can level the playing field between the ice analyst and the ConvNets, improving their predictive performance.“ Focusing deep learning methods on typical local conditions (either for stand-alone retrieval of SIC or of multi-parameter sets) seems to be the way forward also for improving the accuracy of a pan-Arctic product? This could be explicitly mentioned in the discussion if the authors agree; see the sketch below for one possible feature encoding.

References: for the first one (Allen et al. 2023) the journal is missing. Note that I did not check all references.
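As referenced above, a minimal sketch of one common way to encode location and time of year as additional input features (an illustration of the suggestion, not taken from the manuscript): the day of year is encoded cyclically so that 31 December and 1 January end up close in feature space:

```python
import numpy as np

def location_time_features(lat_deg, lon_deg, day_of_year):
    """Hypothetical auxiliary input channels for a ConvNet: cyclic
    day-of-year and location, scaled to comparable ranges."""
    doy_angle = 2.0 * np.pi * day_of_year / 365.25
    return np.stack([
        np.sin(doy_angle),            # seasonal phase, continuous across New Year
        np.cos(doy_angle),
        lat_deg / 90.0,               # latitude scaled to [-1, 1]
        np.sin(np.deg2rad(lon_deg)),  # longitude made cyclic across the date line
        np.cos(np.deg2rad(lon_deg)),
    ])

print(location_time_features(75.0, -40.0, 1))
print(location_time_features(75.0, -40.0, 365))  # nearly identical seasonal phase
```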
Citation: https://doi.org/10.5194/egusphere-2024-178-RC1
- AC2: 'Reply on RC1', Tore Wulf, 24 Jun 2024
RC2: 'Comment on egusphere-2024-178', Anonymous Referee #2, 24 Apr 2024
Review of “Pan-Arctic Sea Ice Concentration from SAR and Passive Microwave” by Wulf et al.
Robust all-weather multi-sensor SIC estimates for the Arctic are an important topic in cryospheric Earth Observation. Here, the authors present a deep learning-based retrieval framework for SIC which combines SAR and PMW observations, trained with regional ice charts from Danish and Canadian Ice Services.
The authors mainly do a good job of explaining the technically complex training and retrieval. The idea of applying the full classification probability vector’s information to estimate SIC is clever and commendable. Still, some gaps in presentation remain. One result figure was missing from the pdf, and some aspects of the retrieval and training could be better justified. Also, the pan-Arctic applicability discussion, while logical and relevant, seemed to focus on resolution aspects and operator-dependent biases. I was missing some critical thought on the broader aspects of deriving SIC across the full width of the Arctic Ocean. Once these issues are remedied, however, I see this paper as a very worthwhile addition to the body of SIC retrieval literature, with clear advances in several aspects and a clean delivery in written form.
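As I understand the point praised here, using the full probability vector amounts to taking the expectation of SIC under the predicted class distribution rather than just the most likely class; a minimal sketch under that assumption (the class discretization and probabilities are hypothetical):

```python
import numpy as np

sic_classes = np.arange(0.0, 101.0, 10.0)   # class centers: 0 %, 10 %, ..., 100 %
probs = np.array([0.02, 0.03, 0.05, 0.10, 0.20,
                  0.25, 0.15, 0.10, 0.05, 0.03, 0.02])  # hypothetical softmax output

argmax_sic = sic_classes[probs.argmax()]    # discrete estimate: 50.0
expected_sic = float(probs @ sic_classes)   # 49.5 -- uses the full vector,
                                            # yielding a continuous-valued SIC
print(argmax_sic, expected_sic)
```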
Comments:
- The regional DMI and CIS coverages appear to overlap near west Greenland. Did the authors assess the similarity of the ice charts as a measure of the analysts' subjective classification uncertainty? Section 4.2 suggests the results do reflect some operator dependence, but did you try to quantify it?
- Why were the validation and test datasets very, very small compared to the training dataset? Often at least 10% of all data are assigned to validation and test groups; here it's ~1%. How much does this influence the results?
- Section 3.2 answers the question of “what” well, but offers little on the question of “why”. Did you test deep learning methods other than the ConvNet chosen? What was the key reason to choose it over the alternatives?
- Figure 9 seemed to be missing entirely from the pdf?
- Fig 10 and similar – the difference plot color range would be better constrained to typical observed ranges rather than the physical maxima of +/- 100%.
- While I agree that the pan-Arctic SIC estimates here appear reasonable, I hold some reservations about the applicability of the near-coastal training data to the full range of ice behaviour across the broad swath of the Arctic Ocean. For example, were there sufficient leads and melt ponds in summer in the training data w.r.t. the innermost AO? Were ridges present in the full height range encountered?
Citation: https://doi.org/10.5194/egusphere-2024-178-RC2
- AC1: 'Reply on RC2', Tore Wulf, 24 Jun 2024