Using Deep Learning and Multi-source Remote Sensing Images to Map Landlocked Lakes in Antarctica
Anyao Jiang
Xin Meng
Yan Huang
Abstract. Open water on Antarctic landlocked lakes (LLOW) plays an important role in the Antarctic ecosystem and serves as a reliable climate indicator. However, because field surveys are currently the main method for studying Antarctic landlocked lakes, the spatial and temporal distribution of landlocked lakes across Antarctica remains understudied. We developed an automated detection workflow for Antarctic LLOW using deep learning and multi-source satellite images. The U-Net model and the LLOW identification model achieved average Kappa values of 0.85 and 0.62 on the testing datasets, respectively, demonstrating strong spatio-temporal robustness across the study areas. We chose four typical ice-free areas along the Antarctic coast as our study areas. After applying our LLOW identification model to a total of 79 Landsat 8-9 images and 390 Sentinel-1 images in these four regions, we generated high-spatiotemporal-resolution LLOW time series from January to April of each year between 2017 and 2021. We analyzed the fluctuations of LLOW area in the four study areas and found that during LLOW expansion, over 90 % of the changes were explained by positive degree days, while during contraction, air temperature changes accounted for more than 50 % of the LLOW area fluctuations. Our model can provide long-term LLOW time series products that help us better understand how lakes change under a changing climate.
Status: open (until 06 Jan 2024)
RC1: 'Comment on egusphere-2023-1810', Anonymous Referee #1, 22 Nov 2023
Summary
The paper describes a semantic segmentation scheme to map landlocked lakes in Antarctica, using Landsat and Sentinel-1 satellite imagery as base data. Landsat images are segmented with a U-net, Sentinel-1 with a manually tuned threshold. The results are merged with a simple late fusion logic.
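For concreteness, a minimal sketch of what such a late-fusion step might look like (the merge rule, thresholds, and variable names here are my assumptions for illustration, not the authors' exact procedure):

```python
import numpy as np

def late_fusion(unet_prob, sar_db, prob_thresh=0.5, sar_thresh=-16.0):
    """Merge a U-Net water-probability map (from Landsat) with a
    thresholded Sentinel-1 backscatter image (in dB). Both inputs are
    co-registered H x W arrays; the threshold values are placeholders."""
    optical_water = unet_prob > prob_thresh   # optical water mask
    sar_water = sar_db < sar_thresh           # low backscatter = smooth open water
    # An AND-style rule like this makes the optical mask a hard
    # constraint, i.e. SAR can never add lake pixels on its own
    # (the concern raised under Weaknesses below).
    return optical_water & sar_water
```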
Novelty/Relevance
There isn’t any technical novelty. The methods are standard and used in a somewhat ad hoc manner, without clear justification for the design.
The specific application appears to be new; I am not aware of any paper that has described the specific case of land-locked Antarctic lakes. That being said, the distinction is perhaps a tad contrived: there has certainly been work on detecting supra-glacial lakes, so the only difference is really to check whether a lake is surrounded by rock or by ice.
Strengths
Since the task has apparently not been studied before, there is potential to systematically map land-locked lakes with the method (it is not done at any scale, though). I am not an expert in Antarctic ecology or climate and cannot judge the relevance of this, but it is a mapping capability that hadn’t been investigated.
The proposed method works moderately well, even if the segmentation performance is not surprising or spectacular for the fairly straightforward task.
Weaknesses
Technical decisions seem somewhat arbitrary and ad-hoc. Not seriously wrong, but the described scheme is just “a way to do it”, not a carefully designed and justified “best way to do it”.
The evaluation is rather weak, using only a few small areas, and even excluding some lakes that are clearly visible within the image tiles. The study does not go beyond the four small proof-of-concept regions, there are no large-scale, wall-to-wall results.
The model validation suggests that almost all the performance is due to Landsat, whereas Sentinel-1 does not offer much except the potential to densify in time - which however is not actually done, since the Landsat segmentation acts as a hard constraint: the algorithm does not appear to allow SAR to add lake pixels.
Finding of a “decreasing trend in LLOW area” for 2017-2021 is rather trivial and expected. It would be more interesting to interpret the measured areas beyond just that obvious trend.
Presentation
Throughout, the text could be made shorter and more concise. E.g.,
- lines 215-220 are unnecessary; everything said there is already implied by the use of a U-net
- 228-240 is a verbose, meandering way to simply say “we manually chose a global threshold by inspecting histograms” (a minimal sketch of that procedure follows this list)
- 250-264 says little more than that the definition of a land-locked lake is a water region surrounded by a rock region.
- etc.
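For concreteness, the histogram-based thresholding that the cited passage describes amounts to something like this sketch (the -16 dB value is a placeholder, not the paper's threshold):

```python
import numpy as np
import matplotlib.pyplot as plt

def threshold_by_inspection(sigma0_db, manual_thresh=-16.0):
    """Plot the Sentinel-1 backscatter histogram so a user can place a
    global threshold between the open-water mode (low dB) and the
    land/ice mode (high dB), then apply it."""
    plt.hist(sigma0_db.ravel(), bins=200)
    plt.axvline(manual_thresh, color="red")
    plt.xlabel("sigma0 (dB)")
    plt.ylabel("pixel count")
    plt.show()
    return sigma0_db < manual_thresh  # boolean water mask
```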
The introduction is verbose and not very focussed, touching on all sorts of studies about land-locked lakes that have no relation or importance for what the paper then does.
The analysis in lines 420-440 is rather hand-wavy, I was not able to see what purpose it actually serves. It gives me the impression that the authors just performed a random analysis that was easily doable, to send a message that the maps could potentially serve some useful purpose.
There are remaining language issues, both in terms of English grammar (random example: “due to non-uniform of field surveys”) and in terms of technical expressions (e.g., “gradient disappearance” instead of “gradient vanishing”).
Technical Questions
The computational procedure is not entirely clear. It is one way of cobbling together a segmentation pipeline, but there is no clear explanation of why that specific design was chosen, even though there are obvious concerns about it, e.g.,
- the potential benefit of Sentinel-1 for the land-cover map is not exploited
- possible correlations between optical and SAR are lost
- the fusion appears to discard the (pseudo-)probabilistic segmentation scores
What is meant by saying 300x300 is the “common” patch size for U-net models? There isn’t a single, canonical patch size for training those models, and at test time they are anyway fully convolutional and not tied to a specific patch size.
I don’t understand the upsampling of the input for the U-net. No information is added by this and the effective receptive field / context window inside the network is actually reduced. So it would seem that one can reach at least equal performance, with lower computational effort, by properly training the U-net to handle the smaller images. Please explain.
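To illustrate the fully convolutional point: once trained, the same network runs on any (suitably divisible) input size, so no upsampling is needed. A toy PyTorch stand-in for the paper's U-Net:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy symmetric encoder-decoder with one skip connection, standing
    in for the paper's U-Net; all layers are convolutional."""
    def __init__(self, in_ch=7, out_ch=2):  # 7 input bands is a placeholder
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.up(self.mid(self.down(e)))
        return self.head(torch.cat([e, m], dim=1))  # skip connection

net = TinyUNet()
for size in (150, 300, 600):  # same weights, three input sizes, no upsampling
    assert net(torch.zeros(1, 7, size, size)).shape[-2:] == (size, size)
```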
Certain augmentations should be ablated and empirically justified. Conceptually it seems problematic to apply transformations like rotating or vertical flipping, as this leads to illumination directions that are implausible in real Landsat images, especially at Antarctic altitudes with low sun elevation.
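Such an ablation is cheap to set up; a sketch of candidate augmentation sets to compare (the transform choices and the training routine name are hypothetical):

```python
import torchvision.transforms as T

# Rotations and vertical flips change the apparent sun-illumination
# direction, which may be implausible for low-sun Antarctic scenes;
# photometric jitter leaves the geometry untouched.
aug_sets = {
    "none":        [],
    "geometric":   [T.RandomHorizontalFlip(), T.RandomVerticalFlip(),
                    T.RandomRotation(90)],
    "photometric": [T.ColorJitter(brightness=0.2, contrast=0.2)],
}

# for name, augs in aug_sets.items():
#     pipeline = T.Compose(augs + [T.ToTensor()])
#     train_and_validate(pipeline)  # hypothetical training/eval routine
```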
Using a threshold for segmentation is of course entirely correct and sensible, if it works. But the justification that single-channel input will lead to “instability” of U-net makes no sense. Countless applications use U-nets with various single-channel inputs (SAR, panchromatic, depth,…).
The fusion step is unclear. First you argue for using SAR, and for combining it with optical data, to obtain better temporal resolution. But then, a consensus over at least 2 Landsat acquisitions is required for a potential LLOW pixel, meaning that the shortest possible resolution of everything that follows is the interval between three Landsat overpasses (if a pixel flips from ice to water between two consecutive images, you need to wait for a third image to confirm, so over the entire period you cannot say whether the pixel remained the same or thawed and froze again).
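A small sketch of the consensus rule as I understand it, showing why a flip between two images cannot be resolved until a third arrives (the rule is my reading of the paper, not a quotation):

```python
def confirmed_states(labels):
    """labels: per-pixel class from consecutive Landsat acquisitions.
    A state counts as confirmed only when two consecutive acquisitions
    agree; None marks an undecided interval."""
    return [curr if curr == prev else None
            for prev, curr in zip(labels, labels[1:])]

print(confirmed_states(["ice", "water", "water", "ice"]))
# [None, 'water', None]: the ice->water flip needs the third image,
# so the effective temporal resolution spans three overpasses.
```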
In line 290, it remains unclear how the authors “disregard” underestimated lakes. To do that one must identify them first - but the algorithm, by definition, does not know where it made an underestimation error.
I do not understand why only a tiny set of 17k pixels were “annotated for U-net”, but 225k pixels were “annotated for LLOW identification”. 17k seems an overly small training set: assuming an average lake size of, say, 600x600 m (i.e., 400 Landsat pixels at 30 m resolution), that would be fewer than 50 lakes. Why would one do that if apparently another 225k annotated pixels are available?
Cohen’s kappa as a segmentation metric is discouraged (cf. [Pontius and Millones, 2011]). I would recommend following best practice and showing confusion matrices, F1 scores, and IoU scores.
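All three recommended quantities are one-liners, e.g. with scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, jaccard_score

y_true = np.array([0, 0, 1, 1, 1, 0])  # toy flattened ground-truth water mask
y_pred = np.array([0, 1, 1, 1, 0, 0])  # toy flattened model prediction

print(confusion_matrix(y_true, y_pred))  # full error structure
print(f1_score(y_true, y_pred))          # harmonic mean of precision/recall
print(jaccard_score(y_true, y_pred))     # IoU of the water class
```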
The PDD metric (equation 4) is defined in a strange manner. According to the definition, the metric is exactly the same for a cold spell of two weeks constantly at 0 °C as for one constantly at -25 °C. Surely that would make a difference for the ice cover of the lakes? Wouldn’t it be more natural to look at the number of consecutive positive degree days, or to integrate the average temperature including negative values?
For the decline (Section 5.2), why use the minimum temperature? To my knowledge, and also more in line with the PDD metric used earlier, a more common indicator is the number of consecutive negative degree days, at least in studies of lake ice in Canada, the Alps, etc.
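To make the two preceding points concrete, a sketch with daily mean temperatures, assuming the standard PDD definition (equation 4 in the paper may differ in detail):

```python
import numpy as np

t = np.array([-2.0, 1.5, 3.0, -0.5, -25.0, 0.0, 2.0])  # daily means (degC)

pdd = np.sum(np.maximum(t, 0.0))  # positive degree days: a day at 0 degC and
                                  # a day at -25 degC contribute identically
cum_temp = np.sum(t)              # alternative: integrate negative values too

# consecutive negative degree days: length of the longest cold spell
longest, run = 0, 0
for x in t:
    run = run + 1 if x < 0 else 0
    longest = max(longest, run)
print(pdd, cum_temp, longest)     # 6.5, -21.0, 2
```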
Minor Comments
Why do ice-covered lakes “magnify the warming trend”? I would think they might rather dampen it?
It is a strange claim that U-net “requires less training datasets and time” than other neural networks. That depends on who you compare to, of course there are designs that are faster than U-net (e.g., those created for mobile or embedded devices). Moreover, “U-net” isn’t a specific architecture but a whole family of networks with certain characteristics - essentially, symmetric hourglass encoder-decoder structure with dense skip connections. So some “U-nets” are a lot slower and more data-hungry than others.
Citation: https://doi.org/10.5194/egusphere-2023-1810-RC1