This work is distributed under the Creative Commons Attribution 4.0 License.
Ice Anatomy: A Benchmark Dataset and Methodology for Automatic Ice Boundary Extraction from Radio-Echo Sounding Data
Abstract. The measurement of ice thickness is of great importance for the accurate estimation of glacier volume and the delineation of the underlying bedrock topography. In particular, it is a crucial factor in forecasting the future evolution of glaciers under a changing climate. To derive the ice thickness, the travel time of electromagnetic waves in radargrams acquired by radio-echo sounding (RES) systems is analyzed. This requires identifying the ice surface and the underlying ice bottom in the corresponding radargrams. Manually identifying these two reflection horizons in RES data is a laborious and time-consuming process. Consequently, scientists are attempting to automate this task through techniques such as deep learning. Such automation can significantly reduce the time between a field campaign and the calculation of a glacier's ice thickness distribution. In this paper, we present the first benchmark dataset for delineating the ice surface and bottom boundaries in RES data, to facilitate straightforward comparisons of deep learning models in the future. The "IceAnatomy" dataset comprises radargrams and the corresponding manual picks, amounting to a total of over 45,000 km of observations. The RES data originates from three sources: FAU, CReSIS, and AWI. The dataset covers different RES systems as well as different pre-processing methods. In addition, the data was acquired over a large range of geographical and glaciological settings, featuring the different thermal regimes present in Antarctica and the Southern Patagonian Icefield. This diversity ensures that the models' behavior can be analyzed in different scenarios. We define a standardized train-test split for each source in the dataset. This allows us to introduce not only a baseline model trained on the entire training set (the "omni" model), but also three source-specific baseline models.
The source-specific models are trained exclusively on the subset of the training data acquired by the specified source. The baseline models provide an initial benchmark against which subsequent models can be compared. The source-specific models demonstrate more accurate results than the omni model. For the FAU, CReSIS, and AWI test sets, the source-specific models achieve Mean Meter Errors of 2.1 m, 23.1 m, and 4.9 m for the ice surface and 9.1 m, 78.2 m, and 29.3 m for the ice bottom. In relation to the mean measured ice thickness of the test set, these errors equate to 1.2 %, 3.1 %, and 0.3 % for the ice surface and 4.9 %, 10.4 %, and 1.5 % for the ice bottom. The dataset and implementation are available at https://zenodo.org/records/14036897 (Dreier et al., 2024) and https://doi.org/10.5281/zenodo.14038570 (Dreier, 2024).
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3597', Anonymous Referee #1, 04 Mar 2025
The manuscript presents a significant advancement in the study of glacial structures through the use of RES data, introducing a standardized benchmark dataset that will greatly benefit the community. By providing over 45000 km of annotated radargrams, this work sets a foundation for future studies aiming to automate and improve ice boundary delineations using deep learning models. The potential of this dataset to facilitate robust comparisons of model performance across various settings is particularly valuable, given the diverse geographic conditions represented.
Despite the merits, there are specific areas that require attention before this manuscript can be recommended for publication:
1. The choice to standardize all radargrams to a height of 1024 pixels requires further justification, especially given the reduction in resolution this causes, which could potentially affect the precision of the derived ice boundaries. The manuscript should provide a more detailed rationale (possibly linked to computational efficiency) for this choice, considering the capability of U-Net-like architectures to process varying input shapes.
2. The decision to not use an exclusive flight for the AWI testing subset due to significant variability among the radargrams is questionable. The inherent variability could, in fact, provide a rigorous real-world test scenario, which is crucial for assessing the robustness and adaptability of the model to new and varied environments (which should be the eventual goal of any benchmark dataset and the models developed based on them). A reevaluation of the testing subset choice is recommended to potentially enhance the findings.
3. The omni model shows reduced performance in the FAU and AWI domains, which the authors attribute to domain shifts. Consideration of alternative approaches such as weighting samples by domain frequency or uniformly sampling training examples across domains could potentially mitigate this issue. An exploration of these methods would be valuable for enhancing model generalization.
4. The proposed U-Net uses two heads to separately predict the ice surface and bottom. Why is it better than a straightforward approach with one head simultaneously doing both? Softmax can be applied later in the column-wise manner to extract the boundaries as well, so it should not be a limitation.
5. The authors write in Section 5.1: "Depending on the chosen method, the metrics used to assess the quality of the predictions differ," which is not really true, as zone predictions are easily convertible to boundaries and vice versa, so there is no problem to providing the whole set of metrics.
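The referee's point in comments 4 and 5 can be made concrete with a minimal sketch (hypothetical array layout, not the authors' code): given a per-pixel zone segmentation of a radargram (0 = air, 1 = ice, 2 = bedrock), the surface and bottom boundaries follow from a column-wise scan for the first ice and first bedrock row, and boundaries can be rasterized back into zones, so either representation supports the full set of metrics.

```python
import numpy as np

def zones_to_boundaries(zones):
    """Per-column surface/bottom row indices from a (rows, cols)
    zone mask with 0 = air, 1 = ice, 2 = bedrock."""
    surface = (zones == 1).argmax(axis=0)  # first ice row per column
    bottom = (zones == 2).argmax(axis=0)   # first bedrock row per column
    return surface, bottom

def boundaries_to_zones(surface, bottom, rows):
    """Rasterize per-column boundary indices back into a zone mask."""
    r = np.arange(rows)[:, None]           # row indices, broadcast over columns
    zones = np.zeros((rows, len(surface)), dtype=int)
    zones[(r >= surface) & (r < bottom)] = 1
    zones[r >= bottom] = 2
    return zones
```

Round-tripping (`boundaries_to_zones` followed by `zones_to_boundaries`) recovers the original picks, assuming every column contains all three zones.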
6. The manuscript claims that confusion matrix-based metrics would perform poorly if predictions are, e.g., consistently off by a pixel. However, this statement is misleading as these metrics are typically used for zone predictions, not boundary delineations. A correction or further explanation is needed to resolve this confusion.
7. In Appendix A, it is stated that the authors used dropout layers inside the ResBlocks. Was it a regular dropout? If not, it should be specified. If yes, I would suggest also trying something like spatial dropout, as many practitioners found it more helpful in convolutional networks.
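The distinction the referee draws in comment 7 can be illustrated with a small NumPy sketch (an assumption-laden illustration, not the authors' implementation): regular dropout zeroes activations independently, while spatial dropout makes one keep/drop decision per feature map.

```python
import numpy as np

def regular_dropout(x, p, rng):
    """Standard dropout: each activation is zeroed independently."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)            # inverted-dropout scaling

def spatial_dropout(x, p, rng):
    """Spatial dropout: entire feature maps are zeroed together.
    x has shape (channels, height, width); one keep/drop decision is
    made per channel, which suits CNNs where neighbouring pixels
    within a channel are strongly correlated."""
    keep = rng.random(x.shape[0]) >= p
    return x * keep[:, None, None] / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((8, 4, 4))
y = spatial_dropout(x, p=0.5, rng=rng)
# Every channel of y is now either all zeros or all 2.0 (= 1 / (1 - p)).
```

In PyTorch these two variants correspond to `nn.Dropout` and `nn.Dropout2d`, respectively.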
8. Figures 6, 7, and similar graphics are challenging to interpret. I would suggest just plotting four curves: two ground truths (surface and bottom) and two predictions on top (e.g., dashed).
Overall, the paper is nicely written. The authors have also shared the dataset and software publicly, which significantly enhances the reproducibility of the study and trust in the results presented.
Citation: https://doi.org/10.5194/egusphere-2024-3597-RC1
RC2: 'Comment on egusphere-2024-3597', Anonymous Referee #2, 02 Apr 2025
Manuscript egusphere-2024-3597, "Ice Anatomy: A Benchmark Dataset and Methodology for Automatic Ice Boundary Extraction from Radio-Echo Sounding Data"
The manuscript presents the "IceAnatomy" dataset, a benchmark dataset for automatic ice boundary extraction from radio-echo sounding (RES) data. It also introduces baseline models trained on this dataset, providing an initial framework for comparing deep learning-based ice boundary delineation methods. This work is relevant to the glaciology and remote sensing communities, particularly those developing machine learning techniques for RES data analysis. The paper is well-structured, and the dataset has potential value for the field. However, several areas require further clarification and improvement before acceptance.
General Assessment
- The manuscript uses varying terminology, such as “the air-ice layer and ice-ground layer” and “ice bottom and ice surface layer.” Maintaining consistency in terminology throughout the text would improve clarity and readability.
- The manuscript claims to present the first benchmark dataset for ice boundary extraction, yet related datasets such as the CReSIS data have been widely used. The authors should explicitly contrast IceAnatomy with existing datasets and justify why this dataset is uniquely valuable beyond just being a "benchmark."
- The radargram visualizations are useful but could benefit from additional annotations. Additionally, the color scheme makes it difficult to distinguish certain features, and the way different annotations are represented could be improved for better clarity.
- The manuscript is highly technical and may be challenging for a glaciological audience. Since The Cryosphere primarily targets glaciologists, the extensive use of computer science jargon and technical terminology either requires more thorough explanations or suggests that a different journal may be a better fit. Ensuring the content is more accessible to the journal’s primary readership should be a key consideration.
- The scientific motivation of the study could be further elaborated. This is one of the aspects that might suggest the paper, in its current form, would be better suited for a more technical journal.
- The rationale for the baseline model choices (e.g., why U-Net with specific modifications) should be better justified. Why not test other architectures such as Transformers or hybrid CNN-RNN models?
- The manuscript states that the dataset consists of manually labeled ice boundaries but does not provide sufficient details on the annotation process. What steps were taken to ensure label accuracy? Were multiple annotators involved? How was inter-annotator variability handled?
- The inclusion of noisy annotations from CReSIS data is acknowledged, but how does this affect training and evaluation? Have any data cleaning techniques been applied?
- The dataset includes different radar systems and processing methods, which may introduce domain shifts. Are these shifts quantified? How do they impact model performance?
- The AP-5% metric relaxes the error bounds, but why were 1% and 5% chosen? Would alternative thresholds (e.g., 2% or 10%) provide additional insights?
- The "ice boundary collapse" issue observed in predictions is significant. Could this be mitigated with additional constraints in the loss function or post-processing techniques?
- The paper does not discuss the impact of hyperparameters in training. How sensitive is the model to learning rate, regularization, and architecture modifications?
- Some terms, such as "depth resolution," "relative error," and "wave velocity assumptions," need clearer definitions in the main text rather than just appearing in equations.
- Some parts of the manuscript have an informal tone.
- The manuscript overstates the novelty and impact of its contributions. It describes the framework as the "first step" toward automated ice thickness mapping, despite acknowledging decades of prior research. Similarly, the claim that this work has "invited other scientists to start working" in this area overlooks longstanding studies. These statements should be revised to more accurately reflect the field’s history.
Line by line assessment:
Line 36: The phrase “radargram of the glacier” sounds somewhat awkward. Additionally, the manuscript does not always adhere consistently to standard glaciological terminology.
Line 61: The term "ice boundary layers" is used in the manuscript, but there are more precise and commonly accepted ways to refer to these features.
Line 64: The statement, “however, a large portion of …,” should be supported with evidence. Importantly, the critique of automatically labeled bedrock seems contradictory, as the study itself aims to achieve this. Clarifying this point would strengthen the argument.
Line 98: Jebeli et al. (2023) pursued a very similar aim in their study.
Line 88: Moqadam et al. 2024 (DOI: 10.22541/essoar.172987463.39597493/v1) have also addressed the tracking of internal layers.
Line 98 – 105 : The manuscript would benefit from citing additional relevant work to provide a more complete context for readers. For instance, Moqadam and Eisen (https://doi.org/10.5194/egusphere-2024-1674) offers a broad review of prior research on ice boundary extraction, making it a fitting reference at the end of the literature review.
Line 102: Where the use of CNNs for automatic tracing of internal layers is mentioned, Jebeli et al. 2023 (DOI: 10.13140/RG.2.2.23219.20007) and Moqadam et al. 2024 (DOI: 10.22541/essoar.172987463.39597493/v1) directly address the application of deep learning to this task and would be valuable citations in the section discussing recent advancements in this area.
Including these references, along with other relevant studies, would help situate the manuscript within the broader body of existing research and provide readers with a more comprehensive view of the field.
Line 124: It is not clear what the authors want to say.
Line 134 – 137: More references are needed to support the claims.
Figure 1: black lines for the flight paths are not so easy to distinguish.
Line 141: The "Hence, …" argument for the clearer signal of thinner ice is neither clear nor accurate. The aim of the sentence is evident, but it should be reformulated.
Line 148: the sentence seems to be incomplete.
Line 181: this process needs to be elaborated.
Line 200: the sentence is confusing.
Line 226: “the” should be removed.
Lines 240–245, 264–271, and other similar sections contain highly technical explanations. These should either be clarified and simplified for better accessibility to the journal’s audience or, if the technical depth is essential, the choice of journal may need to be reconsidered.
Line 251: hyphen needed between differently and sized.
Line 311: the authors mention that resizing changes the MAE so they introduce MME. It is not clear why they keep the MAE in the paper, if MME is a more suitable metric.
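The MAE/MME distinction raised here can be sketched with hypothetical numbers (not the authors' metric code; the depth-resolution values are made up): a mean absolute error measured in pixels depends on how the radargram was vertically resampled, whereas converting each pixel error to metres via the depth resolution yields a resize-invariant Mean Meter Error.

```python
import numpy as np

def mae_pixels(pred_rows, true_rows):
    """Mean absolute error in pixels: depends on the vertical sampling."""
    return np.abs(pred_rows - true_rows).mean()

def mme_meters(pred_rows, true_rows, meters_per_pixel):
    """Mean Meter Error: pixel errors converted to metres first."""
    return (np.abs(pred_rows - true_rows) * meters_per_pixel).mean()

true_m = np.array([100.0, 120.0, 140.0])  # boundary depths in metres
err_m = np.array([2.0, -4.0, 6.0])        # physical prediction errors

for mpp in (0.5, 2.0):                    # two vertical resamplings
    true_rows = true_m / mpp
    pred_rows = (true_m + err_m) / mpp
    # mae_pixels changes with mpp (8 px vs. 2 px),
    # while mme_meters stays at 4 m in both cases.
```

This makes the referee's question concrete: once MME is available, the pixel-based MAE adds no sampling-independent information.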
Line 315: The phrase "pass through a pixel" is unclear. At times, the radargrams are treated as an image, and at other times as a matrix. However, it is important to note that a wave does not pass through a pixel.
Line 324: The argument presented is not compelling.
Line 381: The sentence needs to be rewritten for clarity.
Lines 392-399: These sentences need to be rewritten for clarity and flow.
Line 401: Please provide a more detailed explanation of the ablation study.
Line 414: The explanation of temperate ice could appear much earlier in the manuscript.
Line 418: The sentence should be rewritten.
Line 421: It is obvious that the differences decrease when AP-5% is considered, and there is nothing surprising about this result. Please rewrite this statement or clarify the reasoning behind the argument.
Line 430: the sentence does not read well.
Line 437: Please provide further explanation. What exactly do you mean, and why is this the case?
Line 441: The authors mention that thicker ice is more challenging, but shouldn't it actually be easier, as less collapse would occur in thicker ice compared to thinner ice?
Line 448: The statement "We believe that our framework is the first step towards a potential fully automated generation of ice thickness maps based on RES data" could be reworded for accuracy. As noted in the literature review, research in this area has been ongoing for nearly two decades. While this work is a valuable contribution, positioning it as the first step towards automation may not fully acknowledge prior advancements in the field.
Line 462: The statement suggesting that this work has "invited other scientists to start working in this research area" may overstate its impact. Given the examples of previous studies provided by the authors, it would be more accurate to acknowledge the long-standing research efforts in this field while highlighting how this study builds upon them.
Line 464: “ice depth” is unclear.
Final Recommendations
Based on the overall assessment and line-by-line feedback, I recommend major revisions for this manuscript. The authors should address several critical issues to improve the clarity, accuracy, and scientific rigor of the work. Below are the key areas that require attention:
- Terminology and Consistency
- Novelty and Claims
- Dataset and Baseline Model Justification
- Technical Detail and Accessibility
- Clarification of Evaluation Metrics and Model Performance
- Model Training and Hyperparameter Impact
- Annotation and Data Quality
Conclusion
In conclusion, the manuscript offers a valuable contribution to the field, but a few areas require further attention to improve clarity, refine the scientific claims, and provide a more thorough justification for the methodology. To enhance the manuscript, the authors should consider revising the sections where misleading claims are made, providing additional explanations for key points, and ensuring that their work is appropriately situated within the broader context of existing research. These revisions will help ensure the manuscript is well-positioned for publication.
Citation: https://doi.org/10.5194/egusphere-2024-3597-RC2
Data sets
Ice Anatomy: A Benchmark Dataset and Methodology for Automatic Ice Boundary Extraction from Radio-Echo Sounding Data Marcel Dreier et al. https://zenodo.org/records/14036897
Model code and software
Implementation of Ice Anatomy: A Benchmark Dataset and Methodology for Automatic Ice Boundary Extraction from Radio-Echo Sounding Data Marcel Dreier https://doi.org/10.5281/zenodo.14038570