Developing a deep learning forecasting system for short-term and high-resolution prediction of sea ice concentration
Abstract. There has been a steady increase in marine activity throughout the Arctic Ocean during the last decades, and maritime end users are requesting skillful high-resolution sea ice forecasts to ensure operational safety. Several studies have demonstrated the effectiveness of utilizing computationally lightweight deep learning models to predict sea ice properties in the Arctic. In this study, we utilize operational atmospheric forecasts, ice charts, and passive microwave sea ice concentration observations as predictors to train a deep learning model with ice charts as the ground truth. The developed deep learning forecasting system can predict regional sea ice concentration at one kilometer resolution for 1 to 3-day lead times. We validate the deep learning system by evaluating the position of forecasted sea ice concentration contours at different concentration thresholds. The deep learning forecasting system achieves a lower error for several sea ice concentration contours when compared against baseline forecasts (persistence and a linear trend), as well as two state-of-the-art dynamical sea ice forecasting systems (neXtSIM and Barents-2.5), for all considered lead times and seasons.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2023-3107', Anonymous Referee #1, 03 Apr 2024
The manuscript addresses the critical need for accurate sea ice forecasting in the Arctic, driven by the increasing maritime activity due to sea ice retreat. A deep learning approach is developed that leverages operational atmospheric forecasts, ice charts, and satellite data to enhance short-term sea ice concentration forecasts within a 1 to 3 day timeframe, aiming for a detailed 1 km resolution. The model, validated on the position of sea ice concentration contours at various thresholds, outperforms both baseline forecasts and two state-of-the-art dynamical sea ice forecasting systems across all considered lead times and seasons.
Nonetheless, the paper could stand to delve deeper into the model's limitations. Addressing potential biases from the training data and the effects of missing or inaccurate data could enrich the study. Suggestions for improvement are listed below.
- Place Table 1 within the 'Data' section for better context.
- On page 6, line 140, provide clarification regarding the significance of the 'timeliness of 2.5 hours' for the AROME Arctic model, a detail omitted in Section 2.2.
- Using operational atmospheric forecasts, ice charts, and Sea Ice Concentration (SIC) from passive microwave observations as predictors is innovative. However, the paper should consolidate the discussion of potential biases in these data sources and their impact on model performance, making the article more logical and complete.
- On page 9, line 205, explain the rationale behind the selection of a specific number of epochs for model training.
- The impact of hyperparameter tuning on model performance should be discussed. Were any automated hyperparameter optimization techniques like grid search or Bayesian optimization used?
- In section 4.2, the comparison with dynamical models should include a discussion on the computational efficiency of the deep learning model. This is particularly important for operational forecasting, where timely predictions are crucial.
- It would be beneficial to conduct a more detailed analysis of the model's performance across various sea ice concentration ranges in Section 4.2.
- Certain figures, especially those illustrating the model's performance compared to baseline and dynamical models, could be enhanced for clarity and aesthetics. For example, Figure 9 may require modifications to improve clarity.
Citation: https://doi.org/10.5194/egusphere-2023-3107-RC1
- AC1: 'Reply on RC1', Are Frode Kvanum, 03 Jun 2024
RC2: 'Comment on egusphere-2023-3107', Anonymous Referee #2, 25 Apr 2024
The authors propose a novel operational-like short-term sea-ice forecasting system based on deep learning. Based on past sea-ice charts, satellite images, and weather forecast data, neural networks are trained to predict sea-ice charts one to three days in advance. To train the neural networks and tackle the issue of imbalance between the sea-ice concentration categories, the authors introduce a new formulation for the categorical prediction. They show that their proposed deep learning system can outperform baseline methods as well as prediction systems based on geophysical sea-ice models.
Generally, the approach is sound and the manuscript follows a logical order. However, the readability of the manuscript can be improved; please see also my minor comments. Additionally, I have a few general comments that should be addressed before I can recommend acceptance of the manuscript:
- The proposed method performs better than the second-best method, a Eulerian persistence forecast. Additionally, the two feature importance metrics employed indicate that the initial sea-ice chart is the most important predictor. From my experience, the shown difference between the deep learning method and persistence could be explained by advection of the sea ice. So, one could wonder how an advection-based (Lagrangian persistence) model would perform in these settings. Based on the wind velocities given by the AROME forecasts, the free-drift equations can be applied to obtain sea-ice velocities, which can then be used to advect the sea-ice concentration. My feeling is that this might work similarly well to the deep learning method.
Even if such a free-drift model would perform similarly to deep learning, this would not imply a shortcoming of deep learning: it would suggest that deep learning can learn such advective behavior without ever seeing any physical relationship. Additionally, deep learning has the potential to exceed this performance with further technological advancements, while the potential for improvement in a free-drift model might be very incremental.
An implementation of this might be outside the scope of the article. Nevertheless, I would like to see a discussion of this point in the manuscript and further reasoning as to why deep learning outperforms persistence (a minimal free-drift sketch is given below for illustration).
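For illustration, a minimal sketch of such a free-drift (Lagrangian persistence) baseline is given below. It assumes 10 m AROME wind components on the 1 km ice-chart grid and uses the common rule of thumb that ice drifts at roughly 2 % of the wind speed, turned about 25 degrees to the right of the wind; none of these choices are taken from the manuscript itself.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def free_drift_advect(sic, u10, v10, dt=86400.0, dx=1000.0,
                      wind_factor=0.02, turning_deg=25.0):
    """Advect a SIC field one step with free-drift ice velocities.

    sic      : 2D array, initial sea-ice concentration (e.g. the latest ice chart)
    u10, v10 : 2D arrays, 10 m wind components (assumed from AROME-Arctic)
    dt, dx   : time step [s] and grid spacing [m] (1 km grid assumed)
    """
    # free-drift rule of thumb: ~2 % of the wind speed, turned ~25 deg to the right
    theta = np.deg2rad(-turning_deg)
    u_ice = wind_factor * (u10 * np.cos(theta) - v10 * np.sin(theta))
    v_ice = wind_factor * (u10 * np.sin(theta) + v10 * np.cos(theta))

    ny, nx = sic.shape
    jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # backward semi-Lagrangian step: sample SIC at the upstream departure points
    src_j = jj - v_ice * dt / dx
    src_i = ii - u_ice * dt / dx
    return map_coordinates(sic, [src_j, src_i], order=1, mode="nearest")
```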
- The comparison to numerical systems is nice and shows the potential of deep learning compared to those systems based on geophysical equations. However, the comparison does not seem entirely fair: deep learning starts from perfect initial conditions, while the forecasting systems start from an analysis. In Fig. 6, it can be seen that neXtSIM-F has a very large initialization error and suffers very much from double-penalty effects. Hence, I would like to see an experiment where the deep learning system is initialized with the sea-ice concentration as seen in neXtSIM-F. This way, both forecasts would have the same initialization error, leading to a fairer comparison. In addition, the comparison to the "perfect" initial conditions case could reveal interesting discussion points, e.g., on the stability of the deep learning system or the impact of worse initial conditions in the neXtSIM system, possibly signifying the importance of an improved analysis product.
- The writing in the methods part is at times ambiguous and the reader can easily lose the thread:
Dataset pre-processing and selection:
- The general description of the predictors and their times needs to be read several times and is still partially unclear.
- l. 133ff: “Lead time ... should not exceed the publication time of the target ice chart (15:00 UTC).” Is this not in contradiction with the use of AROME initialized at 18:00 UTC?
- l. 143ff: Why is the temporal development of the atmosphere between 15:00 and 18:00 UTC missed if AROME is initialized at 18:00 UTC?
- Fig. 2: Why not imitate how the sea-ice chart is produced by averaging 00:00 UTC to 15:00 UTC for the numerical systems?
- l. 150f: I get the argument that it needs less memory during the prediction, but to estimate mean fields the data nevertheless has to be loaded.
- l. 154f: It does not matter whether the NN takes temporal structures into account or not. You could provide different timesteps as independent channels to the NN, and the NN could extract the needed quantities itself. A stronger argument would be that you perform feature engineering by using already aggregated statistics.
- Land-covered grid points (page 6, l. 125): It is unclear whether these are used within the loss function. Do you use zero padding for the U-Net? If yes, why not for the land grid points? (A masked-loss sketch is given below for illustration.)
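For illustration, a hedged sketch of how land grid points could be excluded from the loss is given below. It assumes a PyTorch-style training loop with one binary cross-entropy term per contour channel, which is an assumption and not confirmed by the manuscript.

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(pred_logits, target, ocean_mask):
    """Per-contour BCE averaged over ocean grid points only.

    pred_logits, target : (B, C, H, W) tensors, one channel per SIC contour
    ocean_mask          : (B, 1, H, W) tensor, 1 over ocean and 0 over land
    """
    per_pixel = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    per_pixel = per_pixel * ocean_mask                    # land points contribute nothing
    n_terms = ocean_mask.sum() * pred_logits.shape[1]     # ocean pixels times contour channels
    return per_pixel.sum() / (n_terms + 1e-8)
```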
- It seems like the treatment of the sea-ice concentration by the use of cumulative contours is novel. It deserves its own subsection, which would improve the readability. Nevertheless, it remains a bit ambiguous:
- How do you estimate the forecasted sea-ice concentration? L. 185 presents the forecasted SIC as a sum over all contours. If the contours are between 0 and 1, the sum can exceed 1. Do you mean the mean instead? If yes, how do you ensure that the neural network output is consistent, i.e., what do you do if the 50 % threshold is predicted and the 30 % threshold is predicted, but the 40 % threshold has a very low probability? This can happen because the contours are predicted independently (see the decoding sketch below).
- An important citation about different loss functions for the sea-ice concentration is missing (Kucik and Stokholm, 2022). How does the present study fit in with their results?
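To make the consistency concern above concrete, here is a small decoding sketch under one possible reading of the cumulative-contour formulation: each output channel is interpreted as the exceedance probability of a SIC threshold, a monotone profile is enforced, and the exceeded thresholds are counted. The threshold values and the decoding rule are assumptions for illustration, not the authors' method.

```python
import numpy as np

# hypothetical cumulative SIC thresholds, one per output channel
CONTOURS = np.array([0.10, 0.40, 0.70, 0.90, 1.00])

def decode_cumulative(probs):
    """Decode per-contour exceedance probabilities (K, H, W) into a SIC field (H, W)."""
    # enforce monotonicity: P(SIC >= c_k) can never exceed P(SIC >= c_{k-1})
    probs = np.minimum.accumulate(probs, axis=0)
    exceeded = (probs >= 0.5).sum(axis=0)        # number of thresholds exceeded per pixel
    # map the highest exceeded threshold to a concentration value (0 where none exceeded)
    return np.where(exceeded > 0, CONTOURS[exceeded - 1], 0.0)
```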
Model implementation:
- Although many details are given, some remain unknown, e.g., how have you tested different architectures? Have you used the validation dataset for that? For which lead time? If multiple lead times were used, what happens if you had different results for different lead times?
- l. 191f: How do you go from 64 to 256: by 64->128->256?
- l. 192: The network has at its bottleneck a width of 256 feature maps. The depth of the U-Net is 2, because of three stages. The depth of the whole network is the number of layers.
- l. 198: The explanation of the shared network can be improved. Have I understood correctly that the network extracts common features and then the last (output) layer combines the features into the prediction of the contours?
- l. 200: Should be corrected to “The loss function is computed individually for all contours”. The phrase “all layers” can be misleading, as it could also mean that the loss function is estimated for each layer within the NN.
- l. 203: How much memory does the A100 GPU have? There are two versions, with 40 GB or with 80 GB.
- l. 206: Learning rate and weight decay are two separate things. The learning rate determines how much of the gradient is added to the weights, and weight decay specifies the amount of regularization each weight experiences. Consequently, the learning rate cannot be weight decayed but only decayed (see the optimizer sketch after these comments).
- Since you mention weight decay, was any regularization used for the training of the neural network? If yes, please specify.
- The section could profit from being split into several paragraphs, e.g., at lines 198 and 202.
- Figure 3: I guess that the target ice chart is only used during training to estimate the loss function and not as input for the U-Net. The figure can be misleading, as the predictors and the target ice chart both appear to be inputs to the U-Net. Does the U-Net refine the target ice chart?
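As a side note to the learning rate versus weight decay comment above, the short PyTorch-style sketch below illustrates the distinction: the scheduler decays the learning rate over epochs, while the weight decay coefficient passed to AdamW stays fixed. This is an assumed setup for illustration, not the authors' training code.

```python
import torch

model = torch.nn.Conv2d(3, 8, kernel_size=3)               # placeholder standing in for the U-Net
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=1e-3,                      # step size, decayed by the scheduler
                              weight_decay=1e-2)            # regularization strength, kept fixed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

for epoch in range(20):
    optimizer.zero_grad()
    loss = model(torch.randn(1, 3, 32, 32)).mean()          # dummy forward pass for illustration
    loss.backward()
    optimizer.step()      # applies gradients (scaled by lr) plus the fixed weight decay
    scheduler.step()      # decays the learning rate, not the weight decay
```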
Minor comments:
- Abstract: It is a bit unclear what the target of this study is: sea-ice charts or continuous sea-ice concentration? The abstract could profit from slight restructuring, e.g., to highlight what makes this study important.
- Abstract, l. 5: with (future) ice charts as ground truth.
- Introduction: Many citations about past deep-learning-based approaches are given; however, be careful about what was forecasted. Some of the studies predict sea-ice concentration categories, others the sea-ice extent, while some also predict the sea-ice concentration directly. The introduction could profit from focusing on the most important approaches, e.g., how comparable are they to the study presented here? My suggestion would be to either make it clearer what was forecasted in those studies or to concentrate only on studies that are most similar to the task of predicting sea-ice charts.
- l. 73: What does high spatial resolution mean?
- l. 75: How are the ice charts gridded? Also, how are the other products interpolated to the target resolution?
- l. 100: AROME covers most of the ice chart domain; what happens for grid points without AROME coverage?
- Page 5: Has neXtSIM-F been run with AROME as atmospheric forcing? If not, what is the forcing for neXtSIM-F? This information is missing.
- l. 158f: The years are wrong compared to the data period (2019-2022) and Table 2; wouldn't it rather be: “We further split the data such that 2019 and 2020 are used for training, 2021 for validation, and 2022 as the test dataset.”?
- l. 240f: Why not average the physical models in the same way the ice chart is created, by taking 00:00 UTC up to 15:00 UTC into account?
- l. 248: Width and not depth of the neural network.
- l. 250ff: Please specify that this information is not shown in the paper. Why not put a proper ablation study in the Appendix of the paper?
- Figure 7: Please use different colors than in Fig. 6, as it might otherwise be a bit misleading.
- Figure 7: Check whether this is really because of the smoothing: you could, for example, show the impact of smoothing the true data on the ice edge length (see the sketch below).
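A small illustration of the suggested smoothing check is sketched below: the length of a SIC contour is computed on a field before and after Gaussian smoothing. The synthetic field, the 15 % level, and the 1 km grid spacing are assumptions for illustration only and are not taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import measure

def ice_edge_length(sic, level=0.15, dx=1000.0):
    """Total length [m] of the SIC `level` contour on a grid with spacing dx [m]."""
    total = 0.0
    for contour in measure.find_contours(sic, level):
        steps = np.diff(contour, axis=0)            # (row, col) increments along the contour
        total += np.hypot(steps[:, 0], steps[:, 1]).sum() * dx
    return total

# synthetic stand-in field for an ice chart; compare raw vs. smoothed edge length
rng = np.random.default_rng(0)
sic = rng.random((200, 200))
print(ice_edge_length(sic), ice_edge_length(gaussian_filter(sic, sigma=3)))
```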
Citations:
Kucik, A. and Stokholm, A., 2022: AI4SeaIce: selecting loss functions for automated SAR sea ice concentration charting, Scientific Reports, https://doi.org/10.1038/s41598-023-32467-x
Citation: https://doi.org/10.5194/egusphere-2023-3107-RC2
- AC2: 'Reply on RC2', Are Frode Kvanum, 03 Jun 2024
Model code and software
Project repository: Are Frode Kvanum, https://github.com/AreFrode/Developing_ice_chart_deep_learning_predictions
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 426 | 137 | 36 | 599 | 25 | 19 |
Cited
2 citations as recorded by crossref.
- The MET Norway Ice Service: a comprehensive review of the historical and future evolution, ice chart creation, and end user interaction within METAREA XIX, W. Copeland et al., https://doi.org/10.3389/fmars.2024.1400479
- Improving short-term sea ice concentration forecasts using deep learning, C. Palerme et al., https://doi.org/10.5194/tc-18-2161-2024