the Creative Commons Attribution 4.0 License.
Data-driven surrogate modeling of high-resolution sea-ice thickness in the Arctic
Abstract. A novel generation of sea-ice models with Elasto-Brittle rheologies, such as neXtSIM, can represent sea-ice processes with unprecedented accuracy at the mesoscale, for resolutions of around 10 km. As these models are computationally expensive, we introduce supervised deep learning techniques for surrogate modeling of the sea-ice thickness from neXtSIM simulations. We adapt a convolutional UNet architecture to an Arctic-wide setup by taking the land-sea mask into account with partial convolutions. Trained to emulate the sea-ice thickness at a lead time of 12 hours, the neural network can be iteratively applied to produce predictions of up to a year. The improvements of the surrogate model over a persistence forecast persist from 12 hours to roughly a year, with reductions of up to 50 % in the forecast error. Measured against a daily climatology, the predictability of the sea-ice thickness additionally extends to around 8 months. By using atmospheric forcings as additional input, the surrogate model can represent advective and thermodynamical processes, which influence the sea-ice thickness and its growth and melting. While iterating, the surrogate model exhibits diffusive behavior, which results in a loss of fine-scale structures. However, this smoothing increases the coherence of large-scale features and thereby the stability of the model. Based on these results, we therefore see a huge potential for surrogate modeling of state-of-the-art sea-ice models with neural networks.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1384', Nils Hutter, 27 Oct 2023
Review of “Data-driven surrogate modeling of high-resolution sea-ice thickness in the Arctic”
by Durand et al.
Reviewer: Nils Hutter
In this manuscript, the authors present a machine-learning based surrogate model of the numerical sea-ice model neXtSIM. The presented surrogate model simulates sea-ice thickness and its predictions outperform the climatology benchmark for lead times up to 8 months. The findings of this study are a valuable addition to the field and illustrate how machine learning can be used to reduce the computational costs of sea-ice simulations, e.g. in the context of ensemble forecasting. The main shortcoming of the surrogate model presented, however, is that it simulates very smooth thickness fields compared to the feature-rich neXtSIM input data used for training that includes for example leads. The authors address and analyze this issue in the manuscript, but I still have the major comments outlined below regarding the presentation, analysis, and interpretation of this point that should be addressed before I can recommend this paper for publication.
Major comments:
Smoothness of simulated ice thickness fields:
The authors use simulations of neXtSIM, known for its ability to resolve deformation features and heterogeneous sea ice fields, to train the NN emulator. The surrogate model presented in this manuscript is not able to retain these features over multiple iterations and quickly smoothes the sea ice fields, resulting in thickness fields much more similar to coarse-resolution sea-ice simulations. The authors argue that the smoothed version better minimizes the RMSE (MSE used for training) compared to a model that retains these features and potentially gets penalized for misplacing them. Therefore, the model learns to predict smooth fields and mimics large-scale circulation. While this makes sense in light of the cost function used, I have the following issues with how this fact is presented and interpreted:
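[Editorial illustration, not from the manuscript: the double-penalty effect invoked here can be made concrete with a minimal numerical sketch on invented 1-D fields. A sharp feature misplaced by a few pixels is penalized twice by the MSE (once as a miss, once as a false alarm), so a forecast that smooths the feature away scores better.]

```python
import numpy as np

N = 100
truth = np.zeros(N)
truth[50:52] = -1.0          # a sharp "lead": local thickness anomaly of -1 m

# Forecast A: the same sharp feature, misplaced by 3 pixels
fc_sharp = np.zeros(N)
fc_sharp[53:55] = -1.0

# Forecast B: a smoothed version of the truth (boxcar of width 10), feature blurred away
fc_smooth = np.convolve(truth, np.ones(10) / 10, mode="same")

mse = lambda f: np.mean((f - truth) ** 2)
# The misplaced sharp feature is penalized twice (miss + false alarm),
# so the featureless smooth forecast attains a lower MSE.
assert mse(fc_smooth) < mse(fc_sharp)
```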
- Why do you use high resolution in the first place? In the abstract and introduction, the authors make the valid point that current models that simulate small-scale features, like e.g. leads, are computationally very expensive and that a surrogate model would be of great benefit here. However, the presented surrogate model is not able to simulate these fine-scale features, but “just” the large-scale dynamics. Now, I am wondering why it is necessary to use the feature-rich simulations for training. In the introduction the authors suggest that “small-scale effects have an advantage on representing the thermodynamics of sea ice”, but I am not aware of modeling studies that have proven this point comprehensively. Now I am wondering if the same results could also be achieved with a coarse-resolution model that also resolves the large-scale circulation (and is much cheaper to run). After additionally training the NN on coarse-resolution model output, the authors should comment on whether there is an additional benefit in using the high-resolution input data that is currently used.
- Are the smoothed fields really simulating large-scale dynamics, and are the presented methods sufficient to show this? The authors compare their NN results against persistence and find increased skill of the surrogate model. They attribute this skill to the fact that the model learned the large-scale dynamics. In Fig. 10 one can see that the model only outperforms persistence forecasts in periods of rapidly changing ice cover and thickness (melt and early freeze period). Couldn’t we find the same behavior as well if the model learned the climatology of ice thickness and relaxed the input sea-ice thickness towards this climatology? This would be a better benchmark to beat to justify the authors’ claim of learned physics. In general, there should be a more in-depth analysis to demonstrate that the model learned large-scale dynamics, for example, how the integrated ice edge error (Goessling et al., 2016) varies for different lead times (a quantitative analysis of the qualitative comparison in Appendix D), or a more quantitative evaluation of the ice-drift analysis started in Fig. 6. Another possibility would be to compare the model skill at different spatial scales by e.g. coarse-graining the predictions. Currently, metrics based on pixel values are shown that always include the effect of missing features. In a scale-dependent analysis, the authors could see up to which scale the model has improved skill and whether the large-scale variations are represented appropriately. Or can similar information be extracted from your power spectrum analysis?
- If smooth fields are better for prediction, why should the scientific community then at all pursue developing feature-rich models like neXtSIM in a prediction context?
- The authors suggest that the surrogate model will be of great advantage in computing the adjoint model in variational data assimilation or generating larger ensemble sizes. I have four comments on this: 1) The surrogate model smooths the input fields and by doing so will reduce the spread of an ensemble, potentially limiting its use for data assimilation. 2) Given the smoothness of the simulated thickness fields and the strong differences with the feature-rich input fields, do you think the surrogate and the numerical model are similar enough to use the surrogate for the adjoint, especially over longer assimilation windows? 3) Does using the surrogate model as adjoint work that easily given the interpolation from the unstructured to the regular grid? 4) To properly use the surrogate model in data assimilation, for either creating an ensemble or the adjoint, more model variables should be simulated than just sea-ice thickness. Please comment on all these points in the manuscript.
- Is it in general not possible to achieve a higher degree of detail in the ice thickness, or is it your outlined training and network architecture that hinders it? In the introduction, you state that the cost function plays a major role here, but the manuscript lacks suggestions on how to potentially overcome this issue. Please outline potential ways forward in the paper.
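[Editorial sketch, not from the manuscript: the scale-dependent analysis suggested in the major comments above could be implemented, for instance, by block-averaging as a coarse-graining operator and recomputing the RMSE at each scale. Function names and fields below are invented; the forecast error is white noise, so skill improves with coarsening.]

```python
import numpy as np

def block_mean(field, f):
    """Coarse-grain a 2-D field by averaging non-overlapping f x f blocks."""
    n, m = field.shape
    return field[: n - n % f, : m - m % f].reshape(n // f, f, m // f, f).mean(axis=(1, 3))

def scale_dependent_rmse(truth, forecast, factors=(1, 2, 4, 8)):
    """RMSE between coarse-grained truth and forecast at several spatial scales."""
    return {f: np.sqrt(np.mean((block_mean(truth, f) - block_mean(forecast, f)) ** 2))
            for f in factors}

rng = np.random.default_rng(0)
truth = rng.normal(size=(64, 64))
forecast = truth + 0.5 * rng.normal(size=(64, 64))   # errors concentrated at small scales
rmse_by_scale = scale_dependent_rmse(truth, forecast)
# Small-scale errors average out under coarse-graining, revealing large-scale skill.
assert rmse_by_scale[8] < rmse_by_scale[1]
```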
The authors should consider these points and adapt the manuscript accordingly.
Text quality:
The manuscript follows a clear structure, but the text is in places hard to read and follow and clearly requires further editing. In parts, words are missing or sentences are half-finished. In times of automated language editing tools, thorough language editing is possible also for non-native speakers, and I highly encourage the authors to make use of these tools in the future.
Specific comments:
First paragraph of introduction -> Given that the surrogate model is unlikely to be used for long climate simulations that are mostly described in this paragraph, I recommend tailoring this introduction more towards the actual use cases of such a model, like short-term predictions etc. Please consider adding a few sentences about this.
L27: “of the Arctic” -> Consider removing “of the Arctic” as neither CICE nor SI3 are limited to the Arctic
L28: “road” -> route?
L32: “Divergent features in the ice, like leads and polynyas” -> Polynyas are not necessarily formed by divergence.
L34-35: “Consequently, models correctly representing the effects of such small-scale can have also an advantage in representing the thermodynamics of sea ice.” -> Could you please add references, on which studies you base this general statement? In my eyes, it is still an ongoing research question if and what advantage these directly resolved small-scale features have in contrast to parameterizations currently used in climate models.
L34: “small-scale” -> features? Processes? A word seems missing here.
L39: “benefit” -> benefits
L60: “Explained differently, the surrogate model is trained to reduce errors” -> I do not see how this explains the sentences before differently, it basically says the same as the first sentence in L59. Please clarify.
L65: “learn” -> train?
L72: “surrogate model” -> simulate?
L84: “model area” -> It is not clear if this is the area of the neXtSIM simulations or of the NN. Please clarify the text accordingly.
L106: “Because these forcings are also to guide the neXtSIM simulations” -> From this statement I assume that the neXtSIM simulations are also forced with ERA5. Please clarify this already earlier on in the text to prevent confusion.
L131: “add to the inputs the SIT” -> Didn’t you write above that SIT is already an input? Why add it again? Please clarify.
L133-134: “there are called later ’with 2 inputs’. Otherwise, the neural networks are trained ’with 1 input’ “ -> Could you please add those labels to Table 1 for clarity.
L152: “(Rampal et al., 2019)” -> This reference is somewhat misleading as it is not clear that it only refers to the multi-scale features in sea-ice dynamics, and not to the ability of CNNs to represent those. Please clarify this, or remove the citation here.
Figure 2 Caption: “512, 256, and 128,” -> The figure also shows images of size 64. Please correct.
L165: “sea” -> ocean?
L188: “global mean of x and y” -> As x and y have a physical meaning, it would be helpful for readers if you could also write what the local and global loss mean with respect to sea ice, e.g. local and global trends in sea ice thickness.
L192: “λ is manually tuned to 100” -> What do you optimize for, how do you manually decide on best performance? Please clarify.
L224-225: “over all pixels (i, j) of the field of size (Nx , Ny )” -> Also over land pixels? Including land pixels in the RMSE will artificially reduce its value.
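[Editorial sketch, not from the manuscript: a land-masked RMSE that avoids the artificial reduction noted in this comment could look as follows. `masked_rmse` and the toy fields are invented for illustration.]

```python
import numpy as np

def masked_rmse(truth, forecast, ocean_mask):
    """RMSE computed only over ocean pixels; land pixels are excluded."""
    d = (forecast - truth)[ocean_mask]
    return np.sqrt(np.mean(d ** 2))

# Toy example: half the domain is land (both fields zero there)
truth = np.zeros((4, 4)); forecast = np.zeros((4, 4))
ocean = np.zeros((4, 4), dtype=bool); ocean[:, :2] = True
truth[ocean] = 1.0; forecast[ocean] = 2.0   # error of 1 m on every ocean pixel

naive = np.sqrt(np.mean((forecast - truth) ** 2))   # averages over land too
assert masked_rmse(truth, forecast, ocean) == 1.0   # true per-ocean-pixel error
assert naive < 1.0                                  # land pixels deflate the naive RMSE
```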
L235-237: “We define two terms: the first one N>σacc indicates the number of pixels where xtn+k∆t and xftn+k∆t disagree on the presence of sea ice, and the second one N<σacc where the models disagree on the presence of open water.” -> This is not clear to me. For all pixels where the two masks disagree, one will show ice and the other will show open water. Shouldn’t both terms therefore be the same? Please check those definitions and clarify.
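[Editorial sketch, not from the manuscript, using invented toy fields and an assumed threshold σ: one plausible asymmetric reading of the two terms (misses vs. false alarms), under which they differ, contrasted with the literal reading, under which both describe the same disagreement set.]

```python
import numpy as np

sigma = 0.15   # assumed SIT threshold (m) separating "ice" from "open water"

truth    = np.array([0.0, 0.3, 0.5, 0.0, 0.2])   # invented reference SIT
forecast = np.array([0.0, 0.0, 0.5, 0.3, 0.0])   # invented forecast SIT

ice_t, ice_f = truth > sigma, forecast > sigma
# One plausible asymmetric reading of the two terms:
n_miss  = int(np.sum(ice_t & ~ice_f))   # truth: ice, forecast: open water
n_false = int(np.sum(~ice_t & ice_f))   # truth: open water, forecast: ice
# These two counts are generally different ...
assert (n_miss, n_false) == (2, 1)
# ... whereas "pixels where the masks disagree on ice" and "... on open water",
# read literally, describe one and the same set:
assert int(np.sum(ice_t != ice_f)) == n_miss + n_false
```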
L250: “kx and ky” -> Please rename the indexes x and y to not confuse them with the input of the model x and the output y.
L254: “justified” -> caused?
L259-260: “In practice, this exponent can be numerically estimated by a linear regression between lnE and ln∥k∥.” -> Multiple studies show that linear fits in double logarithmic plots are not ideal for determining power-law exponents, e.g. Clauset et al. (2009). Please elaborate on why you chose this method. Also, how does your metric take into account whether such a scaling actually exists, or are you computing exponents regardless of the distribution?
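[Editorial sketch, not from the manuscript: for reference, the log-log regression estimate questioned here can be reproduced on a synthetic spectrum with a known exponent. All data below are synthetic; the caveat from Clauset et al. (2009) concerns exactly this kind of fit.]

```python
import numpy as np

# Synthetic isotropic spectrum E(k) ~ k^(-beta) with beta = 2, plus weak multiplicative noise
rng = np.random.default_rng(1)
k = np.arange(1, 200, dtype=float)
beta_true = 2.0
E = k ** (-beta_true) * np.exp(0.05 * rng.normal(size=k.size))

# Exponent estimated by linear regression in log-log space (the method under
# discussion); np.polyfit returns [slope, intercept] for degree 1
slope, _ = np.polyfit(np.log(k), np.log(E), 1)
beta_est = -slope
assert abs(beta_est - beta_true) < 0.05   # accurate here because a scaling truly exists
```

Note that the fit returns an exponent regardless of whether a power law holds, which is the reviewer's second point.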
L 270: “in” -> on
L281-282: “However, the impact of the global Eq. 6) on the RMSE relatively small compared to the influence of including additional time steps.” -> This sentence does not fit the observed results. When adding the constraint to the 1-input NN, the global RMSE reduces by an order of magnitude (as written in the sentence before). Comparing both unconstrained NNs, the global RMSE reduces to about 25% when including additional time steps, so a much lower reduction compared to including the constraint.
L. 283: “The impact of adding..” -> Just “Adding…”
L288: “reduce” -> the RMSE increases!
L 289: “surrogate RMSE” -> what is the surrogate RMSE?
L291-292: “after 12 hours, the global RMSE has improved by a factor 9.4 for the one input surrogate” -> Please comment why there is so little improvement for the 2 input NN.
P 307: “leaded” -> ??
L308: “both biases” -> which biases?
L310: “higher likelihood of errors being introduced in the input data” -> What kind of errors are you talking about here? The input data is taken from a model simulation where all data points should be consistent with each other. Except for numerical precision, these data should not have a considerate uncertainty as for instance satellite observations. Please clarify.
L310-316: “As we cycle the neural network, …” -> Do you want to say that the 2-input NN is able to represent a higher degree of nonlinear physics and therefore shows more chaotic behavior?
L 324-326: “The consistent performance of this model across different evaluation metrics and scenarios further validates its reliability and 325 robustness. This surrogate configuration is able to capture the essential features and patterns of SIT dynamics, enabling more accurate predictions compared to other configurations.” -> Please add references to the Tables and Figures with the results you are referring to.
Figure 4. -> Please add which NN are displayed with or without constraint.
L334: “in” -> on
Table 4: 1) Please add at which lead time these statistics are computed. 2) Fig. 5 -> Fig. 5a
Figure6. “surrogate model” -> which of the four models is actually shown here?
Figure 6. “The trajectories for 30 days are shown in red for neXtSIM and yellow for the surrogate model.” -> Please use different colors to not confuse them with the ice extent plotted in the same colors in subfigure a) and b).
L.354-360: “In order to verify this visual impression,…” -> This entire paragraph requires a more in-depth analysis. What is the separation of the two trajectories over time, etc. Also, more than just the four trajectories would be helpful to better quantify these errors.
L 356. “important crack” -> What is important about the crack?
L 360: “but these differences do not indicate incoherent or erratic behavior.” -> Unclear what is meant by this. The deviations are errors?
L363-365: “The observation of a smoothing effect on fine-scale features which increases with the forecast lead time aligns with our expectations” -> This sentence is unfortunately formulated in a misleading way. Properly forecast fine-scale features would improve the forecast skill. Only under the assumption that the model is unable to properly place the features might a smoothed forecast outperform the fine-scale forecast.
L371: “8” -> Fig. 8
L374: “important” -> important for what?
L.375: “decrease” -> decreases
L379-380: “We hypothesize that the neural network has attained its resolution capacity for a correct advection of the sea-ice on the global scale by reducing the fine-scale dynamics that is inherently chaotic and stochastic.” -> It is unclear to me what is meant by this sentence and how it would lead to more structure in the forecasted ice fields. Please elaborate.
L 383: “arctic” -> Arctic
L386: “initialization periods evenly distributed during that period” -> Be more specific: initialized every month?
L. 387: “In the appendix see Fig. D3” -> Does not fit to the rest of the sentence, please rewrite.
L 387: “propose” -> ?
L.389: “In the bottom panel of the figure” -> Which figure are you talking about? Fig.D3 does not show global average SIT…
L391-401: “This consistency…” -> This paragraph is hard to understand and the described hypothesis is hard to follow. Please clarify and add a more comprehensive analysis to justify your points raised.
L400-401: “As anticipated, the surrogate model performs significantly better than persistence during periods of high variation, particularly during summer and autumn.” -> What is in the other seasons? From Fig.10 it looks like the surrogate model only clearly outperforms persistence from August/September to January, while in spring there seems to be no skill. Please elaborate on this and clarify in which periods there is no gain over persistence.
L406: “This opens the perspective to run a large ensemble of simulations for complex sea-ice models, which can facilitate data assimilation.” -> Please discuss how this fits to the smoothening effect of the model. It might be hard to create an ensemble spread if the model blurs all features. (See major comment above)
Figure 8 caption: “blue” -> orange?
L411: “has reached its resolution capacity for correctly simulating the advection of sea ice on a global scale.” -> This sentence appears for the second time in the text, and it is unclear to me what is meant by it.
L417: “This hypothesis implies that the surrogate model focuses on capturing the dominant advection patterns that drive the overall behavior of sea ice, while sacrificing some of the finer details.” -> This sounds a bit too active for a computer model to me. Isn’t that focus determined by the researchers defining the cost function that the model aims to minimize while training? Please comment on strategies how to overcome this issue, e.g. new loss functions, more training data, etc. Or do you think a NN is unable to reproduce these fine-scale features at all? (See major comment above)
L 425: “have important information for the prediction from the physical model” -> Unclear what is meant with this!
L432: “instantiation” -> ?
L436: “similarly simulated” -> The simulated fields are very smooth and hardly similar in nature to the feature-rich fields that neXtSIM is capable of simulating.
Appendix C1 “Partial Convolution algorithm” -> Here seems to be text missing.
References:
- Goessling, H. F., Tietsche, S., Day, J. J., Hawkins, E., and Jung, T. (2016), Predictability of the Arctic sea ice edge, Geophys. Res. Lett., 43, 1642–1650, doi:10.1002/2015GL067232.
- Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-Law Distributions in Empirical Data. Siam Rev 51, 661–703 (2009).
Citation: https://doi.org/10.5194/egusphere-2023-1384-RC1
- AC1: 'Reply on RC1', Charlotte Durand, 15 Dec 2023
RC2: 'Comment on egusphere-2023-1384', Anonymous Referee #2, 15 Nov 2023
The paper presents a strong case of surrogate modeling by using neural networks to emulate the evolution of sea-ice thickness; however, the paper lacks clarity at several places in the manuscript and requires minor revisions:
1. There is little information provided on the choice of atmospheric variables considered as forcings. Please provide more evidence from literature on this.
2. If the neural network is designed for future forecasting, none of the input features should belong to the same timestep as the target. In the case of this paper, all the atmospheric variables are at the same timestep as the target, whereas, like SIT, they should only extend up to timestep 't'. You could justify through experiments how the current setting performs better than the one suggested.
3. There are some minor errors that should be corrected:
>> By definition, a UNet is not a convolutional architecture but an encoder-decoder neural network architecture with skip connections. There are several papers utilizing LSTM-based or ConvLSTM-based UNets.
>> Andersson et al. did not propose IceNet for SIC prediction. Their work targets SIP predictions which is slightly different from SIC.
4. There are several other recent papers that utilize CNN, ConvLSTM and LSTM for SIC predictions. There is not enough convincing argument present on just relying on UNet for the surrogate model. Did the authors try a CNN or ConvLSTM based architecture for surrogate modeling?
5. How was 100 decided as the optimal value of lambda? Did you experiment with other values of lambda in calculating the global loss?
6. What is the timestep used in the case of long-term forecasting?
7. Did the authors consider using a custom loss function instead of partial convolutions to incorporate the land mask into the modeling?
Ref:
1. Ebert-Uphoff, Imme, et al. "CIRA Guide to Custom Loss Functions for Neural Networks in Environmental Sciences--Version 1." arXiv preprint arXiv:2106.09757 (2021).
2. Ali, Sahara, and Jianwu Wang. "MT-IceNet-A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting." 2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT). IEEE, 2022.
3. Kim, Eliot, et al. "Multi-task deep learning based spatiotemporal arctic sea ice forecasting." 2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021.
Citation: https://doi.org/10.5194/egusphere-2023-1384-RC2
- AC2: 'Reply on RC2', Charlotte Durand, 15 Dec 2023
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-1384', Nils Hutter, 27 Oct 2023
Review of “Data-driven surrogate modeling of high-resolution sea-ice thickness in the Arctic”
by Durand et al.
Reviewer: Nils Hutter
In this manuscript, the authors present a machine-learning based surrogate model of the numerical sea-ice model neXtSIM. The presented surrogate model simulates sea-ice thickness and its predictions outperform the climatology benchmark for lead times up to 8 months. The findings of this study are a valuable addition to the field and illustrate how machine learning can be used to reduce the computational costs of sea-ice simulations, e.g. in the context of ensemble forecasting. The main shortcoming of the surrogate model presented, however, is that it simulates very smooth thickness fields compared to the feature-rich neXtSIM input data used for training that includes for example leads. The authors address and analyze this issue in the manuscript, but I still have the major comments outlined below regarding the presentation, analysis, and interpretation of this point that should be addressed before I can recommend this paper for publication.
Major comments:
Smoothness of simulated ice thickness fields:
The authors use simulations of neXtSIM, known for its ability to resolve deformation features and heterogeneous sea ice fields, to train the NN emulator. The surrogate model presented in this manuscript is not able to retain these features over multiple iterations and quickly smoothes the sea ice fields, resulting in thickness fields much more similar to coarse-resolution sea-ice simulations. The authors argue that the smoothed version better minimizes the RMSE (MSE used for training) compared to a model that retains these features and potentially gets penalized for misplacing them. Therefore, the model learns to predict smooth fields and mimics large-scale circulation. While this makes sense in light of the cost function used, I have the following issues with how this fact is presented and interpreted:
- Why do you use high resolution in the first place? In the abstract and introduction, the authors make the valid point that current models that simulate small-scale features, like e.g. leads, are computationally very expensive and that a surrogate model would be of great benefit here. However, the presented surrogate model is not able to simulate these fine-scale features, but “just” the large-scale dynamics. Now, I am wondering why it is necessary to use the feature-rich simulations for training. In the introduction the authors suggest that “small-scale effects have an advantage on representing the thermodynamics of sea ice”, but I am not aware of modeling studies that have proven this point comprehensively. Now I am wondering if the same results could also be achieved with a coarse resolution model that also resolves the large-scale circulation (and is much cheaper to run). Once having trained the NN on coarse resolution model output, the authors should comment on whether there is an additional benefit in using the high-resolution input data that is currently used.
- Are the smoothed fields really simulating large-scale dynamics and are the presented methods sufficient to show this? The authors compare their NN results against persistence and find increased skill of the surrogate model. They attribute this skill to the fact that the model learned the large-scale dynamics. In Fig. 10 one can see that the model only outperforms persistence forecasts in periods of rapidly changing ice cover and thickness (melt and early freeze period). Couldn’t we find the same behavior as well, if the model learned the climatology of ice thickness and relaxes the input sea ice thickness to this climatology? This would be a better benchmark to beat to justify the authors’ claim of learned physics. In general, there should be a more in-depth analysis to demonstrate that the model learned large-scale dynamics, for example, how does the integrated ice edge error (Goessling et al., 2016) varies for different lead times (a quantitative analysis of the qualitative comparison in Appendix D), or a more quantitative evaluation of ice drift started in Fig. 6. Another possibility would be to compare the model skill at different spatial scales by e.g. coarse-graining the predictions. Currently, metrics based on pixel values are shown that always include the effect of missing features. In a scale-dependent analysis, the authors could see up to which scale the model has improved skill and if the large-scale variations are represented appropriately. Or can similar information be extracted from your power spectrum analysis?
- If smooth fields are better for prediction, why should the scientific community then at all pursue developing feature-rich models like neXtSIM in a prediction context?
- The authors suggest that the surrogate model will be of great advantage in computing the adjoint model in variational data assimilation or generating larger ensemble sizes. I have four comments on this: 1) The surrogate model smoothened the input fields and by doing so will reduce the spread of an ensemble, potentially limiting its use for data assimilation. 2) Given the smoothness of the simulated thickness fields and the strong differences with the feature-rich input fields, do you think the surrogate and the numerical model are similar enough to use the surrogate for the adjoint, especially over longer assimilation windows? 3) Does using the surrogate model as adjoint work that easily given the interpolation from unstructured to regular grid? 4) To properly use the surrogate model in data assimilation for both creating an ensemble or the adjoint, more model variables should be simulated than just sea ice thickness. Please comment on all these points in the manuscript.
- Is it in general not possible to achieve a higher degree of details in the ice thickness or is it your outlined training and network architecture that hinders it? In the introduction, you state that the cost function plays a major role here, but the manuscript lacks suggestions how to potentially overcome this issue. Please outline potential ways forward in the paper.
The authors should consider these points and adapt the manuscript accordingly.
Text quality:
The manuscript follows a clear structure, but the text is in passages hard to read and follow and clearly requires further editing. In parts, words are missing or sentences are half-finished. In times of automated language editing tools, more thorough language editing is possible also for non-native speakers, and I highly encourage the authors to make use of these tools in the future.
Specific comments:
First paragraph of introduction -> Given that the surrogate model is unlikely to be used for long climate simulations that are mostly described in this paragraph, I recommend tailoring this introduction more towards the actual use cases of such a model, like short-term predictions etc. Please consider adding a few sentences about this.
L27: “of the Arctic” -> Consider removing “of the Arctic” as neither CICE nor SI3 are limited to the Arctic
L28: “road” -> route?
L32: “Divergent features in the ice, like leads and polynyas” -> Polynyas are not necessarily formed by divergence.
L34-35: “Consequently, models correctly representing the effects of such small-scale can have also an advantage in representing the thermodynamics of sea ice.” -> Could you please add references, on which studies you base this general statement? In my eyes, it is still an ongoing research question if and what advantage these directly resolved small-scale features have in contrast to parameterizations currently used in climate models.
L34: “small-scale” -> features? Processes? A word seems missing here.
L39: “benefit” -> benefits
L60: “Explained differently, the surrogate model is trained to reduce errors” -> I do not see how this explains the sentences before differently, it basically says the same as the first sentence in L59. Please clarify.
L65: “learn” -> train?
L72: “surrogate model” -> simulate?
L84: “model area” -> It is not clear if this is the area of the neXtSIM simulations or of the NN. Please clarify the text accordingly.
L106: “Because these forcings are also to guide the neXtSIM simulations” -> From this statement I assume that the neXtSIM simulations are also forced with ERA5. Please clarify this already earlier on in the text to prevent confusion.
L131: “add to the inputs the SIT” -> Didn’t you write above that SIT is already an input? Why add it again? Please clarify.
L133-134: “there are called later ’with 2 inputs’. Otherwise, the neural networks are trained ’with 1 input’ “ -> Could you please add those labels for clarity to Table1.
L152: “(Rampal et al., 2019)” -> This reference is somehow misleading at it is not clear that it only refers to the multi scale features in sea-ice dynamics, and not to the ability of CNNs to represent those. Please clarify this, or remove the citation here.
Figure 2 Caption: “512, 256, and 128,” -> The figure also shows images of size 64. Please correct.
L165: “sea” -> ocean?
L188: “global mean of x and y” -> As x and y have a physical meaning, it would be helpful for readers if you could also write what the local and global loss mean with respect to sea ice, e.g. local and global trends in sea ice thickness.
L192: “λ is manually tuned to 100” -> What do you optimize for, how do you manually decide on best performance? Please clarify.
L224-225: “over all pixels (i, j) of the field of size (Nx , Ny )” -> Also over land pixels? Including land pixels in the RMSE will artificially reduce its value.
L235-237: “We define two terms: the first one N>σacc indicates the number of pixels where xtn+k∆t and xftn+k∆t disagree on the presence of sea ice, and the second one N<σacc where the models disagree on the presence of open water.” -> This is not clear to me. For all pixels, where the two masks disagree, one will show ice and the other will show open water. Shouldn’t therefore not also both terms be the same? Please check those definitions and clarify.
L250: “kx and ky” -> Please rename the indexes x and y to not confuse them with the input of the model x and the output y.
L254: “justified” -> caused?
L259-260: “In practice, this exponent can be numerically estimated by a linear regression between lnE and ln∥k∥.” -> Multiple studies show that linear fits in double-logarithmic plots are not ideal for determining power-law exponents, e.g. Clauset et al. (2009). Please elaborate on why you chose this method. Also, how does your metric take into account whether such a scaling actually exists, or do you compute exponents regardless of the distribution?
L 270: “in” -> on
L281-282: “However, the impact of the global Eq. 6) on the RMSE relatively small compared to the influence of including additional time steps.” -> This sentence does not fit the observed results. When adding the constraint to the 1-input NN, the global RMSE reduces by an order of magnitude (as written in the sentence before). Comparing both unconstrained NNs, the global RMSE reduces to about 25 % when including additional time steps, i.e. a much smaller reduction than from including the constraint.
L. 283: “The impact of adding..” -> Just “Adding…”
L288: “reduce” -> the RMSE increases!
L 289: “surrogate RMSE” -> what is the surrogate RMSE?
L291-292: “after 12 hours, the global RMSE has improved by a factor 9.4 for the one input surrogate” -> Please comment why there is so little improvement for the 2 input NN.
L307: “leaded” -> not a word; presumably “led”?
L308: “both biases” -> which biases?
L310: “higher likelihood of errors being introduced in the input data” -> What kind of errors are you talking about here? The input data is taken from a model simulation where all data points should be consistent with each other. Except for numerical precision, these data should not have a considerable uncertainty as, for instance, satellite observations do. Please clarify.
L310-316: “As we cycle the neural network, …” -> Do you want to say that the 2-input NN is able to represent a higher degree of nonlinear physics and therefore shows more chaotic behavior?
L 324-326: “The consistent performance of this model across different evaluation metrics and scenarios further validates its reliability and robustness. This surrogate configuration is able to capture the essential features and patterns of SIT dynamics, enabling more accurate predictions compared to other configurations.” -> Please add references to the tables and figures with the results you are referring to.
Figure 4. -> Please indicate which NNs are displayed with or without the constraint.
L334: “in” -> on
Table 4: 1) Please add at which lead time these statistics are computed. 2) Fig. 5 -> Fig. 5a
Figure6. “surrogate model” -> which of the four models is actually shown here?
Figure 6. “The trajectories for 30 days are shown in red for neXtSIM and yellow for the surrogate model.” -> Please use different colors to not confuse them with the ice extent plotted in the same colors in subfigure a) and b).
L.354-360: “In order to verify this visual impression,…” -> This entire paragraph requires a more in-depth analysis. What is the separation of the two trajectories over time, etc. Also, more than just the four trajectories would be helpful to better quantify these errors.
L 356. “important crack” -> What is important about the crack?
L 360: “but these differences do not indicate incoherent or erratic behavior.” -> Unclear what is meant by this. The deviations are errors?
L363-365: “The observation of a smoothing effect on fine-scale features which increases with the forecast lead time aligns with our expectations” -> This sentence is unfortunately formulated in a misleading way. Properly forecast fine-scale features would improve the forecast skill; only if one assumes that the model is unable to properly place these features might a smoothed forecast outperform a fine-scale one.
L371: “8” -> Fig. 8
L374: “important” -> important for what?
L.375: “decrease” -> decreases
L379-380: “We hypothesize that the neural network has attained its resolution capacity for a correct advection of the sea-ice on the global scale by reducing the fine-scale dynamics that is inherently chaotic and stochastic.” -> It is unclear to me what is meant by this sentence and how it would lead to more structure in the forecasted ice fields. Please elaborate.
L 383: “arctic” -> Arctic
L386: “initialization periods evenly distributed during that period” -> Be more specific: initialized every month?
L. 387: “In the appendix see Fig. D3” -> Does not fit the rest of the sentence; please rewrite.
L 387: “propose” -> ?
L.389: “In the bottom panel of the figure” -> Which figure are you talking about? Fig.D3 does not show global average SIT…
L391-401: “This consistency…” -> This paragraph is hard to understand and the described hypothesis is hard to follow. Please clarify and add a more comprehensive analysis to justify your points raised.
L400-401: “As anticipated, the surrogate model performs significantly better than persistence during periods of high variation, particularly during summer and autumn.” -> What is in the other seasons? From Fig.10 it looks like the surrogate model only clearly outperforms persistence from August/September to January, while in spring there seems to be no skill. Please elaborate on this and clarify in which periods there is no gain over persistence.
L406: “This opens the perspective to run a large ensemble of simulations for complex sea-ice models, which can facilitate data assimilation.” -> Please discuss how this fits with the smoothing effect of the model. It might be hard to create an ensemble spread if the model blurs all features. (See major comment above)
Figure 8 caption: “blue” -> orange?
L411: “has reached its resolution capacity for correctly simulating the advection of sea ice on a global scale.” -> This sentence appears the second time in the text and it is unclear to me what is meant with it.
L417: “This hypothesis implies that the surrogate model focuses on capturing the dominant advection patterns that drive the overall behavior of sea ice, while sacrificing some of the finer details.” -> This sounds a bit too active for a computer model to me. Isn’t that focus determined by the researchers defining the cost function that the model aims to minimize while training? Please comment on strategies how to overcome this issue, e.g. new loss functions, more training data, etc. Or do you think a NN is unable to reproduce these fine-scale features at all? (See major comment above)
L 425: “have important information for the prediction from the physical model” -> Unclear what is meant with this!
L432: “instantiation” -> ?
L436: “similarly simulated” -> The simulated fields are very smooth and hardly similar in nature to the feature-rich fields that neXtSIM is capable of simulating.
Appendix C1 “Partial Convolution algorithm” -> Here seems to be text missing.
References:
- Goessling, H. F., Tietsche, S., Day, J. J., Hawkins, E., and Jung, T. (2016), Predictability of the Arctic sea ice edge, Geophys. Res. Lett., 43, 1642–1650, doi:10.1002/2015GL067232.
- Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009), Power-Law Distributions in Empirical Data, SIAM Rev., 51, 661–703.
Citation: https://doi.org/10.5194/egusphere-2023-1384-RC1
- AC1: 'Reply on RC1', Charlotte Durand, 15 Dec 2023
-
RC2: 'Comment on egusphere-2023-1384', Anonymous Referee #2, 15 Nov 2023
The paper presents a strong case of surrogate modeling by using neural networks to emulate the increase in sea ice thickness, however, the paper lacks clarity at several places in the manuscript and requires minor revisions:
1. There is little information provided on the choice of atmospheric variables considered as forcings. Please provide more evidence from literature on this.
2. If the neural network is designed for forecasting the future, none of the input features should belong to the same timestep as the target. In the case of this paper, all the atmospheric variables are of the same timestep as the target, whereas, like SIT, they should only extend up to timestep 't'. You could justify through experiments how the current setting performs better than the one suggested.
3. There are some minor errors that should be corrected:
>> UNet by definition is not a convolutional architecture but an encoder-decoder neural network architecture with skip connections. There are several papers utilizing LSTM-based or ConvLSTM-based UNets.
>> Andersson et al. did not propose IceNet for SIC prediction. Their work targets SIP prediction, which is slightly different from SIC.
4. There are several other recent papers that utilize CNNs, ConvLSTMs, and LSTMs for SIC prediction. There is no convincing argument presented for relying solely on a UNet for the surrogate model. Did the authors try a CNN- or ConvLSTM-based architecture for surrogate modeling?
5. How was 100 decided as the optimal value of lambda? Did you experiment with other values of lambda in calculating the global loss?
6. What is the timestep used in the case of long-term forecasting?
7. Did the authors consider using a custom loss function instead of partial convolutions to incorporate the land mask into the modeling?
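A land-mask-aware loss of the kind this comment alludes to could look like the following minimal NumPy sketch (illustrative only; `land_masked_mse` is a hypothetical helper, and a real training setup would use the deep-learning framework's tensors so that gradients flow):

```python
import numpy as np

def land_masked_mse(pred, target, ocean_mask):
    """MSE averaged over ocean pixels only, so land pixels contribute
    neither to the loss value nor to its gradient."""
    sq_err = (pred - target) ** 2
    return float(np.sum(sq_err * ocean_mask) / np.sum(ocean_mask))
```

This masks land out of the objective directly, as an alternative to handling the land-sea mask inside the architecture with partial convolutions.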
Ref:
1. Ebert-Uphoff, Imme, et al. "CIRA Guide to Custom Loss Functions for Neural Networks in Environmental Sciences--Version 1." arXiv preprint arXiv:2106.09757 (2021).
2. Ali, Sahara, and Jianwu Wang. "MT-IceNet-A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting." 2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT). IEEE, 2022.
3. Kim, Eliot, et al. "Multi-task deep learning based spatiotemporal arctic sea ice forecasting." 2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021.
Citation: https://doi.org/10.5194/egusphere-2023-1384-RC2
- AC2: 'Reply on RC2', Charlotte Durand, 15 Dec 2023