This work is distributed under the Creative Commons Attribution 4.0 License.
Data-driven equation discovery of a sea ice albedo parametrisation
Abstract. In the Finite-Element Sea Ice Model (FESIM), a part of the Finite-Element Sea ice Ocean Model (FESOM), sea ice albedo is treated as a tuning parameter defined by four constant values depending on snow cover and surface temperature. This parametrisation is too simple to capture the spatiotemporal variability of observed sea ice albedo. Here, we aim for an improved parametrisation by discovering an interpretable, physically consistent equation for sea ice albedo using symbolic regression, an interpretable machine learning technique, combined with physical constraints. Leveraging daily pan-Arctic satellite and reanalysis data from 2013 to 2020, we apply sequential feature selection, which identifies snow depth, surface temperature, sea ice thickness and 2 m air temperature as the most informative features for sea ice albedo. As a function of these features, our data-driven equation identifies two critical mechanisms for determining sea ice albedo: the high sensitivity of sea ice albedo to small changes in thin snow and a weighted difference of the sea ice surface and 2 m air temperature, serving as a seasonal proxy that indicates the transition between melting and freezing conditions. To understand how additional model complexity reduces errors, we evaluate our discovered equation against baseline models with different complexities, such as multilayer perceptron neural networks (NNs) and polynomials, on an error-complexity plane, showing that the equation excels in balancing error and complexity and reduces the mean squared error by about 51 % compared to the current FESIM parametrisation. Unlike NNs, our discovered equation allows for further regional and seasonal analyses due to its inherent interpretability. By fine-tuning its coefficients we uncover differences in physical conditions that drive sea ice albedo.
This study demonstrates that learning an equation from observational data can deepen the process-level understanding of the Arctic Ocean's surface radiative budget and improve climate projections.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3556', Anonymous Referee #1, 06 Jan 2026
- RC2: 'Comment on egusphere-2025-3556', Guillaume Boutin, 30 Jan 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3556/egusphere-2025-3556-RC2-supplement.pdf
- RC3: 'Comment on egusphere-2025-3556', Anonymous Referee #3, 05 Feb 2026
This is a review of the submission entitled “Data-driven equation discovery of a sea ice albedo parametrisation”. The authors use data-driven approaches to derive an albedo equation for use as a parametrisation in FESIM. The paper is based on a nice idea and contains novel elements; it shows that AI has potential for learning and replacing parametrisations. The purpose is also clearly laid out (e.g. L65-69) and well presented.
There are a few important points that I think should be addressed/clarified before recommending it for publication.
=====
- L145 - by doing this you are essentially omitting the marginal ice zone. I think this is acceptable in your analysis, but perhaps you should phrase this as a parametrisation for sea ice in the interior ice pack rather than the MIZ. That is quite important for the future performance of the emulator: we know that the future sea ice surface will increasingly resemble fragmented, MIZ-like conditions with lower sea ice concentration. It is therefore a good scientific first step, but there should also be substantial caution in using such a parametrisation in longer-term projections, because it has not learned to emulate albedo in these MIZ-like surfaces/regions.
- Given that the future sea ice will be very much like the MIZ, which approximately overlaps with what you have ruled out, it should be noted in more than just this line (e.g. in the title or abstract) that this is an emulator for the central Arctic. By excluding lower sea ice concentrations and approximately MIZ-like regions, it has lesser value for future climate projections (as it is not trained on the regime that will dominate future summers). This needs to be noted more strongly, or all regions (including those with low sea ice concentration) should be kept in the training if the title/abstract is kept as is. Otherwise, something like “Data-driven equation discovery of a sea ice albedo parametrisation … for the interior ice pack/central Arctic” would be appropriate. Essentially, its use in climate models for future climate change scenarios should be addressed either by updating the science a bit or by making the potential limitations clearer in the text.
- L181 - when downsampling, how do you ensure that you have a representative sample of the whole dataset? To go from a few million points to a few thousand, I would assume you need some tests/assurance that the sample is representative. I think something needs to be done to show this is a meaningful/representative sample - especially in the context of section 5. Hopefully this could be a relatively simple test.
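The "relatively simple test" asked for here could look like the following sketch: draw the subsample and compare each feature's marginal distribution against the full dataset with a two-sample Kolmogorov-Smirnov test. This uses synthetic stand-in data and hypothetical feature names; the paper's actual arrays would replace `full`.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
n_full, n_sample = 500_000, 5_000

# Stand-in for the full pan-Arctic dataset: one column per feature.
full = rng.normal(size=(n_full, 4))
names = ["snow_depth", "t0m", "sit", "t2m"]  # hypothetical feature names

# Random subsample without replacement, as in a naive downsampling step.
idx = rng.choice(n_full, size=n_sample, replace=False)
sample = full[idx]

# Two-sample Kolmogorov-Smirnov test per feature: a small statistic (and a
# non-tiny p-value) means we cannot reject that the subsample follows the
# full dataset's distribution for that feature.
for j, name in enumerate(names):
    stat, p = ks_2samp(full[:, j], sample[:, j])
    print(f"{name}: KS stat = {stat:.4f}, p = {p:.3f}")
```

A stratified sample (e.g. by month and region) could be checked the same way and would typically pass with an even smaller KS statistic.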
- Section 3.2.2. - nice work!
- Eqn 8 - maybe it is better to use something like $\hat{\alpha}$ or some other symbol to suggest that this is the albedo according to equation discovery from this dataset - a proposed relation rather than something definitely fundamental for the real albedo. It may help people track it, but this is a small issue.
- When you refer to Appendix C with the multiple equations, it casts doubt on the ability to infer “physical meaning from the first one”. While they have near-identical MSE and lie on the Pareto front, their representations/equation forms are markedly different (e.g. presence or absence of tanh; others are quadratic, have coupling terms, etc.). This indicates the equation discovery problem is not well-identified: multiple structurally different equations explain the data equally well. I think this should be noted - what you are showing is effectively that you cannot draw a single family of consistent equations using your approach, which weakens the validity of choosing any given one and then drawing physical meaning from it. Why not choose another equation with nearly identical MSE and thereby obtain a substantially different physical interpretation?
- Also, I think the differences between the given equations should be highlighted/stated strongly here, rather than relegated to the appendix. That equation discovery cannot give a similar family of equations seems to cast doubt on the approach and on drawing meaning from any given one. I do strongly appreciate that these are novel first steps, but seeing this reduces confidence in drawing a physical interpretation from any single equation.
- Fig 7 - this is quite a substantial difference: it looks like both the peaks/locations and particularly the magnitude/frequency differ substantially - in fact a frequency over 4 times as high for the high-albedo peak around 0.8 - indicating that the models are not necessarily learning things associated with climate change (and more the central pack). I think this should be more heavily noted.
- L365 - I think it is slightly inconsistent to argue for a pan-Arctic dataset on the grounds that it includes more than MOSAiC (which may be 'the best' for accuracy, but is not pan-Arctic), and then, when your emulator disagrees with the pan-Arctic data, to cite MOSAiC as a reason the pan-Arctic data can be ignored. I favour your approach and a pan-Arctic design, but I do not think it works to say that when the data disagree with your model you can choose to ignore them because they have wider variation than MOSAiC. You use pan-Arctic data precisely because it represents more than a narrow campaign in a narrow window of time; MOSAiC will not capture the full range of processes and values, and that is why you chose pan-Arctic data. I think it is better to reword this as simply a disagreement, acknowledging that neither the observational data nor the emulator is necessarily perfect.
- A comment after reading section 5: it focuses on regional variation and different processes, e.g. in the Barents Sea, and this supports the point that you cannot use a single campaign in a single region to override pan-Arctic conclusions. Thus, again, I think it is safest to say your emulator simply disagrees with these observations, while admitting that neither is necessarily perfect (rather than saying the pan-Arctic observation ranges can be ignored because they do not fit within MOSAiC whenever they disagree with your model).
- L383 - do you mean you are achieving a good R2 score because it is evaluated against the dataset it was trained on? (It covers 2013 onwards.) Have you considered testing on a separate dataset to verify against both? Or, at the very least, testing on different times (not e.g. 2013 onwards, which covers the training period)? Evaluating this way seems to over-enhance the performance.
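The evaluation suggested here is a temporal holdout: fit on early years and score only on held-out later ones. A minimal sketch with synthetic data and a plain least-squares fit as a stand-in for the discovered equation (variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
years = rng.integers(2013, 2021, size=n)   # one year label per sample (2013..2020)
X = rng.normal(size=(n, 4))                # stand-in predictors
true_coef = np.array([0.5, -0.2, 0.1, 0.3])
y = X @ true_coef + rng.normal(scale=0.1, size=n)

train = years <= 2018                      # fit only on 2013-2018
test = ~train                              # evaluate only on 2019-2020

# Fit any model on the training years alone ...
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# ... and report skill exclusively on the unseen years.
resid = y[test] - X[test] @ coef
r2 = 1.0 - resid.var() / y[test].var()
print(f"out-of-sample R^2 (2019-2020): {r2:.3f}")
```

Reporting this out-of-sample score alongside the in-sample one would directly address the concern about inflated performance.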
- The PW79 scheme is quite old compared to other potential schemes - why was this one chosen? Surely there are better models and better albedo representations? If AI is to be competitive, it should be compared to the most modern/state-of-the-art approaches and models.
- Fig 9 - I find the four very similar, overlapping shadings really quite hard to read; it is difficult to see what is going on.
- L474 - regarding, again, a physically “consistent” equation being claimed in the conclusions: you derived multiple equations that have substantially different terms, i.e. they are inconsistent, yet all have nearly identical MSE. Does this not show that any given one of them is not truly valid for interpretation, by definition, because each would lead to a different interpretation? If they were all very similar equations, it would be fairer to call the result consistent and to draw meaning from it. I think you are demonstrating novel approaches, but currently not demonstrating that equation discovery can produce a single consistent or interpretable form.
- Is it possible to carry out this process in such a way that you always obtain multiple models of low MSE (e.g. the top 5 models) that all have very similar equation forms? That, to me, would be consistent. I find it hard to argue that one equation should be valued above another when their MSEs are so very close. Currently, any of the equations could be chosen fairly reasonably (which is almost the opposite of the point of the exercise).
- In fact, the more I think about it, to use equation discovery you probably need to show one of two things: either that you can do this such that the equations in the Appendix are all very similar, and therefore that you have identified the right data and methods to find the genuine underlying physics; or, if not, that you can bring all 5 equations into the main text, analyse each for physical interpretations/drivers (similar to 3.2), and then explain why their differences in physical interpretation can be overlooked and why one should/could prefer 'the' one equation you choose. Otherwise, it seems the approach is actually showing that a single equation/specific derived physical meaning cannot be found this way. I would anticipate that under future forcings these equations could lead to notably different projections. It is indeed novel to try, though, and I like the appeal.
- L495 - I do not fully understand. Do you mean that the AWI will be proceeding with FESIM (and not Icepack) thus it is a pragmatic choice to try to work with this model?
Citation: https://doi.org/10.5194/egusphere-2025-3556-RC3
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 310 | 183 | 35 | 528 | 23 | 23 |
General assessment
This paper presents a novel and well-executed application of symbolic regression for data-driven discovery of a sea ice albedo parametrisation based on satellite and reanalysis data. The authors demonstrate how an interpretable machine-learning approach can be used to derive a physically constrained, low-complexity functional form that substantially improves upon the very simple albedo parametrisation currently used in FESIM. The methodological framework is sound, the ML methods are appropriate and carefully applied, and the Pareto-optimal perspective on error versus complexity is particularly convincing.
The paper is well written, the figures are clear and convincing, and the presentation is very well structured. The appendices are comprehensive and helpful, and the authors do a good job at explaining and interpreting the discovered equation in physical terms. Overall, this is a very nice example of how observational and reanalysis datasets can be used in an innovative and effective way to inform parameterisations of physical processes in climate models.
My comments above are mostly in the spirit of model development and robustness, and aim at strengthening the physical interpretation, generality, and long-term applicability of the proposed approach.
Main limitations and suggestions for future work
In its current form, the study remains an offline, data-driven evaluation: a proper assessment of the impact of the new parametrisation within FESOM itself is still missing. While I do not think this is a blocker for the present paper, it is an important next step if the results are to become truly impactful for climate modelling.
In addition, the benchmarking is currently limited to statistical and ML-based baselines. The study would be significantly strengthened by a comparison against a more physically based or numerically robust albedo scheme, for example radiation schemes used in models such as Icepack, or a parameterisation like that of Holland et al. (2012, 10.1175/JCLI-D-11-00078.1). Even if such schemes are not implemented in FESOM, they would provide a much more meaningful physical reference than polynomials or neural networks.
More generally, some of my comments above raise concerns about generalisability.
Finally, given that the study does not yet demonstrate the impact of the scheme inside FESOM, it might be worth toning down the emphasis on FESOM/FESIM in the framing. At present, the work is best seen as a general, observation-driven albedo parameterisation study rather than a demonstrated FESOM model improvement.
Overall recommendation
Overall, I find this to be a strong, original, and well-executed study that convincingly demonstrates the potential of interpretable machine learning for parameterisation development. The paper is already of high quality, and addressing (or at least discussing) the points raised above would further strengthen its physical robustness, scope, and long-term relevance for climate modelling.
Detailed comments
Line 25: The statement that thinner and younger sea ice necessarily has a lower albedo than thicker, older ice seems too general. While thicker ice can indeed accumulate more snow, which increases surface albedo, the albedo of bare ice does not necessarily decrease with decreasing ice age or thickness. In particular, younger ice typically has higher salinity, which may increase scattering and thus lead to higher bare-ice albedo. The authors may wish to clarify the conditions under which this statement holds (e.g. snow-covered vs. bare ice) or provide a supporting reference.
Line 115: It is unclear whether the reference here is still to PIOMAS. The sentence would benefit from greater clarity and, if appropriate, an explicit citation. In addition, the discussion would be strengthened by considering relevant recent work such as Cocetta et al. (2024, The Cryosphere), which may be pertinent to the points being made.
Line 132: Over the course of one week, sea ice can drift substantially. It is therefore unclear whether sea ice motion is accounted for in the weekly accumulation of snowfall and rainfall. Could the authors clarify whether these accumulations are computed in a Lagrangian (ice-following) or Eulerian framework? If not accounted for, this mismatch may introduce inconsistencies between surface forcing and ice properties. Relatedly, could sea ice velocity or the atmospheric wind beyond the day before provide additional predictive skill for albedo, particularly in regions with strong ice drift?
Line 188 (Point 3): With respect to point 3, please see my earlier comment regarding the role of salinity. The assumption that thicker ice necessarily has a higher albedo under freezing conditions may not always hold for bare ice. An explicit dependence on ice age, which is closely linked to salinity and microstructure, might therefore be more appropriate or at least worth discussing.
Line 229: The statement that snow is “the most reflective medium in nature” appears too strong and would benefit from clarification and a supporting reference. While fresh, clean snow is among the most reflective natural surfaces, particularly in the visible spectrum, its albedo depends strongly on grain size, age, wetness, and impurities, and decreases substantially in the near-infrared. Moreover, other natural media (e.g. certain cloud types) can exhibit higher broadband reflectance. A more precise phrasing such as “snow is among the most reflective natural surfaces, especially when fresh and dry” would be more accurate.
Line 239: An alternative explanation could be that sea ice thickness is strongly correlated with snow depth, such that ice thickness partly serves as a proxy for snow information. Given that all features are standardised during training, this correlation may lead the model to attribute predictive skill to ice thickness that in fact originates from snow-related information. Clarifying the degree of correlation between these variables and its impact on feature selection would strengthen the interpretation.
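The correlation check suggested here is straightforward to run; a minimal sketch with synthetic stand-in arrays (the paper's actual snow depth and ice thickness fields would replace them, and the coupling below is constructed, not observed):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in fields: snow depth [m] and a strongly snow-coupled ice thickness [m].
snow = rng.gamma(2.0, 0.1, size=5_000)
sit = 2.0 * snow + rng.normal(scale=0.1, size=5_000)

# Pearson correlation between the two candidate predictors.
r = np.corrcoef(snow, sit)[0, 1]
print(f"Pearson r(snow depth, ice thickness) = {r:.2f}")
# A high |r| means the feature selector may attribute snow-related skill
# to ice thickness, which is exactly the proxy effect raised in the comment.
```

Reporting this correlation (and, for example, rerunning feature selection with one of the two variables withheld) would clarify how much of the thickness signal is genuinely independent of snow.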
Line 245: While the interpretation is plausible in principle, in practice it should be treated with caution. The 2 m air temperature (T2m) is taken from a reanalysis product that does not assimilate sea ice or snow thickness, nor near-surface Arctic observations, apart from surface pressure from stations and drifting buoys. In contrast, T0m is derived from satellite observations. The fact that the machine-learning model does not fully exploit the strong physical relationship between T0m and T2m may therefore reflect inconsistencies between these two datasets rather than genuine additional predictive skill of T2m for sea ice albedo. This distinction would be worth discussing more explicitly.
Subsection 3.2.2: The interpretation of the weighted difference between T0m and T2m as a physically meaningful seasonal proxy deserves further scrutiny. The seasonal variation of this difference largely reflects a well-documented bias in the ERA5 reanalysis, which has been shown to be larger in winter than in summer (e.g. Batrak and Müller, 2019; Zampieri et al., 2023). As such, this signal may primarily encode characteristics of the reanalysis system rather than an intrinsic physical control on sea ice albedo.
This raises the question of whether it is desirable to base a model parametrisation on a feature that is strongly influenced by a known, non-stationary reanalysis bias. In particular, the robustness of the proposed parametrisation is unclear if different forcing datasets are used to run FESOM (e.g. a future ERA6 reanalysis or a coupled model configuration), where the magnitude or structure of near-surface temperature biases may differ substantially.
Moreover, ERA5 temperature biases are not static in time and are themselves affected by climate change. It is therefore uncertain whether the derived parametrisation would generalise well under future climate conditions.
If the intention is to encode seasonal information, a predictor more directly linked to the physical drivers of seasonality, such as variables related to insolation, radiative forcing, or surface energy balance, might be more appropriate than relying on the difference between two temperature fields with differing observational constraints.
A more explicit discussion of these limitations and their implications for generalisability would strengthen the manuscript.
Line 363: Even if these extreme albedo values are partly attributable to uncertainties or errors in the satellite retrieval and processing, it is still noteworthy that the model is unable to reproduce them. This may indicate limitations in the model’s flexibility or in the chosen functional form, and could merit a brief discussion.
Section 5.1: It would be interesting to compare the regional and monthly optimisation strategies with a single global optimisation that explicitly encodes spatial (e.g. longitude/latitude) and temporal (e.g. day of year) information within an Arctic-wide model. In practice, such an approach might be more usable and could avoid issues related to sharp transitions between predefined regions and seasons, while still allowing the model to represent spatial and seasonal variability.
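The smooth spatial/temporal encoding suggested here could, for instance, use sine/cosine pairs so that day of year and longitude enter as continuous, periodic predictors with no sharp boundaries between regions or seasons. A minimal sketch (the helper function is hypothetical, not part of the paper):

```python
import numpy as np

def cyclic_features(doy, lon):
    """Encode day of year and longitude as sine/cosine pairs so that
    31 Dec / 1 Jan and 180E / 180W map to nearby feature values."""
    t = 2.0 * np.pi * np.asarray(doy, dtype=float) / 365.25
    l = np.deg2rad(np.asarray(lon, dtype=float))
    return np.column_stack([np.sin(t), np.cos(t), np.sin(l), np.cos(l)])

# The seasonal encoding is continuous across the year boundary ...
f_dec31 = cyclic_features([365.0], [0.0])
f_jan01 = cyclic_features([1.0], [0.0])
print(np.abs(f_dec31 - f_jan01).max())   # small, i.e. no artificial seasonal jump

# ... and the longitudinal encoding is continuous across the date line.
f_east = cyclic_features([100.0], [179.9])
f_west = cyclic_features([100.0], [-179.9])
print(np.abs(f_east - f_west).max())     # small, i.e. no jump at 180 degrees
```

Feeding such features into a single Arctic-wide fit would let the model represent spatial and seasonal variability without the sharp transitions that predefined regions and seasons introduce.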
Section 6 (Conclusions): The conclusions would benefit from a more detailed discussion of several important aspects related to the applicability and implications of the proposed parametrisation.
First, FESOM is a global model. It would therefore be important to discuss how this parametrisation might be applied to the Antarctic sea ice, and whether the authors expect the same functional form and predictors to remain meaningful in the Southern Hemisphere.
Second, it would be valuable to comment on whether the proposed parametrisation is expected to generalise beyond the present-day climate, for example to future climate change conditions or even to palaeoclimate regimes. Given the emphasis placed on the physical interpretability of the discovered equation, the authors are in a good position to discuss this point.
Third, FESOM is used in a variety of configurations (e.g. coupled versus ERA5-forced). Introducing this parametrisation removes an important tuning knob that currently exists in the system. Is there a plan to retain some degree of tunability (e.g. through selected coefficients), and can the authors provide recommendations for how this scheme should be integrated and tuned in different model configurations?
Addressing these points would strengthen the conclusions and clarify the practical implications for future model development and applications.