Data-driven equation discovery of a sea ice albedo parametrisation
Abstract. In the Finite-Element Sea Ice Model (FESIM), a part of the Finite-Element Sea ice Ocean Model (FESOM), sea ice albedo is treated as a tuning parameter defined by four constant values depending on snow cover and surface temperature. This parametrisation is too simple to capture the spatiotemporal variability of observed sea ice albedo. Here, we aim for an improved parametrisation by discovering an interpretable, physically consistent equation for sea ice albedo using symbolic regression, an interpretable machine learning technique, combined with physical constraints. Leveraging daily pan-Arctic satellite and reanalyses data from 2013 to 2020, we apply sequential feature selection which identifies snow depth, surface temperature, sea ice thickness and 2 m air temperature as the most informative features for sea ice albedo. As a function of these features, our data-driven equation identifies two critical mechanisms for determining sea ice albedo: the high sensitivity of sea ice albedo to small changes in thin snow and a weighted difference of the sea ice surface and 2 m air temperature, serving as a seasonal proxy that indicates the transition between melting and freezing conditions. To understand how additional model complexity reduces errors, we evaluate our discovered equation against baseline models with different complexities, such as multilayer perceptron neural networks (NNs) and polynomials on an error-complexity plane, showing that the equation excels in balancing error and complexity and reduces the mean squared error by about 51 % compared to the current FESIM parametrisation. Unlike NNs, our discovered equation allows for further regional and seasonal analyses due to its inherent interpretability. By fine-tuning its coefficients we uncover differences in physical conditions that drive sea ice albedo. This study demonstrates that learning an equation from observational data can deepen the process-level understanding of the Arctic Ocean's surface radiative budget and improve climate projections.
General assessment
This paper presents a novel and well-executed application of symbolic regression for data-driven discovery of a sea ice albedo parametrisation based on satellite and reanalysis data. The authors demonstrate how an interpretable machine-learning approach can be used to derive a physically constrained, low-complexity functional form that substantially improves upon the very simple albedo parametrisation currently used in FESIM. The methodological framework is sound, the ML methods are appropriate and carefully applied, and the Pareto-optimal perspective on error versus complexity is particularly convincing.
The paper is well written, the figures are clear and convincing, and the presentation is very well structured. The appendices are comprehensive and helpful, and the authors do a good job at explaining and interpreting the discovered equation in physical terms. Overall, this is a very nice example of how observational and reanalysis datasets can be used in an innovative and effective way to inform parameterisations of physical processes in climate models.
My comments above are mostly in the spirit of model development and robustness, and aim at strengthening the physical interpretation, generality, and long-term applicability of the proposed approach.
Main limitations and suggestions for future work
In its current form, the study remains an offline, data-driven evaluation: a proper assessment of the impact of the new parametrisation within FESOM itself is still missing. While I do not think this is a blocker for the present paper, it is an important next step if the results are to become truly impactful for climate modelling.
In addition, the benchmarking is currently limited to statistical and ML-based baselines. The study would be significantly strengthened by a comparison against a more physically based or numerically robust albedo scheme, for exampleradiation schemes used in models such as Icepack, or a parameterisation like that of Holland et al. (2012, 10.1175/JCLI-D-11-00078.1). Even if such schemes are not implemented in FESOM, they would provide a much more meaningful physical reference than polynomials or neural networks.
More generally, some of my comments raise concerns about generalisability, in particular:
Finally, given that the study does not yet demonstrate the impact of the scheme inside FESOM, it might be worth toning down the emphasis on FESOM/FESIM in the framing. At present, the work is best seen as a general, observation-driven albedo parameterisation study rather than a demonstrated FESOM model improvement.
Overall recommendation
Overall, I find this to be a strong, original, and well-executed study that convincingly demonstrates the potential of interpretable machine learning for parameterisation development. The paper is already of high quality, and addressing (or at least discussing) the points raised above would further strengthen its physical robustness, scope, and long-term relevance for climate modelling.
Detailed comments
Line 25: The statement that thinner and younger sea ice necessarily has a lower albedo than thicker, older ice seems too general. While thicker ice can indeed accumulate more snow, which increases surface albedo, the albedo of bare ice does not necessarily decrease with decreasing ice age or thickness. In particular, younger ice typically has higher salinity, which may increase scattering and thus lead to higher bare-ice albedo. The authors may wish to clarify the conditions under which this statement holds (e.g. snow-covered vs. bare ice) or provide a supporting reference.
Line 115: It is unclear whether the reference here is still to PIOMAS. The sentence would benefit from greater clarity and, if appropriate, an explicit citation. In addition, the discussion would be strengthened by considering relevant recent work such as Cocetta et al. (2024, The Cryosphere), which may be pertinent to the points being made.
Line 132: Over the course of one week, sea ice can drift substantially. It is therefore unclear whether sea ice motion is accounted for in the weekly accumulation of snowfall and rainfall. Could the authors clarify whether these accumulations are computed in a Lagrangian (ice-following) or Eulerian framework? If not accounted for, this mismatch may introduce inconsistencies between surface forcing and ice properties. Relatedly, could sea ice velocity or the atmospheric wind beyond the day before provide additional predictive skill for albedo, particularly in regions with strong ice drift?
Line 188 (Point 3): With respect to point 3, please see my earlier comment regarding the role of salinity. The assumption that thicker ice necessarily has a higher albedo under freezing conditions may not always hold for bare ice. An explicit dependence on ice age, which is closely linked to salinity and microstructure, might therefore be more appropriate or at least worth discussing.
Line 229: The statement that snow is “the most reflective medium in nature” appears too strong and would benefit from clarification and a supporting reference. While fresh, clean snow is among the most reflective natural surfaces, particularly in the visible spectrum, its albedo depends strongly on grain size, age, wetness, and impurities, and decreases substantially in the near-infrared. Moreover, other natural media (e.g. certain cloud types) can exhibit higher broadband reflectance. A more precise phrasing such as “snow is among the most reflective natural surfaces, especially when fresh and dry” would be more accurate.
Line 239: An alternative explanation could be that sea ice thickness is strongly correlated with snow depth, such that ice thickness partly serves as a proxy for snow information. Given that all features are standardised during training, this correlation may lead the model to attribute predictive skill to ice thickness that in fact originates from snow-related information. Clarifying the degree of correlation between these variables and its impact on feature selection would strengthen the interpretation.
Line 245: While the interpretation is plausible in principle, in practice it should be treated with caution. The 2 m air temperature (T2m) is taken from a reanalysis product that does not assimilate sea ice or snow thickness, nor near-surface Arctic observations, apart from surface pressure from stations and drifting buoys. In contrast, T0m is derived from satellite observations. The fact that the machine-learning model does not fully exploit the strong physical relationship between T0m and T2m may therefore reflect inconsistencies between these two datasets rather than genuine additional predictive skill of T2m for sea ice albedo. This distinction would be worth discussing more explicitly.
Subsection 3.2.2: The interpretation of the weighted difference between T0m and T2m as a physically meaningful seasonal proxy deserves further scrutiny. The seasonal variation of this difference largely reflects a well-documented bias in the ERA5 reanalysis, which has been shown to be larger in winter than in summer (e.g. Batrak and Müller, 2019; Zampieri et al., 2023). As such, this signal may primarily encode characteristics of the reanalysis system rather than an intrinsic physical control on sea ice albedo.
This raises the question of whether it is desirable to base a model parametrisation on a feature that is strongly influenced by a known, non-stationary reanalysis bias. In particular, the robustness of the proposed parametrisation is unclear ifdifferent forcing datasets are used to run FESOM (e.g. a future ERA6 reanalysis or a coupled model configuration), wherethe magnitude or structure of near-surface temperature biases may differ substantially.
Moreover, ERA5 temperature biases are not static in time and are themselves affected by climate change. It is therefore uncertain whether the derived parametrisation would generalise well under future climate conditions.
If the intention is to encode seasonal information, a predictor more directly linked to the physical drivers of seasonality, such as variables related to insulation, radiative forcing, or surface energy balance, might be more appropriate than relying on the difference between two temperature fields with differing observational constraints.
A more explicit discussion of these limitations and their implications for generalisability would strengthen the manuscript.
Line 363: Even if these extreme albedo values are partly attributable to uncertainties or errors in the satellite retrieval and processing, it is still noteworthy that the model is unable to reproduce them. This may indicate limitations in the model’s flexibility or in the chosen functional form, and could merit a brief discussion.
Section 5.1: It would be interesting to compare the regional and monthly optimisation strategies with a single global optimisation that explicitly encodes spatial (e.g. longitude/latitude) and temporal (e.g. day of year) information within an Arctic-wide model. In practice, such an approach might be more usable and could avoid issues related to sharp transitions between predefined regions and seasons, while still allowing the model to represent spatial and seasonal variability.
Section 6 (Conclusions): The conclusions would benefit from a more detailed discussion of several important aspects related to the applicability and implications of the proposed parametrisation.
First, FESOM is a global model. It would therefore be important to discuss how this parametrisation might be applied to the Antarctic sea ice, and whether the authors expect the same functional form and predictors to remain meaningful in the Southern Hemisphere.
Second, it would be valuable to comment on whether the proposed parametrisation is expected to generalise beyond the present-day climate, for example to future climate change conditions or even to palaeoclimate regimes. Given the emphasis placed on the physical interpretability of the discovered equation, the authors are in a good position to discuss this point.
Third, FESOM is used in a variety of configurations (e.g. coupled versus ERA5-forced). Introducing this parametrisation removes an important tuning knob that currently exists in the system. Is there a plan to retain some degree of tunability (e.g. through selected coefficients), and can the authors provide recommendations for how this scheme should be integrated and tuned in different model configurations?
Addressing these points would strengthen the conclusions and clarify the practical implications for future model development and applications.