A Deep Learning Approach for Lake Ice Cover Forecasting
Abstract. Lakes cover a significant proportion of the high-latitude landscape and exert a strong influence on local weather and climate. Their seasonal lake ice cover (LIC) further impacts lake-atmosphere interactions, while also providing key socioeconomic services for northern communities. Climate change is impacting LIC and its thickness, two thematic products of Lakes as an Essential Climate Variable (ECV). Accurate prediction of LIC improves numerical weather prediction (e.g. lake-effect snowfall and thermal moderation) and is crucial for anticipating the impacts of climate change in lake-rich regions of the Northern Hemisphere.
This paper introduces LIF-DL (Lake Ice Forecasting using Deep Learning), a novel data-driven model for forecasting LIC extent across entire lake surfaces. LIF-DL uses Spatial-Temporal Transformer Networks (STTN) to capture relationships between lake conditions (ice and open water), lake depth and atmospheric forcings. The study focuses on five large Canadian lakes with pronounced ice phenology: Great Slave Lake, Great Bear Lake, Lake Winnipeg, Lake Athabasca, and Reindeer Lake. Data sources included ice cover observations from the Interactive Multi-Sensor Snow and Ice Monitoring System (IMS), atmospheric reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) 5th generation of European ReAnalysis (ERA5 and ERA5-Land), and Canadian Ice Service (CIS) records for external validation. To benchmark the proposed approach against a traditional physics-based model, the widely used Freshwater Lake (FLake) model embedded in ERA5 and ERA5-Land was employed. LIF-DL was trained to produce one-week forecasts using data from 2004–2017 and then deployed auto-regressively to predict ice cover during the 2018–2022 holdout period. Forecasts were evaluated against IMS and CIS observations and compared with those from FLake.
Across all evaluations—phenology timing, ice cover fraction, and spatial patterning—LIF-DL consistently outperformed FLake. Freeze-up and break-up events were predicted within 3–9 days of observations (versus 5–22 days for FLake), and ice cover fraction (range 0–1) root mean squared errors were reduced (0.06–0.16 versus 0.1–0.2). A key advantage of LIF-DL was its capacity to represent spatial dependencies across lake surfaces, producing coherent freeze-up and break-up dynamics and realistic spatial clustering of early and late ice timing compared to the fragmented patterns of FLake. These improvements reduced extreme timing biases—from as much as 30 days to only 4–6 days—particularly for large, deep lakes. Variable importance analysis indicated sensitivity to physically meaningful drivers, including air temperature, accumulated degree days, solar radiation, and lake depth, suggesting that LIF-DL learned relevant physical processes rather than statistical artifacts. Finally, the model maintained stable performance when iteratively forecasting over a four-year period, demonstrating robustness under varying atmospheric conditions.
The demonstrated accuracy, robustness, and physical interpretability of LIF-DL highlight the potential of deep learning for advancing lake ice modelling. Future research should focus on integrating physical constraints to develop hybrid physics-machine learning frameworks, improving model interpretability, and expanding to new predictive variables such as ice thickness and snow cover. Leveraging emerging high-resolution satellite datasets will further enhance spatial fidelity and enable application to smaller lakes. Ultimately, spatiotemporal deep learning represents a transformative step toward next-generation, spatially resolved lake ice forecasts that can improve weather and climate prediction, inform northern transportation planning, and support climate change adaptation in lake-rich regions of the Northern Hemisphere.
The manuscript addresses an important cryosphere modelling problem with a novel spatial machine learning approach. Generally, the manuscript is well written and well organised, and it is particularly strong in its evaluation, the interpretation of results, and the generation of insights. The scientific context is sufficiently laid out, and limitations are carefully addressed throughout the manuscript. Data and code are provided in considerable detail. Nonetheless, there are two major concerns: (1) the clarity with which the machine learning model is introduced, in particular the lack of formal notation for its inputs, outputs, their dimensionalities, and time indices; and (2) the framing of this work as a forecasting system, given that atmospheric forcings for the forecast window are taken from ERA5 reanalysis rather than from forecast data.
These two concerns should be addressed by a revised manuscript.
Below I present more minor line-wise comments by section:
Abstract:
Line 6: Lakes cover a significant proportion of the Northern high-latitude landscape, but not of the Southern high-latitude landscape. Insert "Northern" to make this distinction clear.
Line 8: Mentioning lake ice thickness this early on alongside LIC may lead readers to assume that lake ice thickness is also modelled in this paper. For the abstract I recommend focussing on the key variable modelled in this paper.
Line 9: The word "lakes" should not be capitalised here.
Line 9: Repeated "prediction": potentially change the first "prediction" to "forecasting".
Line 15: Changing "lake conditions" to "lake phase" or "lake state" could help to avoid confusion about what conditions are modelled.
Line 15: I recommend changing the order to naming inputs first and outputs second, as such: "[…] to capture relationships between atmospheric forcings, lake depth, and lake phase (frozen or open water)."
Line 16: I suggest sticking to one order of naming the five lakes throughout the paper (e.g. the order used in Figure 1).
Line 21: Referring to "one-week-ahead" forecasts would be clearer here. Maybe also specify that the model makes daily LIC predictions at 4 km spatial resolution, and that forecasts are 1 to 7 days ahead. Referring to the task as a segmentation task would also add more clarity early on.
The abstract exceeds typical word count limits and should be shortened.
Introduction:
Line 51: Bracket "(freeze-up/break-up)" is not necessary here and harms reading flow. This is already explained in the latter part of the same sentence.
Line 52: Replace "stimulating" with, e.g., "leading to".
Line 55: Potentially relate to similar trends observed in sea ice to provide wider scientific context.
Line 63: "Lakes" does not need to be capitalised.
Line 73: What composition variable is CLIMo predicting?
Line 73: I suggest replacing "more wholly" with something like "more comprehensively".
Line 74: Specify what aspect of the model is two-layer, and explain what type of model FLake is.
Line 85: I suggest referring to this as the "point-wise gridded application of one-dimensional lake models" to conform with the wider literature. "Multiple points" is not specific enough.
Line 90: No comma needed: "Data-driven deep learning approaches […]."
Line 91: Include additional citations such as perhaps
Line 92: The positioning of the subsentence ", such as Spatial-Temporal Transformer Networks […]", is not ideal, as it may rather suggest that STTNs are datasets.
Line 96: "Towards" reads oddly.
Line 99: I believe this should say "develop a deep learning model" rather than "develop a model using deep learning"?
Line 101: Be more specific about the adaptation of the pre-existing model: Was STTN extended?
Line 102: This is repetitive; the text has only just mentioned that the STTN was developed for video inpainting.
Data Sources:
Line 112: North Pole should be capitalised.
Line 115: Rephrase "applied persistence". Is temporal extrapolation used to fill data gaps? Alternatively, say "lake phase was assumed to remain unchanged (stationary in time) when no data were available".
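For illustration, "applied persistence" would typically mean a forward fill of the kind sketched below. Whether the authors filled gaps exactly this way, and how long a gap they allowed, are assumptions here; the helper `persist_fill` and the class codes are hypothetical.

```python
from typing import List, Optional

# Hypothetical codes: 0 = open water, 1 = ice; None marks a day without an observation.
def persist_fill(series: List[Optional[int]]) -> List[Optional[int]]:
    """Carry the last observed lake phase forward over missing days (persistence)."""
    filled: List[Optional[int]] = []
    last: Optional[int] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # remains None until the first observation is seen
    return filled

print(persist_fill([None, 0, None, None, 1, None]))  # [None, 0, 0, 0, 1, 1]
```

Note that gaps before the first observation cannot be filled this way, so the manuscript would also need to state how such cases are handled.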
Line 120: Add a sentence to explain how these two "ground truth" datasets differ and foreshadow which one is used in this study. Also mention the spatial resolution of CIS.
Line 135: Potentially say "two additional temporally aggregated variables".
Line 138: From what I understand, this is the sum of days with temperatures below/above 0 °C, not the sum of temperatures; Table 1 also misspecifies this. Is this calculated for each calendar year or for the 365 days following 1 August? Consider adding a sentence to convey the intent here (e.g. "freezing days since the last summer are accumulated…").
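For reference, degree days are conventionally accumulated daily departures from 0 °C, so AFDD and ATDD carry units of °C·day and are neither a plain day count nor a temperature. The sketch below contrasts that conventional definition with a simple count of freezing days. The 1 August reset, the use of daily mean 2 m air temperature, and the function name are assumptions for illustration, not the manuscript's definitions.

```python
from datetime import date
from typing import Dict, Tuple

def accumulate_degree_days(daily_t2m: Dict[date, float]) -> Dict[date, Tuple[float, float, int]]:
    """Per day: (AFDD in degC*day, ATDD in degC*day, count of freezing days), reset each 1 August."""
    out: Dict[date, Tuple[float, float, int]] = {}
    season = None
    afdd = atdd = 0.0
    n_freezing = 0
    for day in sorted(daily_t2m):
        day_season = day.year if day.month >= 8 else day.year - 1  # assumed season start: 1 August
        if day_season != season:
            season, afdd, atdd, n_freezing = day_season, 0.0, 0.0, 0
        t = daily_t2m[day]           # assumed daily mean 2 m air temperature in degC
        afdd += max(0.0, -t)         # degrees below 0 degC, summed over days
        atdd += max(0.0, t)          # degrees above 0 degC, summed over days
        n_freezing += int(t < 0.0)   # plain count of freezing days, for contrast
        out[day] = (afdd, atdd, n_freezing)
    return out

temps = {date(2020, 11, 1): -2.0, date(2020, 11, 2): -3.0, date(2020, 11, 3): 4.0}
print(accumulate_degree_days(temps)[date(2020, 11, 3)])  # (5.0, 4.0, 2)
```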
Line 164: Replace "also" with "additionally" to make clear that LIF-DL does not predict these.
Study Lakes:
Line 180: Figure 1: Order of the lake plots: Tile 3 is usually expected on the left (reading direction).
Line 181: Table 1: Left-align the text in Table 1 for readability (particularly the leftmost column). There is also an issue with the relative humidity row. Fix the AFDD and ATDD descriptions: some places suggest these are numbers of days, while others suggest they are temperatures.
Data preprocessing:
Line 195: Explain why nearest neighbour interpolation/regridding was chosen over e.g. bilinear interpolation.
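One plausible motivation, offered as an assumption rather than the authors' stated reason: the IMS product is categorical, and averaging class codes, as bilinear interpolation does, can produce values that are not valid classes or that coincide with the wrong class. A one-dimensional sketch with hypothetical class codes:

```python
from typing import List

# Hypothetical class codes for a categorical ice map.
WATER, ICE, LAND = 0, 1, 2

def linear_midpoints(row: List[int]) -> List[float]:
    """Values halfway between neighbouring cells, as linear interpolation would give."""
    return [(a + b) / 2 for a, b in zip(row, row[1:])]

def nearest_midpoints(row: List[int]) -> List[int]:
    """Nearest-neighbour value at the same locations (ties resolved to the left cell)."""
    return [a for a, _ in zip(row, row[1:])]

row = [WATER, LAND, ICE]
print(linear_midpoints(row))   # [1.0, 1.5]: spurious ICE between water and land, then a non-class
print(nearest_midpoints(row))  # [0, 2]: only valid class codes
```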
Line 197: Why do we need one-hot encoding when "masked" is not part of the prediction task? Inference can just be run for test regions.
Line 208: "To provide additional testing, CIS records and FLake model predictions were used over the testing period." How else were these used?
Line 209: "For some of the lakes in this study, the CIS record divided the lake body into two sections. These separate records were combined and averaged to obtain a single ice cover observation for the entire lake." This is not clear to me. Why did various records exist for the same areas?
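Relatedly, if the two CIS sections differ in area, a simple mean of their ice fractions is not the whole-lake ice fraction, whereas an area-weighted mean is. A minimal sketch, assuming each section record is an ice fraction; the section areas and fractions below are made up, and the function name is hypothetical:

```python
from typing import Sequence

def lake_ice_fraction(section_fractions: Sequence[float], section_areas_km2: Sequence[float]) -> float:
    """Area-weighted whole-lake ice fraction from per-section ice fractions."""
    total_area = sum(section_areas_km2)
    if total_area <= 0:
        raise ValueError("section areas must sum to a positive value")
    return sum(f * a for f, a in zip(section_fractions, section_areas_km2)) / total_area

fractions = [1.0, 0.2]         # illustrative values only
areas = [3000.0, 1000.0]       # illustrative values only
print(sum(fractions) / len(fractions))      # 0.6 (unweighted mean)
print(lake_ice_fraction(fractions, areas))  # 0.8 (area-weighted mean)
```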
LIF-DL:
Line 215: Point to Figure 5 for the complete model visualisation.
Line 217: Specify that it produces a daily one-week-ahead forecast (or a 1- to 7-day-ahead forecast).
Line 217: "Parametrization" means something different in the context of machine learning. I suggest saying "forecast horizon" or "supervised learning set-up" to avoid confusion.
Line 221: See general comment on this point. Why would we assume to have access to ERA data for the future? This is reanalysis data, not forecast data.
Line 225: This schematic lacks clarity: the "first time step" gate is not very logical, and inputs and outputs should be labelled with variable names and formally introduced temporal indices.
Line 231: At this point it is not clear at all that this is a sequence-to-sequence set-up. Formal notation (in addition to a slight hint in Figure 5) in the main text must be used to comprehensively introduce inputs, outputs, their dimensionalities and time indices. This is a main weakness of the manuscript.
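As an illustration of the kind of explicit set-up meant here, the sketch below builds one training sample and makes the time indices visible. All specifics are assumptions the manuscript would need to confirm or correct: an input window of `t_in` days of lake phase, forcings covering both the input window and the 7-day forecast window (the point behind the forecasting-framing concern), a static depth field, and a 7-day target. The function name is hypothetical.

```python
import numpy as np

def make_sample(lake_state: np.ndarray, forcing: np.ndarray, depth: np.ndarray,
                t: int, t_in: int = 7, horizon: int = 7):
    """Build one (inputs, target) pair for a forecast issued at day index t.

    lake_state: (T, H, W) daily lake phase, 1 = ice, 0 = open water
    forcing:    (T, C, H, W) daily atmospheric forcings (C variables)
    depth:      (H, W) static lake depth
    """
    if t < t_in or t + horizon > lake_state.shape[0]:
        raise ValueError("t must leave room for the input window and the forecast horizon")
    x_state = lake_state[t - t_in:t]           # observed phase, days t - t_in .. t - 1
    x_forcing = forcing[t - t_in:t + horizon]  # forcings for past AND forecast days
    y = lake_state[t:t + horizon]              # target phase, days t .. t + horizon - 1
    return x_state, x_forcing, depth, y

T, C, H, W = 30, 5, 4, 6
x_state, x_forcing, depth, y = make_sample(
    np.zeros((T, H, W)), np.zeros((T, C, H, W)), np.zeros((H, W)), t=10)
print(x_state.shape, x_forcing.shape, depth.shape, y.shape)
# (7, 4, 6) (14, 5, 4, 6) (4, 6) (7, 4, 6)
```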
Line 235: Figures 3, 4, and 5 are not well connected, either in the text or in the visualisations. "MODEL" in Figure 3 should be replaced with "LIF-DL", and the three inputs into the STTN blocks (Q, K, V) in Figure 5 should match the inputs shown in Figure 4 for coherence and clarity. Visually integrating the dual-branch encoder-STTN-decoder set-up into Figure 3 would be beneficial. Figure 4 is not needed if the reader can refer to the existing literature and it presents no novel aspects.
Line 245: The pre-defined Transformer architecture already utilises "multiple layers". What is meant here? Stacking multiple layers of Transformers or using the original Transformer architecture?
Line 250: You may want to refer to this as a dual-branch architecture.
Line 252: Relating to overall comment: It is not clear at this point in the text that future atmospheric forcing data is assumed to be available.
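The point about future forcings can be made concrete with a sketch of an auto-regressive roll-out: each 7-day forecast is appended to the history and fed back as input, while the forcing window for every step extends into days that would lie in the future at issue time. Operationally, those days would have to come from a weather forecast rather than from ERA5. The scalar state, the window lengths, and the names `rollout` and `toy_model` are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: (past states, forcings for past + forecast days) -> forecast states.
Model = Callable[[Sequence[float], Sequence[float]], List[float]]

def rollout(model: Model, init_state: Sequence[float], forcing: Sequence[float],
            n_steps: int, t_in: int = 7, horizon: int = 7) -> List[float]:
    """Chain n_steps forecasts of `horizon` days, feeding each forecast back as input."""
    if len(init_state) < t_in:
        raise ValueError("need at least t_in observed days to start")
    history = list(init_state)  # observed states for days 0 .. len(init_state) - 1
    for _ in range(n_steps):
        t = len(history)  # first day of this forecast window
        if t + horizon > len(forcing):
            raise ValueError("forcing does not cover the next forecast window")
        window = forcing[t - t_in:t + horizon]  # includes forcing for days t .. t + horizon - 1
        history.extend(model(history[-t_in:], window))  # predictions become future inputs
    return history[len(init_state):]

def toy_model(past: Sequence[float], window: Sequence[float]) -> List[float]:
    """Toy stand-in: predicts ice (1.0) on forecast days whose forcing is below zero."""
    return [1.0 if f < 0 else 0.0 for f in window[len(past):]]

forcing = [5.0] * 10 + [-5.0] * 20
out = rollout(toy_model, [0.0] * 7, forcing, n_steps=3)
print(out[:7])  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```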
Line 264: The figure caption is insufficient; variable names and indexing need to be used.
Model Optimisation:
Line 265: Maybe change to "hyperparameter tuning and parameter/model training" for parallelism and clarity.
Line 271: This is an unusual description, since the model itself is also built in PyTorch.
Line 272: Calling this a "custom loss" if one filtering operation is applied is a bit of an overstatement.
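As a sketch of why "custom" may overstate it: assuming the filtering removes non-lake pixels, a standard binary cross-entropy restricted to lake pixels is simply the usual loss plus a mask. Whether LIF-DL uses binary cross-entropy is itself an assumption here; the point carries over to any per-pixel loss. The function name is hypothetical.

```python
import math
from typing import Sequence

def masked_bce(pred: Sequence[float], target: Sequence[int], lake_mask: Sequence[bool],
               eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over lake pixels only (non-lake pixels are filtered out)."""
    terms = []
    for p, y, is_lake in zip(pred, target, lake_mask):
        if not is_lake:
            continue  # the single filtering step
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    if not terms:
        raise ValueError("no lake pixels in the mask")
    return sum(terms) / len(terms)

print(masked_bce([0.9, 0.2, 0.5], [1, 0, 1], [True, True, False]))
```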
Evaluation Methods:
Line 297: Table 2 may be moved to the Appendix.
Line 312: From line 293 I expect a comparison with both IMS and CIS.
Line 327: Mention if this is done per lake.
Line 352: It is not sufficiently clear what a "one-dimensional fraction of ice cover time series" is. This is why notation (e.g. tensor notation) is necessary.
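For example, with a binary ice map y_{t,p} at day t and pixel p, and Ω the set of lake pixels, the series could be written as f_t = (1/|Ω|) Σ_{p∈Ω} y_{t,p}, and the RMSE then compares predicted and observed f_t over days. A minimal sketch under that assumption (flattened maps, hypothetical names and toy values):

```python
import math
from typing import List, Sequence

def ice_fraction_series(maps: Sequence[Sequence[int]], lake_mask: Sequence[bool]) -> List[float]:
    """Collapse daily flattened ice maps (1 = ice) to f_t, the share of lake pixels that are ice."""
    n_lake = sum(lake_mask)
    if n_lake == 0:
        raise ValueError("lake mask is empty")
    return [sum(v for v, m in zip(day, lake_mask) if m) / n_lake for day in maps]

def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

mask = [True, True, True, True, False]  # last pixel is land and is ignored
predicted = ice_fraction_series([[0, 0, 0, 0, 1], [1, 1, 0, 0, 1], [1, 1, 1, 1, 0]], mask)
print(predicted)                           # [0.0, 0.5, 1.0]
print(rmse(predicted, [0.0, 0.25, 1.0]))
```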
Results and Discussion:
Line 374: Maybe change to "the thermodynamics of lakes drive […]".
Line 379: Figure 6: In the top right corner or figure caption repeat the definitions of the Freeze-up and Break-up seasons and make clear that these are "variable importance estimates for predictions during freeze-up and break-up seasons".
Line 385: Air temperature does not need to be capitalised.
Line 420: Table 3: Consider displaying an additional digit to make differences a bit more clear. The superior performance should be highlighted in some way (applies to all results tables).
Line 467: Figure 7: Add (proposed) and (ground truth) labels for clarity. Maybe add a visualisation of the errors in the Appendix.
Line 526: Figure 11: FLake and LIF-DL are visually hard to discern (the orange and red dotted/dashed lines are too similar). Maybe show only zoomed-in views of a selection of FUS/BUS segments.
I acknowledge the significant work contained in this manuscript and encourage the authors to address the two major and the additional minor weaknesses. The FUS and BUS perspectives in the evaluation, as well as the variable importance analysis, already provide significant scientific insight. For future work, I would also suggest considering "teacher forcing" as a training strategy for auto-regressive roll-outs, and a short discussion of the importance of lake ice cover to Indigenous communities. Another research avenue would be to train a unified model across the full region.
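To illustrate the distinction: with teacher forcing, each unrolled step is conditioned on the observed state, whereas in a free-running (auto-regressive) unroll the model's own predictions are fed back, so errors can accumulate, as in the multi-year roll-out. The one-step model and the names below are hypothetical toys.

```python
from typing import Callable, List, Sequence

Step = Callable[[float], float]  # hypothetical one-step model: state at day t -> state at day t + 1

def unroll(step: Step, truth: Sequence[float], teacher_forcing: bool) -> List[float]:
    """Predict truth[1:] from truth[0], feeding back either ground truth or the model's own output."""
    preds: List[float] = []
    state = truth[0]
    for t in range(1, len(truth)):
        pred = step(state)
        preds.append(pred)
        state = truth[t] if teacher_forcing else pred
    return preds

biased_step = lambda s: s + 0.1  # toy model with a constant bias of +0.1 per step
truth = [0.0, 0.0, 0.0]
print(unroll(biased_step, truth, teacher_forcing=True))   # error stays at 0.1 per step
print(unroll(biased_step, truth, teacher_forcing=False))  # error grows: 0.1, then 0.2
```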