Assimilating high-resolution satellite snow cover data in a permafrost model
Abstract. In high-latitude and mountain areas, the seasonal snow cover features considerable variability over spatial scales of tens of meters which strongly impacts the ground thermal regime. In permafrost areas, spatial patterns of snow accumulation and melt strongly influence the landscape-scale response to atmospheric warming, creating a need to develop simulation tools that can represent such small-scale variability. In this study, we introduce a data assimilation scheme designed to integrate satellite-derived fractional snow covered area (FSCA) into the permafrost model CryoGrid. The idea is to reconstruct the winter snowpack from the satellite-derived timing of the meltout, which at the same time constrains the insulating effect of the snow cover and thereby improves simulations of the ground thermal regime. For this purpose, we employ an iterative ensemble-based data assimilation approach combing a Particle Batch Smoother with Adaptive Multiple Importance Sampling, which proves efficient to manage the high computational demands of the CryoGrid model. The outcomes of the study are assessed using spatially distributed measurements of ground surface temperature (GST) and snow water equivalent collected during four years in an area of approximately 0.5 km2 on Svalbard. The data assimilation scheme is evaluated using idealized, synthetic FSCA observations compiled from the GST measurements, as well as Sentinel-2 derived FSCA at a spatial resolution of 10 m. For both data sets, the assimilation workflow markedly enhances the representation of the annual GST cycle in the model when simulating extremes in the snow distribution (wind-exposed ridges and snowdrifts) at the scale of individual pixels. However, due to frequent cloudiness and spatial mismatches between the point-scale GST and the 10 m resolution Sentinel-2 data, the enhancements are less consistent using Sentinel-2 derived FSCA, even resulting in poorer performance in certain years and locations. The data assimilation scheme is further adapted to simulate GST distributions over the entire study area using snowmelt statistics compiled from the 10 m Sentinel-2 FSCA pixels. The results show that the algorithm can not only reproduce the observed spatial variability of mean annual GST of up to 4.5 °C, but also the temporal pattern of the spatial variability, with GST varying significantly more in space during snow-covered winter and the snowmelt period compared to the snow-free months. The study showcases that a close integration of observations and models can significantly enhance our ability to characterize the state of the terrestrial cryosphere. In particular, we demonstrate that the data assimilation can uncover the "hidden" variable GST when using a process-rich land surface model like CryoGrid.
Summary of the work: This paper introduces and evaluates a data assimilation method to integrate high-resolution snow cover area into the permafrost model CryoGrid to constrain the melt-out timing of snowpack and improve simulations of ground surface temperature (GST). The method is evaluated using both idealized, binary snow cover data derived from in-situ GST measurements and real fractional snow cover area (fSCA) observations from Sentinel-2 satellites. To manage the high computational demands of the model, the authors used an efficient, iterative method called Adaptive Particle Batch Smoother (AdaPBS). This approach combines a Particle Batch Smoother, a method widely used in cryosphere science, with Adaptive Multiple Importance Sampling. This study demonstrates that assimilating snow cover data improves GST simulations compared to a standard model run, especially at in-situ points with extreme snow conditions like deep snowdrifts and wind-swept ridges. The results also show that the integration of satellite observations into the permafrost model is a promising step forward. While this work is a valuable contribution, several key areas require clarification and revision.
Major comments:
My main concern is related to the assumptions underlying the idealized experiments, the justification for parameter selection, the clarity of the spatialization method, and some presentation of the results.
1. Assumptions and limitations of the “idealized” experiment:
The authors use GST-derived binary snow cover as an “idealized” benchmark for the data assimilation scheme. While this is a reasonable approach, the manuscript would be improved by an explicit and thorough discussion of the assumptions and limitations in the synthetic data. For example:
a. Derivation method: the method assumes that when in-situ GST follows the air temperature cycle, the ground is snow-free. The limitations of this assumption should be discussed. The method assumes a single, final melt-out date for each location ("FSCA=1 before this date and FSCA=0 = after"). It doesn't account for late-season snowfall events that might temporarily re-establish a snowpack after an initial melt.
b. Patchy Snow: A temperature logger measures a single point. It's possible for the ground immediately around the sensor to become bare while small patches of snow remain nearby. This can lead to a discrepancy between the synthetic data and the physical reality it is meant to represent.
More importantly, there is a circularity in the experiment design: the synthetic snow cover data is generated from the same GST measurements later used for verification. The authors should clarify the rationale for applying DA in a scenario where the ground truth is already used to construct the benchmark.
2. Justification of perturbed parameters:
This study perturbs the snow redistribution factor. However, many other critical factors influence the melt-out timing of fSCA. For example, temperature, shortwave radiation, and precipitation. These variables can have large uncertainties in the meteorological forcing dataset (e.g., ERA5). The authors should provide a clear justification for why only the snow redistribution parameters were selected for perturbation in the ensemble, while other significant sources of uncertainty were not considered.
3. Definition of the measurement error of fSCA:
The manuscript should clarify how the measurement error for fSCA was determined. Additionally, a justification is needed for using the same measurement error value for both the “idealized” synthetic data and remote sensing data, as the sources and magnitudes of their respective errors are likely different.
4. Clarity of the spatial DA method:
The description of how the point-scale DA results are spatialized is unclear and could be helpful by providing a more detailed explanation:
a. What exactly is the “area-weighted histogram of melt-out dates”, and how is it constructed from the fSCA data?
b. At which specific locations (e.g., all in-situ sites, a subset, etc.) are the point-scale DA simulations performed?
c. How is the histogram used to translate the point-scale DA results into the spatially distributed GST and SWE fields? An explanation for why the final outputs are presented as mean monthly and mean annual fields would also be helpful.
5. Presentation of results:
The benefit of the DA is not consistently demonstrated. It can be improved by considering:
a. For the spatial results (Sect. 4.3), figures should include the prior (open-loop, without DA) results alongside the posterior results and observations. This direct comparison is essential for visually demonstrating the improvement gained from DA.
b. It is strongly recommended to add a new figure for the idealized synthetic experiment that follows the format of Fig. 4. This figure should show time series of the prior ensemble fSCA, the synthetic fSCA, and the posterior ensemble fSCA. This would clearly demonstrate the DA mechanism in a controlled setting and provide a helpful reference for interpreting the real-data results. This would also show the characteristics of the synthetic data.
Minor comments:
1. Line 20: Please clarify what "spatial variability of 4.5 °C" refers to (e.g., standard deviation, range, etc.).
2. Line 92. Please remove the article "a" before "100 randomly selected points".
3. Fig.1. Please highlight the specific locations of the points used for the illustrative results (e.g., those shown in Figure 4) to provide better context.
4. Lines 118-120: For improved readability, consider moving the description of the model structure to the end of line 114, where the model is first introduced.
5. Line 148: The variable τ_i is described as a characteristic timescale. For clarity, this description should be associated with “timescale” rather than “wind transport” at the end of the sentence. It would also be helpful to introduce the intuition behind an index before presenting its equation.
6. Lines 145-164: These paragraphs explain the snow-redistribution variables. It would be beneficial to also explain how the model simulates the snow cover area (binary or fractional?), as this is the key variable being assimilated. How does the redistribution of snow influence the snow cover area?
7. Line 154: Please specify the units for the "amount of mobilized snow".
8. Line 163: The reference to the DA scheme feels premature here in a section introducing the model. Please clarify that the details of the parameter perturbation are discussed in Section 3.3.1, or move the sentence to that section.
9. Line 165: What is the temporal resolution (e.g., hourly, daily) of the meteorological forcing data? Please also describe the method used to downscale the coarse ERA5 data to the model's fine spatial resolution.
10. Line 175: The wind downscaling method is unclear. For example, what are the minimum/maximum wind speeds? It would be helpful to show an illustrative figure
11. Line 189: How was it determined if the daily cycle of GST was "in line with air temperatures"? Was a specific metric or threshold (e.g., on the temperature difference) used?
12. Line 198: Please describe the methodology used to visually distinguish clouds from snow in the satellite imagery, as both can appear bright white. For instance, were snapshots from adjacent dates compared?
13. Lines 203-217: In this general description of the DA equations, please specify what the variables y, z, and θ represent in the context of this particular study to make the methodology more concrete for the reader.
14. Line 226: When stating that the model is run for two years, with the second being of interest, please briefly explain the purpose of the first year (e.g., for model spin-up).
15. Fig. 3: Please specify the time step of the plotted GST variable (e.g., daily, monthly mean) in the figure caption.
16. Fig. 4. What do the whiskers on the fSCA observations represent (e.g., measurement error, standard deviation)? Also, is the prior fSCA modeled as a binary variable, with the grey curve representing the ensemble mean?
17. Fig. 4. The fSCA observations during the mid-melt season appear to have a limited influence on the posterior estimate. This is evident as the posterior curves (in blue) often diverge from the observations during this time. A discussion of this discrepancy would be beneficial.
18. Fig. 6. It is difficult to read the specific values (e.g., median, mode) of the peak SWE distributions discussed in the text from the histogram's x-axis. Please consider adding labels for these key values directly onto the plot. Also, please specify that "w.e." is the abbreviation for "water equivalent" in the caption.