Assimilating high-resolution satellite snow cover data in a permafrost model

Willmes, Clarissa; Aalstad, Kristoffer; Westermann, Sebastian

doi:10.5194/egusphere-2025-3142

Preprints

https://doi.org/10.5194/egusphere-2025-3142

Preprints

17 Sep 2025

| 17 Sep 2025

Assimilating high-resolution satellite snow cover data in a permafrost model

Clarissa Willmes, Kristoffer Aalstad, and Sebastian Westermann

Abstract. In high-latitude and mountain areas, the seasonal snow cover features considerable variability over spatial scales of tens of meters which strongly impacts the ground thermal regime. In permafrost areas, spatial patterns of snow accumulation and melt strongly influence the landscape-scale response to atmospheric warming, creating a need to develop simulation tools that can represent such small-scale variability. In this study, we introduce a data assimilation scheme designed to integrate satellite-derived fractional snow covered area (FSCA) into the permafrost model CryoGrid. The idea is to reconstruct the winter snowpack from the satellite-derived timing of the meltout, which at the same time constrains the insulating effect of the snow cover and thereby improves simulations of the ground thermal regime. For this purpose, we employ an iterative ensemble-based data assimilation approach combing a Particle Batch Smoother with Adaptive Multiple Importance Sampling, which proves efficient to manage the high computational demands of the CryoGrid model. The outcomes of the study are assessed using spatially distributed measurements of ground surface temperature (GST) and snow water equivalent collected during four years in an area of approximately 0.5 km² on Svalbard. The data assimilation scheme is evaluated using idealized, synthetic FSCA observations compiled from the GST measurements, as well as Sentinel-2 derived FSCA at a spatial resolution of 10 m. For both data sets, the assimilation workflow markedly enhances the representation of the annual GST cycle in the model when simulating extremes in the snow distribution (wind-exposed ridges and snowdrifts) at the scale of individual pixels. However, due to frequent cloudiness and spatial mismatches between the point-scale GST and the 10 m resolution Sentinel-2 data, the enhancements are less consistent using Sentinel-2 derived FSCA, even resulting in poorer performance in certain years and locations. The data assimilation scheme is further adapted to simulate GST distributions over the entire study area using snowmelt statistics compiled from the 10 m Sentinel-2 FSCA pixels. The results show that the algorithm can not only reproduce the observed spatial variability of mean annual GST of up to 4.5 °C, but also the temporal pattern of the spatial variability, with GST varying significantly more in space during snow-covered winter and the snowmelt period compared to the snow-free months. The study showcases that a close integration of observations and models can significantly enhance our ability to characterize the state of the terrestrial cryosphere. In particular, we demonstrate that the data assimilation can uncover the "hidden" variable GST when using a process-rich land surface model like CryoGrid.

Received: 01 Jul 2025 – Discussion started: 17 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 3326 KB)

Supplement (4538 KB)

Download & links

Clarissa Willmes, Kristoffer Aalstad, and Sebastian Westermann

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-3142', Anonymous Referee #1, 16 Oct 2025

Summary of the work: This paper introduces and evaluates a data assimilation method to integrate high-resolution snow cover area into the permafrost model CryoGrid to constrain the melt-out timing of snowpack and improve simulations of ground surface temperature (GST). The method is evaluated using both idealized, binary snow cover data derived from in-situ GST measurements and real fractional snow cover area (fSCA) observations from Sentinel-2 satellites. To manage the high computational demands of the model, the authors used an efficient, iterative method called Adaptive Particle Batch Smoother (AdaPBS). This approach combines a Particle Batch Smoother, a method widely used in cryosphere science, with Adaptive Multiple Importance Sampling. This study demonstrates that assimilating snow cover data improves GST simulations compared to a standard model run, especially at in-situ points with extreme snow conditions like deep snowdrifts and wind-swept ridges. The results also show that the integration of satellite observations into the permafrost model is a promising step forward. While this work is a valuable contribution, several key areas require clarification and revision.
Major comments:

My main concern is related to the assumptions underlying the idealized experiments, the justification for parameter selection, the clarity of the spatialization method, and some presentation of the results.

1. Assumptions and limitations of the “idealized” experiment:

The authors use GST-derived binary snow cover as an “idealized” benchmark for the data assimilation scheme. While this is a reasonable approach, the manuscript would be improved by an explicit and thorough discussion of the assumptions and limitations in the synthetic data. For example:

a. Derivation method: the method assumes that when in-situ GST follows the air temperature cycle, the ground is snow-free. The limitations of this assumption should be discussed. The method assumes a single, final melt-out date for each location ("FSCA=1 before this date and FSCA=0 = after"). It doesn't account for late-season snowfall events that might temporarily re-establish a snowpack after an initial melt.

b. Patchy Snow: A temperature logger measures a single point. It's possible for the ground immediately around the sensor to become bare while small patches of snow remain nearby. This can lead to a discrepancy between the synthetic data and the physical reality it is meant to represent.

More importantly, there is a circularity in the experiment design: the synthetic snow cover data is generated from the same GST measurements later used for verification. The authors should clarify the rationale for applying DA in a scenario where the ground truth is already used to construct the benchmark.
2. Justification of perturbed parameters:

This study perturbs the snow redistribution factor. However, many other critical factors influence the melt-out timing of fSCA. For example, temperature, shortwave radiation, and precipitation. These variables can have large uncertainties in the meteorological forcing dataset (e.g., ERA5). The authors should provide a clear justification for why only the snow redistribution parameters were selected for perturbation in the ensemble, while other significant sources of uncertainty were not considered.
3. Definition of the measurement error of fSCA:

The manuscript should clarify how the measurement error for fSCA was determined. Additionally, a justification is needed for using the same measurement error value for both the “idealized” synthetic data and remote sensing data, as the sources and magnitudes of their respective errors are likely different.
4. Clarity of the spatial DA method:

The description of how the point-scale DA results are spatialized is unclear and could be helpful by providing a more detailed explanation:

a. What exactly is the “area-weighted histogram of melt-out dates”, and how is it constructed from the fSCA data?

b. At which specific locations (e.g., all in-situ sites, a subset, etc.) are the point-scale DA simulations performed?

c. How is the histogram used to translate the point-scale DA results into the spatially distributed GST and SWE fields? An explanation for why the final outputs are presented as mean monthly and mean annual fields would also be helpful.
5. Presentation of results:

The benefit of the DA is not consistently demonstrated. It can be improved by considering:

a. For the spatial results (Sect. 4.3), figures should include the prior (open-loop, without DA) results alongside the posterior results and observations. This direct comparison is essential for visually demonstrating the improvement gained from DA.

b. It is strongly recommended to add a new figure for the idealized synthetic experiment that follows the format of Fig. 4. This figure should show time series of the prior ensemble fSCA, the synthetic fSCA, and the posterior ensemble fSCA. This would clearly demonstrate the DA mechanism in a controlled setting and provide a helpful reference for interpreting the real-data results. This would also show the characteristics of the synthetic data.
Minor comments:

1. Line 20: Please clarify what "spatial variability of 4.5 °C" refers to (e.g., standard deviation, range, etc.).

2. Line 92. Please remove the article "a" before "100 randomly selected points".

3. Fig.1. Please highlight the specific locations of the points used for the illustrative results (e.g., those shown in Figure 4) to provide better context.

4. Lines 118-120: For improved readability, consider moving the description of the model structure to the end of line 114, where the model is first introduced.

5. Line 148: The variable τ_i is described as a characteristic timescale. For clarity, this description should be associated with “timescale” rather than “wind transport” at the end of the sentence. It would also be helpful to introduce the intuition behind an index before presenting its equation.

6. Lines 145-164: These paragraphs explain the snow-redistribution variables. It would be beneficial to also explain how the model simulates the snow cover area (binary or fractional?), as this is the key variable being assimilated. How does the redistribution of snow influence the snow cover area?

7. Line 154: Please specify the units for the "amount of mobilized snow".

8. Line 163: The reference to the DA scheme feels premature here in a section introducing the model. Please clarify that the details of the parameter perturbation are discussed in Section 3.3.1, or move the sentence to that section.

9. Line 165: What is the temporal resolution (e.g., hourly, daily) of the meteorological forcing data? Please also describe the method used to downscale the coarse ERA5 data to the model's fine spatial resolution.

10. Line 175: The wind downscaling method is unclear. For example, what are the minimum/maximum wind speeds? It would be helpful to show an illustrative figure

11. Line 189: How was it determined if the daily cycle of GST was "in line with air temperatures"? Was a specific metric or threshold (e.g., on the temperature difference) used?

12. Line 198: Please describe the methodology used to visually distinguish clouds from snow in the satellite imagery, as both can appear bright white. For instance, were snapshots from adjacent dates compared?

13. Lines 203-217: In this general description of the DA equations, please specify what the variables y, z, and θ represent in the context of this particular study to make the methodology more concrete for the reader.

14. Line 226: When stating that the model is run for two years, with the second being of interest, please briefly explain the purpose of the first year (e.g., for model spin-up).

15. Fig. 3: Please specify the time step of the plotted GST variable (e.g., daily, monthly mean) in the figure caption.

16. Fig. 4. What do the whiskers on the fSCA observations represent (e.g., measurement error, standard deviation)? Also, is the prior fSCA modeled as a binary variable, with the grey curve representing the ensemble mean?

17. Fig. 4. The fSCA observations during the mid-melt season appear to have a limited influence on the posterior estimate. This is evident as the posterior curves (in blue) often diverge from the observations during this time. A discussion of this discrepancy would be beneficial.

18. Fig. 6. It is difficult to read the specific values (e.g., median, mode) of the peak SWE distributions discussed in the text from the histogram's x-axis. Please consider adding labels for these key values directly onto the plot. Also, please specify that "w.e." is the abbreviation for "water equivalent" in the caption.

Citation: https://doi.org/10.5194/egusphere-2025-3142-RC1
- AC1: 'Reply on RC1', Clarissa Willmes, 21 May 2026
  
  We thank the reviewer for the constructive criticism and insightful comments, which have substantially improved the quality of the manuscript. Below, we provide point-by-point responses to all comments and questions. Reviewer comments are presented in bold font, our responses in regular font, and modifications to the manuscript are presented in italics and indicated by “ ”.
  Summary of the work: This paper introduces and evaluates a data assimilation method to integrate high-resolution snow cover area into the permafrost model CryoGrid to constrain the melt-out timing of snowpack and improve simulations of ground surface temperature (GST). The method is evaluated using both idealized, binary snow cover data derived from in-situ GST measurements and real fractional snow cover area (fSCA) observations from Sentinel-2 satellites. To manage the high computational demands of the model, the authors used an efficient, iterative method called Adaptive Particle Batch Smoother (AdaPBS). This approach combines a Particle Batch Smoother, a method widely used in cryosphere science, with Adaptive Multiple Importance Sampling. This study demonstrates that assimilating snow cover data improves GST simulations compared to a standard model run, especially at in-situ points with extreme snow conditions like deep snowdrifts and wind-swept ridges. The results also show that the integration of satellite observations into the permafrost model is a promising step forward. While this work is a valuable contribution, several key areas require clarification and revision.
  Major comments:
  My main concern is related to the assumptions underlying the idealized experiments, the justification for parameter selection, the clarity of the spatialization method, and some presentation of the results.
  1. Assumptions and limitations of the “idealized” experiment: The authors use GST-derived binary snow cover as an “idealized” benchmark for the data assimilation scheme. While this is a reasonable approach, the manuscript would be improved by an explicit and thorough discussion of the assumptions and limitations in the synthetic data. For example:
  a. Derivation method: the method assumes that when in-situ GST follows the air temperature cycle, the ground is snow-free. The limitations of this assumption should be discussed. The method assumes a single, final melt-out date for each location ("FSCA=1 before this date and FSCA=0 = after"). It doesn't account for late-season snowfall events that might temporarily re-establish a snowpack after an initial melt.
  We thank the reviewer for the feedback. We have now included a more thorough discussion of the limitation this comment in Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “Deriving FSCA from in-situ GST measurements (see Sect. 3.2) offers binary snow / no snow point-scale observations, which are well-suited for testing the DA scheme by benchmarking the performance without additional complexities introduced by employing remotely sensed FSCA observations retrieved from space-borne satellite sensors (Sect. 3.5). The timing of the
  
  melt-out of the main seasonal snowpack was determined as in Lewkowicz (2008) through semi-quantitative inspection of the observed GST and model forcing air temperature. We consider snow to be completely melted at a measurement site when GSTs begin to show a clear daily cycle that follows air temperature variations, particularly with positive GST values. This occurs
  
  after the zero curtain period in GST, which reflects the isothermal snowpack conditions during the late phase of snowmelt. We assume FSCA = 1 before this date and FSCA = 0 after. The derived FSCA captures the meltout of the main seasonal snowpack but does not resolve intermittent late-season snowfall. Such late snowfall events typically involve a thin, short-lived
  
  snow cover and therefore have only a minor influence on the ground thermal regime. The resulting binary FSCA represents ideal FSCA observations of the main seasonal snowpack, providing subdaily temporal resolution at the spatial scale of a single point”
  And in Section 5.1 Challenges and Limitations (revised manuscript):
  “The availability of satellite-derived FSCA observations is limited by cloudiness which causes the temporal resolution and thus the performance of the DA scheme to vary between individual years. While Svalbard is a particularly cloudy region, this limitation can be expected to persist in most parts of the Arctic. This affects the temporal resolution of the melt-out but also implies that unluckily timed FSCA observations could cause the DA scheme to misinterpret intermittent late-season snowfall as the continuous main snow cover. Representing intermittent snowfall based on individual snowfall events is difficult, as it highly depends on these weather events being correctly represented in the meteorological forcing, which is is a common challenge in land surface modeling.”
  b. Patchy Snow: A temperature logger measures a single point. It's possible for the ground immediately around the sensor to become bare while small patches of snow remain nearby. This can lead to a discrepancy between the synthetic data and the physical reality it is meant to represent.
  We thank the reviewer for pointing this out. We have revised the experiment names in order to better differentiate between point-scale and pixel-scale to “Point-Scale DA with in-situ derived FSCA” and “Pixel-Scale DA with Sentinel-2 derived FSCA”. Pixel-scale heterogeneity of snow is discussed in the results, Section 4.2 Pixel-Scale DA with Sentinel-2 derived FSCA (revised manuscript):
  “(…) spatial variability within Sentinel-2 pixels in the AOI which causes a resolution mismatch, so-called representation error, when comparing to point-scale GST observations. In some cases, this uncertainty is captured by the posterior ensemble spread, as e.g. in 2017, whilst in other cases the DA fails to capture this uncertainty and is overconfident. Whether the DA reliably offers an improvement in comparison to the reference run thus depends on the representativeness of the 10x10 m pixel for the point scale, i.e. the spatial variability of the snow cover within each pixel.”
  and in the discussion, Section 5.1 Challenges and Limitations (revised manuscript):
  “(…) we found that subpixel variability of snowmelt poses challenges even for high-resolution Sentinel-2 pixels (10x10 m), which could also be a problem when applying the method to important permafrost landforms with variable snow cover, such as tundra polygons (Nitzbon et al., 2019) or peat plateaus (Martin et al., 2019). These challenges posed by the observations are a major limiting factors for satellite-based DA at pixel resolution in our study, as described in Sect. 4.2. The method can be expected to be feasible in different regions, though, if the topographic complexity occurs on scales larger than the observational footprint, and more frequent and correctly registered observations are available.”
  More importantly, there is a circularity in the experiment design: the synthetic snow cover data is generated from the same GST measurements later used for verification. The authors should clarify the rationale for applying DA in a scenario where the ground truth is already used to construct the benchmark.
  We are grateful to the reviewer for raising this point. We understand the concern and we would like to clarify that the rationale of the synthetic experiment is to show the internal consistency of the model system for ideal observations, i.e. whether the scheme is principally capable of representing melt-out dates and (winter) ground surface temperatures given the meteorological forcing and the constraints of the physical model. We don’t change snowfall in the meteorological forcing, but the snow cover and it’s thermal properties are physically constrained by snow redistribution and snow evolution. The ground surface temperature thereby is a “hidden” parameter. The thermal insulation of the snow cover could still be strongly biased during snow-covered times, even if melt-out was constructed correctly. The synthetic experiment therefore serves as a proof of concept, whether the scheme generally is capable of improving GST representation.
  We further clarified this in the experiment description in Section 3.5 DA Experiments (revised manuscript):
  “(…) This experiment serves as a proof of concept to evaluate how effectively the DA scheme can reproduce point-scale GST observations by assimilating ideal FSCA. In particular, it demonstrates whether the model system is in principle capable of inferring the unobserved parameter GST from observations of the melt-out date. This is particularly important to test as GSTs are dominated by winter GSTs and thereby strongly influenced by snow thermal properties, not only the melt-out timing and SWE.”
  as well as in Section 4.1 Point-Scale DA with in-situ derived FSCA (revised manuscript):
  “This synthetic experiment demonstrates that the model system is in principle capable of inferring the unobserved parameter GST from observations of the melt-out date.”
  2. Justification of perturbed parameters: This study perturbs the snow redistribution factor. However, many other critical factors influence the melt-out timing of fSCA. For example, temperature, shortwave radiation, and precipitation. These variables can have large uncertainties in the meteorological forcing dataset (e.g., ERA5). The authors should provide a clear justification for why only the snow redistribution parameters were selected for perturbation in the ensemble, while other significant sources of uncertainty were not considered.
  We thank the reviewer for this comment. We agree that this is an important issue. However, the study presented is a site-scale study, intending to capture variability within the area of interest (AOI). We agree with the reviewer that other factors such as temperature, shortwave radiation and precipitation should be perturbed for larger scale application, but for site-scale variability we focus on parameters causing small-scale variability, as described in section 3.4.1 Ensemble generation in CryoGrid (revised manuscript):
  “As the goal of the study is to capture small-scale spatial variability of the snowpack, we perturb parameters which are spatially variable within the AOI. Explicitly, we perturb the exposure e and the scaling factor for surface wind speeds f , which both modulate the lateral snow redistribution and thus the overall snow water equivalent in our one-dimensional (single column) model setup (Sect. 3.3).”
  Uncertainties in general model forcing are handled through model forcing correction. We have extended the description of the downscaling and biascorrection of the ERA5 reanalysis in Section 3.3.1 Model forcing data (revised manuscript):
  “For meteorological forcing, we use downscaled ERA5 reanalysis (Hersbach et al., 2020) surface fields bias-corrected with in-situ observations from the nearby Bayelva climate station as described in Westermann et al. (2023). The forcing data have a temporal resolution of 3 h and include air temperature, incoming shortwave radiation, incoming longwave radiation, specific humidity, air pressure, wind speed, rainfall, and snowfall. As described in Westermann et al. (2023), downscaling is performed by calculating a linear regression between the reanalysis data and available in-situ measurements within a 20-day window centered on each calendar day. This regression is then used to bias-correct the reanalysis. The procedure is applied to air temperature and incoming longwave radiation, while specific humidity is adjusted consistently with the temperature correction. Incoming shortwave radiation is used without downscaling, as its average value generally shows good agreement with observations, and precipitation is not corrected due to the absence of sufficiently reliable measurements (Westermann et al., 2023). When studying wind redistribution of snow, the near-surface wind speed is an especially important factor as it controls the occurrence and intensity of drifting snow through the driftability index Si (Sect. 3.3). The near-surface wind speed is often misrepresented in atmospheric reanalysis such as ERA5, especially during high wind periods (e.g., Wilczak et al., 2024; Gualtieri, 2021) which strongly contribute to snow redistribution and metamorphism. Furthermore, the wind speed is influenced by the local topography, with elevated ridges experiencing increased wind speeds compared to sheltered locations, thus leading to a small-scale variability of wind speeds that can affect snow drift intensity and patterns (Liston and Elder, 2006). Therefore, we scale the surface wind speed using quantile mapping (e.g., Fiddes et al., 2022). Specifically, we apply a linear scaling across the quantiles of the forcing surface wind speed: low wind speeds are left unchanged (scaling factor 1), the highest wind speeds are scaled by a wind scaling factor f , and intermediate quantiles are linearly scaled between 1 and f . In this study, f is an uncertain parameter, constrained to the range f ∈ [0.8, 1.8] and estimated by the DA scheme, which enables a better representation of the impact of surface wind on the snowpack. We assume all other meteorological forcing variables to be spatially constant over the small areal extent of the AOI (see also Sect. 3.4.1), but uncertainties in snowfall are to some extend corrected by perturbing the exposure parameter e (see Sect. 3.4.1) within the DA scheme.”
  And we return to the uncertainties in the large-scale model forcing in Section 5.5 Outlook and future work (revised manuscript):
  “Within our largely flat, high-latitude study area, it was not necessary to account for topography- or land cover-induced variability of the surface energy balance and thus snowmelt rates. However, this can be a critical process governing the melt patterns in other regions in addition to snow redistribution targeted in the DA procedure. For individual pixels, it is straight-forward to adjust e.g. the incoming radiation for the exposition (e.g. Fiddes and Gruber, 2014). For spatially distributed applications, it could be possible to include topographic factors as additional variables in an FSCA clustering procedure, as described above. In this case, pixels with similar meltout dates would be represented by different clusters if they feature different expositions. Depending on the spatial scales and the dominant processes governing snow heterogeneity in other applications (Sect.5.3), additional parameters, such as shortwave radiation, precipitation, and air temperature, could be perturbed in the DA scheme, either instead of or in combination with parameters describing wind redistribution. It is important to note that such a DA procedure can, in principle, also be used to correct the meteorological forcing by perturbing parameters that are both uncertain in meteorological reanalyses and of major importance, such as precipitation or air temperature. However, increasing the number of perturbed parameters raises computational costs and exacerbates issues of equifinality. In this study, we first corrected the meteorological forcing by bias-correcting the reanalysis with local weather station data (Sect. 3.3.1), allowing us to focus the DA on perturbing parameters that explicitly represent wind-driven snow redistribution.”
  3. Definition of the measurement error of fSCA: The manuscript should clarify how the measurement error for fSCA was determined. Additionally, a justification is needed for using the same measurement error value for both the “idealized” synthetic data and remote sensing data, as the sources and magnitudes of their respective errors are likely different.
  We appreciate the reviewer’s valuable suggestions. We added a clarification of the error used for the fSCA observations in Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “The uncertainty of the FSCA observations is considered in the DA scheme by applying an observational variance of sigma^2_FSCA = 0.05. This value was chosen as a conservative upper bound estimate of the FSCA retrieval error (e.g., Aalstad et al. (2020)). In addition to representing measurement uncertainty, the prescribed standard deviation also controls the relative weight given to each observation in the DA and thereby steers the strength of the updates (see Sect. 3.4). To enable a comparison between the performance of the different experiments, we use the same observational standard deviation for both the in-situ derived FSCA and Sentinel-2 derived FSCA.”
  4. Clarity of the spatial DA method:
  The description of how the point-scale DA results are spatialized is unclear and could be helpful by providing a more detailed explanation:
  a. What exactly is the “area-weighted histogram of melt-out dates”, and how is it constructed from the fSCA data?
  b. At which specific locations (e.g., all in-situ sites, a subset, etc.) are the point-scale DA simulations performed?
  c. How is the histogram used to translate the point-scale DA results into the spatially distributed GST and SWE fields? An explanation for why the final outputs are presented as mean monthly and mean annual fields would also be helpful.
  We thank the reviewer for the helpful questions and comments and have revised the explanation and formulation of the experiment Spatial DA with Sentinel-2 derived FSCA in section 3.5 DA Experiments (revised manuscript):
  “Furthermore, we employ the DA in a spatial manner in a Spatial DA with Sentinel-2 derived FSCA experiment. This allows us to model the spatial distributions of the ground thermal regime within a medium-scale (500 m-1 km) area of interest (AOI). For this, we use the area-integrated FSCA retrieved from Sentinel-2 over the AOI, yielding one FSCA value for the AOI for each retrieval date. For each of these retrieval dates, we then run the DA scheme for a point that is representative for the area within the AOI that becomes snow-free exactly on that retrieval date. This is implemented by assimilating synthetic binary FSCA in the same way as in Point-Scale DA with in-situ derived FSCA. In doing so, the snow redistribution parameters are perturbed so that the model reproduces melt-out on that retrieval date, while all other model parameters are kept unchanged and generally valid for the entire area. Each DA run produces an ensemble mean for the variables of interest (e.g. GST or SWE). From the AOI-wide FSCA time series, we next derive a histogram of melt-out dates. For this step, we assume a constant melt rate between successive FSCA retrievals. The ensemble means from the DA runs corresponding to melt-out at each retrieval date are then combined using area weights determined by this melt-out histogram. This produces spatially representative distributions of ground and snow properties, in particular mean monthly and mean annual ground surface temperature (GST) and snow water equivalent (SWE).”
  We present the results as mean monthly and mean annual GST distributions in order to keep the figures readable and to highlight the spatial distributions as well as the monthly, seasonal, and annual patterns. A direct one-to-one comparison between the observed and modeled monthly/annual GST distributions in a single figure would otherwise be more difficult.
  5. Presentation of results: The benefit of the DA is not consistently demonstrated. It can be improved by considering:
  a. For the spatial results (Sect. 4.3), figures should include the prior (open-loop, without DA) results alongside the posterior results and observations. This direct comparison is essential for visually demonstrating the improvement gained from DA.
  We thank the reviewer for this valuable suggestion. We have added the results of the reference run (open loop, without DA) in the figures in Section 4.3.
  b. It is strongly recommended to add a new figure for the idealized synthetic experiment that follows the format of Fig. 4. This figure should show time series of the prior ensemble fSCA, the synthetic fSCA, and the posterior ensemble fSCA. This would clearly demonstrate the DA mechanism in a controlled setting and provide a helpful reference for interpreting the real-data results. This would also show the characteristics of the synthetic data.
  We thank the reviewer for this suggestion and improvement to the manuscript. We have added a new figure for FSCA for the experiment Point-Scale DA with in-situ derived FSCA (Figure 3) and a corresponding section in the text in Section 4.1 Point-Scale DA with in-situ derived FSCA (revised manuscript):
  “Fig. 3 displays the in-situ derived FSCA observations as well as the modeled FSCA of the reference run and of the DA scheme during the melt seasons of the hydrological years 2016 and 2017. As discussed in Sect. 3.1, the in-situ derived FSCA observations represent idealized observations with high spatial and temporal resolution. The observations are binary and have been sampled and assimilated at a high frequency (every third day) from mid-winter to summer. The reference run without DA (open loop) is a single point-scale run that does not account for lateral snow redistribution and is the same for both locations (red solid line). The DA results are displayed as the mean and range of the ensemble of individual model runs (ensemble members). The prior ensemble range containing FSCA values from 0 to 1 in mid-winter, as present for both benchmark sites in 2017, indicates that the prior ensemble contains members with low SWE which become snow-free during individual wintertime melt events or rain-on-snow (ROS) events. In case of the snow-drift (left panel in Fig. 3), the posterior ensemble does no longer contain members which become snow-free during winter, whilst it contains members which are periodically snow-free during mid-winter at the wind-exposed benchmark site (right panel in Fig. 3). The posterior ensemble generally reproduces the assimilated FSCA observations and the binary melt-out signal well and shows a clear improvement compared to the reference run. For the wind-blown ridge in the hydrological year of 2017, modeled FSCA shows a too late meltout also after the DA. The scheme fails to reproduce the early meltout of the windblown ridge in this particular year. This is likely due to the occurrence of snowfall in the atmospheric forcing which did not occur in reality, which is a common limitation occurring when using coarse scale forcing in physics-based modeling on such small scales (Fang et al., 2023). In addition, the posterior ensemble displays melt in March and April, containing both members that have melted out and members that are snow covered during this period. The posterior ensemble does not fully reproduce the assimilated FSCA observations during e.g. mid-April, indicating that the snow drift location has very little snow and is sensitive to the meteorological forcing during these times. It is possible that the snow cover during this period was very shallow or absent, which is not covered in the idealized synthetic observations.”
  Minor comments:
  1. Line 20: Please clarify what "spatial variability of 4.5 °C" refers to (e.g., standard deviation, range, etc.).
  We thank the reviewer for the comment. We have revised the text in line 20 (revised manuscript) to:
  “The results show that the algorithm can not only reproduce the observed spatial variability of mean annual GST, spanning a range of up to 4.5 deg C (…)“
  2. Line 92. Please remove the article "a" before "100 randomly selected points".
  We thank the reviewer for the careful reading and have corrected line 92 (revised manuscript) to:
  “(…) are conducted at 100 randomly selected points (...)“
  3. Fig.1. Please highlight the specific locations of the points used for the illustrative results (e.g., those shown in Figure 4) to provide better context.
  We thank the reviewer for the suggestion and have color-marked the locations of points for the illustrative results, i.e. the snowdrift and wind-blown ridge presented in Section 4, in Figure 1.
  4. Lines 118-120: For improved readability, consider moving the description of the model structure to the end of line 114, where the model is first introduced.
  We thank the reviewer for the suggestion and improving the readability of the manuscript. We have reorganized the model description in Section 3.3 The CryoGrid community model (revised manuscript) to:
  “(…) In this study, we use a physically-based, one-dimensional (single column) model configuration of a layered subsurface and snowpack with lateral fluxes of water and snow between the model column and external reservoirs. The model domain is represented by vertically layered grid cells associated with different "stratigraphy classes" representing layers in the subsurface ground and the seasonal snowpack, for which a certain set of governing equations and process parametrizations is applied (Westermann et al., 2023). The ground or snow surface is coupled to the atmosphere by the surface energy balance as well as the mass balance for liquid and solid precipitation, driven by the meteorological forcing data (Sect. 3.3.1). At the lower boundary of the model domain (set to 100 m depth in this study), a geothermal heat flux of 50 mW/m2 is applied, as in Westermann et al. (2023). For the subsurface, the state variables are the volumetric contents of mineral, organic, air, water and ice, as well as the enthalpy related to the temperature and ice content of the grid cell. Energy is exchanged between the grid cells (...)”
  5. Line 148: The variable τ_i is described as a characteristic timescale. For clarity, this description should be associated with “timescale” rather than “wind transport” at the end of the sentence. It would also be helpful to introduce the intuition behind an index before presenting its equation.
  We thank the reviewer for the improvement of the manuscript and have changed the description in line 200 (revised manuscript) to:
  “Each layer i is assigned a characteristic timescale (τ_i) for the change of the parameters under wind transport, which assumes an exponential decrease of snow mobility with depth (Vionnet et al., 2012):”
  6. Lines 145-164: These paragraphs explain the snow-redistribution variables. It would be beneficial to also explain how the model simulates the snow cover area (binary or fractional?), as this is the key variable being assimilated. How does the redistribution of snow influence the snow cover area?
  We thank the reviewer for the suggestion and have included a description of how FSCA is modeled in CryoGrid in Section 3.3 The CryoGrid community model (revised manuscript):
  “Within CryoGrid, FSCA is modeled as FSCA = 1 for SWE > 0.005 m w.e. and FSCA = 0 for SWE = 0 m w.e., with a linear interpolation in between. The low threshold value in SWE effectively leads to a near-binary decrease of FSCA between consecutive observation dates, which is adequate for point-scale applications (Sect. 3.5).”
  The redistribution of snow affects SWE in the one-dimensional model column by adding and removing snow, and by altering other snow properties such as albedo, which in turn influences melt and thus the modeled FSCA, as described above.
  7. Line 154: Please specify the units for the "amount of mobilized snow".
  We thank the reviewer for the comment and have now specified in the text in Section 3.3 The CryoGrid community model (revised manuscript) that F_i is a fraction of the total snow in the model layer i:
  “We assume that, for each layer i, the fraction of mobilized snow F_i_snow per unit time is proportional to the inverse of τ_i.”
  8. Line 163: The reference to the DA scheme feels premature here in a section introducing the model. Please clarify that the details of the parameter perturbation are discussed in Section 3.3.1, or move the sentence to that section.
  We thank the reviewer for the comment and have deleted the reference to the DA scheme (Section 3.3 The CryoGrid community model (revised manuscript)):
  “In principle, the exposure parametrizes the elevation of the model column relative to the surrounding local topography.”
  9. Line 165: What is the temporal resolution (e.g., hourly, daily) of the meteorological forcing data? Please also describe the method used to downscale the coarse ERA5 data to the model's fine spatial resolution.
  We thank the reviewer for the question and suggestion, and clarified the forcing data resolution, downscaling and biascorrection in Section 3.3.1 Model forcing data (revised manuscript):
  “The forcing data have a temporal resolution of 3 h and include air temperature, incoming shortwave radiation, incoming longwave radiation, specific humidity, air pressure, wind speed, rainfall, and snowfall. As described in Westermann et al. (2023), downscaling is performed by calculating a linear regression between the reanalysis data and available in-situ measurements within a 20-day window centered on each calendar day. This regression is then used to bias-correct the reanalysis. The procedure is applied to air temperature and incoming longwave radiation, while specific humidity is adjusted consistently with the temperature correction. Incoming shortwave radiation is used without downscaling, as its average value generally shows good agreement with observations, and precipitation is not corrected due to the absence of sufficiently reliable measurements (Westermann et al., 2023).”
  10. Line 175: The wind downscaling method is unclear. For example, what are the minimum/maximum wind speeds? It would be helpful to show an illustrative figure
  We thank the reviewer for the question. We added further details on the scaling of the surface wind speeds in Section 3.3.1 Model forcing data and decided against a figure for simplicity:
  “Furthermore, the wind speed is influenced by the local topography, with elevated ridges experiencing increased wind speeds compared to sheltered locations, thus leading to a small-scale variability of wind speeds that can affect snow drift intensity and patterns (Liston and Elder, 2006). Therefore, we scale the surface wind speed using quantile mapping (e.g., Fiddes et al., 2022). Specifically, we apply a linear scaling across the quantiles of the forcing surface wind speed: low wind speeds are left unchanged (scaling factor 1), the highest wind speeds are scaled by a wind scaling factor f , and intermediate quantiles are linearly scaled between 1 and f . In this study, f is an uncertain parameter, constrained to the range f ∈ [0.8, 1.8] and estimated by the DA scheme, which enables a better representation of the impact of surface wind on the snowpack.”
  11. Line 189: How was it determined if the daily cycle of GST was "in line with air temperatures"? Was a specific metric or threshold (e.g., on the temperature difference) used?
  We thank the reviewer for the comment and have clarified the description in Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “The timing of the melt-out of the main seasonal snowpack was determined as in Lewkowicz (2008) through semi-quantitative inspection of the observed GST and model forcing air temperature. We consider snow to be completely melted at a measurement site when GSTs begin to show a clear daily cycle that follows air temperature variations, particularly with positive GST values. This occurs after the zero curtain period in GST, which reflects the isothermal snowpack conditions during the late phase of snowmelt. We assume FSCA = 1 before this date and FSCA = 0 after. The derived FSCA captures the meltout of the main seasonal snowpack but does not resolve intermittent late-season snowfall. Such late snowfall events typically involve a thin, short-lived snow cover and therefore have only a minor influence on the ground thermal regime. The resulting binary FSCA represents ideal FSCA observations of the main seasonal snowpack, effectively providing infinitely high spacial and temporal resolution.”
  12. Line 198: Please describe the methodology used to visually distinguish clouds from snow in the satellite imagery, as both can appear bright white. For instance, were snapshots from adjacent dates compared?
  We thank the reviewer for the question. We visually compared scenes adjacent in time over a larger area, i.e. the peninsula, to distinguish clouds and snow. We have specified the method in Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “Cloud-free scenes were identified manually by visually comparing acquisitions adjacent in time and space.”
  13. Lines 203-217: In this general description of the DA equations, please specify what the variables y, z, and θ represent in the context of this particular study to make the methodology more concrete for the reader.
  We thank the reviewer for the suggestion and have included specifications for the variables mentioned in Section 3.4 Data Assimilation (revised manuscript):
  “In this study, DA is used to constrain the representation of the snowpack and ground surface temperature given (i) the physical model M (the CryoGrid model) with parameters θ and (ii) the observations y (FSCA). Formally, we infer θ from observations y via Bayes’ rule (MacKay, 2003; Evensen et al., 2022) (…) and Z is a normalizing constant.”
  14. Line 226: When stating that the model is run for two years, with the second being of interest, please briefly explain the purpose of the first year (e.g., for model spin-up).
  We thank the reviewer for the comment. We have extended the description of model spin-up in section 3.4.1 Ensemble generation in CryoGrid (revised manuscript):
  “Each particle is run for two consecutive hydrological years, of which the second year is the hydrological year of interest for which FSCA is assimilated. This ensures a consistent model spin-up of the ground thermal regime and hydrology in the uppermost soil layers from a common initial condition for all ensemble members, which is sufficient when targeting ground surface temperatures (Sects. 2 3.2).”
  and clarified which years are analyzed in the Results Section 4.1 Point-Scale DA with in-situ derived FSCA;
  “We present the results of the post–spinup period, i.e. the second year of each simulation, for the exemplary, independent DA runs of 2016 and 2017. “
  15. Fig. 3: Please specify the time step of the plotted GST variable (e.g., daily, monthly mean) in the figure caption.
  We thank the reviewer for the comment and have included the time steps of the plotted variables in the figure captions of Figures 3-6.
  16. Fig. 4. What do the whiskers on the fSCA observations represent (e.g., measurement error, standard deviation)? Also, is the prior fSCA modeled as a binary variable, with the grey curve representing the ensemble mean?
  We thank the reviewer for the questions and have included a description of the whiskers representing the standard deviation in the figure captions of the corresponding Figures 3 and 5 in the revised manuscript. The modeled FSCA for each individual point-scale model run is nearly binary and described in the revised Section 3.3 The CryoGrid community model:
  “Within CryoGrid, FSCA is modeled as FSCA = 1 for SWE > 0.005 mw.e. and FSCA = 0 for SWE = 0 mw.e., with a linear interpolation in between. The low threshold value in SWE effectively leads to a near-binary decrease of FSCA between consecutive observation dates, which is adequate for point-scale applications (Sect. 3.5).”
  The shaded areas (grey for prior ensemble, blue for posterior ensemble) represent the range of all binary modeled FSCA curves within the (prior of posterior) ensemble. This is further described in the caption of Figure 3-6, e.g.:
  “Figure 3. In-situ derived FSCA and modeled 6-hourly FSCA during the hydrological years of 2016 and 2017 for a snow drift location (left panels) and a wind-exposed ridge (right panels). The synthetic FSCA observations and the corresponding standard deviations (STD) are shown in black. The modeled FSCA of the reference run without DA is shown in red and is binary as it results from an individual point-scale simulation (Sect. 3.3). The prior ensemble (before DA) is shown in gray, with the prior ensemble mean displayed as the gray solid line and the prior ensemble range as the gray shaded area. The posterior results of the DA scheme assimilating Sentinel-2 derived FSCA are shown in blue, with the blue solid line being the posterior ensemble mean and the blue shaded area the posterior ensemble range.”
  17. Fig. 4. The fSCA observations during the mid-melt season appear to have a limited influence on the posterior estimate. This is evident as the posterior curves (in blue) often diverge from the observations during this time. A discussion of this discrepancy would be beneficial.
  We thank the reviewer for the observation and comment, and have added a discussion in Section 4.2 Pixel-Scale DA with Sentinel-2 derived FSCA in the revised manuscript:
  “The FSCA observations during mid-melt season appear to be less informative for the DA scheme, as visible in the tendency of the posterior FSCA estimates to deviate from the observations during this period. This reduced impact is likely due to the combined effect of (i) fewer available observations in the mid-melt season, which makes them less constraining for the posterior, and (ii) the relatively large prescribed observational error, which down-weights individual observations in the analysis. In addition, the DA updates are constrained by the model physics and forcing. If the meteorological forcing misrepresents some melt- or wind events, the DA scheme can not fully adjust the snowpack to match the observed FSCA, even when discrepancies are present.”
  18. Fig. 6. It is difficult to read the specific values (e.g., median, mode) of the peak SWE distributions discussed in the text from the histogram's x-axis. Please consider adding labels for these key values directly onto the plot. Also, please specify that "w.e." is the abbreviation for "water equivalent" in the caption.
  We thank the reviewer for the suggestion and have added key values to the figure (Figure 7 in the revised manuscript). We also specified the abbreviation in Section 4.3 Spatial DA with Sentinel-2 derived FSCA:
  “In the studied years, the range of peak SWE typically extends from close to 0 m water equivalent (m w.e.) at wind-exposed locations to 0.315 m w.e. to 0.875 m w.e. at snow drift locations.”
  
  We appreciate the reviewer’s careful reading and insightful comments, which have significantly strengthened the revised version.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3142-AC1
RC2:
'Comment on egusphere-2025-3142', Anonymous Referee #2, 22 Jan 2026

In “Assimilating high-resolution satellite snow cover data in a permafrost model” the authors seek to improve simulated GST in a snow-permafrost model by assimilating FSCA data into runs of the model. The authors leverage a study cite in the Arctic which contains numerous in-situ GST, snow depth, and density observations, as well as FSCA data derived from the Sentinel-2 mission. The CryoGrid model is used, which features a simplistic scheme for wind redistribution of snow. Two parameters related to wind redistribution, a multiplier of wind speeds and a measure of how exposed or sheltered a site is, are calibrated by assimilating observations of GST. An adaptive particle batch smoother is presented with the goal of minimizing computational cost compared to other particle filter methods. The point-scale model results assimilated to in-situ data demonstrate the efficacy of this new DA technique, demonstrating an improvement in the RMSE of GST over the course of a winter season. Simulations assimilating remote sensing data are then shown, again demonstrating an improvement in RMSE of GST, albeit with mixed performance for years of lower snow and a low density of satellite observations. Lastly, spatialized simulations are performed to derive distributions of GST and SWE over the study site, and ultimately maps of GST. The setup of these spatialized simulations, particularly the backmapping ones, is unclear and requires further description.
The paper is well written, and results are presented in a clear manner. Some parts of the methodology require further explanation to ensure reproducibility, particularly how spatialized SWE is derived from sparse point observations of snow depth and density, how in-situ GST is used to derive a binary FSCA, and how the backmapping experiments yield annual or seasonal maps of GST. Additionally, there are a few points of discussion which would enhance the paper, namely the sensitivity of the AdaPBS to ensemble size and termination criteria. I recommend minor revisions of the manuscript and encourage the authors to resubmit after considering these comments. Clarification of the major points above, and minor points of correction, are provided below:

L9: “combining”

L73: GST should be defined here as it is referenced for the first time

L190: How do you quantitatively define “in line with air temperature”? What air temperature is used? Measured, modeled?

L308: “much too a very narrow” ß edit for clarity

Section 3.3: Did you experiment with the size of the ensemble, and how this affects the spread of the final posterior distribution? Some kind of sensitivity analysis?

Section 3.4 should go before 3.3, I was confused to read exclusively about in-situ observations after the DA scheme description. Additionally, the authors should provide more information as to how SWE and SD measurements were collected (i.e. manual measurements with a probe for SD). How were the density measurements taken, and how were each of the snow depth measurements assigned a “representative snow pit” to derive SWE?

Section 3.5: A conceptual figure of how the different DA runs are performed would make this section much clearer.

L377: Does this also explain why the posterior ensemble spread is so much lower in 2017, and for the exposed ridge site in general? If not, can you comment on this? Relatedly, is 2016 the spinup year mentioned in section 3.3.1? It is not clear to me if you are showing results from the spinup period or not.

Fig 5. Please comment on why the RMSE of the posterior mean would be higher than the RMSE of the prior mean. This would suggest a failure of the DA scheme, perhaps because observations are too late in the season to inform the simulation in mid-winter, and a thinner mid-winter snowpack is preferred to match the melt-out date?

L192: You have no observations to assimilate to until after May 1^st in your later sections. This seems to be related to solar elevation angle, suggesting that your method for deriving FSCA does not work in the arctic winter since it uses 3/6 bands from the visible spectrum. Is this true? If not, why do you only have FSCA observations after May 1^st? Highlighting this limitation could motivate the use of your technique in lower-latitude regions which would benefit from higher density assimilation data.

Section 4.3: The description of the back mapping runs in section 3.5 mentions only a method for calculating GST for the estimated melt out date of a pixel. How do you then arrive at the mean annual GST shown in this section? Also, for the discussion of the back mapping runs, it would be helpful to have a table or information about how many Sentinel 2 observations were available during the melt-out period for the different years.

L476: Visually this makes sense when comparing figures 1 and 2a/b, but an overlay of measurement points onto figure 2 would make this comparison more intuitive. Alternatively, a histogram of FSCA from the total Sentinel-2 data and from the Sentinel-2 data which has a co-located point-measurement would quantify how well the point measurements sample the true distribution.

L481: should be “temporally” not “temporarily”

Section 5.3: The discussion of approaches to modeling wind-redistribution of snow is not quite complete. The studies of Quéno et al., 2024, Mower et al., 2024, Reynolds et al., 2024 , and Vionnet et al., 2021, all use an intermediate-complexity snow model (of lower complexity than Marsh et al., 2024) to simulate wind-redistribution over areas much larger than the study domain shown here. Machine learning techniques, as in Saigger et al., 2026, are also an increasingly popular method to efficiently represent wind-redistribution of snow. Additionally, Sharma et al., 2023 represented wind-redistribution of snow in 250m resolution simulations with a “sophisticated” snow model as mentioned on lines 536-538.

The notion of using data-assimilation to improve snow models is absolutely justified. Data assimilation is useful for correcting remaining model biases, and the methodology used here (calibrating a simplistic wind-redistribution scheme) is sound as a demonstration of a new DA technique. However, implying that data-assimilation is a suitable replacement for explicitly modeling wind redistribution, as I feel this section does, is somewhat outdated given these recent successes I think this point about using more advanced wind-redistribution schemes is more of a “for future studies” one than an actual limitation of the science at the moment.

L579: “virtuous” is a strange adjective for this sentence. (“in a cycle of high moral character”)

Additional references:

Mower, R., Gutmann, E. D., Liston, G. E., Lundquist, J., and Rasmussen, S.: Parallel SnowModel (v1.0): a parallel implementation of a distributed snow-evolution modeling system (SnowModel), Geosci. Model Dev., 17, 4135–4154, https://doi.org/10.5194/gmd-17-4135-2024, 2024.

Saigger, M., Goger, B., and Mölg, T.: SNOWstorm (v1.0) – a deep-learning based model for near-surface winds and drifting snow in mountain environments, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2025-5608, 2026.

Sharma, V., Gerber, F., and Lehning, M.: Introducing CRYOWRF v1.0: multiscale atmospheric flow simulations with advanced snow cover modelling, Geosci. Model Dev., 16, 719–749, https://doi.org/10.5194/gmd-16-719-2023, 2023.

Quéno, L., Mott, R., Morin, P., Cluzet, B., Mazzotti, G., and Jonas, T.: Snow redistribution in an intermediate-complexity snow hydrology modelling framework, The Cryosphere, 18, 3533–3557, https://doi.org/10.5194/tc-18-3533-2024, 2024

Vionnet, V., Marsh, C. B., Menounos, B., Gascoin, S., Wayand, N. E., Shea, J., Mukherjee, K., and Pomeroy, J. W.: Multi-scale snowdrift-permitting modelling of mountain snowpack, The Cryosphere, 15, 743–769, https://doi.org/10.5194/tc-15-743-2021, 2021.

Citation: https://doi.org/10.5194/egusphere-2025-3142-RC2
- AC2: 'Reply on RC2', Clarissa Willmes, 21 May 2026
  
  We thank the reviewer for the constructive criticism and insightful comments, which have substantially improved the quality of the manuscript. Below, we provide point-by-point responses to all comments and questions. Reviewer comments are presented in bold font, our responses in regular font, and modifications to the manuscript are presented in italics and indicated by “ ”.
  
  In “Assimilating high-resolution satellite snow cover data in a permafrost model” the authors seek to improve simulated GST in a snow-permafrost model by assimilating FSCA data into runs of the model. The authors leverage a study cite in the Arctic which contains numerous in-situ GST, snow depth, and density observations, as well as FSCA data derived from the Sentinel-2 mission. The CryoGrid model is used, which features a simplistic scheme for wind redistribution of snow. Two parameters related to wind redistribution, a multiplier of wind speeds and a measure of how exposed or sheltered a site is, are calibrated by assimilating observations of GST. An adaptive particle batch smoother is presented with the goal of minimizing computational cost compared to other particle filter methods. The point-scale model results assimilated to in-situ data demonstrate the efficacy of this new DA technique, demonstrating an improvement in the RMSE of GST over the course of a winter season. Simulations assimilating remote sensing data are then shown, again demonstrating an improvement in RMSE of GST, albeit with mixed performance for years of lower snow and a low density of satellite observations. Lastly, spatialized simulations are performed to derive distributions of GST and SWE over the study site, and ultimately maps of GST. The setup of these spatialized simulations, particularly the backmapping ones, is unclear and requires further description.
  The paper is well written, and results are presented in a clear manner. Some parts of the methodology require further explanation to ensure reproducibility, particularly how spatialized SWE is derived from sparse point observations of snow depth and density, how in-situ GST is used to derive a binary FSCA, and how the backmapping experiments yield annual or seasonal maps of GST. Additionally, there are a few points of discussion which would enhance the paper, namely the sensitivity of the AdaPBS to ensemble size and termination criteria. I recommend minor revisions of the manuscript and encourage the authors to resubmit after considering these comments. Clarification of the major points above, and minor points of correction, are provided below:
  L9: “combining”
  We thank the reviewer for the comment and have corrected this in the revised manuscript:
  “For this purpose, we employ an iterative ensemble-based data assimilation approach combining a Particle Batch Smoother with Adaptive Multiple Importance Sampling (...)”
  L73: GST should be defined here as it is referenced for the first time
  We thank the reviewer for the comment and have added the definition in the suggested line:
  “The presented data assimilation workflows target ground surface temperature (GST) on spatial scales from point to small catchments (…) ”
  L190: How do you quantitatively define “in line with air temperature”? What air temperature is used? Measured, modeled?
  We thank the reviewer for the comment and have clarified the description in Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “Deriving FSCA from in-situ GST measurements (see Sect. 3.2) offers binary snow / no snow point-scale observations, which are well-suited for testing the DA scheme by benchmarking the performance without additional complexities introduced by employing remotely sensed FSCA observations retrieved from space-borne satellite sensors (Sect. 3.5). The timing of the melt-out of the main seasonal snowpack was determined as in Lewkowicz (2008) through semi-quantitative inspection of the observed GST and model forcing air temperature. We consider snow to be completely melted at a measurement site when GSTs begin to show a clear daily cycle that follows air temperature variations, particularly with positive GST values. This occurs after the zero curtain period in GST, which reflects the isothermal snowpack conditions during the late phase of snowmelt. We assume FSCA = 1 before this date and FSCA = 0 after. The derived FSCA captures the meltout of the main seasonal snowpack but does not resolve intermittent late-season snowfall. Such late snowfall events typically involve a thin, short-lived snow cover and therefore have only a minor influence on the ground thermal regime. The resulting binary FSCA represents ideal FSCA observations of the main seasonal snowpack, providing subdaily temporal resolution at the spatial scale of a single point."
  L308: “much too a very narrow” edit for clarity
  We thank the reviewer for the comment and corrected the text in line 351 in the revised manuscript to:
  “For a degenerate ensemble (i.e. RD ≪ 1), the first term collapses to a single unique particle or a very narrow distribution so the second term ensures that a sufficiently diverse ensemble can still be sampled.”
  Section 3.3: Did you experiment with the size of the ensemble, and how this affects the spread of the final posterior distribution? Some kind of sensitivity analysis? Termination criteria
  We thank the reviewer for raising this relevant point. The behavior of the AdaPBS scheme, including its dependence on ensemble size and termination criteria, is described in more detail in Aalstad et al. (2026). At the time of submission, this manuscript had not yet been available but is now in review and referenced in our revised manuscript to provide a detailed description and evaluation of the method. In brief, AdaPBS trades parallel runtime (larger initial ensembles) against sequential runtime (more iterations, thereby increasing the overall ensemble size). In this study, computational constraints limit the feasible ensemble sizes. We have performed a limited sensitivity analysis, which indicates that the DA scheme still improves the ensemble mean even with initial ensemble sizes as small as N=5 for most cases, although the representation of uncertainty deteriorates strongly in these cases. And when few ensemble members are generated per iteration, the statistical assumptions underlying AdaPBS become more restrictive. The number of initial ensemble members and iterations required to reach the termination criterion clearly depends on how close the prior distribution is to the true state. Increasing the minimum effective ensemble size, which is our termination criterion, defined as the number of “successful” ensemble members, generally leads to a better representation of ensemble spread. As in all ensemble-based approaches, this spread remains constrained by other sources of uncertainty (e.g. model forcing, model physics, and fixed parameters) and therefore does not reflect the full uncertainty, but only that associated with the perturbed parameters. Decreasing the minimum effective ensemble size consequently worsens the spread of the posterior distribution, and at too low numbers (typically N_eff<3) the posterior distribution stays degenerate and over-confident.
  A more exhaustive sensitivity analysis is beyond the scope of the present study, but we plan to address this in future work.
  We have now added a brief qualitative description of these aspects at the end of Section 3.4.2 Adaptive Particle Batch Smoother (AdaPBS) of the revised manuscript:
  “Our results indicate that while the ensemble mean is robust also for smaller initial ensemble sizes than applied in this study, the representation of uncertainty improves with increasing termination thresholds D and degrades when too few “successful” ensemble members contribute to the posterior.”
  Section 3.4 should go before 3.3, I was confused to read exclusively about in-situ observations after the DA scheme description. Additionally, the authors should provide more information as to how SWE and SD measurements were collected (i.e. manual measurements with a probe for SD). How were the density measurements taken, and how were each of the snow depth measurements assigned a “representative snow pit” to derive SWE?
  We thank the reviewer for the comments and suggestions. We have moved Section 3.4 In-situ observations ahead of the model and DA description in the revised manuscript, which improved readability.
  We also included further information on the SWE and SD data used, now in Section 3.2 In-situ observations (revised manuscript):
  “In-situ snow observations from the end of the snow accumulation season in hydrological years 2016–2019 are used for additional validation. As described in Zweigel et al. (2021), these observations were collected around the time of peak snow water equivalent (SWE) and consist of snow depth (SD) measurements at each GST logger site, together with SWE estimates for each site. This results in a consistent spatial representation of the GST and SD/SWE datasets. The SWE estimates were derived by combining the measured SD at each location with the mean bulk snow density in the AOI for that year. The mean bulk snow density was obtained from detailed snow density profiles taken at a number of representative GST logger sites (Aalstad et al., 2018).”
  Section 3.5: A conceptual figure of how the different DA runs are performed would make this section much clearer.
  We thank the reviewer for the comment. We understand that the section on the different DA experiments needed more clarification, and have rewritten Section 3.5 DA Experiments. We especially revised the section on the Spatial DA experiment. In the light of the new description, we are of the opinion that a figure is no longer needed:
  “In this study, we apply the aforementioned DA in different configurations: We conduct a Point-Scale DA with in-situ derived FSCA experiment, in which we assimilate binary FSCA observations derived from GST measurements as described in Sect. 3.1. The resulting FSCA timeseries corresponds to that of a hypothetical satellite system with effectively infinitely high resolution in space and time covering the meltout of the main seasonal snowpack. This experiment serves as a proof of concept to evaluate how effectively the DA scheme can reproduce point-scale GST observations by assimilating ideal FSCA. In particular, it demonstrates whether the model system is in principle capable of inferring the unobserved parameter GST from observations of the melt-out date. This is particularly important to test as GSTs are dominated by winter GSTs and thereby strongly influenced by snow thermal properties, not only the melt-out timing and SWE.
  We then extend this point-scale study to the use of real-world satellite observations, i.e. FSCA derived from Sentinel-2, in a Pixel-Scale DA with Sentinel-2 derived FSCA experiment. This has the purpose of obtaining high-resolution (10 m x 10 m Sentinel-2 resolution) simulations for individual sites, approaching the spatial scale of individual GST measurement sites. The Sentinel-2 FSCA observations can contain relatively long (a week or more) data gaps due to cloudiness.
  Furthermore, we employ the DA in a spatial manner in a Spatial DA with Sentinel-2 derived FSCA experiment. This allows us to model the spatial distributions of the ground thermal regime within a medium-scale (500 m-1 km) area of interest (AOI). For this, we use the area-integrated FSCA retrieved from Sentinel-2 over the AOI, yielding one FSCA value for the AOI for each retrieval date. For each of these retrieval dates, we then run the DA scheme for a point that is representative for the area within the AOI that becomes snow-free exactly on that retrieval date. This is implemented by assimilating synthetic binary FSCA in the same way as in Point-Scale DA with in-situ derived FSCA. In doing so, the snow redistribution parameters are perturbed so that the model reproduces melt-out on that retrieval date, while all other model parameters are kept unchanged and generally valid for the entire area. Each DA run produces an ensemble mean for the variables of interest (e.g. GST or SWE). From the AOI-wide FSCA time series, we next derive a histogram of melt-out dates. For this step, we assume a constant melt rate between successive FSCA retrievals. The ensemble means from the DA runs corresponding to melt-out at each retrieval date are then combined using area weights determined by this melt-out histogram. This produces spatially representative distributions of ground and snow properties, in particular mean monthly and mean annual ground surface temperature (GST) and snow water equivalent (SWE).
  Based on the Spatial DA with Sentinel-2 derived FSCA experiment, we further produce 10 m resolution GST maps in a GST Backmapping experiment. For each pixel, we assume that melt-out occurs between the last snow-covered and first snow-free Sentinel-2 scene of that pixel. The DA runs described above provide state variables, i.e. ensemble-mean GST values, for conditions with melt-out on the individual retrieval dates. For each pixel, the GST is calculated as the average between the ensemble GSTs obtained from the DA simulations for the last snow-covered and first snow-free Sentinel-2 scene of that pixel, i.e. for the two scenes between which the actual meltout of the pixel occurred. This provides insights into spatial GST patterns in the AOI without additional computational costs, as it utilizes the simulations of the Spatial DA with Sentinel-2 derived FSCA experiment.”
  L377: Does this also explain why the posterior ensemble spread is so much lower in 2017, and for the exposed ridge site in general? If not, can you comment on this? Relatedly, is 2016 the spinup year mentioned in section 3.3.1? It is not clear to me if you are showing results from the spinup period or not.
  We thank the reviewer for the questions. Early melt-out generally is more constraining on the snowpack and GST than late melt-out, which can be caused by a larger variety of snow depth, snow density and snow water equivalent combinations. The informational content gained through the DA is therefor larger in years with little snow. Similarly, the constraint on GST through early melt-out (wind exposed ridge) is larger than the constraint on GST through late melt-out (snow drift). Nevertheless, the ensemble spread in itself is difficult to interpret in this study, as it is systematically overconfident, as pointed out in the Result Section 4.1 Point-Scale DA with in-situ derived FSCA (revised manuscript). This is due to the small ensemble size and a conscious decision to not have been prioritized because of numerical costs. It nevertheless becomes evident that the ensemble mean is an improvement in comparison to the reference / open loop run:
  “The DA thus markedly improves the simulated GSTs in comparison to the reference run. However, the posterior GST ensembles generally display a small spread that often does not overlap with the observations, which indicates overconfidence by placing most of the weight on just a few parameter combinations. This is a tradeoff from using a small initial ensemble size for numerical cost reasons which makes well calibrated uncertainty quantification using this DA scheme in our experiments difficult, but it does not compromise the main goal of the scheme, which is improving the local GST representation as obtained by the ensemble mean as an optimal point estimate. By reproducing local binary FSCA the scheme finds better parameter combinations of e and f that directly allow for an overall improved representation of the local snow cover, which indirectly results in improved model estimates of GST.”
  Regarding the presented and discussed results, it is always only the second simulated year, i.e. the year with the assimilated FSCA observations and after spin-up, which is presented and analyzed. We have now clarified this in the results section in the revised manuscript.
  Fig 5. Please comment on why the RMSE of the posterior mean would be higher than the RMSE of the prior mean. This would suggest a failure of the DA scheme, perhaps because observations are too late in the season to inform the simulation in mid-winter, and a thinner mid-winter snowpack is preferred to match the melt-out date?
  We thank the reviewer for the comment. Figure 5 refers to the results of modeled GST of the Pixel-Scale DA with Sentinel-2 derived FSCA experiment, where the RMSE of the posterior mean is higher than the RMSE of the prior mean for some cases, e.g. the snowdrift in the hydrological year of 2016 (upper left figure). This is very likely due sub-pixel variability and the validation data scale of the sensor not representing the satellite data scale, which is improved by the DA scheme. We have added a more thorough discussion in the results Section 4.2 Pixel-Scale DA with Sentinel-2 derived FSCA (revised manuscript):
  “However, in some cases the DA slightly worsens the GST representation in comparison to the reference run, for example in 2016 for the wind-blown ridge where the RMSEmm is increased by 0.2 ◦C, or in comparison to the prior ensemble, for example in 2016 for the snowdrift where the RMSEmm is increased by 0.7 ◦C. This occurs due to spatial variability within Sentinel-2 pixels in the AOI, which causes a resolution mismatch, so-called representation error, when comparing to point-scale GST observations. The DA scheme improves the snowpack representation on pixel-scale, which the point location of the GST sensor is not necessarily representative for.In some cases, this uncertainty is captured by the posterior ensemble spread, as e.g. in 2017, whilst in other cases the DA fails to capture this uncertainty and is overconfident. Whether the DA reliably offers an improvement in comparison to the reference run thus depends on the representativeness of the 10x10 m pixel for the point scale, i.e. the spatial variability of the snow cover within each pixel.”
  L192: You have no observations to assimilate to until after May 1^st in your later sections. This seems to be related to solar elevation angle, suggesting that your method for deriving FSCA does not work in the arctic winter since it uses 3/6 bands from the visible spectrum. Is this true? If not, why do you only have FSCA observations after May 1^st? Highlighting this limitation could motivate the use of your technique in lower-latitude regions which would benefit from higher density assimilation data.
  We thank the reviewer for the careful reading and questions. We indeed assimilate Sentinel-2 observations only between May 1^st and August 1^st of each year. This is a pragmatic site-specific choice as there is a persistent snow-cover in our area of interest until this time. There is therefor no additional information content in earlier observations. Additionally, adjacent topography shading the AOI can become a problem in the earlier season, but not in the considered period. We have added this information to Section 3.1 Fractional Snow Covered Area (revised manuscript):
  “As there is a persistent snow cover in the AOI until May, we select observations from 1. of May to 1. of August for each hydrological year.”
  Section 4.3: The description of the back mapping runs in section 3.5 mentions only a method for calculating GST for the estimated melt out date of a pixel. How do you then arrive at the mean annual GST shown in this section? Also, for the discussion of the back mapping runs, it would be helpful to have a table or information about how many Sentinel 2 observations were available during the melt-out period for the different years.
  We thank the reviewer for the comments and suggestions. We have revised the description of the backmapping experiment in Section 3.5 DA Experiments (revised manuscript):
  “Based on the Spatial DA with Sentinel-2 derived FSCA experiment, we further produce 10 m resolution GST maps in a GST Backmapping experiment. For each pixel, we assume that melt-out occurs between the last snow-covered and first snow- free Sentinel-2 scene of that pixel. The DA runs described above provide state variables, i.e. ensemble-mean GST values, for conditions with melt-out on the individual retrieval dates. For each pixel, the GST is calculated as the average between the ensemble GSTs obtained from the DA simulations for the last snow-covered and first snow-free Sentinel-2 scene of that pixel, i.e. for the two scenes between which the actual meltout of the pixel occurred. This provides insights into spatial GST patterns in the AOI without additional computational costs, as it utilizes the simulations of the Spatial DA with Sentinel-2 derived FSCA experiment.”
  And we have added information on the number of assimilated Sentinel-2 scenes to the results section 4.3 Spatial DA with Sentinel-2 derived FSCA (revised manuscript):
  “The number of assimilated observations is 10 in 2016, 18 in 2017, 11 in 2018 and 23 in 2019.”
  L476: Visually this makes sense when comparing figures 1 and 2a/b, but an overlay of measurement points onto figure 2 would make this comparison more intuitive. Alternatively, a histogram of FSCA from the total Sentinel-2 data and from the Sentinel-2 data which has a co-located point-measurement would quantify how well the point measurements sample the true distribution.
  We thank the reviewer for the suggestion and have included the locations of the in-situ measurements in Figure 2.
  L481: should be “temporally” not “temporarily”
  We thank the reviewer for the comment and have corrected the text in Section 5.1 Challenges and Limitations in the revised manuscript:
  “In our study, the results are sensitive to the description of albedo development and surface roughness of the snowpack, both of which can be temporally (event-based) as well as spatially variable within the AOI.“
  Section 5.3: The discussion of approaches to modeling wind-redistribution of snow is not quite complete. The studies of Quéno et al., 2024, Mower et al., 2024, Reynolds et al., 2024 , and Vionnet et al., 2021, all use an intermediate-complexity snow model (of lower complexity than Marsh et al., 2024) to simulate wind-redistribution over areas much larger than the study domain shown here. Machine learning techniques, as in Saigger et al., 2026, are also an increasingly popular method to efficiently represent wind-redistribution of snow. Additionally, Sharma et al., 2023 represented wind-redistribution of snow in 250m resolution simulations with a “sophisticated” snow model as mentioned on lines 536-538.
  The notion of using data-assimilation to improve snow models is absolutely justified. Data assimilation is useful for correcting remaining model biases, and the methodology used here (calibrating a simplistic wind-redistribution scheme) is sound as a demonstration of a new DA technique. However, implying that data-assimilation is a suitable replacement for explicitly modeling wind redistribution, as I feel this section does, is somewhat outdated given these recent successes. I think this point about using more advanced wind-redistribution schemes is more of a “for future studies” one than an actual limitation of the science at the moment.
  We thank the reviewer for this helpful comment and for pointing us to including several important recent studies. In the revised manuscript, we have therefore expanded and clarified this section. We now included the mentioned studies using intermediate-complexity snow models to represent wind redistribution over larger domains and at high spatial resolution, as well as the additional example of a machine-learning application.
  Furthermore, we now state explicitly that data assimilation is not a replacement for state-of-the-art wind-redistribution schemes, but rather a complementary tool to correct remaining model biases and to improve the representation of snow states where explicit modeling alone remains challenging. We have rephrased the concluding sentences of the section accordingly.
  We have revised Section 5.3 to:
  “Key processes influencing the spatial heterogeneity of the snowpack include variations in snowfall and precipitation rates (typically on scales of kilometers), variations in the surface energy balance and thus snow sublimation and melt (tens of meters to kilometers), avalanches (hundreds of meters), as well as snow drifting due to wind (tens of meters to kilometers). Explicit representation of wind-driven redistribution of snow in complex terrain requires three-dimensional models and comes at high computational costs. This has traditionally limited their application to case studies, intermediate complexity snow-models, or small areas. Recent studies, however, demonstrate that intermediate-complexity snow models can be used to explicitely represent wind redistribution at resolutions down to 50 m (e.g. Vionnet et al., 2021; Marsh et al., 2024; Quéno et al., 2024; Mower et al., 2024). These models typically represent the snowpack with a limited number of layers, yet can successfully reproduce melt-out and SWE (Marsh et al., 2024). In Zweigel et al. (2021), the CryoGrid model is used as a laterally coupled land surface model with integrated version of the multilayered Crocus snow model (Vionnet et al., 2012) and explicit representation of wind redistribution and snow microphysics. Sharma et al. (2023) explicitly model snow redistribution at ∼ 100 m resolution using the multilayer SNOWPACK model, which includes detailed snow microphysics (Lehning et al., 1999). In addition, machine-learning approaches trained on such explicit models are emerging as an efficient way to represent wind redistribution at larger scales (e.g. Saigger et al., 2026). Further advances in explicit modeling of snow redistribution on small to intermediate scales using complex snow models can be expected in the coming years, although numerical costs remain a challenge for very large domains.
  A commonly used alternative to represent spatial variability in the snowpack is assimilating satellite-retrieved snow observations into physical snow models. This facilitates accounting for small-scale variability on the scale of the satellite resolution using computationally less expensive, one-dimensional model schemes which do not explicitly exchange snow with neighboring pixels. Such models can be applied for individual pixels (e.g. Aalstad et al., 2018), in a spatially distributed manner (e.g. Margulis et al., 2016) or for a number of representative sites in combination with clustering (e.g. Fiddes et al., 2019).
  Most of the above approaches primarily focus on the correct representation of snow depth and SWE. To study the impact of the snowpack on the ground thermal regime, the insulating properties of the snowpack must be represented, e.g. through multiple layers, transient evolution of snow microphysics and density, as well as albedo. Therefore, the assimilation of satellite-retrieved snow observations into land-surface models like CryoGrid presents significant potential (Reynolds et al., 2024). In particular, this study shows that data assimilation is indeed capable of inferring "hidden" variables related to the ground thermal regime, such as GST, which are inaccessible through direct satellite observations. Whilst data assimilation is not a replacement for state-of-the-art wind-redistribution schemes, it provides a means to correct remaining model biases and to improve the representation of snow states where explicit modeling alone remains challenging.”
  L579: “virtuous” is a strange adjective for this sentence. (“in a cycle of high moral character”)
  We thank the reviewer for the comment and have replaced “virtuous” by “iterative” in the text in Section 5.4 Data Assimilation in complex Land Surface Models in the revised manuscript:
  “A promising way forward could be the use of active learning (e.g. Alsing et al., 2019) where uncertainty-aware emulators are used to design new full model runs which are then in turn used to train the emulator in a iterative cycle to improve inference.”
  
  We appreciate the reviewer’s careful reading and insightful comments, which have significantly strengthened the revised version.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3142-AC2

Clarissa Willmes, Kristoffer Aalstad, and Sebastian Westermann

Supplement

https://doi.org/10.5194/egusphere-2025-3142-supplement

Clarissa Willmes, Kristoffer Aalstad, and Sebastian Westermann

Viewed

Total article views: 9,078 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
7,490	1,390	198	9,078	409	224	254

HTML: 7,490
PDF: 1,390
XML: 198
Total: 9,078
Supplement: 409
BibTeX: 224
EndNote: 254

Views and downloads (calculated since 17 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	5,623	63	29	5,715
Oct 2025	655	175	30	860
Nov 2025	164	234	19	417
Dec 2025	143	259	34	436
Jan 2026	265	269	24	558
Feb 2026	185	111	22	318
Mar 2026	343	215	28	586
Apr 2026	65	30	2	97
May 2026	39	20	8	67
Jun 2026	5	6	1	12
Jul 2026	3	8	1	12

Cumulative views and downloads (calculated since 17 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	5,623	63	29	5,715
Oct 2025	655	175	30	860
Nov 2025	164	234	19	417
Dec 2025	143	259	34	436
Jan 2026	265	269	24	558
Feb 2026	185	111	22	318
Mar 2026	343	215	28	586
Apr 2026	65	30	2	97
May 2026	39	20	8	67
Jun 2026	5	6	1	12
Jul 2026	3	8	1	12

Viewed (geographical distribution)

Total article views: 9,055 (including HTML, PDF, and XML) Thereof 9,055 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 20 Jul 2026

Short summary

In permafrost areas, the spatial variability of the snow cover significantly affects the ground thermal regime. This study presents an algorithm that integrates satellite data into a permafrost model to enhance the spatial detail of ground temperature simulations. At a site on Svalbard, the algorithm improves simulations of ground temperatures especially for wind-blown ridges with little snow, as well as snow-drift sites.


Total:	0
HTML:	0
PDF:	0
XML:	0