<em>snowman</em>: an open-source R package for automated 30-m snow and ice cover mapping using the Landsat archive

Niittynen, Pekka

doi:10.31223/X5FF32

Preprints

08 Apr 2026

Abstract. Seasonal snow and ice cover are critical components of the cryosphere, yet mapping their dynamics at ecologically relevant spatiotemporal scales remains challenging. Here I present snowman, an open-source R package and algorithm for automated mapping of snow and ice cover dynamics at 30-m resolution using Landsat satellite imagery (1982–present). The algorithm combines globally trained probabilistic Random Forest classifiers with pixel-wise generalised additive models to estimate snow phenology metrics—including snow cover duration, snowmelt timing, and new-snow onset—across any location on Earth, without requiring specialist expertise in remote sensing. Trained on 691,925 manually labelled points from 529 Landsat scenes across 49 globally distributed sites, the classifier achieved an overall accuracy of 96.3 % on an independent 15,000-point test dataset, compared to 80.0 % for traditional normalised difference snow index-based (NDSI) approaches. Critically, snowman retained up to 2.2 times more usable observations than NDSI methods across a cloud-prone mountain landscape, enabling more detailed estimation of snow dynamics. At two Finnish weather stations, snowman estimated snow cover duration, snowmelt timing, and new-snow onset to within 3–11 days of multi-year station records. Snow phenology maps showed strong spatial correspondence with independent fine-scale satellite-borne snow classifications (Pearson r = 0.79–0.83) and a high-resolution microclimate dataset (r = 0.82). The snowman algorithm is fully automated and scalable from personal computers to high-performance computing environments, and it offers a reproducible tool for snow and ice monitoring in climate science, hydrology, and ecological research.

Received: 06 Mar 2026 – Discussion started: 08 Apr 2026

Status: final response (author comments only)

RC1: 'Comment on egusphere-2026-1268', Anonymous Referee #1, 02 Jun 2026
I took time to carefully read this manuscript. It presents snowman, an open-source R package for automated 30 m snow and ice cover mapping using the Landsat archive. I find the paper very interesting and potentially valuable for the community. It provides a practical tool for users who want to exploit Landsat for snow applications without relying on Google Earth Engine, and it makes the approach accessible to R users. The use of a probabilistic Random Forest classifier is also a valuable alternative to more traditional NDSI-based workflows.
However, I think the manuscript currently overstates the novelty and generality of the approach. In several respects, and specifically for snow phenology metrics, the proposed workflow is close to recent Landsat-based snow phenology approaches, which derived long-term 30 m snow melt-out dates from Landsat time series and explicitly analysed NDSI thresholds and sensitivity to observation number (see references below). The present manuscript is nevertheless interesting because the method is different, is open-source in R, and relies on a Random Forest classifier rather than direct NDSI values, so it will without doubt be a valuable contribution to the community.
I strongly recommend improving the integration of recent remote-sensing snow-mapping literature, clarifying the multi-year nature of the method, justifying the many thresholds used in the workflow, and better demonstrating performance across the full Landsat archive, including the earlier decades.
The abstract currently overstates what the algorithm seems to be capable of, or at least what has been demonstrated. In particular, the manuscript does not demonstrate that the algorithm can derive 30 m snow dynamics from 1982 to the present in the sense of year-to-year estimates. The current wording could lead readers to think that the method provides annual snow dynamics across the full Landsat archive, whereas the workflow appears mainly designed to estimate multi-year average snow phenology metrics. This should be clarified in the abstract and introduction. Beyond that, here are major and minor comments.
Major comments
The paper presents snowman as filling an important gap, but several recent studies have already proposed Landsat-based or Landsat/Sentinel-2 snow products at medium to high spatial resolution. In particular, Bayle et al. (2025) proposed a 30 m Landsat-based snow melt-out date product across European temperate mountains and analysed NDSI threshold choice and observation-number sensitivity. Poussin et al. (2025) produced a 37-year Landsat/Sentinel-2 snow-cover time series for Switzerland. Barrou Dumont et al. (2025) combined SPOT, Landsat, and Sentinel-2 to derive 38 years of annual snow melt-out dates over the French Alps and Pyrenees, with validation against 276 in situ stations. These papers should be discussed in the introduction, not only as background references but as direct methodological comparators. The authors should explain precisely what snowman adds relative to these works: for example, global applicability, R implementation, avoidance of GEE, probabilistic classification, cloud/shadow handling, lake ice mapping, or easier transferability to user-defined AOIs. At present, the manuscript risks implying that Landsat is almost unused for snow phenology, whereas recent work shows that its use is increasing.

The abstract and introduction do not make it clear enough that snowman is primarily designed to estimate multi-year average snow phenology metrics, rather than reliable year-by-year snow dynamics. This distinction is important. The manuscript states that Landsat’s spatiotemporal sparsity makes snow dynamics best estimated over a multi-year period, but this should be stated explicitly in the abstract and early introduction. The term “snow dynamics” is also ambiguous. In places it seems to mean snow phenology indicators such as snow cover duration, melt-out date, and new-snow onset; elsewhere it could be interpreted as temporal snow-cover changes or annual dynamics. I suggest replacing broad wording such as “snow dynamics” with more precise wording such as “snow phenology metrics”, unless annual estimates are actually supported and validated.

One of the main advantages of Landsat is its four-decade archive. The manuscript repeatedly emphasizes Landsat’s temporal depth, but the validation and examples focus mainly on recent decades, especially 2014–2023. This leaves an important gap: can the algorithm perform as well with Landsat 4/5 TM and Landsat 7 ETM+ imagery from the 1980s, 1990s, and early 2000s? This is especially important because early Landsat data have different sensors, lower acquisition density in some regions, potential geolocation issues, and fewer observations. The author should either provide a specific validation for earlier decades or temper claims about exploiting the full archive. At minimum, the discussion should clearly state that the method’s performance in the earliest decades remains less well demonstrated.

The manuscript states that 529 Landsat images from 49 areas were manually labelled, and that separate models were fitted for TM, ETM+, and OLI sensors. This is promising, but the reader needs more information. Please provide the distribution of the training images by: (1) Landsat sensor: TM, ETM+, OLI; (2) decade or year; (3) season/month; (4) class balance across land, water, snow/ice, cloud, and artefact. This is essential to assess whether the training data are representative of the full Landsat archive. The model may perform well on recent OLI imagery but less well on TM imagery from earlier decades. The manuscript should show that this has been considered.

The Random Forest classifier clearly improves pixel-level classification accuracy relative to simple NDSI thresholds in the presented validation, and the resulting maps look better. However, I am not yet convinced that this necessarily translates into better snow phenology indicators, especially melt-out date. For melt-out timing, the method ultimately uses a probability threshold of 0.5, which is another form of binarization. In that sense, the approach is conceptually similar to applying a threshold to NDSI or snow-cover fraction. The advantage of NDSI is that its physical basis is well understood. The manuscript should also discuss why a more complex, partly black-box model is preferable to simpler and physically interpretable spectral indices, especially when recent papers have studied the relation between NDSI, snow cover fraction, snow depth, and melt-out timing.
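
To make the binarization point concrete, the following is a minimal Python sketch, not code from the manuscript or the snowman package. It uses the standard NDSI definition, (green − SWIR1) / (green + SWIR1), and shows that both routes end in a single threshold decision. The function names and reflectance values are hypothetical. The thresholds 0.4 and 0.15 are the two NDSI values mentioned in this review, and 0.5 is the probability threshold discussed above.

```python
def ndsi(green, swir1):
    """Normalised difference snow index from green and SWIR1 reflectance."""
    total = green + swir1
    if total == 0:
        return 0.0  # avoid division by zero for fully dark pixels
    return (green - swir1) / total


def snow_from_ndsi(green, swir1, threshold=0.4):
    """Binary snow flag from an NDSI threshold."""
    return ndsi(green, swir1) >= threshold


def snow_from_probability(p_snow, threshold=0.5):
    """Binary snow flag from a classifier probability threshold."""
    return p_snow >= threshold


# Hypothetical mixed pixel: the outcome depends entirely on the chosen threshold.
green, swir1 = 0.30, 0.20
flag_at_040 = snow_from_ndsi(green, swir1, threshold=0.4)
flag_at_015 = snow_from_ndsi(green, swir1, threshold=0.15)
flag_from_rf = snow_from_probability(0.62)  # 0.62 is an invented probability
```

In both cases the detected melt-out date is the first date on which the flag switches off, so what the date means depends on what the threshold represents physically.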

The manuscript uses thresholds such as at least 50 Landsat images and at least 20 observations per pixel, but the implications of these choices are not sufficiently explored. This is a major issue because in many mountain regions, especially before 2013, 20 clear-sky observations in a single year is often unrealistic. More importantly, 20 observations across a year do not guarantee observations during the critical periods of snow onset or melt-out. Bayle et al. (2025) explicitly analysed sensitivity to the number of observations, which is highly relevant here. The snowman manuscript should include a similar sensitivity analysis or discuss how the reliability of snow cover duration, melt-out date, and new-snow onset depends on observation density and seasonal distribution. It would provide a very interesting analysis for the snow mapping community. The current threshold of 20 observations appears arbitrary. It may be acceptable as a default, but it needs stronger justification and uncertainty guidance.
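
The concern about seasonal distribution can be illustrated with a small Python sketch. It is an assumption-laden example, not the snowman implementation: the function names, the 15-day half-width, and the minimum of three observations in the window are all hypothetical. Only the minimum of 20 observations comes from the manuscript.

```python
def observations_in_window(doys, centre_doy, half_width=15):
    """Count clear-sky observations within +/- half_width days of centre_doy.

    Wrap-around at the year boundary is ignored in this simple sketch.
    """
    return sum(1 for d in doys if abs(d - centre_doy) <= half_width)


def density_checks(doys, centre_doy, min_total=20, min_in_window=3, half_width=15):
    """Return (passes_total_check, passes_window_check) for one pixel."""
    total_ok = len(doys) >= min_total
    window_ok = observations_in_window(doys, centre_doy, half_width) >= min_in_window
    return total_ok, window_ok


# Hypothetical pixel: 20 clear observations, all in summer (DOY 180 to 256),
# while melt-out is assumed to occur around DOY 140.
summer_only = list(range(180, 260, 4))
total_ok, window_ok = density_checks(summer_only, centre_doy=140)
```

Here the pixel satisfies the total-count threshold while having no observations near the assumed melt-out date, which is the situation a sensitivity analysis would need to quantify.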

The manuscript presents snowman as an alternative to private-company tools such as Google Earth Engine. This is a fair and useful distinction, but the workflow still relies on the Microsoft Planetary Computer. This does not invalidate the approach, but the distinction should be phrased more carefully. The key advantage may not be complete independence from private infrastructure, but rather that the processing is open-source, portable, and executable locally or on HPC systems, while Microsoft Planetary Computer is used mainly as a data-access backend.

Minor comments
L90. I agree that Landsat has historically been underutilized in snow research compared with MODIS or Sentinel-2. However, this statement should be nuanced, because Landsat-based snow-cover and snow-phenology studies are becoming increasingly common, including several very recent examples that are directly relevant to the present manuscript.
L105. You state that the spatiotemporal sparsity of Landsat data makes it difficult to estimate “snow dynamics” over a single year. Could you clarify what you mean by “snow dynamics” here? This term could refer either to continuous snow-cover changes through time or to derived snow phenology indicators such as snow cover duration, snow melt-out date, and new-snow onset.
Predictor variables. The set of predictor variables is very well constructed and interesting. Well done for that. Still, I’m not sure I understand why you used a land cover classification as a predictor for a land cover classification. This land cover map is static, so I do not see how it could be relevant for Landsat images acquired far from the date of the land cover product, especially for the earliest decades of the Landsat archive. Please clarify the rationale for including this predictor, and ideally show whether it improves the classification results.
L264. 50 Landsat images over what? The temporal window requested by the user? The AOI? Both?
L273. I am not sure I understand what you mean by “the class probability is utilized as the weight of the observations in the subsequent models”. Which class are you referring to? What does “weight of the observations” mean?
L286. “At least 20 observations”. Over what period? It is not clear to me whether the algorithm always works on a “day of year” basis, where the user decides how many years are aggregated to compose the initial time series, or whether it computes the snow dynamics metrics on a yearly basis when multiple years are requested. In many mountain regions of the world, it is rare to have 20 clear-sky observations in a single year, which basically means that the algorithm is not suited to year-to-year estimates of snow dynamics over the four decades. It might work for the most recent decade, but hardly before 2013. In addition, having 20 observations throughout the year does not say much about the number of observations at the most critical moments, snow onset and snow melt-out. It is not a problem if the algorithm has these limitations, but they should be stated and discussed more thoroughly in the abstract, introduction and discussion.
L358. See https://www.nature.com/articles/s41597-025-05044-2 for more information on the best NDSI threshold for the Landsat surface reflectance product, which is found to be around 0.15 instead of 0.4.
L517. For snow dynamic indicators, I am not fully convinced that the Random Forest approach is necessarily better than a simple NDSI-based metric. For snow melt-out date, you use a probability threshold of 0.5 to determine the date, which is still a form of binarization, as can also be done with NDSI. If snow probability is intended to mimic snow cover fraction, then this is conceptually similar to applying a threshold to NDSI to estimate melt-out timing. The difference is that, for NDSI, several studies have examined its relationship with snow cover fraction or snow depth, which helps interpret what the detected melt-out date represents. In your case, what does a snow probability threshold of 0.5 mean in terms of snow fraction, snow depth, or surface condition? More generally, if a simple method based on well-understood spectral properties of snow works reasonably well, what is the added value of a more complex and partly black-box Random Forest algorithm?
L535. This limitation should be acknowledged earlier in the paper.
References
https://www.nature.com/articles/s41597-025-05044-2
https://tc.copernicus.org/articles/19/2407/2025/
https://www.nature.com/articles/s41597-025-04961-6
Citation: https://doi.org/10.5194/egusphere-2026-1268-RC1
RC2: 'Comment on egusphere-2026-1268', Anonymous Referee #2, 05 Jul 2026

Overall assessment
The manuscript presents a useful R package and a substantial engineering effort: an open-source, locally executable Landsat workflow for probabilistic snow/ice classification and snow phenology estimation. However, the current framing overstates what has been demonstrated. The evidence supports a promising software tool and classifier, but not yet a validated global snow-and-ice product, not a demonstrated scientific advance in snow phenology, and not a robust outperformance claim against current snow products. The strongest claims in the abstract—global applicability, 96.3% accuracy, superiority to NDSI, doubled usable observations, and 3–11 day phenology accuracy—need either stronger evidence or substantial qualification.
Recommendation: Major revisions.
Major comments
The 96.3% accuracy claim is not independent validation
The “independent” test set is independent of the training points, but not independent of the labelling process. Both training and test labels were produced by visual interpretation of Landsat imagery, apparently by the same interpreter, using the same spectral information that the classifier learns from. The test set therefore measures agreement with one manual Landsat interpretation protocol, not accuracy against independent surface truth. The paper should not present the 96.3% figure as external validation.
The manuscript should describe the labelling protocol in enough detail to be reproducible, report inter- or intra-interpreter consistency, and state clearly that the 96.3% is agreement with manual photointerpretation. A second interpreter, blind relabelling of a subset, or a site-held-out validation would substantially strengthen the claim. The test points and labels should also be released, because they are central to the manuscript’s main quantitative result.
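A site-held-out validation of the kind suggested above could be organised as in the following Python sketch. It is only an illustration under stated assumptions: the function name and the site identifiers are hypothetical, and the manuscript does not describe such a split.

```python
from collections import defaultdict


def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    site_ids[i] is the site label of labelled point i. All points from the
    held-out site go to the test set, so no site contributes to both sets.
    """
    by_site = defaultdict(list)
    for index, site in enumerate(site_ids):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test = by_site[held_out]
        train = sorted(
            i for site, indices in by_site.items() if site != held_out for i in indices
        )
        yield held_out, train, test


# Hypothetical labels for six points drawn from three sites.
example_sites = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_site_out(example_sites))
```

Accuracy averaged over such folds measures transfer to unseen sites, which a random split of points within the same scenes cannot show.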
The NDSI comparison is not a clean benchmark
The current comparison does not establish that snowman generally outperforms existing snow-mapping approaches. It compares a multi-predictor Random Forest against a simple NDSI/CFMask pipeline, while snowman itself uses CFMask-derived probabilities and ESA WorldCover information as predictors. This makes parts of the comparison partly circular, especially for clouds, water, and cloud-shadow strata. The snow-class improvement may be real and important, but the headline overall-accuracy contrast mixes snow detection with cloud/water detection and with predictors that the model already sees.
The comparison should be reframed around the specific claim that matters: snow detection under canopy, shadow, mixed pixels, and cloud-prone conditions. At minimum, report class-specific snow omission and commission errors, and evaluate the subset of observations that NDSI/CFMask discards but snowman retains. “More observations” is not necessarily better unless those recovered observations are shown to be accurate.
The paper should also benchmark against established fractional snow products where overlap allows. The USGS Landsat Collection 2 fSCA product provides 30 m fractional snow products, including viewable snow and canopy-adjusted ground snow layers; Copernicus HRSI FSC provides 20 m top-of-canopy and on-ground fractional snow cover, with the on-ground layer accounting for tree cover density. This is directly relevant to the manuscript’s novelty claim about canopy handling.
The accuracy assessment lacks the metrics needed for a phenology product
Overall accuracy is not an adequate primary metric here. The class distribution is imbalanced and design-dependent, and the table reports mainly recall-like quantities. It does not show whether pixels classified as snow are actually snow. That omission matters because snow commission and omission have opposite effects on snow cover duration, melt-out, and onset. Please provide the full confusion matrix, class-specific producer’s and user’s accuracy, balanced accuracy, precision, recall, and F1 that account for the clustered sampling design. A good example is Cannistra, A. F., Shean, D. E., & Cristea, N. C. (2021). High-resolution CubeSat imagery and machine learning for detailed snow-covered area. Remote Sensing of Environment, 258, 112399.
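The requested class-specific metrics follow directly from the confusion matrix, as in this minimal Python sketch. The counts are invented for illustration and are not results from the manuscript. The sketch also does not correct for the clustered sampling design, which would need design-based weighting on top of it.

```python
def per_class_metrics(matrix, classes):
    """Per-class producer's accuracy (recall), user's accuracy (precision) and F1.

    matrix[i][j] is the number of points with reference class i that were
    predicted as class j.
    """
    n = len(classes)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    metrics = {}
    for k, name in enumerate(classes):
        correct = matrix[k][k]
        row_sum = sum(matrix[k])
        recall = correct / row_sum if row_sum else float("nan")
        precision = correct / column_sums[k] if column_sums[k] else float("nan")
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else float("nan")
        metrics[name] = {
            "producers_accuracy": recall,
            "users_accuracy": precision,
            "f1": f1,
        }
    return metrics


def overall_accuracy(matrix):
    """Share of all points on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def balanced_accuracy(matrix):
    """Unweighted mean of per-class recall (assumes every class has reference points)."""
    recalls = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    return sum(recalls) / len(recalls)


# Invented example: rows are reference classes, columns are predictions.
CLASSES = ["snow", "land", "cloud"]
MATRIX = [
    [80, 10, 10],   # reference snow
    [30, 160, 10],  # reference land
    [10, 0, 90],    # reference cloud
]
example_metrics = per_class_metrics(MATRIX, CLASSES)
```

In this invented matrix, snow recall and snow user's accuracy differ because a share of reference land points is predicted as snow. That is the commission error that recall-like quantities alone do not reveal.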
The snow cover duration metric depends on unvalidated probability calibration
The paper defines “snow cover duration” as the sum of GAM-predicted snow probabilities over a year. That quantity is interpretable as “days” only if the probabilities are calibrated estimates of snow presence. Yet the PlanetScope comparison explicitly states that classifier probability should not be interpreted directly as fractional snow cover. These two statements are in tension.
The manuscript needs to resolve this. Either demonstrate calibration, for example with reliability diagrams and calibration error, or avoid interpreting the probability integral as snow-cover duration in days. The paper should also state which duration metric users should prefer: raw RF duration, GAM days above 0.5, or summed GAM probability. At present, the workflow produces multiple duration-like quantities but does not justify the primary one.
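The difference between the duration-like quantities, and one way to quantify calibration, can be sketched in Python as follows. This is a hypothetical illustration, not the snowman implementation: the function names, the ten equal-width bins, and the daily probabilities are assumptions made for the example.

```python
def duration_summed(daily_probs):
    """Duration as the sum of daily snow probabilities over the year."""
    return sum(daily_probs)


def duration_thresholded(daily_probs, threshold=0.5):
    """Duration as the number of days with probability at or above threshold."""
    return sum(1 for p in daily_probs if p >= threshold)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted probability and observed snow frequency.

    probs are predicted snow probabilities in [0, 1]; labels are 1 (snow) or
    0 (no snow). Points are grouped into equal-width probability bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    total = len(probs)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(mean_prob - observed)
    return error


# Invented year: 100 days at p = 0.9, 65 days at p = 0.4, 200 days at p = 0.05.
daily = [0.9] * 100 + [0.4] * 65 + [0.05] * 200
summed_days = duration_summed(daily)
threshold_days = duration_thresholded(daily)
```

For this invented series the two metrics differ by roughly 26 days, and the summed value is only interpretable as days if the calibration error of the underlying probabilities is small.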
The phenology validation is too narrow for the global claim
The quantitative phenology validation is limited to two Finnish weather stations, three PlanetScope dates at one Finnish site, and one microclimate comparison at the same general site. The station comparison is also partly self-referential because station melt and onset dates are derived using a similar binomial GAM. PlanetScope validates instantaneous fractional snow patterns on three dates, not multi-year phenology, and the microclimate start-of-season product is explicitly not equivalent to snowmelt.
The manuscript should either add validation in contrasting snow regimes—maritime, alpine, forested, ephemeral, Southern Hemisphere, and older TM/ETM+ periods—or substantially soften the claim that the method works “across any location on Earth.” The current evidence is promising but geographically and climatically narrow.
“Ice” is claimed but not validated
Snow and ice are combined into a single class, and ice-specific performance is not evaluated. Yet the title, abstract, Figure 2 caption, and discussion present the method as snow-and-ice mapping. Without lake, river, glacier, or coastal ice validation, the manuscript should either remove or strongly qualify the ice-mapping claims. A combined “snow or ice” class is not enough to support ice phenology or ice-cover-duration claims.
Software and reproducibility need a versioned release
The package and model objects should be archived in a versioned release linked to the manuscript, preferably with a DOI. The current availability statement points only to GitHub source code. For reproducibility, the archived material should include a package version for future reference.

Citation: https://doi.org/10.5194/egusphere-2026-1268-RC2

Short summary

Snow cover is vanishing fast globally. I developed snowman, a free software tool that uses 40 years of satellite imagery to map snow and ice at fine spatial scales anywhere on Earth. By combining machine learning with statistical modelling, it detects snow more accurately than existing methods. This makes long-term, detailed snow monitoring accessible to any researcher, helping scientists better understand how shrinking snowpack affects water, wildlife, and communities worldwide.

