the Creative Commons Attribution 4.0 License.
Identification and correction of snow depth bias in ERA5 datasets over Central Europe using machine learning
Abstract. Accurate estimation of snow depth is a crucial problem from both the meteorological and the hydrological point of view. Global and regional reanalyses still struggle to address it, mostly because snow varies spatially at scales far finer than the current resolution of these datasets. In this study, snow depth estimates from the Copernicus reanalyses ERA5 and ERA5-Land are compared and evaluated against point measurements in Poland, the Czech Republic, and Slovakia for the winter seasons 2001/2002–2020/2021. Additionally, a Random Forest (RF) model is developed to reduce the identified errors, based on various environmental variables and parameters derived from the reanalyses and a digital elevation model. As mountains are the main snow water reservoirs in Central Europe, the model is then used to spatially downscale snow depth over a fine-scale subdomain in mountainous terrain.
For both reanalyses, the deviations are relatively small in flat or gently rolling terrain (below 500 m a.s.l.). Here, ERA5 (0.25°) outperforms ERA5-Land (0.1°) because of data assimilation. Since only SYNOP measurements are assimilated, errors are lowest at these stations, although lower-ranked stations are also affected. In more complex terrain, both reanalyses underestimate snow, and the underestimation increases with elevation. In this area, ERA5-Land is slightly less biased, owing to its higher resolution and to the fact that observations from mountainous sites are often excluded from data assimilation in ERA5. The proposed RF model improves the accuracy of estimation by around 48% relative to the best reanalysis. The results of spatial downscaling certainly provide added value for snow estimation in complex terrain. Although they cannot be considered entirely valid and reliable, since not all factors determining the spatial variability of snow at such a resolution are taken into account, they might be useful for future studies concerning, e.g., the climatological variability of snow with respect to altitudinal zonation.
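The evaluation summarized above (bias that depends on elevation, and a percentage gain of the RF model over the best reanalysis) can be illustrated with a short sketch. It assumes paired station/estimate snow depths in cm and uses RMSE as the accuracy measure; the abstract does not name the metric behind the 48% figure, so RMSE, the class and function names, and the sample values are hypothetical. Only the 500 m threshold comes from the text.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Pair:
    """One day at one station: observed vs. estimated snow depth (hypothetical structure)."""
    elevation_m: float   # station elevation (m a.s.l.)
    observed_cm: float   # in situ snow depth
    estimate_cm: float   # reanalysis or RF estimate at the station


def bias(pairs):
    """Mean error in cm; negative values indicate underestimation of snow."""
    return sum(p.estimate_cm - p.observed_cm for p in pairs) / len(pairs)


def rmse(pairs):
    """Root-mean-square error in cm."""
    return sqrt(sum((p.estimate_cm - p.observed_cm) ** 2 for p in pairs) / len(pairs))


def by_elevation(pairs, threshold_m=500.0):
    """Split into lowland (< threshold) and more complex terrain (>= threshold)."""
    low = [p for p in pairs if p.elevation_m < threshold_m]
    high = [p for p in pairs if p.elevation_m >= threshold_m]
    return low, high


def relative_improvement(rmse_reference, rmse_model):
    """Percentage reduction of RMSE relative to a reference (e.g. the best reanalysis)."""
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    pairs = [Pair(200, 10, 9), Pair(350, 12, 11), Pair(900, 40, 25), Pair(1400, 80, 50)]
    low, high = by_elevation(pairs)
    print(f"lowland bias {bias(low):.1f} cm, RMSE {rmse(low):.1f} cm")
    print(f"mountain bias {bias(high):.1f} cm, RMSE {rmse(high):.1f} cm")
    print(f"improvement: {relative_improvement(8.0, 4.0):.0f}%")
```

Stratifying by elevation before averaging keeps the small lowland errors from masking the larger mountain underestimation.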
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-5084', Anonymous Referee #1, 17 Dec 2025
-
AC1: 'Reply on RC1', Gabriel Stachura, 01 Feb 2026
Dear Reviewer,
we are grateful for your suggestions regarding the manuscript. All of them were very relevant and undoubtedly improved the quality of the paper. Attached you can find our response to each of your comments. We remain open to suggestions for further improvement.
-
RC2: 'Comment on egusphere-2025-5084', Anonymous Referee #2, 19 Dec 2025
This article evaluates the accuracy of snow-depth estimates from the Copernicus ERA5 and ERA5-Land reanalyses using station observations from Central Europe for winters 2001/2002–2020/2021 and shows that both products perform reasonably well in lowland areas but increasingly underestimate snow in complex mountainous terrain. Additionally, the authors developed a Random Forest model for estimating snow depth that achieved an improvement of approximately 48% over the best reanalysis product. The RF model was also tested for downscaling snow depth in mountainous areas.
The article is mostly well written, but a few points would benefit from further clarification; see the comments below. As the authors discuss, ERA5 and ERA5-Land have been evaluated against meteorological data in previous studies; the novelty here lies in their assessment over Central Europe. I also think the paper would benefit from testing the RF model with spatially split data to evaluate how well the model performs in areas with no reference data available.
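The spatial split suggested here can be sketched as a leave-stations-out partition: every record of a given station goes to exactly one of the training, validation, or test sets, so the validation and test stations are locations the model never saw. The record layout, the `station_id` key, the 70/15/15 proportions, and the seeded shuffle are assumptions for illustration, not the setup of the manuscript. The `shared_stations` helper also covers the related question of whether validation stations were used in training.

```python
import random


def spatial_split(records, station_key="station_id",
                  fractions=(0.7, 0.15, 0.15), seed=42):
    """Partition records into train/validation/test subsets by station.

    All records of a station go to the same subset, so validation and test
    scores describe performance at locations the model has never seen.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    stations = sorted({r[station_key] for r in records})
    random.Random(seed).shuffle(stations)  # reproducible station order
    n_train = round(fractions[0] * len(stations))
    n_val = round(fractions[1] * len(stations))
    groups = {
        "train": set(stations[:n_train]),
        "validation": set(stations[n_train:n_train + n_val]),
        "test": set(stations[n_train + n_val:]),
    }
    split = {name: [] for name in groups}
    for r in records:
        for name, ids in groups.items():
            if r[station_key] in ids:
                split[name].append(r)
                break
    return split, groups


def shared_stations(groups):
    """Station IDs present in training and in any evaluation subset (should be empty)."""
    return groups["train"] & (groups["validation"] | groups["test"])


if __name__ == "__main__":
    # Hypothetical data: 20 stations with 10 records each.
    records = [{"station_id": f"S{i % 20:02d}", "snow_depth_cm": i % 50}
               for i in range(200)]
    split, groups = spatial_split(records)
    print({name: len(rows) for name, rows in split.items()})
    print("leaking stations:", shared_stations(groups))
```

Summary metrics computed separately on each subset then show how well the model generalizes to new locations. scikit-learn's `GroupShuffleSplit` and `GroupKFold` (in `sklearn.model_selection`) implement the same grouping idea if the model is trained there.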
Comments:
L202 Can you comment on whether 70 predictors is within the normal range for this type of model?
L204 Why was the number of trees set to 100?
L214 Can you elaborate a bit more on why no spatial split of the data was tested, even for just a few years? It would be interesting to see how the model performs at new locations.
Figure 2 The caption text is a little confusing; maybe clarify that the left side shows the input predictors and the right side the target (station data) used for training.
L248 I recommend adding subsections to the Results section to make it easier to follow.
L248-263 This paragraph is quite long; consider dividing it into two parts.
L260 What about stations near sea level but away from the shore?
Figure 7 It is quite hard to distinguish the different years in the figure (even the bolded ones).
L312 It might be a good idea to remind the reader of the grid resolutions here (or in the Figure 8 caption).
L315 Can you clarify whether data from the stations used to validate the downscaling were also used for training the model?
L424 There are two commas after “all”
L455 While it is clear to most readers what “this part of Central Europe” means, it might be a good idea to be a bit more precise here.
Citation: https://doi.org/10.5194/egusphere-2025-5084-RC2
AC2: 'Reply on RC2', Gabriel Stachura, 01 Feb 2026
Dear Reviewer,
we are grateful for your suggestions. The manuscript has been revised accordingly. We believe that your remarks have made our paper clearer and more informative. Attached you can find a detailed list of our responses to each of your comments.
-
RC3: 'Comment on egusphere-2025-5084', Anonymous Referee #3, 22 Dec 2025
Stachura and Ustrnul evaluate snow depth from ERA5 and ERA5-Land for Central Europe using in situ measurements and train a random forest model to downscale snow depth. The topic is in line with TC and relevant; it also seems to be the first such study conducted in Central Europe. The manuscript is well written and also thoroughly discusses its own limitations. I have only a few general remarks:
- One major element missing from the manuscript is a discussion of the elevation mismatch between the coarse reanalysis grid cells and the stations. This matters for the evaluation, because all previous studies indicate that errors drop considerably once these differences are accounted for, and also for the RF, where the elevation difference could be a key input variable.
- In the RF, date, day, month, and year are used as inputs. Would year and date be available in an operational setting? According to the variable importance analysis, they seem to have some influence. Would it make sense to test a model without these variables?
- Sec. 2.5: It is unclear how you split the data into training, test, and validation sets. Was a validation set used at all? Like the previous reviewers, I strongly suggest including a validation set in the spatial domain. Moreover, it could be useful to give summary metrics for the different sets (training, test, validation) to show how well the model generalizes.
- Example downscaling: it is unclear whether the authors interpolated surface meteorology from stations using bicubic interpolation, or which variables were interpolated to perform the downscaling. Note that simple bicubic interpolation is not appropriate for variables with a strong elevation dependency, such as temperature or humidity. I do not know whether this might explain the errors found. Of course, it is difficult to validate such a dataset, but have you considered remote sensing products based on MODIS, such as Global SnowPack from DLR or ESA Snow CCI? You would have to convert snow depth to snow presence, but it could give you some independent spatial information.
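The elevation adjustment implied by these remarks can be sketched as a reduce-interpolate-restore scheme: coarse-grid temperatures are first reduced to a common reference level with a lapse rate, interpolated horizontally, and then brought back to the fine-grid (DEM) elevation. The constant 6.5 K per km lapse rate is an assumption (winter inversions in Central European valleys can make it a poor choice), bilinear interpolation stands in for bicubic for brevity, and all function names are hypothetical. The station-minus-grid-cell elevation difference mentioned as a possible RF predictor is included as a one-line helper.

```python
# Assumed constant environmental lapse rate (6.5 K per km); real winter
# lapse rates vary and can even be inverted.
LAPSE_RATE_K_PER_M = 0.0065


def to_reference_level(temp_c, elev_m, lapse=LAPSE_RATE_K_PER_M):
    """Reduce a temperature to 0 m a.s.l. with a constant lapse rate."""
    return temp_c + lapse * elev_m


def from_reference_level(temp0_c, elev_m, lapse=LAPSE_RATE_K_PER_M):
    """Bring a reference-level temperature back to a given elevation."""
    return temp0_c - lapse * elev_m


def bilinear(q11, q21, q12, q22, fx, fy):
    """Bilinear interpolation inside one grid cell; fx, fy in [0, 1]."""
    return (q11 * (1 - fx) * (1 - fy) + q21 * fx * (1 - fy)
            + q12 * (1 - fx) * fy + q22 * fx * fy)


def downscale_temperature(corner_temps, corner_elevs, fx, fy, target_elev_m):
    """Interpolate coarse temperatures to a fine-grid point, accounting for elevation.

    corner_temps and corner_elevs hold the values at the four surrounding
    coarse grid points in the order (q11, q21, q12, q22).
    """
    reduced = [to_reference_level(t, z) for t, z in zip(corner_temps, corner_elevs)]
    t0 = bilinear(*reduced, fx, fy)
    return from_reference_level(t0, target_elev_m)


def elevation_difference(station_elev_m, gridcell_elev_m):
    """Candidate predictor: positive when the station lies above the grid-cell elevation."""
    return station_elev_m - gridcell_elev_m


if __name__ == "__main__":
    # Hypothetical cell: all corners at 800 m and -2 degC, target point at 1800 m.
    print(downscale_temperature([-2.0] * 4, [800.0] * 4, 0.5, 0.5, 1800.0))
    print(elevation_difference(1500.0, 1100.0))
```

With all four corners at 800 m and -2 °C, a target at 1800 m comes out at about -8.5 °C, whereas purely horizontal interpolation would return -2 °C everywhere in the cell.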
Minor points:
- L69: Avanzi and Fontrodona are not appropriate references for the statement.
- L90: I guess topographic complexity can also be high in the Americas and HMA (High Mountain Asia), depending on where you are.
Citation: https://doi.org/10.5194/egusphere-2025-5084-RC3
AC3: 'Reply on RC3', Gabriel Stachura, 01 Feb 2026
Dear Reviewer,
we are very grateful for all the suggestions you made. They have been incorporated throughout the manuscript. In particular, a spatial split has been introduced, and data from the Global SnowPack database have been added at the verification stage of the spatial downscaling experiment. Please find attached a detailed list of responses to your comments. We remain open to further improvements.
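A verification of the kind described (downscaled snow depth against a MODIS-based snow cover product) can be sketched as a binary contingency table: snow depth is converted to snow presence and compared pixel by pixel. The 1 cm depth threshold, the treatment of cloud-obscured pixels as `None`, the chosen scores, and the function names are assumptions for illustration, not the procedure used in the revised manuscript.

```python
from typing import Optional, Sequence


def snow_presence(depth_cm: float, threshold_cm: float = 1.0) -> bool:
    """Convert modelled snow depth to binary snow cover (threshold is an assumption)."""
    return depth_cm >= threshold_cm


def compare_snow_cover(model_depth_cm: Sequence[float],
                       observed_cover: Sequence[Optional[bool]],
                       threshold_cm: float = 1.0) -> dict:
    """Contingency-table scores for model snow presence vs. a satellite snow map.

    observed_cover uses None for pixels without a valid observation
    (e.g. clouds); such pixels are skipped.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for depth, obs in zip(model_depth_cm, observed_cover):
        if obs is None:
            continue
        pred = snow_presence(depth, threshold_cm)
        if pred and obs:
            hits += 1
        elif obs:
            misses += 1
        elif pred:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    nan = float("nan")
    return {
        "n": total,
        "accuracy": (hits + correct_negatives) / total if total else nan,
        # probability of detection: share of observed snow that the model also has
        "pod": hits / (hits + misses) if hits + misses else nan,
        # false alarm ratio: share of modelled snow not seen by the satellite
        "far": false_alarms / (hits + false_alarms) if hits + false_alarms else nan,
    }


if __name__ == "__main__":
    # Hypothetical pixels: model depths and satellite snow flags (None = cloud).
    depths = [0.0, 0.5, 3.0, 12.0, 20.0, 40.0]
    cover = [False, True, True, None, False, True]
    print(compare_snow_cover(depths, cover))
```

Reporting probability of detection and false alarm ratio alongside overall accuracy helps when snow-free pixels dominate the comparison.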
-
This study evaluates ERA5 and ERA5-Land snow depth estimates using in situ observations in Poland, the Czech Republic, and Slovakia. Additionally, it explores the potential of machine learning for improving snow depth estimates in complex terrain. The manuscript is generally well-written, and the methods and results are clearly presented. The inclusion of a machine learning approach provides valuable new insights.
However, some improvements are needed to strengthen the manuscript:
Minor Comments