the Creative Commons Attribution 4.0 License.
Object-based ensemble estimation of snow depth and snow water equivalent over multiple months in Sodankylä, Finland
Abstract. Snowpack characteristics such as snow depth and snow water equivalent (SWE) are widely studied in regions prone to heavy snowfall and long winters. These features are measured in the field via manual or automated observations and over larger spatial scales with stand-alone remote sensing methods. However, individually these methods may struggle with accurately assessing snow depth and SWE in local spatial scales of several square kilometers. One method for leveraging the benefits of each individual dataset is to link field-based observations with high-resolution remote sensing imagery and then employ machine learning techniques to estimate snow depth and SWE across a broader geographic region. Here, we combined field-based repeat snow depth and SWE measurements over six instances from December 2022 to April 2023 in Sodankylä, Finland with Light Detection and Ranging (LiDAR) and WorldView-2 (WV-2) data to estimate snow depth, SWE, and snow density over a 10 km2 local scale study area. This was achieved with an object-based machine learning ensemble approach by first upscaling more numerous snow depth field data and then utilizing the estimated local scale snow depth to aid in estimating SWE over the study area. Snow density was then calculated from snow depth and SWE estimates. Snow depth peaked in March, SWE shortly after in early April, and snow density at the end of April. The ensemble-based approach had encouraging success with upscaling snow depth and SWE. Associations were also identified with carbon- and mineral-based forest surface soils, alongside dry and wet peatbogs.
Status: open (until 30 Apr 2025)
RC1: 'Comment on egusphere-2024-3936', Anonymous Referee #1, 26 Mar 2025
egusphere-2024-3936
March 2025
General Comments:
Brodylo et al.’s manuscript is well-written, structured clearly, and supported by strong graphical presentation, providing a straightforward exploration into snow depth and snow water equivalent (SWE) estimation using an ensemble machine learning approach. The integration of LiDAR, remote sensing imagery, and in-situ observations is logical and aligns well with the type of studies frequently published in this journal. However, I have several significant concerns regarding the novelty of the approach, methodological clarity, and the limited sample size—particularly for SWE estimation—that need to be thoroughly addressed before the paper can be considered for publication. I have outlined these major concerns, along with specific suggestions for improvement, in detail below.
Major Comments:
1. Currently, the paper's primary novel contributions are unclear to me. While the presented approach effectively integrates established practices (ensemble machine learning methods, LiDAR-based snow depth estimation), the methodological novelty seems incremental and primarily focused on application in the specific context of Sodankylä, Finland. Intuitively, an ensemble approach should outperform individual techniques; however, given the limited sample size—especially with SWE data (only around a dozen observations)—it becomes challenging to conclusively demonstrate superiority over simpler, more traditional methods such as multiple linear regression. Indeed, as highlighted in Table 3, some machine learning models significantly underperform in certain months, likely due to this limited dataset. Thus, at present, the main takeaways and broader scientific significance are somewhat ambiguous. I encourage the authors to clearly articulate the core contributions of their approach, considering the constraints posed by dataset size. If a stronger case for novelty can be made, particularly in comparison to simpler or previously established methods, this would greatly strengthen the manuscript.
2. Further clarity is needed regarding the training and validation processes for the machine learning models. The authors briefly mention using a "k-fold" validation but do not clearly specify how the data were partitioned into training, validation, and test sets at each step. Important details are missing, such as whether splits were random or sequential—random splits could inadvertently introduce spatial autocorrelation issues. Additionally, specifics on the machine learning implementations are essential. For instance, how deep were the random forest trees allowed to grow? What structure was adopted for training the multi-layer perceptron—including the number of hidden layers, neurons per layer, activation functions, epochs, and optimization methods? Providing visualizations of training and validation curves for MLP models would also help clarify the model training and generalization processes. These details are crucial for reproducibility and fully understanding the robustness of the results.
3. Given the inherently spatial nature of snow depth and SWE, I'm curious if the authors considered employing machine learning methods specifically designed to leverage spatial dependencies in data. The current choice of models—MLR, RF, and MLP—generally treats each data point independently, potentially losing valuable spatial context unless explicitly provided as an input feature. Models that explicitly capture spatial information (e.g., convolutional neural networks like U-Nets, or vision transformer approaches) could better represent the spatial variability across diverse land types. Exploring spatially-aware methods, despite your current dataset limitations, could significantly increase the novelty and impact of your study.
4. Finally, I also feel that this paper would really benefit from a more comprehensive comparison to existing approaches in the literature. Although your method is LiDAR-derived, related studies by Bair et al. (2018), King et al. (2020), Liljestrand et al. (2024), Shao et al. (2022), and Vafakhah et al. (2022) (amongst others) have utilized similar ML methodologies (RF and neural-network-based architectures) to predict regional variations of SWE. A clearer positioning of your work in relation to these papers would not only help justify the novelty of your method but also allow readers to better appreciate your contributions relative to the current state-of-the-art approaches. Such contextualization could also probably help address some of the concerns I raise in Comment 1 regarding methodological novelty.
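To make Comments 2 and 3 concrete, here is a hypothetical sketch of the kind of detail being requested: explicit MLP hyperparameters, a grouped k-fold split (groups = contiguous spatial blocks) to limit spatial-autocorrelation leakage between folds, and simple neighborhood statistics as a lightweight way to feed spatial context to point-wise models. All data, variable names, and hyperparameter values here are synthetic and illustrative, not the authors' actual configuration.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.model_selection import GroupKFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
elev = rng.normal(size=(40, 40))  # stand-in for a LiDAR-derived terrain grid
# Synthetic snow-depth target that depends on both the point value and its surroundings
depth = 0.5 * elev + 0.3 * uniform_filter(elev, 5) + rng.normal(scale=0.1, size=(40, 40))

# Point-wise features: raw value plus a local 3x3 neighborhood mean (spatial context)
X = np.column_stack([elev.ravel(), uniform_filter(elev, 3).ravel()])
y = depth.ravel()
# 16 contiguous 10x10 spatial blocks; grouping by block keeps neighbors out of the test fold
blocks = (np.arange(40)[:, None] // 10 * 4 + np.arange(40)[None, :] // 10).ravel()

rmses = []
for tr, te in GroupKFold(n_splits=4).split(X, y, groups=blocks):
    # Hyperparameters stated explicitly, as requested in Comment 2
    mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                       solver="adam", max_iter=1000, random_state=0)
    mlp.fit(X[tr], y[tr])
    rmses.append(float(np.sqrt(np.mean((mlp.predict(X[te]) - y[te]) ** 2))))

print(f"block-wise RMSE: {np.mean(rmses):.3f} (sd {np.std(rmses):.3f})")
```

Reporting fold-wise scores from a grouped split like this would directly address the reproducibility and leakage concerns above without requiring a full CNN architecture.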
Minor Comments:
- Line 89: With all the different datasets being used here, I wonder if a summary table listing their names, variables, resolution, and source would help better situate readers?
- Lines 162-163: It wasn’t totally clear to me what this RF classification scheme was referring to here? Why is this step necessary?
- Section 3.1: I also don’t fully understand this image segmentation step and how it is “utilized as the spatial unit for image assessment”. Why does this need to be done for this project, and how are the resulting segments used in the models afterwards?
- Lines 189-192: I think this section is important, and I would add a little more detail describing each of these models and how they’ve been used in other studies, as they really underpin your main results. For instance, I’d mention bootstrapping and aggregation in the RF, and I would rework your description of the ANN, as the linkage to the human nervous system is somewhat spurious and does not clearly describe how it actually works (i.e., a feedforward directed acyclic graph of artificial neurons with nonlinear activation functions).
- Lines 203-204: Do you know why the SVM performance was so poor? I’m wondering if the sample was simply too small for this approach? This goes back to my earlier major point that the same issue with the limited SWE data is also likely impacting the other models. However, it does feel a bit odd to me to simply exclude a model in some cases due to poor performance when using an ensemble approach.
- Eqs. 1/2/3: This is personal preference but these are all very common metrics that don’t need to be explicitly defined in this work
- Lines 258-260: From a physical perspective, what do you think is causing this large swing in performance for the ANN over these months? Is there something about the onset of snow in December that makes this an especially challenging task for the NN?
- Table 1: For this table and the others after, I am wondering if this would be more interpretable as a bar graph? Comparing so many numbers in a table like this can be a bit challenging
- Table 2: Similar to my previous table comment
- Figure 5: The red->green color scheme for snow depth can be challenging to view for color blind individuals, and I would recommend moving to something more accessible
- Lines 318-319: Was the SVM left out because it had bad performance everywhere for SWE? As you state, the RF was also inconsistent for SWE prediction, but was still included in this part of the analysis
- Lines 344-362: I appreciate the detail the authors put into comparing SWE over various land cover types, however this section (and other similar paragraphs) are a bit challenging to parse in their current form. Currently, you list many statistics in a row, and it isn’t fully clear to me what I am to take from all of these stats? I wonder if you could restructure these paragraphs to highlight the most important findings and relate those to what the predictive accuracy means for each land cover type?
- Lines 428-429: When referring to EA here, it sounds as if it is its own technique, but really it is just a combination of the MLR/RF/MLP. This enhanced performance of the EA arises because the individual models have high variability and biases that mostly cancel out, resulting in a more stable prediction. So is this section speaking primarily to the high variability of the individual models?
- Line 430: I would reword this sentence “EA consistently produced the best or second best metrics, and generally produced the best metrics”
- Lines 471-475: Could you have included reanalysis estimates from say ERA5 to provide temperature, humidity and pressure data to your models? While coarse, this would perhaps give you some additional information about the surrounding environmental context at the time of observation?
- Lines 501-502: I would strongly recommend including some code for reproducing at least a subset of these results, perhaps in an interactive notebook uploaded to Google Colab with some test data? Then others could more easily test and build on what you have provided here
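On the ensemble-average and reproducibility points above (Lines 428-429 and 501-502), a minimal, self-contained sketch of what such a shared notebook might contain: the EA here is read as simply the mean of the MLR/RF/MLP predictions, so biases of opposite sign partially cancel. All data are synthetic and the model settings are illustrative, not the authors' actual configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4))
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.2, size=150)
X_tr, X_te, y_tr, y_te = X[:100], X[100:], y[:100], y[100:]

# The three base learners named in the manuscript (settings here are placeholders)
models = [LinearRegression(),
          RandomForestRegressor(n_estimators=100, random_state=0),
          MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)]

# Fit each model and stack their test-set predictions column-wise
preds = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in models])
ea = preds.mean(axis=1)  # ensemble average: unweighted mean of the three predictions

rmse = lambda p: float(np.sqrt(np.mean((p - y_te) ** 2)))
print("individual RMSEs:", [round(rmse(preds[:, i]), 3) for i in range(3)])
print("ensemble RMSE:", round(rmse(ea), 3))
```

Publishing even a toy pipeline like this alongside a subset of the field data would let readers verify the averaging step and experiment with alternative weightings.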
References
Bair, E. H., Abreu Calfa, A., Rittger, K., & Dozier, J. (2018). Using machine learning for real-time estimates of snow water equivalent in the watersheds of Afghanistan. The Cryosphere, 12(5), 1579–1594. https://doi.org/10.5194/tc-12-1579-2018
King, F., Erler, A. R., Frey, S. K., & Fletcher, C. G. (2020). Application of machine learning techniques for regional bias correction of snow water equivalent estimates in Ontario, Canada. Hydrology and Earth System Sciences, 24(10), 4887–4902. https://doi.org/10.5194/hess-24-4887-2020
Liljestrand, D., Johnson, R., Skiles, S. M., Burian, S., & Christensen, J. (2024). Quantifying regional variability of machine-learning-based snow water equivalent estimates across the Western United States. Environmental Modelling & Software, 177, 106053. https://doi.org/10.1016/j.envsoft.2024.106053
Shao, D., Li, H., Wang, J., Hao, X., Che, T., & Ji, W. (2022). Reconstruction of a daily gridded snow water equivalent product for the land region above 45° N based on a ridge regression machine learning approach. Earth System Science Data, 14(2), 795–809. https://doi.org/10.5194/essd-14-795-2022
Vafakhah, M., Nasiri Khiavi, A., Janizadeh, S., & Ganjkhanlo, H. (2022). Evaluating different machine learning algorithms for snow water equivalent prediction. Earth Science Informatics, 15(4), 2431–2445. https://doi.org/10.1007/s12145-022-00846-z
Citation: https://doi.org/10.5194/egusphere-2024-3936-RC1