This work is distributed under the Creative Commons Attribution 4.0 License.
Object-based ensemble estimation of snow depth and snow water equivalent over multiple months in Sodankylä, Finland
Abstract. Snowpack characteristics such as snow depth and snow water equivalent (SWE) are widely studied in regions prone to heavy snowfall and long winters. These features are measured in the field via manual or automated observations and over larger spatial scales with stand-alone remote sensing methods. However, individually these methods may struggle to accurately assess snow depth and SWE at local spatial scales of several square kilometers. One way to leverage the benefits of each individual dataset is to link field-based observations with high-resolution remote sensing imagery and then employ machine learning techniques to estimate snow depth and SWE across a broader geographic region. Here, we combined repeat field measurements of snow depth and SWE from six instances between December 2022 and April 2023 in Sodankylä, Finland, with Light Detection and Ranging (LiDAR) and WorldView-2 (WV-2) data to estimate snow depth, SWE, and snow density over a 10 km² local-scale study area. This was achieved with an object-based machine learning ensemble approach: the more numerous snow depth field data were upscaled first, and the estimated local-scale snow depth was then used to aid in estimating SWE over the study area. Snow density was then calculated from the snow depth and SWE estimates. Snow depth peaked in March, SWE shortly after in early April, and snow density at the end of April. The ensemble-based approach showed encouraging success in upscaling both snow depth and SWE. Associations were also identified with carbon- and mineral-based forest surface soils, as well as with dry and wet peatbogs.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3936', Anonymous Referee #1, 26 Mar 2025
Object-based ensemble estimation of snow depth and snow water equivalent over multiple months in Sodankylä, Finland
egusphere-2024-3936
March 2025
General Comments:
Brodylo et al.’s manuscript is well-written, structured clearly, and supported by strong graphical presentation, providing a straightforward exploration into snow depth and snow water equivalent (SWE) estimation using an ensemble machine learning approach. The integration of LiDAR, remote sensing imagery, and in-situ observations is logical and aligns well with the type of studies frequently published in this journal. However, I have several significant concerns regarding the novelty of the approach, methodological clarity, and the limited sample size—particularly for SWE estimation—that need to be thoroughly addressed before the paper can be considered for publication. I have outlined these major concerns, along with specific suggestions for improvement, in detail below.
Major Comments:
1. Currently, the paper's primary novel contributions are unclear to me. While the presented approach effectively integrates established practices (ensemble machine learning methods, LiDAR-based snow depth estimation), the methodological novelty seems incremental and primarily focused on application in the specific context of Sodankylä, Finland. Intuitively, an ensemble approach should outperform individual techniques; however, given the limited sample size—especially with SWE data (only around a dozen observations)—it becomes challenging to conclusively demonstrate superiority over simpler, more traditional methods such as multiple linear regression. Indeed, as highlighted in Table 3, some machine learning models significantly underperform in certain months, likely due to this limited dataset. Thus, at present, the main takeaways and broader scientific significance are somewhat ambiguous. I encourage the authors to clearly articulate the core contributions of their approach, considering the constraints posed by dataset size. If a stronger case for novelty can be made, particularly in comparison to simpler or previously established methods, this would greatly strengthen the manuscript, as I am currently unsure of the main takeaways.

2. Further clarity is needed regarding the training and validation processes for the machine learning models. The authors briefly mention using a "k-fold" validation but do not clearly specify how the data were partitioned into training, validation, and test sets at each step. Important details are missing, such as whether splits were random or sequential—random splits could inadvertently introduce spatial autocorrelation issues. Additionally, specifics on the machine learning implementations are essential. For instance, how deep were the random forest trees allowed to grow? What structure was adopted for training the multi-layer perceptron—including the number of hidden layers, neurons per layer, activation functions, epochs, and optimization methods? Providing visualizations of training and validation curves for the MLP models would also help clarify the model training and generalization processes. These details are crucial for reproducibility and for fully understanding the robustness of the results (a short illustrative sketch of the level of detail I have in mind follows these major comments).
3. Given the inherently spatial nature of snow depth and SWE, I'm curious if the authors considered employing machine learning methods specifically designed to leverage spatial dependencies in data. The current choice of models—MLR, RF, and MLP—generally treats each data point independently, potentially losing valuable spatial context unless explicitly provided as an input feature. Models that explicitly capture spatial information (e.g., convolutional neural networks like U-Nets, or vision transformer approaches) could better represent the spatial variability across diverse land types. Exploring spatially-aware methods, despite your current dataset limitations, could significantly increase the novelty and impact of your study.
4. Finally, I also feel that this paper would really benefit from a more comprehensive comparison to existing approaches in the literature. Although your method is LiDAR-derived, related studies by Bair et al. (2018), King et al. (2020), Liljestrand et al. (2024), Shao et al. (2022), and Vafakhah et al. (2022) (amongst others) have utilized similar ML methodologies (RF and neural-network-based architectures) to predict regional variations of SWE. A clearer positioning of your work in relation to these papers would not only help justify the novelty of your method but also allow readers to better appreciate your contributions relative to the current state-of-the-art approaches. Such contextualization could also probably help address some of the concerns I raise in Comment 1 regarding methodological novelty.
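To make concrete the level of methodological detail Comment 2 asks for, here is a minimal, purely illustrative sketch (scikit-learn; the synthetic data and every hyperparameter value are placeholders I chose, not the authors' actual settings):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for object-level predictors and field snow depth
rng = np.random.default_rng(0)
X = rng.normal(size=(88, 6))                      # 88 snow depth samples, 6 features
y = 0.6 + 0.2 * X[:, 0] + rng.normal(scale=0.1, size=88)

# Explicit split strategy: shuffled 10-fold CV with a fixed seed
cv = KFold(n_splits=10, shuffle=True, random_state=42)

# Hyperparameters stated explicitly so the setup is reproducible
rf = RandomForestRegressor(n_estimators=500, max_depth=10,
                           min_samples_leaf=2, random_state=42)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                                 solver="adam", max_iter=2000, random_state=42))

for name, model in [("RF", rf), ("MLP", mlp)]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean CV R2 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Reporting the setup at roughly this level of specificity (split strategy and seeds, tree depth, network architecture, epochs/iterations, optimizer) would let readers reproduce the results and judge for themselves the risk posed by random splits and spatial autocorrelation.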
Minor Comments:
- Lines 89: With all the different datasets being used here, I wonder if a summary table listing their names, variables, resolution, and source would help better situate readers?
- Lines 162-163: It wasn’t totally clear to me what this RF classification scheme was referring to here? Why is this step necessary?
- Section 3.1: I also don’t fully understand this image segmentation step and how it is “utilized as the spatial unit for image assessment”. Why does this need to be done for this project, and how are the resulting segments used in the models afterwards?
- Lines 189-192: I think this section is important, and I would add a little more detail describing each of these models and how they've been used in other studies, as they really underpin your main results. For instance, I'd mention bootstrapping and aggregation in the RF, and I would rework your description of the ANN, as the linkage to the human nervous system is somewhat spurious and does not clearly describe how it actually works (i.e., a feedforward directed acyclic graph of artificial neurons with nonlinear activation functions).
- Lines 203-204: Do you know why the SVM performance was so poor? I'm wondering if the sample was simply too small for this approach? This goes back to my earlier major point that the same issue with the limited SWE data is also likely impacting the other models. However, it does feel a bit odd to me to simply not include a model in some cases due to poor performance when using an ensemble approach.
- Eqs. 1/2/3: This is personal preference, but these are all very common metrics that don't need to be explicitly defined in this work.
- Lines 258-260: From a physical perspective, what do you think is causing this large swing in performance for the ANN over these months? Is there something about the onset of snow in December that makes this an especially challenging task for the NN?
- Table 1: For this table and the others after it, I am wondering if this would be more interpretable as a bar graph? Comparing so many numbers in a table like this can be a bit challenging.
- Table 2: Similar to my previous table comment
- Figure 5: The red->green color scheme for snow depth can be challenging for color-blind individuals to view, and I would recommend moving to something more accessible (a brief colormap sketch follows these minor comments).
- Lines 318-319: Was the SVM left out because it had bad performance everywhere for SWE? As you state, the RF was also inconsistent for SWE prediction, but was still included in this part of the analysis
- Lines 344-362: I appreciate the detail the authors put into comparing SWE over various land cover types, however this section (and other similar paragraphs) are a bit challenging to parse in their current form. Currently, you list many statistics in a row, and it isn’t fully clear to me what I am to take from all of these stats? I wonder if you could restructure these paragraphs to highlight the most important findings and relate those to what the predictive accuracy means for each land cover type?
- Lines 428-429: When referring to the EA here, it sounds as if it is its own technique, but really it is just a combination of the MLR/RF/MLP. The enhanced performance of the EA arises because the individual models have high variability and biases that largely cancel out, resulting in a more stable prediction. So is this section speaking primarily to the high variability of the individual models?
- Line 430: I would reword this sentence “EA consistently produced the best or second best metrics, and generally produced the best metrics”
- Lines 471-475: Could you have included reanalysis estimates from say ERA5 to provide temperature, humidity and pressure data to your models? While coarse, this would perhaps give you some additional information about the surrounding environmental context at the time of observation?
- Lines 501-502: I would strongly recommend including some code for reproducing at least a subset of these results, perhaps in an interactive notebook uploaded to Google Colab with some test data? Then others could more easily test and build on what you have provided here
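Regarding the Figure 5 color scheme comment above, a minimal matplotlib sketch of the kind of perceptually uniform, colorblind-friendly alternative I mean (the gridded snow depth values here are synthetic placeholders, not the authors' estimates):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic snow depth grid (m) standing in for the mapped estimates in Fig. 5
depth = np.random.default_rng(0).uniform(0.3, 1.1, size=(50, 50))

fig, ax = plt.subplots()
im = ax.imshow(depth, cmap="viridis")   # or "cividis"; both are colorblind-safe
fig.colorbar(im, ax=ax, label="Snow depth (m)")
ax.set_title("Estimated snow depth")
plt.show()
```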
References
Bair, E. H., Abreu Calfa, A., Rittger, K., & Dozier, J. (2018). Using machine learning for real-time estimates of snow water equivalent in the watersheds of Afghanistan. The Cryosphere, 12(5), 1579–1594. https://doi.org/10.5194/tc-12-1579-2018
King, F., Erler, A. R., Frey, S. K., & Fletcher, C. G. (2020). Application of machine learning techniques for regional bias correction of snow water equivalent estimates in Ontario, Canada. Hydrology and Earth System Sciences, 24(10), 4887–4902. https://doi.org/10.5194/hess-24-4887-2020
Liljestrand, D., Johnson, R., Skiles, S. M., Burian, S., & Christensen, J. (2024). Quantifying regional variability of machine-learning-based snow water equivalent estimates across the Western United States. Environmental Modelling & Software, 177, 106053. https://doi.org/10.1016/j.envsoft.2024.106053
Shao, D., Li, H., Wang, J., Hao, X., Che, T., & Ji, W. (2022). Reconstruction of a daily gridded snow water equivalent product for the land region above 45° N based on a ridge regression machine learning approach. Earth System Science Data, 14(2), 795–809. https://doi.org/10.5194/essd-14-795-2022
Vafakhah, M., Nasiri Khiavi, A., Janizadeh, S., & Ganjkhanlo, H. (2022). Evaluating different machine learning algorithms for snow water equivalent prediction. Earth Science Informatics, 15(4), 2431–2445. https://doi.org/10.1007/s12145-022-00846-z
Citation: https://doi.org/10.5194/egusphere-2024-3936-RC1
RC2: 'Comment on egusphere-2024-3936', Anonymous Referee #2, 24 Apr 2025
The paper “Object-based ensemble estimation of snow depth and snow water equivalent over multiple months in Sodankylä, Finland,” authored by Brodylo et al., investigates the use of four machine learning techniques and their ensemble for snow depth estimation. The estimated snow depths were then used to estimate SWE. Finally, the ratio of the modeled SWE to snow depths was taken to estimate snow density. In my estimation, the paper is well written. However, I have major comments regarding the methodological clarity.
- In section 3.2, the authors mentioned using Artificial Neural Networks (ANNs), among other models. However, they did not mention the exact architecture of the ANN (e.g., feed-forward, convolutional, transformers, etc.) used. Without this information, it is difficult to evaluate the appropriateness of the ANN architecture used in the study.
- In section 3.2, the details of the hyperparameters of the ML models (SVM, RF, and ANN) used were not mentioned. For example, for ANN, in addition to the architecture type, it would be beneficial to add the number of layers and neurons per layer, the activation function used, regularization (if any), the number of epochs, and other important hyperparameters used. For SVM, the kernel used, gamma, tolerance, and other important hyperparameters should be specified. For RF, the number of trees, the maximum depth, the minimum number of samples required to be at a leaf node, the minimum number of samples required to split an internal node, and other important hyperparameters should be specified. These details are essential for reproducibility.
- Also, in section 3.2, the authors mentioned using 10-fold cross-validation. However, important details are missing.
- Was the 10-fold CV done on the entire dataset or just the training set?
- No details about the train/test split ratio and strategy (random, stratified, etc.) were mentioned.
- During the CV, how were hyperparameter configurations selected? Was it a grid search or Bayesian? A table of the hyperparameters tuned and their optimal values can be placed in the appendix.
- In section 3.3, the authors used Pearson’s correlation as a measure of prediction accuracy. However, a perfect correlation does not necessarily mean that the model is good or that the predicted values are close to the true values. For example, cor(y, y) = cor(y, 20y) = cor(y, 300y) = cor(y, 10000y) = 1. That is to say, a model could be doing significantly worse and still have a perfect correlation. I encourage the authors to use the coefficient of determination instead. Please do not square the correlation coefficient; you can use r2_score in sklearn (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html) or see this link for the formula (https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score). A minimal numerical sketch illustrating this point follows these comments.
- This study uses 13 points of SWE and 88 points of depths to train the ML models. This is an extremely limited sample size for training any machine learning model, especially when trying to predict across 37,917 image objects with varying characteristics. This raises a serious concern about overfitting. With such a small training set, for example, for the SWE estimation problem, there's a high risk that the model would simply memorize the patterns in those 13 objects rather than learning generalizable relationships. Therefore, the authors should comment on how to validate the SWE across the upscaled 10 km². How did the authors ensure that the model wasn't overfitting for the SWE estimates? These points should be added to the discussion.
- Line 204: The model weights should use another metric since correlation is not reliable based on comment 4. Also, I think adding the weighting formula would be helpful to readers.
- Line 203: SVM was dropped due to poor performance. Could you please quantify "poor" in this scenario?
- Figure 3: One might think field snow depth and field SWE are inputs. The authors should clarify in the caption that these are the outcome variables, not inputs, or represent the output data with a different color.
- Tables 1-4: Were these metrics obtained from the entire dataset or just the testing set?
- The authors should comment on the transferability of the ML models in this study. Can we grab this model and apply it elsewhere? The authors could dedicate a paragraph to model transferability in the discussion.
- Line 167: A period is missing between "scale" and "In OBIA".
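To make the correlation comment above concrete, a minimal numerical sketch (SciPy/scikit-learn; the numbers are synthetic and only illustrate the metric behavior):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_true = rng.uniform(50, 150, size=13)    # e.g. 13 SWE observations (mm)
y_pred = 20 * y_true                      # grossly mis-scaled predictions

r, _ = pearsonr(y_true, y_pred)
print(f"Pearson r = {r:.2f}")             # 1.00, despite the poor fit
print(f"R2 score  = {r2_score(y_true, y_pred):.1f}")  # strongly negative
```

The coefficient of determination penalizes the scale and offset errors that the correlation ignores, which is why I suggest it both for reporting accuracy and for the model weights mentioned at Line 204.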
Citation: https://doi.org/10.5194/egusphere-2024-3936-RC2