This work is distributed under the Creative Commons Attribution 4.0 License.
Learning to filter: Snow data assimilation using a Long Short-Term Memory network
Abstract. Trustworthy estimates of snow water equivalent and snow depth are essential for water resource management in snow-dominated regions. While ensemble-based data assimilation techniques, such as the Ensemble Kalman Filter (EnKF), are commonly used in this context to combine model predictions with observations and thereby improve model performance, these ensemble methods are computationally demanding and thus face significant challenges when integrated into time-sensitive operational workflows. To address this challenge, we present a novel approach for data assimilation in snow hydrology based on Long Short-Term Memory (LSTM) networks. By leveraging data from seven diverse study sites across the world to train the network on the output of an EnKF, the proposed framework aims to further unlock the use of data assimilation in snow hydrology by balancing computational efficiency and complexity.
We found that an LSTM-based data assimilation framework achieves performance comparable to EnKF-based state estimation in improving open-loop estimates, with only a small RMSE increase for snow water equivalent (+6 mm on average) and snow depth (+6 cm). All but 2 of the 14 site-specific LSTM configurations improved on the open-loop estimates. The inclusion of a memory component further enhanced LSTM stability and performance, particularly in situations of data sparsity. When trained on long datasets (25 years), this LSTM data assimilation approach also showed promising spatial transferability, with less than a 20 % reduction in accuracy for snow water equivalent and snow depth estimation.
Once trained, the framework is computationally efficient, achieving a 70 % reduction in computational time compared to a parallelized EnKF. Training this new data assimilation approach on data from multiple sites showed that its performance is robust across various climate regimes, during dry and average water-year types, with only a limited drop in performance compared to the EnKF (+6 mm RMSE for SWE and +18 cm RMSE for snow depth). This work paves the way for the use of deep learning for data assimilation in snow hydrology and provides novel insights into an efficient, scalable, and less computationally demanding modeling framework for operational applications.
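The core idea, stated plainly, is supervised learning of the analysis step: the LSTM receives the model forecast, meteorological forcing, and the available observation, and is trained to reproduce the EnKF analysis state. A minimal sketch is given below, assuming PyTorch; the network size, input variables, and tensor shapes are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: train an LSTM to emulate the EnKF analysis step.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class DALSTM(nn.Module):
    def __init__(self, n_inputs: int, n_states: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_states)

    def forward(self, x):
        # x: (batch, time, n_inputs) = forecast states, forcing, observation
        h, _ = self.lstm(x)
        return self.head(h)  # (batch, time, n_states) analysis estimate

# Synthetic stand-ins: 32 sequences of 365 daily steps with 4 inputs
# (e.g. SWE forecast, snow depth forecast, forcing, observation).
inputs = torch.randn(32, 365, 4)
enkf_analysis = torch.randn(32, 365, 2)  # EnKF analysis states as labels

model = DALSTM(n_inputs=4, n_states=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):  # toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), enkf_analysis)
    loss.backward()
    optimizer.step()
```

Once trained, inference reduces to a single forward pass per sequence, which is where the reported 70 % runtime reduction over a parallelized EnKF would come from.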
Competing interests: One author is an editor of The Cryosphere.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 02 Apr 2025)
RC1: 'Comment on egusphere-2025-423', Anonymous Referee #1, 13 Mar 2025
This work developed a surrogate for EnKF-DA using an LSTM network. The introduction and methods sections are well written and structured. However, the results contain several statements that are inconsistent with the plots. More importantly, the results lack sufficient explanation and analysis of why the LSTM performs differently from the EnKF at different sites or in different scenarios. The discussion could benefit from additional comparisons with previous studies and a deeper analysis of the results. Currently, it leans more toward reinforcing the need for LSTMs in data assimilation, which somewhat repeats points already made in the introduction. Therefore, I recommend a major revision before publication.
Line 103-105: What is the source of the meteorological forcing data? Are they derived from gridded datasets?
Table 2: The data time span for each site needs to be mentioned.
Line 171: Forecasted model state is x_k^f
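For reference, the standard EnKF analysis step distinguishes the forecast state x_k^f from the analysis state x_k^a (textbook notation; the paper's own symbols should be checked against this):

```latex
x_k^a = x_k^f + K_k \left( y_k - H_k x_k^f \right),
\qquad
K_k = P_k^f H_k^\top \left( H_k P_k^f H_k^\top + R_k \right)^{-1},
```

where y_k are the observations, H_k the observation operator, P_k^f the forecast error covariance, and R_k the observation error covariance.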
Line 254: Double “the”
Line 271: “predictions”
Line 277-278: Please clarify how the data were split: by individual data points or by continuous time spans?
Line 276: Please clarify what the site-specific limits are here.
Line 280: The inline formula here should not include the star, as the star was previously used to represent the LSTM output, not the input from S3M. Please keep the notation consistent.
Line 288-290: Please use a formula to clarify this configuration. Do you mean that x^f and forcing at both time steps k and k-1 are used as LSTM inputs in the second test? Please refer to Figure 2 for clarity.
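For instance, if the second test indeed feeds the forecast state and forcing at both time steps, a formula along these lines (illustrative notation only, with u denoting the forcing and y_k the assimilated observation) would remove the ambiguity:

```latex
\hat{x}_k = \mathrm{LSTM}\left( x_k^{f},\, x_{k-1}^{f},\, u_k,\, u_{k-1},\, y_k \right).
```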
Line 292-294: This part is confusing. What is the difference between Configuration 1 and Configuration 3? Was a single LSTM selected from Configuration 1 and then applied to other sites? Please clarify.
Line 299-300: Is there a specific reason to randomly sample water years for data splitting rather than using a continuous historical time span to train the model and a continuous future time span to test it? Random sampling can create artificially easier test conditions by allowing test data (time period) to fall between training water years, which may provide the model with indirect information about future conditions.
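A minimal sketch of the distinction (hypothetical years and an illustrative 80/20 split, not taken from the manuscript):

```python
# Contrast random water-year sampling with a continuous chronological split.
import numpy as np

rng = np.random.default_rng(seed=0)
water_years = np.arange(1995, 2020)  # 25 hypothetical water years

# Random split: held-out years are scattered between training years,
# so each test year is bracketed by conditions the model has seen.
shuffled = rng.permutation(water_years)
train_random, test_random = shuffled[:20], shuffled[20:]

# Continuous split: train strictly on the past, test on a later block.
train_cont, test_cont = water_years[:20], water_years[20:]

print(sorted(test_random.tolist()))  # scattered through the record
print(test_cont.tolist())            # contiguous "future" years
```

With the random split, interpolation between adjacent training years can make the test artificially easy compared to a genuine forecast setting.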
The LSTM structure and hyperparameters were not described in this work.
Line 309-311 (Figure 3): Is this result from testing or operational testing? Please clarify.
Line 313-314: It is somewhat difficult to distinguish the EnKF-DA and LSTM boxes in the plots. If the last box in each panel represents LSTM-DA, it suggests that the RMSE values of LSTM-DA for KHT, RME, and FMI-ARC increased compared to EnKF-DA, with KHT showing the largest increase. This appears inconsistent with the narrative presented here. Please check.
Figures 3 & 4: The Nash-Sutcliffe coefficient can be used as a score to evaluate the accuracy of the time series in panels (a)–(d).
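For reference, the Nash-Sutcliffe efficiency is defined as

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left( x_t^{\mathrm{sim}} - x_t^{\mathrm{obs}} \right)^{2}}
                        {\sum_{t=1}^{T} \left( x_t^{\mathrm{obs}} - \bar{x}^{\mathrm{obs}} \right)^{2}},
```

which equals 1 for a perfect match and drops below 0 when the simulation performs worse than the mean of the observations.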
Line 321-324: Why is the LSTM trained with outputs (states) from EnKF-DA more sensitive to the sparsity of observation data? Could you explain this here? Including observation data as an input may introduce artificial errors when filling in missing data in the input.
Line 336-337: Only Figure 5b shows an improvement with the memory component, not panels c and d.
Line 342-348: Cite Figure 6 here.
Line 344: 0.5 m? The reduction shown in Figure 6f is not that large.
Line 346: These strategies were not mentioned or explained in the methods.
Section 3.3: This result does not seem meaningful, as the spatial transferability of all models appears to be poor. Please consider removing it.
Line 370-371: Any explanation for this result?
Section 3.4: Instead of presenting the spatial transferability of a single model, it might be more meaningful to compare and discuss the site-specific LSTM and the multi-site LSTM.
Please refer to the following (this is not my work and there is no need to cite it): Kratzert, Frederik, Martin Gauch, Daniel Klotz, and Grey Nearing. "HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin." Hydrology and Earth System Sciences 28, no. 17 (2024): 4187-4201.
Line 410: No results were shown to support this.
Line 415: 7 sites?
Citation: https://doi.org/10.5194/egusphere-2025-423-RC1