the Creative Commons Attribution 4.0 License.
Evolving beyond collapse: An adaptive particle batch smoother for cryospheric data assimilation
Abstract. We present a new adaptive particle-based data assimilation scheme for cryospheric applications that leverages promising developments in importance sampling. Beyond our cryospheric focus, the scheme has the potential to be applied directly to the closely related fields of land surface and hydrological data assimilation as well as more general geoscientific Bayesian inference problems. The proposed approach seeks to combine some of the advantages of two widely used classes of schemes: particle methods and iterative ensemble Kalman methods. Specifically, it extends the Particle Batch Smoother (PBS), which is commonly used in cryospheric data assimilation, with the Adaptive Multiple Importance Sampling algorithm. This adaptive formulation transforms the PBS into an iterative scheme with improved resilience against ensemble collapse and the ability to implement early-stopping strategies. As such, computational cost is automatically adapted to the complexity of the problem at hand, even down to the grid-cell and water-year level in distributed multiyear simulations.
In homage to the schemes that it builds on, we coin this new algorithm the Adaptive Particle Batch Smoother (AdaPBS) and we test it across a range of scenarios. First, we conducted an intercomparison of some of the most commonly used cryospheric data assimilation algorithms using Markov Chain Monte Carlo (MCMC) simulation as a costly gold-standard benchmark in a simplified temperature index model assimilating snow depth observations. We further evaluated AdaPBS by assimilating snow depth observations from the ESMSnowMIP project at 6 different sites spanning 3 continents, using an ensemble of simulations generated with the more complex Flexible Snow Model (FSM2). Our results demonstrate that AdaPBS is a robust and reliable tool, outperforming or at least matching the performance of other commonly used algorithms and successfully handling complex cases with dense observational datasets. All experiments were carried out using the open-source Multiple Snow Data Assimilation System (MuSA) toolbox, which now includes AdaPBS and MCMC among the growing list of available cryospheric data assimilation methods.
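The iterative idea described in the abstract can be illustrated with a minimal sketch of Adaptive Multiple Importance Sampling on a one-dimensional toy problem. This is not the MuSA or AdaPBS implementation: the target density, the Gaussian proposal family, the function names (`log_target`, `amis`) and the stopping threshold `ess_target` are all assumptions made for illustration. The sketch starts from a broad proposal, reweights all samples drawn so far against the mixture of every proposal used, adapts the proposal to the weighted moments, and stops early once the effective sample size reaches the preselected threshold.

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalized log-posterior: Gaussian centred on 3.0, std 0.5.
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def amis(n_per_iter=200, max_iter=20, ess_target=300.0, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = 0.0, 5.0          # broad initial proposal, playing the role of the prior
    samples = np.empty(0)
    proposals = []                # (mean, std) of the proposal used at each iteration
    for it in range(1, max_iter + 1):
        proposals.append((mean, std))
        samples = np.concatenate([samples, rng.normal(mean, std, n_per_iter)])
        # Deterministic-mixture denominator: equal-weight average of all
        # proposals so far (each contributed the same number of samples).
        log_q = np.stack([log_normal_pdf(samples, m, s) for m, s in proposals])
        log_mix = np.logaddexp.reduce(log_q, axis=0) - np.log(len(proposals))
        log_w = log_target(samples) - log_mix
        w = np.exp(log_w - log_w.max())   # shift by the maximum to avoid overflow
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)
        if ess >= ess_target:             # early stopping
            break
        # Adapt the proposal to the weighted mean and spread of all samples.
        mean = np.sum(w * samples)
        std = max(np.sqrt(np.sum(w * (samples - mean) ** 2)), 1e-3)
    return samples, w, ess, it

samples, weights, ess, n_iter = amis()
posterior_mean = np.sum(weights * samples)
```

Because the loop exits as soon as the threshold is met, an easy problem costs few model evaluations and a hard one costs more, which is the sense in which the cost adapts to the problem.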
Status: open (until 18 Jun 2026)
- EC1: 'Development and technical papers on GMD', Fabien Maussion, 23 Apr 2026
- RC1: 'Comment on egusphere-2026-831', Steven Margulis, 21 May 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-831/egusphere-2026-831-RC1-supplement.pdf
- RC2: 'Comment on egusphere-2026-831', Richard L.H. Essery, 30 May 2026
I am very enthusiastic about this paper, having experienced PBS ensemble collapse in assimilation of frequent and over-confident observations myself, and it has potential applications beyond cryospheric data assimilation. My comments are mostly more in the nature of discussion than requirements for revisions.
The broad review of assimilation methods relevant to the subject is interesting, but could possibly be shortened, and it is a little repetitive in places (e.g. MCMC is referred to as “gold standard” 15 times). There is much less in general terms on the nature of filter collapse that is central to this paper, but the simple illustration in Figure 2 could provide insight. I think that the PBS mean is suboptimal because the necessarily broad prior does not efficiently sample the narrow posterior, and the degeneracy is because of the prohibitive cost in Equation 4 for all except the nearest particle.
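The degeneracy argument can be made concrete with a small sketch, assuming a Gaussian likelihood with independent observation errors and a toy model in which each particle predicts a constant. The function names and all numbers are hypothetical and are not taken from the preprint. With many observations and a small assumed observation error, the cost of every particle except the nearest one becomes prohibitive and the effective sample size falls to about one.

```python
import numpy as np

def pbs_weights(predicted, observed, obs_std):
    """Normalized particle weights from a Gaussian likelihood.

    predicted: (n_particles, n_obs) predicted observations.
    observed:  (n_obs,) observations.
    obs_std:   assumed observation error standard deviation.
    """
    residuals = predicted - observed
    # Cost (negative log-likelihood up to a constant) of each particle.
    cost = 0.5 * np.sum((residuals / obs_std) ** 2, axis=1)
    # Subtract the minimum cost before exponentiating to avoid underflow.
    w = np.exp(-(cost - cost.min()))
    return w / w.sum()

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2): 1 means collapse, n_particles means uniform."""
    return 1.0 / np.sum(weights ** 2)

rng = np.random.default_rng(0)
n_particles, n_obs = 100, 50
params = rng.normal(0.0, 2.0, size=n_particles)        # broad prior ensemble
predicted = np.repeat(params[:, None], n_obs, axis=1)  # toy constant model
observed = np.full(n_obs, 1.0)                         # synthetic truth

w_confident = pbs_weights(predicted, observed, obs_std=0.01)
w_cautious = pbs_weights(predicted, observed, obs_std=5.0)
ess_confident = effective_sample_size(w_confident)
ess_cautious = effective_sample_size(w_cautious)
```

Comparing `ess_confident` with `ess_cautious` shows how strongly the assumed observation error controls the collapse for a fixed prior ensemble.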
Isn’t the mode of the lognormal distribution for the precipitation multiplier in 3.2 equal to 0.86, i.e. not centred on being unbiased? It won’t make much difference, but why is a logit-normal distribution chosen instead for perturbing other strictly positive forcing variables in 3.3 (it would make sense for relative humidity, but that is not listed as a perturbed variable)?
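For reference, the standard lognormal identities behind this question are given below. The preprint's actual parameters are not reproduced here, so the second line assumes a zero log-mean (μ = 0) purely for illustration; under that assumption a mode of 0.86 would correspond to a unit median and a mean slightly above one.

```latex
b \sim \mathrm{LogNormal}(\mu, \sigma^2): \quad
\mathrm{mode}(b) = e^{\mu - \sigma^2}, \quad
\mathrm{median}(b) = e^{\mu}, \quad
\mathbb{E}[b] = e^{\mu + \sigma^2/2}

\mu = 0,\ \mathrm{mode}(b) = 0.86 \;\Rightarrow\;
\sigma^2 = -\ln 0.86 \approx 0.151, \quad
\mathrm{median}(b) = 1, \quad
\mathbb{E}[b] \approx 1.08
```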
The banana-shaped posterior distribution in Figure 2 is the equifinality which is familiar (because easily visualized) in calibration of temperature index models. A hydrologist seeking to calibrate a 2-parameter model would not consider this challenging at all; a better solution than the degenerate PBS in Figure 2 could be found efficiently by any number of standard methods.
There are only a few snow depth measurements in Figure 2, so I guess that the ensemble collapse is due to the observations having high confidence. The assumed observation error should be stated in the text and shown on the figures.
Is the KDE in Figure 6 misleading? The PBS posterior approximation appears to be too wide, but it should actually be too narrow.
In situ meteorological data are available for all of the ESM-SnowMIP sites; is ERA5 used instead for generality? ERA5 grid elevations without downscaling can be expected to be considerably lower than most of the site elevations (Menard et al. 2019, Figure 9). The final version of Essery et al. (2024) is https://doi.org/10.5194/gmd-18-3583-2025. Of the parameters listed in Table 1, rgr0 and rhow will have no influence on snow depth simulations; this could be apparent from the posterior parameters. Can MuSA diagnose parameter importance?
It would be easy to add PBS RMSE and bias to Table 3, allowing the reader to judge for themselves how badly it is struggling. There will be enormous redundancy in hourly snow depth measurements because of autocorrelation; would decimated or averaged observations be less challenging?
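The two thinning options raised here, decimation and averaging, can be sketched as follows. The helper names and the synthetic hourly series are hypothetical. The sketch only reduces the number of observations; choosing an appropriate error for the thinned observations is a separate question, since averaging autocorrelated errors does not reduce their variance as quickly as averaging independent ones.

```python
import numpy as np

def decimate(obs, step):
    """Keep every `step`-th observation, starting with the first."""
    return obs[::step]

def block_average(obs, block):
    """Average consecutive non-overlapping blocks of length `block`.

    A trailing partial block is dropped.
    """
    n_full = (len(obs) // block) * block
    return obs[:n_full].reshape(-1, block).mean(axis=1)

# Three days of synthetic hourly values standing in for snow depth.
hourly = np.arange(72, dtype=float)
daily_sample = decimate(hourly, 24)      # one instantaneous value per day
daily_mean = block_average(hourly, 24)   # one daily mean per day
```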
Minor errors:
- p3: “Ensembe Kalman”
- p4-5: “comprehensive texts … for a comprehensive”
- p12: “that we can easy generate”
- p13: “samples form the target”
- p15: “each iterations proposals density” (missing apostrophe)
- p19: No reference in the text to Figure 1 (could be related to the steps in 2.6.1).
- p20: “cryospheric sciene”
- p22: “The idea being …” is an incomplete sentence as written.
- p23: “FMSM2”
- p23: “7 perturbation parameters to correct the forcing” – only six are listed.
- p26: The MCMC distribution in Figure 2 should have a colour scale.
- p29: Figure 5 would be clearer if ESS in the legend was changed to N_eff as in the caption.
- Rather than ending up selecting a threshold value, you must have started out by doing so if it is preselected (p16).

Citation: https://doi.org/10.5194/egusphere-2026-831-RC2
Model code and software
MuSA: The Multiple Snow Assimilation system
Authors/Creators: Esteban Alonso-González and Kristoffer Aalstad
https://zenodo.org/records/17292981
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 249 | 0 | 5 | 254 | 0 | 0 |
Dear authors,
Thank you for your submission to GMD and thank you for your patience with us. Since this manuscript was submitted in preprint form from another source I didn't want to delay the review process further, but should the paper be accepted for final publication in GMD, there are a few editorial rules to follow:
The model name and version number must be stated in the paper. I think your contribution falls into this category:
If the model development relates to a single model then the model name and the version number must be included in the title of the paper. If the main intention of an article is to make a general (i.e. model independent) statement about the usefulness of a new development, but the usefulness is shown with the help of one specific model, the model name and version number must be stated in the title. The title could have a form such as, "Title outlining amazing generic advance: a case study with Model XXX (version Y)".
Source: https://www.geoscientific-model-development.net/about/manuscript_types.html#item2
Furthermore, GMD's code and data policy not only requires the tool or model's code to be shared (which you have), but also the scripts, data and configuration files which have been used to generate the paper's figures and tables (https://www.geoscientific-model-development.net/policies/code_and_data_policy.html). Unless I'm mistaken, this code was not part of the data availability section. Could you please reply to this comment with a link and DOI to a separate repository sharing the analysis scripts?
Best wishes,
Fabien Maussion