Boosting Ensembles for Statistics of Tails at Conditionally Optimal Advance Split Times

Finkel, Justin; O'Gorman, Paul A.

doi:10.48550/arXiv.2507.22310

Preprints

https://doi.org/10.48550/arXiv.2507.22310

03 Nov 2025

Status: this preprint is open for discussion and under review for Nonlinear Processes in Geophysics (NPG).

Abstract. Climate science needs more efficient ways to study high-impact, low-probability extreme events, which are rare by definition and costly to simulate in large numbers. Rare event sampling (RES) and ensemble boosting use small perturbations to turn moderate events into severe ones, which otherwise might not come for many more simulation-years, and thus enhance sample size. But the viability of this approach hinges on two open questions: (1) Are boosted events representative of the yet-unrealized events? (2) How does this depend on the specific form of perturbation, i.e., timing and structure? Timing in particular is crucial for sudden, transient events like precipitation. In this work, we formulate a concrete optimization problem for the advance split time (AST) hyperparameter, and study it on an idealized but physically informative model system: passive tracer fluctuations in a turbulent channel, which captures key elements of midlatitude storm track dynamics. Three major questions guide our investigation: (1) Can RES methods, in particular "ensemble boosting" equipped with a probability estimator and "trying-early adaptive multilevel splitting", accurately and efficiently sample extreme events? (2) What is the optimal AST, and how does it depend on the event definition, in particular the target location and surrounding flow conditions? (3) Can the AST be optimized "online" while running RES?

Our answers support RES as a viable method: (1) RES can meaningfully improve tail estimation, using (2) an optimal AST of 1-3 eddy turnover timescales depending on location. (3) A "thresholded entropy" statistic is a good proxy for AST optimality, bypassing the tedious threshold-setting that often hinders RES methods. Our work clarifies aspects of the response function of transient extreme events to perturbations, giving a guide for designing efficient, reliable sampling strategies.
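The trade-off behind the AST can be illustrated on a toy system. The following minimal Python sketch is not the authors' model or code: it assumes the logistic map as a stand-in chaotic system, defines "severity" as the state value at a fixed target time, and uses hypothetical names (`boost`, `spread`). It only shows that descendants split shortly before the target stay close to the ancestor, while descendants split much earlier decorrelate from it.

```python
import random

def step(x):
    """One step of the logistic map, used here as a toy chaotic system (an assumption)."""
    return 4.0 * x * (1.0 - x)

def run(x0, n_steps):
    """Return the ancestor trajectory [x_0, x_1, ..., x_n]."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj

def boost(ancestor, t_target, ast, n_desc=50, eps=1e-6, seed=0):
    """Branch descendants from the ancestor `ast` steps before `t_target`.

    Each descendant receives a small random perturbation at the split time and
    is integrated forward to the target time; its value there is its severity.
    """
    rng = random.Random(seed)
    t_split = t_target - ast
    severities = []
    for _ in range(n_desc):
        x = ancestor[t_split] + eps * rng.uniform(-1.0, 1.0)
        x = min(max(x, 0.0), 1.0)  # keep the state inside the map's domain
        for _ in range(ast):
            x = step(x)
        severities.append(x)
    return severities

def spread(values):
    """Range of descendant severities, a crude measure of diversification."""
    return max(values) - min(values)

ancestor = run(0.3, 200)
t_target = 150
short_ast = boost(ancestor, t_target, ast=2)   # split just before the event
long_ast = boost(ancestor, t_target, ast=30)   # split long before the event

print("spread at AST=2: ", spread(short_ast))
print("spread at AST=30:", spread(long_ast))
```

In this toy setting a short AST yields almost no diversity, and a long AST yields descendants that are effectively independent of the ancestor; an optimal AST would lie between these limits.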

Received: 18 Oct 2025 – Discussion started: 03 Nov 2025

Status: open (extended)

CC1: 'Comment on egusphere-2025-5092', Moyan Liu, 11 Dec 2025

The paper is interesting and addresses a timely problem: the scarcity of extreme-event data in climate systems and the need for more efficient rare‐event sampling. With the increasing trend and societal impact of extreme events, methods that can better explore tails of the distribution are of clear importance for future research.
The authors aim to identify an optimal Advance Split Time (AST) at which perturbations should be introduced so that rare-event algorithms produce more realistic, diverse, and physically relevant extremes. Instead of relying on traditional threshold-based methods, they develop system-intrinsic indicators that diagnose when perturbations have grown sufficiently to diversify extremes without losing dynamical connection to the original event. They demonstrate this principle first on a simple system and then on a physically meaningful 2-layer quasigeostrophic (QG) model with a passive tracer, illustrating how optimal AST varies with spatial structure, target region, and underlying dynamics.
The study is thoughtfully executed and provides a promising conceptual foundation. I have some comments that may strengthen the manuscript:
1. Computational cost.

The manuscript does not quantify the computational cost of evaluating multiple AST values or generating boosted ensembles. Since computational efficiency is central to the motivation for rare-event sampling, it would be helpful for the authors to comment on the relative cost of their procedure compared with established splitting algorithms such as AMS or TEAMS. Even approximate scaling behavior (e.g., with ensemble size, model resolution) would be informative.
2. Chaotic divergence and event identity.

Because climate dynamics are chaotic, boosted descendants launched too early may drift toward unrelated extreme configurations. The manuscript discusses decorrelation qualitatively but does not describe a mechanism to ensure that boosted samples still represent intensifications of the same physical event as the ancestor. Could the authors clarify whether additional constraints are needed to maintain physical relevance in boosted ensembles?
3. Applicability to full climate models.

The framework is compelling in the idealized QG setting. However, applying entropy-based AST selection and ensemble boosting to operational climate or weather models introduces substantial challenges, including high dimensionality, model biases, observation uncertainty, and the difficulty of maintaining event identity in chaotic flows. Could the authors comment on the main obstacles to such an extension? In particular, do they envision a role for machine learning methods, such as latent-space reductions or event-type classifiers, that could make the approach computationally feasible in high-dimensional systems?
Overall, the paper provides a valuable contribution and opens an important line of inquiry. Addressing these points would clarify the method’s practical scope and future potential.

Citation: https://doi.org/10.5194/egusphere-2025-5092-CC1
RC1: 'Comment on egusphere-2025-5092', Anonymous Referee #1, 19 Dec 2025

This manuscript explores methodological aspects of rare event simulation, which is a promising line of research to improve our understanding of extreme events in a broad range of complex systems.

Specifically, the authors study in a systematic manner some important design choices for resampling algorithms applied to deterministic systems such as the timing and structure of the imposed perturbation and their impact on the ability of the algorithm to sample more extreme events.

In my view, the major novelties in the manuscript are the following:

- introducing an estimator for the probability of resampled trajectories in the "ensemble boosting" method, which makes it possible to compute the climatological probability of the rare events under study.

- introducing a different method to generate perturbations at resampling times compared to what was done before in the literature, constructed by sampling a low-dimensional space. This allows a detailed study of the relation between the perturbation and the resampled amplitude of the event (called severity in the manuscript).

- introducing optimality criteria to select "advance split times".

To investigate these methodological aspects, the manuscript uses as an example the fluctuations of a tracer concentration in a baroclinically unstable two-layer QG model.
The work presented in this manuscript is very rigorous and it is presented in a precise manner. The problems under study are an important part of methodological developments which have a great potential for wide applications. For this reason, the manuscript should be very useful for researchers wishing to implement rare event algorithms in practice. In my opinion this is very high-quality work.

Perhaps the only general shortcoming I can see is that due to its technical nature, the manuscript is a bit difficult to read, and its conclusions are restricted to methodological aspects. Maybe one way to broaden its potential audience could be to reinforce the physical content, for instance by discussing the potential importance of extreme mixing events in baroclinic turbulence (typical theories diagnose effective diffusivity based only on typical fluctuations). Given the impressive amount of work that already went into the current manuscript I would not make this a condition for recommending acceptance of the manuscript, but only a suggestion that the authors might want to consider to extend the geophysical impact of their work.
In addition to this general comment I have a few questions on the technical content of the manuscript.
- The question of the effect of introducing a random perturbation on the statistics is very interesting. In the example studied in the paper, Fig. 13 shows that some AST selection procedures lead to apparent bias. Presumably this is just a sampling issue: because these AST selection procedures are less efficient at generating larger severities, the tail probability tends to be underestimated. In addition to the empirical evidence, can you obtain analytical insight on the bias of your estimator?
- Could you extend the method to estimate statistics of any observable conditioned on the fact that the severity is above a given threshold within ensemble boosting? In diffusion Monte-Carlo algorithms this is possible because it performs importance sampling in trajectory space. Is it the case here?
- Another interesting aspect of the manuscript is the idea to construct perturbations from a low-dimensional sample space. Intuitively it seems that it should lead to smaller variance of the estimators. However, in the case considered here I wonder if this would necessarily be the case. If the growth of the perturbation is indeed governed by the linear baroclinic instability mechanism, a random perturbation directly constructed in streamfunction or vorticity space should project onto the most unstable mode (the same one you are forcing by construction in the manuscript), and the evolution should quickly be dominated by this mode. Could you comment on this?
- While the optimality criteria introduced in the manuscript are interesting, for practical use one would have to estimate the AST online, without systematically searching for the optimum. Do you expect this to be a limitation for applications?
- I understand that it is not the direct goal of the paper to design an algorithm which is more computationally efficient than DNS. Nevertheless, it would be natural to expect that it should be the case even without any specific effort. In previous uses of rare event algorithms with climate models for instance, there was an immediate gain of orders of magnitude. Here, Fig. 13 suggests that ensemble boosting with proper AST selection indeed performs better than equal-cost DNS, but it is difficult to appreciate by how much. Would it be possible to do such a comparison, for instance in terms of the return time of the most extreme events simulated? Unlike algorithms of the interacting particle systems family, ensemble boosting is not iterative: the initial ancestors are used to try to simulate more extreme descendants, but these descendants themselves are never used as ancestors again. Do you think this plays a part for computational efficiency?
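To make the suggested return-time comparison concrete, here is a minimal Python sketch of how empirical return times could be read off an unweighted direct simulation. It is an illustration only, not the manuscript's estimator: the function name `empirical_return_times`, the sample values, and the choice of the Weibull plotting position p = rank / (N + 1) are all assumptions. Boosted ensembles would additionally require the probability weights of the descendants, which this sketch does not handle.

```python
def empirical_return_times(block_maxima, block_length=1.0):
    """Pair each block maximum with an empirical return time.

    Assumes independent, equally weighted blocks (plain DNS) and the Weibull
    plotting position: the rank-k largest of N maxima has exceedance
    probability k / (N + 1) per block, hence return time (N + 1) / k blocks.
    """
    n = len(block_maxima)
    ranked = sorted(block_maxima, reverse=True)
    return [
        (value, block_length * (n + 1) / rank)
        for rank, value in enumerate(ranked, start=1)
    ]

# Hypothetical block maxima of the severity (arbitrary units), one per block.
maxima = [2.1, 3.4, 1.9, 5.0, 2.8, 3.1, 4.2, 2.5, 3.9]
table = empirical_return_times(maxima)

for value, return_time in table:
    print(f"severity {value:.1f}  return time {return_time:.2f} blocks")
```

The longest return time reachable this way is roughly the total simulation length, which is why the largest return time attained at equal cost is a natural yardstick for comparing DNS with a sampling algorithm.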

Specific comments:

- some of the questions formulated in the abstract are very general and the manuscript addresses only part of them. I would be in favor of a more focused abstract.

- Fig. 1: the caption is long and it is not immediately evident when looking at the figure what is plotted on panels b)c)d)iii; perhaps add a legend directly on the figure to make it more clear? Even with the caption it is not completely transparent why there are two solid red lines in those panels.

- p10: "can be estimated it by...": remove "it".

- p18, paragraph 2: the relevant timescales, in particular the eddy turnover time, are estimated empirically here if I understand correctly. Is this compatible with the phenomenology of baroclinic turbulence, which should allow you to estimate them a priori from model parameters?

- p18, paragraph 3: the results suggest that mixing is more efficient in eastward jets, can you relate this to known results on the phenomenology of baroclinic transport?

- Fig. 4: shouldn't the GPD parameters exhibit some symmetry with respect to the middle of the domain? Could the lack of such symmetry be related to problems of statistical convergence for the estimators of these parameters?

- Fig. 7 caption: "from the long DNS (dashed black curves)": did you mean short DNS?

- p30, "displayed in Fig. 9a": the figure does not have multiple panels.

- section 5.2 and caption of Fig. 8: the notation R^2 for the coefficient of determination is a bit confusing given that R is the intensity, even if you never need to square it...

- Fig. 9: for long ASTs and small scale parameter $s$ the conditional severity PDFs have most of their mass outside of the sampled severities. Is this robust from a statistical point of view?

- Fig. 10: I am not fond of the unusual scale used for the two bottom rows, which at first glance gives the impression that the correlation coefficient depends essentially linearly on the AST.

- Fig. 11 caption: "Shaded regions show the areas between truncated upper and lower means." I am not sure I understand the justification for this.
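Regarding the Fig. 4 comment on the statistical convergence of the GPD parameter estimators, one simple check is to refit the parameters on disjoint subsets of the data and compare. The Python sketch below is a hedged illustration on synthetic data, not the manuscript's fitting procedure: it assumes a method-of-moments fit (valid only for shape parameters below 1/2, and well behaved only below 1/4), and the names `sample_gpd` and `fit_gpd_moments` are hypothetical.

```python
import random
import statistics

def sample_gpd(n, shape, scale, rng):
    """Draw n generalized Pareto exceedances by inverse-CDF sampling (shape != 0)."""
    return [
        scale / shape * ((1.0 - rng.random()) ** (-shape) - 1.0)
        for _ in range(n)
    ]

def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit to threshold exceedances.

    Uses mean = scale / (1 - shape) and
    variance = scale**2 / ((1 - shape)**2 * (1 - 2 * shape)),
    so that mean**2 / variance = 1 - 2 * shape.
    """
    mean = statistics.fmean(exceedances)
    var = statistics.variance(exceedances)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale

rng = random.Random(0)
data = sample_gpd(20000, shape=0.1, scale=1.0, rng=rng)

full_fit = fit_gpd_moments(data)
half_fits = [fit_gpd_moments(data[:10000]), fit_gpd_moments(data[10000:])]

print("full sample (shape, scale):", full_fit)
print("two halves  (shape, scale):", half_fits)
```

If the estimates from the two halves differ by about as much as the asymmetry seen across the domain, the asymmetry could plausibly be attributed to sampling error rather than to the dynamics.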

Citation: https://doi.org/10.5194/egusphere-2025-5092-RC1

Model code and software

COAST Zenodo repository Justin Finkel https://doi.org/10.5281/zenodo.17355215


Short summary

Estimating small probabilities of high-impact extreme weather events is a persistent computational challenge, motivating techniques such as "rare event sampling" and "ensemble boosting": lightly perturbing simulated moderate events into more extreme ones. We formulate a new, flexible sampling strategy and characterize a critical parameter (the "advance split time", dictating when to perturb) in a simple atmospheric turbulence model, with generalizable entropy-based criteria.

