Boosting Ensembles for Statistics of Tails at Conditionally Optimal Advance Split Times
Abstract. Climate science needs more efficient ways to study high-impact, low-probability extreme events, which are rare by definition and costly to simulate in large numbers. Rare event sampling (RES) and ensemble boosting use small perturbations to turn moderate events into severe ones that might otherwise not occur for many more simulation-years, thereby enhancing the effective sample size. But the viability of this approach hinges on two open questions: (1) Are boosted events representative of the yet-unrealized events? (2) How does this depend on the specific form of the perturbation, i.e., its timing and structure? Timing in particular is crucial for sudden, transient events such as precipitation. In this work, we formulate a concrete optimization problem for the advance split time (AST) hyperparameter and study it on an idealized but physically informative model system: passive tracer fluctuations in a turbulent channel, which captures key elements of midlatitude storm track dynamics. Three major questions guide our investigation: (1) Can RES methods, in particular "ensemble boosting" equipped with a probability estimator and "trying-early adaptive multilevel splitting", accurately and efficiently sample extreme events? (2) What is the optimal AST, and how does it depend on the event definition, in particular the target location and surrounding flow conditions? (3) Can the AST be optimized "online" while running RES?
Our answers support RES as a viable method: (1) RES can meaningfully improve tail estimation; (2) the optimal AST is 1-3 eddy turnover timescales, depending on location; and (3) a "thresholded entropy" statistic is a good proxy for AST optimality, bypassing the tedious threshold-setting that often hinders RES methods. Our work clarifies aspects of the response of transient extreme events to perturbations, providing a guide for designing efficient, reliable sampling strategies.
The paper is interesting and addresses a timely problem: the scarcity of extreme-event data in climate systems and the need for more efficient rare-event sampling. Given the increasing frequency and societal impact of extreme events, methods that can better explore the tails of the distribution are of clear importance for future research.
The authors aim to identify an optimal Advance Split Time (AST) at which perturbations should be introduced so that rare-event algorithms produce more realistic, diverse, and physically relevant extremes. Instead of relying on traditional threshold-based methods, they develop system-intrinsic indicators that diagnose when perturbations have grown sufficiently to diversify extremes without losing dynamical connection to the original event. They demonstrate this principle first on a simple system and then on a physically meaningful 2-layer quasigeostrophic (QG) model with a passive tracer, illustrating how optimal AST varies with spatial structure, target region, and underlying dynamics.
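To confirm that I have understood the selection principle correctly, here is a minimal caricature of the AST scan in Python. This is a hypothetical sketch only, not the authors' implementation: a logistic map stands in for the chaotic dynamics, and the function names, perturbation size `eps`, and exceedance threshold `thresh` are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, n):
    """Toy chaotic dynamics (a logistic map) standing in for the QG model."""
    for _ in range(n):
        x = 3.9 * x * (1.0 - x)
    return x

def boosted_entropy(x_split, ast, n_members=64, eps=1e-4, thresh=0.95):
    """Perturb the ancestor state `ast` steps before the event peak,
    re-integrate the descendants to the peak time, and return the binary
    entropy of the exceedance fraction -- a proxy for diversification."""
    members = x_split + eps * rng.standard_normal(n_members)
    members = step(members, ast)      # advance descendants to the peak time
    p = np.mean(members > thresh)     # fraction of boosted extremes
    if p in (0.0, 1.0):
        return 0.0                    # degenerate: all or none exceed
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)))

# Ancestor trajectory with its "event peak" at step T.
T = 20
traj = [0.3]
for _ in range(T):
    traj.append(step(traj[-1], 1))

# Scan candidate ASTs: split the ancestor `ast` steps before the peak.
# The entropy-maximizing AST balances diversification (perturbations have
# had time to grow) against decorrelation (event identity is lost).
scores = {ast: boosted_entropy(traj[T - ast], ast) for ast in range(1, 15)}
best_ast = max(scores, key=scores.get)
```

In this caricature, very small ASTs give zero entropy (all descendants track the ancestor, so the exceedance fraction is degenerate), while the entropy rises once perturbations have grown enough to diversify outcomes, which is the trade-off the system-intrinsic indicator is meant to diagnose.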
The study is thoughtfully executed and provides a promising conceptual foundation. I have some comments that may strengthen the manuscript:
1. Computational cost.
The manuscript does not quantify the computational cost of evaluating multiple AST values or generating boosted ensembles. Since computational efficiency is central to the motivation for rare-event sampling, it would be helpful for the authors to comment on the relative cost of their procedure compared with established splitting algorithms such as AMS or TEAMS. Even approximate scaling behavior (e.g., with ensemble size, model resolution) would be informative.
2. Chaotic divergence and event identity.
Because climate dynamics are chaotic, boosted descendants launched too early may drift toward unrelated extreme configurations. The manuscript discusses decorrelation qualitatively but does not describe a mechanism to ensure that boosted samples still represent intensifications of the same physical event as the ancestor. Could the authors clarify whether additional constraints are needed to maintain physical relevance in boosted ensembles?
3. Applicability to full climate models.
The framework is compelling in the idealized QG setting. However, applying entropy-based AST selection and ensemble boosting to operational climate or weather models introduces substantial challenges, including high dimensionality, model biases, observation uncertainty, and the difficulty of maintaining event identity in chaotic flows. Could the authors comment on the main obstacles to such an extension? In particular, do they envision a role for machine learning methods, such as latent-space reductions or event-type classifiers, to make the approach computationally feasible in high-dimensional systems?
Overall, the paper provides a valuable contribution and opens an important line of inquiry. Addressing these points would clarify the method’s practical scope and future potential.