the Creative Commons Attribution 4.0 License.
Estimating return periods for extreme events in climate models through Ensemble Boosting
Abstract. With climate change, extremes such as heatwaves, heavy precipitation events, droughts and extreme fire weather have become more frequent in different regions of the world. It is therefore crucial to further their physical understanding, but due to their rarity in both observational and climate modeling samples, this remains challenging. For numerical simulations, one way to overcome this under-sampling problem is Ensemble Boosting, which uses perturbed initial conditions of extreme events in an existing reference climate model simulation to efficiently generate physically consistent trajectories of very rare extremes in climate models. However, it has not yet been possible to estimate the return periods of these simulations, since the conditional resampling alters the probabilistic link between the boosted simulations and the underlying original climate simulation they come from.
Here, we introduce a statistical framework to estimate return periods for these simulations by using probabilities conditional on the shared antecedent conditions between the reference and perturbed simulations. This theoretical framework is applied to simulations of the fully-coupled climate model CESM2: first for a pre-industrial control simulation, and then in present-day conditions, where, as an example, we estimate the return period of the record-shattering 2021 Pacific Northwest heatwave to be 2500 years, with a 95 % confidence interval of about 2000 to 4000 years. Our evaluation of the method shows that return periods estimated from Ensemble Boosting are consistent with those of a 4000-year control simulation, while using approximately 6 times less computational resources. We thus outline the usage of Ensemble Boosting as an efficient tool for gaining statistical information on rare extremes. This could be valuable as a complement to existing storyline approaches, but also as an additional method of estimating return periods for real-life extreme events within a climate model context.
Competing interests: Some authors are members of the editorial board of WCD.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (extended)
RC1: 'Comment on egusphere-2025-525', Cristian Martinez-Villalobos, 27 Mar 2025
This paper introduces a new framework to estimate return periods of rare climate extremes using ensemble boosting and conditional probability theory. The technique enhances the sampling of extreme events through targeted perturbations, thereby improving return period estimates without requiring prohibitively long control runs. The method is carefully developed and applied to CESM2 under both stationary and transient conditions, including an application to the 2021 Pacific Northwest heatwave.
The manuscript is clearly written and proposes a promising and computationally efficient approach. That said, several assumptions and empirical decisions underlie the method, and their implications for robustness and generalizability are not fully explored. I believe the paper could make a strong contribution after revisions addressing the following points.
Main Comments
Assumptions in the estimator
The validity of the boosting estimator depends on assumptions that may deserve further testing or clarification:
- The parent ensemble is assumed to adequately sample the antecedent condition set (AC^0_t). Section 4.2 includes some helpful discussion and testing of the number of parent events used, particularly in the pre-industrial slices. However, it would be useful to further clarify what aspects of the full antecedent condition space are critical for representativeness, and whether longer-term variability (e.g., decadal modes) might still be undersampled. Could the estimator be biased or unstable if AC^0_t is only sparsely or unevenly populated?
- The method also assumes independence between the unconditional estimate p̂(T ≥ T_ref) and the conditional ratio p̂(T ≥ T_ext | AC^ε_t) / p̂(T ≥ T_ref | AC^ε_t). This assumption is discussed in Appendix A1 and appears plausible in the authors' setup, but it is not directly tested. It would be helpful to evaluate how return period estimates change when varying T_ref or the number of parent events, to better understand whether this independence holds in practice.
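To make the structure I am assuming explicit, here is a minimal sketch of the composite estimator as I read it from the manuscript. The function name, array layout, and the simple average over parent events are my assumptions for illustration, not the authors' code:

```python
import numpy as np

def boosted_return_period(t_parent, t_boost_by_parent, t_ref, t_ext):
    """Sketch of the composite estimator: the unconditional probability of
    exceeding T_ref comes from the parent run, the ratio of conditional
    exceedance probabilities from the boosted members (illustrative only).

    t_parent          : 1-D array of annual maxima from the parent simulation
    t_boost_by_parent : list of 1-D arrays, boosted values for each parent event
    t_ref, t_ext      : reference and extreme thresholds (t_ext >= t_ref)
    """
    # Unconditional part: p(T >= T_ref) estimated from the parent climatology
    p_ref = np.mean(t_parent >= t_ref)
    # Conditional part: average over parents of
    # p(T >= T_ext | AC_t) / p(T >= T_ref | AC_t)
    ratios = []
    for tb in t_boost_by_parent:
        tb = np.asarray(tb)
        n_ref = np.sum(tb >= t_ref)
        if n_ref > 0:
            ratios.append(np.sum(tb >= t_ext) / n_ref)
    p_ext = p_ref * np.mean(ratios)
    return np.inf if p_ext == 0 else 1.0 / p_ext
```

Varying t_ref in such a sketch and checking whether the product stays stable would be a direct, cheap probe of the independence assumption.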
Methodological choices and tuning
Several aspects of the boosting design are empirical and would benefit from more context or testing:
- The use of specific humidity as the perturbation variable is described as effective, but it’s not entirely clear why this variable was chosen over others. Were other variables (e.g., temperature, geopotential height) tried? If so, the rationale could be made more explicit.
- The perturbation magnitude (1 + 10^{-13}·R) seems designed to stay within numerical noise limits. Still, it would be good to mention whether other values were tested, or whether results depend on this factor at all.
- The lead time of −12 days is said to balance realism and divergence. Section 4.3 justifies this choice based on ensemble spread and trajectory divergence, which is useful. Still, have return period estimates themselves been tested for robustness to this choice? Pooling across lead times (as shown in Fig. 4d) seems helpful — if that’s generally recommended, it might be worth saying so directly.
This isn’t to criticize the empirical design — that’s often necessary in early-stage methods — but documenting what was tested and what was fixed would strengthen the work and help future applications.
Validation in a simpler, fully controlled setting
To me, one of the most convincing ways to build confidence in the proposed estimator would be to test it in a much simpler, controlled setting — for example, a low-order stochastic model or linear inverse model where the true return periods are known (or can be computed empirically over very large samples).
This would allow a direct comparison between the boosted estimator and ground truth, and help isolate where biases or over-/under-confidence may arise. It could also help evaluate how the estimator behaves when assumptions like conditional independence or adequate ACₜ sampling are or aren't satisfied.
Even a basic demonstration of this kind would be extremely informative and, in my view, would strengthen the paper considerably.
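For concreteness, the ground-truth side of such an experiment can be very cheap. The sketch below builds a toy AR(1) "climate" (all parameters are illustrative, chosen only for this example) and measures the empirical return period of a high annual-maximum threshold directly from a long run; a boosted estimate on shorter slices of the same series could then be compared against it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1) climate: long enough that return periods up to ~1000 years
# can be measured empirically (parameters illustrative, not from the paper)
phi, n_years, steps_per_year = 0.7, 10_000, 30
noise = rng.standard_normal(n_years * steps_per_year)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, x.size):
    x[i] = phi * x[i - 1] + noise[i]

# Annual maxima and the empirical return period of a ~1000-year threshold
ann_max = x.reshape(n_years, steps_per_year).max(axis=1)
threshold = np.quantile(ann_max, 1 - 1 / 1000)
rp_true = 1.0 / np.mean(ann_max >= threshold)
```

Because the model is known exactly, any bias or over-/under-confidence of the boosted estimator shows up directly against rp_true.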
Confidence interval handling
The method appears to yield narrower confidence intervals than GEV-based estimates in some cases. While this could reflect improved sampling, it might also result from underestimating uncertainty in the boosted setting. Appendix A mentions that bootstrapping is used, which is helpful. Still, it would be good to clarify whether the intervals fully reflect all sources of uncertainty (e.g., finite N_parent, dependence structures, or sensitivity to N_b).
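As an illustration of one way the finite-N_parent uncertainty could be propagated, here is a sketch of a percentile bootstrap over the per-parent conditional ratios. Function and argument names are hypothetical; I do not claim this matches the authors' Appendix A procedure:

```python
import numpy as np

def bootstrap_return_period_ci(ratios, p_ref, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap for the boosted return period (illustrative).
    Resampling the per-parent conditional ratios with replacement reflects
    the finite number of parent events; p_ref is the unconditional
    exceedance probability of T_ref, held fixed here for simplicity."""
    rng = np.random.default_rng(seed)
    ratios = np.asarray(ratios, dtype=float)
    boot = []
    for _ in range(n_boot):
        sample = rng.choice(ratios, size=ratios.size, replace=True)
        p_ext = p_ref * sample.mean()
        boot.append(np.inf if p_ext == 0 else 1.0 / p_ext)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

A variant that also resamples the parent-run exceedances behind p_ref would show how much width the fixed-p_ref simplification hides.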
Minor comments/suggestions
Nonstationarity correction. Line 279: The paper states that results are corrected for non-stationarity, but the method used for that correction isn’t described in much detail. How is the rolling climatology computed? Is it applied to each member individually or to ensemble means? And does the choice of window matter?
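To make the question concrete, here is one plausible reading of such a correction: a centered rolling-mean climatology removed per ensemble member. The window length and the edge handling are my assumptions, which is exactly why the paper should state its own choices:

```python
import numpy as np

def detrend_rolling(series, window=31):
    """Remove a centered rolling-mean climatology (window in years, odd)
    from an annual series to correct for non-stationarity. Edge years are
    handled by padding with the boundary value (an assumption)."""
    pad = window // 2
    padded = np.pad(np.asarray(series, dtype=float), pad, mode="edge")
    kernel = np.ones(window) / window
    clim = np.convolve(padded, kernel, mode="valid")  # same length as input
    return series - clim
```

Whether this is applied member-by-member or to the ensemble mean, and how sensitive results are to the window, are precisely the details worth documenting.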
Section 2.3: Including computational cost (e.g., node-hours or wall-clock time) for the boosted ensemble would help support the method’s efficiency claims.
Notation: Several variables (e.g., TXx5d, T_b^n, T_ext) appear. A glossary or symbol table might help readers.
Confidence intervals: Have you tested how return period confidence intervals behave if N_b = 1500 or 6000? Even a brief comment would help.
Alternative thresholds: Appendix A briefly discusses threshold sensitivity, but the main text might benefit from a more explicit statement. Would estimates change significantly if parents are selected above the 95th or 99th percentile instead of 90th?
This is a creative and carefully implemented study with a potentially valuable method for return period estimation. The framework is promising and the examples are well chosen. I appreciate that the authors are transparent about the method’s limitations, particularly regarding subjective choices and empirical design. That said, several of these choices and assumptions could still benefit from additional testing and sensitivity analysis. In particular, validating the method in a simple, controlled setting where return periods can be measured directly would provide a powerful test of its performance. With these revisions, the paper would be a strong contribution to the literature on climate extremes.
Cristian Martinez-Villalobos
Citation: https://doi.org/10.5194/egusphere-2025-525-RC1
RC2: 'Comment on egusphere-2025-525', Anonymous Referee #2, 31 Mar 2025
My report is contained in the uploaded supplement.
Model code and software
Boosting_estimator Luna Bloin-Wibe, Robin Noyelle, and Vincent Humphrey https://github.com/luna-bloin/Boosting_estimator