Towards accurate extreme event likelihoods from diffusion model climate emulators

Manshausen, Peter; Brenowitz, Noah; Berner, Julius; Kashinath, Karthik; Pritchard, Mike

doi:10.48550/arXiv.2605.03802

Preprints

https://doi.org/10.48550/arXiv.2605.03802


20 May 2026


Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Abstract. ML climate model emulators are useful for scenario planning and adaptation, allowing for cost-efficient experimentation. Recently, the diffusion model Climate in a Bottle (cBottle) has been proposed for generating atmospheric states consistent with boundary conditions of solar position and sea surface temperatures. Crucially, cBottle can be guided to generate extreme events such as Tropical Cyclones (TCs) over locations of interest. Diffusion models such as cBottle work by approximating the probability density of the training data. Here, we show use cases of the probability density estimates of atmospheric states obtained from this climate emulator. Most importantly, these estimates allow us to calculate likelihoods of extreme events under guidance. When guiding the model towards states including TCs, comparing the probability density under the guided and unguided model enables us to quantify how much more likely the guidance has made the TC. We show how these odds ratios allow us to importance-sample from the TC distribution, reducing the standard error of the probability estimate compared to simple Monte Carlo sampling. Furthermore, we discuss results and limitations of applying model probability densities to extreme event attribution-like experiments. We present these early but encouraging results hoping they will spur more research into probabilistic information that can be gained from diffusion models of the atmosphere.
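A minimal sketch of the importance-sampling idea described in the abstract, under strong simplifying assumptions: a one-dimensional standard normal stands in for the unguided model, a mean-shifted normal stands in for the guided model, and the "event" is a threshold exceedance rather than a TC detection. Each guided sample that hits the event is weighted by the unguided-over-guided density ratio, i.e. the reciprocal of the guided-versus-unguided ratio mentioned above. All function names are hypothetical; this is an illustration, not the cBottle implementation.

```python
import math
import random


def normal_logpdf(x, mean=0.0, std=1.0):
    """Log-density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def true_tail(threshold):
    """Exact P(X > threshold) for the toy 'unguided model' X ~ N(0, 1)."""
    return 0.5 * math.erfc(threshold / math.sqrt(2))


def mc_estimate(n, threshold, rng):
    """Plain Monte Carlo: sample the unguided model and count exceedances."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    p_hat = hits / n
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n)


def is_estimate(n, threshold, shift, rng):
    """Importance sampling: sample the toy 'guided model' N(shift, 1) and
    weight each exceedance by p_unguided(x) / p_guided(x)."""
    terms = []
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            log_w = normal_logpdf(x) - normal_logpdf(x, mean=shift)
            terms.append(math.exp(log_w))
        else:
            terms.append(0.0)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    return mean, math.sqrt(var / n)


if __name__ == "__main__":
    rng = random.Random(0)
    print("MC (estimate, SE):", mc_estimate(20000, 3.0, rng))
    print("IS (estimate, SE):", is_estimate(20000, 3.0, 3.0, rng))
    print("exact:", true_tail(3.0))
```

With the same number of samples, the importance-sampling estimate of the tail probability (about 0.00135 for a threshold of 3) has a much smaller standard error than plain Monte Carlo, which sees only a few dozen exceedances. Setting the shift equal to the threshold is an illustrative choice, not a tuned one.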

Received: 06 May 2026 – Discussion started: 20 May 2026

Status: open (extended)

CEC1:
'Comment on egusphere-2026-2610 - No compliance with the policy of the journal', Juan Antonio Añel, 21 Jun 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. In addition, you have archived the trained cBottle model checkpoints on an nvidia.com site; however, this site does not fulfil GMD’s requirements for a persistent data archive because:
- It does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).
- It does not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy that makes removal of materials possible only in exceptional circumstances and subject to an independent curatorial decision.
- It does not appear to issue a persistent identifier such as a DOI or Handle for each precise dataset.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
Due to the above-mentioned issues, your manuscript should not have been accepted for Discussions or peer review in GMD. The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of published papers for years after their publication. Therefore, please publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a persistent identifier such as a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, then the ‘Code and Data Availability’ section of your manuscript must also be modified to cite the new repository locations, and the corresponding references must be added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2026-2610-CEC1
- AC1:
  'Reply on CEC1', Peter Manshausen, 25 Jun 2026
  
  Dear Juan A. Añel,
  thank you for bringing this to our attention. While we note that reviewers can access the code and data during the ongoing review process via the links included in the manuscript, we understand the need for a guaranteed, durable archive. We have uploaded the current code version, as well as the NGC model checkpoints, to Zenodo, archived under DOI https://doi.org/10.5281/zenodo.20832634
  We look forward to hearing back about our manuscript.
  Kind regards,
  Peter Manshausen
  
  Citation: https://doi.org/10.5194/egusphere-2026-2610-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 25 Jun 2026
    
    Dear authors,
    Thanks for addressing this issue so quickly. I have checked the repository, and we can now consider the current version of your manuscript to be in compliance with the code policy of the journal.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2026-2610-CEC2
RC1:
'Comment on egusphere-2026-2610', Guillaume Couairon, 28 Jun 2026
Strengths:
Using diffusion models to compute sample probabilities is a very good idea; I would even say it is a motivation for using diffusion models in the first place. It is therefore very good that a paper tackles this problem, and this paper presents a solid analysis of the idea.

The presentation of the method and results is very good.

The paper is honest about the computational cost of the method and about the fact that it is currently not competitive with unguided Monte Carlo estimates. Indeed, the ultimate goal is to improve the Pareto frontier of computational cost versus probability uncertainty of Monte Carlo estimates.

Computing odds ratios for guided versus unguided samples is a neat idea that circumvents the problems of the raw probability estimates.
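Why working with ratios helps can be illustrated numerically. The absolute log-densities of high-dimensional atmospheric states are typically very large in magnitude, but importance sampling only needs their per-sample differences between the unguided and guided model. The sketch below, with hypothetical function names, forms log weights from such differences and then, as one common option that is not taken from the paper, self-normalizes them with a log-sum-exp and reports the Kish effective sample size as a weight-degeneracy diagnostic.

```python
import math


def log_importance_weights(logp_unguided, logp_guided):
    """Per-sample log weight log p_unguided(x) - log p_guided(x).

    Only this difference is exponentiated later, so the large-magnitude
    individual log-densities never have to be exponentiated themselves.
    """
    return [lu - lg for lu, lg in zip(logp_unguided, logp_guided)]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normalized_weights(log_w):
    """Self-normalized importance weights that sum to one."""
    lse = logsumexp(log_w)
    return [math.exp(v - lse) for v in log_w]


def effective_sample_size(log_w):
    """Kish effective sample size (sum w)^2 / sum w^2, via normalized weights."""
    return 1.0 / sum(w * w for w in normalized_weights(log_w))


def snis_probability(log_w, is_event):
    """Self-normalized importance-sampling estimate of P(event)."""
    return sum(w for w, hit in zip(normalized_weights(log_w), is_event) if hit)


if __name__ == "__main__":
    # Hypothetical log-densities of four guided samples under each model.
    lw = log_importance_weights([-100003.0] * 4, [-100000.0] * 4)
    print(lw, effective_sample_size(lw))
```

Self-normalization trades a small bias for numerical robustness. When the guided model's density is properly normalized, the plain estimator, the mean of exp(log_w) times the event indicator, is unbiased and can be used directly whenever the log weights stay in a safe floating-point range.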

Weaknesses:
One weakness of the method is that it seems somewhat arbitrary to apply guidance only within a given noise range (15, 20). Although the paper motivates this choice (it is the relevant noise scale for TCs), applying guidance at all noise levels should still work, and I would have liked to see how this affects the importance sampling estimate. More generally, a classifier trained on clean data produces low-quality gradients when evaluated on denoised estimates. That does not seem to be the case here: if I understood paragraph 2.2 correctly, the classifier is the same model as the denoiser, so it does not have this limitation and should work at all noise levels.
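To make the question about the guidance window concrete, here is a toy sampler in which the guidance term is added to the score only inside a noise interval. Everything is a stand-in: the data are one-dimensional N(0, 1), so the score and denoiser are exact; the "event classifier" is a hypothetical logistic function of the denoised estimate; and the default window (0.5, 2.0) is in toy noise units rather than cBottle's (15, 20). Varying `window` and `gamma` is the kind of ablation asked for here; this is not the authors' guidance scheme.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, sigma):
    """Exact score of the noised toy marginal N(0, 1 + sigma^2)."""
    return -x / (1.0 + sigma**2)


def denoise(x, sigma):
    """Posterior mean E[x0 | x_sigma] for toy data x0 ~ N(0, 1)."""
    return x / (1.0 + sigma**2)


def guidance_grad(x, sigma, threshold=2.0, sharpness=4.0):
    """d/dx log c(denoise(x, sigma)) for a hypothetical logistic classifier
    c(x0) = sigmoid(sharpness * (x0 - threshold))."""
    z = sharpness * (denoise(x, sigma) - threshold)
    return sharpness * sigmoid(-z) / (1.0 + sigma**2)


def sample(n, gamma, window=(0.5, 2.0), sigma_max=20.0, n_steps=200, seed=0):
    """Euler integration of the probability-flow ODE dx/dsigma = -sigma * score
    from sigma_max down to 0, adding gamma * guidance_grad to the score only
    while sigma lies inside `window`."""
    rng = random.Random(seed)
    sigmas = [sigma_max * (1.0 - i / n_steps) for i in range(n_steps + 1)]
    samples = []
    for _ in range(n):
        # Start from the noised marginal at the largest noise level.
        x = rng.gauss(0.0, math.sqrt(1.0 + sigma_max**2))
        for s, s_next in zip(sigmas[:-1], sigmas[1:]):
            g = score(x, s)
            if window[0] <= s <= window[1]:
                g += gamma * guidance_grad(x, s)
            x += (s_next - s) * (-s * g)
        samples.append(x)
    return samples


def fraction_above(xs, threshold=2.0):
    """Fraction of samples exceeding the event threshold."""
    return sum(1 for x in xs if x > threshold) / len(xs)
```

In this toy, `fraction_above(sample(500, gamma=2.0))` is larger than `fraction_above(sample(500, gamma=0.0))` from the same seed, and a window outside the sampled noise range reproduces the unguided samples exactly. An importance-sampling estimate would additionally need the density of the guided sampler, which this sketch does not compute.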

The main weakness of the paper, for me, is the absence of an analysis of how the uncertainty of the estimate depends on computational cost, which can be varied, e.g., by changing the ODE solver or the number of steps. Since the main goal is to reduce the compute cost of Monte Carlo estimation at the same accuracy, such an analysis would reveal the different operating regimes and whether there is a favorable trade-off. While the paper suggests that distillation could be used for fast probability estimation, I would like to see to what extent the 11x slowdown from ODE integration is actually needed.
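The cost-accuracy trade-off raised here can be illustrated with a one-dimensional toy where the exact answer is known. For N(0, 1) data, the probability-flow ODE and its divergence are analytic, so the log-likelihood obtained by Euler integration from sigma = 0 to sigma_max can be compared to the true value as the number of steps changes. The function names and the uniform noise grid are assumptions for illustration. In high dimensions the divergence term would typically be estimated stochastically (for example with a Hutchinson trace estimator), which adds its own variance on top of the discretization error shown here.

```python
import math


def velocity(x, sigma):
    """Probability-flow ODE dx/dsigma = -sigma * score(x, sigma) for toy data
    x0 ~ N(0, 1), whose noised marginal is N(0, 1 + sigma^2)."""
    return sigma * x / (1.0 + sigma**2)


def divergence(sigma):
    """d velocity / dx; exact here because the toy problem is one-dimensional."""
    return sigma / (1.0 + sigma**2)


def ode_log_likelihood(x0, n_steps, sigma_max=10.0):
    """Euler-integrate x from sigma = 0 to sigma_max while accumulating the
    divergence, then use log p_0(x0) = log p_T(x_T) + integral of divergence."""
    h = sigma_max / n_steps
    x, div_integral = x0, 0.0
    for i in range(n_steps):
        sigma = i * h
        div_integral += h * divergence(sigma)
        x += h * velocity(x, sigma)
    var_t = 1.0 + sigma_max**2  # variance of the terminal Gaussian prior
    log_prior = -0.5 * math.log(2 * math.pi * var_t) - x**2 / (2 * var_t)
    return log_prior + div_integral


def exact_log_likelihood(x0):
    """Ground truth for the toy model: log N(x0; 0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x0**2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        err = abs(ode_log_likelihood(1.0, n) - exact_log_likelihood(1.0))
        print(n, err)
```

The error shrinks as the step count grows (first order in the step size for Euler once steps are small), so each choice of step count is one point on the compute-versus-accuracy curve; in a real emulator that curve would have to be combined with the estimator's sampling error.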

Related to the point above, it would have been nice to group probabilities not by the classifier's detection threshold but by TC strength (which is probably correlated with it, but would give a more interpretable result). We expect the uncertainty of the importance sampling estimate to decrease relative to the Monte Carlo estimate for more extreme events, which we begin to see in Figure 4a for the 85% detection threshold, but this could be shown more clearly.
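The expectation that importance sampling helps most for the rarest events can be checked in closed form for a toy Gaussian case. With target N(0, 1), proposal N(mu, 1) and the event x > b, the second moment of the weighted indicator is exp(mu^2) P(Z > b + mu). The sketch below compares the relative standard errors of plain Monte Carlo and importance sampling; the choice mu = b and the function names are illustrative assumptions, and b plays the role that TC strength bins would play in the setting discussed above.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mc_relative_se(p, n):
    """Relative standard error of a plain Monte Carlo frequency estimate."""
    return math.sqrt((1.0 - p) / (n * p))


def is_relative_se(threshold, shift, n):
    """Relative standard error of importance sampling for the toy case
    target N(0, 1), proposal N(shift, 1), event {x > threshold}."""
    p = normal_sf(threshold)
    # E_q[(p/q)^2 * 1{x > b}] = exp(shift^2) * P(Z > b + shift)
    second_moment = math.exp(shift**2) * normal_sf(threshold + shift)
    return math.sqrt((second_moment - p**2) / n) / p


if __name__ == "__main__":
    for b in (1.0, 2.0, 3.0):
        mc = mc_relative_se(normal_sf(b), 1000)
        is_ = is_relative_se(b, b, 1000)
        print(b, mc, is_, mc / is_)
```

For thresholds b = 1, 2, 3 with mu = b, the ratio of Monte Carlo to importance-sampling relative error grows from roughly 2 to roughly 15: the Monte Carlo error scales like 1/sqrt(N p), while the shifted proposal keeps hitting the event.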

Given the importance of the TC classifier, it would have been nice to include more detail on it in the paper (how it was trained, its accuracy as a function of noise level, the scale of its gradients, etc.).

Questions
In Figure 4a, we can see that for a TC detection threshold of 85%, the guided model compensates for the under-representation of extremes in cBottle but barely increases their frequency beyond that in ERA5 (~1%). Is that all we can do? By increasing the guidance strength and widening the guidance noise interval, we would hope to sample these extreme events much more often.

Other remarks:

“This formalizes that the accuracy of the probabilities calculated here depends on how well the underlying model approximates the data distribution.” I think this is a very intuitive statement that does not require justification; in any case, I do not see the purpose of the justification provided.
First, the writing suggests that the negative log-likelihood bound is the formalization, whereas it is rather the decomposition (8) that is informative. Second, (8) is an average over the data distribution, which does not tell you much about the probabilities of individual samples. A better justification would be to establish a bound on $\mathbb{E}_{x\sim p_\mathrm{data}}|p_\mathrm{data}(x) - p_\theta(x)|$ or $\mathbb{E}_{x\sim p_\mathrm{data}}|\log p_\mathrm{data}(x) - \log p_\theta(x)|$, which would control the average deviation of the (log-)probabilities from their true values. It could turn out that this quantity is upper bounded by the KL divergence under some conditions. I think that would be a better justification.
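One possible route to the bound suggested above is sketched here as an editorial note, not as a result from the manuscript, assuming $p_\mathrm{data}$ and $p_\theta$ are densities with finite $\mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta)$. Write $y(x) = \log p_\mathrm{data}(x) - \log p_\theta(x)$; the steps use $|y| = y + 2(-y)_+$, the inequality $\log t \le t - 1$, and Pinsker's inequality $\mathrm{TV} \le \sqrt{\mathrm{KL}/2}$.

```latex
\begin{aligned}
\mathbb{E}_{p_\mathrm{data}}[-\log p_\theta]
  &= H(p_\mathrm{data}) + \mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta), \\
\mathbb{E}_{p_\mathrm{data}}|y|
  &= \mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta)
     + 2\,\mathbb{E}_{p_\mathrm{data}}\Big[\big(\log \tfrac{p_\theta}{p_\mathrm{data}}\big)_+\Big] \\
  &\le \mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta)
     + 2\,\mathbb{E}_{p_\mathrm{data}}\Big[\big(\tfrac{p_\theta}{p_\mathrm{data}} - 1\big)_+\Big] \\
  &\le \mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta) + 2\,\mathrm{TV}(p_\mathrm{data}, p_\theta) \\
  &\le \mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta) + \sqrt{2\,\mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta)}, \\
\mathrm{KL}(p_\mathrm{data}\,\|\,p_\theta)
  &= \big|\mathbb{E}_{p_\mathrm{data}}[y]\big| \le \mathbb{E}_{p_\mathrm{data}}|y|.
\end{aligned}
```

Under these assumptions the average absolute log-density error is therefore sandwiched between $\mathrm{KL}$ and $\mathrm{KL} + \sqrt{2\,\mathrm{KL}}$, although, as the comment notes, an average still says nothing about any single sample.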

The discussion of adaptive guidance scale could probably be moved after the results or into the related work to streamline the presentation of the method.

Typos

Page 1 - “research has has focused”
Page 10 - “with the the”

Citation: https://doi.org/10.5194/egusphere-2026-2610-RC1

Viewed

Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.

Total article views: 164 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
163	0	1	164	0	0

Views and downloads (calculated since 20 May 2026)

Month	HTML	PDF	XML
May 2026	119	0	119
Jun 2026	20	0	20
Jul 2026	24	1	25

Cumulative views and downloads (calculated since 20 May 2026)

Month	HTML	PDF	XML
May 2026	119	0	119
Jun 2026	20	0	20
Jul 2026	24	1	25

Viewed (geographical distribution)

Total article views: 157 (including HTML, PDF, and XML). Of these, 157 have a defined geography and 0 are of unknown origin.

Latest update: 26 Jul 2026

Short summary

We used a machine learning climate emulator, which can quickly generate realistic weather patterns, to study rare and extreme storms. By steering the model toward such events and comparing how likely they are with and without this steering, we can better estimate their frequency. This approach improves accuracy while using fewer simulations. Our early results show promise for understanding and planning for extreme weather, and offer a starting point for more research on these methods.


Total:	0
HTML:	0
PDF:	0
XML:	0