the Creative Commons Attribution 4.0 License.
A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model
Abstract. We propose a stochastic framework for wildfire spread prediction using deep generative diffusion models with ensemble sampling. In contrast to traditional deterministic approaches that struggle to capture the inherent uncertainty and variability of wildfire dynamics, our method generates probabilistic forecasts by sampling multiple plausible future scenarios conditioned on the same initial state. As a proof-of-concept, the model is trained on synthetic wildfire data generated by a probabilistic cellular automata-based simulator, which integrates realistic environmental features such as canopy cover, vegetation density, and terrain slope, and is grounded in historical fire events including the Chimney and Ferguson fires. To assess predictive performance and uncertainty modelling, we compare two surrogate models with identical network architecture: one trained via conventional supervised regression, and the other using a conditional diffusion framework with ensemble sampling. In the diffusion-based emulator, multiple inference passes are performed for the same input state by resampling the initial latent variable, allowing the model to capture a distribution of possible outcomes. Both models are evaluated on an independent ensemble testing dataset, ensuring robustness and fair comparison under unseen wildfire scenarios. Experimental results show that the diffusion model significantly outperforms its deterministic counterpart across various metrics. At a training size of 900, and averaged across the Chimney fire and Ferguson fire datasets, the diffusion model achieves a 67.6 % reduction in mean squared error (MSE), a 5.4 % improvement in structural similarity index (SSIM), and a 69.7 % reduction in Fréchet Inception Distance (FID) relative to the deterministic baseline. These findings demonstrate that diffusion-based ensemble modelling provides a more flexible and effective approach for wildfire forecasting.
By capturing the distributional characteristics of future fire states, our framework supports the generation of fire susceptibility maps that offer actionable insights for risk assessment and resource planning in fire-prone environments.
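The ensemble procedure described in the abstract (repeated inference from the same input state with a resampled initial latent, then aggregation into a susceptibility map) can be sketched as follows. This is a minimal illustration and not the authors' code: `toy_sampler` is a hypothetical stand-in for the trained emulator, and the grid size, ignition rule, and threshold are assumptions chosen only to make the example runnable.

```python
import numpy as np

def susceptibility_map(sample_fn, state, n_members=20, seed=0):
    """Average binary burn masks over an ensemble drawn from one input state."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # A fresh latent draw per member is the only source of diversity.
        latent = rng.standard_normal(state.shape)
        members.append(sample_fn(state, latent))
    ensemble = np.stack(members)              # shape: (n_members, H, W)
    return ensemble.mean(axis=0), ensemble    # per-cell burn frequency, members

def toy_sampler(state, latent):
    """Hypothetical emulator: burned cells stay burned; a cell next to a
    burned cell ignites when its latent value exceeds an assumed threshold."""
    neighbours = np.zeros_like(state)
    neighbours[1:, :] += state[:-1, :]
    neighbours[:-1, :] += state[1:, :]
    neighbours[:, 1:] += state[:, :-1]
    neighbours[:, :-1] += state[:, 1:]
    ignites = (neighbours > 0) & (latent > 0.5)
    return np.maximum(state, ignites.astype(state.dtype))

state = np.zeros((8, 8))
state[4, 4] = 1.0                             # single burning cell
prob, ensemble = susceptibility_map(toy_sampler, state, n_members=50)
print(prob[3:6, 3:6])                         # burn frequency around the ignition
```

Each entry of `prob` is the fraction of ensemble members in which that cell burned, which is the quantity a susceptibility map would display.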
Status: final response (author comments only)
CC1: 'Comment on egusphere-2025-2836', Shunji Kotsuki, 04 Aug 2025
This study proposes a probabilistic surrogate model for wildfire spread prediction using a denoising diffusion probabilistic model (DDPM). The manuscript is clearly written, and the authors provide a careful summary of related work and a well-structured experimental setup. However, I have concerns regarding the novelty of the work, as outlined below.
To me, this research seems to be an application of diffusion models to wildfire spread prediction. While the use of diffusion models in this context is interesting, the paper appears to be a straightforward application without sufficient methodological innovation or rigorous benchmarking. In particular, comparisons with widely used spatiotemporal prediction models such as ConvLSTM are lacking. Without such comparisons, it is difficult to evaluate the practical benefits of adopting a diffusion-based approach, especially given the computational complexity of DDPMs. Therefore, I recommend rejecting the manuscript at this stage, and suggest that the authors reconsider the experimental design and comparative evaluation. A future resubmission with stronger justification for model choice and clearer evidence of its advantages could make a valuable contribution to the field.
[Major Comment]
(1) Novelty of research: In recent years, the use of diffusion models in geoscientific applications has become increasingly common. While the authors briefly mention prior works such as GenCast in the introduction, there are many other studies in the literature that have applied diffusion models to various environmental prediction tasks. Therefore, applying a diffusion model to a new problem domain alone no longer constitutes sufficient scientific novelty, in my opinion. Although the application of diffusion models to wildfire spread may be somewhat novel, I do not believe that this contribution, in its current form, reaches the level of academic significance expected for publication in Geoscientific Model Development (GMD). To strengthen the scientific contribution, the authors should more clearly identify what challenges are unique to wildfire modeling, and explain how their proposed approach specifically addresses those challenges.
(2) Discussion: I believe the current manuscript lacks a discussion of the insights gained and their limitations, which is essential for it to be considered part of empirical science. As it stands, the paper focuses primarily on describing the method and results, and feels more like a technical report than a scientific publication. While it is true that GMD has a scope that includes technical advancements such as model descriptions, I still believe that a scientific discussion aimed at deepening our understanding is indispensable.
(3) Justification: While the study demonstrates the potential of DDPMs for probabilistic wildfire spread prediction, it lacks sufficient baseline experiments to support the claim that diffusion models meaningfully improve forecast skill. In particular, it is important to include comparisons with established spatiotemporal prediction models such as VAE and ConvLSTM or other deterministic and probabilistic deep learning approaches commonly used in Earth Science. Although the authors include a deterministic baseline, a comparison between DDPM and Res-Unet alone is insufficient to validate the advantages of using diffusion models in this context.
(4) Model sensitivity and hyperparameter tuning: While it may not be necessary to include an exhaustive hyperparameter analysis in the main text, the value of the paper could be significantly enhanced by reporting key insights gained during model development and training, particularly those related to the sensitivity of DDPMs to critical hyperparameters. In my own experience, factors such as the number of denoising steps and the choice of noise scheduling (e.g., linear vs. cosine) can have a considerable impact on the generated outputs. Providing observations or recommendations on these aspects, even in an appendix, would be highly beneficial for researchers aiming to reproduce or extend this work.
(5) Insights unique to wildfire predictions: I could not clearly identify which aspects of the approach, model design, or analysis in this study are specific or novel to the context of wildfire prediction. As it stands, the insights obtained from applying DDPM seem similar to those observed in other probabilistic forecasting tasks. My comments on this paper may come across as overly critical, but I believe the main reason is that it was difficult to understand what the key contributions are, or what unique perspectives or findings arise specifically from applying this method to wildfire modeling. The current manuscript feels more like a demonstration of an experimental setup designed to apply DDPM to wildfire data, rather than a study that offers new and domain-specific insights into wildfire spread prediction.
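To make the noise-scheduling choice raised in comment (4) concrete, the sketch below compares the cumulative signal level for a linear and a cosine schedule. The settings are assumptions, not values taken from the manuscript: the linear range (1e-4 to 0.02 over 1000 steps) follows Ho et al. (2020), and the cosine form with offset `s = 0.008` is the commonly used variant from Nichol and Dhariwal (2021).

```python
import numpy as np

def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for linearly spaced betas."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def cosine_alpha_bar(T, s=0.008, max_beta=0.999):
    """Cosine schedule: alpha_bar follows a squared cosine; betas are clipped."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    betas = np.clip(1.0 - f[1:] / f[:-1], 0.0, max_beta)
    return np.cumprod(1.0 - betas)

T = 1000  # assumed number of diffusion steps
linear = linear_alpha_bar(T)
cosine = cosine_alpha_bar(T)

# Fraction of signal variance remaining at a few points along the chain.
for step in (T // 4, T // 2, 3 * T // 4):
    print(step, float(linear[step]), float(cosine[step]))
```

Under these assumed settings the cosine schedule retains more signal at mid-chain than the linear one, which is one reason the two choices can produce visibly different samples.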
Citation: https://doi.org/10.5194/egusphere-2025-2836-CC1
CC3: 'Reply on CC1', Sibo Cheng, 09 Aug 2025
We thank the reviewer for the detailed comments and suggestions on our manuscript. However, we believe that some key aspects of our work may have been overlooked.
1. The reviewer has repeatedly suggested that a benchmark comparing “DDPM vs. ConvLSTM (or other NNs)” is necessary for the manuscript. We would like to first clarify that sampling algorithms (e.g., DDPM or DDIM) and neural network architectures (e.g., ConvLSTM or U-Net) are fundamentally different things. Sampling algorithms define how noise is added during the forward diffusion process and removed during the reverse (denoising) process (e.g., Markovian in the case of DDPM and deterministic non-Markovian in the case of DDIM; see Ho et al., 2020 and Song et al., 2021 for details), whereas different neural network architectures (such as U-Net or Transformer) could be chosen to learn this denoising procedure. We believe the reviewer may be confused about this fundamental concept. A sampling/denoising method cannot be compared against a neural network architecture.
The main objective of our experiments here is to compare a diffusion-based generative training algorithm with the deterministic training method (based on MSE) for wildfire prediction, rather than to evaluate different neural network architectures. Therefore, we compared the performance of a conditional diffusion model based on U-Net to that of the same U-Net trained using a deterministic approach. Using a different neural network architecture might improve the accuracy of deterministic training, but it would not provide probabilistic predictions or capture the uncertainty of fire propagation. Moreover, a new network architecture would likely improve the diffusion model’s performance as well, so it would not qualitatively affect our comparison of diffusion and deterministic training.
We thank the reviewer for this question and will clarify the differences between neural network architectures and diffusion sampling algorithms for readers who are not ML experts.
2. Regarding the novelty of our work, although we agree that diffusion models have recently been applied in geoscience, to the authors’ knowledge, this is the first study to apply diffusion-based generative AI to wildfire spread prediction (see the recent review by Xu et al., 2025). In fact, to our knowledge, only one previous GMD publication (Tomasi et al., 2025) has applied a latent diffusion model, and that was to a downscaling task. Therefore, we believe that our paper is the first to use a conditional diffusion model for dynamical-system prediction in GMD.
More importantly, our diffusion model is trained using data generated from a stochastic wildfire simulator. We therefore examine whether the ensemble generated by the diffusion model can represent the stochasticity of the original physics model, which brings a unique contribution and insight to the community. We have also designed a specific validation procedure to compare the two ensembles generated by the stochastic physics model and the diffusion AI model, as described in Section 2.2.2 and illustrated in Figures 3 and 7 of our manuscript.
We believe that developing a surrogate model using a diffusion-based generative method to capture uncertainties in stochastic physics simulators is novel within geoscience, if not in the broader computational physics field.
Following the reviewer’s suggestion, we will perform additional hyperparameter tuning in the revised manuscript to improve our diffusion model’s performance. However, as noted, our primary objective is to demonstrate a generative diffusion model’s ability to capture the stochasticity of the physics-based model, which our current results already demonstrate.
3. The reviewer repeatedly refers to DDPM as our denoising approach and points out its computational inefficiency. However, in our manuscript we employ the DDIM algorithm, as clearly stated in the first sentence of Section 3.1.4, in Equation 9 on page 14, and in Algorithm 2 on page 15. We also explain in Section 3.1 on page 15 that we chose DDIM over DDPM specifically for its superior computational efficiency. Thus, we believe the reviewer may have overlooked some important statements in our methodology section.
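The distinction between DDPM and DDIM sampling drawn in this reply can be illustrated with a single reverse step written in the generalised form of Song et al. (2021), where a parameter `eta` interpolates between the deterministic DDIM update (`eta = 0`) and a DDPM-like stochastic update (`eta = 1`). This is a sketch under assumptions, not the manuscript's Algorithm 2: `eps_pred` is a random stand-in for the network's noise estimate, and the `a_bar` values are arbitrary.

```python
import numpy as np

def reverse_step(x_t, eps_pred, a_bar_t, a_bar_prev, eta, rng):
    """One reverse step from noise level a_bar_t to the less noisy a_bar_prev."""
    # Estimate of the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_bar_t)
    # Injected-noise scale: zero for DDIM, DDPM-like when eta = 1.
    sigma = (eta * np.sqrt((1.0 - a_bar_prev) / (1.0 - a_bar_t))
             * np.sqrt(1.0 - a_bar_t / a_bar_prev))
    direction = np.sqrt(1.0 - a_bar_prev - sigma ** 2) * eps_pred
    noise = sigma * rng.standard_normal(x_t.shape)
    return np.sqrt(a_bar_prev) * x0_pred + direction + noise

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))
eps_pred = rng.standard_normal((4, 4))   # stand-in for the denoiser output

# Same inputs, different random streams for the injected noise.
ddim_a = reverse_step(x_t, eps_pred, 0.5, 0.7, 0.0, np.random.default_rng(1))
ddim_b = reverse_step(x_t, eps_pred, 0.5, 0.7, 0.0, np.random.default_rng(2))
ddpm_a = reverse_step(x_t, eps_pred, 0.5, 0.7, 1.0, np.random.default_rng(1))
ddpm_b = reverse_step(x_t, eps_pred, 0.5, 0.7, 1.0, np.random.default_rng(2))

print(np.allclose(ddim_a, ddim_b), np.allclose(ddpm_a, ddpm_b))
```

With `eta = 0` the step adds no fresh noise, so the outcome is fixed once the initial latent is drawn, and ensemble diversity comes entirely from resampling that latent. Because `a_bar_prev` need not be the immediately preceding noise level, the same update can skip steps, which is the source of DDIM's efficiency advantage.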
Sibo Cheng & Wenbo Yu & Tobias Sebastian Finn & Marc Bocquet
References:
Song, J., Meng, C. and Ermon, S., 2021. Denoising diffusion implicit models. ICLR. arXiv:2010.02502.
Ho, J., Jain, A. and Abbeel, P., 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, pp.6840-6851.
Tomasi, E., Franch, G. and Cristoforetti, M., 2025. Can AI be enabled to perform dynamical downscaling? A latent diffusion model to mimic kilometer-scale COSMO5.0_CLM9 simulations. Geoscientific Model Development, 18(6), pp.2051-2078.
Xu, Z., Li, J., Cheng, S., Rui, X., Zhao, Y., He, H., Guan, H., Sharma, A., Erxleben, M., Chang, R. and Xu, L.L., 2025. Deep learning for wildfire risk prediction: Integrating remote sensing and environmental data. ISPRS Journal of Photogrammetry and Remote Sensing, 227, pp.632-677.
Citation: https://doi.org/10.5194/egusphere-2025-2836-CC3
RC1: 'Comment on egusphere-2025-2836', Shunji Kotsuki, 04 Aug 2025
I accidentally submitted my review as a Community Comment. Please refer to the Community Comment submitted by Shunji Kotsuki on August 4, 2025, for my review comments.
Citation: https://doi.org/10.5194/egusphere-2025-2836-RC1
CC2: 'Reply on RC1', Sibo Cheng, 09 Aug 2025
Citation: https://doi.org/10.5194/egusphere-2025-2836-CC2
RC2: 'Comment on egusphere-2025-2836', Anonymous Referee #2, 09 Aug 2025
In this paper, a probabilistic method using a denoising diffusion surrogate model is applied to wildfire spread prediction, with the advantage of quantifying uncertainty. The study focuses on synthetic wildfire data generated by a probabilistic cellular automata-based simulator. The study is systematic, and the presentation of the results is detailed. I have a few minor suggestions, mostly clarification questions.
The authors highlighted that “this study seeks to address the limitations of traditional deterministic wildfire forecasting methods.” What about the existing stochastic or probabilistic models?
The authors may add more explicit statements to highlight the novelty of this work. Is this just an application or are there existing improvements in the techniques?
The interpretability of probabilistic forecasting needs more discussion. These forecasts indeed provide UQ. But is such a UQ reliable and accurate?
The physical mechanism is quite complicated, and therefore several variables are involved in the models. How sensitive is the diffusion emulator to perturbations of each parameter/input?
The role of some of the details of the emulator’s components needs to be discussed. For example, what if the attention mechanism is removed?
Some details about the background should be added. For example, subsampling frames at 20-hour intervals is used for training. Why is such a specific number chosen? There are a lot of mathematical details, but some of the physics or reasoning is missing.
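One common way to address the question above about whether the uncertainty quantification is reliable is to check calibration: forecast burn probabilities are binned and compared with observed burn frequencies, alongside a Brier score. The sketch below uses synthetic data only; it is not the validation procedure of the manuscript, and the sample size, bin count, and "overconfident" comparison forecast are assumptions.

```python
import numpy as np

def brier_score(prob, outcome):
    """Mean squared difference between forecast probability and binary outcome."""
    return float(np.mean((prob - outcome) ** 2))

def reliability_table(prob, outcome, n_bins=5):
    """Per bin: (mean forecast probability, observed frequency, sample count)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(prob, edges[1:-1])      # bin index in 0 .. n_bins - 1
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(prob[mask].mean()),
                         float(outcome[mask].mean()),
                         int(mask.sum())))
    return rows

rng = np.random.default_rng(0)
prob = rng.uniform(size=20000)                            # forecast burn probabilities
outcome = (rng.uniform(size=prob.size) < prob).astype(float)  # calibrated by construction
overconfident = (prob > 0.5).astype(float)                # same ranking, no uncertainty

table = reliability_table(prob, outcome)
bs_calibrated = brier_score(prob, outcome)
bs_overconfident = brier_score(overconfident, outcome)
for mean_p, freq, count in table:
    print(round(mean_p, 3), round(freq, 3), count)
```

For a well-calibrated ensemble the first two columns of the table should agree closely in every bin, and its Brier score should be lower than that of a forecast that collapses the same information into a yes/no prediction.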
Citation: https://doi.org/10.5194/egusphere-2025-2836-RC2