the Creative Commons Attribution 4.0 License.
Simulating multivariate hazards with generative deep learning
Abstract. When natural hazards coincide or spread over large areas they can create major disasters. For accurate risk analysis, it is necessary to simulate many spatially resolved hazard events that capture the relationships between extreme variables, but this has proved challenging for conventional statistical methods. In this article, we show that deep generative models offer a powerful alternative method for creating sets of synthetic hazard events due to their ability to implicitly learn the joint distribution of high-dimensional data. Our framework combines generative adversarial networks with extreme value theory to construct a hybrid method that captures complex dependence structures in gridded multivariate weather data and provides a theoretical justification for extrapolation to new extremes. We apply our method to model the co-occurrence of strong winds, low pressure, and heavy precipitation during storms in the Bay of Bengal, demonstrating that our model learns the spatial and multivariate extremal dependence structures of the underlying data and captures the distribution of storm severities. Validation shows excellent preservation of spatial correlation structures (r = 0.977, MAE = 0.053) and multivariate dependencies (r = 0.817, MAE = 0.096) for wind, precipitation, and pressure fields. In a case study of storm risk to mangrove forests, we demonstrate that correctly modelling the dependence structures leads to far more realistic estimates of aggregate damages. While our method shows mild underestimation of the damages with a mean absolute error of 93.57 km², this remains an order of magnitude lower than errors from independence assumptions (460.54 km²) and the total dependence assumption (1056.90 km²) that is implicit when using return period maps. The framework developed in this paper is flexible and applicable across a wide range of data regimes and hazard types.
Status: open (until 31 Oct 2025)
RC1: 'Comment on egusphere-2025-3217', Anonymous Referee #1, 03 Sep 2025
AC1: 'Reply on RC1', Alison Peard, 25 Sep 2025
We thank the reviewer for their thoughtful and constructive feedback on our manuscript. We expect their suggestions will significantly improve the quality of the manuscript and we address each comment systematically below.
Comments regarding novelty and contributions
Reviewer Comment: This paper presents a novel framework HazGAN to simulate multivariate climate hazard event sets. I think the integration of extreme value statistics with GAN-based models is original. Novelty is there (although sometimes overstated). Moving from univariate to multivariate footprints is a clear advance.
Author Response: We thank the reviewer for the positive comment regarding the novelty of our work. We will clarify our specific contributions in the Introduction and Discussion of the revised manuscript, including:
- enabling spatially-coherent event synthesis by combining POT methods with hazard footprints;
- improving sensitivity to modelled extremes by training on Gumbel-transformed marginals;
- reducing data requirements by using the StyleGAN2-ADA model; and
- allowing customisable applications via the modular framework.
While Boulaguiem et al. (2022) [doi: 10.1017/eds.2022.4] first used EVT methods this way for GAN training, their single-hazard, componentwise annual maxima approach could not create spatially coherent event sets or model multi-hazards, limitations that our work addresses. We will revise Sections 1.3–1.4 to better highlight how our work advances beyond existing methods rather than simply summarising them.
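To illustrate what training on Gumbel-transformed marginals involves, the sketch below maps one variable at one grid cell to the standard Gumbel scale and back. It is a simplified stand-in, not the manuscript's implementation: the rank-based transform to uniform margins is an assumption used here in place of the fitted marginal distributions, and the function names and sample values are hypothetical.

```python
import math


def to_uniform(values):
    """Rank-based probability integral transform: map values into (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Dividing by n + 1 keeps every value strictly inside (0, 1).
        u[i] = rank / (n + 1)
    return u


def uniform_to_gumbel(u):
    """Standard Gumbel quantile function: -log(-log(u))."""
    return -math.log(-math.log(u))


def gumbel_to_uniform(z):
    """Standard Gumbel distribution function: exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


# Hypothetical wind speeds at a single grid cell (assumed values).
wind = [12.1, 30.5, 18.3, 45.0, 22.7]
gumbel_wind = [uniform_to_gumbel(u) for u in to_uniform(wind)]
```

On the Gumbel scale the upper tail is stretched relative to the uniform scale, so differences between the largest events occupy more of the range that the generator is trained on.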
Reviewer Comment: While this paper claims wide applicability, it only demonstrates storms to one case study. Different hazards may pose different challenges.
Author Response: We acknowledge our generalisability claim lacks specificity and we will clarify what we mean by ‘wide applicability’ in Sect. 6.0.2. While our single case study currently limits direct evidence, we believe the framework’s modular architecture can accommodate many hazard-specific challenges via customisable:
- variables from open-source climate data
- storm definition functions
- temporal aggregation functions (cumulative/maximum/mean)
- temporal aggregation time frames
- parametric distributions for extreme values.
However, the method has important restrictions: it cannot explicitly model temporal hazard evolution, is currently limited to three simultaneous variables, and requires sufficient training data. We will emphasise these limitations and their repercussions in Sect. 6.0.3.
Additional case studies are also underway which will provide further evidence of generalisability.
Reviewer Comment: There is little justification as to why hydrological hazards -> Bay of Bengal -> mangrove impacts are all chosen as the focus of the study.
Author Response: We will add clear justification for all components of our case study selection in the Introduction and reiterate them in Sect. 4, Sect. 5, and the Discussion.
- We chose hydrological hazards because they are major damage drivers that can interact and compound during storms.
- We chose the Bay of Bengal because it is highly exposed to storms and tropical cyclones, yet data-poor, with existing models underperforming in the region, motivating further study [doi:10.1038/s41467-022-33918-1].
- We chose mangroves because modelling their aggregate damages requires spatially coherent and multi-hazard event synthesis. This is because they are widespread across the Bay and both wind and cumulative rainfall are key predictors of damage during storms [doi:10.1088/1748-9326/ab82cf].
Additionally, this case study addresses a broader modelling challenge: compared to grey infrastructure, which has well-defined failure modes, it is difficult to stress test the resilience of complex ecosystems. Difficulties modelling the benefits, costs, and reliability are key barriers to the adoption of green infrastructure and nature-based adaptation policies (Seddon et al., 2020; doi:10.1098/rstb.2019.0120). By applying empirical stress testing approaches to ‘green infrastructure,’ we aim to highlight that data-driven methods like ours have the potential to facilitate integration of more complex green solutions in climate policies.
We will add these additional points to the revised manuscript.
Reviewer Comment: Discussion is fairly short (after such a huge intro!) but doesn't really engage deeply with the limitations. E.g., data quality, assumptions in POT selection... Some of the results are oversold - strong claims without benchmarking against other model outputs.
Author Response: We agree the Discussion requires substantial expansion and will significantly strengthen it by: (1) deepening engagement with data quality limitations beyond the current coverage in Sections 4.2.1 and 6.0.1; (2) expanding on the EVT assumptions and the appropriateness of the POT threshold selection method (also covered in Comment X); and (3) moving the detailed EVT discussion from the Introduction to the Discussion for better focus.
Regarding benchmarking, while jointly fitting parametric statistical models at our scale (4096 × 3 variables) is computationally infeasible, we have benchmarked HazGAN against the Heffernan and Tawn (2004) conditional exceedance model for randomly sampled pairs of points, comparing the ability to capture the tail dependence between variables and across space. The benchmarking results are included with this document and will be included in the Results section of the revised manuscript.
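The reply does not state which tail dependence measure was used in the comparison. As one common choice, and purely as an assumption for illustration, the sketch below estimates the coefficient chi(q) = P(V > q | U > q) for a pair of series after a rank transform to uniform margins. The function names are hypothetical.

```python
def empirical_ranks(values):
    """Map values to (0, 1) using ranks divided by n + 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def chi(x, y, q=0.9):
    """Empirical estimate of chi(q) = P(V > q | U > q) on the rank scale."""
    u, v = empirical_ranks(x), empirical_ranks(y)
    exceed_u = [i for i in range(len(u)) if u[i] > q]
    if not exceed_u:
        raise ValueError("no exceedances above q; lower q or use more data")
    joint = sum(1 for i in exceed_u if v[i] > q)
    return joint / len(exceed_u)
```

Computing the same estimate for a pair of grid points (or a pair of variables) in the training data and in the generated samples gives a like-for-like comparison: values near 1 indicate that extremes tend to occur together, values near 0 that they do not.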
Direct comparison with existing event generators is more challenging because many storm generators (e.g., STORM; doi:10.1038/s41597-020-0381-2), are specifically for tropical cyclones and are trained on IBTrACS data, which differs substantially from ERA5. We believe evaluating approaches based on their ability to capture target data statistics is more appropriate, which our Figures 7 & 8 demonstrate for extremal statistics across spatial and multi-hazard dimensions.
We will include this additional benchmarking and appropriately qualify our claims given these validation constraints. Future work will implement cross-validation methods for uncertainty intervals.
Reviewer Comment: It would be nice to see a bigger picture of how your model can be used and by who, and give details of what the effects of such a model will have.
Author Response: We will expand the Discussion to include concrete use cases, target users, and broader implications.
For use cases, we will highlight potential end-to-end applications in risk analysis requiring only ERA5 or CMIP6 data, e.g., simulating:
- low solar and wind power potential over Europe to stress test the energy grid’s resilience to low wind–low solar “dunkelflaute” events;
- wind speeds, direction, and antecedent precipitation to stress test the UK energy providers’ resilience to storm-driven power faults (see doi:10.1038/s43247-025-02176-6);
- high temperature and low precipitation globally to stress test the global food system capacity to handle different combinations of regional crop failures;
- risk to Bangladesh’s electricity grid from high winds, storm surge, and extreme precipitation (doi:10.22541/essoar.175648165.54911762/v1);
- high temperature and high humidity to assess different heat mortality scenarios; and
- rainfall, runoff, and soil moisture as inputs to generate spatially-coherent flood events over large scales.
Target users include researchers and practitioners familiar with climate datasets, statistics, and deep learning. While the current implementation requires GPU access (NVIDIA CUDA drivers for StyleGAN2-ADA), CPU training is possible and adaptation to other generative models could reduce hardware requirements (see doi:10.1017/eds.2022.4 which uses standard GANs).
Regarding broader effects, we believe this approach represents a step towards democratising catastrophe modelling by enabling end-to-end runs in under a day with standard equipment, using only open-source data. This allows less well-funded organisations to generate event sets without high-performance computing infrastructure or physical modelling capabilities. However, we will also emphasise that increased accessibility requires careful interpretation by users with appropriate domain expertise.
Comments regarding technical details
Reviewer Comment: There are some confusions with the terminologies that I have noted in the attached document e.g., vulnerability, and also conflating compound events and multi-hazard events etc.
Author Response: We will clarify terminology throughout the manuscript, addressing the specific examples noted:
- "Storm risk to mangroves" was intentional, examining the intersection of exposure, vulnerability, and hazard to mangroves. However, we will clarify this usage.
- "Vulnerability of natural and man-made assets typically exhibits nonlinear relationships with hazards" will be rephrased to clarify that vulnerability is being conceptualised as a function rather than a characteristic: "The vulnerability of natural and man-made assets is typically a nonlinear, multivariate function of hazard variables."
- We will streamline the use of compound vs multi-hazard events throughout the text, ensuring consistent terminology. To clarify: we are modelling spatial multi-hazards, which facilitates modelling of potential compound hazards.
Reviewer Comment: Some sweeping statements need softening or referenced e.g., "max-stable processes almost never represent real events"
Author Response: We agree this statement could be made clearer. While this comes from Raphaël Huser et al. (2024) [https://arxiv.org/html/2401.17430v1], we will rephrase to make the source and context clearer for readers. The full text in our manuscript was: "They suffer a number of limitations, however, which are reviewed in-depth by Huser et al. (2024). In particular, annual maxima almost never represent a real event over large regions..."
Reviewer Comment: I am confused with the statement that GEV cannot produce a zero shape parameter, when the Gumbel case is standard in risk modelling.
Author Response: We will also reword this statement to make our meaning clearer. The statement refers specifically to the computational challenge of fitting GPD or GEV distributions to data believed to be in the maximum domain of attraction of Type I extremes (Gumbel, ξ=0). With most fitting methods, ξ=0 cannot be obtained because it corresponds to a singularity in the likelihood function. This issue is documented in standard software packages and discussed in Harris (2005) [doi:10.1016/j.jweia.2005.02.004]. We will clarify our evidence for this statement and what it implies.
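For readers less familiar with the issue, the standard GPD formulas show where the difficulty comes from. The notation below (exceedances x_1, ..., x_n, scale σ, shape ξ) is introduced here for illustration and is not taken from the manuscript.

```latex
% GPD distribution function for an exceedance x > 0, scale \sigma > 0, shape \xi
F(x) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}, \quad \xi \neq 0;
\qquad
F(x) = 1 - \exp\!\left(-\frac{x}{\sigma}\right), \quad \xi = 0.

% Log-likelihood for exceedances x_1, \dots, x_n when \xi \neq 0
\ell(\sigma, \xi) = -n \log \sigma
  - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \log\!\left(1 + \frac{\xi x_i}{\sigma}\right)

% Limiting (exponential) case
\lim_{\xi \to 0} \ell(\sigma, \xi) = -n \log \sigma - \frac{1}{\sigma} \sum_{i=1}^{n} x_i
```

The factor 1/ξ is undefined at ξ = 0, so the ξ = 0 case exists only as a separately coded limit. A numerical optimiser searching over continuous ξ will therefore typically return a small non-zero estimate rather than exactly zero.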
Reviewer Comment: A sensitivity analysis would help transparency in terms of the POT threshold choice.
Author Response: We acknowledge that we did not provide sufficient detail about the threshold selection procedure in the manuscript and we will include this in the Methods of the revised manuscript.
To clarify, the POT procedure involved systematic testing of multiple thresholds using goodness-of-fit criteria. Specifically, the threshold selection algorithm: (1) tests 28 thresholds between the 70th and 98th quantiles for each marginal; (2) fits GPD or Weibull distributions and performs Anderson-Darling goodness-of-fit tests; (3) repeats this for 5 bootstrap samples per threshold; (4) transforms p-values using the ForwardStop method to reduce noise; (5) selects the lowest threshold with transformed p-values exceeding 5%.
While a full sensitivity analysis for our 4096 × 3 variables would require significant computational effort, we believe the systematic nature of this procedure provides confidence in the stability and robustness of our threshold selections.
We also note that the smooth spatial variation of POT thresholds and parameters shown in Figure 4 implies stability in the fits.
We will add the threshold selection details in the Methods section and we will draw attention to the smoothness of parameters in the Results.
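Steps (4) and (5) of the threshold selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the manuscript: it takes one goodness-of-fit p-value per candidate threshold as given (however these were obtained from the bootstrap samples), orders them from lowest to highest threshold, and applies the ForwardStop running-mean transform. The function names and the example values are hypothetical.

```python
import math


def forward_stop(p_values):
    """ForwardStop transform: running mean of -log(1 - p) over ordered p-values."""
    transformed = []
    running_sum = 0.0
    for k, p in enumerate(p_values, start=1):
        p = min(p, 1.0 - 1e-12)  # guard against log(0) when p == 1
        running_sum += -math.log(1.0 - p)
        transformed.append(running_sum / k)
    return transformed


def select_threshold(quantiles, p_values, alpha=0.05):
    """Return the lowest candidate quantile whose transformed p-value exceeds alpha."""
    for q, t in zip(quantiles, forward_stop(p_values)):
        if t > alpha:
            return q
    return None  # no candidate threshold passed


# Hypothetical p-values for four candidate thresholds (assumed values).
candidates = [0.70, 0.71, 0.72, 0.73]
p_vals = [0.01, 0.02, 0.30, 0.60]
chosen = select_threshold(candidates, p_vals)
```

Because each transformed value averages over all lower thresholds, a single noisy p-value has less influence on the selected threshold than it would under a per-threshold test.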
Reviewer Comment: Comment on reproducibility would be good - computational requirements?
Author Response: We will include more practical details regarding reproducibility and computational requirements in the Discussion. Specifically:
Using a regular desktop computer, the 4096 × 3 marginal distribution fits require 2–3 hours. Training StyleGAN on 150 images for 300 epochs takes approximately three hours on an NVIDIA GTX 1080 Ti GPU. Generating 914 samples takes approximately one minute. Our method requires an NVIDIA GPU with CUDA because of the StyleGAN2 implementation; in future work we would like to explore alternative backbone models that can be run on any machine with a GPU.
All data used is ERA5 data available from the Copernicus Climate Change Service (C3S).
Re-training the StyleGAN-based framework will produce statistically near-equivalent, but not identical, results. Two independently trained models using the same seeds will generate different samples, but from nearly identical distributions. This non-determinism is inherent to deep learning. PyTorch does not guarantee reproducibility “across PyTorch releases, individual commits, or different platforms,” and CUDA operations introduce additional stochasticity.
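As a practical note, run-to-run variation can be reduced, though per the PyTorch statement quoted above not eliminated, by fixing the random seeds and requesting deterministic algorithms before training. The helper below is an illustrative sketch rather than part of the released code: the function name is hypothetical, and NumPy and PyTorch are treated as optional so that the snippet also runs where they are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the common random number generators; NumPy and PyTorch are optional."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
        torch.backends.cudnn.benchmark = False  # avoid run-dependent kernel selection
        torch.use_deterministic_algorithms(True)  # error on known non-deterministic ops
    except ImportError:
        pass


seed_everything(42)
```

Operations without a deterministic implementation will raise an error under this setting, so it may need to be relaxed for models that rely on them.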
Comments regarding presentation and writing
Reviewer Comment: The main tasks are to streamline the introduction/theory, clarify terminology, better justify the case study, add sensitivity/benchmarking with other models where possible, and strengthen the discussion of limitations.
Author Response: We acknowledge these important points and will address each systematically in our revision.
Reviewer Comment: However, it is jargon heavy and very dense, reading more like a thesis chapter than focusing on the contribution. I worry the dense introduction and methods would lose hazard analysts who would want to use the model.
Author Response: We will streamline the introduction and methods sections to improve accessibility for hazard analysts, removing jargon and unnecessary detail. The reviewer has provided many additional comments in the attachment advising where to cut/clarify the text, and we will incorporate all of these suggestions. Some details will be relocated to the Discussion.
Reviewer Comment: The introduction is well referenced. Although in parts quite technical and written like a chapter rather than a paper - keeping it constrained and going through each past model and explaining what yours expands on would help.
Author Response: We will restructure the introduction to be more focused, systematically addressing each model and limiting our explanation to how our approach expands upon existing work.
Reviewer Comment: At times the tone is too journalistic "rich—though at times much-debated literature" or "catapulted to the forefront".
Author Response: We will revise the manuscript to adopt a more formal academic tone throughout.
Reviewer Comment: The theory section is too dense for this manuscript - perhaps should be moved to the supplementary.
Author Response: We will streamline the theory section and move detailed technical material to supplementary materials as appropriate.
Reviewer Comment: Conclusion is written like a proposal style with "powerful tool" and "scalable foundation".
Author Response: We agree that the tone of the conclusion needs some revision. We will adjust this.
Data sets
Code and data from paper: Simulating multivariate hazards with generative deep learning Alison Peard https://doi.org/10.5281/zenodo.15838238
Model code and software
Code and data from paper: Simulating multivariate hazards with generative deep learning Alison Peard https://doi.org/10.5281/zenodo.15838238
Viewed
- HTML: 1,597
- PDF: 117
- XML: 17
- Total: 1,731
- Supplement: 69
- BibTeX: 28
- EndNote: 29
This paper presents a novel framework HazGAN to simulate multivariate climate hazard event sets. I think the integration of extreme value statistics with GAN-based models is original. Novelty is there (although sometimes overstated). Moving from univariate to multivariate footprints is a clear advance. This paper is well written and referenced. It is applied to a practical example of the Bay of Bengal. However, it is jargon heavy and very dense, reading more like a thesis chapter than focusing on the contribution. Figures are good (although perhaps streamline fig 5). I worry the dense introduction and methods would lose hazard analysts who would want to use the model. I think it would benefit from some improvements I list below:
- The introduction is well referenced. Although in parts quite technical and written like a chapter rather than a paper - keeping it constrained and going through each past model and explaining what yours expands on would help. There are some confusions with the terminologies that I have noted in the attached document e.g., vulnerability, and also conflating compound events and multi-hazard events etc. At times the tone is too journalistic "“rich—though at times much-debated literature” or “catapulted to the forefront”.
- There is little justification as to why hydrological hazards -> Bay of Bengal -> mangrove impacts are all chosen as the focus of the study. While this paper claims wide applicability, it only demonstrates storms to one case study. Different hazards may pose different challenges. Maybe tone down the claims of generalisability and focus on the road ahead for improving this for meteorological hazards.
- The theory section is too dense for this manuscript - perhaps should be moved to the supplementary. None of the jargon is really explained and it is not adding to the manuscript if you are not linking these methods with your model or improvements in your study. Some sweeping statements need softening or referenced e.g., "max-stable processes almost never represent real events"
- I am confused with the statement that GEV cannot produce a zero shape parameter, when the Gumbel case is standard in risk modelling. The discussion of Weibull and wind tails could also be clearer. Are the negatives here a result of physics or the sample?
- A sensitivity analysis would help transparency in terms of the POT threshold choice.
- Comment on reproducibility would be good - computational requirements?
- Discussion is fairly short (after such a huge intro!) but doesn't really engage deeply with the limitations. E.g., data quality, assumptions in POT selection. For example, ERA5 has coarse resolution addressed in the paper but does not address how this bias affects conclusions. With the biases explained with the training data, the underestimation of storm intensity is a major limitation. Also the limitations around over-engineering the process, transforming the Gumbels and then back-transforming. Some of the results are oversold - strong claims without benchmarking against other model outputs.
- Conclusion is written like a proposal style with "powerful tool" and "scalable foundation". Could be tightened for more factual contributions. It would be nice to see a bigger picture of how your model can be used and by who, and give details of what the effects of such a model will have. Rather than saying it can be applied by different hazards - which is not evidenced here.
The main tasks are to streamline the introduction/theory, clarify terminology, better justify the case study, add sensitivity/benchmarking with other models where possible, and strengthen the discussion of limitations. I provided further minor comments in the attached document.