Distributed under the Creative Commons Attribution 4.0 License.
Deep Learning Emulation of Multivariate Climate Indices: A Case Study of the Fire Weather Index in the Iberian Peninsula
Abstract. The Fire Weather Index (FWI) is an essential multivariate climate index for assessing wildfire risk and the associated impacts of climate change, as it provides a quantitative measure of wildfire danger by integrating several critical near-surface fire-weather variables, namely air temperature, relative humidity, wind speed, and precipitation. FWI calculation depends on instantaneous data representing noon local standard time, which are often unavailable in many climate data repositories, particularly in climate projections. In these instances, a "proxy" of the actual FWI is often used, applying the same FWI formulation to daily aggregated values (mean, max, or min), despite known limitations in capturing extremes and temporal dynamics.
This study investigates the use of deep learning (DL) models to emulate the reference FWI over the Iberian Peninsula – a predominantly Mediterranean and fire-prone region – using only daily inputs. The emulators are trained and evaluated using ERA5-Land data, which, while not observational ground truth, provides a consistent and high-resolution dataset suitable for controlled inter-comparison. The focus is not on validating FWI against observations, but on assessing the ability of DL models to reproduce the reference FWI more accurately than traditional proxy approaches, using the same input data source.
Our results show substantial improvements in spatial accuracy, preservation of temporal sequences, and detection of extreme fire danger events when compared with the corresponding proxy version. Furthermore, after evaluating different combinations of input variables for DL model training, we find that precipitation can be excluded without substantially affecting accuracy – especially at the upper end – an important insight given the challenges climate models face in representing precipitation. These findings highlight the potential of deep learning tools to enhance the usability of FWI in contexts where sub-daily data are unavailable, and set the stage for the emulation of other multivariate climate indices, which are vital for climate impact studies, spatial planning and management, and adaptation decision-making.
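To make the distinction between the reference and proxy FWI concrete, the sketch below contrasts the two input sets fed to the same formulation. It is purely illustrative: the file, the variable names, and the `compute_fwi` routine are hypothetical placeholders, not the authors' code.

```python
# Illustrative only: the same (hypothetical) FWI routine applied to two input sets.
import xarray as xr

ds = xr.open_dataset("era5_land_iberia_daily.nc")  # hypothetical predictor file

# Reference FWI inputs: instantaneous 12 UTC fields plus 24 h accumulated precipitation
ref_inputs = dict(temp=ds["t2m_12utc"], rh=ds["rh_12utc"],
                  wind=ds["ws_12utc"], precip=ds["tp_24h"])

# Proxy FWI inputs: daily aggregates fed to the same formulation
proxy_inputs = dict(temp=ds["t2m_mean"], rh=ds["rh_min"],
                    wind=ds["ws_mean"], precip=ds["tp_24h"])

# fwi_ref = compute_fwi(**ref_inputs)     # Canadian FWI System equations (not shown here)
# fwi_proxy = compute_fwi(**proxy_inputs)
```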
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2349', Anonymous Referee #1, 21 Jul 2025
Review of Deep Learning Emulation of Multivariate Climate Indices: A Case Study of the Fire Weather Index in the Iberian Peninsula
- I’m a bit confused as to how you have chosen the predictor sets for the different experiments, namely P1 and P2. It does not seem to have been done systematically, at least as it is described. For example, in L115-L120 you say that you have done experiments to avoid precipitation due to its complex nature, which in itself is fine. But you also say that P1 and P2 are done to improve upon the result of the best proxy P0, which uses DM Temp, Min Rel. Hum., daily accum precip and DM wind speed. The following questions arise, based mainly on the way Table 1 is presented:
- Why is Min Rel Hum. changed to DM Rel. Hum for P1 and P2 along with removing precip? Were other experiments done with precip included?
- Have you done an experiment with the following as predictors: daily mean Temp, min. Rel. Humidity, and daily mean wind speed?
It may actually only be necessary to restructure L115-L120 so that it better matches Table 1 and avoids confusion.
- L155-L165: How do you make sure that the model doesn’t overestimate low-moderate values to compensate for extreme value underestimation? Was this tested at all? I understand that it may not be relevant if you're only interested in extreme cases. However, the total performance of the model still needs to be physically consistent.
- L220: Is this a description of Fig. 2?
- L256-L261: The authors may perhaps explain whether the smoother outputs from the DL models are good or bad, and also give an explanation of why they are smoother.
- Have you considered using some form of terrain-based predictor, say topography field for example? It might help in providing a more discernible profile to the fields.
- Since you’re estimating FWI, I would expect that vegetation cover would be a necessary predictor. Why was it not used?
- L300-L302: If it is not too much extra work, I would like to see how previous days' (e.g. 24 h prior) precipitation (lagged precipitation) might affect the model capabilities. The simplest model would suffice. If it is not possible to run the models, then you can also explain how this would or would not affect the models' performance (with the necessary refs).
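For illustration, a lagged precipitation channel could be added to the predictor stack along the following lines; this is a minimal sketch assuming daily xarray fields, with file and variable names chosen hypothetically.

```python
# Minimal sketch: previous-day (24 h lagged) accumulated precipitation as an extra channel.
import xarray as xr

ds = xr.open_dataset("predictors_daily.nc")            # hypothetical predictor file
ds["tp_24h_lag1"] = ds["tp_24h"].shift(time=1)         # previous day's accumulation
ds = ds.isel(time=slice(1, None))                      # drop the first day, which has no lagged value

# Stack the predictors (including the new lagged channel) into a single array for training
predictors = ds[["t2m_mean", "rh_mean", "ws_mean", "tp_24h", "tp_24h_lag1"]].to_array(dim="channel")
```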
Citation: https://doi.org/10.5194/egusphere-2025-2349-RC1
RC2: 'Comment on egusphere-2025-2349', Anonymous Referee #2, 28 Jul 2025
In this study, the authors demonstrate the ability of deep learning (DL) models to emulate the Fire Weather Index (FWI) at 12 UTC. Specifically, three DL models are trained using either daily means or proxy data (as in Bedia et al., 2014) of weather variables relevant to FWI computation, to produce noon-time FWI estimates based on ERA5-Land. The authors also apply interpretability techniques to rank input variables according to their relevance in producing the FWI output. They find that, in high and extreme FWI scenarios, 24-hour accumulated precipitation is not needed to obtain accurate FWI values.
I appreciate the motivation behind this work and the authors’ methodology. The results are compelling and well presented. However, I believe a deeper analysis in certain areas would significantly enhance the value of the paper.
Major Comment 1: Generalization to datasets other than ERA5-Land
One of the main motivations of this work is to emulate reference FWI conditions (i.e., those computed using 12 UTC weather variables and 24-hour accumulated precipitation) using proxy data from datasets that typically provide only daily information, such as climate model outputs. However, the authors do not show an example of applying their DL models to such external datasets. Given that DL models are often sensitive to the data distribution used during training, applying the trained models to daily means from a dataset different from ERA5-Land (e.g., GCMs or other reanalyses) may yield inaccurate FWI emulations. Potential issues include discrepancies in statistical properties (e.g., mean, variance, extremes), spatial resolution (important for CNNs), or temporal characteristics.
I suggest the authors assess how their models perform when applied to an alternative dataset, to emphasize the potential of this approach for correcting FWI estimates in climate simulations lacking noon-time fields.
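As an illustration of the kind of check this comment calls for, one could first compare basic statistics of the external dataset's inputs against the ERA5-Land training distribution before applying the emulator; the sketch below assumes hypothetical file and variable names.

```python
# Sketch of a simple distribution-shift check before applying the trained emulator
# to a non-ERA5-Land dataset (e.g., a GCM); all names are assumptions.
import xarray as xr

era5 = xr.open_dataset("era5_land_train.nc")
gcm = xr.open_dataset("gcm_daily.nc")

for var in ["t2m_mean", "rh_mean", "ws_mean"]:
    ref, new = era5[var], gcm[var]
    print(f"{var}: mean {float(ref.mean()):.2f} vs {float(new.mean()):.2f}, "
          f"std {float(ref.std()):.2f} vs {float(new.std()):.2f}, "
          f"q95 {float(ref.quantile(0.95)):.2f} vs {float(new.quantile(0.95)):.2f}")
```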
Major Comment 2: Temporal evaluation of FWI estimations
The authors conclude that their DL models capture both spatial and temporal variability of the reference FWI better than traditional proxy methods, and improve the detection of high-risk events. While the spatial evaluation is clearly presented, the paper does not seem to explicitly evaluate the temporal aspects of the DL-predicted FWI. Beyond the Max Spell analysis, I suggest comparing the seasonal cycle of the reference FWI, proxy FWI, and DL-predicted FWI. This would help assess whether the models maintain consistent accuracy across different parts of the year or under seasonal biases. I also recommend extending the test dataset beyond the current 3-year window (e.g., from 2018 to 2024) to ensure a more robust temporal assessment.
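A seasonal-cycle comparison of this kind could be computed along the following lines; this is a minimal sketch assuming the three FWI fields are stored in one xarray dataset with hypothetical variable names.

```python
# Sketch: monthly climatology of reference, proxy, and DL-emulated FWI, averaged over the domain.
import xarray as xr

ds = xr.open_dataset("fwi_estimates.nc")  # hypothetical file holding the three FWI fields
cycle = {
    name: ds[name].mean(("lat", "lon")).groupby("time.month").mean()
    for name in ["fwi_reference", "fwi_proxy", "fwi_dl"]
}
for month in range(1, 13):
    print(month, {k: round(float(v.sel(month=month)), 2) for k, v in cycle.items()})
```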
Minor 1:
The authors justify using the term “reference FWI” instead of “ground truth” because ERA5-Land is not observational. Since the DL models are trained on ERA5-Land, they likely inherit its biases. I suggest briefly discussing known limitations of ERA5-Land compared to observations, especially if no comparison with observed FWI is included. This is especially important considering Major Comment 1, in which the ERA5-Land biases learned by the model might be propagated to other models.
Minor 2:
Please provide the actual thresholds used by the Spanish Meteorological Agency (AEMET) for fire danger classification, as these can vary by country and are important for interpretation.
Minor 3:
I suggest moving Figure 1 to the Supplementary Information, as similar architectures have already been described in previous literature.
Minor 4:
Consider merging Figures 2 and 3 to highlight the comparison between reference FWI, proxy FWI, and the DL emulators in a more compact and interpretable format.
Minor 5:
It’s unclear why the Freq. FWI95 for 12 UTC is not shown in Figure 1, even though it is later used to compute biases. Including it would help clarify the comparison.
Minor 6:
Could the overestimation of Freq. FWI95 by the UNet be explained by a general overestimation of FWI in this model, as suggested by the scatter plot? Are all three DL models trained on exactly the same days and years?
Minor 7:
Please use either “proxy FWI” or “Proxy FWI” consistently throughout the text.
Minor 8:
Have you normalized the input data before training the DL models? If so, please specify the normalization method used.
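For reference, a common choice is per-variable standardization with statistics computed on the training period only; the sketch below shows one possible approach, not necessarily the one used by the authors.

```python
# Per-channel z-score normalization sketch; training-set statistics are reused for
# validation/test data to avoid information leakage. Array shape is an assumption.
import numpy as np

x_train = np.load("x_train.npy")                   # hypothetical array: (time, channel, lat, lon)
mean = x_train.mean(axis=(0, 2, 3), keepdims=True)
std = x_train.std(axis=(0, 2, 3), keepdims=True)

def normalize(x):
    return (x - mean) / (std + 1e-8)               # small epsilon guards against zero variance
```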
Minor 9:
Since your DL architectures do not incorporate temporal dependencies, they may miss the effect of temporal accumulation in the Duff Moisture and Drought codes (e.g., DC, DMC). Why did you choose non-recurrent architectures over those incorporating temporal structure (e.g., LSTMs)?
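For context, a recurrent alternative could look like the minimal PyTorch sketch below, where an LSTM sees a short window of past daily predictors at each grid point and predicts the FWI for the last day; the dimensions and window length are illustrative assumptions, not the authors' design.

```python
# Illustrative recurrent emulator: an LSTM over a 30-day window of daily predictors,
# which could in principle mimic the memory of the moisture codes (DMC, DC).
import torch
import torch.nn as nn

class SequenceFWIEmulator(nn.Module):
    def __init__(self, n_vars=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_vars, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # FWI for the last day of the window

    def forward(self, x):                      # x: (batch, window, n_vars)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # use the final time step only

model = SequenceFWIEmulator()
dummy = torch.randn(8, 30, 4)                  # 8 samples, 30-day window, 4 variables
print(model(dummy).shape)                      # torch.Size([8, 1])
```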
Minor 10: Interpretation of input variable relevance
The saliency maps suggest that precipitation is only relevant in low-FWI scenarios. However, this may be a result of how precipitation contributes to the FWI calculation itself: namely, it offsets the fuel dryness components. Therefore, in high and extreme FWI events (which typically occur during dry periods), the precipitation input often has a value of zero, contributing little additional information for the DL model. It would be insightful to give more information about why the model changes its focus depending on the region and the type of FWI (non-extreme or extreme values). Otherwise, the only information that this result gives us is that, for the DL models, temperature, relative humidity, and wind speed are sufficient for obtaining accurate high and extreme FWI events. But is this true in reality?
Considering this work uses ERA5-Land, the predictor variables (T, RH, P and ws) are not independent of each other. In fact, temperature and dew point temperature (needed to compute relative humidity) are variables calculated by the land surface model in ERA5-Land, while total precipitation and wind components are forcing variables interpolated from ERA5. Therefore, it is likely that temperature and relative humidity in ERA5-Land reflect the effects of changes in total precipitation and wind speed, and that these last two variables are consequently not so needed by the DL models. This is just a guess, and it likely won't be the full explanation of your interpretability results... but in any case it would be very valuable to give more information about this, or at least mention it if you agree on this limitation.
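For readers unfamiliar with the technique, a gradient-based saliency score per input channel can be obtained roughly as sketched below; the stand-in model and tensor shapes are assumptions, and the authors' interpretability setup may differ.

```python
# Sketch of a gradient saliency map: the absolute gradient of the spatially averaged
# output with respect to each input channel gives a per-channel relevance score.
import torch
import torch.nn as nn

# Tiny stand-in for a trained emulator; in practice the trained model would be loaded.
model = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
model.eval()

x = torch.randn(1, 4, 128, 160, requires_grad=True)   # (batch, channel, lat, lon), illustrative
model(x).mean().backward()
saliency = x.grad.abs().mean(dim=(0, 2, 3))            # one relevance value per input channel
print(saliency)
```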
Minor 11:
I would have liked to see an experiment in which you assess how well the DL models learn to compute the reference FWI using input variables at 12 UTC. Your experiments P1 and P2 address this question to some extent, but the resulting biases could also arise from difficulties relating daily aggregates to the 12 UTC FWI. It may be worth including this experiment to provide insight into where the biases obtained with the DL models may come from.
Citation: https://doi.org/10.5194/egusphere-2025-2349-RC2
Data sets
Toy Dataset for Emulating the Fire Weather Index (FWI) Using Deep Learning Techniques Oscar Mirones et al. https://doi.org/10.5281/zenodo.15075367
Interactive computing environment
Deep Learning-Based Emulation of the Fire Weather Index in the Iberian Peninsula Using ERA5-Land Predictors: A toy example. Oscar Mirones et al. https://github.com/SantanderMetGroup/DeepFWI