Combining hazard, exposure and vulnerability data to predict historical North Atlantic hurricane damage
Abstract. Hurricanes are among the most destructive natural hazards globally. Accurate risk assessment requires integrated hazard, exposure, and vulnerability information, yet the widely used Saffir–Simpson scale, while an effective public-communication tool, is based on a single hazard quantity (wind speed) and is not well correlated with historical economic losses, limiting its predictive value. This study develops a statistical model to predict economic damage from landfalling North Atlantic hurricanes using optimally weighted, normalised-rank variables representing hazard, exposure, and vulnerability. The model significantly reduces root-mean-square error between predicted and observed losses from U.S.$35.6 billion (when using landfall wind speed) to U.S.$7.0 billion, and substantially outperforms single-parameter predictions, including landfall wind speed maxima and central pressure minima. To improve communication of financial risk, we introduce a loss-based 'Hurricane Predictive Damage Scale' to more directly link hurricane characteristics to economic impacts. Our results demonstrate that integrating exposure and vulnerability data with hazard observations yields markedly better estimates of historical hurricane economic impacts, and this approach is readily applicable to future forecast hurricanes, allowing assessment of how damage from an imminent landfall may rank among historical events. This framework is transferable to other cyclone-prone regions and highlights the critical need for open exposure and vulnerability data to advance climate risk quantification and inform policy.
The objective of the study is very worthwhile. The Saffir-Simpson scale has long been recognized as lacking and less than optimal for its aims of hurricane warning. Therefore, an effort to substitute or accompany that scale with another metric could very much benefit society and is very welcome. This study takes a multi-pronged approach to address this, consulting and combining a vast range of datasets in an ambitious multivariate analysis of the drivers of hurricane damage, and finally motivating and proposing such an alternative scale. I think the study is overall well planned and executed; however, critical gaps in the presentation of its data and methods prevent it from being publishable at this stage. I recommend major revisions, and I would like to review a new version, if the authors produce one, so that I can more fully evaluate the Results and Discussion sections, which I cannot fully evaluate now because the Data and Methods are unclear.
Main points:
Detailed points:
Abstract
The passage “[the S-S scale is] an effective public communication tool” seems purely to pay lip service to the tool, and it is repeated verbatim in the introduction. Nothing about this scale in particular seems to make it especially effective. Please consider rephrasing or dropping this – up to you.
“limiting its predictive value”: I would recommend adding “and early-warning value”.
The sentence starting with “The model significantly reduces” seems to repeat the same comparison twice: 1) against a model “using landfall wind speed” and 2) against “single-parameter predictions, including landfall wind speed maxima”. What is the difference between the two?
The closing sentence promises transferability to other regions. This does not seem to be supported in the article, and I suggest either making a convincing case for it in the Discussion or eliminating the claim.
Significance statement
Does the second sentence need to include both terms “impact” and “losses”? That is potentially confusing. There are more instances in the manuscript where loss and damage seem to be used interchangeably: please address these as well.
Section 1
The NOAA 2024 reference does not seem to inform about damages of year 2025.
The sentence starting with “The most important threats” lacks a main verb.
The explanation of why hurricane damages are more challenging or uncertain than TC activity is not clear or logical; please check this. Also, please be explicit about the narrowing of focus from tropical cyclones to hurricanes. In addition, this sentence appears twice in the same paragraph!
Please clarify “economic financial damage”.
Correct to: “is a critical step in mitigating…”
Line 68: please spell out what you mean by “intensity”, as it needs to be clear in this context.
The overview of previous efforts to improve on the Saffir-Simpson scale is well written and accurate. Please consider engaging with other potentially relevant efforts; to my knowledge, these include Tripathy et al. 2024 (https://doi.org/10.1038/s43247-023-01198-2) and the simple mean sea level pressure metric in the already-cited paper of Klotzbach et al. 2022a. Further, it would be coherent to add a remark about the implications of Pilkington and Mahmoud (2016) and of Baldwin et al. (2023) for an alternative intensity scale (the focus of the paragraph).
The sentence “These recent studies add to a growing body of evidence that combining these factors is necessary to capture risk (Ward et al., 2020)” nearly repeats a sentence from a prior paragraph, including the same reference.
Line 95 onwards: please revise the syntax of this sentence. More generally, in this sentence and passage, I think you could make the point clearer that a ‘usable’ new scale should also fulfil the need for rapid implementation/computation, which is relevant for your methodology.
The research questions are somewhat redundant and not very useful. The third one, in particular, does not seem appropriate, since the reader knows nothing yet about this ‘Hurricane Predictive Damage Scale’. In the second question, it is not clear what ‘more’ refers to. Please reconsider.
Section 2
“(section 2.2)” is repeated in short succession.
In this section, you start to deal with “losses”, whereas so far you had dealt with “damage”. Please harmonize this or clarify the difference.
Line 115: the list of loss sources does not correspond one-to-one to that in Figure 1 (a very useful figure, by the way!). It seems to me that it should. Please also check this for the other categories of data.
In this part, you should briefly explain what “normalized” means, in general and – if needed – in each different study. Normalization is expanded on from line 149, but there is no overall explanation of the purpose it serves, and it comes too late. This is not trivial, as there seem to be different concepts of normalization, justifying different methods and supporting different results and conclusions. I do not specialize in this aspect, but the literature on trends in normalized hurricane losses appears rich in discussion and fraught with implications. Data and methods across publications and datasets have fundamental differences, and harmonizing across them seems quite complicated. I commend the authors for their effort here. But I am very confused about the criteria and methods for the selection of the data across sources. It is important that this is done transparently and clearly, and this needs strong improvement. For example: Blake et al. seem to report disasters from 1851, so why do you write that loss estimates are available from 1965? What are the differences between that source and National Centers for Environmental Information (2025)? What are the implications of the “billion dollar loss record” only including events above that arbitrary threshold (unlike, e.g., EM-DAT)? What does the sentence at lines 134-136 mean? Are EM-DAT and Delforge et al. (2025) the same source? If so, please do not use the two names interchangeably. Why do you end up with a time series that starts in 1979 (Table 1)?
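To make concrete what I have in mind: in a Pielke et al.-type normalisation (my example, not necessarily the approach followed by the cited datasets), a loss incurred in year y is scaled to a reference year by inflation, real wealth per capita and the population of the affected area, e.g.

L_norm = L_y * (I_ref / I_y) * (RWPC_ref / RWPC_y) * (P_ref / P_y),

with I an inflation index, RWPC real wealth per capita and P population. A brief statement of which such factors each of your sources applies, and to which reference year, would greatly help the reader judge the comparability of the loss estimates.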
Line 115: “Historical hurricane economic loss estimates were collated from various government agencies and published studies” and line 127: “We collated hurricane loss estimates from multiple sources”. Please avoid confusion and describe this methods step in one place. In fact, the first two paragraphs of section 2.1 should be reorganized: as they stand, they report similar facts about each dataset in different places and in no particular order.
Please check whether you can slightly improve the explanation of how you handle the complication that different sources treat losses from hurricanes with multiple landfalls differently (lines 137-143). I think I understand it, but I wonder if clarity can be improved.
I am confused about Table 1: where are the named storms in it, and why are they relevant? The same applies to “bypassing” hurricanes.
At line 154 you explain that you normalize the damage data. But each of the source datasets already applies some sort of normalization, likely each in a different way. Why do you normalize again? Are the differences across sources taken into account, or do you base your normalization on the raw, pre-normalization data? This should be made very clear, so that the reader can understand whether your normalization method serves the aims of this study.
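To illustrate the double-counting risk I have in mind (hypothetical factors, purely for the sake of argument): if a source reports a 1992 loss already scaled by a growth/inflation factor g(1992→2022), and your normalisation then applies g(1992→2025) to that reported value, the result is

L_1992 * g(1992→2022) * g(1992→2025)

rather than the intended L_1992 * g(1992→2025), i.e. the 1992–2022 adjustment enters twice.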
Line 170: please specify the averaging period of the maximum wind speed (1-min, 10-min, etc.).
Line 171: it shouldn’t be necessary to specify “(i.e., beyond RMW)” if RMW, R34, etc. are defined.
Line 173: at the timestep before the storm centre crosses over land, the effect of land can already be present in large sectors of the hurricane (depending also on the timestep size, which you could specify). Reword to, e.g., “atmospheric fields are minimally impacted”.
Why are data from HURDAT2 prioritized over IBTrACS? What happened to hurricanes for which track data are missing from both datasets (as implied by “if available”, line 177)? Did you check for inconsistencies between the two datasets, or are any reported in the literature?
A citation is missing for the Global Tide Surge Model.
Line 184: “simulated storm-tide level”, for consistency with the prior terminology.
“Storm surge may be larger in the hours before or after a hurricane makes landfall, depending on antecedent tidal height”. This seems incorrect according to general terminology (e.g., https://oceanservice.noaa.gov/facts/stormsurge-stormtide.html): storm surge depends only on meteorological forcing, not on tidal phasing (or only minorly and indirectly). Probably you mean “storm tide” here.
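For reference, under the NOAA terminology linked above the relationship is simply

storm tide = astronomical tide + storm surge,

so the quantity that varies with antecedent tidal height is the storm tide, not the surge.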
While for all other hazard-related variables you take instantaneous data, for rainfall you also take accumulations: why is that? Even if accumulation, rather than instantaneous intensity, is plausibly more tightly related to damage, why use accumulations at locations far from the locations of damage, as is done by integrating accumulations along the whole track? And why 500 km radii here?
Line 195: please revise the syntax of this sentence. Also, how are the rainfall data integrated with MSWEP?
Line 203: “vary between the two datasets”
Line 210: it seems that population density data for a 1979 landfall come from WorldPop for the year 2000. The assumption of population stationarity across a 21-year period seems problematic. One wonders whether it would not be best to discard population density data altogether and avoid the artifacts introduced by this limitation – also considering that the study focuses on economic damage, not on human impacts. More generally, the data used combine time-varying and time-invariant datasets. This seems to be correctly stated for each dataset, and Fig. 1 summarizes it visually. However, this combination can clearly introduce artifacts in the results. A short reflection on this aspect, perhaps at the beginning of the Data section, could clarify its expected impacts; this should then be revisited in the Discussion section.
L 214: please define the Hurricane risk score better: vulnerability and resilience are inversely related, so how can they be combined in the same score?
L 218: “and we averaged these two variables across…”. Also, what is the footprint here: R34, 500 km, or something else?
L 226 onwards: what is Vn? There is no Vmax in Eq. 3. Why do you use a Vhalf of 140 knots if Vickery et al. suggest lower values? And why a Vthresh of 40 knots? (See my reading of the equations below.)
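For context, my assumption is that Eq. 2 and Eq. 3 follow the Emanuel (2011) fractional damage function, i.e. something of the form

f = v_n^3 / (1 + v_n^3),   with   v_n = max(V - V_thresh, 0) / (V_half - V_thresh),

where f is the fraction of property value lost and V is the wind speed. If that is the case, please define v_n explicitly, state which wind speed enters as V (vmax, or a spatially varying field), and justify the chosen V_half and V_thresh relative to the values in the literature you cite.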
L 235: “At each timestep, vmax is used with Eq. 2 and Eq. 3 (Emanuel, 2011) and the extracted exposure value and building density”. What does this sentence add beyond the prior explanation?
On the estimation of size for the older period: the method is mostly well documented, but some details are missing. From which datasets are vmax and the other physical variables taken? How do you obtain 4220 observations if you have 134 hurricanes in Table 1 (I imagine you also took non-landfalling storms and multiple timesteps per track; please explain)? What does “nm” refer to? From line 252 you move to R50 and R64, but you seem to also discuss aspects that are relevant for R34 as well: e.g., that RMW is incomplete during 1979-2002 – why? What is the difference between “estimates from HURDAT2” and “reconstructions from Gori et al.”?
On Fig. 2: how can you include observations from 1979 here (as per the caption), if observations start in 2002? What does “model observed” mean? Why are the correlation and MAE different between the legend and the main text? Why did you include the blue trendline without discussing it? It suggests that predictions systematically underestimate R34, which may be worth mentioning in the main text. Lastly: figures generally don’t need titles, and that information should go in the caption.
“Where RMW is missing from IBTrACS, RMW is replaced by values from HURDAT2 or Gori et al. (2023)”: this seems redundant with the prior sentences.
Line 257, “RMW values from the previous timestep were used”, and line 261, “RMW observations from previous timesteps were used”: please check this explanation, as there seems to be something redundant or wrong here.
Across the manuscript, TC and hurricane are often used interchangeably for the same concept. Please check and harmonize to one term.
L 281: “In this study, we used a weighted combined-rank framework, linear regression framework and the random forest decision-tree framework to combine input predictors across hazard, exposure and vulnerability to predict historical hurricane damage.” Please improve this important sentence. What is being combined here: the frameworks, or the inputs? “Inputs” and “predictors” seem redundant, correct? More generally, please clarify in this section how the three very different approaches are combined in your model: this is missing (see the illustrative sketch below).
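To illustrate the level of explicitness that would help the reader, here is a minimal sketch (in Python) of what I understand by a ‘weighted combined-rank’ prediction; this is my own illustration, with placeholder data and naive equal weights, and not a reconstruction of the authors’ code:

import numpy as np
from scipy.stats import rankdata, spearmanr

def combined_rank(X, weights):
    # X: (n_storms, n_variables) predictor matrix; weights: non-negative, summing to 1
    n = X.shape[0]
    norm_ranks = (rankdata(X, axis=0) - 1.0) / (n - 1.0)  # each column mapped to [0, 1]
    return norm_ranks @ weights                            # weighted combined rank per storm

rng = np.random.default_rng(0)
X = rng.random((106, 5))     # placeholder hazard/exposure/vulnerability variables
losses = rng.random(106)     # placeholder observed losses
weights = np.full(5, 0.2)    # equal weights as a naive baseline
score = combined_rank(X, weights)
rho, _ = spearmanr(score, losses)
print(f"Spearman rank correlation with observed losses: {rho:.2f}")

Stating in this spirit (even just as equations) how each of the three frameworks maps the inputs to a predicted damage or damage rank, and how (or whether) their outputs are then combined, would resolve my confusion.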
Suggestion: “Our target prediction variable is damage for each hurricane, averaged across the datasets presented in section X”.
What do you mean by “maximising the sample of hurricanes for which a loss estimate is available”? More generally, I don’t understand the difference between Table 1 and Table 2: please clarify and, if possible, consolidate them into one table with one extra column. You seem to have 106 hurricanes for which all data are suitable, out of 134 hurricanes for which damage data are suitable. Are the 28 non-overlapping hurricanes entirely discarded from the analysis? If so, I suggest they should not feature in Table 1 at all. Further, it is not clear why the caption of Table 2 deals with “named storms” whereas the column header deals with TCs. Also, the caption says damage while the column header says loss.
Line 296: “To reiterate: a key aim of this study, to develop an approach to estimate expected damage for future forecast landfalling hurricanes.” This is superfluous, if you deem that the aim is sufficiently clear from the introduction – as it should be.
Line 314: this sentence is unclear in light of the preceding explanation “Linear and normalised input variable ranks were derived”. In the sentence thereafter, what are “alike loss ranks”? Please make sure that sentence is clear.
Fig. 4: please separate the normalized and non-normalized losses into two separate columns instead of stacking them. Or are red and blue proportions of a total? This is unclear.
The titles of sections 3 and 4 should be harmonized; as they stand, they do not clarify how their content is organized: “Historical relationship between hurricane vmax and damage” and “Relationships between historical hurricane damage and risk-related variables”. What are risk-related variables? You have not introduced them. Isn’t vmax risk-related, and if so, why is it presented in a separate section?
Line 417: compared to what does the random forest improve skill? I imagine compared to the ranking with single hazard-related variables. Please be explicit.
Fig. 6: the titles of the two panels are identical. The correlation given in the legend is also an indication of the goodness of fit.
Fig. 7: the titles of the two panels are identical (please check for this recurring problem across the manuscript). The description of the panels in the caption does not seem to correspond to the axis titles.
Fig. 8: there needs to be a table that clearly explains each dataset, gives its reference and matches it with the abbreviation used here. This could be consolidated with, or integrated into, Fig. 1. Why do you only use the NOAA financial loss values here, instead of the multiple sources described in the Data and Methods sections? Maybe add a brief clarification in the caption that IBTrACS cp is strongly anticorrelated because its relationship to TC intensity and damage is inverse. Up to you.
Section 7 on Summary and conclusion should not include a subsection on “key results”, as per the header of 7.1.
True to the header, the first two paragraphs of section 7 are a summary of the study. I question, however, how useful this is, given that there is already an abstract. I suggest shrinking this part to the minimum necessary to follow the main points argued in the rest of section 7.
Line 521: if I am not mistaken, you do not use variables of inland flooding and coastal flooding. You use storm surge, which is quite different.