the Creative Commons Attribution 4.0 License.
Deciphering the drivers of direct and indirect damages to companies from an unprecedented flood event: A data-driven, multivariate probabilistic approach
Abstract. Floods are among the most destructive natural hazards, causing extensive damage to companies through direct impacts on assets and prolonged business interruptions. The July 2021 flood in Germany caused unprecedented damages, particularly in North Rhine-Westphalia and Rhineland-Palatinate, affecting companies of all sizes. To date, no study has examined the factors influencing company damages during such an extreme event. This study addresses this gap using survey data from 431 companies affected by the July 2021 flood. Results show that 62 % of companies incurred direct damages exceeding €100,000. Machine learning models and Bayesian network analyses identify water depth and flow velocity as the primary drivers of both direct damage and business interruption. However, company characteristics (e.g., premises size, number of employees) and preparedness also play critical roles. Companies that implemented precautionary measures experienced significantly shorter business interruption durations—up to 58 % for water depths below 1 m and 44 % for depths above 2 m. These findings offer important insights for policy development and risk-informed decision-making. Incorporation of behavioral indicators into flood risk management strategies and improving early warning systems could significantly enhance business preparedness.
Competing interests: The author Heidi Kreibich is a member of the editorial board of Natural Hazards and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1715', Anonymous Referee #1, 16 Jun 2025
This is an interesting and highly relevant topic that can contribute to a deeper understanding of the factors influencing both direct and indirect damages to businesses caused by flooding events. The research methods employed are technical and innovative, offering fresh perspectives and valuable insights into the complexity of flood-related impacts on commercial sectors. However, despite the strengths of the approach, several points require further attention and refinement. These include the justification of the chosen methodologies, a clearer interpretation of the survey results, and a more comprehensive discussion of the limitations and potential implications of the findings. Addressing these aspects would enhance the overall robustness and applicability of the study.
Review comments:
- Abstract, ‘to date no study has examined the factors influencing company damages during such an extreme event’; is this a correct statement? In the introduction you mention multiple papers that investigated the factors that influenced company damages such as Endendijk et al. (2024), Kreibich et al. (2010). Please clarify or revise this statement.
Method
- Survey data: it would be good to define the variables more precisely, for example in an appendix. It is not clear how business interruption is defined. Does business interruption mean that the business is not operational at all, or that there is a reduction in business activity? If the latter, how large is this reduction? This should be better defined.
- Variable selection: please introduce this section. It dives into the three machine learning techniques without explaining why these three techniques are used.
- Minimizing J(β): this should be called Obj(β), or it should be made clear that J stands for the objective function, because equation 1 defines it as Obj(β) and not J(β).
- Variable importance: Please better explain/introduce why Bayesian Networks are used in this case.
- In general, the method section needs more structure: it should be better explained why each algorithm/method is used. A clear motivation as to why the three specific techniques are used is needed.
Results and discussion
- Overview of affected companies: It is implied that sales figures would be a better metric of company size although number of employees is more often used to classify whether the company is an SME or a large company. Therefore, this sentence is unnecessary in my opinion.
- ‘These disruptions can result in partial or complete business interruptions, triggering consequences ranging from loss of sales to bankruptcy’. This sentence is unclear: loss of sales is itself a form of business interruption.
- It would also be interesting to show the differences in vulnerability and exposure levels between sectors instead of only between company sizes. This should be added or otherwise be explained why it is left out.
- ‘Bankruptcy risks remain generally low across all company sizes’. How is bankruptcy risk defined? Isn’t this a very biased variable given that bankrupt companies are probably not surveyed? Please clarify or leave this out.
- ‘They highlight the need for tailored risk management (...)’. Please clarify to what it should be tailored, to company size or also to company sector?
- ‘Tend to recover more quickly, likely benefiting from greater resilience’. This sentence sounds tautological -> recovering more quickly is part of the definition of resilience.
- Figure 3: Please explain why the outlier levels differ between the business sizes. It seems odd to leave out observations for one class and keep them for another. This does not look correct. Also, why were no outliers removed for business restriction duration?
- Having an n=3 for large companies is too low for any inference. Please make this more clear. ‘should be interpreted with caution’ does not cover it fully in my opinion.
- ‘However, substantial variance within each category highlights the influence of extreme cases’. Maybe it is better to infer about the median values instead of the averages then. Please do this or clarify why not.
- 18 out of 19 variables had less than 7% missing data which was imputed. How much missing data did the other variable have and was this imputed too? Be more clear here.
- Figure 4 and Figure 5: these abbreviations are unclear, write them out or find another way of making them more informative. A figure should be understandable on its own.
- ‘This finding underscores (…) even during unprecedented events like the 2021 flood.’ The analysis was carried out for the unprecedented 2021 flood so the word ‘even’ feels misplaced.
- Figure 6: same comment as for Figure 4 and Figure 5. In addition, the resolution of this figure should be higher.
- The fact that the observed damage and business interruption/restriction durations are scaled from 0 to 1 makes interpretation difficult. Saying that the 75th percentile decreases from 0.68 to 0.61, for example, is hard to interpret. It would be better to make the results more tangible; this way the results will also appeal more to policymakers, and the conclusion becomes easier to draw.
- “In addition, for smaller premises (75–500 m²) the uncertainty is very less”, remove the “very“ or replace with “much“.
Please also add a discussion that elaborates on any shortcomings, such as the low sample size for some company sizes/sectors, outliers, and potential selection bias, and that outlines directions for future research.
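To illustrate the point about tangibility, the scaled percentiles quoted above (0.68 and 0.61) can be mapped back to physical units once the scaling range is known. The following is a minimal sketch that assumes simple min-max scaling; the range of 0 to 200 days is a hypothetical placeholder, not a value from the manuscript, and `unscale` is an illustrative helper name.

```python
import numpy as np

def unscale(scaled, lo, hi):
    """Invert min-max scaling: map values in [0, 1] back to [lo, hi]."""
    return lo + np.asarray(scaled, dtype=float) * (hi - lo)

# Hypothetical range of business interruption duration in days (assumption).
LO_DAYS, HI_DAYS = 0.0, 200.0

# Scaled 75th percentiles quoted in the review comment.
before, after = unscale([0.68, 0.61], LO_DAYS, HI_DAYS)
reduction = before - after

print(f"75th percentile: {before:.1f} -> {after:.1f} days "
      f"(reduction of {reduction:.1f} days)")
```

Reporting the difference in days (or euros for direct damage) alongside the scaled values would let readers judge the practical size of the effect.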
Conclusion:
- The conclusion should be more extensive; it currently seems too short and concise for an academic paper.
- There should be more links with the results section.
Citation: https://doi.org/10.5194/egusphere-2025-1715-RC1
RC2: 'Comment on egusphere-2025-1715', Anonymous Referee #2, 19 Jun 2025
This manuscript quantifies drivers of damages to companies by rare flood events via 3 data-driven techniques, which ultimately lead to a Bayesian Network. This study could have potential, but its possible novelty is currently hidden behind a rather complicated and untransparent chain of calculations. In particular, the justification of using the 3 data-driven models is unclear. Why not less? Why not more? Why these? This could easily be arbitrary. And what does the Bayesian Network add to the variable importance analysis via those 3 models? I raise more questions below. I believe these should be addressed before the paper can be reconsidered for publication.
Title: I suggest a different word than “deciphering” because that’s not what is being done in this study.
L44-52: The message needs to be streamlined here with regard to rare/high-impact events.
L107, 109, 199 and elsewhere: Consider something like “rare” in place of “unprecedented”, because there now is a precedent.
L141, L214: The analyses for each damage type could have been combined, as they are also internally related, via a multivariate regression. Why not employ this more elegant solution, which makes optimal use of all information (by not treating the responses as independent)?
L143: Across what scale were the missing data imputed, i.e. how far apart were they on average?
L155: J(beta) is not in the equation.
L157: What does use of the MAE as objective function imply about the nature of the residuals given a response which is between 0 and 1 or counts between 0 and 540?
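As background to this question: minimizing the mean absolute error targets the conditional median (equivalently, it corresponds to Laplace-distributed residuals), whereas squared error targets the mean. The sketch below shows this for the simplest case of a constant prediction on synthetic, right-skewed data; the data and all variable names are illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed response, e.g. durations in days (illustrative only).
y = rng.exponential(scale=30.0, size=201)

# Evaluate both loss functions for every candidate constant prediction.
candidates = np.linspace(y.min(), y.max(), 5001)
step = candidates[1] - candidates[0]
errors = y[:, None] - candidates[None, :]
mae = np.abs(errors).mean(axis=0)
mse = (errors ** 2).mean(axis=0)

best_mae = candidates[mae.argmin()]  # lands next to the sample median
best_mse = candidates[mse.argmin()]  # lands next to the sample mean

print("MAE minimizer:", best_mae, "median:", np.median(y))
print("MSE minimizer:", best_mse, "mean:  ", y.mean())
```

For a skewed or bounded response, the two targets can differ noticeably, which is why the choice of objective function matters for interpretation.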
L159f: It’s not entirely true that the model cannot handle nonlinearities – it can do so via transformations or in Generalised Linear Model form.
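A minimal sketch of the transformation route mentioned here: a model that is linear in its coefficients can still capture a nonlinear relationship if the predictor is transformed first. The logarithmic depth-damage relationship, the coefficients, and the helper `fit_r2` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic water depth in metres and a damage ratio that grows with log(depth).
depth = rng.uniform(0.1, 3.0, size=200)
damage = 0.2 + 0.25 * np.log(depth) + rng.normal(0.0, 0.02, size=200)

def fit_r2(feature, y):
    """Ordinary least squares with intercept; returns the R^2 of the fit."""
    X = np.column_stack([np.ones_like(feature), feature])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_raw = fit_r2(depth, damage)          # linear in depth
r2_log = fit_r2(np.log(depth), damage)  # linear in log(depth)

print(f"R^2 raw depth: {r2_raw:.3f}, R^2 log depth: {r2_log:.3f}")
```

The same idea extends to generalised linear models, where a link function plays the role of the transformation.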
L201ff: What are the implications of combining the variable importance across the 3 models?
Eq9, L219, Appendix: It’s conditional probabilities, not fractions in Bayes Rule! I.e. X_i|E and E|X_i.
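For reference, Bayes' rule written in the conditional notation this comment asks for, with X_i denoting a network variable and E the evidence (symbols follow the comment; the manuscript's Eq. 9 is not reproduced here):

```latex
P(X_i \mid E) = \frac{P(E \mid X_i)\, P(X_i)}{P(E)}
```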
L222f: Why not leave it discrete rather than introducing another layer of assumptions?
L228f: How do the five models relate to the Bayesian Network?
L230-242: This part is redundant – see above. The function of the 3 models, apart from factor selection, is unclear. And why 3 models and not more or fewer?
Results & discussion: Too much time is spent describing univariate results. And the bivariate correlations kind of defeat the purpose of multivariate analysis.
L394f: Purpose of sentence unclear.
L373: Whose expert knowledge?
Fig6: What is its function for the manuscript?
Fig7: The directions matter here, no? And some of them are not intuitive!
Conclusion: Too short and doesn’t add sufficient novelty.
Citation: https://doi.org/10.5194/egusphere-2025-1715-RC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
442 | 84 | 17 | 543 | 71 | 16 | 33 |