Deciphering the drivers of direct and indirect damages to companies from an unprecedented flood event: A data-driven, multivariate probabilistic approach

Guntu, Ravi Kumar; Mohor, Guilherme Samprogna; Thieken, Annegret H.; Müller, Meike; Kreibich, Heidi

doi:10.5194/egusphere-2025-1715

Preprints

https://doi.org/10.5194/egusphere-2025-1715

25 Apr 2025

Abstract. Floods are among the most destructive natural hazards, causing extensive damage to companies through direct impacts on assets and prolonged business interruptions. The July 2021 flood in Germany caused unprecedented damages, particularly in North Rhine-Westphalia and Rhineland-Palatinate, affecting companies of all sizes. To date, no study has examined the factors influencing company damages during such an extreme event. This study addresses this gap using survey data from 431 companies affected by the July 2021 flood. Results show that 62 % of companies incurred direct damages exceeding €100,000. Machine learning models and Bayesian network analyses identify water depth and flow velocity as the primary drivers of both direct damage and business interruption. However, company characteristics (e.g., premises size, number of employees) and preparedness also play critical roles. Companies that implemented precautionary measures experienced significantly shorter business interruptions: shorter by up to 58 % for water depths below 1 m and 44 % for depths above 2 m. These findings offer important insights for policy development and risk-informed decision-making. Incorporating behavioral indicators into flood risk management strategies and improving early warning systems could significantly enhance business preparedness.

Received: 13 Apr 2025 – Discussion started: 25 Apr 2025

Competing interests: The author Heidi Kreibich is a member of the editorial board of Natural Hazards and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1186 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Supplement (463 KB)

Journal article(s) based on this preprint

16 Jan 2026

Deciphering the drivers of direct and indirect damages to companies from an unprecedented flood event: A data-driven, multivariate probabilistic approach

Ravikumar Guntu, Guilherme Samprogna Mohor, Annegret H. Thieken, Meike Müller, and Heidi Kreibich

Nat. Hazards Earth Syst. Sci., 26, 163–186, https://doi.org/10.5194/nhess-26-163-2026, 2026

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1715', Anonymous Referee #1, 16 Jun 2025
This is an interesting and highly relevant topic that can contribute significantly to a deeper understanding of the factors influencing both direct and indirect damages to businesses caused by flood events. The research methods employed are notably technical and innovative, offering fresh perspectives and valuable insights into the complexity of flood-related impacts on commercial sectors. However, despite the strengths of the approach, certain points require further attention and refinement. These include the justification of the chosen methodologies, the interpretation of the survey results, a clearer interpretation of the results overall, and a more comprehensive discussion of the limitations and potential implications of the findings. Addressing these aspects would enhance the overall robustness and applicability of the study.
Review comments:
Abstract: ‘to date no study has examined the factors influencing company damages during such an extreme event’. Is this statement correct? In the introduction you mention multiple papers that investigated the factors influencing company damages, such as Endendijk et al. (2024) and Kreibich et al. (2010). Please clarify or revise this statement.

Method
Survey data: it would be good to define the variables more clearly, for example in an appendix. It is not clear how business interruption is defined. Does business interruption mean that the business is not operational at all, or that business activity is reduced, and if so, by how much? This should be better defined.

Variable selection: please introduce this section. It dives into the three machine learning techniques without explaining why these three techniques are used.

‘Minimizing J(β)’ should read Obj(β), or it should be made clearer that J stands for the objective function, since Equation 1 defines it as Obj(β) and not J(β).

Variable importance: Please better explain/introduce why Bayesian Networks are used in this case.

In general, the method section needs more structure: it should be better explained why each algorithm/method is used. A clear motivation as to why the three specific techniques are used is needed.

Results and discussion
Overview of affected companies: It is implied that sales figures would be a better metric of company size although number of employees is more often used to classify whether the company is an SME or a large company. Therefore, this sentence is unnecessary in my opinion.

‘These disruptions can result in partial or complete business interruptions, triggering consequences ranging from loss of sales to bankruptcy’. This sentence is unclear; loss of sales is itself a form of business interruption.

It would also be interesting to show the differences in vulnerability and exposure levels between sectors instead of only between company sizes. This should be added or otherwise be explained why it is left out.

‘Bankruptcy risks remain generally low across all company sizes’. How is bankruptcy risk defined? Isn’t this a very biased variable given that bankrupt companies are probably not surveyed? Please clarify or leave this out.

‘They highlight the need for tailored risk management (...)’. Please clarify to what it should be tailored, to company size or also to company sector?

‘Tend to recover more quickly, likely benefiting from greater resilience’. This sentence sounds tautological -> recovering more quickly is part of the definition of resilience.

Figure 3: Please explain why the outlier levels differ between the business sizes. It seems weird to leave out observations for one class and leave them in for another. This does not look correct. Also, why are there no outliers removed for business restriction duration?

A sample of n = 3 for large companies is too small for any inference. Please make this clearer. ‘Should be interpreted with caution’ does not cover it fully in my opinion.

‘However, substantial variance within each category highlights the influence of extreme cases’. Maybe it is better to infer about the median values instead of the averages then. Please do this or clarify why not.

18 out of 19 variables had less than 7 % missing data, which was imputed. How much missing data did the remaining variable have, and was it imputed too? Please be clearer here.

Figure 4 and Figure 5: these abbreviations are unclear; write them out or find another way of making them more informative. A figure should be understandable on its own.

‘This finding underscores (…) even during unprecedented events like the 2021 flood.’ The analysis was carried out for the unprecedented 2021 flood so the word ‘even’ feels misplaced.

Figure 6: same comment as for Figure 4 and Figure 5. In addition, the resolution of this figure should be higher.

The fact that the observed damage and business interruption/restriction durations are scaled from 0 to 1 makes interpretation difficult. Saying that the 75th percentile decreases from 0.68 to 0.61, for example, is hard to interpret. It would be better to make the results more tangible; this way the results will also appeal more to policymakers, and the conclusion becomes easier to draw.

“In addition, for smaller premises (75–500 m²) the uncertainty is very less”: remove “very” or replace it with “much”.

Please also add a discussion that elaborates on shortcomings such as the low sample size for some company sizes/sectors, outliers, and potential selection bias, as well as directions for future research.
Conclusion:
The conclusion should be more extensive; it seems too short and concise for an academic paper.

There should be more links with the results section.
Citation: https://doi.org/10.5194/egusphere-2025-1715-RC1
- AC1: 'Reply on RC1', Ravi Kumar Guntu, 15 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1715/egusphere-2025-1715-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1715-AC1
RC2:
'Comment on egusphere-2025-1715', Anonymous Referee #2, 19 Jun 2025

This manuscript quantifies drivers of damages to companies by rare flood events via 3 data-driven techniques, which ultimately lead to a Bayesian Network. This study could have potential, but its possible novelty is currently hidden behind a rather complicated and opaque chain of calculations. In particular, the justification for using the 3 data-driven models is unclear. Why not fewer? Why not more? Why these? The choice could easily be arbitrary. And what does the Bayesian Network add to the variable importance analysis via those 3 models? I raise more questions below. I believe these should be addressed before the paper can be reconsidered for publication.

Title: I suggest a different word than “deciphering” because that’s not what is being done in this study.

L44-52: The message needs to be streamlined here with regard to rare/high-impact events.

L107, 109, 199 and elsewhere: Consider something like “rare” in place of “unprecedented”, because there now is a precedent.

L141, L214: The analyses for each damage type could have been combined, as they are also internally related, via a multivariate regression. Why not employ this more elegant solution, which makes optimal use of all information (by not treating the responses as independent)?

L143: Across what scale were the missing data imputed, i.e. how far were they apart on average?

L155: J(beta) is not in the equation.

L157: What does use of the MAE as objective function imply about the nature of the residuals given a response which is between 0 and 1 or counts between 0 and 540?
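
As background to this question, the sketch below illustrates one well-known property of the MAE: for a constant prediction, the value that minimizes the mean absolute error is the sample median, whereas the mean squared error is minimized by the sample mean. (Minimizing the MAE corresponds to maximum likelihood under Laplace-distributed residuals.) The durations are hypothetical values capped at 540 days and chosen only for illustration; they are not survey data, and the function names are assumptions, not taken from the manuscript.

```python
from statistics import mean, median

# Hypothetical interruption durations in days, capped at 540 (not survey data).
durations = [0, 5, 10, 30, 60, 540, 540]

def mae(values, c):
    """Mean absolute error of the constant prediction c."""
    return sum(abs(v - c) for v in values) / len(values)

def mse(values, c):
    """Mean squared error of the constant prediction c."""
    return sum((v - c) ** 2 for v in values) / len(values)

# Grid search over whole days between the bounds 0 and 540.
candidates = range(0, 541)
best_mae = min(candidates, key=lambda c: mae(durations, c))
best_mse = min(candidates, key=lambda c: mse(durations, c))

print("MAE minimizer:", best_mae, "| sample median:", median(durations))
print("MSE minimizer:", best_mse, "| sample mean:", round(mean(durations), 1))
```

With these skewed values the two minimizers differ widely (the median is 30 days, the mean about 169 days), which is one reason the choice of objective matters for a bounded, right-skewed response.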

L159f: It’s not entirely true that the model cannot handle nonlinearities – it can do so via transformations or in Generalised Linear Model form.

L201ff: What are the implications of combining the variable importance across the 3 models?

Eq9, L219, Appendix: It’s conditional probabilities, not fractions in Bayes Rule! I.e. X_i|E and E|X_i.

L222f: Why not leave it discrete rather than introducing another layer of assumptions?

L228f: How do the five models relate to the Bayesian Network?

L230-242: This part is redundant – see above. The function of the 3 models, besides factor selection, is unclear. And why 3 models and not more or fewer?

Results & discussion: Too much time is spent describing univariate results. And the bivariate correlations kind of defeat the purpose of multivariate analysis.

L394f: Purpose of sentence unclear.

L373: Whose expert knowledge?

Fig6: What is its function for the manuscript?

Fig7: The directions matter here, no? And some of them are not intuitive!

Conclusion: Too short and doesn’t add sufficient novelty.

Citation: https://doi.org/10.5194/egusphere-2025-1715-RC2
- AC2: 'Reply on RC2', Ravi Kumar Guntu, 15 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1715/egusphere-2025-1715-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1715-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (23 Oct 2025) by Robert Sakic Trogrlic

AR by Ravi Kumar Guntu on behalf of the Authors (25 Oct 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (27 Oct 2025) by Robert Sakic Trogrlic

RR by Anonymous Referee #2 (13 Nov 2025)

Suggestions for revision or reasons for rejection

In their revision, the authors have added detail and justification for their methods. It is now easier to see through the complex chain of methods.

The following points remain for me (I’m referring to line numbers in the tracked changes version of the manuscript!):

When discussing the objective functions of the various methods, how is the bounded nature of the response variables (0-1 and 0-540, respectively) captured? The duration variable in particular is really a right-censored variable. How is this taken into account? And if not, how do these characteristics of the responses affect the results from the methods chosen?

The value of the BN is clearer now. I appreciate that the structures inferred (Figure 7) represent conditional dependencies and not (necessarily) causal dependencies (L508f). However, the interpretation that follows (L511-534) is rather causal. I advise the authors to check their interpretation once more so that they don’t slip into causal language.

I don’t think that what is argued on L520f can be inferred from those BN structures.

It would help further if the authors would discuss the meaning of the DAGs (Figure 7) in terms of conditional independence and what we learn about the drivers of the damages that way.

On L528, that water depth is a primary factor seems to be in conflict with the variable importance analysis (e.g. Figure 6).

In the discussion of Figure 8 it should be noted that the predictions for the various scenarios overlap considerably, so the effects of some of the factors should be toned down.

On L270, please explain why it was necessary to discretize the variables in the BN. I assume because only a limited set of continuous pdfs is implemented in the software chosen.

On L275, how was prior knowledge incorporated in this case?

In equation 9, in place of the likelihood the posterior is written again. Please correct.

The notation of conditional probabilities throughout should use “|” instead of “/”.
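
For reference, the two comments above concern the form of Bayes' rule. The manuscript's Equation 9 is not reproduced on this page, so the following is only the standard textbook statement, written with the symbols X_i and E that the referee used in the first-round report (taken here, as an assumption, to denote a network variable and the observed evidence):

```latex
P(X_i \mid E) = \frac{P(E \mid X_i)\, P(X_i)}{P(E)}
```

Here P(X_i | E) is the posterior, P(E | X_i) the likelihood, P(X_i) the prior, and P(E) the marginal probability of the evidence. The vertical bar denotes conditioning, not division.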

When turning the discrete output from the BN into a continuous pdf the authors state that the representation is then more precise (L288). However, this is only because the imprecision introduced through fitting the pdf is neglected. This caveat should be mentioned in the text.

L440-442: Statistical tests could be performed to back up the inference from small, possibly unrepresentative samples to a larger population (with appropriate post-stratification or else to account for unrepresentativeness).

The comparison to the 2002–2016 floods (L595) was not analysed in this paper, if I’m not mistaken.

Editorial comments:

L49: End of sentence from “despite …” is redundant given what was said before.

L205: Optimal values of alpha, gamma … and beta.

Table A1 needs a thorough check. There are some typos, wrong numbering and other small mistakes.

RR by Anonymous Referee #1 (14 Nov 2025)

ED: Publish subject to minor revisions (review by editor) (09 Dec 2025) by Robert Sakic Trogrlic

AR by Ravi Kumar Guntu on behalf of the Authors (17 Dec 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (22 Dec 2025) by Robert Sakic Trogrlic

AR by Ravi Kumar Guntu on behalf of the Authors (22 Dec 2025)

Supplement

https://doi.org/10.5194/egusphere-2025-1715-supplement

Viewed

Total article views: 976 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
801	143	32	976	104	30	48

Views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	54	10	3	67
May 2025	59	16	4	79
Jun 2025	84	25	6	115
Jul 2025	37	8	1	46
Aug 2025	82	14	3	99
Sep 2025	413	20	5	438
Oct 2025	27	10	1	38
Nov 2025	18	15	3	36
Dec 2025	19	16	6	41
Jan 2026	8	9	0	17

Viewed (geographical distribution)

Total article views: 945 (including HTML, PDF, and XML), of which 945 have a defined geography and 0 are of unknown origin.

Latest update: 16 Jan 2026

Short summary

The 2021 flood in Germany caused severe damage to companies, with over half reporting losses above €100,000. Using probabilistic models, we identify key factors driving direct damage and business interruption. Water depth, flow velocity and company exposure were key factors, but preparedness played a crucial role. Companies that took good precaution recovered faster. Our findings stress the value of early warnings and risk communication to reduce damage from unprecedented flood events.

