Regional index flood estimation at multiple durations with generalized additive models

Barna, Danielle M.; Engeland, Kolbjørn; Kneib, Thomas; Thorarinsdottir, Thordis L.; Xu, Chong-Yu

doi:https://doi.org/10.5194/egusphere-2023-2335

Preprints

18 Oct 2023

Abstract. Estimation of flood quantiles at ungauged basins is often achieved through regression-based methods. In situations where flood retention is important, e.g. floodplain management and reservoir design, flood quantile estimates are often needed at multiple durations. This poses a problem for regression-based models as the form of the functional relationship between catchment descriptors and the response may not be constant across different durations. A particular type of regression model that is well-suited to this situation is a generalized additive model (GAM), which allows for flexible, semi-parametric modeling and visualization of the relationship between predictors and the response. However, in practice, selecting predictors for such a flexible model can be challenging, particularly given the characteristics of available catchment descriptor datasets. We employ a machine learning-based variable pre-selection tool which, when combined with domain knowledge, enhances the practicality of constructing GAMs. In this study, we develop a GAM for index (median) flood estimation with the primary objective of investigating duration-specific differences in how catchment descriptors influence the median flood. As the accuracy of this explainable approach is dependent on the fitted GAM being adequate, the secondary objective of our study is prediction of the median flood at ungauged locations and multiple durations, where predictive performance and reliability at ungauged locations are used as proxies for adequacy of the GAM. Predictive performance of the GAM is compared to two benchmark models: the existing log-linear model for median flood estimation in Norway and a fully data-driven machine learning model (an extreme gradient boosting tree ensemble, XGBoost). We find that the predictive accuracy and reliability of the GAM matched or exceeded that of the benchmark models at both durations studied.
Within the predictor set selected for this study, we observe duration-specific differences in the relationship between the median flood and the two catchment descriptors effective lake percentage and catchment shape. Ignoring these differences results in a statistically significant decline in predictive performance. This suggests that models developed and estimated for prediction of the index flood at one duration may have reduced performance when applied directly to situations outside of that specific duration.

Received: 11 Oct 2023 – Discussion started: 18 Oct 2023

Competing interests: Kolbjørn Engeland and Chong-Yu Xu report financial support was provided by the Research Council of Norway.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2023-2335', Anonymous Referee #1, 15 Nov 2023

The manuscript "Regional index flood estimation at multiple durations with generalized additive models" by Barna et al. is a comprehensive study on prediction of the median index flood at 234 stations in Norway. It aims to compare a GAM with two benchmark models (a log-linear model and XGBoost) at two different flood durations (1 h and 24 h). Additionally, a variable selection method is included and the models are tested in a cross-validation approach. The manuscript is generally well written and the results are worth publishing in HESS. However, I have two concerns about the manuscript, which I think can be addressed in a revised version.
First, one of the objectives of the study is "prediction of the median flood at ungauged locations" (lines 125-126). I am not quite sure if this is addressed adequately. One of the catchment descriptors is the mean annual runoff of the catchment from the SeNorge 2.0 dataset, which is based on observational data. In my opinion, prediction at ungauged locations means that there is no information about the runoff at these stations, so no information about the mean annual runoff can be included in a prediction model either. I understand that the information may be necessary for the comparison with the RFA_2018 model, but I think it would also be beneficial to show that the GAM performs as well without the mean annual runoff (and leave the mean annual runoff as a predictor for the RFA_2018 model for simplicity). My second point concerning prediction at ungauged locations is the variable selection. If I understood it correctly, the variable selection is performed on the full dataset (with a cross-validation scheme), and the selected variables are then used as input for the validation study (again with a cross-validation scheme). If this is correct, the variables are selected on the full dataset, not on a subset, so the prediction error is somewhat biased, as the variable selection already included information about the full dataset. If my understanding of the validation scheme is wrong, I would suggest making this clearer in the methods section.
Second, the manuscript is a bit too long. Some parts of the methods are in the Appendix, which is fine, but this makes the manuscript hard to read at some points. Two examples: (i) the description in the Introduction between lines 29-46 may be shortened, and (ii) the very detailed description of the response variable (Sect. 2/2.1) could be written more concisely.
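
The variable-selection concern above can be illustrated with a minimal sketch. It assumes synthetic pure-noise data, a simple correlation screen and a one-predictor linear fit; these are stand-ins chosen for brevity, not the XGBoost-based pre-selection or the GAM used in the manuscript, and all function names are hypothetical. Because the predictors carry no information about the response, any apparent skill is an artefact of the validation scheme.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def select_best(X, y, rows):
    """Column index with the largest absolute correlation with y on `rows`."""
    ys = [y[i] for i in rows]
    return max(range(len(X[0])),
               key=lambda j: abs(pearson([X[i][j] for i in rows], ys)))


def fit_predict(X, y, train, test, j):
    """Least-squares line on column j, fitted on `train`, evaluated on `test`."""
    xs = [X[i][j] for i in train]
    ys = [y[i] for i in train]
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return [my + slope * (X[i][j] - mx) for i in test]


def cv_mse(X, y, k, nested):
    """k-fold CV error; selection is redone inside each fold only if `nested`."""
    all_rows = list(range(len(y)))
    j_full = None if nested else select_best(X, y, all_rows)  # sees held-out rows
    sq_errors = []
    for f in range(k):
        test = all_rows[f::k]
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        j = select_best(X, y, train) if nested else j_full
        preds = fit_predict(X, y, train, test, j)
        sq_errors.extend((p - y[i]) ** 2 for p, i in zip(preds, test))
    return fmean(sq_errors)


def simulate(reps=20, n=40, p=200, k=5, seed=0):
    """Average CV error over `reps` noise datasets for both schemes."""
    rng = random.Random(seed)
    leaky, nested = [], []
    for _ in range(reps):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of X by design
        leaky.append(cv_mse(X, y, k, nested=False))
        nested.append(cv_mse(X, y, k, nested=True))
    return fmean(leaky), fmean(nested)


if __name__ == "__main__":
    leaky_mse, nested_mse = simulate()
    print(f"selection on full data: {leaky_mse:.3f}")
    print(f"selection inside folds: {nested_mse:.3f}")
```

In this toy setting the error obtained with selection on the full dataset comes out lower than the error obtained when selection is repeated inside each fold, even though no predictor has any real skill. How large the effect would be for the manuscript's data and models cannot be inferred from this sketch.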

Additional minor comments:
The legend title of Figure 1c seems wrong.
Add panels in Figure 2.
In Table 1, the catchment descriptor $R_L$ is duplicated.
Line 351: what are the actual hyperparameters that were used in the 10 final models? If it is easily possible, this information could be added.
Line 384: XGBoost is assessed only on the MAE; optimal predictors for the other four error metrics are not accessible for XGBoost when the data are assumed log normal. Why, then, is XGBoost used for the variable selection procedure, with the MAPE as error metric, if this will produce unreliable results? Additionally, was it considered to alter the loss function in XGBoost for better comparison in terms of the chosen error metrics? I think this should be clarified in the main manuscript.
I think the formulas in Figure 5(a) may be wrong ($\times 100$), and as they are both partly defined in Table 3, I would suggest using the same variables.
Figure 6: the legend is not consistent with the actual plot.
Lines 452-469: I think the description of partial response curves can be shortened.
Line 487: maybe the results on regional models can be skipped in the Appendix.
Line 526: if this sentence is referring to the paragraph above, maybe change the order of the two sentences.
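
The comment on line 384 turns on the fact that, for a skewed predictive distribution, different error metrics are minimized by different point predictions. The sketch below illustrates this for a log-normal distribution with parameters `mu` and `sigma` on the log scale, using the standard closed forms: the median `exp(mu)` for MAE, the mean `exp(mu + sigma**2 / 2)` for MSE, and `exp(mu - sigma**2)` for MAPE. The three metrics and the parameter values are illustrative assumptions and are not claimed to be the five metrics or the fitted values of the manuscript; the function names are hypothetical.

```python
import math
import random

# Per-observation losses; averaged over a sample they give MAE, MSE and MAPE.
LOSSES = {
    "MAE": lambda y, c: abs(y - c),
    "MSE": lambda y, c: (y - c) ** 2,
    "MAPE": lambda y, c: abs(y - c) / y,
}


def point_predictors(mu, sigma):
    """Candidate point predictions for a log-normal(mu, sigma) response."""
    return {
        "median": math.exp(mu),                   # minimizes expected absolute error
        "mean": math.exp(mu + sigma ** 2 / 2),    # minimizes expected squared error
        "mape_opt": math.exp(mu - sigma ** 2),    # minimizes expected absolute percentage error
    }


def best_predictor_per_loss(mu=0.0, sigma=1.0, n=50_000, seed=1):
    """For each loss, return the candidate with the lowest average loss on a sample."""
    rng = random.Random(seed)
    ys = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    candidates = point_predictors(mu, sigma)
    best = {}
    for loss_name, loss in LOSSES.items():
        avg = {name: sum(loss(y, c) for y in ys) / n
               for name, c in candidates.items()}
        best[loss_name] = min(avg, key=avg.get)
    return best


if __name__ == "__main__":
    for loss_name, winner in best_predictor_per_loss().items():
        print(f"{loss_name}: lowest average loss with the '{winner}' predictor")
```

A model that returns a single point prediction trained under one loss therefore targets one of these functionals only, which is one way to read the restriction of the XGBoost comparison to the MAE.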

Citation: https://doi.org/10.5194/egusphere-2023-2335-RC1
- AC2: 'Reply on RC1', Danielle Barna, 28 Feb 2024
  
  We would like to thank anonymous reviewer #1 for the constructive review.
  The response is included as an attachment.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2335-AC2
RC2:
'Comment on egusphere-2023-2335', Anonymous Referee #2, 13 Jan 2024
Report on the manuscript entitled:
Regional index flood estimation at multiple durations with generalized additive models
The topic of the manuscript is appealing and at first sight it appears interesting. However, there are a high number of major issues on the basis of which I cannot recommend publication of this manuscript (for example, validation of the employed approach is missing, recent and related literature is ignored, and the terminology is confusing). In the following I provide major comments only:
The title is confusing and misleading in several respects: (i) "multiple durations" could be understood as durations treated simultaneously or as duration treated as a variable; (ii) the index flood model is based on quantiles, while the paper treats only the median (a fixed quantile order).

Why not use machine learning directly and fully? The compatibility between a statistical model (GAM) and a variable selection method based on ML should be discussed, especially since, later in the paper, there is a formal method to select variables for GAMs (implemented in the mgcv package). In the same vein (line 43), I’m wondering if the authors are using a modified version of IIS. If so, this method should be checked and validated beforehand for this choice/context. The question is about the compatibility of this change.

Around lines 40-45: this text is ambiguous and not well justified or motivated. It is based on a single old paper (see the next comments for recent papers). This part is crucial and motivates the study. Hence, the paper's motivation and foundation are questionable.

Indeed, it is more informative to include the duration in the modeling. However, to deal with the duration, it is now appropriate to consider a multivariate framework involving the duration as a variable, simultaneously with other variables such as the peak and/or volume. The multivariate regional framework (index flood model) is already developed (e.g. Requena et al. 2016, J. of Hydrology; Azam et al. 2018, Water).

Why these, and only these, values (1 h and 24 h)?

Around line 60: nonlinearity can be dealt with not only through transformation but also directly, using nonlinear approaches (see e.g. Ouali et al. 2017, J. Advances in Modeling Earth Systems; Cannon 2018, Stochastic environmental research and risk).

Line 108: I am not sure about this statement, especially as no references are given. As far as I know, variable selection is not the strength of ML. I don’t know what is reported in Guisan et al. 2002, but it may not be up to date (given the fast development of ML).

Line 115: I'm surprised to see that such an important topic is treated only in the hydrological framework. It is questionable to rely heavily on this.

Last line of page 4: this assumption is either strong or in contradiction with the problem treated in the paper.

It is important to provide an equation for “median annual maximum flood” to be explicit and avoid confusion.

Some parts of the methodology should be in the results section (section 4.2 and from line 250).

Equation 6: something is missing or wrong. The right-hand side does not depend on i (so the summation is over what?).

Using the term "permutation test" could be misleading, since this is a generic term for a way of obtaining a p-value.
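
As an illustration of the kind of explicit definition requested above for the "median annual maximum flood", one possible formulation is sketched below. It assumes that a flood of duration $d$ is obtained by averaging streamflow over a moving window of width $d$ before taking annual maxima; this assumption and all symbols are hypothetical and are not taken from the manuscript.

```latex
% Assumed notation (not from the manuscript):
%   q_i(t)          streamflow at station i and time t
%   d               duration, i.e. width of the averaging window
%   \mathcal{T}_y   set of time points in year y, with y = 1, ..., n_i
\begin{align}
  \bar{q}_{i,d}(t) &= \frac{1}{d} \int_{t-d}^{t} q_i(s)\, \mathrm{d}s, \\
  M_{i,d,y} &= \max_{t \in \mathcal{T}_y} \bar{q}_{i,d}(t), \\
  \eta_{i,d} &= \operatorname{median}\bigl\{ M_{i,d,1}, \dots, M_{i,d,n_i} \bigr\}.
\end{align}
```

Under this reading, $\eta_{i,d}$ is the index flood at station $i$ and duration $d$, so the two durations discussed in the review would correspond to $d = 1$ h and $d = 24$ h.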
Citation: https://doi.org/10.5194/egusphere-2023-2335-RC2
- AC1: 'Reply on RC2', Danielle Barna, 28 Feb 2024
  
  We would like to thank anonymous reviewer #2 for the constructive review.
  The response is included as an attachment.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2335-AC1

Viewed

Total article views: 1,680 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,428	206	46	1,680	41	66

Views and downloads (calculated since 18 Oct 2023)

Month	HTML	PDF	XML	Total
Oct 2023	241	29	6	276
Nov 2023	16	4	0	20
Dec 2023	23	13	4	40
Jan 2024	40	12	2	54
Feb 2024	123	17	6	146
Mar 2024	302	7	1	310
Apr 2024	141	8	3	152
May 2024	38	12	3	53
Jun 2024	245	6	4	255
Jul 2024	33	4	1	38
Aug 2024	17	2	1	20
Sep 2024	49	4	0	53
Oct 2024	16	4	2	22
Nov 2024	11	4	0	15
Dec 2024	4	3	0	7
Jan 2025	11	6	0	17
Feb 2025	9	3	2	14
Mar 2025	13	4	1	18
Apr 2025	13	6	2	21
May 2025	16	6	0	22
Jun 2025	18	15	1	34
Jul 2025	19	9	1	29
Aug 2025	8	12	4	24
Sep 2025	18	7	1	26
Oct 2025	4	9	1	14

Viewed (geographical distribution)

Total article views: 1,689 (including HTML, PDF, and XML). Thereof 1,689 with geography defined and 0 with unknown origin.

Latest update: 23 Oct 2025

Short summary

Estimating flood quantiles at data-scarce sites often involves single-duration regression models. However, floodplain management and reservoir design, for example, need estimates at several durations, posing challenges. Our flexible generalized additive model (GAM) enhances accuracy and explanation, revealing that single-duration models may underperform elsewhere, emphasizing the need for adaptable approaches.


Cited

1 citation as recorded by Crossref.