Loading [MathJax]/jax/output/HTML-CSS/fonts/TeX/fontdata.js
Preprints
https://doi.org/10.5194/egusphere-2023-2335
https://doi.org/10.5194/egusphere-2023-2335
18 Oct 2023
 | 18 Oct 2023

Regional index flood estimation at multiple durations with generalized additive models

Danielle M. Barna, Kolbjørn Engeland, Thomas Kneib, Thordis L. Thorarinsdottir, and Chong-Yu Xu

Abstract. Estimation of flood quantiles at ungauged basins is often achieved through regression based methods. In situations where flood retention is important, e.g. floodplain management and reservoir design, flood quantile estimates are often needed at multiple durations. This poses a problem for regression-based models as the form of the functional relationship between catchment descriptors and the response may not be constant across different durations. A particular type of regression model that is well-suited to this situation is a generalized additive model (GAM), which allows for flexible, semi-parametric modeling and visualization of the relationship between predictors and the response. However, in practice, selecting predictors for such a flexible model can be challenging, particularly given the characteristics of available catchment descriptor datasets. We employ a machine learning-based variable pre-selection tool which, when combined with domain knowledge, enhances the practicality of constructing GAMs. In this study, we develop a GAM for index (median) flood estimation with the primary objective of investigating duration-specific differences in how catchment descriptors influence the median flood. As the accuracy of this explainable approach is dependent on the fitted GAM being adequate, the secondary objective of our study is prediction of the median flood at ungauged locations and multiple durations, where predictive performance and reliability at ungauged locations are used as proxies for adequacy of the GAM. Predictive performance of the GAM is compared to two benchmark models: the existing log-linear model for median flood estimation in Norway and a fully data-driven machine learning model (an extreme gradient boosting tree ensemble, XGBoost). We find that the predictive accuracy and reliability of the GAM matched or exceeded that of the benchmark models at both durations studied. Within the predictor set selected for this study, we observe duration-specific differences in the relationship between the median flood and the two catchment descriptors effective lake percentage and catchment shape. Ignoring these differences results in a statistically significant decline in predictive performance. This suggests that models developed and estimated for prediction of the index flood at one duration may have reduced performance when applied directly to situations outside of that specific duration. 

Competing interests: Kolbjørn Engeland and Chong-Yu Xu report financial support was provided by the Research Council of Norway.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Share
Download
Short summary
Estimating flood quantiles at data-scarce sites often involves single-duration regression...
Share