the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Data-Driven Estimation of the hydrologic response via Generalized Additive Models
Abstract. Estimating the hydrologic response of watersheds to precipitation events is key to understanding streamflow generation processes. Impulse Response Functions, commonly referred to as the Instantaneous Unit Hydrograph (IUH) in hydrology, have been used for over 50 years to predict stormflow and compare catchment behaviors. These response functions are often strongly affected by modelers' choices of parameters and data preprocessing procedures, limiting their diagnostic utility and generalizability across different sites and time periods. With the increasing availability of compiled rainfall-runoff series, there is now a growing opportunity to develop new approaches that fully exploit such datasets. This paper introduces GAMCR, a novel data-driven approach for estimating impulse response functions using Generalized Additive Models. GAMCR is designed to capture the complex, nonlinear relationships between precipitation and runoff, offering a flexible and interpretable framework for the systematic analysis of hydrological responses. The model is succesfully validated on synthetic data, where the true response functions are known. Additionally, we demonstrate the model's potential using real-world data from six Swiss basins with distinct hydrological behaviors. Results are fully consistent with those obtained from ERRA, another recent data-driven approach with a very different architecture, as well as with the climate and physical properties of the sites. Overall, GAMCR is a modern and effective tool for leveraging rainfall-runoff datasets to investigate the controls on hydrologic responses worldwide.
- Preprint
(25146 KB) - Metadata XML
-
Supplement
(1081 KB) - BibTeX
- EndNote
Status: final response (author comments only)
-
CC1: 'Comment on egusphere-2025-1591', Cameron McIntosh, 18 May 2025
-
CC2: 'Reply on CC1', Quentin Duchemin, 10 Jun 2025
Thank you for your interest in our work and for your thoughtful feedback.
Let us recall that GAMCR is based on two key elements:
1) the basis functions used to decompose the transfer functions (denoted as $(b_l)_l$ in our paper)
2) the time-dependent coefficients in this basis, which are modeled using GAMs as a function of a time-varying feature vector $z_t$.1) The choice of the basis functions $(b_l)_l$ was motivated by prior knowledge about the typical shape of hydrological transfer functions. These functions often exhibit sharp peaks at short lags and smoother behavior at longer lags. To capture this, we use splines with knots placed on an irregular grid, denser at short lags to allow more flexibility in capturing localized spikes.
We agree that exploring alternative basis functions such as wavelets, or even learning the basis functions directly, could be an interesting direction for future research. However, the estimation of transfer functions from input-output fluxes is an ill-posed inverse problem. Introducing the appropriate inductive bias is essential to ensure not just good predictive performance, but also the recovery of physically meaningful latent structures. While spline bases naturally encode assumptions about local smoothness and support, defining such inductive biases with wavelet bases may be more challenging and could risk overfitting.
2) In GAMCR, the time-dependent coefficients of the transfer function decomposition are modeled using additive GAMs with univariate spline terms for each feature. These splines are typically estimated using a smoothing spline approach, which includes a penalty on the second derivative of each spline to control its smoothness. A hyperparameter governs the trade-off between fidelity to the data and the smoothness of the fitted function.
In our experiments, we found that the results were robust to the choice of this smoothing hyperparameter, and thus we used a fixed value across all experiments.
Citation: https://doi.org/10.5194/egusphere-2025-1591-CC2
-
CC2: 'Reply on CC1', Quentin Duchemin, 10 Jun 2025
-
RC1: 'Comment on egusphere-2025-1591', Anonymous Referee #1, 30 Jun 2025
General Comments:
The authors introduce GAMCR, a novel data-driven approach that uses Generalised Additive Models (GAM) to estimate time-dependent Catchment Responses (CR) or impulse response functions, which capture the complex, nonlinear relationships between rainfall and runoff. It performs well on synthetic data, where the true response functions are known. It produces reliable results across six Swiss catchments with distinct hydrological behaviours, aligning with findings from ensemble rainfall-runoff analysis (ERRA), another recent data-driven approach with a different architecture.
Overall, the authors make a strong case that with the rise of large rainfall-runoff datasets, the GAMCR is a novel approach to facilitate systematic comparisons of hydrological responses across sites globally where rainfall-runoff time series are available. While I think this article is excellent, I have a few minor comments that could enhance it.
Specific Comments:
My suggestions for the technical sections are as follows:
- In Section 1, you describe GAMCR as a novel data-driven approach with a different architecture from the ERRA method yet note that the results are fully consistent between the two. However, it would be helpful to clarify the specific advantages of using GAMCR over ERRA, particularly in estimating hydrological responses across diverse watersheds and enabling systematic cross-site comparisons where rainfall-runoff time series are available. For example, is the benefit primarily computational efficiency, or does GAMCR offer other advantages such as flexibility, ease of implementation, or improved performance for certain types of catchments?
- In Section 2.3 – Model training:
- While it is clear that the machine learning model was trained using the synthetic streamflow dataset from the Chiasso gauging station on the Breggia River (Section 3.1), it is not clear what specific data period or proportion of data (e.g., 80% training, 20% testing) was used to train the GAMs. Clarifying the data splitting strategy for both the synthetic and real datasets would help readers better understand the model training process.
- It would be helpful to provide the final hyperparameter λ1 and λ2 values used in the model training process. The default values of λ1=10−3 and λ2=1 are mentioned in Section 3.3, but what are the optimised values? Have you thought about how reducing the training data might impact the model's performance? This could give you some insight into the model's robustness with less data.
- You note that the model was trained on synthetic data (Section 3.1) for the three cases, and model testing is described in Section 3. However, the model validation is only introduced later in Section 4. For clarity and better flow, it would be helpful to briefly outline the validation approach in Section 3. Specifically, this could include a description of how the model is validated by computing the hydrologic response in the form of non-linear runoff functions (NRFs) across six quantiles of precipitation intensity and comparing these against both the ERRA-derived estimates and the benchmark outputs generated directly from the model.
- In Section 2.4, you could provide the link to the GAMCR model software and online tutorial mentioned in the text would be helpful. Only a link to the datasets used in the study is made available in the paper.
- In Section 3.1 - Synthetic data:
- You explain that you used a lumped conceptual model (outlined in Supplementary Section 2) with rainfall and air temperature inputs from the Lugano station to generate a 40-year synthetic hourly time series, calibrated against measured streamflow data at Chiasso, Ponte di Polenta, on the Breggia River. You note that the simulation ‘closely mirrors actual measurements’ and provide a four-month example from 2010 to support this. However, to strengthen this claim, it would be helpful to include quantitative model performance statistics (e.g., R2, NSE, PBias, RMSE) for Case A. Ideally, this should cover the full 40-year period and also highlight in a figure the performance for a wet year (e.g., 2014) and a dry year (e.g., 2003) at Chiasso for the Breggia as identified on the Swiss FOEN Hydrological data and forecasts website (Breggia - Chiasso, Ponte di Polenta).
- You explain that the synthetic streamflow data were generated using a lumped conceptual model with precipitation inputs from the Lugano station. In contrast, the real-world data for the six Swiss catchments use precipitation inputs from the 'CombiPrecip' product (MeteoSwiss), which combines radar data and land-based rain gauge observations. Since the GAMCR model is ultimately demonstrated on these six diverse catchments using CombiPrecip data, have you considered also generating a synthetic streamflow time series using CombiPrecip (2005 to 2019) as the precipitation input to train the model? This would allow for a more direct comparison and could potentially improve consistency in training and evaluating model performance.
- In Section 3.2 – Real-world data:
- It is clear why you selected the six Swiss watersheds for model testing, because they will have different hydrological responses without the presence of glaciers and have no catchment interventions, and have complete and reliable data records; however, it would help if you clarified why selecting a medium-sized catchment was a criterion.
- While Table 1 gives an overview of the gauging stations and catchment characteristics, you aim to demonstrate that the six watersheds have distinct hydrological flow regimes. To strengthen this, could you:
- Tabulate key hydrological statistics for each watershed, such as the annual mean, annual maximum, and annual minimum flow, using data available from the FOEN Hydrological Data and Forecasts website.
- Provide a Flow Duration Curve (FDC) for each watershed, based on observed streamflow data. The FDC offers a statistical summary of streamflow behaviour, independent of the sequence or timing of flows and would help to characterise the unique hydrological response of each catchment. This could supplement the in-depth analysis provided for the six Swiss catchments in the Supplementary Section 1.
- You mention compiling a 15-year record (2005–2019) of hourly precipitation-runoff data for six Swiss watersheds, using the CombiPrecip product from MeteoSwiss, which is generally considered reliable. However, CombiPrecip accuracy depends on the quality of its input data, namely, land-based rain gauges and radar estimates. Notably, the Swiss radar network was expanded from three to five stations between 2010 and 2016, and the number of rain gauges increased significantly from 71 (in 2005–2010) to over 200 by 2015, leading to substantial improvements in data quality. Since CombiPrecip also includes a quality index for assessing data reliability, have you considered excluding data before 2010 to enhance the robustness of your analysis?
- In Section 3.3 – Implementation details, you state that a precipitation intensity threshold Jth of 0.05 mm/h was applied, and the GAMCR model was trained only on events exceeding this threshold. However, in line 275, when discussing the weighted average RRDs and peak heights of the NRFs estimated by ERRA and GAMCR for the six catchments, the analysis appears to include only events with precipitation intensities above 0.5 mm/h. Should this threshold be consistent with the previously stated Jth = 0.05 mm/h?
- In Section 4 – Results:
- Be consistent when referring to the three cases, especially whether Case A is the reference response (line 234) or is it the base case (line 241), or in Figure 5 labelled ‘base’?
- Earlier in Section 3.1, you refer to the synthetic datasets as Case A (baseline), Case B (damped), and Case C (flashy). However, in Figure 5, the labels are inconsistent: the flashy system is shown as plot (a), the baseline as (b), and the damped as (c), ordered by decreasing NRF response (mm/h²). To avoid confusion and improve clarity, would you consider renaming the cases in Section 3.1 as Case 1, Case 2, and Case 3?
- You show that both the GAMCR and ERRA methods produce consistent estimates of peak height and runoff volume across different scenarios. However, both approaches struggle with accurately predicting peak lag. You note that GAMCR tends to produce ‘less variable lag values’ across different precipitation intensities, with the NRF peak ‘fixed’ at 2 hours for the base and flashy catchments (beyond 5 mm/h), highlighting a limitation of the method:
- Could you elaborate on why GAMCR struggles with peak lag prediction? Is this due to the use of a coarser temporal resolution where the flashy, base, and damped synthetic input time series have been aggregated to 2-, 3-, and 6-hour time steps, respectively?
- Additionally, what are the implications of this limitation for applying GAMCR in cross-site hydrological comparisons at a global scale, as you state in Section 5 that GAMCR should currently not be used to estimate the timing of the hydrologic response.
- In Section 4.2, Estimation of Real-world Hydrological Responses:
- You refer in Figure 7 to a calibration period from 2005 to 2017. I assume this refers to the training period. For clarity and consistency, it would be helpful to use consistent terminology throughout the paper, either "calibration" or "training", and to introduce this period earlier in the methods section, rather than for the first time in the results.
- Similarly, the term "out-of-sample predictions" appears here for the first time, along with the 2018–2019 period. If this refers to the testing or validation period, please make this explicit in the main text, rather than only in the figure caption. Also, define what is meant by "out-of-sample prediction" when it is first introduced, so readers unfamiliar with the term can clearly understand its role in the analysis.
- Are the out-of-sample predictions 1 year sufficient to test the GAMCR model performance?
- Referring to Figure 7, are there any issues with the quality of the observations at high flows at Chiasso for the Breggia (2349), i.e. the stage discharge relationship? Have you investigated the confidence in the stream flow records at this river gauge?
- The NSE for the out-of-sample prediction for the Chiasso dataset is 0.19, which indicates poor model performance. You mention that the GAMCR model was trained on synthetic data generated for the Chiasso catchment and that this synthetic data closely matches the real-world observations. However, in Figure S3, the model clearly overestimates streamflow compared to the measured data for the four months in 2019. Can you clarify why the model shows such a poor NSE and a weak fit to the observed streamflow time series, despite the apparent agreement between the synthetic and observed training data? Additionally, do you think the predictions for Chiasso would improve if you used the CombiPrecip product as the rainfall input, rather than Lugano rainfall data used here?
- In section 5, Discussion and Conclusions:
- You state that GAMCR should not currently be used to estimate the timing of the hydrologic response, noting that the timing of the NRF peak was generally inaccurate, with GAMCR systematically underestimating the peak lag. You suggest that this limitation could be addressed by using a different organisation of the basis functions that form the core of the response. For clarity, could you be more specific about what changes to the basis function structure you are proposing? Providing a brief explanation would help readers understand how this adjustment could improve peak timing estimation.
- In line 324, you say that ‘we verified that the modelled streamflow was generally realistic for both in-sample and out-of-sample data (Figure 7)’. Be consistent with the terminology you used, calibration period in Figure 7, not the term in-sample data. Similarly, you used out-of-sample prediction in Figure 7. Is this the testing period? More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.
- Good to see that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag.
- Technical Corrections:
- Other minor suggestions:
- The title could be simplified by changing "via" to "using" for more clarity.
- The use of the phrase ‘real-world’ data is an odd turn of phrase as you are not comparing to a virtual world. You could consider changing this to simply measured or observed datasets for clarity. For example, you use the terminology observed streamflow in Figure 7.
- More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.
- Other minor suggestions:
Overview comments:
Overall, the study is excellent and presents a novel data-driven approach to facilitate systematic comparisons of hydrological responses across sites where rainfall-runoff time series are available. The authors should be commended as the GAMCR approach produces results that are closely aligned with results from the ERRA approach. The authors demonstrate that the GAMCR is a robust tool to study runoff response behaviour in real-world catchments. However, it would be helpful to the reader if it were clearer about what the benefits of the GAMCR approach over the ERRA approach are for estimating the hydrological responses for diverse watersheds to justify the study earlier in the paper.
It is also encouraging that the authors acknowledge that this first application of GAMCR to synthetic and real-world data has helped them identify some current model limitations, and that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag. Minor revisions would strengthen the paper, but this use of a novel data-driven approach that uses GAM to estimate time-dependent catchment responses to capture the complex, non-linear relationship between rainfall and runoff is of interest to others in the field.
Citation: https://doi.org/10.5194/egusphere-2025-1591-RC1 -
RC2: 'Comment on egusphere-2025-1591', Anonymous Referee #2, 25 Aug 2025
I consider this study interesting and scientifically valid. However, it needs improvements before it can be published. Below are the corrections needed for the publication of this article, and I would like to be able to review this article again after the requested corrections have been made.
Major revision
-The main limitation of this study is that it considers the developed method to be of global application (in any watershed worldwide), but only calibrated and validated it in small and small/medium watersheds with climatic and physiographic characteristics existing in Switzerland. Therefore, I recommend two solutions to this problem: all mentions of the method's worldwide application should be replaced by mentions of its application restricted to the types and conditions of calibrated/validated watersheds. Or (better), the developed method should be calibrated and validated in basins with different climatic and physiographic characteristics worldwide. An example of a data source for large-sized tropical basins is the Ore-Hybam (but other sources should also be used).
Minor revision
-Line 7. Describe the abbreviation ‘GAMCR’ at its first mention in the abstract.
-Lines 17-18. Insert a source (citation) for the sentence: ‘The hydrologic response (or runoff response) is usually defined as the change in streamflow induced by a given input of precipitation.’
-Lines 99-101. Explain (rewrite) this sentence further. It suggests that it varies smoothly between basins with different climatic and physiographic characteristics.
-Line 102. The correct word is ‘third’, not ‘second’.
-Lines 163-164. Insert a source (citation) for the sentence: ‘In the case of the hydrologic response, the evaluation step is particularly important because the real-world impulse response functions cannot be measured directly...’.
-Line 198. I don't agree that basins measuring 34 to 195 km² are considered medium-sized. They are all small to small/medium in size. Please, correct this! Moreover, this is one of the many reasons to calibrate and validate the method in other basin types.
-Lines 203-206. Even when analyzing out of the snowy months, the basins' antecedent conditions are influenced by snow. This is another reason to calibrate and validate the method in other types of watersheds.
-Line 321. The basins are not climatically diverse as you mention. They are similar considering Switzerland's climate conditions in relation to the planet's climate diversity. Please correct this! Moreover, this is another reason to calibrate and validate the method in other parts (continents) of the world.
Citation: https://doi.org/10.5194/egusphere-2025-1591-RC2
Viewed
HTML | XML | Total | Supplement | BibTeX | EndNote | |
---|---|---|---|---|---|---|
1,150 | 92 | 21 | 1,263 | 33 | 32 | 41 |
- HTML: 1,150
- PDF: 92
- XML: 21
- Total: 1,263
- Supplement: 33
- BibTeX: 32
- EndNote: 41
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1
Excellent article!
I was wondering about the level of confidence in the smoothness assumption for the GAMCR (and GAMs in general) for the current application. Is it possible that there may be both smooth and spikey areas of the nonlinear function? Perhaps it may be beneficial to test the performance of spline vs. wavelet bases to see if there are any material differences?
Behera, J., Bishi, N., & Sahu, S. K. (2025). Adaptive Nonparametric Regression Using Hybrid Fourier-Wavelet Series: A Generalized Inference Approach for Multiscale Socioeconomic Dynamics. Asian Journal of Probability and Statistics, 27(4), 60–67. https://doi.org/10.9734/ajpas/2025/v27i4739
Fonseca, R., Morettin, P., & Pinheiro, A. (2024). Wavelet Feature Screening. Journal of Computational and Graphical Statistics, 33(4), 1310–1319. https://doi.org/10.1080/10618600.2024.2342984
dos Santos Sousa, A. R. (2024). A Bayesian wavelet shrinkage rule under LINEX loss function. Research in Statistics, 2(1). https://doi.org/10.1080/27684520.2024.2362926
Sousa, A. R. dos S., & Garcia, N. L. (2023). Wavelet shrinkage in nonparametric regression models with positive noise. Journal of Statistical Computation and Simulation, 93(17), 3011–3033. https://doi.org/10.1080/00949655.2023.2215372
Rodrigo, A., & Zevallos, M. (2025, May 10). A note on wavelet shrinkage in nonparametric regression models with ARFIMA errors. ArXiv.org. https://arxiv.org/abs/2505.06485
Rodrigo, A., & Zevallos, M. (2025). On Bayesian wavelet shrinkage estimation of nonparametric regression models with stationary correlated noise. Statistics and Computing, 35(4). https://doi.org/10.1007/s11222-025-10618-6
Agyemang E. F. (2025). A Gaussian Process Regression and Wavelet Transform Time Series approaches to modeling Influenza A. Computers in Biology and Medicine, 184, 109367. https://doi.org/10.1016/j.compbiomed.2024.109367
Didi, S., & Bouzebda, S. (2025). Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics, 13(10), 1587. https://doi.org/10.3390/math13101587
Kou , J., & Chesneau , C. (2022). Wavelet Estimation of Regression Derivatives for Biased and Negatively Associated Data. REVSTAT-Statistical Journal, 20(3), 353–371. https://doi.org/10.57805/revstat.v20i3.375
Amato, U., Antoniadis, A., Feis, I. D., & Gijbels, I. (2023). Penalized wavelet nonparametric univariate logistic regression for irregular spaced data. Statistics, 57(5), 1037–1060. https://doi.org/10.1080/02331888.2023.2248679
Giacofci, M., Lambert-Lacroix, S., & Picard, F. (2017). Minimax wavelet estimation for multisample heteroscedastic nonparametric regression. Journal of Nonparametric Statistics, 30(1), 238–261. https://doi.org/10.1080/10485252.2017.1406091
Zhou, Y., Wan, A. T. K., Xie, S., & Wang, X. (2010). Wavelet analysis of change-points in a non-parametric regression with heteroscedastic variance. Journal of Econometrics, 159(1), 183–201. https://doi.org/10.1016/j.jeconom.2010.06.001
He, Q., & Chen, M. (2021). Consistency properties for the wavelet estimator in nonparametric regression model with dependent errors. Journal of Inequalities and Applications, 2021(1). https://doi.org/10.1186/s13660-021-02603-0
Zhou, X., & Zhu, F. (2020). Asymptotics for $L_{1}$-wavelet method for nonparametric regression. Journal of Inequalities and Applications, 2020(1). https://doi.org/10.1186/s13660-020-02483-w
Barber, S., & Nason, G. P. (2024-11-04). waveband: Computes credible intervals for Bayesian wavelet shrinkage. R package, Version 4.7.4. Comprehensive R Archive Network. https://CRAN.R-project.org/package=waveband