Data-Driven Estimation of the hydrologic response via Generalized Additive Models

Duchemin, Quentin; Zanoni, Maria Grazia; Floriancic, Marius G.; Seybold, Hansjörg; Obozinski, Guillaume; Kirchner, James W.; Benettin, Paolo

doi:10.5194/egusphere-2025-1591

Preprints

https://doi.org/10.5194/egusphere-2025-1591

Preprints

25 Apr 2025

| 25 Apr 2025

Data-Driven Estimation of the hydrologic response via Generalized Additive Models

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Abstract. Estimating the hydrologic response of watersheds to precipitation events is key to understanding streamflow generation processes. Impulse Response Functions, commonly referred to as the Instantaneous Unit Hydrograph (IUH) in hydrology, have been used for over 50 years to predict stormflow and compare catchment behaviors. These response functions are often strongly affected by modelers' choices of parameters and data preprocessing procedures, limiting their diagnostic utility and generalizability across different sites and time periods. With the increasing availability of compiled rainfall-runoff series, there is now a growing opportunity to develop new approaches that fully exploit such datasets. This paper introduces GAMCR, a novel data-driven approach for estimating impulse response functions using Generalized Additive Models. GAMCR is designed to capture the complex, nonlinear relationships between precipitation and runoff, offering a flexible and interpretable framework for the systematic analysis of hydrological responses. The model is succesfully validated on synthetic data, where the true response functions are known. Additionally, we demonstrate the model's potential using real-world data from six Swiss basins with distinct hydrological behaviors. Results are fully consistent with those obtained from ERRA, another recent data-driven approach with a very different architecture, as well as with the climate and physical properties of the sites. Overall, GAMCR is a modern and effective tool for leveraging rainfall-runoff datasets to investigate the controls on hydrologic responses worldwide.

Received: 04 Apr 2025 – Discussion started: 25 Apr 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 25146 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Preprint (25146 KB)

Supplement (1081 KB)

Download & links

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

17 Nov 2025

Data-driven estimation of the hydrologic response using generalized additive models

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Geosci. Model Dev., 18, 8663–8678, https://doi.org/10.5194/gmd-18-8663-2025,https://doi.org/10.5194/gmd-18-8663-2025, 2025

Short summary

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Interactive discussion

Status: closed

CC1:
'Comment on egusphere-2025-1591', Cameron McIntosh, 18 May 2025

Excellent article!
I was wondering about the level of confidence in the smoothness assumption for the GAMCR (and GAMs in general) for the current application. Is it possible that there may be both smooth and spikey areas of the nonlinear function? Perhaps it may be beneficial to test the performance of spline vs. wavelet bases to see if there are any material differences?
Behera, J., Bishi, N., & Sahu, S. K. (2025). Adaptive Nonparametric Regression Using Hybrid Fourier-Wavelet Series: A Generalized Inference Approach for Multiscale Socioeconomic Dynamics. Asian Journal of Probability and Statistics, 27(4), 60–67. https://doi.org/10.9734/ajpas/2025/v27i4739
Fonseca, R., Morettin, P., & Pinheiro, A. (2024). Wavelet Feature Screening. Journal of Computational and Graphical Statistics, 33(4), 1310–1319. https://doi.org/10.1080/10618600.2024.2342984
dos Santos Sousa, A. R. (2024). A Bayesian wavelet shrinkage rule under LINEX loss function. Research in Statistics, 2(1). https://doi.org/10.1080/27684520.2024.2362926
Sousa, A. R. dos S., & Garcia, N. L. (2023). Wavelet shrinkage in nonparametric regression models with positive noise. Journal of Statistical Computation and Simulation, 93(17), 3011–3033. https://doi.org/10.1080/00949655.2023.2215372
Rodrigo, A., & Zevallos, M. (2025, May 10). A note on wavelet shrinkage in nonparametric regression models with ARFIMA errors. ArXiv.org. https://arxiv.org/abs/2505.06485
Rodrigo, A., & Zevallos, M. (2025). On Bayesian wavelet shrinkage estimation of nonparametric regression models with stationary correlated noise. Statistics and Computing, 35(4). https://doi.org/10.1007/s11222-025-10618-6
Agyemang E. F. (2025). A Gaussian Process Regression and Wavelet Transform Time Series approaches to modeling Influenza A. Computers in Biology and Medicine, 184, 109367. https://doi.org/10.1016/j.compbiomed.2024.109367
Didi, S., & Bouzebda, S. (2025). Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics, 13(10), 1587. https://doi.org/10.3390/math13101587
Kou , J., & Chesneau , C. (2022). Wavelet Estimation of Regression Derivatives for Biased and Negatively Associated Data. REVSTAT-Statistical Journal, 20(3), 353–371. https://doi.org/10.57805/revstat.v20i3.375
Amato, U., Antoniadis, A., Feis, I. D., & Gijbels, I. (2023). Penalized wavelet nonparametric univariate logistic regression for irregular spaced data. Statistics, 57(5), 1037–1060. https://doi.org/10.1080/02331888.2023.2248679
Giacofci, M., Lambert-Lacroix, S., & Picard, F. (2017). Minimax wavelet estimation for multisample heteroscedastic nonparametric regression. Journal of Nonparametric Statistics, 30(1), 238–261. https://doi.org/10.1080/10485252.2017.1406091
Zhou, Y., Wan, A. T. K., Xie, S., & Wang, X. (2010). Wavelet analysis of change-points in a non-parametric regression with heteroscedastic variance. Journal of Econometrics, 159(1), 183–201. https://doi.org/10.1016/j.jeconom.2010.06.001
‌He, Q., & Chen, M. (2021). Consistency properties for the wavelet estimator in nonparametric regression model with dependent errors. Journal of Inequalities and Applications, 2021(1). https://doi.org/10.1186/s13660-021-02603-0
Zhou, X., & Zhu, F. (2020). Asymptotics for $L_{1}$-wavelet method for nonparametric regression. Journal of Inequalities and Applications, 2020(1). https://doi.org/10.1186/s13660-020-02483-w
Barber, S., & Nason, G. P. (2024-11-04). waveband: Computes credible intervals for Bayesian wavelet shrinkage. R package, Version 4.7.4. Comprehensive R Archive Network. https://CRAN.R-project.org/package=waveband

Citation: https://doi.org/10.5194/egusphere-2025-1591-CC1
- CC2: 'Reply on CC1', Quentin Duchemin, 10 Jun 2025
  
  Thank you for your interest in our work and for your thoughtful feedback.
  Let us recall that GAMCR is based on two key elements:
  
  1) the basis functions used to decompose the transfer functions (denoted as $(b_l)_l$ in our paper)
  
  2) the time-dependent coefficients in this basis, which are modeled using GAMs as a function of a time-varying feature vector $z_t$.
  1) The choice of the basis functions $(b_l)_l$ was motivated by prior knowledge about the typical shape of hydrological transfer functions. These functions often exhibit sharp peaks at short lags and smoother behavior at longer lags. To capture this, we use splines with knots placed on an irregular grid, denser at short lags to allow more flexibility in capturing localized spikes.
  We agree that exploring alternative basis functions such as wavelets, or even learning the basis functions directly, could be an interesting direction for future research. However, the estimation of transfer functions from input-output fluxes is an ill-posed inverse problem. Introducing the appropriate inductive bias is essential to ensure not just good predictive performance, but also the recovery of physically meaningful latent structures. While spline bases naturally encode assumptions about local smoothness and support, defining such inductive biases with wavelet bases may be more challenging and could risk overfitting.
  2) In GAMCR, the time-dependent coefficients of the transfer function decomposition are modeled using additive GAMs with univariate spline terms for each feature. These splines are typically estimated using a smoothing spline approach, which includes a penalty on the second derivative of each spline to control its smoothness. A hyperparameter governs the trade-off between fidelity to the data and the smoothness of the fitted function.
  In our experiments, we found that the results were robust to the choice of this smoothing hyperparameter, and thus we used a fixed value across all experiments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-CC2
RC1:
'Comment on egusphere-2025-1591', Anonymous Referee #1, 30 Jun 2025
General Comments:
The authors introduce GAMCR, a novel data-driven approach that uses Generalised Additive Models (GAM) to estimate time-dependent Catchment Responses (CR) or impulse response functions, which capture the complex, nonlinear relationships between rainfall and runoff. It performs well on synthetic data, where the true response functions are known. It produces reliable results across six Swiss catchments with distinct hydrological behaviours, aligning with findings from ensemble rainfall-runoff analysis (ERRA), another recent data-driven approach with a different architecture.
Overall, the authors make a strong case that with the rise of large rainfall-runoff datasets, the GAMCR is a novel approach to facilitate systematic comparisons of hydrological responses across sites globally where rainfall-runoff time series are available. While I think this article is excellent, I have a few minor comments that could enhance it.
Specific Comments:
My suggestions for the technical sections are as follows:
In Section 1, you describe GAMCR as a novel data-driven approach with a different architecture from the ERRA method yet note that the results are fully consistent between the two. However, it would be helpful to clarify the specific advantages of using GAMCR over ERRA, particularly in estimating hydrological responses across diverse watersheds and enabling systematic cross-site comparisons where rainfall-runoff time series are available. For example, is the benefit primarily computational efficiency, or does GAMCR offer other advantages such as flexibility, ease of implementation, or improved performance for certain types of catchments?

In Section 2.3 – Model training:
While it is clear that the machine learning model was trained using the synthetic streamflow dataset from the Chiasso gauging station on the Breggia River (Section 3.1), it is not clear what specific data period or proportion of data (e.g., 80% training, 20% testing) was used to train the GAMs. Clarifying the data splitting strategy for both the synthetic and real datasets would help readers better understand the model training process.

It would be helpful to provide the final hyperparameter λ1 and λ2 values used in the model training process. The default values of λ1=10−3 and λ2=1 are mentioned in Section 3.3, but what are the optimised values? Have you thought about how reducing the training data might impact the model's performance? This could give you some insight into the model's robustness with less data.

You note that the model was trained on synthetic data (Section 3.1) for the three cases, and model testing is described in Section 3. However, the model validation is only introduced later in Section 4. For clarity and better flow, it would be helpful to briefly outline the validation approach in Section 3. Specifically, this could include a description of how the model is validated by computing the hydrologic response in the form of non-linear runoff functions (NRFs) across six quantiles of precipitation intensity and comparing these against both the ERRA-derived estimates and the benchmark outputs generated directly from the model.

In Section 2.4, you could provide the link to the GAMCR model software and online tutorial mentioned in the text would be helpful. Only a link to the datasets used in the study is made available in the paper.

In Section 3.1 - Synthetic data:
You explain that you used a lumped conceptual model (outlined in Supplementary Section 2) with rainfall and air temperature inputs from the Lugano station to generate a 40-year synthetic hourly time series, calibrated against measured streamflow data at Chiasso, Ponte di Polenta, on the Breggia River. You note that the simulation ‘closely mirrors actual measurements’ and provide a four-month example from 2010 to support this. However, to strengthen this claim, it would be helpful to include quantitative model performance statistics (e.g., R2, NSE, PBias, RMSE) for Case A. Ideally, this should cover the full 40-year period and also highlight in a figure the performance for a wet year (e.g., 2014) and a dry year (e.g., 2003) at Chiasso for the Breggia as identified on the Swiss FOEN Hydrological data and forecasts website (Breggia - Chiasso, Ponte di Polenta).

You explain that the synthetic streamflow data were generated using a lumped conceptual model with precipitation inputs from the Lugano station. In contrast, the real-world data for the six Swiss catchments use precipitation inputs from the 'CombiPrecip' product (MeteoSwiss), which combines radar data and land-based rain gauge observations. Since the GAMCR model is ultimately demonstrated on these six diverse catchments using CombiPrecip data, have you considered also generating a synthetic streamflow time series using CombiPrecip (2005 to 2019) as the precipitation input to train the model? This would allow for a more direct comparison and could potentially improve consistency in training and evaluating model performance.

In Section 3.2 – Real-world data:
It is clear why you selected the six Swiss watersheds for model testing, because they will have different hydrological responses without the presence of glaciers and have no catchment interventions, and have complete and reliable data records; however, it would help if you clarified why selecting a medium-sized catchment was a criterion.

While Table 1 gives an overview of the gauging stations and catchment characteristics, you aim to demonstrate that the six watersheds have distinct hydrological flow regimes. To strengthen this, could you:
Tabulate key hydrological statistics for each watershed, such as the annual mean, annual maximum, and annual minimum flow, using data available from the FOEN Hydrological Data and Forecasts website.

Provide a Flow Duration Curve (FDC) for each watershed, based on observed streamflow data. The FDC offers a statistical summary of streamflow behaviour, independent of the sequence or timing of flows and would help to characterise the unique hydrological response of each catchment. This could supplement the in-depth analysis provided for the six Swiss catchments in the Supplementary Section 1.

You mention compiling a 15-year record (2005–2019) of hourly precipitation-runoff data for six Swiss watersheds, using the CombiPrecip product from MeteoSwiss, which is generally considered reliable. However, CombiPrecip accuracy depends on the quality of its input data, namely, land-based rain gauges and radar estimates. Notably, the Swiss radar network was expanded from three to five stations between 2010 and 2016, and the number of rain gauges increased significantly from 71 (in 2005–2010) to over 200 by 2015, leading to substantial improvements in data quality. Since CombiPrecip also includes a quality index for assessing data reliability, have you considered excluding data before 2010 to enhance the robustness of your analysis?

In Section 3.3 – Implementation details, you state that a precipitation intensity threshold J^th of 0.05 mm/h was applied, and the GAMCR model was trained only on events exceeding this threshold. However, in line 275, when discussing the weighted average RRDs and peak heights of the NRFs estimated by ERRA and GAMCR for the six catchments, the analysis appears to include only events with precipitation intensities above 0.5 mm/h. Should this threshold be consistent with the previously stated J^th = 0.05 mm/h?

In Section 4 – Results:
Be consistent when referring to the three cases, especially whether Case A is the reference response (line 234) or is it the base case (line 241), or in Figure 5 labelled ‘base’?

Earlier in Section 3.1, you refer to the synthetic datasets as Case A (baseline), Case B (damped), and Case C (flashy). However, in Figure 5, the labels are inconsistent: the flashy system is shown as plot (a), the baseline as (b), and the damped as (c), ordered by decreasing NRF response (mm/h²). To avoid confusion and improve clarity, would you consider renaming the cases in Section 3.1 as Case 1, Case 2, and Case 3?

You show that both the GAMCR and ERRA methods produce consistent estimates of peak height and runoff volume across different scenarios. However, both approaches struggle with accurately predicting peak lag. You note that GAMCR tends to produce ‘less variable lag values’ across different precipitation intensities, with the NRF peak ‘fixed’ at 2 hours for the base and flashy catchments (beyond 5 mm/h), highlighting a limitation of the method:
Could you elaborate on why GAMCR struggles with peak lag prediction? Is this due to the use of a coarser temporal resolution where the flashy, base, and damped synthetic input time series have been aggregated to 2-, 3-, and 6-hour time steps, respectively?

Additionally, what are the implications of this limitation for applying GAMCR in cross-site hydrological comparisons at a global scale, as you state in Section 5 that GAMCR should currently not be used to estimate the timing of the hydrologic response.

In Section 4.2, Estimation of Real-world Hydrological Responses:
You refer in Figure 7 to a calibration period from 2005 to 2017. I assume this refers to the training period. For clarity and consistency, it would be helpful to use consistent terminology throughout the paper, either "calibration" or "training", and to introduce this period earlier in the methods section, rather than for the first time in the results.

Similarly, the term "out-of-sample predictions" appears here for the first time, along with the 2018–2019 period. If this refers to the testing or validation period, please make this explicit in the main text, rather than only in the figure caption. Also, define what is meant by "out-of-sample prediction" when it is first introduced, so readers unfamiliar with the term can clearly understand its role in the analysis.

Are the out-of-sample predictions 1 year sufficient to test the GAMCR model performance?

Referring to Figure 7, are there any issues with the quality of the observations at high flows at Chiasso for the Breggia (2349), i.e. the stage discharge relationship? Have you investigated the confidence in the stream flow records at this river gauge?

The NSE for the out-of-sample prediction for the Chiasso dataset is 0.19, which indicates poor model performance. You mention that the GAMCR model was trained on synthetic data generated for the Chiasso catchment and that this synthetic data closely matches the real-world observations. However, in Figure S3, the model clearly overestimates streamflow compared to the measured data for the four months in 2019. Can you clarify why the model shows such a poor NSE and a weak fit to the observed streamflow time series, despite the apparent agreement between the synthetic and observed training data? Additionally, do you think the predictions for Chiasso would improve if you used the CombiPrecip product as the rainfall input, rather than Lugano rainfall data used here?

In section 5, Discussion and Conclusions:
You state that GAMCR should not currently be used to estimate the timing of the hydrologic response, noting that the timing of the NRF peak was generally inaccurate, with GAMCR systematically underestimating the peak lag. You suggest that this limitation could be addressed by using a different organisation of the basis functions that form the core of the response. For clarity, could you be more specific about what changes to the basis function structure you are proposing? Providing a brief explanation would help readers understand how this adjustment could improve peak timing estimation.

In line 324, you say that ‘we verified that the modelled streamflow was generally realistic for both in-sample and out-of-sample data (Figure 7)’. Be consistent with the terminology you used, calibration period in Figure 7, not the term in-sample data. Similarly, you used out-of-sample prediction in Figure 7. Is this the testing period? More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.

Good to see that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag.

Technical Corrections:
Other minor suggestions:
The title could be simplified by changing "via" to "using" for more clarity.

The use of the phrase ‘real-world’ data is an odd turn of phrase as you are not comparing to a virtual world. You could consider changing this to simply measured or observed datasets for clarity. For example, you use the terminology observed streamflow in Figure 7.

More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.

Overview comments:
Overall, the study is excellent and presents a novel data-driven approach to facilitate systematic comparisons of hydrological responses across sites where rainfall-runoff time series are available. The authors should be commended as the GAMCR approach produces results that are closely aligned with results from the ERRA approach. The authors demonstrate that the GAMCR is a robust tool to study runoff response behaviour in real-world catchments. However, it would be helpful to the reader if it were clearer about what the benefits of the GAMCR approach over the ERRA approach are for estimating the hydrological responses for diverse watersheds to justify the study earlier in the paper.
It is also encouraging that the authors acknowledge that this first application of GAMCR to synthetic and real-world data has helped them identify some current model limitations, and that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag. Minor revisions would strengthen the paper, but this use of a novel data-driven approach that uses GAM to estimate time-dependent catchment responses to capture the complex, non-linear relationship between rainfall and runoff is of interest to others in the field.
Citation: https://doi.org/10.5194/egusphere-2025-1591-RC1
- AC1:
  'Reply on RC1', Maria Grazia Zanoni, 19 Sep 2025
  Dear Reviewer,
  Thank you very much for your valuable comments and suggestions on our manuscript. We appreciate the time and effort you have dedicated to reviewing our work. We have carefully considered each point raised and have revised the manuscript accordingly. A detailed response to each comment is provided below.
  In Section 1, [...] it would be helpful to clarify the specific advantages of using GAMCR over ERRA. [...]
  
  We thank Reviewer 1 for this comment. An exhaustive discussion of the differences between the models is presented in the discussion, but we will clarify already at the end of the introduction that ``Differently from ERRA, GAMCR aims to estimate the hydrologic response to each individual precipitation events using combinations of spline basis functions, with coefficients determined through machine learning techniques. This approach, though requiring to fit Generalized Additive Models, allows for greater flexibility since additional information (e.g., temperature, dam operations, or site-specific characteristics) can be incorporated into the model."
  In Section 2.3 – Model training: [...] it is not clear what specific data period or proportion of data
  
  We thank the reviewer for the comment. For real data, we used 13 years of data (2005-2017) to train the models. We used data from the years 2018 and 2019 to test the model. We will clarify this detail about the training/test split for the real data in Section 3.2. Section 2.3, instead, describes the general training methodology, which is not specific to either synthetic or real data.
  In Section 2.3 – Model training: It would be helpful to provide the final hyperparameter λ1 and λ2 values used in the model training process. The default values of λ1=10^-3 and λ2=1 are mentioned in Section 3.3.
  
  We will specify in Section 2.3 that the default values λ1=10^-3and λ2=1 were used.
  In Section 2.3 – Model training: the model validation is only introduced later in Section 4.
  
  We will introduce a subsection ``Model validation strategy'' already within the methods section (Section 3), to clarify our strategy from the beginning. This will allow us to simplify Section 4 to focus entirely on the results. Thank you for this comment.
  In Section 2.4, you could provide the link to the GAMCR model software and online tutorial mentioned in the text would be helpful.
  
  This will be certainly done in the revised manuscript, thanks.
  In Section 3.1 - Synthetic data: You note that the simulation ‘closely mirrors actual measurements’ and provide a four-month example from 2010 to support this. However, to strengthen this claim, it would be helpful to include quantitative model performance statistics (e.g., R2, NSE, PBias, RMSE) for Case A. Ideally, this should cover the full 40-year period and also highlight in a figure the performance for a wet year (e.g., 2014) and a dry year (e.g., 2003) [...]
  
  We will include the RMSE statistic (which is 0.20 mm h^-1 ) in the supplement but prefer to omit it from the main manuscript, as it might mislead readers into thinking the synthetic data generator is designed to best reproduce observed data. Its purpose is instead to create hydrologically realistic series with fully known responses, not to best replicate the specific data at the Chiasso, Ponte di Polenta station. We will clarify this in the manuscript and in the supplement.
  In Section 3.1 - Synthetic data: Have you considered also generating a synthetic streamflow time series using CombiPrecip (2005 to 2019) as the precipitation input to train the model? This would allow for a more direct comparison and could potentially improve consistency in training and evaluating model performance.
  
  As for the previous response, we will better stress in the revised manuscript that the purpose of the data generator is just to create realistic synthetic data featuring nonlinear and nonstationary hydrologic responses. Using a different precipitation input would not result in increased consistency between training and evaluating model performance. The different input series would result in slightly different model parameters, but the synthetic data on which the GAMCR model is tested would still be ``perfect'' (in the sense that it is not subject to measurement errors) because it is produced by a model.
  In Section 3.2 – Real-world data: it would help if you clarified why selecting a medium-sized catchment was a criterion.
  
  We thank the reviewer for the insightful request and we will discuss in more detail why we chose the selected basins and how we can define them as midsize catchments. We selected basins between 34–185 km², which are considered small to medium-sized. In the Swiss context, they are well-suited for hydrological analysis, as their scale reduces the need for complex distributed models while still capturing key processes. The chosen sites also represent diverse hydrological regimes, elevations, and soil depths. This will be clarified and detailed in Section 3.2.
  In Section 3.2 - Real-world data: Could you: i) Tabulate key hydrological statistics for each watershed ii) Provide a Flow Duration Curve (FDC) for each watershed.
  
  We will add a Table 2 in section 3.2 with key hydrological statistics; we will also provide the FDCs for each watershed for the snow-free period in a new figure.
  In Section 3.2 – Real-world data: Since CombiPrecip also includes a quality index for assessing data reliability, have you considered excluding data before 2010 to enhance the robustness of your analysis?
  
  We were aware of the modifications to the Swiss radar network implemented between 2005 and 2016, which significantly improved the data quality of the CombiPrecip product. We included data starting from 2005, as initial experiments indicated that the model's performance was not substantially impacted by the lower data quality prior to 2015. This decision also reflects a more realistic scenario in which flux data is available over at least a decade.
  In Section 3.3 – Implementation details: Should this threshold be consistent with the previously stated Jth = 0.05 mm/h?
  
  It is normal that the thresholds differ: the 0.05 mm/h threshold was used for training stability and efficiency, while the 0.5 mm/h threshold was chosen for analysis, focusing on more hydrologically relevant events. We will clarify this point in the revision.
  In Section 4 – Results: Be consistent when referring to the three cases. in Figure 5, the labels are inconsistent
  
  We thank the reviewer for raising this point. We will change the order of the panels in Figure 6 to make sure figures a, b and c match the cases A, B and C considered.
  
  In Section 4 - Results: Could you elaborate on why GAMCR struggles with peak lag prediction? Additionally, what are the implications of this limitation for applying GAMCR in cross-site hydrological comparisons at a global scale.
  
  In section 5: For clarity, could you be more specific about what changes to the basis function structure you are proposing? Providing a brief explanation would help readers understand how this adjustment could improve peak timing estimation.
  
  We believe that predicting the peak lag is particularly challenging from a statistical perspective. This is because changes in the peak lag can result in only minor differences in the induced discharge, making the estimation problem very difficult. The main implication is that the peak lag estimated by GAMCR should not be used for cross-site hydrological comparisons, as we already stated in the discussion (lines 318--319, ``GAMCR should currently not be used to estimate the timing of the hydrologic response").
  In such an ill-conditioned scenario, the first lever to improve peak lag estimation is increasing the size of the training dataset. However, model architecture also plays a significant role. GAMCR, for instance, relies on a set of positive, unimodal basis functions that are denser at shorter lags. The choice and design of these basis functions directly affect the accuracy of peak lag estimation.
  It would therefore be valuable to investigate the optimal knot placement for B-splines in the context of peak lag estimation. Reducing the number of knots may limit the model's expressiveness, while having too many basis functions can cause identifiability issues—different combinations of basis functions might result in similar discharge responses, further complicating peak lag estimation.
  For this reason, we believe that instead of predefining the basis functions, it would be an interesting research direction to learn them directly from data. This would allow the model to adapt the number and shape of basis functions to better match the characteristics of the watershed under study. The challenge, however, lies in imposing appropriate constraints on the learned basis functions: a well-designed inductive bias is essential to mitigate the ill-posed nature of recovering the latent transfer function from streamflow time series. We leave this investigation for future work.
  These points will be included in Section 5 of the revised manuscript.
  In Section 4.2: For clarity and consistency, it would be helpful to use consistent terminology throughout the paper, either "calibration" or "training".
  
  In Section 5: Be consistent with the terminology you used, calibration period in Figure 7, not the term in-sample data.
  
  We thank the reviewer for pointing out this lack of consistency in the terminology. In the revised manuscript, we will stick to the terminology "training" and "test periods" to enhance clarity.
  Section 4.2: Are the out-of-sample predictions 1 year sufficient to test the GAMCR model performance?
  
  The test period spans two years: 2018 and 2019. With hourly data, we believe that this gives a reasonable dataset to evaluate the model performance. In section 3.2 of the revised manuscript, we will clarify the training/test split considered.
  Section 4.2: Referring to Figure 7, are there any issues with the quality of the observations at high flows at Chiasso for the Breggia (2349), i.e. the stage discharge relationship? Have you investigated the confidence in the stream flow records at this river gauge?
  
  We do not see any particular issues with the quality of the high-flow observations at Chiasso (Breggia, 2349). The discharge data used in this study were official data provided by the Federal Office for the Environment (FOEN), which applies standardized procedures for flow measurement and quality control at all official gauging stations in Switzerland. As such, we rely on these validated records, and no indications of systematic uncertainty or bias at high flows were reported for this station.
  In Section 4.2: The NSE for the out-of-sample prediction for the Chiasso dataset is 0.19, which indicates poor model performance. Can you clarify why the model shows such a poor NSE and a weak fit to the observed streamflow time series, despite the apparent agreement between the synthetic and observed training data? Additionally, do you think the predictions for Chiasso would improve if you used the CombiPrecip product as the rainfall input, rather than Lugano rainfall data used here?
  
  For all real-world data experiments, we rely on the CombiPrecip product as the rainfall input. The Lugano rainfall was used only to create the synthetic datasets.
  To clarify, in Section 4.2 we focus exclusively on real datasets; synthetic data is not considered. Figure 7 presents the streamflow estimates obtained with GAMCR for the six study sites. It is important to note that GAMCR was not specifically designed to achieve state-of-the-art performance in streamflow prediction. A model optimized for this task would need to be carefully designed and trained to handle rare events such as extreme rainfall or very high streamflow. Nevertheless, we included Figure 7 to demonstrate that GAMCR is capable of streamflow prediction, even though it is not tailored for it. The NSE coefficient is known to be sensitive to high-flow values, and a few large errors—typically occurring under high-streamflow conditions as illustrated in Figure 7, and possibly affected by poor rainfall estimates—can markedly reduce NSE values, even when the model performs well in most other cases.
  Technical Corrections
  
  We thank the reviewer for these helpful corrections, which we are happy to implement in the revised manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-AC1
RC2:
'Comment on egusphere-2025-1591', Anonymous Referee #2, 25 Aug 2025

I consider this study interesting and scientifically valid. However, it needs improvements before it can be published. Below are the corrections needed for the publication of this article, and I would like to be able to review this article again after the requested corrections have been made.
Major revision
-The main limitation of this study is that it considers the developed method to be of global application (in any watershed worldwide), but only calibrated and validated it in small and small/medium watersheds with climatic and physiographic characteristics existing in Switzerland. Therefore, I recommend two solutions to this problem: all mentions of the method's worldwide application should be replaced by mentions of its application restricted to the types and conditions of calibrated/validated watersheds. Or (better), the developed method should be calibrated and validated in basins with different climatic and physiographic characteristics worldwide. An example of a data source for large-sized tropical basins is the Ore-Hybam (but other sources should also be used).
Minor revision
-Line 7. Describe the abbreviation ‘GAMCR’ at its first mention in the abstract.
-Lines 17-18. Insert a source (citation) for the sentence: ‘The hydrologic response (or runoff response) is usually defined as the change in streamflow induced by a given input of precipitation.’
-Lines 99-101. Explain (rewrite) this sentence further. It suggests that it varies smoothly between basins with different climatic and physiographic characteristics.
-Line 102. The correct word is ‘third’, not ‘second’.
-Lines 163-164. Insert a source (citation) for the sentence: ‘In the case of the hydrologic response, the evaluation step is particularly important because the real-world impulse response functions cannot be measured directly...’.
-Line 198. I don't agree that basins measuring 34 to 195 km² are considered medium-sized. They are all small to small/medium in size. Please, correct this! Moreover, this is one of the many reasons to calibrate and validate the method in other basin types.
-Lines 203-206. Even when analyzing out of the snowy months, the basins' antecedent conditions are influenced by snow. This is another reason to calibrate and validate the method in other types of watersheds.
-Line 321. The basins are not climatically diverse as you mention. They are similar considering Switzerland's climate conditions in relation to the planet's climate diversity. Please correct this! Moreover, this is another reason to calibrate and validate the method in other parts (continents) of the world.

Citation: https://doi.org/10.5194/egusphere-2025-1591-RC2
- AC2:
  'Reply on RC2', Maria Grazia Zanoni, 19 Sep 2025
  I consider this study interesting and scientifically valid. However, it needs improvements before it can be published. Below are the corrections needed for the publication of this article, and I would like to be able to review this article again after the requested corrections have been made.
  
  We thank Reviewer 2 for their evaluation of our manuscript and for providing useful and constructive feedback. We agree with the comments and will incorporate the suggested revisions (see detailed responses below). Since the reviewer has already given clear guidance and the required changes are rather minor, we do not see the necessity of another round of review. The manuscript has already been under public discussion for five months, and we believe there is no reason to further delay its publication.
  Major revision
  
  The main limitation of this study is that it considers the developed method to be of global application (in any watershed worldwide), but only calibrated and validated it in small and small/medium watersheds with climatic and physiographic characteristics existing in Switzerland. Therefore, I recommend two solutions to this problem: all mentions of the method's worldwide application should be replaced by mentions of its application restricted to the types and conditions of calibrated/validated watersheds. Or (better), the developed method should be calibrated and validated in basins with different climatic and physiographic characteristics worldwide. An example of a data source for large-sized tropical basins is the Ore-Hybam (but other sources should also be used).
  
  We thank the reviewer for this constructive feedback. A recurring concern was the limited geographic and climatic diversity of the study sites, since all basins are located in Switzerland. We acknowledge that we may have been unclear about this point and would like to emphasize that it was not our intention to claim that GAMCR has already been validated across global climatic conditions. Our deliberate focus on Swiss basins ensured the use of high-quality, high temporal resolution, well-documented data from sites free of strong anthropogenic disturbances (e.g., dams, major abstractions), allowing us to conduct a robust first evaluation of the method. We have now computed the Flow Duration Curves (FDCs) of streamflow for all six basins (new figure that will be added to the manuscript) and shown the runoff coefficient values, obtained with ERRA, to comment on Figure 9. These analyses will complement the existing ones and reinforce that, even within Switzerland, the selected watersheds span a gradient of hydrological regimes and flow behaviors, ranging from rainfall-dominated systems with frequent low flows to nival- and nivo-pluvial-influenced catchments capable of sustaining higher discharges. This supports our claim that the method was tested across distinct runoff regimes and hydrologic responses, even though all sites belong to the same national context.
  We don't see reasons why the method should not work in different climatic regions, as long as a streamflow response to precipitation is recorded in the data, but we agree that this claim would need to be tested. Therefore, we will carefully revise the manuscript to restrict statements about worldwide applicability. Our results are now explicitly framed as valid for basins with climatic and physiographic characteristics similar to those studied here. Broader testing in other climatic regions is indeed an important avenue for future research, but it is beyond the scope of this initial study.
  Minor revision
  
  Line 7. Describe the abbreviation ‘GAMCR’ at its first mention in the abstract.
  
  We thank the reviewer for having noticed it. We will clarify the abbreviation GAMCR in the abstract.
  Lines 17-18. Insert a source (citation) for the sentence: ‘The hydrologic response (or runoff response) is usually defined as the change in streamflow induced by a given input of precipitation.’
  
  We will add the following sources supporting this statement:
  [1] Ponce, V. M. 1995. Hydrologic and environmental impact of small and medium-size basins. San Diego State University. Available from: https://pon.sdsu.edu/protected29/cive445_ponce_chapter05a_lecture.html
  [2] Kirchner, J. W., P. Benettin, and I. van Meerveld. 2023. Instructive surprises in the hydrological functioning of landscapes. Annu. Rev. Earth Planet. Sci. 51:277–299. doi:10.1146/annurev-earth-071822-100356.
  [3] Kirchner, J. W. (2022). Impulse Response Functions for Nonlinear, Nonstationary, and Heterogeneous Systems, Estimated by Deconvolution and Demixing of Noisy Time Series. Sensors, 22(9), 3291. https://doi.org/10.3390/s22093291
  Lines 99-101. Explain (rewrite) this sentence further. It suggests that it varies smoothly between basins with different climatic and physiographic characteristics.
  
  Thank you for identifying this unclear statement. GAMCR is trained and applied within a specific catchment. What we meant in that sentence is that the hydrologic response is expected to vary over time, and we assume that such variations are not abrupt/discontinuous. We will rephrase the sentence as: "The feature vector encodes the catchment’s current state, reflecting that its response tends to follow predictable patterns; for example, minor variations in precipitation three months prior are expected to produce moderate changes in response."
  Line 102. The correct word is ‘third’, not ‘second’.
  
  We thank the reviewer for noting this oversight.
  Lines 163-164. Insert a source (citation) for the sentence: ‘In the case of the hydrologic response, the evaluation step is particularly important because the real-world impulse response functions cannot be measured directly...’.
  
  We plan to update the manuscript with the following: "In the case of the hydrologic response, the evaluation step is particularly important because impulse response functions in real basins cannot be measured directly and the model is trained on streamflow data only [3,4,5]."
  [3] Kirchner, J. W. (2022). Impulse Response Functions for Nonlinear, Nonstationary, and Heterogeneous Systems, Estimated by Deconvolution and Demixing of Noisy Time Series. Sensors, 22(9), 3291. https://doi.org/10.3390/s22093291
  
  [4] Gupta, H.V., Wagener, T. and Liu, Y. (2008), Reconciling theory with observations: elements of a diagnostic approach to model evaluation. Hydrol. Process., 22: 3802-3813. https://doi.org/10.1002/hyp.6989
  [5] McDonnell, J. J., and K. Beven (2014), Debates—The future of hydrological sciences: A (common) path forward? A call to action aimed at understanding velocities, celerities, and residence time distributions of the headwater hydrograph, Water Resour. Res., 50, 5342–5350, doi:10.1002/2013WR015141.
  Line 198. I don't agree that basins measuring 34 to 195 km² are considered medium-sized. They are all small to small/medium in size. Please, correct this! Moreover, this is one of the many reasons to calibrate and validate the method in other basin types.
  
  We thank the reviewer for this important comment. We agree that, based solely on drainage area, the studied basins are better classified as small to small/medium. At the same time, we note that there is no universally accepted, size-based classification for catchments. In hydrology, many definitions consider process-based characteristics rather than area alone when describing catchment size. For instance, [1] defines a midsize catchment based on hydrological behavior, including features such as rainfall intensity varying within a storm, spatially uniform precipitation, runoff dominated by overland and channel flow, and negligible channel storage due to steep slopes. We would claim that the selected basins meet these hydrologic criteria. We will revised the manuscript to clearly distinguish the area-based classification (small to small/medium) from the process-based hydrologic definition, to avoid ambiguity while providing proper justification for our methodological choices.
  [1] Ponce, V. M. 1995. Hydrologic and environmental impact of small and medium-size basins. San Diego State University. \par Available from: https://pon.sdsu.edu/protected29/cive445_ponce_chapter05a_lecture.html
  Lines 203-206. Even when analyzing out of the snowy months, the basins' antecedent conditions are influenced by snow. This is another reason to calibrate and validate the method in other types of watersheds.
  
  We thank the reviewer for this observation, and we acknowledge that antecedent conditions in the studied basins can be influenced by residual snow, even outside the main snowy months. In this study, we carefully selected events and basins to minimize snow-related effects, ensuring a robust evaluation of the GAMCR method under the targeted conditions. While this choice limits the generality of the results, it was necessary to avoid confounding factors such as snowmelt that could bias model assessment. Extending calibration and validation to basins with different snow or climatic regimes is an important avenue for future work but lies beyond the scope of the present study.
  Line 321. The basins are not climatically diverse as you mention. They are similar considering Switzerland's climate conditions in relation to the planet's climate diversity. Please correct this! Moreover, this is another reason to calibrate and validate the method in other parts (continents) of the world.
  
  We acknowledge that, in a global context, the selected Swiss basins are not {\it climatically} diverse. However, within Switzerland, the basins were chosen to cover a range of hydrological regimes, including Jura-nivopluvial, transition nival, pluvial, southern nivo-pluvial, and southern pluvio-nival. This diversity allowed us to test the GAMCR method across different runoff behaviors and seasonal patterns resulting in a broad range of responses. Indeed, Figure 7 shows that the same rainfall intensities result in hydrologic responses that are ten times larger at Euthal compared to Salmsach.
  We believe that our site selection does not diminish the value of the results, as the methodology and framework developed with these basins are applicable to other regions, provided that careful data evaluation is conducted. Testing in other climates and continents is an important avenue for future work, but it lies beyond the scope of this initial study.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-AC2

Interactive discussion

Status: closed

CC1:
'Comment on egusphere-2025-1591', Cameron McIntosh, 18 May 2025

Excellent article!
I was wondering about the level of confidence in the smoothness assumption for the GAMCR (and GAMs in general) for the current application. Is it possible that there may be both smooth and spikey areas of the nonlinear function? Perhaps it may be beneficial to test the performance of spline vs. wavelet bases to see if there are any material differences?
Behera, J., Bishi, N., & Sahu, S. K. (2025). Adaptive Nonparametric Regression Using Hybrid Fourier-Wavelet Series: A Generalized Inference Approach for Multiscale Socioeconomic Dynamics. Asian Journal of Probability and Statistics, 27(4), 60–67. https://doi.org/10.9734/ajpas/2025/v27i4739
Fonseca, R., Morettin, P., & Pinheiro, A. (2024). Wavelet Feature Screening. Journal of Computational and Graphical Statistics, 33(4), 1310–1319. https://doi.org/10.1080/10618600.2024.2342984
dos Santos Sousa, A. R. (2024). A Bayesian wavelet shrinkage rule under LINEX loss function. Research in Statistics, 2(1). https://doi.org/10.1080/27684520.2024.2362926
Sousa, A. R. dos S., & Garcia, N. L. (2023). Wavelet shrinkage in nonparametric regression models with positive noise. Journal of Statistical Computation and Simulation, 93(17), 3011–3033. https://doi.org/10.1080/00949655.2023.2215372
Rodrigo, A., & Zevallos, M. (2025, May 10). A note on wavelet shrinkage in nonparametric regression models with ARFIMA errors. ArXiv.org. https://arxiv.org/abs/2505.06485
Rodrigo, A., & Zevallos, M. (2025). On Bayesian wavelet shrinkage estimation of nonparametric regression models with stationary correlated noise. Statistics and Computing, 35(4). https://doi.org/10.1007/s11222-025-10618-6
Agyemang E. F. (2025). A Gaussian Process Regression and Wavelet Transform Time Series approaches to modeling Influenza A. Computers in Biology and Medicine, 184, 109367. https://doi.org/10.1016/j.compbiomed.2024.109367
Didi, S., & Bouzebda, S. (2025). Wavelet Estimation of Partial Derivatives in Multivariate Regression Under Discrete-Time Stationary Ergodic Processes. Mathematics, 13(10), 1587. https://doi.org/10.3390/math13101587
Kou , J., & Chesneau , C. (2022). Wavelet Estimation of Regression Derivatives for Biased and Negatively Associated Data. REVSTAT-Statistical Journal, 20(3), 353–371. https://doi.org/10.57805/revstat.v20i3.375
Amato, U., Antoniadis, A., Feis, I. D., & Gijbels, I. (2023). Penalized wavelet nonparametric univariate logistic regression for irregular spaced data. Statistics, 57(5), 1037–1060. https://doi.org/10.1080/02331888.2023.2248679
Giacofci, M., Lambert-Lacroix, S., & Picard, F. (2017). Minimax wavelet estimation for multisample heteroscedastic nonparametric regression. Journal of Nonparametric Statistics, 30(1), 238–261. https://doi.org/10.1080/10485252.2017.1406091
Zhou, Y., Wan, A. T. K., Xie, S., & Wang, X. (2010). Wavelet analysis of change-points in a non-parametric regression with heteroscedastic variance. Journal of Econometrics, 159(1), 183–201. https://doi.org/10.1016/j.jeconom.2010.06.001
‌He, Q., & Chen, M. (2021). Consistency properties for the wavelet estimator in nonparametric regression model with dependent errors. Journal of Inequalities and Applications, 2021(1). https://doi.org/10.1186/s13660-021-02603-0
Zhou, X., & Zhu, F. (2020). Asymptotics for $L_{1}$-wavelet method for nonparametric regression. Journal of Inequalities and Applications, 2020(1). https://doi.org/10.1186/s13660-020-02483-w
Barber, S., & Nason, G. P. (2024-11-04). waveband: Computes credible intervals for Bayesian wavelet shrinkage. R package, Version 4.7.4. Comprehensive R Archive Network. https://CRAN.R-project.org/package=waveband

Citation: https://doi.org/10.5194/egusphere-2025-1591-CC1
- CC2: 'Reply on CC1', Quentin Duchemin, 10 Jun 2025
  
  Thank you for your interest in our work and for your thoughtful feedback.
  Let us recall that GAMCR is based on two key elements:
  
  1) the basis functions used to decompose the transfer functions (denoted as $(b_l)_l$ in our paper)
  
  2) the time-dependent coefficients in this basis, which are modeled using GAMs as a function of a time-varying feature vector $z_t$.
  1) The choice of the basis functions $(b_l)_l$ was motivated by prior knowledge about the typical shape of hydrological transfer functions. These functions often exhibit sharp peaks at short lags and smoother behavior at longer lags. To capture this, we use splines with knots placed on an irregular grid, denser at short lags to allow more flexibility in capturing localized spikes.
  We agree that exploring alternative basis functions such as wavelets, or even learning the basis functions directly, could be an interesting direction for future research. However, the estimation of transfer functions from input-output fluxes is an ill-posed inverse problem. Introducing the appropriate inductive bias is essential to ensure not just good predictive performance, but also the recovery of physically meaningful latent structures. While spline bases naturally encode assumptions about local smoothness and support, defining such inductive biases with wavelet bases may be more challenging and could risk overfitting.
  2) In GAMCR, the time-dependent coefficients of the transfer function decomposition are modeled using additive GAMs with univariate spline terms for each feature. These splines are typically estimated using a smoothing spline approach, which includes a penalty on the second derivative of each spline to control its smoothness. A hyperparameter governs the trade-off between fidelity to the data and the smoothness of the fitted function.
  In our experiments, we found that the results were robust to the choice of this smoothing hyperparameter, and thus we used a fixed value across all experiments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-CC2
RC1:
'Comment on egusphere-2025-1591', Anonymous Referee #1, 30 Jun 2025
General Comments:
The authors introduce GAMCR, a novel data-driven approach that uses Generalised Additive Models (GAM) to estimate time-dependent Catchment Responses (CR) or impulse response functions, which capture the complex, nonlinear relationships between rainfall and runoff. It performs well on synthetic data, where the true response functions are known. It produces reliable results across six Swiss catchments with distinct hydrological behaviours, aligning with findings from ensemble rainfall-runoff analysis (ERRA), another recent data-driven approach with a different architecture.
Overall, the authors make a strong case that with the rise of large rainfall-runoff datasets, the GAMCR is a novel approach to facilitate systematic comparisons of hydrological responses across sites globally where rainfall-runoff time series are available. While I think this article is excellent, I have a few minor comments that could enhance it.
Specific Comments:
My suggestions for the technical sections are as follows:
In Section 1, you describe GAMCR as a novel data-driven approach with a different architecture from the ERRA method yet note that the results are fully consistent between the two. However, it would be helpful to clarify the specific advantages of using GAMCR over ERRA, particularly in estimating hydrological responses across diverse watersheds and enabling systematic cross-site comparisons where rainfall-runoff time series are available. For example, is the benefit primarily computational efficiency, or does GAMCR offer other advantages such as flexibility, ease of implementation, or improved performance for certain types of catchments?

In Section 2.3 – Model training:
While it is clear that the machine learning model was trained using the synthetic streamflow dataset from the Chiasso gauging station on the Breggia River (Section 3.1), it is not clear what specific data period or proportion of data (e.g., 80% training, 20% testing) was used to train the GAMs. Clarifying the data splitting strategy for both the synthetic and real datasets would help readers better understand the model training process.

It would be helpful to provide the final hyperparameter λ1 and λ2 values used in the model training process. The default values of λ1=10−3 and λ2=1 are mentioned in Section 3.3, but what are the optimised values? Have you thought about how reducing the training data might impact the model's performance? This could give you some insight into the model's robustness with less data.

You note that the model was trained on synthetic data (Section 3.1) for the three cases, and model testing is described in Section 3. However, the model validation is only introduced later in Section 4. For clarity and better flow, it would be helpful to briefly outline the validation approach in Section 3. Specifically, this could include a description of how the model is validated by computing the hydrologic response in the form of non-linear runoff functions (NRFs) across six quantiles of precipitation intensity and comparing these against both the ERRA-derived estimates and the benchmark outputs generated directly from the model.

In Section 2.4, you could provide the link to the GAMCR model software and online tutorial mentioned in the text would be helpful. Only a link to the datasets used in the study is made available in the paper.

In Section 3.1 - Synthetic data:
You explain that you used a lumped conceptual model (outlined in Supplementary Section 2) with rainfall and air temperature inputs from the Lugano station to generate a 40-year synthetic hourly time series, calibrated against measured streamflow data at Chiasso, Ponte di Polenta, on the Breggia River. You note that the simulation ‘closely mirrors actual measurements’ and provide a four-month example from 2010 to support this. However, to strengthen this claim, it would be helpful to include quantitative model performance statistics (e.g., R2, NSE, PBias, RMSE) for Case A. Ideally, this should cover the full 40-year period and also highlight in a figure the performance for a wet year (e.g., 2014) and a dry year (e.g., 2003) at Chiasso for the Breggia as identified on the Swiss FOEN Hydrological data and forecasts website (Breggia - Chiasso, Ponte di Polenta).

You explain that the synthetic streamflow data were generated using a lumped conceptual model with precipitation inputs from the Lugano station. In contrast, the real-world data for the six Swiss catchments use precipitation inputs from the 'CombiPrecip' product (MeteoSwiss), which combines radar data and land-based rain gauge observations. Since the GAMCR model is ultimately demonstrated on these six diverse catchments using CombiPrecip data, have you considered also generating a synthetic streamflow time series using CombiPrecip (2005 to 2019) as the precipitation input to train the model? This would allow for a more direct comparison and could potentially improve consistency in training and evaluating model performance.

In Section 3.2 – Real-world data:
It is clear why you selected the six Swiss watersheds for model testing, because they will have different hydrological responses without the presence of glaciers and have no catchment interventions, and have complete and reliable data records; however, it would help if you clarified why selecting a medium-sized catchment was a criterion.

While Table 1 gives an overview of the gauging stations and catchment characteristics, you aim to demonstrate that the six watersheds have distinct hydrological flow regimes. To strengthen this, could you:
Tabulate key hydrological statistics for each watershed, such as the annual mean, annual maximum, and annual minimum flow, using data available from the FOEN Hydrological Data and Forecasts website.

Provide a Flow Duration Curve (FDC) for each watershed, based on observed streamflow data. The FDC offers a statistical summary of streamflow behaviour, independent of the sequence or timing of flows and would help to characterise the unique hydrological response of each catchment. This could supplement the in-depth analysis provided for the six Swiss catchments in the Supplementary Section 1.

You mention compiling a 15-year record (2005–2019) of hourly precipitation-runoff data for six Swiss watersheds, using the CombiPrecip product from MeteoSwiss, which is generally considered reliable. However, CombiPrecip accuracy depends on the quality of its input data, namely, land-based rain gauges and radar estimates. Notably, the Swiss radar network was expanded from three to five stations between 2010 and 2016, and the number of rain gauges increased significantly from 71 (in 2005–2010) to over 200 by 2015, leading to substantial improvements in data quality. Since CombiPrecip also includes a quality index for assessing data reliability, have you considered excluding data before 2010 to enhance the robustness of your analysis?

In Section 3.3 – Implementation details, you state that a precipitation intensity threshold J^th of 0.05 mm/h was applied, and the GAMCR model was trained only on events exceeding this threshold. However, in line 275, when discussing the weighted average RRDs and peak heights of the NRFs estimated by ERRA and GAMCR for the six catchments, the analysis appears to include only events with precipitation intensities above 0.5 mm/h. Should this threshold be consistent with the previously stated J^th = 0.05 mm/h?

In Section 4 – Results:
Be consistent when referring to the three cases, especially whether Case A is the reference response (line 234) or is it the base case (line 241), or in Figure 5 labelled ‘base’?

Earlier in Section 3.1, you refer to the synthetic datasets as Case A (baseline), Case B (damped), and Case C (flashy). However, in Figure 5, the labels are inconsistent: the flashy system is shown as plot (a), the baseline as (b), and the damped as (c), ordered by decreasing NRF response (mm/h²). To avoid confusion and improve clarity, would you consider renaming the cases in Section 3.1 as Case 1, Case 2, and Case 3?

You show that both the GAMCR and ERRA methods produce consistent estimates of peak height and runoff volume across different scenarios. However, both approaches struggle with accurately predicting peak lag. You note that GAMCR tends to produce ‘less variable lag values’ across different precipitation intensities, with the NRF peak ‘fixed’ at 2 hours for the base and flashy catchments (beyond 5 mm/h), highlighting a limitation of the method:
Could you elaborate on why GAMCR struggles with peak lag prediction? Is this due to the use of a coarser temporal resolution where the flashy, base, and damped synthetic input time series have been aggregated to 2-, 3-, and 6-hour time steps, respectively?

Additionally, what are the implications of this limitation for applying GAMCR in cross-site hydrological comparisons at a global scale, as you state in Section 5 that GAMCR should currently not be used to estimate the timing of the hydrologic response.

In Section 4.2, Estimation of Real-world Hydrological Responses:
You refer in Figure 7 to a calibration period from 2005 to 2017. I assume this refers to the training period. For clarity and consistency, it would be helpful to use consistent terminology throughout the paper, either "calibration" or "training", and to introduce this period earlier in the methods section, rather than for the first time in the results.

Similarly, the term "out-of-sample predictions" appears here for the first time, along with the 2018–2019 period. If this refers to the testing or validation period, please make this explicit in the main text, rather than only in the figure caption. Also, define what is meant by "out-of-sample prediction" when it is first introduced, so readers unfamiliar with the term can clearly understand its role in the analysis.

Are the out-of-sample predictions 1 year sufficient to test the GAMCR model performance?

Referring to Figure 7, are there any issues with the quality of the observations at high flows at Chiasso for the Breggia (2349), i.e. the stage discharge relationship? Have you investigated the confidence in the stream flow records at this river gauge?

The NSE for the out-of-sample prediction for the Chiasso dataset is 0.19, which indicates poor model performance. You mention that the GAMCR model was trained on synthetic data generated for the Chiasso catchment and that this synthetic data closely matches the real-world observations. However, in Figure S3, the model clearly overestimates streamflow compared to the measured data for the four months in 2019. Can you clarify why the model shows such a poor NSE and a weak fit to the observed streamflow time series, despite the apparent agreement between the synthetic and observed training data? Additionally, do you think the predictions for Chiasso would improve if you used the CombiPrecip product as the rainfall input, rather than Lugano rainfall data used here?

In section 5, Discussion and Conclusions:
You state that GAMCR should not currently be used to estimate the timing of the hydrologic response, noting that the timing of the NRF peak was generally inaccurate, with GAMCR systematically underestimating the peak lag. You suggest that this limitation could be addressed by using a different organisation of the basis functions that form the core of the response. For clarity, could you be more specific about what changes to the basis function structure you are proposing? Providing a brief explanation would help readers understand how this adjustment could improve peak timing estimation.

In line 324, you say that ‘we verified that the modelled streamflow was generally realistic for both in-sample and out-of-sample data (Figure 7)’. Be consistent with the terminology you used, calibration period in Figure 7, not the term in-sample data. Similarly, you used out-of-sample prediction in Figure 7. Is this the testing period? More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.

Good to see that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag.

Technical Corrections:
Other minor suggestions:
The title could be simplified by changing "via" to "using" for more clarity.

The use of the phrase ‘real-world’ data is an odd turn of phrase as you are not comparing to a virtual world. You could consider changing this to simply measured or observed datasets for clarity. For example, you use the terminology observed streamflow in Figure 7.

More consistent terminology throughout the paper would help the reader understand what you mean by training, calibration, testing, validating, in-sample and out-of-sample prediction.

Overview comments:
Overall, the study is excellent and presents a novel data-driven approach to facilitate systematic comparisons of hydrological responses across sites where rainfall-runoff time series are available. The authors should be commended as the GAMCR approach produces results that are closely aligned with results from the ERRA approach. The authors demonstrate that the GAMCR is a robust tool to study runoff response behaviour in real-world catchments. However, it would be helpful to the reader if it were clearer about what the benefits of the GAMCR approach over the ERRA approach are for estimating the hydrological responses for diverse watersheds to justify the study earlier in the paper.
It is also encouraging that the authors acknowledge that this first application of GAMCR to synthetic and real-world data has helped them identify some current model limitations, and that the next model developments could target different or denser basis functions capable of improving the estimation of the peak lag. Minor revisions would strengthen the paper, but this use of a novel data-driven approach that uses GAM to estimate time-dependent catchment responses to capture the complex, non-linear relationship between rainfall and runoff is of interest to others in the field.
Citation: https://doi.org/10.5194/egusphere-2025-1591-RC1
- AC1:
  'Reply on RC1', Maria Grazia Zanoni, 19 Sep 2025
  Dear Reviewer,
  Thank you very much for your valuable comments and suggestions on our manuscript. We appreciate the time and effort you have dedicated to reviewing our work. We have carefully considered each point raised and have revised the manuscript accordingly. A detailed response to each comment is provided below.
  In Section 1, [...] it would be helpful to clarify the specific advantages of using GAMCR over ERRA. [...]
  
  We thank Reviewer 1 for this comment. An exhaustive discussion of the differences between the models is presented in the discussion, but we will clarify already at the end of the introduction that ``Differently from ERRA, GAMCR aims to estimate the hydrologic response to each individual precipitation events using combinations of spline basis functions, with coefficients determined through machine learning techniques. This approach, though requiring to fit Generalized Additive Models, allows for greater flexibility since additional information (e.g., temperature, dam operations, or site-specific characteristics) can be incorporated into the model."
  In Section 2.3 – Model training: [...] it is not clear what specific data period or proportion of data
  
  We thank the reviewer for the comment. For real data, we used 13 years of data (2005-2017) to train the models. We used data from the years 2018 and 2019 to test the model. We will clarify this detail about the training/test split for the real data in Section 3.2. Section 2.3, instead, describes the general training methodology, which is not specific to either synthetic or real data.
  In Section 2.3 – Model training: It would be helpful to provide the final hyperparameter λ1 and λ2 values used in the model training process. The default values of λ1=10^-3 and λ2=1 are mentioned in Section 3.3.
  
  We will specify in Section 2.3 that the default values λ1=10^-3and λ2=1 were used.
  In Section 2.3 – Model training: the model validation is only introduced later in Section 4.
  
  We will introduce a subsection ``Model validation strategy'' already within the methods section (Section 3), to clarify our strategy from the beginning. This will allow us to simplify Section 4 to focus entirely on the results. Thank you for this comment.
  In Section 2.4, you could provide the link to the GAMCR model software and online tutorial mentioned in the text would be helpful.
  
  This will be certainly done in the revised manuscript, thanks.
  In Section 3.1 - Synthetic data: You note that the simulation ‘closely mirrors actual measurements’ and provide a four-month example from 2010 to support this. However, to strengthen this claim, it would be helpful to include quantitative model performance statistics (e.g., R2, NSE, PBias, RMSE) for Case A. Ideally, this should cover the full 40-year period and also highlight in a figure the performance for a wet year (e.g., 2014) and a dry year (e.g., 2003) [...]
  
  We will include the RMSE statistic (which is 0.20 mm h^-1 ) in the supplement but prefer to omit it from the main manuscript, as it might mislead readers into thinking the synthetic data generator is designed to best reproduce observed data. Its purpose is instead to create hydrologically realistic series with fully known responses, not to best replicate the specific data at the Chiasso, Ponte di Polenta station. We will clarify this in the manuscript and in the supplement.
  In Section 3.1 - Synthetic data: Have you considered also generating a synthetic streamflow time series using CombiPrecip (2005 to 2019) as the precipitation input to train the model? This would allow for a more direct comparison and could potentially improve consistency in training and evaluating model performance.
  
  As for the previous response, we will better stress in the revised manuscript that the purpose of the data generator is just to create realistic synthetic data featuring nonlinear and nonstationary hydrologic responses. Using a different precipitation input would not result in increased consistency between training and evaluating model performance. The different input series would result in slightly different model parameters, but the synthetic data on which the GAMCR model is tested would still be ``perfect'' (in the sense that it is not subject to measurement errors) because it is produced by a model.
  In Section 3.2 – Real-world data: it would help if you clarified why selecting a medium-sized catchment was a criterion.
  
  We thank the reviewer for the insightful request and we will discuss in more detail why we chose the selected basins and how we can define them as midsize catchments. We selected basins between 34–185 km², which are considered small to medium-sized. In the Swiss context, they are well-suited for hydrological analysis, as their scale reduces the need for complex distributed models while still capturing key processes. The chosen sites also represent diverse hydrological regimes, elevations, and soil depths. This will be clarified and detailed in Section 3.2.
  In Section 3.2 - Real-world data: Could you: i) Tabulate key hydrological statistics for each watershed ii) Provide a Flow Duration Curve (FDC) for each watershed.
  
  We will add a Table 2 in section 3.2 with key hydrological statistics; we will also provide the FDCs for each watershed for the snow-free period in a new figure.
  In Section 3.2 – Real-world data: Since CombiPrecip also includes a quality index for assessing data reliability, have you considered excluding data before 2010 to enhance the robustness of your analysis?
  
  We were aware of the modifications to the Swiss radar network implemented between 2005 and 2016, which significantly improved the data quality of the CombiPrecip product. We included data starting from 2005, as initial experiments indicated that the model's performance was not substantially impacted by the lower data quality prior to 2015. This decision also reflects a more realistic scenario in which flux data is available over at least a decade.
  In Section 3.3 – Implementation details: Should this threshold be consistent with the previously stated Jth = 0.05 mm/h?
  
  It is normal that the thresholds differ: the 0.05 mm/h threshold was used for training stability and efficiency, while the 0.5 mm/h threshold was chosen for analysis, focusing on more hydrologically relevant events. We will clarify this point in the revision.
  In Section 4 – Results: Be consistent when referring to the three cases. in Figure 5, the labels are inconsistent
  
  We thank the reviewer for raising this point. We will change the order of the panels in Figure 6 to make sure figures a, b and c match the cases A, B and C considered.
  
  In Section 4 - Results: Could you elaborate on why GAMCR struggles with peak lag prediction? Additionally, what are the implications of this limitation for applying GAMCR in cross-site hydrological comparisons at a global scale.
  
  In section 5: For clarity, could you be more specific about what changes to the basis function structure you are proposing? Providing a brief explanation would help readers understand how this adjustment could improve peak timing estimation.
  
  We believe that predicting the peak lag is particularly challenging from a statistical perspective. This is because changes in the peak lag can result in only minor differences in the induced discharge, making the estimation problem very difficult. The main implication is that the peak lag estimated by GAMCR should not be used for cross-site hydrological comparisons, as we already stated in the discussion (lines 318--319, ``GAMCR should currently not be used to estimate the timing of the hydrologic response").
  In such an ill-conditioned scenario, the first lever to improve peak lag estimation is increasing the size of the training dataset. However, model architecture also plays a significant role. GAMCR, for instance, relies on a set of positive, unimodal basis functions that are denser at shorter lags. The choice and design of these basis functions directly affect the accuracy of peak lag estimation.
  It would therefore be valuable to investigate the optimal knot placement for B-splines in the context of peak lag estimation. Reducing the number of knots may limit the model's expressiveness, while having too many basis functions can cause identifiability issues—different combinations of basis functions might result in similar discharge responses, further complicating peak lag estimation.
  For this reason, we believe that instead of predefining the basis functions, it would be an interesting research direction to learn them directly from data. This would allow the model to adapt the number and shape of basis functions to better match the characteristics of the watershed under study. The challenge, however, lies in imposing appropriate constraints on the learned basis functions: a well-designed inductive bias is essential to mitigate the ill-posed nature of recovering the latent transfer function from streamflow time series. We leave this investigation for future work.
  These points will be included in Section 5 of the revised manuscript.
  In Section 4.2: For clarity and consistency, it would be helpful to use consistent terminology throughout the paper, either "calibration" or "training".
  
  In Section 5: Be consistent with the terminology you used, calibration period in Figure 7, not the term in-sample data.
  
  We thank the reviewer for pointing out this lack of consistency in the terminology. In the revised manuscript, we will stick to the terminology "training" and "test periods" to enhance clarity.
  Section 4.2: Are the out-of-sample predictions 1 year sufficient to test the GAMCR model performance?
  
  The test period spans two years: 2018 and 2019. With hourly data, we believe that this gives a reasonable dataset to evaluate the model performance. In section 3.2 of the revised manuscript, we will clarify the training/test split considered.
  Section 4.2: Referring to Figure 7, are there any issues with the quality of the observations at high flows at Chiasso for the Breggia (2349), i.e. the stage discharge relationship? Have you investigated the confidence in the stream flow records at this river gauge?
  
  We do not see any particular issues with the quality of the high-flow observations at Chiasso (Breggia, 2349). The discharge data used in this study were official data provided by the Federal Office for the Environment (FOEN), which applies standardized procedures for flow measurement and quality control at all official gauging stations in Switzerland. As such, we rely on these validated records, and no indications of systematic uncertainty or bias at high flows were reported for this station.
  In Section 4.2: The NSE for the out-of-sample prediction for the Chiasso dataset is 0.19, which indicates poor model performance. Can you clarify why the model shows such a poor NSE and a weak fit to the observed streamflow time series, despite the apparent agreement between the synthetic and observed training data? Additionally, do you think the predictions for Chiasso would improve if you used the CombiPrecip product as the rainfall input, rather than Lugano rainfall data used here?
  
  For all real-world data experiments, we rely on the CombiPrecip product as the rainfall input. The Lugano rainfall was used only to create the synthetic datasets.
  To clarify, in Section 4.2 we focus exclusively on real datasets; synthetic data is not considered. Figure 7 presents the streamflow estimates obtained with GAMCR for the six study sites. It is important to note that GAMCR was not specifically designed to achieve state-of-the-art performance in streamflow prediction. A model optimized for this task would need to be carefully designed and trained to handle rare events such as extreme rainfall or very high streamflow. Nevertheless, we included Figure 7 to demonstrate that GAMCR is capable of streamflow prediction, even though it is not tailored for it. The NSE coefficient is known to be sensitive to high-flow values, and a few large errors—typically occurring under high-streamflow conditions as illustrated in Figure 7, and possibly affected by poor rainfall estimates—can markedly reduce NSE values, even when the model performs well in most other cases.
  Technical Corrections
  
  We thank the reviewer for these helpful corrections, which we are happy to implement in the revised manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-AC1
RC2:
'Comment on egusphere-2025-1591', Anonymous Referee #2, 25 Aug 2025

I consider this study interesting and scientifically valid. However, it needs improvements before it can be published. Below are the corrections needed for the publication of this article, and I would like to be able to review this article again after the requested corrections have been made.
Major revision
-The main limitation of this study is that it considers the developed method to be of global application (in any watershed worldwide), but only calibrated and validated it in small and small/medium watersheds with climatic and physiographic characteristics existing in Switzerland. Therefore, I recommend two solutions to this problem: all mentions of the method's worldwide application should be replaced by mentions of its application restricted to the types and conditions of calibrated/validated watersheds. Or (better), the developed method should be calibrated and validated in basins with different climatic and physiographic characteristics worldwide. An example of a data source for large-sized tropical basins is the Ore-Hybam (but other sources should also be used).
Minor revision
-Line 7. Describe the abbreviation ‘GAMCR’ at its first mention in the abstract.
-Lines 17-18. Insert a source (citation) for the sentence: ‘The hydrologic response (or runoff response) is usually defined as the change in streamflow induced by a given input of precipitation.’
-Lines 99-101. Explain (rewrite) this sentence further. It suggests that it varies smoothly between basins with different climatic and physiographic characteristics.
-Line 102. The correct word is ‘third’, not ‘second’.
-Lines 163-164. Insert a source (citation) for the sentence: ‘In the case of the hydrologic response, the evaluation step is particularly important because the real-world impulse response functions cannot be measured directly...’.
-Line 198. I don't agree that basins measuring 34 to 195 km² are considered medium-sized. They are all small to small/medium in size. Please, correct this! Moreover, this is one of the many reasons to calibrate and validate the method in other basin types.
-Lines 203-206. Even when analyzing out of the snowy months, the basins' antecedent conditions are influenced by snow. This is another reason to calibrate and validate the method in other types of watersheds.
-Line 321. The basins are not climatically diverse as you mention. They are similar considering Switzerland's climate conditions in relation to the planet's climate diversity. Please correct this! Moreover, this is another reason to calibrate and validate the method in other parts (continents) of the world.

Citation: https://doi.org/10.5194/egusphere-2025-1591-RC2
- AC2:
  'Reply on RC2', Maria Grazia Zanoni, 19 Sep 2025
  I consider this study interesting and scientifically valid. However, it needs improvements before it can be published. Below are the corrections needed for the publication of this article, and I would like to be able to review this article again after the requested corrections have been made.
  
  We thank Reviewer 2 for their evaluation of our manuscript and for providing useful and constructive feedback. We agree with the comments and will incorporate the suggested revisions (see detailed responses below). Since the reviewer has already given clear guidance and the required changes are rather minor, we do not see the necessity of another round of review. The manuscript has already been under public discussion for five months, and we believe there is no reason to further delay its publication.
  Major revision
  
  The main limitation of this study is that it considers the developed method to be of global application (in any watershed worldwide), but only calibrated and validated it in small and small/medium watersheds with climatic and physiographic characteristics existing in Switzerland. Therefore, I recommend two solutions to this problem: all mentions of the method's worldwide application should be replaced by mentions of its application restricted to the types and conditions of calibrated/validated watersheds. Or (better), the developed method should be calibrated and validated in basins with different climatic and physiographic characteristics worldwide. An example of a data source for large-sized tropical basins is the Ore-Hybam (but other sources should also be used).
  
  We thank the reviewer for this constructive feedback. A recurring concern was the limited geographic and climatic diversity of the study sites, since all basins are located in Switzerland. We acknowledge that we may have been unclear about this point and would like to emphasize that it was not our intention to claim that GAMCR has already been validated across global climatic conditions. Our deliberate focus on Swiss basins ensured the use of high-quality, high temporal resolution, well-documented data from sites free of strong anthropogenic disturbances (e.g., dams, major abstractions), allowing us to conduct a robust first evaluation of the method. We have now computed the Flow Duration Curves (FDCs) of streamflow for all six basins (new figure that will be added to the manuscript) and shown the runoff coefficient values, obtained with ERRA, to comment on Figure 9. These analyses will complement the existing ones and reinforce that, even within Switzerland, the selected watersheds span a gradient of hydrological regimes and flow behaviors, ranging from rainfall-dominated systems with frequent low flows to nival- and nivo-pluvial-influenced catchments capable of sustaining higher discharges. This supports our claim that the method was tested across distinct runoff regimes and hydrologic responses, even though all sites belong to the same national context.
  We don't see reasons why the method should not work in different climatic regions, as long as a streamflow response to precipitation is recorded in the data, but we agree that this claim would need to be tested. Therefore, we will carefully revise the manuscript to restrict statements about worldwide applicability. Our results are now explicitly framed as valid for basins with climatic and physiographic characteristics similar to those studied here. Broader testing in other climatic regions is indeed an important avenue for future research, but it is beyond the scope of this initial study.
  Minor revision
  
  Line 7. Describe the abbreviation ‘GAMCR’ at its first mention in the abstract.
  
  We thank the reviewer for having noticed it. We will clarify the abbreviation GAMCR in the abstract.
  Lines 17-18. Insert a source (citation) for the sentence: ‘The hydrologic response (or runoff response) is usually defined as the change in streamflow induced by a given input of precipitation.’
  
  We will add the following sources supporting this statement:
  [1] Ponce, V. M. 1995. Hydrologic and environmental impact of small and medium-size basins. San Diego State University. Available from: https://pon.sdsu.edu/protected29/cive445_ponce_chapter05a_lecture.html
  [2] Kirchner, J. W., P. Benettin, and I. van Meerveld. 2023. Instructive surprises in the hydrological functioning of landscapes. Annu. Rev. Earth Planet. Sci. 51:277–299. doi:10.1146/annurev-earth-071822-100356.
  [3] Kirchner, J. W. (2022). Impulse Response Functions for Nonlinear, Nonstationary, and Heterogeneous Systems, Estimated by Deconvolution and Demixing of Noisy Time Series. Sensors, 22(9), 3291. https://doi.org/10.3390/s22093291
  Lines 99-101. Explain (rewrite) this sentence further. It suggests that it varies smoothly between basins with different climatic and physiographic characteristics.
  
  Thank you for identifying this unclear statement. GAMCR is trained and applied within a specific catchment. What we meant in that sentence is that the hydrologic response is expected to vary over time, and we assume that such variations are not abrupt/discontinuous. We will rephrase the sentence as: "The feature vector encodes the catchment’s current state, reflecting that its response tends to follow predictable patterns; for example, minor variations in precipitation three months prior are expected to produce moderate changes in response."
  Line 102. The correct word is ‘third’, not ‘second’.
  
  We thank the reviewer for noting this oversight.
  Lines 163-164. Insert a source (citation) for the sentence: ‘In the case of the hydrologic response, the evaluation step is particularly important because the real-world impulse response functions cannot be measured directly...’.
  
  We plan to update the manuscript with the following: "In the case of the hydrologic response, the evaluation step is particularly important because impulse response functions in real basins cannot be measured directly and the model is trained on streamflow data only [3,4,5]."
  [3] Kirchner, J. W. (2022). Impulse Response Functions for Nonlinear, Nonstationary, and Heterogeneous Systems, Estimated by Deconvolution and Demixing of Noisy Time Series. Sensors, 22(9), 3291. https://doi.org/10.3390/s22093291
  
  [4] Gupta, H.V., Wagener, T. and Liu, Y. (2008), Reconciling theory with observations: elements of a diagnostic approach to model evaluation. Hydrol. Process., 22: 3802-3813. https://doi.org/10.1002/hyp.6989
  [5] McDonnell, J. J., and K. Beven (2014), Debates—The future of hydrological sciences: A (common) path forward? A call to action aimed at understanding velocities, celerities, and residence time distributions of the headwater hydrograph, Water Resour. Res., 50, 5342–5350, doi:10.1002/2013WR015141.
  Line 198. I don't agree that basins measuring 34 to 195 km² are considered medium-sized. They are all small to small/medium in size. Please, correct this! Moreover, this is one of the many reasons to calibrate and validate the method in other basin types.
  
  We thank the reviewer for this important comment. We agree that, based solely on drainage area, the studied basins are better classified as small to small/medium. At the same time, we note that there is no universally accepted, size-based classification for catchments. In hydrology, many definitions consider process-based characteristics rather than area alone when describing catchment size. For instance, [1] defines a midsize catchment based on hydrological behavior, including features such as rainfall intensity varying within a storm, spatially uniform precipitation, runoff dominated by overland and channel flow, and negligible channel storage due to steep slopes. We would claim that the selected basins meet these hydrologic criteria. We will revised the manuscript to clearly distinguish the area-based classification (small to small/medium) from the process-based hydrologic definition, to avoid ambiguity while providing proper justification for our methodological choices.
  [1] Ponce, V. M. 1995. Hydrologic and environmental impact of small and medium-size basins. San Diego State University. \par Available from: https://pon.sdsu.edu/protected29/cive445_ponce_chapter05a_lecture.html
  Lines 203-206. Even when analyzing out of the snowy months, the basins' antecedent conditions are influenced by snow. This is another reason to calibrate and validate the method in other types of watersheds.
  
  We thank the reviewer for this observation, and we acknowledge that antecedent conditions in the studied basins can be influenced by residual snow, even outside the main snowy months. In this study, we carefully selected events and basins to minimize snow-related effects, ensuring a robust evaluation of the GAMCR method under the targeted conditions. While this choice limits the generality of the results, it was necessary to avoid confounding factors such as snowmelt that could bias model assessment. Extending calibration and validation to basins with different snow or climatic regimes is an important avenue for future work but lies beyond the scope of the present study.
  Line 321. The basins are not climatically diverse as you mention. They are similar considering Switzerland's climate conditions in relation to the planet's climate diversity. Please correct this! Moreover, this is another reason to calibrate and validate the method in other parts (continents) of the world.
  
  We acknowledge that, in a global context, the selected Swiss basins are not {\it climatically} diverse. However, within Switzerland, the basins were chosen to cover a range of hydrological regimes, including Jura-nivopluvial, transition nival, pluvial, southern nivo-pluvial, and southern pluvio-nival. This diversity allowed us to test the GAMCR method across different runoff behaviors and seasonal patterns resulting in a broad range of responses. Indeed, Figure 7 shows that the same rainfall intensities result in hydrologic responses that are ten times larger at Euthal compared to Salmsach.
  We believe that our site selection does not diminish the value of the results, as the methodology and framework developed with these basins are applicable to other regions, provided that careful data evaluation is conducted. Testing in other climates and continents is an important avenue for future work, but it lies beyond the scope of this initial study.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1591-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Maria Grazia Zanoni on behalf of the Authors (30 Sep 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (11 Oct 2025) by Charles Onyutha

AR by Maria Grazia Zanoni on behalf of the Authors (17 Oct 2025)

Journal article(s) based on this preprint

17 Nov 2025

Data-driven estimation of the hydrologic response using generalized additive models

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Geosci. Model Dev., 18, 8663–8678, https://doi.org/10.5194/gmd-18-8663-2025,https://doi.org/10.5194/gmd-18-8663-2025, 2025

Short summary

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Supplement

https://doi.org/10.5194/egusphere-2025-1591-supplement

Quentin Duchemin, Maria Grazia Zanoni, Marius G. Floriancic, Hansjörg Seybold, Guillaume Obozinski, James W. Kirchner, and Paolo Benettin

Viewed

Total article views: 2,327 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
2,179	122	26	2,327	47	39	49

HTML: 2,179
PDF: 122
XML: 26
Total: 2,327
Supplement: 47
BibTeX: 39
EndNote: 49

Views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	55	14	3	72
May 2025	91	38	6	135
Jun 2025	95	3	7	105
Jul 2025	67	11	3	81
Aug 2025	369	13	2	384
Sep 2025	1,362	24	4	1,390
Oct 2025	110	12	1	123
Nov 2025	30	7	0	37

Cumulative views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	55	14	3	72
May 2025	91	38	6	135
Jun 2025	95	3	7	105
Jul 2025	67	11	3	81
Aug 2025	369	13	2	384
Sep 2025	1,362	24	4	1,390
Oct 2025	110	12	1	123
Nov 2025	30	7	0	37

Viewed (geographical distribution)

Total article views: 2,227 (including HTML, PDF, and XML) Thereof 2,227 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 17 Nov 2025

Download

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Preprint (25146 KB)
Metadata XML

Short summary

We introduce GAMCR, a data-driven model that estimates how catchments respond to individual precipitation events. We validate GAMCR on synthetic data and demonstrate its ability to investigate the characteristic runoff responses from real-world hydrologic series. GAMCR provides new data-driven opportunities to understand and compare hydrological behavior across different catchments worldwide.


Total:	0
HTML:	0
PDF:	0
XML:	0