the Creative Commons Attribution 4.0 License.
Climate and landscape jointly control Europe's hydrology
Abstract. The complex composition of hydrological systems, climates and landscapes makes it challenging to explain and predict hydrological streamflow response. Many previous large-sample studies, mostly focused on the United States, identified climate as the primary control, with landscape exerting only a minor role in shaping hydrological behaviour. Yet, a few other studies report contradicting results with landscape being a more dominant driver. In this study, we use an unprecedentedly large sample of more than 7000 catchments in Europe from the EStreams dataset to identify and map functionally similar catchments, together with their spatially variable climate and landscape controls. The wide spatial and temporal gradient of the study catchments was used to identify hydrological response types (HRTs) based on 40 hydrological streamflow signatures related to long-term averages and inter-annual variability of magnitude, timing, duration, frequency, and seasonality. Overall, 10 HRTs could be identified. Several HRTs are well defined and well distinguishable, largely due to catchments with strongly seasonal or more extreme behaviour. Other HRTs remain difficult to distinguish, as these catchments represent more transitional conditions with increasingly overlapping characteristics between HRTs. The underlying drivers of the HRTs were identified by using 84 climate and landscape attributes to predict catchment membership to their respective HRT with a Random Forest classification model. Climate emerges as the dominant driver of hydrological behaviour at the continental scale. However, landscape was found, in 4 out of 10 HRTs, to be at least as strong or even stronger a control on the hydrological response.
These results highlight that the complex, integrated nature of hydrological response remains challenging to disentangle, even with extensive datasets and advanced modelling approaches, and therefore climate and landscape need to be understood as joint drivers in a co-evolutionary perspective.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-6372', Anonymous Referee #1, 12 Feb 2026
AC1: 'Reply on RC1', Julia M. Rudlang, 27 Mar 2026
Summary Comment: The manuscript seeks to broadly analyze hydrologic signatures across Europe. The analysis sought to address the question of whether we can (1) group hydrological signature data into hydrological response types, and (2) can we predict hydrological response types from climate and landscape attributes? The analysis is of interest, but the Methods are difficult to follow.
We would like to thank the referee for all the comments. We appreciate the time and effort taken to read through our manuscript in detail and to provide us with constructive, helpful and interesting feedback and thoughts on our research. We are taking all the comments into account for revising the manuscript.
We have separated the different comments, shown in italic, and our written replies below, shown in regular font and suggested revised text in bold.
General Comments:
Introduction – The Introduction largely poses climate and landscape as opposing and homogeneous contenders driving hydrological response. As climate acts on a landscape, for example, the rate that precipitation moves into the stream network will depend in part on topography, soil type and wetland/floodplain storage capacity, it would be helpful to acknowledge or frame these overarching hydrological drivers as more inter-related instead of opposed.
We completely agree that climate and landscape are inherently inter-related, rather than independent or even opposing factors. The intention was not to present them as competing drivers, but to structure the problem conceptually. To improve clarity, we will revise the Introduction to emphasize their coupled influence on hydrological processes more explicitly. Specifically, we will add the following sentence early in the Introduction (Line 23): "In practice, these controls are inherently intertwined, as climatic inputs are mediated through landscape properties such as topography, soils, geology, vegetation and storage elements before contributing to streamflow."
Introduction - Landscape controls can represent a large number of different variables and attributes, but few specific landscape attributes are mentioned, please elaborate on what specific landscape drivers have tended to emerge as important in past studies. This will help to justify the types of landscape variables included in the analysis.
We acknowledge that an overview of specific landscape attributes should be added to the Introduction. We will elaborate on the dominant landscape drivers that most commonly emerge in past studies: lithology, topography, soil texture, land use and land cover (Fenicia and McDonnell, 2022, Almagro et al., 2025, Kerins et al., 2025).
Data - What percent of the gages are nested within another gage used in the analysis? How does nested gages influence the independence of these watersheds?
The total percentage of nested gauges is 42%, but it varies between 23% and 59% across the 10 individual hydrological response type groups (HRTs), with percentages given relative to the number of catchments within each HRT. The total percentage of nested gauges will be added to the Data section of the manuscript.
To avoid excessive nesting, we excluded all catchments with areas above 25,000 km² in the initial analysis. However, we acknowledge that the remaining nested catchments may still exert influence on the results. To further evaluate this, we ran an additional analysis in which we removed all the nested catchments. This resulted in a subset of 4,137 catchments. When running the k-means clustering based on the hydrological signatures, we again found 10 hydrological response type groups (HRTs), with very similar distributions across Europe (Fig. C1 in Author Replies Supplement) and hydrological signature patterns within and between the groups (Fig. C2 in Author Replies Supplement). The Random Forest classification accuracy decreased only slightly, by ~1-5% for each experiment (see Table C1 for comparison). The smallest drop is 1% in testing accuracy for the 1-CL experiment, where all 84 climate and landscape attributes were used. These results indicate that, while nesting introduces some dependence, its influence on the overall clustering and classification results can be considered negligible for our analysis. This limitation will be acknowledged in the revised manuscript.
Table C1. Random Forest training and testing accuracy (in %) and F1 comparison between the non-nested and original analysis.

| Experiment | Training Accuracy (%) | Testing Accuracy (%) | F1 score |
|---|---|---|---|
| 1-CL | 69 | 59 | 0.57 |
| 2-C | 67 | 58 | 0.56 |
| 3-VLC | 59 | 49 | 0.46 |
| 4-VLS | 53 | 43 | 0.41 |
| 5-SGT | 60 | 48 | 0.45 |
| 6-A | 39 | 30 | 0.28 |
| 1-CL (non-nested) | 71 | 58 | 0.59 |
| 2-C (non-nested) | 68 | 55 | 0.56 |
| 3-VLC (non-nested) | 58 | 45 | 0.46 |
| 4-VLS (non-nested) | 51 | 39 | 0.39 |
| 5-SGT (non-nested) | 57 | 44 | 0.43 |
| 6-A (non-nested) | 32 | 25 | 0.24 |
Data - As large dams can influence and bias discharge values, how were the presence of dams addressed within the selection of gages and watersheds?
This is indeed a highly relevant comment. We agree that large dams and reservoirs influence discharge values, and that this was not reported in the manuscript. In the analysis, the presence of dams and reservoirs was not used as a direct criterion for selecting gauges. Instead, it was explicitly accounted for by including reservoir storage per catchment area, AResStorage (mm), as a landscape attribute. This was a deliberate choice, allowing us to quantify the influence of regulation within the modelling framework rather than imposing a priori filtering. In the Random Forest analysis, reservoir storage does not emerge as a dominant predictor (Figure 10), suggesting that regulation is not a primary control on hydrological response across the dataset. This interpretation is supported by the distribution of reservoir storage values: the median is 0 mm, with at least 75% of catchments exhibiting no or negligible storage, 11% exceeding 20 mm, and 8% exceeding 50 mm, therefore rarely exceeding 10% of the annual water balance. This indicates that the majority of catchments are unregulated or only weakly influenced by reservoirs, while more strongly regulated systems represent a relatively small subset. We will clarify this methodological choice and its implications in the revised manuscript.
Methods – Please add a study area section. A criticism raised in the Introduction is that many available large sample data sets only provide coarse spatial representations of hydrological, climatic and landscape contrasts. What is the spatial representation range across these gages? How does this compare, for example, to the range in US based studies?
We agree that explicitly quantifying the spatial representativeness of the dataset improves clarity, and we will add a dedicated study area description to the Data section.
The spatial representativeness of the dataset is reflected in its relatively high gauge density across Europe. The distribution is not spatially uniform: gauge density is highest in central Europe and lowest in southern and eastern Europe. Overall, the spatial density is roughly one gauge per 1000 km², based on 7175 gauges covering approximately 7 million km². For comparison, the CAMELS-US dataset (Addor et al., 2017) includes 671 gauges covering roughly 9.8 million km², corresponding to approximately one gauge per 14,600 km².
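The density figures quoted above follow from dividing the covered area by the number of gauges. A minimal Python check, using only the numbers stated in this reply (the dictionary keys and the function name are illustrative, not taken from the manuscript):

```python
# Gauge counts and approximate areas as quoted in the reply above.
datasets = {
    "EStreams subset (this study)": {"gauges": 7175, "area_km2": 7.0e6},
    "CAMELS-US": {"gauges": 671, "area_km2": 9.8e6},
}


def area_per_gauge(gauges: int, area_km2: float) -> float:
    """Return the mean area represented by one gauge, in km^2."""
    return area_km2 / gauges


density = {name: area_per_gauge(**info) for name, info in datasets.items()}

for name, value in density.items():
    print(f"{name}: one gauge per {value:,.0f} km^2")
```

Under these figures, the European sample is roughly 15 times denser than CAMELS-US.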
We want to clarify that the statement in the Introduction regarding the coarser spatial representation of some existing large-sample datasets was intended to highlight a general limitation in the availability of observations. The comparatively high gauge density of the dataset used in our analysis directly addresses this issue, enabling a more detailed and spatially resolved characterization of hydrological, climatic, and landscape variability, and thereby supporting more detailed identification of hydrological response patterns and their drivers.
Section 3.2 – What is the justification for performing k-means clustering on hydrologic signatures? Has this method been used previously to group signatures?
We acknowledge that the justification for performing k-means clustering on the hydrological signatures was not stated clearly in the manuscript. K-means clustering is a statistically straightforward and widely adopted method that has been used in several hydrological classification studies (Sawicz et al., 2011, Kuentz et al., 2017, Brunner et al., 2020, Almagro et al., 2024). We will add a clarification to section 3.2.
Section 3.2 – Clarify the inputs and outputs for this step, so the suite of hydrologic signature values for each catchment were put into the clustering, what was the HRT output that was then used as the dependent variable in the RF? Was it signature values that represent each HRT? Or the classification of each catchment into an HRT? This is really important to clarify.
Based on the k-means clustering, each catchment is classified into one of the 10 hydrological response type groups (HRTs), labelled 1-10. The HRT label for each catchment is further used as the target variable in the Random Forest classification model. We will add this explanation to section 3.2.
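The two-step workflow described above can be sketched as follows. This is an illustrative sketch on synthetic data, not the code used in the study: the random arrays stand in for the 40 signatures and 84 attributes, and the standardisation step and all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_catchments = 300  # small synthetic sample, not the real 7175 catchments

# Placeholder inputs: one row per catchment.
signatures = rng.normal(size=(n_catchments, 40))  # stands in for 40 signatures
attributes = rng.normal(size=(n_catchments, 84))  # stands in for 84 attributes

# Step 1: cluster catchments on their signatures.
# Standardising the signatures first is an assumption of this sketch.
scaled = StandardScaler().fit_transform(signatures)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
hrt_labels = kmeans.fit_predict(scaled) + 1  # one label in 1..10 per catchment

# Step 2: the HRT label is the target; the attributes are the predictors.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(attributes, hrt_labels)
predicted_hrt = rf.predict(attributes)
```

The key point is that the clustering output passed to the classifier is a single categorical label per catchment, not a set of representative signature values.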
Section 3.3. – Please justify why no variable selection process was used. While random forest techniques are generally insensitive to multicollinearity, the inclusion of highly correlated variables can deflate or bias variable importance values, complicating model interpretation and making it more challenging to identify the most predictive variables. Further, this also has potential implications for comparing the experiments, in Figure 7 the differences in results between models could be attributed to variable groups, or in part to the number of variables included in the model. In a regression analysis, typically an adjusted R2 would be calculated to account for differences in explanatory variable counts.
We agree with the reviewer that Random Forest models are robust against correlated variables, while at the same time, inclusion of correlated predictors can lead to instability and affect the interpretability of the feature importance. We did not apply an extensive variable selection procedure aimed at minimizing the number of predictors, as our goal was to retain a broad set of climate and landscape variables. However, we did apply a correlation-based pre-filtering step to reduce any major redundancy. Variables with strong correlations (r > 0.75) were removed. The remaining level of correlation was considered acceptable, with 50% of climate and landscape attributes exhibiting r < 0.12 (equivalent to explaining ~1% of the variance) and only 2.5% of the attributes being correlated with r > 0.6 (explaining 36% of the variance; see Figure S1 in the Supplementary Material). In some cases, correlated variables (e.g. topography and temperature) were retained because they represent distinct physical processes and provide complementary, non-overlapping information.
To explicitly test the robustness of feature importance and thus model interpretation despite the remaining correlations, we evaluated the consistency of feature importance ranking across experiments (Figure S12 in the Supplementary Material). The most influential variables show broadly consistent rankings across model configurations, suggesting that the main conclusions are not sensitive to multicollinearity effects. We acknowledge that correlated predictors may still influence the attribution of importance and the comparison between experiments with differing numbers of variables. This limitation will be added to the Discussion.
Line 225 – Why wasn’t a sub-section of independent watersheds withheld? Given the likelihood that at least some of the watersheds are actually nested, this raises concerns if the nesting biased the cross-validation results.
We acknowledge that the presence of nested catchments likely introduces dependencies that influence the cross-validation. In the initial analysis, no sub-sampling of independent watersheds was applied, as the aim was to retain the full spatial variability of the dataset. To limit excessive nesting, catchments larger than 25,000 km² were excluded.
As detailed in our response above, we explicitly evaluated the impact of nested catchments by repeating the analysis using only non-nested catchments (n = 4137). The results show that both the clustering into hydrological response types and the Random Forest classification performance remain largely unchanged, with only a small decrease in testing accuracy (1–5% across experiments).
These findings suggest that the influence of nested catchments on the model results is limited. We will clarify this in the revised manuscript and explicitly acknowledge the potential for residual dependence as a limitation.
Comment – Spatial autocorrelation can bias model results but does not appear to be considered in the analysis, how was spatial autocorrelation considered, tested for or accounted for in the analysis?
We highly appreciate this point, as this was indeed not considered in the initial analysis. This is a very novel aspect, which is also a limitation in other studies (e.g. Kuentz et al., 2017, Addor et al., 2018, Almagro et al., 2025). To address this, we conducted additional analyses to quantify and account for the effects of spatial autocorrelation. First, we calculated Moran’s I as a metric to quantify the presence of spatial autocorrelation. For our analysis, values of I ~ 0.5 were found. This indicates a degree of spatial autocorrelation consistent with previous studies (e.g. Addor et al., 2018). Spatial autocorrelation is difficult to deal with meaningfully, because it is inherent in any type of clustering: most of the signatures are affected by climate and landscape, which have very local and regional similarities. To further disentangle the effect of spatial autocorrelation, we ran a 10-fold spatial block cross-validation in the Random Forest analysis. That is, instead of randomly sampling catchments from the total set, we split the samples according to contiguous spatial blocks (e.g. training set: catchments in the south-west, testing set: catchments in the north-east of the domain). While training accuracies remained largely the same, testing accuracies decreased compared to the original analysis (Table C2). These results suggest that spatial autocorrelation contributes to inflated performance under random cross-validation, and that spatial proximity has a considerable effect on model predictive skill. This, in turn, indicates that even the wide spectrum of climate and landscape attributes considered in the analysis carries less information than is suggested by random cross-validation. Overall, this highlights the challenge of disentangling hydrological drivers across regions with strong spatial structure, consistent with the concept of the “uniqueness of place” (Beven, 2000).
We will include these results and discuss their implications in detail in the revised manuscript.
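For reference, Moran's I for values x_i at n locations with spatial weights w_ij is I = (n / Σ w_ij) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)². A minimal NumPy sketch follows. The binary k-nearest-neighbour weights, the choice of k and the synthetic data are assumptions of this sketch, since the reply does not state which weighting scheme or which variable was used.

```python
import numpy as np


def morans_i(values, coords, k=5):
    """Moran's I with binary k-nearest-neighbour weights (assumed scheme)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.size

    # Pairwise Euclidean distances; exclude self-pairs from the neighbours.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)

    # w_ij = 1 if j is one of the k nearest neighbours of i, else 0.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.arange(n)[:, None], neighbours] = 1.0

    z = values - values.mean()
    numerator = (weights * np.outer(z, z)).sum()
    return (n / weights.sum()) * numerator / (z ** 2).sum()


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
smooth_field = coords[:, 0] + coords[:, 1]  # strong spatial gradient
random_field = rng.normal(size=200)         # no spatial structure

i_smooth = morans_i(smooth_field, coords)
i_random = morans_i(random_field, coords)
print(f"smooth field: I = {i_smooth:.2f}, random field: I = {i_random:.2f}")
```

Values near 1 indicate that neighbouring locations are similar, values near 0 indicate no spatial structure.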
Table C2. Overview of Random Forest model training and testing accuracies (in %) and F1 score for each experiment, considering the results of the initial 10-fold random cross-validation and the 10-fold spatial block cross-validation analysis.

| Experiment | Training Accuracy (%) | Testing Accuracy (%) | F1 score |
|---|---|---|---|
| 1-CL | 69.0 ± 0.1 | 58.8 ± 0.9 | 0.59 |
| 2-C | 67.0 ± 0.2 | 57.7 ± 1.2 | 0.56 |
| 3-VLC | 59.4 ± 0.2 | 48.8 ± 1.6 | 0.46 |
| 4-VLS | 53.4 ± 0.4 | 43.4 ± 1.3 | 0.41 |
| 5-SGT | 60.2 ± 0.3 | 48.0 ± 1.6 | 0.45 |
| 6-A | 38.7 ± 0.3 | 29.7 ± 2.1 | 0.28 |
| 1-CL (spatial 10 cv) | 70.4 ± 1.0 | 28.2 ± 7.0 | 0.27 |
| 2-C (spatial 10 cv) | 68.5 ± 1.0 | 27.7 ± 7.2 | 0.27 |
| 3-VLC (spatial 10 cv) | 60.4 ± 1.0 | 26.2 ± 8.0 | 0.26 |
| 4-VLS (spatial 10 cv) | 54.2 ± 1.0 | 26.9 ± 7.9 | 0.25 |
| 5-SGT (spatial 10 cv) | 61.4 ± 0.8 | 25.6 ± 8.6 | 0.22 |
| 6-A (spatial 10 cv) | 39.3 ± 0.5 | 21.4 ± 3.5 | 0.20 |
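The contrast between random and spatial block cross-validation can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the code used for Table C2: the way the blocks are built here (k-means on the coordinates), the synthetic labels and all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 100, size=(n, 2))  # hypothetical catchment locations

# Synthetic predictors: two spatially structured columns plus three noise columns.
attributes = np.column_stack(
    [coords + rng.normal(scale=5.0, size=(n, 2)), rng.normal(size=(n, 3))]
)
# Synthetic class labels (0..7) that vary by region.
labels = (coords[:, 0] // 25).astype(int) + 4 * (coords[:, 1] // 50).astype(int)

# Assumption: contiguous blocks are built by clustering the coordinates.
blocks = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(coords)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
random_cv = KFold(n_splits=10, shuffle=True, random_state=0)
spatial_cv = GroupKFold(n_splits=10)  # each fold holds out one whole block

random_scores = cross_val_score(rf, attributes, labels, cv=random_cv)
spatial_scores = cross_val_score(
    rf, attributes, labels, cv=spatial_cv, groups=blocks
)

print(f"random CV accuracy:  {random_scores.mean():.2f}")
print(f"spatial CV accuracy: {spatial_scores.mean():.2f}")
```

With `GroupKFold`, no block contributes catchments to both the training and the testing set of a fold, so the test score reflects prediction in regions the model has not seen.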
Discussion – Discussion of errors, limitations and sources of uncertainty is quite limited, please add a section to the Discussion to more thoroughly address potential sources of error and uncertainty in your Methods and provide potential future directions for this research.
We acknowledge that the discussion of limitations should be expanded in the manuscript. We will add a section discussing limitations of the analysis, errors and uncertainty in data, methods and results in the Discussion. This will include, among others, limitations of k-means clustering, the influence of spatial autocorrelation, the presence of nested catchments, and the potential effects of multicollinearity among predictors. In addition, we will include a discussion of potential future research directions, regarding improved handling of dependence structures, and further disentangling of climatic and landscape controls on hydrological response.
Technical Corrections
Introduction – Please define hydrological signatures and provide a few examples of how such signatures are useful, this helps broaden the analysis appeal.
We will clarify this in the Introduction.
Line 23 – Add a reference
We will add a reference.
Line 27 – change to “patterns”
We will change the word to “patterns”.
Line 47 – remove the word “however”
We will remove the word “however”.
Line 151 and 168 – What test was used to identify highly correlated variables? What threshold was used to eliminate these correlated variables? And how was the retained variable decided?
We applied a correlation-based pre-filtering step using the Pearson correlation coefficient to reduce redundancy among predictors. Pairwise correlations were computed across all variables, and a threshold of |r| > 0.75 was used to identify highly correlated variable pairs. For each pair above this threshold, one variable was removed based on a combination of interpretability and relevance to the study objectives, prioritizing variables with clearer physical meaning or more direct hydrological relevance. The remaining level of correlation was considered acceptable, see Figure S1 in the Supplementary Material. In some cases, more highly correlated variables (e.g. topography and temperature) were retained because they represent distinct physical processes and provide complementary, non-overlapping information. We wanted to strike a balance between reducing highly correlated variables and including a broad set of predictors, consistent with the aim of exploring multivariate controls on hydrological response.
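A correlation-based pre-filter of this kind can be sketched with pandas. Note one deliberate simplification: the reply states that the variable to remove from each pair was chosen by interpretability and hydrological relevance, whereas this sketch simply keeps whichever column comes first. The column names and data are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_correlated(df: pd.DataFrame, threshold: float = 0.75):
    """Keep a column only if |r| <= threshold with every column already kept.

    Column order decides which member of a correlated pair survives;
    this is a stand-in for the manual, interpretability-based choice.
    """
    corr = df.corr(method="pearson").abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    dropped = [col for col in df.columns if col not in kept]
    return df[kept], dropped


rng = np.random.default_rng(0)
base = rng.normal(size=500)
demo = pd.DataFrame(
    {
        "mean_elevation": base,                                  # hypothetical
        "mean_slope": base + rng.normal(scale=0.1, size=500),    # nearly collinear
        "clay_fraction": rng.normal(size=500),                   # independent
    }
)

filtered, dropped = filter_correlated(demo, threshold=0.75)
print("kept:", list(filtered.columns))
print("dropped:", dropped)
```

Ordering the columns by priority before calling the function is one way to encode a preference for the more interpretable variable of each pair.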
Data – Add the median watershed size
We will add the median watershed size to the Data section.
Table 1 – Either here or in section 3.1.1 add references for the hydrological signatures either individually or the previous publications from which this signature list was compiled.
We will clarify the sources of the hydrological signatures in section 3.1.1. The signatures were all compiled from the EStreams dataset (do Nascimento et al., 2024). The non-standard signatures are now referenced in the Supplementary Material Table S1.
Table 1 – Hd(Ql) – correct the description
We will correct the description in Table 1.
Table 2 – In addition to providing sources in S2, please also cite the source of each variable in the Table.
We thank the reviewer for this suggestion. To maintain readability of Table 2, which already contains a large number of variables, we prefer not to include individual source citations for each variable directly within the table.
Instead, we will clarify in the table caption and accompanying text that all attributes are derived from the EStreams dataset (do Nascimento et al., 2024) and ensure that detailed source information remains fully documented in Table S2 of the Supplementary Material. We believe this approach maintains clarity in the main manuscript while still providing full transparency regarding data sources.
Table 1 and 2 – Do these tables include highly correlated variables?
Tables 1 and 2 do not include any highly correlated variables. Throughout the manuscript, only the variables that were retained for and used in the analysis are reported.
Table 2 – What are open areas? Consider using a different term here. Also, remove labeling of median LAI and NDVI as “Seasonality”
We completely agree that “open areas” is an ambiguous name. We will change it to “VF(LowV)” to represent the little or no vegetation areas. We will also remove the “seasonality” labelling for LAI and NDVI.
Table 2 – Why wasn’t floodplain data used? Is this highly correlated with the mean flat area fraction?
We agree that floodplain data could provide relevant information for characterizing hydrological behavior. However, it was not included in the analysis because it is expected to be strongly correlated with topographic metrics already considered, such as mean elevation and flat area fraction.
Lines 215-217 – One could also argue the opposite – that grouping watersheds into HRT classes from many different signatures, makes it more difficult to understand what each HRT represents, and therefore the relative importance of climate and landscape variables in predicting the more ambiguous HRT classes, is harder to interpret. I recommend toning down the criticism of predicting individual signatures here.
We agree with the reviewer that grouping catchments into hydrological response type (HRT) classes based on multiple signatures can make interpretation of individual HRTs more challenging. The intention of the lines in question was not to criticize previous studies, but rather to describe what has been previously done and, by this, more clearly define the current knowledge gap and explain how our analysis differs in the way we combine and interpret multiple signatures. We have reworded the manuscript to ensure that this distinction is clear and that no unintended criticism is implied.
Line 220 – What hyperparameters were tuned?
The hyperparameters that were tuned consist of: number of estimators (“n_estimators”), maximum tree depth (“max_depth”), minimum number of samples required to split an internal node (“min_samples_split”) and minimum number of samples required at each leaf node (“min_samples_leaf”). We will add this information to the Supplementary Material section S7.
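Tuning these four hyperparameters in scikit-learn could look like the sketch below. The search strategy (an exhaustive grid), the candidate values, the number of folds and the synthetic data are all assumptions; the reply names only the hyperparameters themselves.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the attribute matrix and HRT labels.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Candidate values are illustrative only.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```

Restricting `max_depth` and raising `min_samples_leaf` are the usual levers for narrowing a gap between training and testing accuracy.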
Line 256 – What package was used to derive the Random Forest classifications? And what type of feature importance scores were used? (Gini or permutation?)
We used the Random Forest classifier from the scikit-learn library in Python, specifically the sklearn.ensemble.RandomForestClassifier class. We used permutation feature importance scores. We will add this information to the Methods.
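Permutation feature importance for a fitted `RandomForestClassifier` can be obtained with `sklearn.inspection.permutation_importance`. The sketch below uses synthetic data in which only the first two features are informative; evaluating on a held-out split and the number of repeats are assumptions, not details given in the reply.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# With shuffle=False the informative features are columns 0 and 1;
# columns 2-5 are pure noise.
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Importance = mean drop in accuracy when one feature column is shuffled.
result = permutation_importance(
    rf, X_test, y_test, n_repeats=10, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]

for idx in ranking:
    print(
        f"feature {idx}: {result.importances_mean[idx]:.3f} "
        f"+/- {result.importances_std[idx]:.3f}"
    )
```

Unlike the impurity-based (Gini) scores stored in `feature_importances_`, permutation importance is computed from model performance, although strongly correlated predictors can still share or mask each other's importance.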
Lines 264-265 – Revise sentence for clarity
We will revise the sentence.
Line 276 – Add mention of using PCA to the Methods
We completely agree that PCA should be described in the Methods, and we will implement it.
Line 278 – I don’t think “entails” is the correct word here, change to “suggests” or something similar.
We will change the word.
Line 449 – How much do you think this is because of the geography being limited to Europe?
This is an interesting point. The geographical focus on Europe likely influences the observed differences between hydrological response types (HRTs). Compared to a global analysis, the range of climatic and landscape variability is more limited, which may result in less pronounced differences between HRTs. We will include a note on this limitation in the revised Discussion.
Section 5.3 – I wonder if deriving HRTs for each of the original categories: magnitude frequency, duration, timing, seasonality would improve the clumping and facilitate an easier interpretation of what the HRTs represent.
We thank the reviewer for this suggestion and agree that deriving HRTs separately for the original categories (magnitude, frequency, duration, timing, seasonality) could provide additional insights and potentially facilitate interpretation. However, this is beyond the scope of the current study, as our aim was to characterize catchments based on the overall hydrological response across all signatures.
Comment - Given the number of signatures, variables and HRT’s referred to only as numbers – the Results in particular are challenging to follow! Consider revising the names of the HRT’s from a random numbering system to something more meaningful to facilitate easier interpretation of the results.
We completely agree with the reviewer that numeric labels are not the most intuitive way to convey the hydrological behavior of each HRT. The numeric labels are assigned in ascending order of each HRT's average slope of the flow duration curve. We will make sure to clarify this. Table 4 already provides the reader with more detailed, yet concise, labelling. These concise labels from Table 4 could be used more frequently throughout the manuscript, in combination with the numeric labels, to make the interpretation more intuitive.
RC2: 'Comment on egusphere-2025-6372', Juraj Parajka, 15 Feb 2026
General comments
The study aims to classify the hydrological response of a large sample of European catchments (7175) using a wide range (40) of hydrological signatures, and to identify the roles of climate and landscape on the similarity/dissimilarity of catchment response. The results show that using a large sample of signatures and catchments does not, in itself, allow for the identification of more than 10 clusters, and many catchments still exhibit diverse hydrological responses that overlap across the clusters.
The study presents a very complex and comprehensive analysis that evaluates a very large sample of catchments and signatures. This is a significant advantage over previous studies, and it is worth publishing. Still, the complex approach and a large dataset also have limitations, particularly with respect to the clarity of presentation of the results. While the Data and methods sections generally well describe the details about the applied data and methods, the results section, in its current form, is very difficult to read, mainly because of the use of numerous abbreviations. In my opinion, the Discussion section is a key part, which can provide, for the readers, the discussion of the main findings, their link to previous knowledge/results and some synthesis about the limitations and uncertainties associated with the selection of the catchments and datasets used for estimation of the climate and landscape characteristics. In my opinion, this part can still be improved, and I have the following questions and requests, which can be discussed and presented in more detail:
- Would it be possible to provide (in the Supplement) the list of used catchments and their assignment to the clusters?
- The study analyses a very large number of catchments. Are all of them needed to evaluate the research questions? I wonder what the impact of (a) mixing smaller and larger catchments, (b) using nested catchments, and (c) using catchments with human impact in the analysis is? Is it not expected that, in larger catchments, the impacts of landscape and climate are mixed? To what extent is the finding about the small role of spatial proximity influenced by using nested catchments? Some of the signatures (such as flashiness) reflect the climate or landscape controls. Still, it can also reflect human impacts (such as reservoir operations), resulting in a mixed (noisy) clustering of catchments.
- Many regions are impacted by climate change, i.e., increasing air temperatures and changes in precipitation, which are associated with changes in hydrological response. What is the impact of such changes on the main findings?
Specific comments
- It might be worth mentioning that the Estreams meteorological characteristics are derived from the EOBS dataset. Because in some regions EOBS tend to underestimate precipitation, I’m curious about some details on how the long-term water balance is tested in the selected catchments (e.g., those that include glaciers or have a significant portion of snowfall).
- Q5 versus Q95. Usually it is used in an opposite way, such as the Q95 describes the low flows.
- The results of the Elbow method for finding the optimal number of clusters do not indicate that 10 clusters is the most optimal variant. I expect that using various methods to select the optimal number of clusters can bring different optimal cluster number. What is the idea for presenting two methods with different results?
- HRT 6 and 8 includes the largest number of catchments which are situated across different climate regions (e.g. as defined by Koppen classification). Please discuss the results for these two clusters in more detail. Does it mean that if they cross different climates that in these catchments the landscape is more important? What is the impact of nested and large catchments here?
Citation: https://doi.org/10.5194/egusphere-2025-6372-RC2
AC2: 'Reply on RC2', Julia M. Rudlang, 27 Mar 2026
General comments
The study aims to classify the hydrological response of a large sample of European catchments (7175) using a wide range (40) of hydrological signatures, and to identify the roles of climate and landscape on the similarity/dissimilarity of catchment response. The results show that using a large sample of signatures and catchments does not, in itself, allow for the identification of more than 10 clusters, and many catchments still exhibit diverse hydrological responses that overlap across the clusters.
The study presents a very complex and comprehensive analysis that evaluates a very large sample of catchments and signatures. This is a significant advantage over previous studies, and it is worth publishing. Still, the complex approach and the large dataset also have limitations, particularly with respect to the clarity of presentation of the results. While the Data and Methods sections generally describe the applied data and methods well, the Results section, in its current form, is very difficult to read, mainly because of the numerous abbreviations. In my opinion, the Discussion section is a key part, which can provide readers with a discussion of the main findings, their link to previous knowledge and results, and a synthesis of the limitations and uncertainties associated with the selection of the catchments and the datasets used to estimate the climate and landscape characteristics. In my opinion, this part can still be improved, and I have the following questions and requests, which can be discussed and presented in more detail:
We thank the referee for their thoughtful and detailed comments. We appreciate the time and effort invested in reviewing the manuscript and for raising constructive and insightful points, which gave us a valuable reflection on our analysis. All comments have been carefully considered and will be accounted for in the revised version of the manuscript.
We have separated the individual comments, shown in italics, from our replies, which follow each comment in regular font, with suggested revised text in bold.
Would it be possible to provide (in the Supplement) the list of used catchments and their assignment to the clusters?
We welcome this suggestion. Given the large number of catchments (more than 7000), we propose uploading this information as a .csv file to the same repository as the code (https://zenodo.org/records/17987885) rather than adding a list to the Supplementary Material.
The study analyses a very large number of catchments. Are all of them needed to evaluate the research questions? I wonder what the impact is of (a) mixing smaller and larger catchments, (b) using nested catchments, and (c) using catchments with human impact in the analysis. Is it not expected that, in larger catchments, the impacts of landscape and climate are mixed? To what extent is the finding about the small role of spatial proximity influenced by using nested catchments? Some of the signatures (such as flashiness) reflect climate or landscape controls. Still, they can also reflect human impacts (such as reservoir operations), resulting in a mixed (noisy) clustering of catchments.
Thank you for these very helpful questions and comments, providing valid discussion points. We address points (a)-(c) below.
(a) Mixing smaller and larger catchments / (b) Nested catchments:
To assess the influence of nested catchments, we repeated the analysis after removing all nested catchments, leaving 4,137 catchments (48% of the dataset; median catchment size 117 km² compared to 226 km² for the complete set). k-means clustering on this subset again produced 10 hydrological response types (HRTs) with similar spatial distributions across Europe (Fig. C1 in the Author Replies Supplement) and comparable hydrological signature patterns (Fig. C2 in the Author Replies Supplement). Random Forest testing accuracy decreased slightly, by 1–5 percentage points across experiments (Table C1), with the smallest drop (1 percentage point) in the 1-CL experiment using all 84 climate and landscape attributes. These results indicate that the influence of nested catchments on both clustering and classification outcomes is minor.
Table C1. Random Forest training and testing accuracy (in %) and F1 score for the original and the non-nested analysis.

| Experiment | Training accuracy (%) | Testing accuracy (%) | F1 score |
|---|---|---|---|
| 1-CL | 69 | 59 | 0.57 |
| 2-C | 67 | 58 | 0.56 |
| 3-VLC | 59 | 49 | 0.46 |
| 4-VLS | 53 | 43 | 0.41 |
| 5-SGT | 60 | 48 | 0.45 |
| 6-A | 39 | 30 | 0.28 |
| 1-CL (non-nested) | 71 | 58 | 0.59 |
| 2-C (non-nested) | 68 | 55 | 0.56 |
| 3-VLC (non-nested) | 58 | 45 | 0.46 |
| 4-VLS (non-nested) | 51 | 39 | 0.39 |
| 5-SGT (non-nested) | 57 | 44 | 0.43 |
| 6-A (non-nested) | 32 | 25 | 0.24 |
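The reported decrease in testing accuracy can be checked directly against Table C1. The short sketch below subtracts the non-nested testing accuracies from the original ones. The values are copied from the table; reading the differences as percentage points is an interpretation, not something the table itself states.

```python
# Testing accuracy (%) copied from Table C1 as (original, non-nested).
testing_accuracy = {
    "1-CL": (59, 58),
    "2-C": (58, 55),
    "3-VLC": (49, 45),
    "4-VLS": (43, 39),
    "5-SGT": (48, 44),
    "6-A": (30, 25),
}

# Absolute drop, in percentage points, when nested catchments are removed.
drops = {
    experiment: original - non_nested
    for experiment, (original, non_nested) in testing_accuracy.items()
}

for experiment, drop in drops.items():
    print(f"{experiment}: drop of {drop} percentage point(s)")

print("range of drops:", min(drops.values()), "to", max(drops.values()))
```

The smallest drop belongs to 1-CL and the largest to 6-A, the experiment that uses only anthropogenic attributes.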
(c) Catchments with human impact:
We deliberately included catchments affected by human activity to assess their potential role as hydrological drivers. Anthropogenic variables such as reservoir storage, irrigated area, artificial surface fraction, and agricultural land fraction were included in the analysis as landscape attributes. In experiment 6-A, which used only these anthropogenic attributes, both training and testing accuracy were low, and in the full 1-CL experiment, these features ranked very low in feature importance (Fig. 7a, Fig. 10a). This indicates that human impacts are not the dominant drivers in the dataset, although they are explicitly captured through these attributes.
Many regions are impacted by climate change, i.e., increasing air temperatures and changes in precipitation, which are associated with changes in hydrological response. What is the impact of such changes on the main findings?
We agree with the reviewer that climate change has an impact on hydrological behavior. However, the present analysis did not explicitly investigate temporal changes, as all climate and landscape attributes were characterized using long-term means, medians, or interannual variability. We will include a note in the Discussion under “Recommendations for future studies” highlighting that future work could explore the impacts of climate change by, for example, splitting time series into different periods and comparing hydrological metrics across them.
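A minimal sketch of such a split-period comparison is given below. The series is synthetic, and the split year, the imposed trend, and the two metrics (mean and coefficient of variation) are illustrative assumptions rather than choices made in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical annual mean streamflow (mm/yr) with a weak imposed decline.
years = np.arange(1980, 2020)
annual_flow = 500.0 - 1.5 * (years - years[0]) + rng.normal(scale=40.0, size=years.size)


def period_metrics(values: np.ndarray) -> dict:
    """Long-term mean and inter-annual coefficient of variation."""
    mean = float(np.mean(values))
    return {"mean": mean, "cv": float(np.std(values, ddof=1)) / mean}


split_year = 2000  # hypothetical split point
early = annual_flow[years < split_year]
late = annual_flow[years >= split_year]

early_metrics = period_metrics(early)
late_metrics = period_metrics(late)

# Change in each metric from the early to the late period.
change = {name: late_metrics[name] - early_metrics[name] for name in early_metrics}
print(change)
```

The same comparison could be repeated per catchment and per signature to see whether cluster membership is stable between periods.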
Specific comments
It might be worth mentioning that the EStreams meteorological characteristics are derived from the E-OBS dataset. Because E-OBS tends to underestimate precipitation in some regions, I’m curious about how the long-term water balance is tested in the selected catchments (e.g., those that include glaciers or have a significant portion of snowfall).
We strongly agree that this should be mentioned. In the discussion of limitations in the revised manuscript, we will clarify that the EStreams meteorological data are derived from the E-OBS dataset and that E-OBS can underestimate precipitation in some regions. We will reference the EStreams data paper (do Nascimento et al., 2024) and a recent study highlighting this issue (Clerc-Schwarzenbach and do Nascimento, 2026).
Regarding the long-term water balance, in the pre-analysis data scan, we checked long-term water balances and discarded catchments with the most extreme water deficits/surpluses following a visual check in the Budyko space (Figure C3 in the Author Replies Supplement). We will emphasize this more clearly in the Data section. Figure C3 in the Author Replies Supplement shows the catchments retained for the analysis. A small number of catchments still plot above the theoretical limits, indicating minor violations of the energy constraint (Ea > EP), likely due to uncertainties in precipitation, discharge and potential evapotranspiration estimates. As these represent only a small fraction of the dataset (~3%), they were retained to avoid introducing selection bias.
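The position of a catchment in the Budyko space, and whether it violates the water or energy limit, can be sketched as follows. This assumes that actual evaporation is estimated from the long-term water balance as Ea = P − Q with negligible storage change. The catchment names and values are hypothetical, and the screening actually applied was a visual check, not this code.

```python
from dataclasses import dataclass


@dataclass
class Catchment:
    name: str   # hypothetical identifier
    p: float    # long-term mean precipitation P (mm/yr)
    q: float    # long-term mean streamflow Q (mm/yr)
    ep: float   # long-term mean potential evaporation EP (mm/yr)


def budyko_position(c: Catchment) -> dict:
    """Return Budyko coordinates and limit checks for one catchment."""
    ea = c.p - c.q  # actual evaporation, assuming long-term storage change ~ 0
    return {
        "aridity_index": c.ep / c.p,         # x-axis: EP / P
        "evaporative_index": ea / c.p,       # y-axis: Ea / P
        "violates_water_limit": c.q > c.p,   # Q > P, i.e. Ea < 0
        "violates_energy_limit": ea > c.ep,  # Ea > EP
    }


# Illustrative values only.
catchments = [
    Catchment("plausible", p=1200.0, q=700.0, ep=600.0),
    Catchment("energy_violation", p=1000.0, q=300.0, ep=550.0),
    Catchment("water_violation", p=800.0, q=950.0, ep=500.0),
]

results = {c.name: budyko_position(c) for c in catchments}
for name, position in results.items():
    print(name, position)
```

A catchment flagged for the energy limit plots above the line Ea/P = EP/P, which is the situation described for the roughly 3% of retained catchments.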
Catchments affected by snowfall are implicitly accounted for under the assumption of long-term water balance closure, as storage changes associated with seasonal snow are expected to average out over multi-year periods. Catchments with glacier influence are not explicitly accounted for. However, glacier contributions are limited to a small number of catchments in specific locations and are therefore considered negligible in the context of this large-sample analysis. Where glacier influence is strong, the affected catchments may already have been filtered out by the criterion of long-term P > Q.
Q5 versus Q95. Usually these are used the opposite way, with Q95 describing the low flows.
We agree that the most common convention defines Q5 as high flow and Q95 as low flow. Nevertheless, we define Q5 as low flow and Q95 as high flow to remain consistent with the EStreams (do Nascimento et al., 2024) and CAMELS-US (Addor et al., 2017) datasets. We will emphasize this in the Methods section.
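The two conventions differ only in whether the subscript denotes a non-exceedance quantile or an exceedance probability. The sketch below illustrates this on a synthetic daily flow series. The log-normal series and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical daily streamflow series (mm/d); log-normal is an illustrative choice.
flow = rng.lognormal(mean=0.0, sigma=1.0, size=3650)

# Quantile (non-exceedance) convention, as used in this reply:
# Q5 is the 5th percentile (low flow), Q95 the 95th percentile (high flow).
q5_quantile = np.percentile(flow, 5)
q95_quantile = np.percentile(flow, 95)

# Exceedance convention, common for flow duration curves:
# Q95 is the flow exceeded 95% of the time, i.e. the 5th percentile.
q95_exceedance = np.percentile(flow, 100 - 95)
q5_exceedance = np.percentile(flow, 100 - 5)

print(f"quantile convention:   Q5={q5_quantile:.3f}  Q95={q95_quantile:.3f}")
print(f"exceedance convention: Q5={q5_exceedance:.3f}  Q95={q95_exceedance:.3f}")
```

The numerical values are identical and only the labels swap, so stating the convention explicitly is enough to avoid misreading.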
The results of the Elbow method for finding the optimal number of clusters do not indicate that 10 clusters is the optimal choice. I expect that using various methods to select the number of clusters can yield different optimal cluster numbers. What is the idea behind presenting two methods with different results?
We present both the Elbow method and the Silhouette Score to illustrate that selecting the optimal number of clusters is not unambiguous, as the two metrics do not necessarily agree. We used both metrics to narrow the range to 6–10 clusters (where the Elbow and Silhouette values differed only slightly) and then plotted the groups on a map for visual inspection. The manuscript focuses on 10 clusters to allow a more detailed differentiation of hydrological response types. However, the entire model experiment was also executed for 6, 7, 8 and 9 clusters (e.g. Supplementary Material Figures S3, S4). As the overall spatial patterns are fairly similar to those for 10 clusters, results of these experiments beyond Figures S3 and S4 are not shown.
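How the two criteria are computed side by side can be sketched with scikit-learn. The data below are synthetic stand-ins for a standardized catchment-by-signature matrix, and the range of k, the number of restarts, and the random seeds are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=42)

# Hypothetical stand-in for the signature matrix: 300 catchments, 5 signatures.
centres = rng.normal(scale=4.0, size=(4, 5))
signatures = np.vstack([centre + rng.normal(size=(75, 5)) for centre in centres])

# Standardize so that no single signature dominates the Euclidean distance.
X = StandardScaler().fit_transform(signatures)

candidate_k = range(2, 11)
inertia = {}      # within-cluster sum of squares, used for the Elbow plot
silhouette = {}   # mean silhouette coefficient, higher is better

for k in candidate_k:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia[k] = model.inertia_
    silhouette[k] = silhouette_score(X, model.labels_)

for k in candidate_k:
    print(f"k={k:2d}  inertia={inertia[k]:9.1f}  silhouette={silhouette[k]:.3f}")
```

The Elbow criterion looks for the k beyond which inertia stops dropping sharply, while the Silhouette Score favours compact, well-separated clusters, so the two need not point to the same k.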
HRTs 6 and 8 include the largest numbers of catchments, which are situated across different climate regions (e.g. as defined by the Köppen classification). Please discuss the results for these two clusters in more detail. Does the fact that they cross different climates mean that the landscape is more important in these catchments? What is the impact of nested and large catchments here?
We thank the reviewer for this interesting comment. HRTs 6 and 8 indeed include a large number of catchments distributed across multiple climate regions, indicating that similar hydrological response patterns can emerge under differing climatic conditions. This does not necessarily imply that landscape characteristics are more important than climate, but rather suggests that similar hydrological behavior can arise from different combinations of climatic and landscape controls. This is consistent with the concepts of equifinality and uniqueness of place (Beven, 2000), where different processes or conditions can lead to comparable hydrological responses. Our Random Forest results support this interpretation, as both climate and landscape variables contribute to predictive skill across experiments.
Regarding the influence of nested and large catchments, our additional analysis excluding nested catchments (see response above) shows that the clustering structure and model performance remain largely unchanged. This suggests that the broad spatial extent of HRTs 6 and 8 is not primarily driven by nesting effects or the inclusion of larger catchments.
Overall, the wide spatial distribution of these HRTs highlights the complexity of disentangling hydrological drivers across regions and reflects the importance of considering interacting climatic and landscape controls. We will expand the discussion of these clusters accordingly in the revised manuscript.
Data sets
EStreams: An Integrated Dataset and Catalogue of Streamflow, Hydro-Climatic Variables and Landscape Descriptors for Europe (1.2) Thiago V. M. do Nascimento et al. https://doi.org/10.5281/zenodo.14778580
Model code and software
Code used in: Climate and landscape jointly control Europe's hydrology Julia M. Rudlang https://doi.org/10.5281/zenodo.17987885
Summary Comment: The manuscript seeks to broadly analyze hydrologic signatures across Europe. The analysis addresses two questions: (1) can we group hydrological signature data into hydrological response types, and (2) can we predict hydrological response types from climate and landscape attributes? The analysis is of interest, but the Methods are difficult to follow.
General Comments:
Introduction – The Introduction largely poses climate and landscape as opposing and homogeneous contenders driving hydrological response. As climate acts on a landscape, for example, the rate that precipitation moves into the stream network will depend in part on topography, soil type and wetland/floodplain storage capacity, it would be helpful to acknowledge or frame these overarching hydrological drivers as more inter-related instead of opposed.
Introduction - Landscape controls can represent a large number of different variables and attributes, but few specific landscape attributes are mentioned, please elaborate on what specific landscape drivers have tended to emerge as important in past studies. This will help to justify the types of landscape variables included in the analysis.
Data - What percent of the gages are nested within another gage used in the analysis? How do nested gages influence the independence of these watersheds?
Data - As large dams can influence and bias discharge values, how was the presence of dams addressed in the selection of gages and watersheds?
Methods – Please add a study area section. A criticism raised in the Introduction is that many available large sample data sets only provide coarse spatial representations of hydrological, climatic and landscape contrasts. What is the spatial representation range across these gages? How does this compare, for example, to the range in US based studies?
Section 3.2 – What is the justification for performing k-means clustering on hydrologic signatures? Has this method been used previously to group signatures?
Section 3.2 – Clarify the inputs and outputs for this step, so the suite of hydrologic signature values for each catchment were put into the clustering, what was the HRT output that was then used as the dependent variable in the RF? Was it signature values that represent each HRT? Or the classification of each catchment into an HRT? This is really important to clarify.
Section 3.3. – Please justify why no variable selection process was used. While random forest techniques are generally insensitive to multicollinearity, the inclusion of highly correlated variables can deflate or bias variable importance values, complicating model interpretation and making it more challenging to identify the most predictive variables. Further, this also has potential implications for comparing the experiments, in Figure 7 the differences in results between models could be attributed to variable groups, or in part to the number of variables included in the model. In a regression analysis, typically an adjusted R2 would be calculated to account for differences in explanatory variable counts.
Line 225 – Why wasn’t a sub-section of independent watersheds withheld? Given the likelihood that at least some of the watersheds are actually nested, this raises concerns if the nesting biased the cross-validation results.
Comment – Spatial autocorrelation can bias model results but does not appear to be considered in the analysis, how was spatial autocorrelation considered, tested for or accounted for in the analysis?
Discussion – Discussion of errors, limitations and sources of uncertainty is quite limited, please add a section to the Discussion to more thoroughly address potential sources of error and uncertainty in your Methods and provide potential future directions for this research.
Technical Corrections
Introduction – Please define hydrological signatures and provide a few examples of how such signatures are useful, this helps broaden the analysis appeal.
Line 23 – Add a reference
Line 27 – change to “patterns”
Line 47 – remove the word “however”
Line 151 and 168 – What test was used to identify highly correlated variables? What threshold was used to eliminate these correlated variables? And how was the retained variable decided?
Data – Add the median watershed size
Table 1 – Either here or in section 3.1.1 add references for the hydrological signatures either individually or the previous publications from which this signature list was compiled.
Table 1 – Hd(Ql) – correct the description
Table 2 – In addition to providing sources in S2, please also cite the source of each variable in the Table.
Table 1 and 2 – Do these tables include highly correlated variables?
Table 2 – What are open areas? Consider using a different term here. Also, remove labeling of median LAI and NDVI as “Seasonality”
Table 2 – Why wasn’t floodplain data used? Is this highly correlated with the mean flat area fraction?
Lines 215-217 – One could also argue the opposite – that grouping watersheds into HRT classes from many different signatures, makes it more difficult to understand what each HRT represents, and therefore the relative importance of climate and landscape variables in predicting the more ambiguous HRT classes, is harder to interpret. I recommend toning down the criticism of predicting individual signatures here.
Line 220 – What hyperparameters were tuned?
Line 256 – What package was used to derive the Random Forest classifications? And what type of feature importance scores were used? (Gini or permutation?)
Lines 264-265 – Revise sentence for clarity
Line 276 – Add mention of using PCA to the Methods
Line 278 – I don’t think “entails” is the correct word here, change to “suggests” or something similar.
Line 449 – How much do you think this is because of the geography being limited to Europe?
Section 5.3 – I wonder if deriving HRTs for each of the original categories: magnitude frequency, duration, timing, seasonality would improve the clumping and facilitate an easier interpretation of what the HRTs represent.
Comment - Given the number of signatures, variables and HRTs referred to only by numbers, the Results in particular are challenging to follow! Consider renaming the HRTs from an arbitrary numbering system to something more meaningful to facilitate easier interpretation of the results.