the Creative Commons Attribution 4.0 License.
A Network Approach for Multiscale Catchment Classification using Traits
Abstract. The classification of river catchments into groups with similar characteristics addresses the challenges posed by each location's unique properties to better understand and predict hydrological behaviors. The recent increasing availability of remote sensing and other large-scale geospatial datasets has enabled the use of advanced data-driven approaches to classify catchments using traits such as topography, geology, climate, land cover, land use, and human influence. Unsupervised clustering algorithms based on the Euclidean distance are commonly used for trait-based classification, but are subject to degradation when applied to high-dimensional data. In this study we present a new network-based method for multi-scale catchment classification, which can be applied to large datasets and used to determine the traits associated with different catchment groups. In this framework, two networks are analyzed in parallel: in the first the nodes are traits, and in the second the nodes are catchments. In both cases edges represent pairwise similarity and a network cluster detection algorithm is used for the classification. The traits network is used to investigate redundancy in the trait data and to condense this information into a small number of interpretable categories. The catchments network is used to classify the catchments into clusters, and to identify representative catchments for the different groups using the degree centrality metric. We apply this method to classify 9067 river catchments across the contiguous United States at both regional and continental scales using 274 non-categorical traits. At the continental scale we identify 25 interpretable categories of traits (e.g., developed areas, temperature, croplands) and 34 catchment clusters of size greater than 50.
We find that catchments with similar trait categories are geographically coherent, with different spatial patterns emerging among the clusters that are dominated by natural and anthropogenic traits. We also find that the catchment clusters exhibit distinct hydrological behavior based on an analysis of streamflow indices. This network approach provides several advantages over traditional means of classification, including the use of alternative similarity metrics that are more suitable for high-dimensional data and improved interpretability through trait categories that reduce redundancies in the trait information. The paired catchment-trait networks enable analysis of hydrological behavior using the dominant trait categories for each catchment cluster. The approach can be used at multiple spatial scales, since the network topologies adjust automatically to reflect the trait patterns at the scale of investigation. Finally, the representative catchments identified as hub nodes in the network can be used to guide transferable observational and modeling strategies. The method is broadly applicable beyond hydrology for classification of other complex systems that utilize different types of trait datasets.
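The catchments network described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity metric or the cluster detection algorithm, so cosine similarity, a fixed edge threshold of 0.85, and connected components are used here purely as stand-ins, and the five catchments and their trait values are invented toy data.

```python
import math
from collections import deque

def cosine_similarity(u, v):
    """Cosine similarity of two trait vectors (assumed metric, not from the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_similarity_network(vectors, threshold):
    """Nodes are the keys of `vectors`; an edge joins two nodes whose
    similarity is at least `threshold` (hypothetical edge rule)."""
    names = list(vectors)
    adjacency = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(vectors[a], vectors[b]) >= threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

def connected_components(adjacency):
    """Stand-in for a network cluster detection algorithm: each
    connected component is treated as one cluster."""
    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters

def representative(cluster, adjacency):
    """Pick the hub of a cluster by degree. Degree centrality is the degree
    divided by (n - 1), so ranking by raw degree gives the same order.
    Ties go to the alphabetically first node."""
    return max(sorted(cluster), key=lambda node: len(adjacency[node]))

# Toy standardized trait vectors (invented values, three traits per catchment).
catchment_traits = {
    "c1": [1.0, 0.9, -0.2],
    "c2": [0.9, 1.0, 0.4],
    "c3": [1.0, 0.8, -0.8],
    "c4": [-1.0, -0.2, 1.0],
    "c5": [-0.9, -0.1, 1.1],
}

network = build_similarity_network(catchment_traits, threshold=0.85)
clusters = connected_components(network)
hubs = [representative(cluster, network) for cluster in clusters]
```

In this toy network `c1` is linked to both `c2` and `c3`, which are not linked to each other, so `c1` is the hub of its cluster. The traits network could be sketched the same way by transposing the data so that each key is a trait and each vector holds that trait's values across catchments.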
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1675', Anonymous Referee #1, 05 Sep 2023
Major comment
The authors introduce a novel method to cluster catchments that is based on traits. The dataset is impressive and the network-based classification is, to my understanding, a relevant and innovative approach in this case. Methods and results are well presented.
My main concern with such unsupervised classification is how we can use it for practical hydrological studies. From the introduction and discussion, it appears one aim of clustering is application to ungauged basins. In this sense, the results of the paper are discouraging, because the clustering technique does not succeed in relating ‘traits’ clusters to hydrological behaviors, except for some specific hydrological traits. This part is essential, in my opinion, for switching from a mere clustering exercise to something which could actually be useful in hydrological practice. I do not know how the method can be tuned to improve the overlap between the geographical and hydrological clusters, but my wish is that the authors tackle this issue in the paper. I realize that this implies a significant change in the paper. In the case the authors stick to unsupervised clustering, I guess that the paper might be of interest, but in my opinion, the authors should:
- introduce in more details the practical implications of such clustering, and
- compare the obtained classification with a benchmark clustering approach.
Minor comments
l.5: please clarify the term “subject to degradation”
l.43, l.48 and in many other places: problems with in-line referencing.
Section 2.3: I understand that traits values are standardized, but are their distributions normal? I guess no and I wonder how this may affect PCA and low dimensional vectors extracted from PCA.
l.473-475: Please clarify the added values of the network-based approach compared to other clustering techniques. Many of them address already the problem of dimensionality by working on Eigen-vectors.
Figure 13: what is the unit of MA41?
Citation: https://doi.org/10.5194/egusphere-2023-1675-RC1
AC1: 'Reply on RC1', Fabio Ciulla, 26 Oct 2023
RC2: 'Comment on egusphere-2023-1675', Anonymous Referee #2, 07 Sep 2023
egusphere-2023-1675 “A Network Approach for Multiscale Catchment Classification using Traits” Fabio Ciulla and Charuleka Varadharajan
This article describes the application of a post-PCA clustering algorithm for classification, in this case for catchments. There is no strong argument that the technique is much better than other methods in this particular application, but the breadth, quality and density of the GAGES-II dataset make it an attractive test bed.
The authors do not apply any effort in showing the improvement their technique makes over others. For example, the justification for their network-based approach is a single paragraph and three numbers. In a more structured analysis, the differences between PCA only, and each of the three post-PCA clustering techniques, would be outlined and their differences tabulated with relevant measures (with an equivalent of Figure 3 for each). There would also be a baseline measure, the PCA or one clustering technique with a minimum number of clusters, and some limited exploration of the number of clusters (or the two free parameters mentioned).
It is not remarkable (line 579) that a classification method using indices and data from a database (of over 300 measures on over 9000 catchments) specifically designed to describe gauged catchments for evaluating streamflow would result in a classification that was related to streamflow measures. It will be no surprise to hydrologists that high rainfall, high elevation, forested catchments behave hydrologically differently to flatter, lower rainfall, cropland areas, or that higher rainfall catchments with lots of urban areas get more flooding. What the results might show however is the bidirectionality such that starting from the stream flow indices we get catchment clusters, and that starting from catchment traits we can get groups of catchments with distinct flow behaviour.
What would also have been of interest is the places where the flow indices and clusters do not match well. For example, if there are two areas that are low slope, low elevation cropland that have distinctly different baseflow regime, one may be influenced by groundwater discharge or a factor not yet captured, and this would be useful additional data to know or require to be collected.
The citing of references within the text is inconsistent and non-standard, while many of the listed references do not use capital letters where appropriate in journal names or proceedings.
Citation: https://doi.org/10.5194/egusphere-2023-1675-RC2
AC2: 'Reply on RC2', Fabio Ciulla, 26 Oct 2023
Data sets
Classification of River Catchments in the Contiguous United States: Processed Dataset, Similarity Patterns, and Resulting Classes, Fabio Ciulla and Charuleka Varadharajan, https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1987555
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
316 | 176 | 27 | 519 | 17 | 16 |
Charuleka Varadharajan