the Creative Commons Attribution 4.0 License.
Technical Note: Analysis of concentration-discharge hysteresis loops using Self-Organizing Maps
Abstract. Analyzing concentration-discharge (C-Q) hysteresis loops is essential for understanding both dissolved and particulate constituent sources and transport mechanisms in watershed hydrology. However, traditional hysteresis analysis methods, including loop classification schemes and hysteresis indices, fail to capture the full variability and gradual transitions between loop patterns. To address these limitations, we introduce an alternative approach for characterizing hysteresis patterns in watersheds using the Self-Organizing Map (SOM) algorithm, which better represents loop variability without relying on rigid categories. This technical report outlines the application – and the advantages – of SOM-based hysteresis loop characterization and presents a general workflow for its implementation to characterize C-Q hysteresis for any watershed constituent. We demonstrate the efficacy of the SOM algorithm through a proof-of-concept with sediment transport hysteresis loops. The SOM algorithm was able to classify hysteresis loops with a high degree of accuracy, correctly mapping the amplitude, direction, and concavity of hysteresis loops in the training dataset. We also used the SOM algorithm to develop a General Turbidity-Discharge (T-Q) SOM – which may be used as a standardized benchmark for characterizing primary loop types in sediment hysteresis analysis. We demonstrate the use of the General T-Q SOM in describing loop frequency distributions and exploring associations with hydrologic variables to infer hydrologic controls of loop types for three watersheds. We found that the General T-Q SOM captures key differences in loop shape (and thus sediment transport processes) overlooked by hysteresis indices while preserving the continuum of loop variability lost in classification schemes. Additionally, SOM-based correlation analysis effectively detected associations between loop types and hydrologic variables, enhancing understanding of their hydrologic significance. 
Combined with high-resolution water quality data, this method offers a powerful tool for advancing the identification of constituent sources and transport mechanisms at the watershed scale. To support broader adoption of the methodology described in this paper, we have developed a Python package, equipped with detailed documentation to facilitate SOM implementation and application in future C-Q analysis.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-2146', Brandi Gaertner, 02 Jul 2025
- AC1: 'Reply on RC1', Tyler Mahoney, 25 Jul 2025
-
RC2: 'Comment on egusphere-2025-2146', Marie Cottrell, 01 Sep 2025
Comment on paper egusphere-2025-2146
General comment:
I have no expertise in watershed hydrology, nor on hysteresis loops. I read the paper from the perspective of a Kohonen algorithm practitioner.
My overall assessment is that it is a very good paper from this point of view. As far as I understand, the application problem is very well posed and explained. The reasons for using Kohonen's algorithm are perfectly introduced and justified. The algorithm itself is very well defined, as well as its practical implementation. The methodology and the different steps are clear and reproducible. Finally, the results are, according to the authors, quite satisfactory.
Specific comments:
The authors present hysteresis loops and the different methods used to classify them (which makes it possible to characterize hydrological events). They explain why traditional clustering methods are not satisfactory, in particular because we may not see what the important factors are.
The authors then discuss methods based on the calculation of certain indices, but show that these indices do not always make it possible to distinguish between very different events.
To improve the analysis of loops, the authors propose to use the Kohonen algorithm (SOM) for its well-known clustering and visualization of results on two-dimensional maps.
The presentation of Kohonen's algorithm is very well done, very educational. The authors carefully explain the role of each hyperparameter. They offer to choose a distance adapted to the nature of their data. They introduce two criteria for choosing an "optimal" map, which must make a compromise between topographic error and quantization error, both of which must be minimized, although they vary in opposite directions.
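The two map-quality criteria mentioned above can be illustrated with a minimal sketch based on their standard definitions: quantization error as the mean distance from each sample to its best-matching unit (BMU), and topographic error as the share of samples whose first and second BMUs are not neighbours on the grid. The function name `som_errors`, the use of Euclidean distance, and the 8-neighbour adjacency rule are assumptions made for illustration; the manuscript's own distance and grid topology may differ.

```python
import numpy as np

def som_errors(samples, codebook):
    """Return (quantization_error, topographic_error) for a rectangular SOM.

    samples:  array of shape (n_samples, n_features)
    codebook: array of shape (rows, cols, n_features) holding node prototypes
    Euclidean distance and 8-neighbour adjacency are illustrative assumptions.
    """
    rows, cols, n_features = codebook.shape
    flat = codebook.reshape(rows * cols, n_features)

    # Distance from every sample to every prototype: shape (n_samples, n_nodes)
    dists = np.linalg.norm(samples[:, None, :] - flat[None, :, :], axis=2)

    # Indices of the best and second-best matching units for each sample
    order = np.argsort(dists, axis=1)
    bmu, second = order[:, 0], order[:, 1]

    # Quantization error: mean distance between a sample and its BMU prototype
    qe = dists[np.arange(len(samples)), bmu].mean()

    # Topographic error: fraction of samples whose two best units are not adjacent
    r1, c1 = np.divmod(bmu, cols)
    r2, c2 = np.divmod(second, cols)
    adjacent = np.maximum(np.abs(r1 - r2), np.abs(c1 - c2)) <= 1
    te = 1.0 - adjacent.mean()
    return qe, te
```

Larger maps tend to lower the quantization error while making topological distortion more likely, which is the compromise the comment above refers to.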
The implementation of the algorithm is carefully detailed. The authors explain the different steps, pre-processing, filtering by a moving median, standardization of the lengths of the data vectors (representing loops), normalization (to keep only the shape).
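The pre-processing steps listed above (moving-median filtering, standardization of vector length, normalization) could look roughly like the following sketch. All function names, the window size, the target length of 100 points, and the choice of min-max scaling are hypothetical and chosen only for illustration; they are not taken from the authors' package.

```python
import numpy as np

def moving_median(x, window=5):
    """Centered moving median; the ends are padded by repeating the edge values."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(x.size)])

def resample(x, n_points=100):
    """Linearly interpolate a series onto a fixed number of evenly spaced points."""
    x = np.asarray(x, dtype=float)
    old_axis = np.linspace(0.0, 1.0, x.size)
    new_axis = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_axis, old_axis, x)

def min_max(x):
    """Scale a series to [0, 1] so that only its shape is retained."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def preprocess_event(q, c, window=5, n_points=100):
    """Turn one event's discharge (q) and concentration (c) series into a loop array."""
    q_s = min_max(resample(moving_median(q, window), n_points))
    c_s = min_max(resample(moving_median(c, window), n_points))
    return np.column_stack([q_s, c_s])  # shape (n_points, 2)
```

After these steps every event has the same length and range, so loops from events of different duration and magnitude can be compared on shape alone.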
The Kohonen map is trained on a balanced and labeled database (although the label is not included in the input data) where all the observed loop shapes are represented.
The authors present the choices (map size, parameters) they have made.
However, they remain imprecise with regard to the decay functions of the alpha parameter and the neighborhood size. It would be desirable to propose and explain decay functions in accordance with the theory.
Then, once the map has been calibrated from the labeled data, the authors use it to classify the hysteresis loops from several observation stations by projecting them onto the map. They carefully study the clusters obtained and determine the significant explanatory variables through statistical analyses. They compare the conclusions obtained with the conclusions derived from the study of the indices presented in the introduction.
To conclude and allow the use of their methodology, the authors provide a program in Python.
Technical corrections: There is nothing to report, the manuscript is very neat and really very well written.
Citation: https://doi.org/10.5194/egusphere-2025-2146-RC2 -
AC2: 'Reply on RC2', Tyler Mahoney, 05 Sep 2025
We sincerely appreciate this generous feedback from the reviewer. It's encouraging to receive such positive remarks.
In response to your comment on the decay function, we will implement the following revisions to clarify our selection:
1. Lines 139 to 141 in section 2.2 will be modified as follows. Note that in this section we provide a general description of the SOM training process:
During the training process, both the learning rate and radius of influence of the BMU (i.e., the spread of the smoothing kernel) are adjusted using a decay function, ensuring their values decrease over time (t). Common choices include hyperbolic, exponential, and linear decay functions (Kohonen, 2013), and distinct decay profiles can be defined independently for the learning rate and the BMU’s radius of influence.
2. Lines 257 to 261 in section 3.1.2 will be modified as follows:
During the map size selection step, multiple Self-Organizing Maps (SOMs) were trained using a grid search approach. The varied hyperparameters included the number of nodes (ranging from 5×5 to 13×13), initial neighborhood spread (0.5 to 13), and initial learning rate (0.05 to 0.9). Each SOM was trained over five epochs. A Gaussian neighborhood function and an exponential decay function were applied in all cases (see details in the SI). Final values for neighborhood spread and learning rate were set at 0.3 and 0.01, respectively, following the general recommendation to avoid zero values, which would halt the learning process (Kohonen, 2013; Samarasinghe, 2016). In total, approximately 900 maps were trained. Maps exhibiting high topographic error, indicative of topological distortion, were excluded. Finally, the elbow method was used to determine the optimal map size, defined as the point at which increasing the number of nodes no longer produced substantial reductions in quantization error (Nainggolan et al., 2019).
3. Finally, we will include a workflow diagram in the Supplementary Information (see Figure 2 in our response to Reviewer 1), which outlines the specific equations used during the SOM training.
Citation: https://doi.org/10.5194/egusphere-2025-2146-AC2
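As an illustration of the exponential decay described in the reply above, here is a minimal sketch that moves a hyperparameter from an initial value to a non-zero final value over training. The function name and the exact functional form, v(t) = v0 · (vf / v0)^(t / (T − 1)), are assumptions; the equations actually used are those given in the authors' Supplementary Information.

```python
import math

def exponential_decay(initial, final, step, total_steps):
    """Exponentially interpolate from `initial` (step 0) to `final` (last step).

    Assumed form: initial * (final / initial) ** (step / (total_steps - 1)).
    Both values must be positive, so the schedule never reaches zero.
    """
    if initial <= 0 or final <= 0:
        raise ValueError("initial and final values must be positive")
    if total_steps <= 1:
        return initial
    fraction = step / (total_steps - 1)
    return initial * (final / initial) ** fraction

# Example schedule using one initial learning rate from the stated range
# (0.05 to 0.9) and the stated final learning rate of 0.01.
schedule = [exponential_decay(0.5, 0.01, t, 100) for t in range(100)]
```

The same function can be applied to the neighborhood spread, for example from an initial value of 5 down to the stated final value of 0.3, since the two schedules are defined independently.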
-
RC3: 'Comment on egusphere-2025-2146', Anonymous Referee #3, 30 Sep 2025
The study by Ramirez et al "Analysis of concentration-discharge hysteresis loops using Self-Organizing Maps" describes a novel method to analyse hysteresis patterns in high-frequency concentration-discharge data.
The paper is generally very well written and I particularly like the innovative analyses and combination of methods to characterize and assess hysteresis loops and to assign event characteristics and catchment properties using an unsupervised machine learning algorithm. Also the figures are very nicely presented.
The presented method can provide a major step forward in the field of hysteresis analysis. However, despite having worked with hysteresis analysis in the hydrological and water quality context extensively, I have major difficulties understanding the content. I acknowledge that this is due to the fact that I have no background on SOM and Kohonen's algorithm, but given that this study is going to be published in HESS, I think a large part of the audience is likely to have a background in watershed hydrology. Therefore, I stress that the authors should significantly improve on the explanation of the methods and results. I hope the authors find my comments below useful in improving the manuscript.
General and major comments: The general workflow is hard to grasp, and it is unclear what your Python package can accomplish and what the user needs to do to get all this to work, particularly because this is meant to be a technical note. Therefore, I think the paper would benefit from a figure/flowchart explaining the required steps and purpose of the steps from downloading/acquiring a dataset, via the curation of the dataset, training, refinement, application, to the resulting map. This would be like a summary map linking and expanding on figures 3, 4 and 5. It might make sense to move the current chapter 2.5 to the beginning of section 2 and expand with the points I highlighted.
For someone not familiar with SOM, I feel section 2 is written very abstractly and is of very limited usefulness. First, I think it is extremely important that you stick to one definition and don't use a different word to describe the same SOM property. Perhaps this is the case, but I am doubtful, e.g. are a prototype and a sample the same (this is unclear in l.108)? Second, it would help if you could link the SOM properties (such as 'prototype', 'BMU', 'samples', 'n-length sequence', 'number of nodes', 'random samples', 'distance function' (distance between what?), 'topological preservation', 'quantization accuracy', 'topographic error', 'radius of influence') to actual properties of the C-Q data analysis. I assume that the SOM algorithm properties you mention must be associated with some kind of 'metrics' that are derived from the C-Q data (such as duration, time steps, magnitude, difference, event and hysteresis properties). In my opinion it would improve the understanding of the methods tremendously if you could highlight such links wherever possible. For instance, would the number of nodes be similar to the 'sensitivity' of defining individual C-Q-events or to the number of different hysteresis classes, or...?
Specific comments:
l. 20: "while preserving the continuum of loop variability lost in classification schemes" this is unclear to me and could be explained in a little more detail here.
l. 138: would it be possible to link/explain C-Q data characteristics with the (some) terms of equation 1 (basically similar to my second major comment above)?
l. 146-148 ff: between what exactly are these quantization errors calculated? Does it mean you have to define a 'true' classification manually and map the SOM against this 'subjective' classification, which compromises your aim of "using SOM to discriminate and characterize loop types commonly seen in sediment transport literature"?
l. 175-176: similarity between two samples - is one sample Q and the other sample C?
l. 191ff: the difference between the first (training on a curated dataset?) and second (application to 'any dataset'?) phase of the SOM algorithm is not clear to me. Does it mean you need to split time series at one location/gauge into training and application? Or can you train at one location and then apply it to another location? Please explain the requirements and limitations in a bit more detail.
l. 194-201: 'curating a dataset with all known loop types' - this sounds to me like a major limitation of the method. Does it mean that you first need to analyse C-Q time series for 'old-style' loop types? Then, if C-Q relationships are very homogeneous in a catchment, it might be impossible to have a time series with different loop types - is the method then not applicable? For instance, different watersheds can cause quite different loop types for the same constituents - does this limit the transferability of the method?
l. 210: This figure is very (!) useful and I would strongly suggest to: (1) expand it to explain additional properties of the SOM algorithm that you introduced earlier and link it to the actual data properties, (2) refer to the figure earlier in the methods.
l. 233: compiling the 'curated dataset' through manual delineation looks like a major effort. First, you need to identify relevant events in the time series, then you need to derive loops and classify them... These steps usually require many subjective decisions (when is an event an event, which class types to use, which class is the resulting hysteresis loop in). How did you do this?
l.257: I thought the number of nodes would have to be already defined in the previous step by arranging the manually delineated hysteresis loops into the grid you show in Figure S2? Is Figure S2 an output of your SOM, or is it required as an input/during training of the SOM? Or is it purely for information purposes?
l. 258-259: some additional methods and variables are mentioned here, such as "elbow method" or "number of epochs" - you didn't mention it in 2.2 Training process section.
l. 262: to me, this sounds like you calibrate the SOM map to the 'manual' delineation of loop types you conducted during dataset curation. This does not sound like 'unsupervised learning' (see also l. 282), and the 'subjective' classification you criticized in the introduction is driving the properties of the SOM?
l.283-l.287 here you finally mention that the whole manual delineation/classification is only needed to accomplish the curation of the dataset. Why is this needed? Couldn't I simply take a sufficiently long / high number of time series which implies that all possible loop types exist therein? - ok later you mention that you don't want an uneven distribution of loop classes - but isn't forcing a similar distribution introducing a bias? What would happen if you don't use this dataset curation (perhaps this can be elaborated on in the discussion).
l. 295 suggest to add data sources of the catchment characteristics and how these were derived/extracted.
l. 300-307: The different methods for the three watersheds are confusing. Why additional variables only for 03289000 and not for the others? Why Zuecco-indices for the two others and not for 03289000? Suggest to explain how 'old-water to event-water' was calculated.
l. 329-333: why not select 0.02/0.7, which would seem to have a lower Euclidean distance error than the one you chose?
l. 340ff: Given my difficulties with the methods, I am also confused by chapters 4.1.2 and 4.1.3. Chapter 4.1.2 is based on the curated dataset (=manual classification as far as I understand your earlier explanation) - but it also shows a 'trained SOM' (l.341). Then in 4.1.3, (l.370) you mention that "manual classification was not seen by the model as part of the training process" - This is confusing.
Minor comments:
l. 74-79 the information given in the caption partly duplicates information given in the main text. Suggest to streamline this.
l. 228: suggest to write "figure eight" to avoid confusion with Figure 8.
l. 313: something is missing here "a two dimensional..." array? matrix?
l.400: reference and explain a, b, c and d in the caption.
l.432: suggest not to name this technical note a "report"
l.475: the three arrows are hard to distinguish. It might make sense to plot them in three different colors and, since they seem to overlap, with a transparency value.
Citation: https://doi.org/10.5194/egusphere-2025-2146-RC3
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
2,668 | 112 | 17 | 2,797 | 41 | 31 | 70 |
Overall, this manuscript is well written and supported with literature. You demonstrate strong scientific rigor, present informative figures, and provide a supportive narrative.
Line 26: You mention that event-scale concentration has been employed for decades, but the oldest citation is only 5 years old (Malutta et al., 2020). Can you add some older/original literature in this first sentence to support your claim? Perhaps Williams (1989), Hamshaw et al. (2018), Bettel et al. (2025) since you mention in Line 225 that hysteresis loops were first recognized in these articles.
Section 2.2 – 2.4: I think these sections would be easier to visualize with a workflow diagram that links to Figure 3. You could include a visualization for the process of training the model and finding the BMU. In the same workflow diagram, you can include a visualization for the process of using the DTW. Then, those can have an arrow pointing to the “SOM training” in Figure 3. Additionally, if possible, including the topological preservation and quantization accuracy in the diagram would help create a complete “picture” of the process. Although these three sections are written well, it is hard to visualize the order of the process. Additionally, Figure 3 in its current state is too general to provide a specific picture of the training process.
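For readers unfamiliar with DTW (dynamic time warping), the distance referred to above can be sketched with the classic dynamic-programming recurrence. This is a generic textbook version, not the authors' implementation; the function name, the Euclidean point-to-point cost, and the absence of a warping-window constraint are all assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences.

    Each sequence may be 1-D (one value per time step) or 2-D
    (for example, one (Q, C) pair per time step). The local cost is Euclidean.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)

    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],      # a advances, b repeats
                                    cost[i, j - 1],      # b advances, a repeats
                                    cost[i - 1, j - 1])  # both advance
    return cost[n, m]
```

In a SOM, a distance like this would be evaluated between an input loop and each node prototype in order to find the BMU, so two loops with the same shape but slightly shifted timing remain close to each other.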
Line 228: I recommend simply adding a parenthetical such as (as seen on the left rows in Figure 4) after the sentence to immediately direct your readers' eyes to the single-line, clockwise, and counterclockwise loops.
Figure 4 caption: I would also recommend adding additional information on the single-line, figure-eight, clockwise, and counterclockwise topologies. You have a lot of detail in Figures 1-3, and I would suggest continuing that format.
Line 256: It seems like Line 256 should be appended to line 255.
Line 259: Would the highly distorted maps be defined by topological error (e.g. referring to section 2.3) and not the “topographic” error that is listed? If so, topographic is used throughout the paper (Line 169, Line 325, Line 327, etc.) and would need to be corrected.
Line 262: A citation should be provided for the Pareto-optimal analysis.
Figure 8 caption: Please provide an explanation of what is shown in (a), (b), (c), (d), specifically.