the Creative Commons Attribution 4.0 License.
Elucidating the interplay between catchment and in-stream processes using high frequency multivariate and multisite data
Abstract. Stream water quality data provide essential insights into catchment biogeochemical processes. Continuous high-resolution measurements offer significant potential to differentiate between catchment-scale inputs and in-stream biogeochemical processing. Nevertheless, extracting clear, meaningful information from complex, multivariate time series spanning multiple variables and locations poses significant challenges. Using Principal Component Analysis (PCA) on high-resolution multivariate water quality data, this study aims to (1) separate catchment-scale contributions from in-stream biogeochemical processes, and (2) evaluate the dominant environmental drivers of spatial and temporal variability. The data were collected at five monitoring stations located in the Bode River, Germany. At each station, six variables were measured at 15-minute intervals over a period of seven years (2013–2020). The first principal component (PC1) accounted for 46 % of the variance, capturing the typical seasonal impacts of stormflow dynamics. The second principal component (PC2) revealed the influence of saline groundwater upwelling, particularly during lowflow periods at specific sites. Diurnal fluctuations in pH, driven primarily by algal photosynthetic activity, were identified by the third component (PC3). Additional components highlighted localized processes: PC4, PC5, and PC6 were linked to turbidity variability during discharge peaks, while PC7 reflected anthropogenic influences, notably treated acid mine drainage entering the river. Lastly, PC8 described distinct nitrate dynamics observed at downstream monitoring sites. The application of PCA to high-resolution multivariate data proved to be very helpful in disentangling various catchment and within-stream effects on stream water quality. These findings emphasize the importance of advanced analytical techniques in unravelling complex hydrobiogeochemical dynamics.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-1203', Anonymous Referee #1, 19 May 2026
- RC2: 'Comment on egusphere-2026-1203', Anonymous Referee #2, 19 May 2026
Summary
“Elucidating the interplay between catchment and in-stream processes using high frequency multivariate and multisite data” distinguishes the catchment-scale processes from the in-stream processes that collectively affect water quality. The authors use principal component analysis (PCA) to separate signals by location and to better understand overlapping temporal signals. The authors found PCA to be a useful technique for extracting meaningful, process-based information from temporally dense, high-frequency sensor data.
Significance
The significance of this work lies in its use of a common multivariate statistical technique (PCA) to extract information from high-frequency data in order to better understand spatial and temporal patterns. Specifically, the paper (1) provides a clear model of how to take advantage of often underutilized high-frequency sensor data, (2) interprets overlapping temporal signals, and (3) explores spatial patterns resulting from different land uses and environmental templates.
General Suggestions
Overall, this paper was clearly written and easy to read, although the organization of the paper is not easily digestible and in some cases makes parts feel repetitive or unjustified. I provide more general suggestions below, followed by more minor editorial items.
- There are inconsistencies in the hyphenation of words.
- Principal component analysis is one of the most common multivariate statistical analyses. In some places, too much emphasis is placed on PCA being an advanced analysis. For instance, I do not think that differentiating between PCA and EOF on L142-145 is necessary, given how common PCA is.
- A major strength of this work is the interpretation of spatial patterns (see Significance). The authors mostly attribute this strength to the use of PCA, although it is more closely related to the distributed site/experimental design. The authors could strengthen this linkage by showing how multivariate analyses like PCA complement and enhance the benefits of a high-quality study design.
- Several key aspects of the analyses are described in too little detail. For instance, the structure of the observation matrix is not completely clear. It would be beneficial to clearly state the matrix dimensionality and if all sites and time steps are analyzed together.
- The results section includes interpretations of the results that should appear in the discussion. This is confusing because the authors attribute specific PCs to processes before those connections are justified; the justification first appears in the discussion. This occurs in every paragraph of the results and is particularly noticeable in Figure 3, where the PCs are attributed to specific processes well before any justification is provided. Other examples (not exhaustive) are in Section 4.2 (L199-206) and L249-255; this information precedes adequate explanation. Similarly, L288-234 is hard to interpret because concepts are presented before any evidence or justification is provided.
- There are too many figures in this paper. I found Figures 4, 5, 7, 10, 11, 12, 14, 16 to be very repetitive and difficult to digest. I highly recommend that the authors pursue a more consolidated presentation of the information to streamline the results section and to allow a more sensible ordering of results followed by interpretations in the discussion section. On another note, the legend in the above figures is not legible in its current position and needs to be moved.
- In the results section, the authors describe each component in order. This becomes quite repetitive, and it is unclear how important each of the PCs is. One clear solution would be to introduce or reference the variance explained as each PC is discussed. It is hard to jump back and forth to Table 2 to assess how important each PC is.
- A significant amount of work has been done in this region outside of this manuscript. It is frequently referred to in superficial or anecdotal terms. In some places, being more specific and providing details would be beneficial. For example, on L237-239, are there any groundwater quality data that can be drawn upon to bolster this assertion?
- The authors mention that small gaps are filled by interpolation. The severity of missing data is blurred because the authors present an average of missing data across multiple sites and parameters. In Figure 15, for example, relatively long periods appear interpolated (see pink line) and in Figure 17, similarly sized gaps in nitrate (see orange) are not interpolated. It is unclear what criteria for filling were used. It is also unclear if the observation matrix for PCA only included time steps that were complete across all sites and parameters.
- The interpretation in section 5.1 of changes in discharge moving downstream is much more detailed than, and contradicts, an earlier statement that attributes the observations solely to groundwater contributions concentrated in the headwaters.
- The suggestions made on L508-513 are overly vague.
- Figure 1 does not provide a broad enough context of the study. For instance, it is unclear where the Harz Mountains are relative to the watershed and how much of the watershed is intersected.
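The requests above about matrix dimensionality and gap-filling criteria could be met by reporting a few numbers alongside the data. The following is only a minimal sketch under stated assumptions, not the authors' procedure: it assumes a wide layout with one column per station-variable pair, fills only interior gaps up to a hypothetical `max_gap` by linear interpolation, and keeps only time steps that are complete across all columns. The function names, the column names, and the toy values are illustrative.

```python
import math

NAN = float("nan")

def fill_short_gaps(series, max_gap):
    """Linearly interpolate interior runs of NaN no longer than max_gap steps."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if math.isnan(filled[i]):
            start = i
            while i < n and math.isnan(filled[i]):
                i += 1
            end = i  # index of the first valid value after the gap (or n)
            gap_len = end - start
            # Fill only gaps bounded by valid values on both sides
            if start > 0 and end < n and gap_len <= max_gap:
                left, right = filled[start - 1], filled[end]
                for k in range(start, end):
                    frac = (k - start + 1) / (gap_len + 1)
                    filled[k] = left + frac * (right - left)
        else:
            i += 1
    return filled

def complete_rows(columns):
    """Return indices of time steps with no NaN in any column."""
    n_steps = len(next(iter(columns.values())))
    return [t for t in range(n_steps)
            if not any(math.isnan(col[t]) for col in columns.values())]

# Toy data: illustrative values only, not taken from the manuscript
raw = {
    "STF_nitrate": [1.0, NAN, 3.0, NAN, NAN, NAN, 7.0],
    "STF_pH": [7.9, 8.0, 8.1, 8.0, NAN, 8.2, 8.3],
}
filled = {name: fill_short_gaps(col, max_gap=1) for name, col in raw.items()}
rows = complete_rows(filled)
missing_share = {name: sum(math.isnan(v) for v in col) / len(col)
                 for name, col in raw.items()}

print(f"observation matrix: {len(rows)} time steps x {len(filled)} columns")
print("share of missing values per column before filling:", missing_share)
```

Reporting the missing share per column rather than an average across sites and parameters would show where long gaps are concentrated.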
Specific Suggestions
L15: Consider replacing “over a period of seven years” with “seven-year period”.
L25: It is unclear if “Continuous monitoring” refers to uninterrupted data or a time series of data, which might be better described as high-frequency discrete time-interval data, rather than continuous.
L30: In-text citation is missing a closing bracket.
L36: I suggest replacing “This complexity can hinder watershed modelling, given the necessity to describe water and mass fluxes, as well as the transformation processes occurring in various compartments, such as soil, vadose, and saturated zones, including surface and groundwater interactions” with “This complexity can hinder watershed modelling, given the necessity to describe water and mass fluxes, as well as the transformation processes occurring within and between various compartments, such as soil, vadose, and saturated zones.”
L42: “spatial signatures” has not been used or described before. This is jargon, consider defining or using a more direct description.
L43: Consider replacing “has emerged as” with “is”.
L50: It is unclear if “in heterogenous datasets” refers to temporal heterogeneity or the combining of datasets into one.
L52-53: I was unable to interpret this sentence and think it should be revised.
L55: It is unclear what is meant by “watershed-scale contributions” and how contributions could be discriminated from processes. Could “watershed-scale processes” be used instead while retaining the authors’ intent?
Table 1: The column title “Dominant Land Use” is inaccurate; the authors provide additional details including secondary and tertiary land uses.
L94: Consider replacing “accounts” with “accounting”.
L116: “starting 2018” – missing word?
L118: “developed only after the end of the monitoring periods” is ambiguous.
L130-134: Is this observation exacerbated by land use and land cover? An earlier description described a land use gradient that may reasonably contribute.
L134: Missing space at start of sentence.
L134-135: This statement implies, without justification, that source is more important than in-stream transformations.
L196: “For pH, there is a gradual increase….” Are the authors referring to the correlation coefficients or pH values?
L224: Consider replacing “with the pH variable” with “with pH”.
L223-225: The authors claim that this statement is interesting, but no rationale is provided. I suggest more directly presenting this result and then exploring its interpretation in the discussion section.
L236: “groundwater contributes a great portion of the river discharge” is vague without contextualization or data.
L266-267: This statement requires a reference.
L275-278: I found it hard to reconcile the “highly significant” 4-6 components, considering they account for a relatively small amount of the variance described in Table 2.
L279: This statement about “strong positive loadings across all sites” seems to contrast with an earlier description of sediment trapping in the regulated part of the network.
L356-357: A reference or specific data should support this statement.
L362-363: The evidence for a pre-event water signal is ambiguous. I do not doubt that it is present, but it is not clearly explained with the data.
L367-370: Is this a karst system? If not, is this an appropriate example?
L384-388: References should support these statements.
Citation: https://doi.org/10.5194/egusphere-2026-1203-RC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 165 | 48 | 11 | 224 | 27 | 11 | 17 |
General comments
The manuscript by Gutiérrez‑García et al. presents an interesting application of Principal Component Analysis (PCA) to a high-frequency multivariable dataset collected in the Bode River catchment in Germany. Results from the PCA were evaluated together with additional independent data, which helped to identify the major hydrological and biogeochemical processes occurring in the catchment or within the stream. Such an approach appears very valuable for extracting key trends from a complex multivariate dataset, and could surely contribute to future research.
Overall, the manuscript is generally well written and the interpretations proposed by the authors are generally consistent with the observed spatial and temporal patterns. However, the authors were not sufficiently clear about the study’s limitations, particularly its methodological and interpretive weaknesses.
My main concern involves the interpretation of PCA components, which is sometimes based on weak correlations with principal components (PCs) that themselves account for only a small portion of the total variance. Indeed, the authors chose to keep 8 principal components to reach a cumulative variance of 84%. However, only the first four PCs were associated with a variance greater than 5%. Such low-variance PCs are often considered weakly informative and potentially sensitive to noise. Most of the studies cited in the manuscript used only the first four PCs for interpreting their datasets. Although some independent validation is provided, the authors should present more clearly the robustness and limitations associated with the interpretation of low correlations and less informative PCs.
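The retention question raised here can be made concrete by tabulating several common criteria side by side. The sketch below is illustrative only: it assumes a PCA on the correlation matrix (so that eigenvalues sum to the number of variables), and the eigenvalues, the function name `summarize_components`, and the thresholds are hypothetical placeholders rather than values from the manuscript.

```python
def summarize_components(eigenvalues, min_share=0.05, target_cumulative=0.80):
    """Compare three retention criteria for a list of PCA eigenvalues."""
    total = sum(eigenvalues)
    shares = [ev / total for ev in eigenvalues]

    # Running cumulative share of variance
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)

    # Kaiser criterion: eigenvalue above 1 (meaningful for correlation-matrix PCA)
    n_kaiser = sum(1 for ev in eigenvalues if ev > 1.0)
    # Components that individually explain more than min_share of the variance
    n_above_share = sum(1 for share in shares if share > min_share)
    # Smallest number of leading components reaching the cumulative target
    n_for_target = next(
        (k + 1 for k, c in enumerate(cumulative) if c >= target_cumulative),
        len(eigenvalues),
    )
    return {
        "shares": shares,
        "cumulative": cumulative,
        "n_kaiser": n_kaiser,
        "n_above_share": n_above_share,
        "n_for_target": n_for_target,
    }

# Hypothetical eigenvalues for six standardized variables (they sum to 6)
toy_eigenvalues = [3.0, 1.5, 0.9, 0.24, 0.2, 0.16]
summary = summarize_components(toy_eigenvalues)
for key in ("n_kaiser", "n_above_share", "n_for_target"):
    print(key, summary[key])
```

When the criteria disagree, as they can, stating which one was used and why would help readers judge how much weight the higher-order components can bear.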
Several suggestions follow, which the authors should feel free to adopt or disregard.
Detailed comments
L14: it might be helpful to specify in the abstract which variables were used in this study.
L25: the authors mention broader implications for environmental monitoring and risk reduction that are not sufficiently developed or taken into account later in the paper. This approach could be applied in a variety of contexts, which would be useful to mention here or in the discussion.
L28: high-frequency datasets are common in an operational context. That means that your approach could be applied to a large number of catchments around the world, using datasets that are often undervalued. A few words on these implications might add value to your paper.
L48: I cannot see which of these references use PCA to differentiate catchment-scale processes from in-stream dynamics. Maybe you should separate the references according to the main goals of each study. Furthermore, if, as you noted, previous studies have already applied PCA for this purpose, what is the added scientific value of your study? More generally, the introduction would benefit from a clearer explanation of the scientific challenges associated with PCA and high-frequency datasets.
Table 1: I would have appreciated seeing the proportion of each land use for each sub-catchment summarized in this table.
Figure 1: This figure should be improved to include:
L60: It is not clearly stated whether the Magdeburger Börde corresponds to the lowland plain area.
L105: first mention of the Central German Lowland area. Is this area different from the one cited previously?
L129: I would have appreciated knowing more about the hydrological regime of the Bode and Selke rivers (e.g. mean annual discharge) before reading about the discharge observed at each station. It may also be helpful to add the mean discharge when describing the main differences between two stations.
L140: Could you justify the use of Spearman correlation in your study? I assume it was chosen because of the non-normal distribution of the data or the presence of outliers. You could also specify that the very large dataset you are using automatically reduces the risk of false correlations.
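To illustrate why a rank-based coefficient might be preferred when outliers are present, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks. It is a generic illustration, not the authors' implementation; the helper names and the toy discharge and turbidity values are assumptions made for the example.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values: a monotonic relationship with one extreme turbidity reading
discharge = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
turbidity = [2.0, 3.0, 5.0, 6.0, 8.0, 500.0]

rho = spearman(discharge, turbidity)
r = pearson(discharge, turbidity)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

In this toy case the single extreme value lowers the Pearson coefficient while leaving the rank-based coefficient unaffected, which is the kind of justification the comment asks for.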
L141: how did you define the low, medium and high flow regimes? Are there any thresholds you can provide?
L142-L146: I’m not sure that this part is relevant. Why mention the term “EOF” if it is not used in your context? Besides, you already introduced PCA in the introduction. If you have more information to provide on its general application, you should have mentioned it earlier.
L151: It might be useful to specify that the sum of all eigenvalues gives the total variance captured by the PCA. Furthermore, could you please clarify the sentence on line 165: “first four principal components exhibited eigenvalues exceeding 1”? What does this mean?
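The relation behind both points can be written out explicitly. The expression below assumes, as is common but not confirmed here, that the PCA was run on the correlation matrix R of p standardized variables, with λ_k denoting the k-th eigenvalue.

```latex
\sum_{k=1}^{p} \lambda_k = \operatorname{tr}(R) = p,
\qquad
\text{variance share of PC}_k
  = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
  = \frac{\lambda_k}{p}.
```

Under that assumption, an eigenvalue above 1 means that the component carries more variance than any single standardized variable, which is the usual reading of the Kaiser criterion.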
L153: Do you have any references?
L166: See the comment on L151. The eight principal components explain 84% of the total variance, which meets your objective of 80-90%, but you do not state how many PCs it is acceptable to retain in order to reach that minimum of 80%. PCs that explain between 2 and 5% of the variance are generally considered poorly informative, and conclusions drawn from them may be overinterpreted, particularly when a clear hydrological process is assigned to each PC. I would have appreciated a few words about this risk of overinterpretation in the Methods and Discussion sections.
L172: your site description was already done in section 2. You should have mentioned these elements earlier.
L193: it might be useful to define a threshold above which a correlation coefficient is considered strongly, rather than weakly, informative for interpreting the PCA. For example, you mention (L219) a “strong” relationship with discharge and turbidity for absolute correlation coefficients < 0.5. I understand that with such a large volume of data, correlation coefficients are often significant even when they are low, but you should be careful about the terms used.
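The gap between statistical significance and strength of association can be shown with the standard t-approximation for a correlation coefficient. This is a rough sketch under simplifying assumptions: the sample size is the nominal number of 15-minute steps in seven 365-day years, gaps are ignored, and observations are treated as independent, which high-frequency series generally are not (autocorrelation reduces the effective sample size). The function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t-statistic for H0: zero correlation, assuming n independent pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Nominal number of 15-minute steps in seven 365-day years (gaps ignored)
n_steps = 4 * 24 * 365 * 7

for r in (0.1, 0.3, 0.5):
    t = correlation_t_statistic(r, n_steps)
    print(f"r = {r:.1f}: t = {t:.1f}, shared variance r^2 = {r * r:.0%}")
```

Even a coefficient of 0.1 yields a very large t-statistic at this sample size, so significance alone cannot support the label “strong”; an explicit threshold on the coefficient itself would be more informative.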
L243: maybe you should mention again in each subsection how much variance is explained by each PC. It may help to appreciate the robustness of each interpretation.
L244: you might specify which stations are correlated with pH.
L261: I’m not sure about the purpose of this sentence. What does it imply?
L262: “this effect”. You should specify which effect you are referring to.
L279: again you mention a “strong” correlation across all sites, but the correlation coefficients are greater than 0.5 for SH and MEI only (and maybe GGL?).
L300-331: Components 7 and 8 are poorly informative and showed weak correlation with the variables, but you didn’t mention the potential weakness of your interpretation in the results or discussion.
L336: the discussion section might benefit from being organized by process rather than by PC. Indeed, you highlighted some major hydrological and biogeochemical processes, each of which could constitute a subsection of its own. The subsections could then be named after these processes instead of after the PCs. For example:
- Section 5.1: you named this section “hydrological processes” but it deals mainly with specific discharge contributions. You should make this title more specific;
- Section 5.2: PC1 is based on discharge peaks but you discuss mainly nitrate export. A single subsection on nitrate might be clearer, even though I understand that different processes are involved.
Furthermore, I do not see a clear separation between the results and the discussion. Some parts of the discussion provide another description of the results on PCs (e.g.: L440, L420, L425, L383).
L343: but didn’t you mention that HAU was not in the mountainous region?
L372: it might be helpful to provide the correlation coefficients. Otherwise, the figures S1, S2 and S3 do not appear to be fully exploited.
L386-388: do you have a reference? Without a citation, this sentence reads more as a result or interpretation than as discussion.
L481: this paragraph should be moved to another subsection in which you discuss the validation methods used and propose other, more robust validation methods. Furthermore, as mentioned earlier, this subsection and this study would benefit from being supplemented with some broader implications: what could the key environmental factors extracted from this dataset contribute in other contexts, particularly operational contexts where such datasets are common?
L498: this reference should have been in the Discussion section.
Technical comments
L53: add a comma after “Then”.
L117: I don’t understand this sentence. Is there a typo?
L134: missing space between point and “Correspondingly”.
L155: add a comma after “eigenvectors”.
L157: add a comma after “components”.
L186: lowercase after “:”
L249: wrong Figure.
L465: add a comma after “STF”.