the Creative Commons Attribution 4.0 License.
Improved endmember mixing analysis (EMMA): Application to a snowmelt-dominated stream in northern Utah
Abstract. An endmember mixing analysis (EMMA) is a sophisticated hydrograph separation technique used to determine the primary water sources in a watershed and estimate their respective input over time. In a traditional EMMA approach, a principal component analysis (PCA) is used to identify endmember composition, and the retained principal component (PC) scores are used to calculate the fractional contributions of each endmember. This approach is based on the idea that the reduced dimensionality of the endmember data in just a handful of PCs contains the most useful information. While this approach does simplify the mixing calculation, it limits potential model complexity. We show that calculating endmember contributions using the original water chemistry data (tracer space) results in a more simplified and uniform approach than performing the calculation in PC-defined subspace. Additionally, we demonstrate an iterative approach to selecting the tracers and endmembers to create a more complex (and more representative) model. We applied EMMA to the upper Provo River watershed (262 km²), a snowmelt-dominated catchment in northern Utah, to test some potential improvements in the method. Five endmembers (quartzite groundwater, carbonate groundwater, mineral soil water, organic soil water, and snow) were identified for the watershed and differentiated using seven tracers (δ¹⁸O, δ²H, HCO₃⁻, Si, Mg²⁺, K⁺, and Ca²⁺). We applied this approach in a well-defined workflow implemented in EMMALAB, a software application designed to perform EMMA on one or more stream locations in a catchment. The analysis showed that snow was the dominant endmember during spring runoff, contributing 38 % of flow on average, while quartzite groundwater contributed 60 % during baseflow. 
The iterative analysis for selecting endmembers and tracers is easily implemented through EMMALAB, allowing for a uniform and simplified approach to apply the complex mathematics behind EMMA for more accurate hydrograph separation calculations.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2053', Fengjing Liu, 10 Jul 2025
CC1: 'Reply on RC1', Alyssa N Thompson, 15 Jul 2025
We thank Prof. Liu for his kind attention to our manuscript. We found several of his comments useful, but we believe his main points are incorrect due to several important misunderstandings. In the following, we quote his review and intersperse responses to his main points. We did not respond to the minor comments as many of them were addressed in our response to the main comments.
Reviewer Comment #1:
As stated in the manuscript, the primary objectives of this study were (1) to critically evaluate previous EMMA approaches and (2) to introduce a new software package (EMMALAB) to facilitate a rapid, iterative modelling process that yields more reliable results and can accommodate maximum complexity. In my opinion, after reading the manuscript carefully, I do not think either objective was justified or achieved. Primarily due to a misunderstanding of EMMA and its mathematical procedure, the research design is fatally flawed, and the comparison of results between EMMA and EMMALAB was not made on an apples-to-apples basis. Recent efforts and progress by many researchers were mostly overlooked. I elaborate my point of view below.
For a mixing model in general, several assumptions must be met (e.g., Hooper and Shoemaker, 1986): (1) solutes must be conservative; (2) the number of end-members is known; (3) solute concentrations are distinct among end-members for at least one solute; (4) solute concentrations in end-members are constant over time or their temporal variations are known; and (5) solute concentrations in end-members are constant over space or are treated as different end-members. Evaluating assumptions #1 and #2 used to rely on catchment hydrologic analysis and often remained very challenging. Not until 2003, when Hooper developed diagnostic tools of mixing models (DTMM), did assumptions #1 and #2 become statistically testable (primarily through analysis of the distributions of U-space projection residuals against measured solute concentrations).
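The residual diagnostic referred to here (Hooper, 2003) can be sketched as follows: standardize the stream chemistry, project it onto the first k principal components (U-space), back-transform, and examine the per-tracer residuals. This is a minimal sketch under stated assumptions: PCA is done with NumPy's SVD on standardized data, the function name `projection_residuals` is hypothetical, and the relative-RMSE summary is one common choice rather than necessarily the exact statistic reported by DTMM or EMMALAB.

```python
import numpy as np


def projection_residuals(conc, k):
    """Residuals of stream chemistry re-projected from the first k PCs.

    conc: (n_samples, n_tracers) array of stream concentrations.
    Returns (residuals, rrmse): residuals are predicted minus observed
    concentrations in original units; rrmse is the per-tracer RMSE divided
    by the mean observed concentration.
    """
    mu = conc.mean(axis=0)
    sd = conc.std(axis=0, ddof=1)
    z = (conc - mu) / sd                      # standardize each tracer
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    vk = vt[:k].T                             # loadings of the k retained PCs
    z_hat = z @ vk @ vk.T                     # project into U-space and back
    predicted = z_hat * sd + mu               # back-transform to concentrations
    resid = predicted - conc
    rrmse = np.sqrt((resid ** 2).mean(axis=0)) / np.abs(conc.mean(axis=0))
    return resid, rrmse
```

For purely conservative mixing of three endmembers, k = 2 leaves essentially zero residuals; in real data, it is structure in a tracer's residuals at the chosen k that the diagnostic looks for.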
Author Response #1:
Hooper did not actually suggest any methods for determining the conservative behavior of tracers, other than evaluating how well a given model predicts stream tracer concentrations. He offered some insights into how one might identify non-conservative behavior in previously chosen tracers but did not offer recommendations for how to obtain this initial set. Bivariate plots are used in Hooper (2003) as an example of a method that can be used for detecting conservative behavior in an extreme case where only two endmembers mix, since this is the only situation in which linear mixing trajectories would be guaranteed for conservative tracers. In our manuscript, we suggested using bivariate plots of potential tracers as a starting point for the analyst (see P16 L420-428), because strongly linear mixing behavior between multiple tracers would be extremely unlikely if they were behaving in a non-conservative manner, and such behavior might be reasonably approximated in cases where two endmembers are relatively dominant. We then suggested iteratively adding other tracers (especially species that typically behave conservatively) and evaluating the effect of their inclusion on the solute predictions.
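The bivariate screening described above can be sketched as a pairwise check of linearity in the stream data. This is a minimal sketch, assuming linearity is summarized by the squared Pearson correlation; the function name `linear_pairs` and the threshold `r2_min` are hypothetical and not taken from the manuscript. A high r² only flags candidate tracers for the iterative model runs; it is not a test of conservative behavior.

```python
import itertools

import numpy as np


def linear_pairs(conc, names, r2_min=0.9):
    """List tracer pairs whose stream concentrations are strongly linear.

    conc: (n_samples, n_tracers) stream concentrations; names: tracer labels.
    r2_min is a hypothetical screening threshold on the squared Pearson
    correlation, not a value taken from the manuscript.
    """
    r = np.corrcoef(conc, rowvar=False)       # tracer-by-tracer correlations
    pairs = []
    for i, j in itertools.combinations(range(len(names)), 2):
        r2 = float(r[i, j] ** 2)
        if r2 >= r2_min:
            pairs.append((names[i], names[j], round(r2, 3)))
    return pairs
```

Pairs returned by such a screen would then be added to (or removed from) the tracer set one at a time while watching how well the model reproduces stream concentrations.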
One of the main points of our paper is that there is no guaranteed method for identifying conservative tracers, and so tracer selection is best done as part of an iterative modeling process. Such an approach is greatly facilitated by a flexible, yet user-friendly and fast software package such as EMMALAB. In fact, in one paper (Liu et al., 2017, Water Resources Research, 44, W12433) the reviewer and his coauthors seem to advocate simply retaining a number of tracers that is one less than the number of endmembers, based on their residual plots from the initial model runs. However, this appears to us to imply that they must have 1) assumed an initial set of likely conservative tracers, 2) used a PCA based on that initial tracer set to estimate a likely number of endmembers, and 3) selected the tracers to retain based on their residuals. Although we do not advocate retaining a number of tracers one less than the number of endmembers (see below), this process is quite similar to ours and is certainly iterative to some degree. Therefore, a software package that facilitates rapid iteration would certainly benefit even the process Prof. Liu advocates.
Reviewer Comment #2:
DTMM relies solely on streamflow chemistry, without any information from end-members, to determine conservative tracers and the number of end-members. DTMM can also be used to evaluate the eligibility of end-members (through calculations of end-member distances between S- and U-space). Since 2008, combining DTMM with the principal component-based EMMA developed by Christophersen and Hooper in 1992 (EMMA-1992) has dominated hydrograph separations in catchment hydrology (many references can easily be found). EMMALAB, as reviewed here, does not provide a statistically testable procedure to determine conservative tracers and the number of end-members. Instead, bivariate plots were used to determine conservative tracers, and solutes were then (confusingly) added to and deleted from the list through trial and error. The determination of the number of end-members remained completely subjective. As a result, the solute concentrations were forced to fit in the mixing space. In this case, if any non-conservative solutes are included in the analysis, which violates assumption #1 above, “beautiful” numerical results can still be obtained and tracer concentrations can be “well” simulated with the EMMALAB procedure, but they are not guaranteed to be hydrologically meaningful (I have more on this point later).
Author Response #2
Prof. Liu is mistaken here. The DTMM does provide a way to evaluate potential endmembers by calculating their distance from the hyperplane formed by the retained PCs, but this is emphatically not a “statistical test”. Rather, it is a way to rank potential endmembers in terms of likelihood that they are major contributors to stream chemistry, given previously chosen sets of tracers and retained PCs, which we admit can be useful. However, this raises the question of how many endmembers should be chosen. Prof. Liu advocates choosing one more endmember than the number of retained PCs, but this is actually a way to determine the **minimum** number of endmembers needed (see our discussion on P19 L503-510). And in any case, the number of PCs to retain **always** involves some level of subjectivity. We reviewed multiple common methods for choosing the number of endmembers, which indicated either 3 or 4. We chose 3, with our reasoning explained on P18 L493-497. As suggested in Hooper (2003), a residual analysis of the model can be calculated and viewed for structure to determine the number of PCs to retain. This method improves upon previous methods, such as “the rule of one”, to provide a more formal approach to selecting PCs, but does not remove all subjectivity from selecting which PCs to retain. This residual analysis is actually integrated into EMMALAB under the “PCA” tab.
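The endmember-distance diagnostic discussed here can be sketched as an orthogonal projection: standardize each candidate endmember with the stream statistics, project it onto the retained PCs, and measure what is left over. This is a minimal sketch; the function name `endmember_distances` is hypothetical, and, as argued above, the resulting distances rank candidates rather than constituting a statistical test.

```python
import numpy as np


def endmember_distances(stream, endmembers, k):
    """Distance of each candidate endmember from the k-PC stream subspace.

    Endmembers are standardized with the stream mean and standard deviation,
    projected onto the first k PCs of the standardized stream data, and
    compared with their projections. Returns absolute distances (standardized
    units) and distances relative to each standardized endmember's length.
    """
    mu = stream.mean(axis=0)
    sd = stream.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd((stream - mu) / sd, full_matrices=False)
    vk = vt[:k].T                             # retained PC loadings
    b = (endmembers - mu) / sd                # standardized endmembers
    b_hat = b @ vk @ vk.T                     # orthogonal projection onto U-space
    dist = np.linalg.norm(b - b_hat, axis=1)
    return dist, dist / np.linalg.norm(b, axis=1)
```

Candidates that actually generate the stream chemistry sit on (or near) the U-space hyperplane; a candidate far from it is unlikely to be a major contributor for the chosen tracers and k.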
As we discussed on P12 L289-302 and P15 L396-399, the number of PCs retained is not as critical for our procedure, because we advocate performing the mixing calculation in tracer space, rather than U-space. In other words, we retain ALL the available information to constrain the model, rather than discarding some arbitrary amount. Instead, the PCA is used solely as a tool to visualize which potential endmembers can reasonably circumscribe (at least most of) the stream data.
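Performing the mixing calculation in tracer space amounts to a constrained least-squares problem on the original (optionally rescaled) concentrations. A minimal sketch, assuming an equality-constrained solution via the KKT system; EMMALAB's own solver, tracer weighting, and any handling of negative fractions may differ, and the function name `tracer_space_fractions` is hypothetical.

```python
import numpy as np


def tracer_space_fractions(endmembers, sample, scale=None):
    """Least-squares endmember fractions for one stream sample, sum(f) = 1.

    endmembers: (n_endmembers, n_tracers); sample: (n_tracers,).
    scale: optional per-tracer divisor (e.g. stream standard deviation) so
    tracers in different units carry comparable weight; an assumption here.
    Solves min ||E^T f - c||^2 subject to sum(f) = 1 through the KKT system.
    Fractions are not constrained to be non-negative.
    """
    if scale is None:
        scale = np.ones(endmembers.shape[1])
    a = (endmembers / scale).T                # tracers x endmembers
    c = sample / scale
    p = a.shape[1]
    kkt = np.zeros((p + 1, p + 1))
    kkt[:p, :p] = 2.0 * a.T @ a               # normal-equation block
    kkt[:p, p] = 1.0                          # Lagrange multiplier column
    kkt[p, :p] = 1.0                          # sum-to-one row
    rhs = np.concatenate([2.0 * a.T @ c, [1.0]])
    return np.linalg.solve(kkt, rhs)[:p]
```

With noise-free synthetic data the known fractions are recovered exactly; with real data, every tracer beyond the n_endmembers − 1 needed to pin down the solution adds a constraint that the least-squares fit must balance.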
Prof. Liu then unfairly characterizes our modeling procedure as a simple curve-fitting exercise. First, he seems to be advocating a procedure in which the matrix equation for the mixing calculation is always critically determined (same number of linear equations as adjustable parameters) and so can be solved directly. However, choosing the maximum number of conservative tracers via trial and error, as we advocate, can turn it into an overdetermined problem that can be solved via optimization. Overdetermined equations are (by definition) more constrained, so clearly this part of our procedure militates against excessive model complexity. (EXAMPLE: Fitting a straight line in 2D space to two data points is a critically determined problem, i.e., it can be represented in terms of two linear equations and two unknowns. Fitting a straight line to 100 data points is an overdetermined problem, i.e., it can be represented in terms of 100 linear equations and two unknowns. It is generally acknowledged that fitting a line to 100 data points is more informative than fitting a line to two data points.) Second, while it is true that including more endmembers can unnecessarily complicate a model, we chose our endmembers based on what was needed to circumscribe our stream data in 3D U-space. As explained above, our choice of the number of PCs to retain necessarily involved some level of subjectivity, but it was hardly arbitrary.
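The equation-counting argument can be made concrete with a small helper. This is a sketch assuming that each tracer (or retained PC) contributes one independent mass-balance equation and that there is a single sum-to-one constraint; the function name `mixing_system_type` is hypothetical.

```python
def mixing_system_type(n_constraints, n_endmembers):
    """Classify the mixing equations for one stream sample.

    n_constraints: number of tracers (tracer space) or retained PCs (U-space).
    Adding the sum-to-one equation gives n_constraints + 1 equations in
    n_endmembers unknown fractions, assuming the equations are independent.
    """
    n_equations = n_constraints + 1
    if n_equations > n_endmembers:
        return "overdetermined"
    if n_equations == n_endmembers:
        return "critically determined"
    return "underdetermined"


# Seven tracers and five endmembers (the configuration in the abstract):
print(mixing_system_type(7, 5))   # overdetermined
# Four retained PCs and five endmembers (U-space):
print(mixing_system_type(4, 5))   # critically determined
```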
One of our main points that seems to have been completely lost in this discussion is that greater model complexity (i.e., a greater number of endmembers) may be justified for a number of reasons, including modeling the hydrology of larger watersheds or exploring the contributions of non-dominant water sources. And if so, more tracers are mathematically required to include more endmembers.
Reviewer Comment #3
In comparing the results of EMMA-1992 and EMMALAB, the authors adopted an incorrect perception of EMMA-1992 and did not follow the established DTMM procedure. The authors stated that only two or three PCs were allowed to be used in EMMA-1992. This assertion is not entirely wrong, but it is not fully accurate either. If the number of end-members is known (through DTMM or any independent tool with a statistically testable procedure), a number of PCs one less than the number of end-members should technically be retained to derive end-member contributions, as has been demonstrated in many studies. When six tracers and five end-members were determined by EMMALAB in the study being reviewed, 2, 3, and 4 PCs were used to solve for 5 end-member contributions (Figure 5 of the manuscript). This comparison is not apples-to-apples. If there are indeed five end-members, then 4 PCs should be used, and the results should be compared with those from EMMALAB. As a matter of fact, the results using 4 PCs were almost identical to those of EMMALAB (bottom panel of Figure 5), proving that EMMALAB did nothing significantly different from EMMA-1992. Note that I do not mean that the six tracers and five end-members determined by EMMALAB were correct; I simply take the authors’ own results to point out what went wrong with their analysis. Whether there should be 3, 4, or 5 end-members, there must be a statistically testable procedure to determine that.
Author Response #3
1) Yes, it is a common practice to use two or three PCs, as outlined in Christophersen and Hooper (1992). We never suggested that *only* two or three PCs are allowed, simply that it is the common practice (see P12 L292-293).
2) Again, one plus the number of tracers or retained PCs is the **minimum** number of endmembers that must be chosen. In other words, a cloud of data on a 2D plot can be circumscribed by a triangle drawn between three points, but it can also be circumscribed by four points. It is a mathematical fact, as evidenced by the following quotation from Hooper (2003): “Thus, if the rank of the data set is two (i.e., a plane), a minimum of three end-members is required for a mixing model.”
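The point about minimum versus admissible numbers of endmembers can be illustrated with a toy two-tracer example (hypothetical values, not study data): four endmembers circumscribe the sample just as three would, but with only two tracers the fractions are no longer unique, which is why more endmembers call for more tracers.

```python
import numpy as np

# Hypothetical example: four endmembers at the corners of a unit square in a
# two-tracer space, and one stream sample at the centre of the square.
endmembers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
stream = np.array([0.5, 0.5])

# Mass-balance system: one row per tracer plus the sum-to-one row.
a = np.vstack([endmembers.T, np.ones(len(endmembers))])
b = np.concatenate([stream, [1.0]])

rank = np.linalg.matrix_rank(a)        # independent equations
free_dims = a.shape[1] - rank          # dimensions of non-uniqueness

# Two different non-negative fraction vectors that both reproduce the sample.
f1 = np.array([0.5, 0.0, 0.0, 0.5])
f2 = np.array([0.0, 0.5, 0.5, 0.0])
print(rank, free_dims)                                  # 3 1
print(np.allclose(a @ f1, b), np.allclose(a @ f2, b))   # True True
```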
3) Prof. Liu seems to confuse the application of statistics by suggesting that DTMM has “statistically testable” procedures and that our study lacks statistical rigor. We reiterate the fact that there is nothing in the DTMM that can be properly called a “statistical test”.
4) We agree that “the results using 4 PCs were almost identical with those of EMMALAB.” We specifically pointed this out to illustrate the fact that the more PCs you retain, the more closely the U-space mixing solutions converge on the tracer-space mixing solutions. In fact, 4 PCs retained 96.4% of the data variance, so what else would one expect in that case? We also used this to demonstrate that artificially restricting the analysis to two or three dimensions (as is very commonly done) can drastically reduce accuracy. However, this has nothing to do with “proving that EMMALAB did nothing significantly different from EMMA-1992.” It is a simple illustration of the fact that there is no reason to perform the mixing calculation in dimensionally reduced space, which necessarily involves constraining the calculation with less information.
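The convergence noted in point 4 can be reproduced with synthetic data by solving the same constrained least-squares mixing problem on the first k PC scores and on the full set of standardized tracers. This sketch assumes both are computed on concentrations standardized by the stream mean and standard deviation; in that case, retaining all PCs is an orthogonal rotation and gives exactly the tracer-space solution, whereas a different tracer weighting would not coincide exactly. Function names are hypothetical.

```python
import numpy as np


def constrained_lsq(a, b):
    """min ||a f - b||^2 subject to sum(f) = 1, solved via the KKT system."""
    p = a.shape[1]
    kkt = np.zeros((p + 1, p + 1))
    kkt[:p, :p] = 2.0 * a.T @ a
    kkt[:p, p] = 1.0
    kkt[p, :p] = 1.0
    return np.linalg.solve(kkt, np.concatenate([2.0 * a.T @ b, [1.0]]))[:p]


def fractions_by_k(stream, endmembers, sample):
    """Fractions for one sample in k-PC U-space (k = 2..n_tracers) and in
    standardized tracer space, all using the stream mean and std."""
    mu = stream.mean(axis=0)
    sd = stream.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd((stream - mu) / sd, full_matrices=False)
    e = (endmembers - mu) / sd                # standardized endmembers
    s = (sample - mu) / sd                    # standardized sample
    out = {}
    for k in range(2, stream.shape[1] + 1):
        vk = vt[:k].T                         # first k PC loadings
        out[k] = constrained_lsq((e @ vk).T, s @ vk)
    out["tracer"] = constrained_lsq(e.T, s)   # no dimensionality reduction
    return out
```

For k smaller than the number of tracers, the U-space solution simply ignores whatever part of the sample lies off the retained PCs.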
5) This brings up another point about how Prof. Liu seems to be viewing EMMALAB. It is not, essentially, an alternate method of performing EMMA. Rather, it is a computer program that makes performing EMMA much faster and easier. The only new approach included in the program itself is the ability to perform the mixing calculations in tracer space, but it is also capable of performing the calculations in U-space, just as has always been done.
Reviewer Comment #4
Mixing is a linear combination of solute concentrations, while chemical equilibrium involves higher-order polynomial relations for ions with charges greater than 1. Principal component analysis (PCA) may not be a perfect tool, but it is the best available one for testing whether solute concentrations are linearly associated when a lower number of PCs is examined. It is true that the number of PCs is usually not recommended to be too high relative to the number of solutes (analytes) included in the analysis; otherwise, the results of chemical equilibrium can be approximated by simultaneous linear equations of PCs. This statement may be hard to follow, but an extreme example may help. Given six tracers resulting from the effects of mixing and chemical equilibrium (a hypothetical scenario, not the authors’ case), one can extract six PCs with linear expressions, which together explain 100% of the variance in the original concentrations (this is how PCA is routinely conducted). In many cases, one does not need all six PCs to explain close to 100% of the variance if some analytes are highly correlated (e.g., Ca²⁺ and HCO₃⁻, and δ¹⁸O and δ²H in this study).
Author Response #4
Prof. Liu seems to misunderstand how principal component analysis was applied in our study. While it is true that frequently all of the PCs do not need to be included to explain nearly 100% of the variance, this does not negate our point that it should essentially always be better to do the mixing calculations in the full tracer space, rather than dimensionally reduced space (see above). More information more strictly constrains the model. The percentage of data variance explained should always be considered when choosing the number of principal components retained, but there is no standard value for what that percentage should be, regardless of how many endmembers are used.
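The share of variance explained by the retained PCs can be read from the eigenvalues of the tracer correlation matrix, which is equivalent to PCA on standardized data. A minimal sketch with a hypothetical function name; it reports the cumulative fractions but, as noted above, no universal cutoff follows from them.

```python
import numpy as np


def cumulative_variance(stream):
    """Cumulative fraction of variance explained by successive PCs.

    PCA on standardized data is equivalent to an eigen-decomposition of the
    tracer correlation matrix; eigenvalues are sorted largest first.
    """
    corr = np.corrcoef(stream, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return np.cumsum(eig) / eig.sum()
```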
Reviewer Comment #5
In this case, if one uses a higher number of end-members (say really close to six), it is guaranteed to have a “sound” solution and “nice” projection for all analytes, no matter if one uses PCs or original concentrations. That is why the results between EMMA-1992 and EMMALAB in this study were so close as shown in the bottom panel of Figure 5. This PC-based limitation may have caused authors to state that only two or three PCs were allowed to be used in EMMA-1992. However, authors’ approach based on EMMALAB suffers from the same limitation if the conservative tracers and the number of end-members are not independently determined with a statistical test.
Author Response #5
Again, what statistical test is proposed? Or justified by the reviewer? In the paper, we provide standard mathematical metrics that are used to justify the choice of the number of PCs (see Sections 2.7 and 3.3).
Reviewer Comment #6
With five end-members, one can make up their values for six analytes and easily force them into a mixing space, as long as all stream samples are located inside the resulting convex hull (note that this is different from Xu and Harman 2022, who derived such a convex hull from the streamflow chemical data). One can imagine that the number of such convex hulls is unlimited. However, the results may not be hydrologically meaningful. Note that I am not supposed to reveal the perspective conveyed in this paragraph, as it is one of the major points of an EMMA review paper I am working on. Without this detailed explanation, however, I feel it would be hard to convince the authors and other readers. Some of my arguments above may be hard to follow, as many details have to be omitted. I hope the specific comments below help the authors grasp exactly what I mean. It looks like the authors have a very good data set. Please continue your work by reading more papers and finding a gap in EMMA research that can be filled. I believe your efforts will eventually pay off.
Author Response #6
This comment is irrelevant. We chose potential endmembers based on the known geological and hydrological features in the study area, and on the U-space projections. We are aware of methods to estimate endmember compositions based on convex hulls around the stream data, but these will only provide realistic estimates in cases where the stream data closely approaches 100% contribution from all the endmembers at various points in time. If we had chosen our endmembers via such a method, Prof. Liu’s comment about using too many analytes might have been appropriate. However, our endmember samples were collected in the field, so we did not have to use methods for estimating endmember compositions.
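Whether a chosen set of endmembers circumscribes every stream sample, the geometric condition that both the reviewer and the authors refer to, can be checked as a linear-programming feasibility problem. This is a sketch using SciPy's `linprog` with a hypothetical function name; it verifies circumscription only and says nothing about whether the endmembers are hydrologically meaningful.

```python
import numpy as np
from scipy.optimize import linprog


def inside_hull(endmembers, point):
    """True if `point` is a convex combination of the endmember rows.

    Feasibility problem: find f >= 0 with sum(f) = 1 and E^T f = point.
    Coordinates may be tracer concentrations or U-space scores.
    """
    p = endmembers.shape[0]
    a_eq = np.vstack([endmembers.T, np.ones(p)])
    b_eq = np.concatenate([point, [1.0]])
    res = linprog(np.zeros(p), A_eq=a_eq, b_eq=b_eq,
                  bounds=[(0, None)] * p, method="highs")
    return res.status == 0                    # 0 = feasible, 2 = infeasible
```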
Author Summary Statement
1) Prof. Liu’s contention that Hooper’s DTMM includes methods to “statistically test” the validity of tracer and endmember selections is incorrect.
2) Optimizing overdetermined mixing models is generally preferable to solving critically determined mixing models, because overdetermined models are more strictly constrained.
3) Choosing the maximum number of conservative tracers turns the mixing calculation into an overdetermined problem. Therefore, it is nearly always preferable to perform the mixing calculations in tracer space, using the maximum number of conservative tracers available.
Citation: https://doi.org/10.5194/egusphere-2025-2053-CC1
RC2: 'Comment on egusphere-2025-2053', Anonymous Referee #2, 24 Sep 2025
The objectives of the paper “Improved endmember mixing analysis (EMMA): Application to a snowmelt-dominated stream in northern Utah” are to critically evaluate past EMMA approaches and to introduce the new software package EMMALAB, designed to allow easy implementation of endmember mixing analysis (EMMA). The authors suggest that the EMMALAB package is an improved and more flexible tool than those predominantly employed in the past. To support this claim, the authors used a robust dataset collected in the upper Provo River watershed to illustrate the use of EMMALAB and to support the implementation of their model. Despite the robust dataset, I felt that the authors did not implement an appropriate workflow for an adequate comparison between EMMA and EMMALAB, and therefore did not properly evaluate EMMALAB or illustrate its use case.
The authors justify the use of EMMALAB using a wide array of both endmembers and tracers, suggesting improved model selection support. However, basic knowledge of the study site suggests that certain tracers should have been omitted before performing an EMMA (in EMMALAB or otherwise). For example, the authors included Na⁺ and Cl⁻ as tracers even though knowledge of the study site suggests otherwise: as the authors mention, the Provo River watershed contains a highway that is known to be treated with road salt in winter. Although this is not an assumption required by mixing models, physical realism should also be considered before tracer selection. This would have resulted in the removal of both Na⁺ and Cl⁻ as tracers, or in better characterization of a road salt endmember in the initial sampling design.
I also found the explanation of how EMMALAB integrates with other MATLAB packages confusing. It was unclear which functionality came from EMMALAB and which from MATLAB. On several occasions the authors refer to an app without explaining what it is. I assumed that the app referred to EMMALAB, but this was not clear in the text. A workflow diagram highlighting the steps and tools would greatly help readers understand the process required to implement EMMALAB.
Was the end of season snow core that was collected and then melted used for isotopic analysis? It is well documented that the isotopic signature of the snowpack will change over time with snowpack fractionation which will impact the signature of runoff feeding streamflow. I think that this needs to be justified or you need to account for this process.
Specific Comments:
Line 20: is this catchment purely dominated by snow? Why was precipitation as rain not included as an endmember? It looks like you have several (small) rainfall events based on the shape of the hydrograph.
Line 31: please specify the added complexity that can be handled.
Line 44: can you please specify what characteristics define complex watersheds.
Line 74: how did you determine which solutes or isotopes behave conservatively as tracers?
Line 81: should the diverted water be included in the mixing analysis? Could you please include more details about the water diversion?
Lines 80-84: repeats material on Line 65; suggest merging.
Figure 1: suggest including the flow monitoring stations on the map.
Figure 1: could you please add the location of highway to the study map?
Line 110: based on the discharge data it looks as though this catchment is subject to inter-season freeze/thaw or rain on snow events. How would that impact evaluation of the snowpack endmember? Should more frequent sampling of the snowpack be done to improve characterization of this endmember?
Line 179: More details are needed on the mixing calculations for quick and thorough analysis of potential tracers.
Section 2.6: this section would really benefit from a workflow diagram. It would also be useful to convey how EMMALAB is integrated with other “standard” MATLAB packages.
Line 295: the comment about noise is interesting. It would have been nice to see an example of this.
Line 324: this is the first time the authors mention the app, with no description of what this is referring to.
Section 2.9: it would be beneficial to provide reference material to support this approach to error analysis.
Line 404: why provide functionality for something that is not recommended by the authors?
Line 426: why use bivariate plots if not recommended?
Line 429: recommendations are being made prior to displaying the results.
Line 444: you do not provide sufficient rationale in the text for 0.8.
Lines 533 to 534: you do not actually compare two different methodologies, so this statement cannot be made. You also cannot make a statement like this without some type of statistical evaluation.
Section 3.4.1: the authors provided limited support for why this approach is the most accurate. The authors do not implement a research approach that allows comparison of “traditional” EMMA and the use of EMMALAB. This comparison is not sufficient or appropriate.
Line 582: you can use a dual isotope plot and calculate deuterium-excess to evaluate the effects of evaporation.
Line 589 to 590: I would argue that we should first explain things in the context of physical terms. Not the other way around.
Line 618: did you provide any evidence of this in your introduction?
Citation: https://doi.org/10.5194/egusphere-2025-2053-RC2
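The evaporation check suggested in RC2's comment on Line 582 can be sketched with two quantities: deuterium excess, d = δ²H − 8·δ¹⁸O, and the slope of a δ²H versus δ¹⁸O regression. A minimal sketch with hypothetical function names; d-excess values lower than those of local precipitation, and regression slopes well below the global meteoric water line slope of 8, are commonly read as signs of evaporative enrichment.

```python
import numpy as np


def deuterium_excess(d18o, d2h):
    """Deuterium excess in per mil: d = d2H - 8 * d18O."""
    return np.asarray(d2h, dtype=float) - 8.0 * np.asarray(d18o, dtype=float)


def dual_isotope_line(d18o, d2h):
    """Least-squares slope and intercept of d2H regressed on d18O."""
    slope, intercept = np.polyfit(d18o, d2h, 1)
    return slope, intercept
```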
Model code and software
EMMALAB v. 1.1: Software for improved endmember mixing analysis A. N. Thompson et al. https://www.hydroshare.org/resource/90ad78faec9f41c180d9057b9e815785/
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
769 | 96 | 18 | 883 | 26 | 9 | 24 |
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1
As stated in the manuscript, the primary objectives of this study were (1) to critically evaluate previous EMMA approaches and (2) to introduce a new software package (EMMALAB) to facilitate a rapid, iterative modelling process that yields more reliable results and can accommodate maximum complexity. In my opinion, after reading the manuscript carefully, I do not think both objectives were justified or achieved. Primarily due to misunderstanding of EMMA and its mathematical procedure, the research design is fatally flawed and the comparison in the results between EMMA and EMMALAB was not made on an ample-to-ample basis. Recent efforts and progresses made by many researchers were mostly overlooked. I will elaborate my point of view below.
For a mixing model in general, several assumptions must be met (e.g., Hooper and Shoemaker, 1986): (1) solutes must be conservative; (2) the number of end-members is known; (3) solute concentrations are distinct over end-members for at least one solute; (4) solute concentrations in end-members are constant over time or temporal variations are known; and (5) solute concentrations in end-members are constant over space or treated as different end-members. The evaluation of the above assumptions #1 and #2 used to rely on a catchment hydrologic analysis and often remained very challenging. Not until 2003 when Hooper developed diagnostic tools of mixing models (DTMM) did the reinforcement of the assumptions #1 and #2 become statistically testable (primarily through analysis of distributions of U-space projected residuals against measured solute concentrations). DTMM relies solely on streamflow chemistry, without any information from end-members, to determine conservative tracers and the number of end-members. DTMM can also be used to evaluate the eligibility of end-members (through calculations of end-member distances between S- and U-Space). Since 2008, combining DTMM and principal component-based EMMA developed by Christophersen and Hooper in 1992 (EMMA-1992) has dominated hydrograph separations in catchment hydrology (many references can be easily found). EMMALAB that is being reviewed did not provide a statistically testable procedure to determine conservative tracers and the number of end-members. Instead, bivariate plots were used to determine conservative tracers and then (confusedly) added and deleted solutes from the list through trial and error. The determination of the number of end-members remained completely subjective. As a result, the solute concentrations were forced to fit in the mixing space. 
In this case, if any non-conservative solutes are included in the analysis, which violates the assumption #1 above, “beautiful” numerical results can still be obtained and tracer concentrations can be “well” simulated with the EMMALAB procedure, but will not be guaranteed to be hydrologically meaningful (I have more on this point later).
In comparison of the results between EMMA-1992 and EMMALAB, authors adopted an incorrect perception of EMMA-1992 and did not follow the established procedure of DTMM. Authors stated that only two or three PCs were allowed to be used in EMMA-1992. This assertion is not incorrect but not totally true. If the number of end-members is known (through DTMM or any independent tools with a statistically testable procedure), PCs with that number less one should technically be retained to derive end-member contributions, which has been demonstrated in many studies. When six tracers and five end-members were determined by EMMALAB in the study being reviewed, 2 PCs, 3 PCs, and 4 PCs were used to solve 5 end-member contributions (Figure 5 of the manuscript). This comparison is not based on ample-to-ample. If there are indeed five end-members, then 4 PCs should be used and its results should be compared with those using EMMALAB. As a matter of fact, the results using 4 PCs were almost identical with those of EMMALAB (the bottom panel of Figure 5), proving that EMMALAB did nothing significantly different from EMMA-1992. Note that I do not mean six tracers and five end-members determined by EMMALAB were correct, but just take authors’ own results to point out what went wrong with their analysis. Whether or not there should be 3, 4, or 5 end-members, there must be a statistically testable procedure to determine that.
Mixing is a linear combination of solute concentrations, while chemical equilibrium involves higher order of polynomial relations with ions having charges greater than 1. Principal component analysis (PCA) may not be a perfect but is best available tool to test whether or not those solute concentrations are linearly associated when a lower number of PCs is examined. It is true that the number of PCs is usually not recommended to go too high relative to the number of solutes (analytes) included in the analysis. Otherwise, the results of chemical equilibrium can be approximated by simultaneous linear equations of PCs. This statement may be hard to follow, but an extreme example may be helpful. Given six tracers resulting from impacts of mixing and chemical equilibrium (just a scenario not authors’ case), one can extract six PCs with linear expressions, all of which together explain 100% of variance included in the original concentrations (this is how PCA is routinely conducted). Sometimes if not most times, one does not have to use all six PCs to have eigenvalues close to 100% if some analytes have a higher correlation coefficient (e.g., Ca2+ and HCO3- and 18O and 2H in this study). In this case, if one uses a higher number of end-members (say really close to six), it is guaranteed to have a “sound” solution and “nice” projection for all analytes, no matter if one uses PCs or original concentrations. That is why the results between EMMA-1992 and EMMALAB in this study were so close as shown in the bottom panel of Figure 5. This PC-based limitation may have caused authors to state that only two or three PCs were allowed to be used in EMMA-1992. However, authors’ approach based on EMMALAB suffers from the same limitation if the conservative tracers and the number of end-members are not independently determined with a statistical test. 
With five end-members, one can make up their values for six analytes and easily force them into a mixing space, as long as all stream samples fall inside the convex hull they define (note that this differs from Xu and Harman (2022), who derived such a convex hull from the streamflow chemistry data). One can imagine that the number of such convex hulls is unlimited. However, the results may not be hydrologically meaningful. Note that I would not normally reveal the perception conveyed in this paragraph, as it is one of the major points of an EMMA review paper I am working on. Without this detailed explanation, however, I feel it would be hard to convince the authors and other readers.
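A small sketch of why an enclosing hull alone is not sufficient: two invented sets of three end-members in a 2-D mixing space (e.g., U1-U2 scores) both enclose the same hypothetical stream samples, so all fractions are non-negative in both cases, yet the fractions differ. The `barycentric` helper and all coordinates are illustrative assumptions.

```python
import numpy as np


def barycentric(tri, pts):
    """Mixing fractions of points with respect to a triangle of three
    end-members in a 2-D mixing space: solve [tri^T; 1 1 1] f = [pt; 1]."""
    A = np.vstack([tri.T, np.ones(3)])         # (3, 3)
    B = np.vstack([pts.T, np.ones(len(pts))])  # (3, n)
    return np.linalg.solve(A, B).T             # (n, 3), rows sum to 1


# Hypothetical stream samples in a 2-D mixing space
rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=(30, 2)) * 0.5

# Two different, invented end-member triangles; both enclose all samples
tri_a = np.array([[-2.0, -2.0], [3.0, -1.0], [0.0, 3.0]])
tri_b = np.array([[-3.0, 1.0], [2.0, -3.0], [2.0, 3.0]])

f_a, f_b = barycentric(tri_a, samples), barycentric(tri_b, samples)
inside_a = (f_a >= 0).all()  # every sample has non-negative fractions
inside_b = (f_b >= 0).all()
print(inside_a, inside_b)         # both True
print(np.abs(f_a - f_b).max())    # yet the fractions differ
```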
Some of my arguments above may be hard to follow, as many details had to be omitted. I hope the specific comments below help the authors grasp exactly what I mean. The authors appear to have a very good data set. Please continue this work by reading more of the literature and identifying a gap in EMMA research that can be filled. I believe your efforts will eventually pay off.
Specific Comments (P refers to page and L refers to Line of the original manuscript):
P1/L15: Specify what complexity this study unravels.
P1/L15-17: I do not see that this statement has been proven in this study.
P1/L17-18: How was this iterative approach statistically evaluated and justified? Given six tracers and five end-members with a few strongly correlated pairs of tracers (e.g., 18O and 2H, and Ca2+ and HCO3-), you can always obtain a mathematical solution and reproduce their concentrations well.
P1/L29: Some studies have started using EMMA to understand and quantify sources of groundwater recharge (e.g., Hofmeister et al., 2022).
P2/L31: Specify complexity.
P2/L34: It is better to use specific conductance (SC) instead (SC is EC standardized to 25 °C).
P2/L35: Many references, some of which are very important, are missing here.
P2/L36: The most important characteristics of EMMA are missing. EMMA, in conjunction with DTMM, has been used to determine the rank of the data (the number of end-members), identify conservative tracers, screen end-members for eligibility in hydrograph separations, and validate the results. The flawed design of this study was primarily caused by a misunderstanding of EMMA, particularly its most recent developments.
P2/L36: “cluster of data” is not an accurate phrase; “rank of data” is more accurate. Please use the exact language of the cited works for key terms.
P2/L43: “more types of data” is vague. The correct concept is that EMMA is not limited by the number of tracers used, in contrast to mixing models developed before 1992 (e.g., two tracers for three end-members).
P2/L44: Avoid phrases such as “our impression”. Cite references or summarize other studies with references.
P2/L45: A procedure has been proposed (e.g., Liu et al., 2008). Performing PCA is straightforward.
P2/L46: You have to elaborate more and justify this statement using examples.
P2/L49-50: Again, this is not true. The authors need to find all the important references.
P2/L51-53: Incorrect or inaccurate statement! That is not what at least one of the cited groups meant.
P2/L55-56: This statement is not necessarily incorrect, but it is misleading without adequate context (see my relevant comment in the general comments above).
P2/L58-60: This statement challenges the backbone of EMMA, which is okay. The fundamental idea of EMMA is to use the correlation matrix and a lower mixing dimension to identify the rank of the data and to remove effects caused by chemical reactions, noise, and errors (e.g., analytical errors). We are mostly looking for major end-members, and it appears impossible, at least for now, to find all end-members. A catchment, no matter how small, may have a myriad of end-members.
P2/L61: You are not really evaluating EMMA, as you did not even follow the established procedure, which combines DTMM and EMMA-1992.
P2/L62: The second part of your goal was to force the original values of analytes into a mixing space without a formal statistical procedure to identify and test conservative tracers and the number of end-members.
P3/L65: Some studies have already addressed catchments of hundreds to thousands of km2 (e.g., Merced River). Again, please conduct a thorough search of the EMMA literature.
P3/L71: Add some more recent developments and applications.
P3/L70-78: An important step is missing: validation of the EMMA model. In light of recent references, end-member distances are also ignored; they turn out to be very important in validating end-members.
P4/L90: Nothing is wrong here, but as a precaution: is there any elevational effect on stable isotopic values? A relief of 540 m could significantly affect isotopic values in stream water and groundwater (e.g., 0.22 ‰ per 100 m for 18O in the Sierra Nevada; see Liu et al., 2024).
P4/L91-94: This approach assumes that the yield is constant across elevational bands, which is usually not the case for snowmelt-dominated mountain systems (see Rice et al., 2012; Liu et al., 2024). Practically, there is no easy solution to this problem, but at least the error sources should be acknowledged and relevant references cited.
P5/L108: Briefly mention that WY2019 and WY2020 were not sampled and why (e.g., the pandemic?). I know this is indicated in the figure, but such a statement is still needed in the text.
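As a rough magnitude check, assuming (only for illustration) that the Sierra Nevada gradient cited above applied in this catchment:

$$
\Delta\delta^{18}\mathrm{O} \approx \frac{0.22\ \text{‰}}{100\ \mathrm{m}} \times 540\ \mathrm{m} \approx 1.2\ \text{‰}
$$

The actual gradient in the upper Provo River watershed would need to be established from local data.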
P6/L125-126: Not a valid statement but practically okay.
P6/L133: Just start with “Samples were analyzed …” and no need to mention collection as the collection was described in the above section.
P6/L135-136: Were all analytes tested or used in EMMALAB? This is never mentioned in the manuscript. If trace elements were never used, do not mention them.
P6/L138: Was “MS” a typo? OES instead?
P7/L151-159: Hooper's (2003) residual analysis is still the best, as it enables a statistical test (e.g., p-value). In recent developments (Liu et al., 2020; Porter et al., 2022), the variance explained for each tracer was further explored, which may deserve the authors' attention.
P7/L162-163: Based on what? Subjective!
P7/L163-165: Which ones are common conservative tracers? How was that determined? Solutes that behave conservatively in one catchment do not necessarily behave the same way in another, even in adjacent or nested catchments (e.g., Liu et al., 2013; 2017).
P7/L168-169: The residual analysis of Hooper (2003) enables users to select tracers more objectively (e.g., based on p-value or percent of variance explained).
P7/L174: Specify why the data were “standardized”. Cite Hooper (2003) here if the reason is the same as his.
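A NumPy-only sketch of residual-based tracer screening in the spirit of Hooper (2003): project the standardized data onto the first p eigenvectors of the correlation matrix, de-standardize, and inspect each tracer's residuals. The normalization by standard deviation and the function name `relative_residuals` are illustrative assumptions, not the exact published RRMSE definition; a p-value for the residual-versus-observed relationship could be obtained from a standard regression test (for example, `scipy.stats.linregress`), omitted here to keep the sketch dependency-free.

```python
import numpy as np


def relative_residuals(X, p):
    """Per-tracer RMS residual after projecting standardized data onto the
    first p eigenvectors of the correlation matrix and de-standardizing.
    Normalized by each tracer's standard deviation (an illustrative choice)."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    V = eigvec[:, np.argsort(eigval)[::-1][:p]]
    R = X - ((Z @ V @ V.T) * sd + mu)  # residuals in original units
    return np.sqrt(np.mean(R**2, axis=0)) / sd, R


# Hypothetical example: 5 conservative tracers mixed from 3 end-members,
# plus one tracer carrying no mixing signal (pure noise).
rng = np.random.default_rng(5)
E = rng.uniform(1.0, 10.0, size=(3, 5))
X_cons = rng.dirichlet(np.ones(3), size=60) @ E
X_all = np.column_stack([X_cons, rng.normal(5.0, 1.0, size=60)])

rel_cons, _ = relative_residuals(X_cons, p=2)  # ~0 for every tracer
rel_all, R_all = relative_residuals(X_all, p=2)
# Correlation of residuals with observed values; structure suggests misfit
r = [np.corrcoef(X_all[:, j], R_all[:, j])[0, 1] for j in range(X_all.shape[1])]
print(np.round(rel_all, 2), np.round(r, 2))
```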
P8/L181: Not an accurate or complete statement. Standardization not only treats each variable in the original data set with the same weight but also guarantees that PCA will be conducted with a correlation matrix rather than a covariance matrix.
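A quick numerical check of this point, using invented lognormal data on very different scales: the covariance matrix of the z-scores equals the correlation matrix of the raw data, whereas the first PC of the raw covariance matrix is dominated by the solute with the largest numerical range.

```python
import numpy as np

# Hypothetical data: 50 samples x 4 solutes on very different scales
rng = np.random.default_rng(3)
X = rng.lognormal(mean=[0.0, 2.0, 4.0, 6.0], sigma=0.5, size=(50, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-scores
cov_Z = np.cov(Z, rowvar=False)                    # covariance of z-scores
corr_X = np.corrcoef(X, rowvar=False)              # correlation of raw data
print(np.allclose(cov_Z, corr_X))                  # True

# Without standardization, the leading eigenvector of the raw covariance
# matrix is dominated by the fourth (largest-scale) solute.
w, v = np.linalg.eigh(np.cov(X, rowvar=False))     # ascending eigenvalues
print(np.round(np.abs(v[:, -1]), 3))
```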
P8/L189-190: It does not have to be this way. In fact, there may be advantages to using individual samples from each potential end-member (see recent developments from Liu et al., 2020; Porter et al., 2022; Tshewang et al., 2024).
P9/L202-210: Unlike Hooper (2003), the approach suggested here selects end-members solely based on 3- or 4-dimensional mixing diagrams: if they appear to fit in the mixing diagrams, they are deemed eligible. This forces raw concentrations of analytes into a mixing space. Instead, Hooper (2003) suggested calculating end-member distances between S-space and U-space to make a statistical test and thus objectively determine the eligibility of potential end-members. The authors' approach in this study lacks any statistical test.
P9/L215: Specify why three PCs were used.
P9/L218: Rz is a matrix of standardized values of the original data, not the original data per se. The product of Rz and Cr has to be de-standardized.
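A sketch of an end-member distance calculation, assuming one simple form: the Euclidean distance between an end-member's standardized composition (S-space) and its orthogonal projection into the p-dimensional mixing subspace (U-space) defined by the stream samples, divided by the norm of the standardized composition. The normalization and the function `end_member_distance` are illustrative; published definitions may differ in detail, and a statistical test would be built on top of such distances.

```python
import numpy as np


def end_member_distance(X_stream, em, p):
    """Relative distance between an end-member and its projection into the
    p-dimensional mixing subspace of the stream samples (standardized units;
    this normalization is an assumption for illustration)."""
    mu, sd = X_stream.mean(axis=0), X_stream.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(X_stream, rowvar=False))
    V = eigvec[:, np.argsort(eigval)[::-1][:p]]
    b = (em - mu) / sd     # end-member in standardized S-space
    b_hat = V @ (V.T @ b)  # projection into U-space, expressed in S-space
    return np.linalg.norm(b - b_hat) / np.linalg.norm(b)


# Hypothetical example: 3 end-members, 5 tracers, exact stream mixtures
rng = np.random.default_rng(7)
E = rng.uniform(1.0, 10.0, size=(3, 5))
X = rng.dirichlet(np.ones(3), size=50) @ E

good = end_member_distance(X, E[0], p=2)  # a true end-member
bad = end_member_distance(X, E[0] + np.array([0, 0, 3.0, 0, 0]), p=2)  # distorted
print(f"{good:.1e} {bad:.2f}")
```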
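In generic notation (chosen here for illustration and not necessarily matching the manuscript's symbols), with Z the standardized stream data, V_p the p retained eigenvectors of the correlation matrix, and x̄_j and s_j the mean and standard deviation of tracer j, the projection and its back-transformation are:

$$
\hat{\mathbf{Z}} = \mathbf{Z}\,\mathbf{V}_p\mathbf{V}_p^{\top},
\qquad
\hat{x}_{ij} = \hat{z}_{ij}\,s_j + \bar{x}_j
$$

Only the de-standardized values x̂_ij can be compared with measured concentrations.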
P11/L260: This section, particularly the first half (before but including equation 9) could be significantly simplified if references were properly cited.
P11/L263-264: Who has proven this? You can argue about it, make it a hypothesis, and then test it.
P11/L266: I am not aware of any study that has used EMMA for sediment samples; add references if you know of any. Also, it has been used in groundwater systems (e.g., Hofmeister et al., 2022).
P12/L293-295: This statement is unfair and rather subjective. Several publications have tried to solve the issue, or at least made efforts to resolve it. I do not think the approach proposed in the current study is objective (see my comments on the determination of end-members). In addition, we are usually looking for major end-members and discarding minor ones. I do not think we can ever account for all end-members using EMMA or any other tool.
P13/L326: Were only 3 PCs used in your study, even for 5 end-members?
P13/L342-344: This statement is ambiguous. How is the decision here justified?
P14/L364: This entire section is not properly constructed. Present your results before trying to convince readers and making recommendations.
P14/L365: This statement is not fair and not true. Diagnostic tools and mixing models were developed to determine conservative tracers and the number of end-members through a statistical test.
P14/L378-379: This effort is much appreciated. Before it becomes useful, however, fundamental concept and design must be corrected.
P15/L388: Inappropriate citation for the specific statement.
P15/L392: This is too casual and subjective.
P15/L395: The essence of Hooper (2003) was not to use bivariate plots for testing conservative tracers.
P15/L396: How is the number of end-members determined in EMMALAB?
P15/L399-400: This procedure is subjective and against Christophersen and Hooper (1992) who suggested that polygons (hulls) must be convex not concave.
P15/L403-404: That part may turn out to be the most useful if DTMM is strictly followed.
P15/L409-414: This is too subjective in evaluating the eligibility of end-members. Why not use end-member distances?
P16/L416: This section has several problems: (1) it relies on bivariate plots that the authors themselves seem to believe are fundamentally flawed; (2) the authors' approach has no basis in statistical tests for evaluating solutes and is too subjective; (3) there is ambiguity in using simulated solute concentrations to evaluate whether a solute should be included, because a poor simulation may be caused by poor characterization of end-members, not necessarily by nonconservative behavior of the solute. Note that DTMM does not rely on any information from end-members in determining the mixing space, which is certainly superior.
P16/L417-418: This statement sounds odd. It gives the impression that the number of end-members is a subjective decision.
P16/L419-420: Not true. Diagnostic tools of mixing models were developed to help determine conservative tracers, the number of end-members, and even the eligibility of end-members (via end-member distances).
P16/L421-422: This statement is not incorrect, but authors misinterpreted the essence of the diagnostic tools of mixing models by Hooper (2003).
P16/L422-424: Again, this statement reflects misunderstanding of the diagnostic tools of mixing models. Also, I do not think the statement is an accurate description of what the cited authors intended to say. Double check before citing them.
P16/L424-426: This issue can be handled by the diagnostic tools of mixing models. In fact, using bivariate plots to test conservative behavior of solutes could cause this issue instead.
P16/L426: Then, why did the authors of this study use bivariate plots? This statement is self-contradictory.
P16/L429: You have not yet shown readers any of your results, so how can you make recommendations? The entire Results and Discussion section appears to be constructed following this inappropriate philosophy.
P16/L433-434: Too subjective, and lacks a statistical test.
P16/L434-435: What are the different processes? Specify!
P16/L435-437: Conservative tracers vary greatly from catchment to catchment. Conclusions from one catchment may not apply to other catchments.
P16/L438-439: How? In any case, this kind of description should appear in the Methods section, not here.
P17/L444: Why 0.8?
P17/L445: Which processes, and how were their values affected? (Specify in the text; do not make the caption too long.)
P17/L457: Valid only if the intercept is set to zero. Also, a poor comparison may indicate poor characterization of one or more end-members, not necessarily poor performance of the solutes (tracers).
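A small illustration with invented numbers: predictions carrying a constant bias of +2 units still give an ordinary least-squares slope of 1 and an R² of 1, whereas a regression forced through the origin exposes the bias.

```python
import numpy as np

# Hypothetical observed vs. predicted concentrations: predictions carry a
# constant bias of +2 units but otherwise track observations perfectly.
obs = np.linspace(5.0, 25.0, 21)
pred = obs + 2.0

slope_ols, intercept = np.polyfit(obs, pred, 1)  # line with intercept
slope_origin = (obs @ pred) / (obs @ obs)        # line through the origin
r2 = np.corrcoef(obs, pred)[0, 1] ** 2

print(round(slope_ols, 3), round(intercept, 3), round(r2, 3))  # 1.0 2.0 1.0
print(round(slope_origin, 3))                                   # 1.115
```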
P18/L482: Software is nothing if the approach is not statistically proven.
P18/L483: In the procedure of determining which solutes to include, have you not yet determined which end-members to use?
P18/L491-492: Subjective, as you admit later (P18/L497)!
P19/L501: Subjective, and lacks a statistical test.
P19/L504-507: If the number of end-members is known a priori, why should we stick to a particular number of PCs? We can use as many PCs as we want and have, given that we have an adequate number of solutes that are not highly correlated.
P20/L515: Concave polygons are not recommended by Christophersen and Hooper (1992). Before you use concave polygons, you have to demonstrate why they are statistically valid.
P20/L518: Was the 85.7% from five or three PCs?
P21/L533-534: Not a valid argument! Your comparison is not apples-to-apples. It is very simple: if you know a priori that there are five end-members, you should use four PCs. Per your own demonstration, when four PCs are used, the results are almost identical. How could this prove that the PCA-based approach is problematic?
P22/Figure 2 (the upper panel): If two PCs were used, how would you determine the contributions from five end-members? The same question applies to the central panel.
P22/Figure 2 (the bottom panel): The difference decreases significantly as the number of PCs increases, which is not a surprise. Also, with such a small difference, is it worth the effort?
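Whether an end-member polygon in a 2-D mixing space is convex can be checked mechanically. A minimal sketch, assuming the vertices are listed in boundary order and the polygon does not self-intersect (the sign test alone cannot detect self-intersecting, star-shaped polygons); the function name and coordinates are hypothetical.

```python
import numpy as np


def is_convex(vertices):
    """True if a simple polygon, with vertices listed in boundary order
    (either direction), is strictly convex: all consecutive edge cross
    products share one sign. Collinear vertices (cross = 0) return False."""
    v = np.asarray(vertices, dtype=float)
    d1 = np.roll(v, -1, axis=0) - v                        # edge i -> i+1
    d2 = np.roll(v, -2, axis=0) - np.roll(v, -1, axis=0)  # edge i+1 -> i+2
    cross = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    return bool(np.all(cross > 0) or np.all(cross < 0))


# Hypothetical end-member polygons in a 2-D mixing space (U1, U2)
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
dart = [(0, 0), (2, 1), (4, 0), (2, 4)]   # reflex vertex at (2, 1)
print(is_convex(square), is_convex(dart))  # True False
```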
P23/L541: When I see this phrase (“most accurate”), I am expecting to see how you proved it in the following sentences.
P25/L582: The evaporation effect can be examined using deuterium-excess.
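For reference, deuterium excess is d = δ2H - 8·δ18O; evaporation moves water along lines with slopes below 8 in δ2H-δ18O space and therefore lowers d. A minimal sketch with invented values; evaporated samples would be compared against the d-excess of local snow or precipitation rather than a fixed value.

```python
def deuterium_excess(d18o, d2h):
    """Deuterium excess in per mil: d = d2H - 8 * d18O."""
    return d2h - 8.0 * d18o


# Hypothetical values (per mil VSMOW): a snow-like sample and an
# evaporatively enriched sample lying below the meteoric water line
snow = deuterium_excess(-15.0, -110.0)
evaporated = deuterium_excess(-12.0, -100.0)
print(snow, evaporated)  # 10.0 -4.0
```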
P25/L583: Any evidence? You cannot just guess.
P27/L603-610: This approach relies entirely on the actual samples collected. Genereux's approach has a statistical basis.
P28/L618: This statement is beside the point.
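For contrast, assuming the reference is to Genereux's first-order (Gaussian) error propagation for a two-component separation, the uncertainty of the computed fraction follows from the uncertainties of the tracer concentrations rather than from which samples happen to be collected. The concentrations and uncertainties below are invented, and independent errors are assumed.

```python
import math


def two_component_fraction(c1, c2, cs):
    """Fraction of component 1 in a two-component mixture: (Cs - C2)/(C1 - C2)."""
    return (cs - c2) / (c1 - c2)


def fraction_uncertainty(c1, c2, cs, w1, w2, ws):
    """First-order propagation of uncertainties w1, w2, ws in C1, C2, Cs
    to the fraction f1 (independent errors assumed)."""
    d = c1 - c2
    return math.sqrt(((c2 - cs) / d**2 * w1) ** 2   # df/dC1 * w1
                     + ((cs - c1) / d**2 * w2) ** 2  # df/dC2 * w2
                     + (ws / d) ** 2)                # df/dCs * ws


f = two_component_fraction(10.0, 2.0, 6.0)
wf = fraction_uncertainty(10.0, 2.0, 6.0, w1=0.5, w2=0.5, ws=0.2)
print(f"f1 = {f:.2f} +/- {wf:.3f}")  # f1 = 0.50 +/- 0.051
```

Extending this to five end-members would require the partial derivatives of the multi-component solution.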