Using large-scale tracer-aided models to constrain ecohydrological partitioning in complex, heavily managed lowland catchments

Zheng, Hanwu; Tetzlaff, Doerthe; Birkel, Christian; Wu, Songjun; Sauter, Tobias; Soulsby, Chris

doi:10.5194/egusphere-2025-2166

Preprints

https://doi.org/10.5194/egusphere-2025-2166


05 Jun 2025


Abstract. Tracer-aided modelling (TAM) enhances ecohydrological process understanding, as stable water isotopes (ẟ¹⁸O and ẟ²H) can help constrain equifinality and provide complementary information beyond streamflow. Despite being primarily applied in rural (<100 km²) catchments with minimal disturbance, TAM may assess epistemic uncertainties from unrecorded human activities affecting streamflow, improving model reliability. This study investigated four sub-catchments (Berste, Wudritz, Vetschauer, and Dobra) in the heavily-managed Middle Spree River basin (ca. 2800 km²), in NE Germany, a strategically vital water resource supplying drinking water to Berlin, Germany’s capital, and sustaining agricultural and industrial demands. Detailed evaluation of ecohydrological water partitioning in this evapotranspiration (ET)-dominated region is complicated by heterogeneous land use, extensive hydraulic infrastructure and overall intensive management. We used the spatially distributed tracer-aided model STARR to simulate the effects of natural water storage-flux dynamics and management interventions on streamflow over a 6-year period. Seasonal isotope data used for calibration additionally to streamflow effectively captured subsurface runoff, with isotope fractionation intensity strongly linked to ET apportionment. This multi-criteria calibration helped reduce equifinality in complex systems with human-induced epistemic challenges. Epistemic errors were manifested as strong trade-offs between the information content of the different calibration constraints (i.e., streamflow and isotopes). Although compromised solutions occasionally failed to meet acceptable performance thresholds for both calibrated variables, such conflicts highlight potentially important mismatches in process representation. Our modelling framework shows the potential for informative insights from wider use of (even sparse) isotope data sets in tracer-aided modelling of complex, heavily managed catchments.

Received: 08 May 2025 – Discussion started: 05 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.



Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-2166', Anonymous Referee #1, 24 Jul 2025
General comments:
The study falls within the scope of HESS and is well written, with clear structure and fluent language. The quality of the figures is mixed and the methods used were insufficiently robust to provide any confidence in the generalizability of the results or conclusions. The study is broadly similar to several previous publications on multi-objective optimization using isotope tracers, and the new contribution, beyond replication of previous findings in a new location, is not yet clear. With revisions, this could be an excellent publication for HESS.
Specific comments:
I see three areas in need of substantial revision: study differentiation, calibration methodology and presentation of results.
The study looks quite similar to previous studies in other areas, some of which have not yet been referenced in the introduction or discussion; multi-objective optimizations using flow and isotopes have been coming out for many years, e.g.: (He et al., 2019; Holmes et al., 2023; Nan & Tian, 2024; Tafvizi et al., 2024; Tunaley et al., 2017). The novelty is currently unclear, and the authors should revise to highlight the specific aspects that are new (this will likely involve only minor changes to the text). Is it the study site (agricultural with substantial groundwater pumping) or the spatial discretization of the model? Or something else, perhaps relating to the analysis of the results?
A more fundamental issue with the present version is the methodology applied. Given the central importance of calibration to the study, the methods applied are not as robust and defensible as they ought to be for a publication. In particular:
The model was calibrated to optimize NSE. This metric has lost support as a calibration objective because as a squared error metric, it overemphasises peak flow timing, and leads to erroneously damped simulation variability (Gupta et al., 2009). Unsurprisingly, the presented model results had erroneously damped variability (low flows too high, high flows too low). Further, for sparse datasets (like the isotope series here) it is highly sensitive to individual points, as noted in the text. Why was this metric used in spite of its well-known deficiencies?

There was no validation or clear evaluation of the model. Shen et al. (2022) was referenced to justify this omission, but this does not excuse the absence of some other method than split-sample validation to test the calibrated models. There is currently no clear evidence that the final models are at all reliable and not just overfit to the calibration data. This might be corrected by using satellite or other data to justify the ‘trustworthiness’ of the models but it should be an explicit evaluation.

It seems only a single calibration trial was performed for each objective type. The final calibrated models will vary depending on the initial population for the genetic algorithm, and on the random seed used in mutating new solutions. It is therefore important to run several independent calibration trials for each objective, as a single trial may be an outlier or fail to generate solutions near the ‘true’ Pareto front (i.e., solutions that are actually as good as the model can do). Without multiple independent calibration trials, it remains possible and plausible that the poor quality solutions for Berste were simply a fluke.
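The concern about NSE damping simulated variability can be illustrated with a small numerical sketch. The functions below follow the standard NSE definition and the KGE decomposition into correlation, variability ratio and bias ratio from Gupta et al. (2009); the observed series and the damped simulation are invented purely for illustration and are not taken from the manuscript or its calibration.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation (r), variability ratio (alpha)
    and bias ratio (beta), as decomposed in Gupta et al. (2009)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

# Hypothetical observed flows (illustrative values only).
obs = np.array([1.0, 2.0, 4.0, 8.0, 3.0, 2.0])

# A simulation with perfect timing and mean but half the observed variability:
# low flows too high, high flows too low.
damped = obs.mean() + 0.5 * (obs - obs.mean())

nse_damped = nse(obs, damped)
kge_damped = kge(obs, damped)
print(f"NSE = {nse_damped:.2f}, KGE = {kge_damped:.2f}")
```

In this constructed case the damped simulation still scores an NSE of 0.75, while the KGE variability term exposes the error directly and the score drops to 0.5. This is only a toy example of why the choice of metric matters for the shape of the calibrated hydrograph.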

The presentation of the results would benefit greatly from revision in a few areas. In no particular order:
The presented time-series results have only the extreme end points of the Pareto front, not the ‘compromise’ solutions, basically throwing out the ‘multi-objectiveness’ in favor of one simulation or the other. Why show only outliers?

Figure 4 is mislabeled as showing the Pareto fronts, but it actually has both dominated and non-dominated solutions from the calibration. Either the figure or label needs to change.

Labeling can be challenging to decipher. For example, subfigure 7 c2 is apparently ‘BSI in schemes 2-5 for wet year of 2023’ while figure 8 c2 is ‘Vetschauer compromised solution in scheme 2-5’ (I don’t know which compromise solution, just that it is one). Some figures are quite reader-friendly (Figure 5 and 6 for example can be followed without taxing decoding). However, I was quite unable to read the alphabet soup of Table 4 even after writing out a ‘key’ on scrap paper to track the 4 item deep ‘respectively’ label linking processes to letters (I think at least one comma is missing from the list).

Returning to the mysterious compromised solution, the actual solution is not defined, only that it comes from the ‘middle part of the Pareto front’. Is it the optimal solution when equal weight is given to the flow and isotope KGE or was it just sort of eyeballed?

A final, minor, point: it was frustrating to be told about finicky model details like roughness coefficient values without knowing any of the model basics, which were relegated to the supplement. Certainly, detailed model descriptions are out of scope but it would be lovely to at least have a couple sentences so the reader knows how many soil layers there are or if there is lateral groundwater flow between cells without hunting down a separate document.
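Two of the points above, the distinction between dominated and non-dominated solutions and the definition of a 'compromise' solution, can be made concrete with a short sketch. It assumes two objectives that are both maximised (for example a streamflow score and an isotope score) and picks the compromise as the non-dominated solution with the best equal-weight sum, which is one possible definition among several and not necessarily the one used in the manuscript. The function names and score values are hypothetical.

```python
import numpy as np

def non_dominated_mask(scores):
    """Boolean mask of non-dominated rows; larger is better in every column."""
    scores = np.asarray(scores, dtype=float)
    mask = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        # Solution j dominates i if it is at least as good in all objectives
        # and strictly better in at least one.
        dominates_i = (np.all(scores >= scores[i], axis=1)
                       & np.any(scores > scores[i], axis=1))
        mask[i] = not dominates_i.any()
    return mask

def compromise_index(scores, weights=(0.5, 0.5)):
    """Index of the non-dominated solution with the highest weighted sum."""
    scores = np.asarray(scores, dtype=float)
    mask = non_dominated_mask(scores)
    weighted = scores @ np.asarray(weights, dtype=float)
    weighted[~mask] = -np.inf  # dominated solutions cannot be selected
    return int(np.argmax(weighted))

# Hypothetical (streamflow score, isotope score) pairs from one calibration run.
scores = np.array([
    [0.80, 0.10],
    [0.70, 0.45],
    [0.50, 0.60],
    [0.20, 0.65],
    [0.60, 0.40],
    [0.40, 0.30],
])

front = non_dominated_mask(scores)
best = compromise_index(scores)
print("Non-dominated:", front.tolist())
print("Equal-weight compromise:", scores[best].tolist())
```

Here the last two solutions are dominated and would not belong on a plot labelled as a Pareto front, and the equal-weight compromise is the second solution. A weighted sum can only select points on the convex part of a front, so a distance-to-ideal-point rule is a common alternative; either way, stating the rule explicitly makes the compromise reproducible.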
Technical corrections:
The precipitation isotope input is referenced as coming from Bowen et al. (2003) which covers annual averages, but the inputs seem to be the monthly average estimates. The monthly estimation method comes from the subsequent 2005 paper (Bowen G. J., Wassenaar L. I. and Hobson K. A. (2005) Global application of stable hydrogen and oxygen isotopes to wildlife forensics. Oecologia 143, 337-348, doi:10.1007/s00442-004-1813-y.).
References:
Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1–2), 80–91. https://doi.org/10.1016/j.jhydrol.2009.08.003
He, Z., Unger-Shayesteh, K., Vorogushyn, S., Weise, S. M., Kalashnikova, O., Gafurov, A., Duethmann, D., Barandun, M., & Merz, B. (2019). Constraining hydrological model parameters using water isotopic compositions in a glacierized basin, Central Asia. Journal of Hydrology, 571, 332–348. https://doi.org/10.1016/j.jhydrol.2019.01.048
Holmes, T. L., Stadnyk, T. A., Asadzadeh, M., & Gibson, J. J. (2023). Guidance on large scale hydrologic model calibration with isotope tracers. Journal of Hydrology, 621. https://doi.org/10.1016/j.jhydrol.2023.129604
Nan, Y., & Tian, F. (2024). Isotope data-constrained hydrological model improves soil moisture simulation and runoff source apportionment. Journal of Hydrology, 633. https://doi.org/10.1016/j.jhydrol.2024.131006
Tafvizi, A., James, A. L., Holmes, T., Stadnyk, T., Yao, H., & Ramcharan, C. (2024). Evaluating the significance of wetland representation in isotope-enabled distributed hydrologic modeling in mesoscale Precambrian shield watersheds. Journal of Hydrology, 637, 131377. https://doi.org/10.1016/j.jhydrol.2024.131377
Tunaley, C., Tetzlaff, D., Birkel, C., & Soulsby, C. (2017). Using high-resolution isotope data and alternative calibration strategies for a tracer-aided runoff model in a nested catchment. Hydrological Processes, 31(22), 3962–3978. https://doi.org/10.1002/hyp.11313
Citation: https://doi.org/10.5194/egusphere-2025-2166-RC1
- AC1: 'Reply on RC1', Hanwu Zheng, 26 Aug 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2166/egusphere-2025-2166-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-2166-AC1
RC2:
'Comment on egusphere-2025-2166', Anonymous Referee #2, 15 Sep 2025
This manuscript applies a large-scale tracer-aided modeling (TAM) approach to disentangle ecohydrological processes in the heavily managed Middle Spree catchment (MSC), Germany, an evapotranspiration-dominated region facing strong anthropogenic pressures. By integrating stable water isotopes (δ¹⁸O and δ²H) with streamflow into the distributed STARR model and calibrating with a multi-objective NSGA-II algorithm, the study evaluates runoff generation, groundwater contributions, and evapotranspiration (ET) partitioning across four sub-catchments (Berste, Wudritz, Vetschauer, Dobra). The key contribution lies in showing how streamflow–isotope trade-offs emerge as diagnostic signals of epistemic errors from unrecorded human impacts, such as irrigation or mining legacies. While isotope inclusion sometimes reduced discharge simulation performance, it significantly improved process representation such as subsurface mixing. Overall, the study demonstrates that even sparse seasonal isotope datasets can provide critical constraints in TAM for complex, human-altered hydrological systems, offering new insights into ecohydrological partitioning and informing future water management under anthropogenic and climatic pressures. From a reader’s perspective not deeply familiar with isotope tracer methods, I have several comments and suggestions for clarification.

Points for the Authors to Consider
1. Clarifying the added value of isotopes
The added value of incorporating isotopes over other hydrological variables remains somewhat unclear. For instance, while the introduction emphasizes human influences, isotope integration did not appear to improve the model’s ability to capture these anthropogenic effects, which raises questions about the practical contribution of isotopes in this context. How would the results compare if ET data were used in a multi-objective calibration of the STARR model? Could the process descriptions be refined to more clearly illustrate the unique role isotopes play relative to other potential data sources?
2. Improving figure clarity and linkage to discussion

Figures 5–8 combine multiple dimensions (temporal, spatial, and calibration metrics), making them information-rich but sometimes challenging to interpret. The figure captions and related explanations in the text could more directly highlight the core message of each figure. Including a short statement of motivation or the specific hypothesis addressed by each figure would help guide readers and improve accessibility. Moreover, because the figures are complex and the key messages are not always clearly highlighted, the subsequent discussion section becomes less convincing. Readers may find it difficult to fully trust the discussion, as the results and the interpretations are not always tightly aligned. Strengthening the clarity of figures and explicitly linking their core findings to the corresponding discussion points would improve the manuscript’s overall persuasiveness.

Specific Comments
Lines 127 and 140: Please clarify the meaning of SE and m.a.s.l.

Lines 240–243: Rainfall inputs are provided at daily resolution, whereas precipitation isotope inputs are monthly. How does this temporal inconsistency affect the results, and is this assumption reasonable?

Lines 249–251: Although a citation is provided, the manuscript would benefit from more detail on the isotope observations. Were these instantaneous grab samples, or integrated/accumulated values?

Table 3 (Scheme 1): Please clarify whether the calibration was performed jointly across all basins, or if each basin was calibrated independently.

Figure 3: Why are only δ²H time series presented, while δ¹⁸O observations and simulations are not shown? It would also help readers unfamiliar with isotope applications if key concepts such as LMWL and VSMOW were briefly explained.

Figure 4: KGE is used for isotopes and NSE for streamflow. Why not use the same performance metric for both, to improve comparability?

Table 4: The description of Table 4 appears in the first paragraph of the Results, though the table is first referenced in Section 3.2.2. Consider relocating the description for consistency.
Citation: https://doi.org/10.5194/egusphere-2025-2166-RC2
- AC2: 'Reply on RC2', Hanwu Zheng, 22 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2166/egusphere-2025-2166-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-2166-AC2
RC3:
'Comment on egusphere-2025-2166', Anonymous Referee #3, 30 Sep 2025

This study addresses an important field in ecohydrological analyses, namely the explicit modelling of tracers to better understand (eco)hydrological systems. In the case of this study, the focus lies on the modelling of water stable isotopic signatures in rural catchments where the (eco)hydrological dynamics are heavily affected by human activities. Not only are the dynamics in the studied area affected by human activities today, but the areas were subject to heavy mining before the 1990s, and the subsurface hydrology is thus highly altered. The study thus presents a very important advance in (and analysis of) the explicit modelling of water stable isotopes in complex watersheds impacted by human activities, moving away from the focus on process representation in mostly natural and remote systems. The main outcome or the central draw of the study lies in fact not in the perfect model representation of all human-induced changes to the system, but instead in the identification of the unknown and, from a model perspective, structurally unrepresented processes and dynamics. The absence of these prior-to-modelling unknown processes and dynamics in the model structure is described as becoming evident in the mismatch between the water stable isotopic simulations and the actual isotopic data.

On the upside, the study reads very well; with some exceptions, the sites and data are nicely presented, the figures are clean and the results, discussion and conclusions are written in a clear manner. I have some minor recommendations for the improvement of the text, particularly the abstract, which could improve in clarity about the achieved results. I also find that some sentences in the introduction of the study and the study sites are a bit unclear. I do miss some broader discussion and literature outside of the grey box-type rainfall-runoff modelling domain, especially when it comes to understanding the worth of tracers for physically-based models and to using fully integrated or fully explicit physically based models to identify structural deficits in models by comparing them against tracers. Outside of the grey-box type rainfall-runoff modelling domain, many studies have looked at the worth of tracers for tracer-aided modelling, be this by postprocessing tracer data to become comparable to standard model outputs, or by semi-explicitly or fully-explicitly simulating tracer processes in physically based models. The insights gained from these exercises have helped in understanding the information content of different types of tracers, improving model predictions and in identifying model structural problems. I suggest adding some more references to the many other studies that are out there, and I provide a reference to a review that has summarized the findings from many studies up to the year 2019.

On the downside, in terms of methodology, I do have critical concerns regarding the model-data interaction and the validity of the conclusions:
For the forcing of the isotope component of the model, a global model was used to define the monthly constant input signal. Subsequently, the model was calibrated against two types of data, namely seasonal stable water isotope measurements per catchment (hence 4 per year, for 3 years = only 12 datapoints per subcatchment) and daily streamflow observations from discharge stations of the subcatchments. During calibration, 35 different model parameters were inversely identified.
This entire setup and procedure raises several questions that are critical for the interpretation of the results. First of all, neither the discharge gauging stations nor the locations of the stable water isotope measurements are indicated in Figure 1. I assume that the measurements were taken at the outlet of the subcatchments, but this is just a guess. Please indicate the locations of the measurements.
Subsequently, it is unclear what the 4 stable water isotope datapoints per year represent. Are these simple grab samples? Were they taken after rainfall events or do they represent pure baseflow? Or are these cumulative samples taken over the course of a season? It is not enough to say that the sampling procedures can be read elsewhere, because there are huge implications for the model calibration (and interpretation) from what the samples represent.
Beyond the fact that it is mostly unclear what these tracer datapoints actually represent, forcing a model with some global model-derived isotope product instead of locally sampled or robustly characterized rainfall input signals introduces a major bias into the model which even by calibration may not be resolved, and which could cause some or even all of the biases that the authors associate with the absence of some human-/land-use-/infrastructure-related model structural deficits. I personally seriously doubt that it is possible to differentiate between the origins of the biases with such a "minimalistic" dataset relative to the large complexity of the modelled systems, especially if one is using a lumped parameter or grey-box modelling approach. Of course, it can be shown that even a little bit of tracer data can improve model calibration, but that is not new and has been looked at in countless studies and synthesized in extensive detail in multiple review papers on the matter. Moreover, this relatively minimalistic tracer dataset with respect to the complexity of the studied system and the model structure was used to calibrate an entirety of 35 model parameters. Yes, daily streamflow data was also considered, but as was already noted in the introduction by the authors themselves, these data are extremely ambiguous with respect to identifying correct parameter values in such catchment scale surface-subsurface hydrological models, even if of the lumped parameter type. There is simply no way that this dataset contains sufficient information to constrain so many model parameters, a fact that was also introduced by the authors in the introduction via references to the "right answers for the wrong reasons". Yes, the calibration aimed at Pareto front identification, but even if the objective function and calibration approach are tailored to this situation, the lack of information in both the observation data as well as the forcing functions cannot be overcome.
I may have missed something important in the study, but how I understand it at the moment, unfortunately, I am not convinced that the present approach can overcome this data scarcity problem to a degree that the insights gained from the study with respect to model structural deficits are unbiased enough to enable the detection of missing information on human infrastructure and alterations to the system. Or even allow a rating of the representativeness of ET partitioning, soil and baseflow processes. Many unresolved problems could simply, and do most likely, stem from inappropriate stable isotope forcing functions, too little tracer data for calibration, and too many parameters featuring into the calibration objective function. In other words, if your forcing/input function is sufficiently wrong, you will never be able to match both stable isotope records in streamflow as well as streamflow volumes against the same combined dataset. And if there is so little data used to calibrate so many parameters, then if one would be able to match both types of observations simultaneously (isotopes and discharge), there is zero guarantee that 35 parameters that were calibrated do not overcompensate for structural model problems. Ok, this latter version of the same problem did not manifest, but the first version of this problem did, and I don't see any convincing arguments that would tell me that the problem of the mismatch lies in structural model deficits from unknown human alterations and not from a problem in the isotope forcing function.

Ultimately, unless the authors present some additional hard data that support the claims on model validity, and unless the possible biases from model forcings, limited information content of the scarce tracer data, and the use of a grey-box model, are discussed and can convincingly be dismissed, I unfortunately can't support the manuscript for publication in HESS.

Specific comments
abstract: The abstract should provide the reader with information about the type of analysis that was done, but also for what this type of analysis can be used specifically. The first part is ticked off by the existing abstract, but the second part not so well, as the authors don't provide any clear examples of what kind of epistemic errors may be found with their approach. This is because the section on the epistemic errors in the abstract reads very generally, and it is difficult to infer what exactly the authors mean by "epistemic errors manifested as strong trade-offs between the information content..." The next sentences remain similarly unclear as to which kind of epistemic error, or which specific source for it, could be a likely cause of the "trade offs in information content". It is alluded to that the model can help to identify the sources of these errors ("potential for informative insights"), even when one only has sparse isotopic data to complement streamflow. But the exact use of the approach remains unclear. Here I would strongly suggest providing one or two examples of which kind of sources for epistemic errors can be identified, and have been identified in this study.
l49: "non stationary climate inputs": what is meant by this? the "climate" usually is a longer term phenomenon, i.e. one assessed over a 30-year period conventionally. I think here something else is meant than a varying climate, namely the inter-annual variation, and therefore not a climate signal?
l66-67: A large number of studies have looked at the benefit of tracers for model calibration, and some have even quantified the information content. An extensive review on this was published in 2019, but the article is not in the list here.
Schilling, O. S., Cook, P. G., & Brunner, P. (2019). Beyond classical observations in hydrogeology: The advantages of including exchange flux, temperature, tracer concentration, residence time and soil moisture observations in groundwater model calibration. Rev. Geophys., 57(1), 146-182. https://doi.org/10.1029/2018RG000619

l167f: this sentence is unclear to me. "...the decline of pumped sump water volumes has been faster than the replenishment of the groundwater deficit". What do you mean exactly by "sump water", and do you want to say the reduction in groundwater abstraction was faster than the groundwater recharge, i.e. the recovery of the water table didn't happen as quickly as the abstraction of groundwater was stopped? It seems to be quite a complicated way to say something that isn't so complicated. Could you reformulate to make it clearer?
l300: "and isotope." seems unfinished
Discussion: The discussion is written as if the authors know which model performs best for soil water storage and flow as well as groundwater recharge, storage and flow. However, no comparison between actual data and these simulated components is made, and the entire discussion is based on high level observations and assumptions about the catchment's functioning and the assumption that the calibration approach and information contained in tracers would allow these insights to be gained. But as critically mentioned above, unless I see hard data on the validity of the isotope input function and the soil and groundwater components, I am convinced that the available data is not sufficient to derive the conclusions that are discussed in the discussion section. In the entire discussion, the lack of information on the true stable isotope input signals as well as the possible minimal information content of the stable isotope measurements from the 4 seasonal streamflow samples remains unmentioned. Instead, it is repeatedly claimed that the information content of stable isotopes is very high, and these assumptions are supposedly supported by information on soil water storage overestimation, correct ET partitioning and underestimation of baseflow etc. However, as stated previously, no hard data on all these processes are used to compare to the model outputs, and therefore all these claims remain relatively unsupported.

Citation: https://doi.org/10.5194/egusphere-2025-2166-RC3
- AC3: 'Reply on RC3', Hanwu Zheng, 10 Oct 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2166/egusphere-2025-2166-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-2166-AC3

Supplement

https://doi.org/10.5194/egusphere-2025-2166-supplement

Viewed

Total article views: 914 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
789	98	27	914	52	17	33

Views and downloads (calculated since 05 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	103	20	7	130
Jul 2025	61	10	2	73
Aug 2025	118	10	3	131
Sep 2025	404	18	6	428
Oct 2025	67	20	6	93
Nov 2025	34	18	3	55
Dec 2025	2	2	0	4

Latest update: 06 Dec 2025

Short summary

Ecohydrological processes in heavily managed catchments are often incorrectly represented in models. We applied a tracer-aided model STARR in an ET-dominated region (the Middle Spree, NE Germany) with major management impacts. Water isotopes were useful in identifying runoff contributions and partitioning ET even at sparse resolution. Trade-offs between discharge- and isotope-based calibrations could be partially mitigated by integrating more process-based conceptualizations into the model.


Total:	0
HTML:	0
PDF:	0
XML:	0