the Creative Commons Attribution 4.0 License.
Controls on Steppe River Metabolism Vary by Scale and Network Location
Abstract. We explore geographic scaling of metabolism estimates derived from dissolved oxygen measurements with data from 75 sites in three corresponding ecoregions across the temperate steppes of Mongolia and the United States. We used a nested analysis with descending spatial scales (country, ecoregion, river basin, upper or lower watershed, and wide or constrained valleys) to assess spatial heterogeneity. We then linked estimates of metabolism with reach-to-watershed-scale metrics representing geomorphology, topography, climate, and anthropogenic activity to provide possible explanations for spatial scaling dependencies. We evaluated gross primary production (GPP) and ecosystem respiration (ER) at in-situ water temperature and after standardizing them to 20 °C (GPP20 and ER20). There was no significant effect of scaling on ER, and river basin explained only modest variation in GPP. In contrast, GPP20 varied significantly with ecoregion, river basin, basin position (upper vs. lower), and valley morphology (constrained vs. wide). ER20 had no significant spatial predictors. Best regression models for GPP included positive relationships with water velocity and median basin slope and for ER included mean basin air temperature, percentage of urban land use in the basin, and GPP. Best subset regression models for GPP20 included depth, water velocity, and basin slope and for ER20 included depth and mean basin air temperature. The proportion of the watershed in urban or cropland was explanatory of ER, but not GPP. We conclude that fundamental components of ecosystem metabolism respond to different watershed scales and to distinct environmental controls. Thus, macrosystem-scale studies require multi-scale assessment to predict and capture variation in aquatic metabolism. This suggests that universal models of river metabolism are unlikely to perform as well as models built to match the specific scale of inquiry or management.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2026-555', Anonymous Referee #1, 31 Mar 2026
-
AC1: 'Reply on RC1', Walter Dodds, 11 May 2026
Reviewer 1
This is a timely and interesting paper examining the controls on stream metabolism across a large range of spatial scales for dry, temperate rivers in the USA and Mongolia. The Mongolian data - and the comparisons with the US data - are most important given the near lack of such information on streams from this region. I enjoyed reading it.
Thank you!
The paper is well-written and the methodology and key findings are clearly explained. The figures are appropriate for illustrating the related points in the text, and presuming they are reproduced in the original colors, visually acceptable.
Thanks again.
I do have several comments and suggestions the authors may wish to consider when revising the text. These are generally in order of appearance in the paper rather than in order of importance.
Appreciate the careful review! Our replies are in italics.
In the Introduction, it may be worth adding that this methodology may be used at these - or similar - sites to assess how climate change effects may temper the regional controls on metabolism, especially noting the later comment on the role of GPP as an effective indicator of stream health.
We will extend the last sentence of the first paragraph of the introduction to implement this suggestion: “Understanding scaling of processes that control river metabolism, and how those factors vary with location, will assist our knowledge of river ecosystem functions across broad climatic gradients (Dodds et al., 2018) and potentially assist in understanding region-specific effects of global climate change.”
I am not familiar at all with rivers in Mongolia. Are they clear water or turbid? I'm presuming the USA sites are clear water - is that correct? In particular, is water clarity sufficient for PAR to reach the benthos in the thalweg, or just in littoral zones? Although stream metabolism doesn't distinguish benthic and pelagic metabolism, the importance of benthic production can be inferred from knowing whether water depth is less or greater than euphotic depth.
Depth was not a significant predictor of either GPP or GPP20. We now discuss effects on metabolic balance more fully based on this point. We did not have turbidity or light penetration measurements, but we did collect substantial algal-dominated biofilms at most sites.
Arsenault, et al., 2022. Intercontinental analysis of temperate steppe stream food webs reveals consistent autochthonous support of fishes. Ecology Letters 25:2624–2636.
Line 163: I would like to see some commentary on the limitations and possible data artifacts that may affect subsequent modelling and meta-analysis arising from the very short (1 - 7 day) deployment at each site. I totally understand the logistical constraints. How is a very overcast day, or consecutive days, considered for example. This will result in GPP suppressed well below 'normal' behavior at this site. Is this overcome to some extent simply by the number of sites examined?
Luckily, many of these sites were relatively sunny (ranging from steppe to desert biomes) and we were generally at baseflow. We simply could not go out and work when it rained because the roads could become impassable; most of the roads in Mongolia required 4WD even when dry. However, we will now mention these potential limitations in the discussion. We will add the following at the end of section 4.0: “We also acknowledge that our relatively short deployments at each site mean that our measurements could be unrepresentative of longer-term average conditions at each site. For example, cloudy days, high discharge or turbidity from upstream precipitation, or variations in temperature could alter rates from day to day. However, the inability to safely access sites in rainy weather and the dry habitats we were working in meant that we generally had sunny sites and they were near baseflow. Hopefully our high number of sites and criteria for usable model fits overcame these limitations.”
Line 175: What were these metrics? (i.e. what criteria?)
Now we detail the output parameter: “the metrics of fit associated with the model (posterior predictive fit; Grace et al., 2015)”. We also expanded our discussion of the output parameters and specific criteria we used. See response to reviewer 2.
Line 184: What is "unreasonably high"? Vague
We replaced this with “Some of our sites had unusual values for GPP and GPP/ER. When we removed sites where 1) we could not reliably co-locate with geomorphic and watershed data, 2) GPP > 12 g O2 m-2 day-1 (3 sites), and 3) K was > 12 day-1 (3 sites).”
Line 217: Nutrients are discussed later but I would have liked to see some data on concentrations of bio-available N and P as well as DOC in the rivers, as factors potentially directly affecting GPP and ER. Is there much of a difference across the various spatial scales? Even spot measurements during the deployments would provide some insight. These can be related to, but not inferred from, land use for example.
Yes, this would have been nice, but for at least half the sites (Mongolia) we had no good way to collect, preserve, and transport low-nutrient samples. The facilities were not available for low nutrient analyses in Mongolia and our field campaigns involved a month of camping in remote areas each of three summers with uncertain refrigeration or even supplies of ice. In addition, bio-available N and P concentrations are not indicative of supply rates. (Dodds, W. K. 2003. Misuse of inorganic N and soluble reactive P concentrations to indicate nutrient status of surface waters. Journal of the North American Benthological Society 22:171–181.)
Line 252: How many sites are there with urban influence? How is this defined? Is it the same definition in both countries? Urbanization is not listed amongst the factors in lines 246-249. Is that just an oversight?
We forgot to include percentage urban in the watershed in the methods, section 2.4. It is now listed in the last sentence of that section.
Line 267: Was the significant water depth effect related to turbidity and euphotic depth, as noted above?
No, it was a negative effect on ER and a positive effect on GPP/ER. We now point this out in the abstract and the main text.
Line 278: 'strong' meaning here?
It meant that the variable appeared in most of the top 10 models. We rearranged the sentence to state that and left the word “strong” out.
Line 329: Suggest citing the finding of Hall (2013) that a substantial fraction (estimate of 44% on average) of newly created organic matter is respired before it is taken up by higher trophic levels. Freshwater Science, 32(2): 507-516 (2013)
We added this sentence: “A cross-site analysis factored out GPP-associated respiration and found an average of 44% of the remaining respiration is driven by heterotrophs across 20 streams (Hall and Beaulieu, 2013).”
Line 350: Are the constrained valley effects (partly) through topographic shading?
Probably not, Figure 5 indicates constrained valleys did not influence the metabolic parameters according to the ANOVAs.
Line 378: Turbidity again. "...for metabolism in drier temperate areas, results from one country can likely be applied to another". I'm not convinced at all about this. My thinking - and I'm very happy to be convinced otherwise - is that parts of Central Asia, southwest USA, Australia, southern Africa etc where light penetration into a turbid water column is a major constraint on GPP may disrupt this universality of behavior.
But depth was not a factor for GPP, so it is probably not this. Also, “country” was not important in the hierarchical ANOVAs, which argues against this point as well, although ecoregion was.
Citation: https://doi.org/10.5194/egusphere-2026-555-AC1
RC2: 'Comment on egusphere-2026-555', Anonymous Referee #2, 16 Apr 2026
This study explored the spatial scales at which stream metabolism and its hypothesized drivers vary, ranging from sub-watershed scales up to countries. Through a combination of nested ANOVAs and regressions of metabolism against possible predictors, the authors found that GPP varied at the river basin scale while GPP/ER varied at multiple scales and ER had no spatial predictors, highlighting air temperature, land use, and geomorphic characteristics as possible explanations for these spatial patterns. This study fills important knowledge gaps in our understanding of metabolism, both in terms of spatial scaling and geographic coverage (and the amount of field and metabolism modeling work that went into this is no small feat!). My feelings are generally quite positive about this paper. There are a couple discrepancies in how results are being reported in different sections of the paper, and I’d like to see a bit more detail on some of the methods (all explained in specific comments below). Once those things are sufficiently addressed, I look forward to seeing these findings make it out into the world!
Specific comments:
Title: “Vary by scale and network location” comes across as redundant, as network location is one of the scales in question.
Abstract:
Line 28: “GPP varied significantly with ecoregion, river basin…” – In the results, GPP and GPP20 only vary by basin. Is this sentence meant to describe the GPP/ER results? This also made me realize GPP/ER is never mentioned in the abstract even though a fair bit of the results and discussion centers on this ratio.
Introduction:
Line 57 (in the context of Figure 1): “given the wide variety of conditions that may occur across biomes” – Could this be worded more precisely to better introduce the predictor variables you end up including in your models? The first clear mention of climate and land use as potential predictors of metabolism is in the description of what you did in this study in line 80- they weren’t explicitly mentioned anywhere beforehand.
Lines 70-72: “A major challenge in comparing studies… Schechner et al. 2021” – This idea seems like it belongs elsewhere, maybe in the methods and/or the second intro paragraph on why scaling has been challenging. As much as I love any opportunity to highlight that study, putting it here breaks up the flow of ideas. The premise of using consistent methodology could still be kept here, just stated more simply.
Lines 82-84: “While many metabolic measurements are available for the US, metabolic rates for Mongolia’s rivers have not, to our knowledge, been quantified prior to our work.” – First, while I think this is an important contribution to emphasize, it disrupts the flow of ideas a bit here. Maybe it could be moved earlier in the paragraph? Second, the wording is slightly misleading since your group has published metabolism data from some of the Mongolian sites at least twice already (Schechner et al. 2021, Tromboni et al. 2022). Maybe rephrase as something like “rates have not been extensively quantified,” while citing previous works?
Methods:
Line 100: “A detailed description of the FPZ delineation methodology… provided previously” – Could you summarize the general idea of the methodology in a sentence here, then say that the specifics can be found elsewhere? That would make it easier for the reader to conceptualize what information is encompassed in each FPZ.
Figure 2: Do you have site photos (or photos from the general regions) you could include alongside the map to give a sense of what these different ecoregions look like? Perhaps you could include representative photos of the three US regions along one side of the map and the three Mongolian ecoregions along the other side. That would help give a sense of how comparable we might expect ecoregions to be across countries.
Line 138: “Long-term (1977-2000) climate data” – Do we have reason to believe that air temperature from 26+ years ago would be predictive of current stream metabolism, given the reality of climate change? While I acknowledge that this is a fairly recent data product, I would still like to see more justification for the inclusion of historic rather than present air temperature as a predictor, as well as more transparency in places like Figure 1 about the fact that this is historic temperature.
Line 163: “Deployments ranged from 24 hours to 1 week” – Were all the deployments at a similar time of year, and were they all within a single year? It would be helpful to know that to assess whether any of the spatial patterns could be biased by time of sampling.
Line 176: “We re-ran sites where we were unable to get a good model fit… with the Riley and Dodds 2012 model.” – A few questions here. 1) Can you add a brief explanation of why the Riley and Dodds model is better able to estimate metabolism for sites with high k? 2) If there is an alternate model that you think performs better than BASE for the more challenging sites, why not just run all the sites with that model? Presumably if the model works well for challenging sites, then it would also work for less problematic sites. 3) How do model outputs from BASE and the Riley and Dodds model compare at the less problematic sites? It would be helpful to see a comparison of metabolism estimates from the two models to know if a comparison of data from different models is valid. 4) How many sites were run with the Riley and Dodds model versus BASE? Similar to my last question about ruling out sampling bias, I’m curious if the sites that required an alternate model were concentrated in a particular spatial grouping (e.g., if they were all upper watershed sites).
Line 182: “This left us with 75 sites out of 89 where we were able to model metabolism successfully” – Could you include a table in the supplement that has more of the specifics of the QAQC metrics you used and how many sites passed each metric? While I don’t think it needs to be in the main text, there should still be some record of what thresholds you’re accepting for things like ER vs k correlation, r-hat (assuming this is what you mean by “metrics of fit associated with the model”), modeled vs observed correlation, etc. so we can track consistency of QAQC practices across studies. And on a positive note, kudos on getting that many sites to behave!
Line 205: “separated by upstream and downstream sites followed by a finer FPZ scale” – The wording here and in the next sentence makes it sound like there is an additional level of organization in between upstream/downstream and wide/constrained.
Results:
Section 3.1: BASE estimates metabolism on a per volume basis, but GPP and ER are reported in areal units throughout your results. Either correct the units throughout the results, or explain in the metabolism methods how you converted to units of g O2/m2/d.
Line 225 and Figure 4: “…GPP varied marginally significantly, only by river basin (Figure 4)” – It wasn’t immediately intuitive that Figure 4 was related to this sentence, as there is no mention of river basin in Figure 4. On second glance it makes sense that it’s showing no significant difference across the broader spatial scales. However, if all the figure is showing (in the context of the ANOVA results) is that GPP20 didn’t vary at the scale of country or ecoregion, it makes me wonder why there isn’t a similar plot of ER20. Could Figure 4 instead be a two-panel figure that’s a side by side of GPP20 and ER20?
Line 263: “We explored the top explanatory variables… using ANOVAs” – Explanation of this analysis should be included at the end of section 2.4 rather than introducing it here. Also, statistics are only reported for the percent urban and percent cropland models in the supplement- I’d expect to see results for the other variables reported as well.
Discussion:
Line 295: “We analyzed … scaling factors and continuous predictors separately because…” – This should be explained in the methods rather than the discussion.
Line 356: “We explained a modest amount of the variation in metabolism” – I think this understates the fact that the adjusted r2 values in the linear models were all on the order of 0.03-0.1. Those r2 values suggest to me that there is a lot of variation in metabolism still not being captured, so I think some disclaimer along the lines of “there is still work to do to be able to more reliably predict metabolism” should be included.
Line 377: “ecoregions significantly influenced GPP” – in the results, GPP only significantly varied at the river basin scale.
Line 379: “Climate drives ecoregion characteristics (e.g., riparian canopy)” – This comes across as a little contradictory with section 4.3, where it was suggested that local riparian cover might not be the most important control of GPP.
Line 384: “grazing in Mongolia… effects of cropland” – More of a curiosity question than a request to edit: do you have a sense of whether/how the designation of land as “cropland” in Mongolia accounts for nomadic herding practices? I’m wondering if it might be more challenging to pin down the effects of land use in Mongolia when one of the land use pressures involves roaming herds of livestock as opposed to, say, farms tied to specific locations (although perhaps the herding locations are more consistent than I’m aware of).
Conclusions:
Line 409-410: “respiration was related to long-term climate conditions… ER may be more sensitive to global warming” – Consider revisiting this conclusion depending on your response to my previous comment on use of historic temperature data.
Supplement:
Tables S4, S6, S14, S15: These tables don’t have the same note as the other ANOVAs about “we ran 4 tests… p<0.0125 for significance” as the GPP and ER tables. Were the models constructed differently, or was the note just forgotten?
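The threshold quoted in this comment is consistent with dividing a family-wise significance level of 0.05 across 4 tests. The sketch below illustrates that arithmetic only; the use of a Bonferroni-style correction and the 0.05 family-wise level are assumptions inferred from the quoted note, and `bonferroni_alpha` is a hypothetical helper, not something taken from the manuscript.

```python
import math


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni-style correction.

    Assumption: the quoted note ("we ran 4 tests... p<0.0125") reflects
    family_alpha = 0.05 split evenly across the tests.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


per_test_alpha = bonferroni_alpha(0.05, 4)
# The per-test threshold matches the 0.0125 quoted in the supplement note.
print(math.isclose(per_test_alpha, 0.0125))
```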
Technical corrections:
Introduction:
Figure 1: Change the largest scale from “continent” to “country” to match how it’s discussed throughout the rest of the text.
Methods:
Line 130: Clearer if worded as “…hydrologic and geomorphic measurements in reaches based on…”?
Line 152: Clearer if worded as 10% and 5% of reach length rather than 0.1 and 0.05 of reach length?
Line 168: “animal disturbance” – Thank you for the beautiful mental image of goats munching on PAR sensors. (Purely speculation on my part, but it seems like something they would do!)
Results:
Line 228: “found in figure S1” – The data distributions are currently figure S2 in the supplement (but should be S1 if I’m reading correctly).
Line 231: Include O2 in units for rates.
Figures 3 and 4: Check units on axes- assuming these should be g O2 instead of mg O2?
Line 273: “…also matched at the upper-lower and percentage cropland scales” – should this be the ecoregion scale?
Discussion:
Line 283: Should be Figure S2 instead of S1? Check order here and in the supplement.
Line 287: “five of 76 river sites” – Should this be 75?
Supplement:
Table S7: “marked correlations are significant” – I’m not seeing any markings? Double-check that this table is displaying how you intended it to.
Citation: https://doi.org/10.5194/egusphere-2026-555-RC2
AC2: 'Reply on RC2', Walter Dodds, 11 May 2026
Reviewer 2:
This study explored the spatial scales at which stream metabolism and its hypothesized drivers vary, ranging from sub-watershed scales up to countries. Through a combination of nested ANOVAs and regressions of metabolism against possible predictors, the authors found that GPP varied at the river basin scale while GPP/ER varied at multiple scales and ER had no spatial predictors, highlighting air temperature, land use, and geomorphic characteristics as possible explanations for these spatial patterns. This study fills important knowledge gaps in our understanding of metabolism, both in terms of spatial scaling and geographic coverage (and the amount of field and metabolism modeling work that went into this is no small feat!). My feelings are generally quite positive about this paper. There are a couple discrepancies in how results are being reported in different sections of the paper, and I’d like to see a bit more detail on some of the methods (all explained in specific comments below). Once those things are sufficiently addressed, I look forward to seeing these findings make it out into the world!
We deeply appreciate your helpful review and kind words about the work.
Specific comments:
Title: “Vary by scale and network location” comes across as redundant, as network location is one of the scales in question.
We deleted network location in the title.
Abstract:
Line 28: “GPP varied significantly with ecoregion, river basin…” – In the results, GPP and GPP20 only vary by basin. Is this sentence meant to describe the GPP/ER results? This also made me realize GPP/ER is never mentioned in the abstract even though a fair bit of the results and discussion centers on this ratio.
We revised the abstract to discuss GPP/ER and clearly separate GPP20 and ER20 results.
Introduction:
Line 57 (in the context of Figure 1): “given the wide variety of conditions that may occur across biomes” – Could this be worded more precisely to better introduce the predictor variables you end up including in your models? The first clear mention of climate and land use as potential predictors of metabolism is in the description of what you did in this study in line 80- they weren’t explicitly mentioned anywhere beforehand.
We now state, in the last paragraph of this section: “However, we also understand that other factors could control metabolic rates in addition to ecoregion. Thus, we investigated our large metabolism dataset for rivers in these three ecoregions in both countries in conjunction with a broad range of variables including climate (e.g., long-term watershed air temperature), geomorphology (e.g., width, depth, slope, valley shape), local environment (e.g., discharge, water velocity, water temperature), and land use (e.g., cropland, urban, forest).” This introduces the specific main variables and more clearly links the text to Figure 1.
Lines 70-72: “A major challenge in comparing studies… Schechner et al. 2021” – This idea seems like it belongs elsewhere, maybe in the methods and/or the second intro paragraph on why scaling has been challenging. As much as I love any opportunity to highlight that study, putting it here breaks up the flow of ideas. The premise of using consistent methodology could still be kept here, just stated more simply.
We simply deleted the sentence, and slightly modified the following sentence to not refer to it.
Lines 82-84: “While many metabolic measurements are available for the US, metabolic rates for Mongolia’s rivers have not, to our knowledge, been quantified prior to our work.” – First, while I think this is an important contribution to emphasize, it disrupts the flow of ideas a bit here. Maybe it could be moved earlier in the paragraph? Second, the wording is slightly misleading since your group has published metabolism data from some of the Mongolian sites at least twice already (Schechner et al. 2021, Tromboni et al. 2022). Maybe rephrase as something like “rates have not been extensively quantified,” while citing previous works?
Good point! Done.
Methods:
Line 100: “A detailed description of the FPZ delineation methodology… provided previously” – Could you summarize the general idea of the methodology in a sentence here, then say that the specifics can be found elsewhere? That would make it easier for the reader to conceptualize what information is encompassed in each FPZ.
We now list the variables and spatial scales that were used in the prior research to characterize FPZs in the following sentence.
Figure 2: Do you have site photos (or photos from the general regions) you could include alongside the map to give a sense of what these different ecoregions look like? Perhaps you could include representative photos of the three US regions along one side of the map and the three Mongolian ecoregions along the other side. That would help give a sense of how comparable we might expect ecoregions to be across countries.
We are not quite sure how to implement this. The lowland sites ranged from desert, through grassland, to forest. The upland sites were in mountains (terminal basin or mountain steppe) or just upper regions of hilly terrain (Grassland Steppe). It would take more pictures than practical.
Line 138: “Long-term (1977-2000) climate data” – Do we have reason to believe that air temperature from 26+ years ago would be predictive of current stream metabolism, given the reality of climate change? While I acknowledge that this is a fairly recent data product, I would still like to see more justification for the inclusion of historic rather than present air temperature as a predictor, as well as more transparency in places like Figure 1 about the fact that this is historic temperature.
We meant this as a proxy for climate, and river temperature as the immediate value that is indicative of recent conditions the biota was exposed to. Once we used river temperature to correct metabolic rates to a common temperature, then this was used to represent climate and relate to current metabolic rates. This in part was intended to account for the fact that Mongolia air temperatures tend to be lower, along with watershed elevational differences (sub-basin differences). Some of the basins went from mountain streams to desert rivers. We modified figure 1 to make it clear that we used average air temperature, and this came out as an important variable in ER20.
Line 163: “Deployments ranged from 24 hours to 1 week” – Were all the deployments at a similar time of year, and were they all within a single year? It would be helpful to know that to assess whether any of the spatial patterns could be biased by time of sampling.
We now follow this statement with “All deployments were made during summer under baseflow conditions.”
Line 176: “We re-ran sites where we were unable to get a good model fit… with the Riley and Dodds 2012 model.” – A few questions here. 1) Can you add a brief explanation of why the Riley and Dodds model is better able to estimate metabolism for sites with high k? 2) If there is an alternate model that you think performs better than BASE for the more challenging sites, why not just run all the sites with that model? Presumably if the model works well for challenging sites, then it would also work for less problematic sites. 3) How do model outputs from BASE and the Riley and Dodds model compare at the less problematic sites? It would be helpful to see a comparison of metabolism estimates from the two models to know if a comparison of data from different models is valid. 4) How many sites were run with the Riley and Dodds model versus BASE? Similar to my last question about ruling out sampling bias, I’m curious if the sites that required an alternate model were concentrated in a particular spatial grouping (e.g., if they were all upper watershed sites).
With respect to the Riley and Dodds model (points 1, 2 and 3), we now state: “This model is more difficult to automate and does not provide statistics on goodness of fit, but it does allow quicker visual evaluation of how the model responds to different priors than does the Grace et al. (2015) model. In cases where both models are run successfully, they give very similar results.” With respect to the final dataset (point 4), the high-aeration sites tended to be upper sites and were the hardest to fit, but we also started with more upper sites, and those tended to be the ones that were discarded. In the end we had a fairly even distribution of numbers of sites. Overall, we had 45 upper and 30 lower sites: Grassland Steppe had 13 upper and 15 lower, Mountain Steppe had 16 upper and 8 lower, and Terminal Basin had 16 upper and 7 lower.
Line 182: “This left us with 75 sites out of 89 where we were able to model metabolism successfully” – Could you include a table in the supplement that has more of the specifics of the QAQC metrics you used and how many sites passed each metric? While I don’t think it needs to be in the main text, there should still be some record of what thresholds you’re accepting for things like ER vs k correlation, r-hat (assuming this is what you mean by “metrics of fit associated with the model”), modeled vs observed correlation, etc. so we can track consistency of QAQC practices across studies. And on a positive note, kudos on getting that many sites to behave!
We elected to place the metrics, and the number of sites we had to discard, in the methods. We now explain our criteria for model acceptance more fully. We discarded sites where we were unable to model the data with good fit as evaluated by posterior predictive check, modeled versus estimated data correlation, chain convergence, deviance and information criteria from the model, as well as a visual evaluation of model fit. Regarding specific threshold values, the chain convergence was a binary measure reported by the Grace model, and the visual evaluation of model fit was somewhat qualitative but reasonably replicable. See https://github.com/dgiling/BASEmetab/blob/master/vignettes/BASEmetab.Rmd, which explains these outputs. The criteria we applied were: convergence check accepted (this covered all R-hats) and 0.4 < PPP < 0.6.
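The two quantitative criteria named in this reply can be sketched as a simple screening function. This is an illustration only, not the authors' workflow: the `SiteFit` record and its field names are hypothetical, and the correlation, deviance, and visual checks mentioned above are left out because no thresholds are given for them.

```python
from dataclasses import dataclass


@dataclass
class SiteFit:
    """Hypothetical summary of one site's model fit (field names are assumptions)."""
    site_id: str
    converged: bool  # binary convergence check reported by the model
    ppp: float       # posterior predictive p-value


def passes_screen(fit: SiteFit, ppp_low: float = 0.4, ppp_high: float = 0.6) -> bool:
    """Accept a fit only if it converged and 0.4 < PPP < 0.6 (bounds from the reply)."""
    return fit.converged and ppp_low < fit.ppp < ppp_high


# Made-up example records, for illustration only.
fits = [
    SiteFit("site_a", True, 0.52),
    SiteFit("site_b", True, 0.71),   # PPP outside the accepted window
    SiteFit("site_c", False, 0.50),  # failed the convergence check
]
accepted = [f.site_id for f in fits if passes_screen(f)]
print(accepted)
```

Keeping the bounds as parameters makes it easy to report which window was used, which is the kind of record the referee asks for.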
Line 205: “separated by upstream and downstream sites followed by a finer FPZ scale” – The wording here and in the next sentence makes it sound like there is an additional level of organization in between upstream/downstream and wide/constrained.
Thanks, we did not intend that, wording clarified.
Results:
Section 3.1: BASE estimates metabolism on a per volume basis, but GPP and ER are reported in areal units throughout your results. Either correct the units throughout the results, or explain in the metabolism methods how you converted to units of g O2/m2/d.
In the methods we now state “We converted volumetric rates from the BASE models to g O2 m-2 day-1 multiplying by the mean stream depth for the reach and appropriate unit conversions.” We also corrected the units in the figures.
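The conversion described in this reply can be sketched as below. The sketch assumes the volumetric rates are in mg O2 L-1 day-1 (an assumption here, not stated in the reply); because 1 mg L-1 equals 1 g m-3, multiplying by mean depth in metres gives g O2 m-2 day-1. The function name and example numbers are hypothetical.

```python
def volumetric_to_areal(rate_mg_o2_per_l_per_day, mean_depth_m):
    """Convert a volumetric rate (assumed mg O2 L^-1 d^-1) to g O2 m^-2 d^-1.

    1 mg/L == 1 g/m^3, so the only scaling needed is the mean reach depth.
    """
    if mean_depth_m < 0:
        raise ValueError("mean depth must be non-negative")
    rate_g_o2_per_m3_per_day = rate_mg_o2_per_l_per_day  # mg/L -> g/m^3 (factor of 1)
    return rate_g_o2_per_m3_per_day * mean_depth_m


# Example with made-up numbers: 4 mg O2 L^-1 d^-1 in a reach averaging 0.5 m deep.
areal_gpp = volumetric_to_areal(4.0, 0.5)
print(areal_gpp)
```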
Line 225 and Figure 4: “…GPP varied marginally significantly, only by river basin (Figure 4)” – It wasn’t immediately intuitive that Figure 4 was related to this sentence, as there is no mention of river basin in Figure 4. On second glance it makes sense that it’s showing no significant difference across the broader spatial scales. However, if all the figure is showing (in the context of the ANOVA results) is that GPP20 didn’t vary at the scale of country or ecoregion, it makes me wonder why there isn’t a similar plot of ER20. Could Figure 4 instead be a two-panel figure that’s a side by side of GPP20 and ER20?
You are correct; it should have just referred to the table in the supplement. We now plot both GPP and ER in Figure 4.
Line 263: “We explored the top explanatory variables… using ANOVAs” – Explanation of this analysis should be included at the end of section 2.4 rather than introducing it here. Also, statistics are only reported for the percent urban and percent cropland models in the supplement- I’d expect to see results for the other variables reported as well.
Now we explain these ANOVAs in the last two sentences of section 2.4. We also include the hierarchical ANOVAs for all the significant variables.
Discussion:
Line 295: “We analyzed … scaling factors and continuous predictors separately because…” – This should be explained in the methods rather than the discussion.
We now explain it in the methods but left the reminder here as well.
Line 356: “We explained a modest amount of the variation in metabolism” – I think this understates the fact that the adjusted r2 values in the linear models were all on the order of 0.03-0.1. Those r2 values suggest to me that there is a lot of variation in metabolism still not being captured, so I think some disclaimer along the lines of “there is still work to do to be able to more reliably predict metabolism” should be included.
We inserted this wording in the next sentence now: Given these low values for r2, it is clear there is still work to do to be able to more reliably predict metabolism as a function of scale.
Line 377: “ecoregions significantly influenced GPP” – in the results, GPP only significantly varied at the river basin scale.
This should have been GPP20, corrected.
Line 379: “Climate drives ecoregion characteristics (e.g., riparian canopy)” – This comes across as a little contradictory with section 4.3, where it was suggested that local riparian cover might not be the most important control of GPP.
Good point, we deleted the parenthetical comment.
Line 384: “grazing in Mongolia… effects of cropland” – More of a curiosity question than a request to edit: do you have a sense of whether/how the designation of land as “cropland” in Mongolia accounts for nomadic herding practices? I’m wondering if it might be more challenging to pin down the effects of land use in Mongolia when one of the land use pressures involves roaming herds of livestock as opposed to, say, farms tied to specific locations (although perhaps the herding locations are more consistent than I’m aware of).
Yes. Most grazing land is not owned by individuals, and most land that can be grazed is grazed. Traditionally there were no constraints on grazing and the range was open. In 2024, after our research, legislation was passed to require community grazing controls to limit overgrazing. One of the big drivers of range degradation is the high price of cashmere, which is moving herders away from traditional methods and encouraging overgrazing by cashmere goats.
Conclusions:
Line 409-410: “respiration was related to long-term climate conditions… ER may be more sensitive to global warming” – Consider revisiting this conclusion depending on your response to my previous comment on use of historic temperature data.
We now try to make it clearer that immediate conditions were river temperature, and climate was indicated by long term averages. Note that river temperature integrates daily fluctuations in air temperature and is more proximate to the biota in the rivers.
Supplement:
Tables S4, S6, S14, S15: These tables don’t have the same note as the other ANOVAs about “we ran 4 tests… p<0.0125 for significance” as the GPP and ER tables. Were the models constructed differently, or was the note just forgotten?
Good catch; this was left in from a previous draft. After considering the arbitrary nature of p < 0.05, we meant to omit the Bonferroni correction note below the tables, but some of them slipped through. However, most of the p values that indicated a strong relationship were at or well below p = 0.01.
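For readers tracking the threshold mentioned in the table notes, the arithmetic behind "4 tests, p < 0.0125" is a Bonferroni adjustment of a family-wise alpha of 0.05. A minimal sketch follows; the function names are hypothetical, and the family-wise alpha of 0.05 is inferred from the quoted note rather than stated in this reply.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value, family_alpha=0.05, n_tests=4):
    """True when p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


threshold = bonferroni_alpha(0.05, 4)  # 0.05 / 4
print(threshold, is_significant(0.01), is_significant(0.02))
```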
Technical corrections:
Introduction:
Figure 1: Change the largest scale from “continent” to “country” to match how it’s discussed throughout the rest of the text.
Thanks!
Methods:
Line 130: Clearer if worded as “…hydrologic and geomorphic measurements in reaches based on…”?
We changed it to “hydrologic and geomorphic measurements in reaches defined by a minimum fifteen-minute travel time”
Line 152: Clearer if worded as 10% and 5% of reach length rather than 0.1 and 0.05 of reach length?
Yes, this is clear, thanks.
Line 168: “animal disturbance” – Thank you for the beautiful mental image of goats munching on PAR sensors. (Purely speculation on my part, but it seems like something they would do!)
Goats, sheep, or yak…maybe even camels, most likely rodents.
Results:
Line 228: “found in figure S1” – The data distributions are currently figure S2 in the supplement (but should be S1 if I’m reading correctly).
We re-organized the figures in the supplement based on this comment. We also realized the units needed correction in these figures as well.
Line 231: Include O2 in units for rates.
Added.
Figures 3 and 4: Check units on axes- assuming these should be g O2 instead of mg O2?
Yes, not sure how that slipped through. All figures fixed.
Line 273: “…also matched at the upper-lower and percentage cropland scales” – should this be the ecoregion scale?
Changed to “but water velocity also matched at the upper-lower hierarchy level and the percentage cropland at ecoregion scales as well”.
Discussion:
Line 283: Should be Figure S2 instead of S1? Check order here and in the supplement.
We were asked to combine the two supplements after the initial submission, so working with two different versions of the manuscript and the supplements left us a bit confused. We will re-check all these references once we have finalized the revision.
Line 287: “five of 76 river sites” – Should this be 75?
Fixed
Supplement:
Table S7: “marked correlations are significant” – I’m not seeing any markings? Double-check that this table is displaying how you intended it to.
We meant to take this statement out. It is gone now.
Citation: https://doi.org/10.5194/egusphere-2026-555-AC2
AC2: 'Reply on RC2', Walter Dodds, 11 May 2026
This is a timely and interesting paper examining the controls on stream metabolism across a large range of spatial scales for dry, temperate rivers in the USA and Mongolia. The Mongolian data - and the comparisons with the US data - are most important given the near lack of such information on streams from this region. I enjoyed reading it.
The paper is well-written and the methodology and key findings are clearly explained. The figures are appropriate for illustrating the related points in the text, and presuming they are reproduced in the original colors, visually acceptable.
I do have several comments and suggestions the authors may wish to consider when revising the text. These are generally in order of appearance in the paper rather than in order of importance.
In the Introduction, it may be worth adding that this methodology may be used at these - or similar - sites to assess how climate change effects may temper the regional controls on metabolism, especially noting the later comment on the role of GPP as an effective indicator of stream health.
I am not familiar at all with rivers in Mongolia. Are they clear water or turbid? I'm presuming the USA sites are clear water - is that correct? In particular, is water clarity sufficient for PAR to reach the benthos in the thalweg, or just in littoral zones? Although stream metabolism doesn't distinguish benthic and pelagic metabolism, the importance of benthic production can be inferred from knowing whether water depth is less or greater than euphotic depth.
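The depth comparison raised here can be made concrete with the common convention that the euphotic depth is where PAR falls to 1% of its surface value under exponential (Beer-Lambert) attenuation, giving z_eu = ln(100) / K_d. The sketch below is illustrative only: the attenuation coefficient K_d, the 1% cutoff, the function names, and the example values are assumptions and do not come from the manuscript.

```python
import math


def euphotic_depth_m(kd_per_m, light_fraction=0.01):
    """Depth (m) at which PAR drops to light_fraction of surface PAR.

    Assumes exponential attenuation with coefficient kd_per_m (m^-1).
    """
    if kd_per_m <= 0:
        raise ValueError("attenuation coefficient must be positive")
    return math.log(1.0 / light_fraction) / kd_per_m


def light_reaches_bed(water_depth_m, kd_per_m):
    """True when the bed lies at or above the euphotic depth."""
    return water_depth_m <= euphotic_depth_m(kd_per_m)


# Made-up example: K_d = 2.0 m^-1, so z_eu = ln(100) / 2.0.
z_eu = euphotic_depth_m(2.0)
print(round(z_eu, 2), light_reaches_bed(1.0, 2.0), light_reaches_bed(3.0, 2.0))
```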
Line 163: I would like to see some commentary on the limitations and possible data artifacts that may affect subsequent modelling and meta-analysis arising from the very short (1 - 7 day) deployment at each site. I totally understand the logistical constraints. How, for example, is a very overcast day, or a run of consecutive overcast days, handled? This will result in GPP suppressed well below 'normal' behavior at this site. Is this overcome to some extent simply by the number of sites examined?
Line 175: What were these metrics? (i.e. what criteria?)
Line 184: What is "unreasonably high"? Vague
Line 217: Nutrients are discussed later but I would have liked to see some data on concentrations of bio-available N and P as well as DOC in the rivers, as factors potentially directly affecting GPP and ER. Is there much of a difference across the various spatial scales? Even spot measurements during the deployments would provide some insight. These can be related to, but not inferred from, land use for example.
Line 252: How many sites are there with urban influence? How is this defined? Is it the same definition in both countries? Urbanization is not listed amongst the factors in lines 246-249. Is that just an oversight?
Line 267: Was the significant water depth effect related to turbidity and euphotic depth, as noted above?
Line 278: 'strong' meaning here?
Line 329: Suggest citing the finding of Hall (2013) that a substantial fraction (estimate of 44% on average) of newly created organic matter is respired before it is taken up by higher trophic levels. Freshwater Science, 32(2): 507-516 (2013)
Line 350: Are the constrained valley effects (partly) through topographic shading?
Line 378: Turbidity again. "...for metabolism in drier temperate areas, results from one country can likely be applied to another". I'm not convinced at all about this. My thinking - and I'm very happy to be convinced otherwise - is that parts of Central Asia, southwest USA, Australia, southern Africa etc where light penetration into a turbid water column is a major constraint on GPP may disrupt this universality of behavior.