Identifying Dominant Parameters Across Space and Time at Multiple Scales in a Distributed Model Using a Two-Step Deep Learning-Assisted Time-Varying Spatial Sensitivity Analysis
Abstract. Distributed models require parameter sensitivity analyses that capture both spatial heterogeneity and temporal variability, yet most existing approaches collapse one of these dimensions. We present a two-step, deep learning-assisted, time-varying spatial sensitivity analysis (SSA) that identifies dominant parameters across space and time. Using SWAT for runoff simulation of the Jinghe River Basin, we first apply the Morris method with a spatially lumped strategy to screen influential parameters and then perform SSA using a deep learning-assisted Sobol' method for quantitative evaluation. A key innovation lies in the systematic sensitivity evaluation with parameters represented and analysed at both subbasin and hydrologic response unit (HRU) scales, enabling explicit treatment of distributed parameters at their native spatial resolutions. To reduce computational burden, two multilayer perceptron surrogates are trained for 195 subbasin and 2,559 HRU parameters, respectively, allowing efficient time-varying SSA of NSE-based Sobol' indices over 3- and 24-month rolling windows during 1971–1986. Results reveal structured, scale-dependent controls: spatially, sensitivity hotspots are coherent between scales but become more localized at the HRU level, reflecting heterogeneity in land use, soils, and topography; temporally, sensitivities fluctuate with runoff in the 3-month window, while event-scale variations are smoothed in the 24-month window, yielding more persistent patterns governed by storage and routing processes. The proposed framework provides a computationally efficient and unified approach for identifying scale-dependent sensitivity hotspots and hot moments, thereby supporting targeted calibration and enhancing the interpretability and predictive robustness of distributed models under nonstationary conditions.
Status: open (until 28 Jan 2026)
- RC1: 'Comment on egusphere-2025-5694', Anonymous Referee #1, 19 Dec 2025
- RC2: 'Comment on egusphere-2025-5694', Anonymous Referee #2, 06 Jan 2026
The paper tackles an important problem, but several method choices and interpretations need improvement, and some conclusions appear over-claimed. I would recommend major revision.
Major Comments
1) The authors state that no model calibration was performed to maintain "diagnostic integrity". While this isolates parameter sensitivity from calibration bias, it risks performing SA on a model that does not represent the physical reality of the Jinghe River Basin. Will sensitivity patterns change significantly once the model is constrained to a realistic posterior parameter space? You might want to report baseline SWAT performance against observations and discuss how a poor or mediocre fit would distort the SA.
2) The authors chose fully connected MLPs over sequence models such as LSTMs, arguing that the temporal structure is "encoded in the output vector". However, hydrologic processes are inherently autoregressive. Would a 180-neuron output layer not treat each month's runoff as an independent regression target, potentially ignoring the temporal dependencies and mass-balance continuity inherent in the SWAT model?
3) You find that hotspot subbasins near the gauge (e.g., 33–34) dominate the sensitivities. This is unsurprising when the response is a single-gauge performance metric, since proximity/connectivity to the gauge mechanically increases leverage, as briefly acknowledged by the authors. However, the authors interpret the patterns as reflecting spatial heterogeneity (land use/soil/topography). Would the hotspot locations remain under alternative responses and/or multi-site constraints? You might want to discuss this, including the effect of additional gauges, internal variables (ET, soil water, baseflow index), or alternative spatially distributed responses.
4) For the HRU-scale Sobol' analysis the authors used k = 2559 and N = 32768, and they note that sensitivity magnitudes (ST) are smaller at this scale due to "variance dilution". Is the identified sensitivity a physical signal or a mathematical artifact of the Sobol' method when the input space is massively expanded? With thousands of parameters, many ST estimates can be noisy/biased; negative indices and non-closure can occur and should be diagnosed. You might want to include convergence checks, including reporting the fraction of negative indices and demonstrating the stability of rankings/hotspots (a sketch of such a diagnostic is given after these major comments). Otherwise, soften your claims and frame the findings as exploratory.
5) The use of NSE as the sole sensitivity target can be problematic, especially for rolling windows. NSE is highly sensitive to variance and peaks, can behave poorly for low flows, and is nonlinearly bounded; windowed NSE can become unstable for low-variance windows. You mention this as a limitation, but the paper's main results rely on it. You might want to include at least one complementary metric and compare whether the dominant parameters/hot moments persist across metrics (see the corresponding sketch below). Otherwise, the "hot moments" identified (e.g., CN2 peaking during wet months) may reflect the mathematical structure of NSE rather than a shift in the physical processes.
6) The manuscript states the model was "simulated at a monthly time step … consistent with available meteorological data," yet earlier you describe daily meteorological records and daily runoff data availability. Clarify whether the SWAT time step was daily or monthly, or whether daily outputs were aggregated to monthly values for the NSE/SSA.
7) The use of lagged Spearman's rank correlations between sensitivity and runoff is presented as a highlight. However, the 1-month lag in the 3-month windows needs a clearer physical attribution: is this a signature of soil-moisture memory or a delay in the surrogate response? (A sketch of the lag analysis is given below.)
8) CN2 seems to be described as adjusted by a multiplicative factor in the screening step, yet later it is treated as independently distributed at the subbasin and HRU scales. The replacement vs. factor approach is not consistently explained, and the reader needs to know how "distributed parameters" are represented and thus what the Sobol' indices refer to. You might want to describe how key parameters are perturbed at each scale (subbasin vs. HRU), including how the SWAT input files are edited and whether the spatial structure is preserved or broken for the SA (a schematic sketch of the two options is given below). This would help reproducibility for other researchers interested in this approach.
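Regarding comment 4), a minimal sketch of the kind of diagnostic I have in mind, assuming the Sobol' indices are computed with SALib (the toy problem, sample sizes, and variable names are illustrative and not the authors' setup):

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Illustrative stand-in for the HRU-scale setup: many parameters, only a few influential.
# k and N are kept small so the sketch runs quickly; in the study k = 2559, N = 32768.
k, N = 50, 1024
problem = {"num_vars": k,
           "names": [f"x{i}" for i in range(k)],
           "bounds": [[0.0, 1.0]] * k}

X = saltelli.sample(problem, N, calc_second_order=False)
rng = np.random.default_rng(0)
# Toy response: only the first three inputs matter; the rest contribute only noise.
Y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.2 * X[:, 2] + 0.01 * rng.standard_normal(X.shape[0])

Si = sobol.analyze(problem, Y, calc_second_order=False,
                   num_resamples=100, conf_level=0.95, print_to_console=False)
ST, ST_conf = np.asarray(Si["ST"]), np.asarray(Si["ST_conf"])

# Diagnostics that could be reported for the HRU-scale analysis:
frac_negative = np.mean(ST < 0)            # fraction of negative ST estimates (pure noise)
frac_ci_zero = np.mean(ST - ST_conf <= 0)  # fraction whose 95% CI overlaps zero
top = np.argsort(ST)[::-1][:5]             # candidate "hotspot" parameters
print(f"negative ST: {frac_negative:.1%}; CI overlapping zero: {frac_ci_zero:.1%}")
print("top-5 ST (value, CI half-width):",
      [(problem["names"][i], round(float(ST[i]), 3), round(float(ST_conf[i]), 3)) for i in top])
```

Reporting these quantities for the actual 2559-parameter case would make it possible to judge whether the HRU-scale hotspots are separated from the noise floor.

Regarding comment 5), windowed NSE and a complementary metric such as KGE are straightforward to compute on the same rolling windows; a minimal sketch (array names and the window length are illustrative):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency; unstable when the windowed variance of obs is small."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency: correlation, variability-ratio, and bias-ratio components."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

def rolling_metric(obs, sim, window, metric):
    """Evaluate a metric over rolling windows of a monthly series (window in months)."""
    return np.array([metric(obs[t:t + window], sim[t:t + window])
                     for t in range(len(obs) - window + 1)])

# obs_m, sim_m: monthly observed and simulated runoff (e.g., 1971-1986)
# nse_3m = rolling_metric(obs_m, sim_m, 3, nse)
# kge_3m = rolling_metric(obs_m, sim_m, 3, kge)
# Repeating the time-varying SSA with kge_3m as the target would show whether the
# CN2 "hot moments" persist or partly reflect the structure of NSE itself.
```

Regarding comment 7), the lag analysis could be reported for several candidate drivers to separate the two explanations; a minimal sketch using SciPy, assuming monthly series (names illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def lagged_spearman(sensitivity, driver, max_lag=3):
    """Spearman's rank correlation between a sensitivity time series and a candidate
    driver (runoff, antecedent precipitation, simulated soil water, ...), with the
    driver shifted earlier by 0..max_lag months (positive lag = driver leads)."""
    sensitivity, driver = np.asarray(sensitivity, float), np.asarray(driver, float)
    out = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            r, p = spearmanr(sensitivity, driver)
        else:
            r, p = spearmanr(sensitivity[lag:], driver[:-lag])
        out[lag] = (r, p)
    return out

# Comparing the lag structure of, e.g., the CN2 sensitivity series against runoff,
# precipitation, and simulated soil water would help separate soil-moisture memory
# from a delayed surrogate response.
```

Regarding comment 8), the distinction I am asking the authors to make explicit is essentially the following; a schematic sketch with hypothetical values, not the authors' actual SWAT file-editing code (the two modes are analogous to the relative and replace conventions used, e.g., in SWAT-CUP):

```python
import numpy as np

def perturb_relative(base_values, factor):
    """Multiplicative (relative-change) adjustment: a single factor scales all spatial
    units, so the original spatial pattern (e.g., of CN2 across HRUs) is preserved."""
    return np.asarray(base_values, float) * (1.0 + factor)

def perturb_replace(base_values, sampled_values):
    """Replacement adjustment: every spatial unit receives its own sampled value, so each
    Sobol' index refers to one unit's parameter and the original spatial pattern is broken."""
    return np.asarray(sampled_values, float)

# Example with CN2 for five hypothetical HRUs:
cn2_base = np.array([72.0, 80.0, 65.0, 77.0, 69.0])
print(perturb_relative(cn2_base, -0.10))                          # one factor, pattern preserved
print(perturb_replace(cn2_base, [70.0, 85.0, 60.0, 75.0, 68.0]))  # one value per HRU
```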
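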
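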
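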
Minor Comments
1) The largest errors occur during high-flow months, attributed to the "limited representation of extreme events in the training dataset". For a model intended to support "flood warnings," this is a significant deficiency.
2) There is a duplicated/incorrect subsection header (“2.5.2 Spatial Parameterization…” vs. “2.5.1 Spatial Parameterization for Distributed Parameters”).
3) There are numerous typos/grammar issues (e.g., “trransform”, “unfiorm”, “predication error”).
4) Resampling the land use to 3000 m and the DEM to 150 m is a major preprocessing decision; provide quantitative justification that the hydrologic response and HRU composition are not materially altered, and discuss the implications.
5) The “hierarchical calibration strategy” is discussed, but the study does not actually demonstrate calibration improvement; tone this down or add an illustrative experiment.
Recommendation
The workflow is promising, but the paper needs stronger validation, clearer reproducibility, and more cautious interpretation.
Citation: https://doi.org/10.5194/egusphere-2025-5694-RC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 158 | 71 | 22 | 251 | 30 | 32 | 28 |
Space-time-varying sensitivity analysis is an ongoing research topic, given its potential value for deeper analysis of distributed models on the one hand and its very high computational demand on the other. The study presented is thus relevant for the community.
My main points for revision concern the Discussion section, which does not compare the current results with previous findings, and the lack of a robustness analysis testing the influence of choices and assumptions. Both can be rectified. Please see my detailed comments below.
Main comments:
[1] Any sequential strategy, in which different methods are used in series, depends on the early steps not overly constraining the outcome of the later steps; in this case, for example, it must be ensured that parameters are not eliminated that might later become relevant when assessed in a distributed manner. How can it be ensured that this problem does not occur in the sequential application proposed here?
[2] In this case study, have the authors tested whether the distributed sensitivity analysis changes when different parameters are kept after the first stage, e.g. by pursuing step 2 with all parameters for smaller test cases?
[3] To what extent have the authors tested the “performance” of the MLP in terms of the consistency of the identified sensitive parameters?
[4] Have the authors estimated confidence limits on the resulting sensitivity indices to ensure that their analyses have converged? It is difficult to assess the robustness of the results without such convergence tests (e.g. Sarrazin et al., 2016, https://doi.org/10.1016/j.envsoft.2016.02.005). A sketch of such a check is given after these comments.
[5] Why do you see such strong differences between the results for sub-basins and HRUs? If this is potentially due to the different numbers of parameters, can this not be tested and confirmed?
[6] Please check again for spelling issues, e.g. the caption of Fig. 5 reads “show lagged Spearman’s rank correlations (r) between and runoff”; a word is missing after “between”.
[7] The Discussion section is a good start, but it currently does not fulfil its actual role. It is meant to discuss the results of this specific study in the context of previous studies. However, section 4.1 just reviews the results, while section 4.2 makes some references to potential future explorations. So, the current section 4.1 should be part of the results section. In section 4, the authors need to discuss how their findings differ (or not) from previous findings regarding the sensitivity of the SWAT parameters. Did they find new process influences compared to other studies? Did the different approach yield different results? Etc. Also, what did the authors find in their methodology compared to previous space-time-varying analyses? The authors made some different choices and assumptions; how did these influence the results and findings?
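Regarding comment [4], one simple way to address this, in the spirit of Sarrazin et al. (2016), is to re-estimate the indices on sub-samples of increasing size and check whether the ranking and the hotspot set stabilise; a minimal sketch, assuming a user-supplied function `estimate_ST(n)` (hypothetical) that re-estimates the total-order indices from the first n base samples of the surrogate evaluations:

```python
import numpy as np
from scipy.stats import spearmanr

def ranking_convergence(estimate_ST, sample_sizes, top_k=20):
    """Re-estimate total-order indices at increasing base-sample sizes and report how
    stable the parameter ranking and the hotspot set are relative to the largest sample
    (in the spirit of the convergence checks of Sarrazin et al., 2016)."""
    ST_ref = np.asarray(estimate_ST(sample_sizes[-1]))
    ref_top = set(np.argsort(ST_ref)[::-1][:top_k])
    report = []
    for n in sample_sizes:
        ST_n = np.asarray(estimate_ST(n))
        rho, _ = spearmanr(ST_n, ST_ref)            # rank agreement over all parameters
        overlap = len(set(np.argsort(ST_n)[::-1][:top_k]) & ref_top) / top_k
        report.append((n, float(rho), overlap))     # convergence: both level off with n
    return report

# estimate_ST(n) is assumed (hypothetically) to rebuild the Sobol' estimate from the
# first n base samples of the surrogate evaluations, e.g.
#   report = ranking_convergence(estimate_ST, [2048, 4096, 8192, 16384, 32768])
```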