SPatial Efficiency And Kmoments (SPEAK): Evaluating Spatial Consistency in (Semi)Distributed Rainfall–Runoff Models
Abstract. We introduce the Spatial Efficiency and Kmoments (SPEAK) metric, a novel objective function for the spatial calibration of hydrological models. SPEAK is built on Kmoment-based statistics, namely a Kmoment-based i) correlation, ii) coefficient-of-variation ratio, and iii) probability density function. This formulation is explicitly designed to overcome key limitations of existing spatial performance metrics, such as sensitivity to binning strategies, grid resolution, and sample heterogeneity. By relying on distributional properties rather than grid-to-grid correspondence, SPEAK provides a statistically robust framework for evaluating spatial patterns in gridded hydrological variables. The proposed metric is implemented in both semi-distributed and fully distributed configurations of the TUW hydrological model and tested across 99 near-natural Chilean catchments that encompass strong climatic and physiographic gradients. Actual evapotranspiration (ETa) from GLEAM v4.2a is used as an independent spatial benchmark, allowing the assessment of model performance beyond streamflow reproduction. Calibration using SPEAK is compared with a conventional streamflow-only calibration based on the Kling-Gupta Efficiency (KGE) and an ETa-only calibration based on the Spatial Efficiency metric (SPAEF). Model performance is evaluated using the normalised root-mean-square error (NRMSE), the spatial Pearson correlation coefficient, the Fraction Skill Score (FSS), and sensitivity to catchment attributes. Results demonstrate that, while streamflow-only calibration leads to satisfactory runoff simulations (KGE ≥ 0.25 for all catchments and cases analysed, with mean and median KGE of 0.80 and 0.85, respectively), it fails to reproduce the spatial patterns of ETa. 
When ETa is used as a calibration target, SPEAK consistently outperforms SPAEF, exhibiting lower NRMSE (in 85 and 92 catchments for the fully and semi-distributed configurations, respectively), reduced dispersion of its internal components, and improved representation of spatial patterns across seasons and hydroclimatic zones. Importantly, SPEAK shows limited dependence on catchment characteristics. These findings highlight SPEAK as a methodologically robust spatial performance metric, with clear potential for improving the calibration and diagnosis of distributed hydrological models and other gridded environmental variables.
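For readers less familiar with two of the criteria named in the abstract, the following is a minimal sketch of KGE in its original form (Gupta et al., 2009) and of the Fraction Skill Score (Roberts and Lean, 2008). It is not the authors' implementation: whether the manuscript uses the 2009 KGE or a later variant, and how FSS thresholds and window sizes are chosen, are assumptions. The sketch also assumes a rectangular grid without missing cells and treats cells beyond the grid edge as non-exceeding.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency, 2009 form: correlation, variability ratio, bias ratio."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def neighbourhood_fraction(binary, n):
    """Fraction of exceeding cells in the n x n window centred on each cell.

    n must be odd; cells beyond the grid edge count as non-exceeding (zero padding).
    """
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant", constant_values=0.0)
    rows, cols = binary.shape
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = padded[i:i + n, j:j + n].mean()
    return out


def fraction_skill_score(sim, obs, threshold, n):
    """FSS for one threshold and one neighbourhood size (1 = perfect match)."""
    pf = neighbourhood_fraction(np.asarray(sim) >= threshold, n)
    po = neighbourhood_fraction(np.asarray(obs) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    # reference MSE: the value obtained when the two fraction fields do not overlap
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    if mse_ref == 0.0:
        return float("nan")  # no exceedances in either field: FSS is undefined
    return 1.0 - mse / mse_ref
```

Evaluating `fraction_skill_score` for increasing `n` shows the scale at which a simulated pattern becomes skilful, which complements a cell-by-cell measure such as NRMSE.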
The paper is well structured and proposes a novel approach to improve on the SPAEF metric.
I have the following comments:
-Did the authors apply a sensitivity analysis before model calibration? I could not find the details. Apparently, all TUW parameters are included in the spatial calibration.
A robust calibration framework starts with a sensitivity analysis to reduce the parameter search space, which reduces the number of model runs required and helps convergence towards globally optimal metric values.
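To illustrate the kind of screening meant here, below is a minimal sketch of a simplified elementary-effects (Morris-type) screening using a radial one-at-a-time design. The function `toy_objective` is a hypothetical stand-in for a TUW run scored by a calibration metric, parameters are assumed to be rescaled to [0, 1], and the 5 % cut-off for retaining parameters is an arbitrary illustrative choice.

```python
import numpy as np


def elementary_effects(f, n_params, r=20, delta=0.1, seed=0):
    """Mean absolute elementary effect (mu*) of each parameter of f.

    Uses r random base points in the unit hypercube and perturbs one parameter
    at a time by delta (radial one-at-a-time design).
    """
    rng = np.random.default_rng(seed)
    effects = np.zeros((r, n_params))
    for k in range(r):
        # draw the base point so that x + delta stays inside [0, 1]
        x = rng.uniform(0.0, 1.0 - delta, size=n_params)
        fx = f(x)
        for i in range(n_params):
            x_step = x.copy()
            x_step[i] += delta
            effects[k, i] = (f(x_step) - fx) / delta
    return np.abs(effects).mean(axis=0)


def toy_objective(x):
    # hypothetical objective: strong effect of x[0], weak of x[1], none of x[2]
    return 3.0 * x[0] + 0.5 * x[1] ** 2 + 0.0 * x[2]


mu_star = elementary_effects(toy_objective, n_params=3)
# keep only parameters whose mu* exceeds 5 % of the largest one
influential = np.flatnonzero(mu_star > 0.05 * mu_star.max())
```

Parameters outside `influential` could then be fixed at default values, so the spatial calibration searches a smaller space.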
-Line 250: "Fully distributed model configuration: Model inputs are provided at the grid-cell (0.05° x 0.05°) level across
each catchment with uniform-in-space parameters’ values (i.e., not depending on the spatial dimension)."
I think it would be excellent if the authors could apply pedotransfer functions to the fully distributed version of the model.
Some models, such as mHM, offer this option together with the multiscale parameter regionalization (MPR) approach, which helps to create robust spatial patterns instead of the weak patterns that result from uniform parameter values.
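For illustration, the core idea of an MPR-type approach can be sketched as follows: a transfer function with a few global coefficients is evaluated on fine-resolution soil attributes, and the result is aggregated to the model resolution with an upscaling operator. This is not mHM's code or its actual transfer functions; the linear pedotransfer form, the coefficient values, and the synthetic sand/clay inputs are hypothetical.

```python
import numpy as np


def transfer_function(sand_pct, clay_pct, g0, g1, g2):
    """Hypothetical linear pedotransfer function evaluated on the fine grid.

    g0, g1, g2 are global coefficients: these, not per-cell values, are calibrated.
    """
    return g0 + g1 * sand_pct + g2 * clay_pct


def upscale(fine, factor, op="harmonic"):
    """Aggregate each factor x factor block of the fine grid to one model cell."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    if op == "harmonic":
        # harmonic mean; requires strictly positive parameter values
        return 1.0 / np.mean(1.0 / blocks, axis=(1, 3))
    if op == "mean":
        return np.mean(blocks, axis=(1, 3))
    raise ValueError(f"unknown operator: {op}")


# Synthetic 4 x 4 fine grid aggregated to a 2 x 2 model grid
rng = np.random.default_rng(42)
sand = rng.uniform(10.0, 80.0, size=(4, 4))  # % sand (synthetic)
clay = rng.uniform(5.0, 50.0, size=(4, 4))   # % clay (synthetic)
param_fine = transfer_function(sand, clay, g0=0.2, g1=0.003, g2=-0.001)
param_model = upscale(param_fine, factor=2)
```

Only three coefficients are calibrated, yet the resulting parameter field varies in space with the soil attributes, which is the property that spatially uniform parameters lack.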
-Line 490: "Furthermore, the TUW model employed spatially uniform parameter
values, potentially limiting its ability to represent local heterogeneity".
It is good that the authors acknowledge this limitation in the text.
-Line 65: "In recent years, the Spatial Efficiency (SPAEF; Koch et al., 2018)"
In that paper, the authors state: "Following the multiple-component idea of KGE we present a novel spatial
performance metric denoted SPAtial EFficiency (SPAEF), which was originally proposed by Demirel et al. (2018a, b)."
https://gmd.copernicus.org/articles/11/1873/2018/
Citing this sentence would help readers trace the origin of the metric.
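For reference, SPAEF as defined in that paper combines three components: the Pearson correlation α, the ratio of coefficients of variation β, and the histogram intersection γ of the z-score normalised fields, with SPAEF = 1 − sqrt((α − 1)² + (β − 1)² + (γ − 1)²). A minimal sketch follows; the default of 100 bins is an assumption, not taken from the manuscript, and it is exactly this choice that makes γ sensitive to binning.

```python
import numpy as np


def spaef(sim, obs, bins=100):
    """SPAEF for two gridded fields (flattened; non-finite cells are dropped).

    Assumes positive-valued fields (e.g. ETa) so the coefficients of variation are defined.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    mask = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[mask], obs[mask]

    alpha = np.corrcoef(sim, obs)[0, 1]  # pattern correlation
    beta = (np.std(sim) / np.mean(sim)) / (np.std(obs) / np.mean(obs))  # CV ratio

    # histogram intersection of z-scores on common bin edges
    z_sim = (sim - sim.mean()) / sim.std()
    z_obs = (obs - obs.mean()) / obs.std()
    edges = np.histogram_bin_edges(np.concatenate([z_sim, z_obs]), bins=bins)
    hist_sim, _ = np.histogram(z_sim, bins=edges)
    hist_obs, _ = np.histogram(z_obs, bins=edges)
    # both fields have the same number of cells, so either histogram can normalise
    gamma = np.minimum(hist_sim, hist_obs).sum() / hist_obs.sum()

    return 1.0 - np.sqrt((alpha - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
```

Comparing `spaef(sim, obs, bins=20)` with `spaef(sim, obs, bins=100)` on a catchment with few cells is a quick way to see the binning sensitivity that SPEAK aims to avoid.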
-The sample size (number of grid cells) in Figs. 4, 5 and 6 seems small compared with, for example, a basin of 100 x 100 grid cells.
Readers may wonder why the authors selected small catchments, or why they did not show gridded maps of continental Chile covering the 99 near-natural catchments.
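To make the sample-size point concrete, the conversion between catchment area and the number of 0.05° cells can be approximated as below. This is a rough sketch assuming a spherical Earth with about 111.32 km per degree of latitude and a representative latitude of 33° S (central Chile); the 1,000 km² catchment is hypothetical. Under these assumptions a cell covers roughly 26 km², so a 100 x 100 cell basin would span on the order of 260,000 km², whereas a 1,000 km² catchment contains fewer than 40 cells.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (spherical Earth)


def cell_area_km2(res_deg, lat_deg):
    """Approximate area of a res_deg x res_deg cell centred at latitude lat_deg."""
    side_ns = res_deg * KM_PER_DEG_LAT
    side_ew = res_deg * KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    return side_ns * side_ew


def n_cells(catchment_area_km2, res_deg=0.05, lat_deg=-33.0):
    """Approximate number of grid cells covering a catchment of the given area."""
    return catchment_area_km2 / cell_area_km2(res_deg, lat_deg)


# Area spanned by a 100 x 100 block of 0.05 degree cells around 33 degrees S
area_100x100 = 100 * 100 * cell_area_km2(0.05, -33.0)
cells_1000 = n_cells(1000.0)  # hypothetical 1,000 km2 catchment
```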