Technical note: Machine learning metamodelling for global sensitivity analysis
Abstract. Global sensitivity analysis (GSA) plays a central role in hydrologic modelling by supporting model understanding, diagnosis, and decision-making through the identification of influential and non-influential parameters and their interactions. Variance-based methods provide a rigorous framework for GSA but are often computationally expensive, as their estimation requires a large number of model evaluations. Metamodelling has therefore been widely adopted as a strategy to alleviate this issue, with recent advances in machine learning (ML) offering new opportunities to construct accurate and flexible surrogates for complex models. This technical note examines the practical relationship between Sobol’ total-effect indices (T_i) and feature importance measures derived from ML metamodels within a hydrologic modelling context. Building on theoretical results that link T_i to permutation variable importance (PVI_i) under independence assumptions, we provide systematic numerical evidence using three conceptual hydrologic models of varying complexity (HBV, HyMod, and VIC) applied to three headwater catchments in northern Germany, together with three ML metamodels: a random forest (RF), a neural network (NN), and a linear model (LM). The three metamodels were trained on Monte Carlo samples and used to estimate sensitivities through PVI_i and SHapley Additive exPlanations (SHAP_i). The results demonstrate that RF and NN metamodels reliably reproduce both the ranking and relative magnitude of T_i using PVI_i across all hydrologic models, providing clear empirical support for the theoretical connection between the two measures. In contrast, the performance of LM-based estimates depends strongly on the degree of linearity in the underlying model response. Mean absolute SHAP_i values exhibit a consistent monotonic relationship with T_i and preserve parameter rankings, while sample-specific SHAP_i values enable a distributed evaluation of sensitivities across both the parameter space and the target variable space. Overall, this study highlights ML metamodelling as a computationally efficient and conceptually sound framework for GSA in hydrologic modelling and beyond.
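The link between permutation importance and the total-effect index can be illustrated with a minimal sketch. It is not the study's setup. The function `toy_model`, the three parameters uniform on [-1, 1], and the sample sizes are hypothetical, and a quadratic least-squares surrogate stands in for the RF and NN metamodels so that the example needs only NumPy. For independent inputs and an accurate metamodel, the expected increase in mean squared error after permuting parameter i is approximately twice the output variance times the total-effect index, so dividing by 2 Var(Y) should recover it.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x):
    # Hypothetical stand-in for a hydrologic model with 3 independent parameters.
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]

def quadratic_features(x):
    # Intercept, linear terms, and all pairwise products (squares included).
    n, d = x.shape
    cols = [np.ones(n)] + [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

# 1) Monte Carlo training sample and metamodel fit (least squares).
x_train = rng.uniform(-1.0, 1.0, size=(2000, 3))
y_train = toy_model(x_train)
coef, *_ = np.linalg.lstsq(quadratic_features(x_train), y_train, rcond=None)

def metamodel(x):
    return quadratic_features(x) @ coef

# 2) Permutation importance on an independent evaluation sample.
x_test = rng.uniform(-1.0, 1.0, size=(20000, 3))
y_test = toy_model(x_test)
base_mse = np.mean((y_test - metamodel(x_test)) ** 2)
var_y = np.var(y_test)

pvi = np.empty(3)
for i in range(3):
    x_perm = x_test.copy()
    x_perm[:, i] = rng.permutation(x_perm[:, i])  # break the link between x_i and y
    pvi[i] = np.mean((y_test - metamodel(x_perm)) ** 2) - base_mse

# 3) Rescale to the total-effect scale and compare with the analytic values,
#    which for this toy function are 0.25, 0.75 and 0.0625.
t_from_pvi = pvi / (2.0 * var_y)
t_exact = np.array([0.25, 0.75, 0.0625])

print("total-effect from permutation importance:", np.round(t_from_pvi, 3))
print("analytic total-effect indices:           ", t_exact)
```

Any regressor exposing a prediction function could replace `metamodel` here. The agreement then depends on how well that regressor fits the model response, which is consistent with the sensitivity of the LM-based estimates to nonlinearity noted above.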