https://doi.org/10.5194/egusphere-2026-1787
14 Apr 2026
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Technical note: Machine learning metamodelling for global sensitivity analysis

Patricio Yeste, Lieke A. Melsen, João Paulo L. F. Brêda, Nicolás Tacoronte, Andrea Saltelli, Giulia Vannucci, Roberta Siciliano, and Axel Bronstert

Abstract. Global sensitivity analysis (GSA) plays a central role in hydrologic modelling by supporting model understanding, diagnosis, and decision-making through the identification of influential and non-influential parameters and their interactions. Variance-based methods provide a rigorous framework for GSA but are often computationally expensive, as their estimation requires a large number of model evaluations. Metamodelling has therefore been widely adopted as a strategy to alleviate this issue, with recent advances in machine learning (ML) offering new opportunities to construct accurate and flexible surrogates for complex models. This technical note examines the practical relationship between Sobol’ total-effect indices (T_i) and feature importance measures derived from ML metamodels within a hydrologic modelling context. Building on theoretical results that link T_i to permutation variable importance (PVI_i) under independence assumptions, we provide systematic numerical evidence using three conceptual hydrologic models of varying complexity (HBV, HyMod, and VIC) applied to three headwater catchments in northern Germany, together with three ML metamodels: a random forest (RF), a neural network (NN), and a linear model (LM). The three metamodels were trained on Monte Carlo samples and used to estimate sensitivities through PVI_i and SHapley Additive exPlanations (SHAP_i). The results demonstrate that RF and NN metamodels reliably reproduce both the ranking and relative magnitude of T_i using PVI_i across all hydrologic models, providing clear empirical support for the theoretical connection between the two measures. In contrast, the performance of LM-based estimates depends strongly on the degree of linearity in the underlying model response. Mean absolute SHAP_i values exhibit a consistent monotonic relationship with T_i and preserve parameter rankings, while sample-specific SHAP_i values enable a distributed evaluation of sensitivities across both the parameter space and the target variable space. Overall, this study highlights ML metamodelling as a computationally efficient and conceptually sound framework for GSA in hydrologic modelling and beyond.
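
The link between the total-effect index and permutation importance described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: `toy_model` is a hypothetical three-input function standing in for a hydrologic model, the metamodel is a quadratic least-squares surrogate (not the RF, NN, or LM of the study), inputs are independent and uniform on [0, 1], and the total-effect index is estimated with the Jansen estimator. Under independent inputs and an accurate surrogate, the increase in mean squared error after permuting input i is expected to be roughly twice the total-effect variance, so dividing it by 2 Var(Y) should give a value comparable to the total-effect index.

```python
import numpy as np

def toy_model(X):
    """Hypothetical stand-in for a hydrologic model (one interaction term)."""
    return X[:, 0] + 2.0 * X[:, 1] + 3.0 * X[:, 0] * X[:, 2]

def quadratic_features(X):
    """Design matrix: intercept, linear terms, and all degree-2 products."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_metamodel(X, y):
    """Least-squares surrogate (an assumed stand-in for an ML metamodel)."""
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return lambda X_new: quadratic_features(X_new) @ coef

def total_effect_jansen(model, d, n, rng):
    """Sobol' total-effect indices via the Jansen estimator."""
    A, B = rng.random((n, d)), rng.random((n, d))
    y_A = model(A)
    var_y = np.var(np.concatenate([y_A, model(B)]))
    T = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # resample only input i
        T[i] = np.mean((y_A - model(AB)) ** 2) / (2.0 * var_y)
    return T

def permutation_importance_mse(predict, X, y, rng):
    """Increase in mean squared error after shuffling each input column."""
    base = np.mean((predict(X) - y) ** 2)
    pvi = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(X[:, i])
        pvi[i] = np.mean((predict(Xp) - y) ** 2) - base
    return pvi

rng = np.random.default_rng(0)
d, n = 3, 20000

# Reference: total-effect indices from direct evaluations of the toy model.
T = total_effect_jansen(toy_model, d, n, rng)

# Metamodel route: train on a Monte Carlo sample, then permute inputs.
X = rng.random((n, d))
y = toy_model(X)
metamodel = fit_metamodel(X, y)
pvi = permutation_importance_mse(metamodel, X, y, rng)
pvi_scaled = pvi / (2.0 * np.var(y))  # comparable to T under independence

print("total-effect (Jansen):", np.round(T, 3))
print("scaled permutation importance:", np.round(pvi_scaled, 3))
```

The surrogate here fits the toy function exactly because the function lies in the span of the quadratic basis; with a real hydrologic model, the agreement would also depend on how well the metamodel reproduces the response.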


Status: open (until 26 May 2026)

Latest update: 14 Apr 2026
Short summary
Understanding which factors most influence streamflow is key for accurate hydrologic modelling. This study shows that machine learning methods, like random forests and neural networks, can efficiently identify the most important model inputs, producing results similar to traditional sensitivity analysis but with far less computation. This approach helps scientists explore complex models faster and more reliably, improving insights into how catchments respond to changing conditions.