Exploring uncertainties across the modelling chain in machine-learning-based streamflow forecasting
Abstract. Operational streamflow forecasts underpin flood preparedness and reservoir operations, yet their utility is often constrained because predictive uncertainty is poorly characterized and poorly attributed to its sources. In machine-learning-based forecasting, uncertainty is frequently omitted or reported as a single aggregate output, leaving it unclear which parts of the end-to-end forecasting chain drive overconfidence and forecast degradation, particularly with increasing lead time. We develop an end-to-end uncertainty decomposition framework for operational streamflow forecasting that attributes predictive uncertainty to meteorological forcing choice, feature design, model architecture, hyperparameter optimization, and training variability, and we evaluate it across multi-day forecast horizons. The decomposition reveals a systematic, horizon-dependent shift in the dominant uncertainty sources: forcing-related contributions increase with lead time, while model-structure and feature choices remain influential at shorter horizons. During high-flow events, predictive intervals remain essential because pipeline heterogeneity can bias the central estimate even when ensemble dispersion widens appropriately. Hyperparameter tuning contributes little to the uncertainty budget but strongly affects compute–skill trade-offs, with Bayesian optimization delivering the most favorable cost–benefit performance under the tested constraints. Together, these results provide actionable guidance for operational freshwater management by showing where investment yields the largest reliability gains: model design at short lead times and forcing quality at longer lead times. This guidance can reduce the risk of costly or unsafe decisions in flood preparedness, reservoir operation, and other critical decision-making contexts in water management.
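
To illustrate what attributing ensemble spread to stages of the modelling chain can look like, the sketch below computes each stage's main-effect share of the total forecast variance for a single site, time step, and lead time. It assumes an ANOVA-style main-effects decomposition over a balanced, full-factorial ensemble, which may differ from the decomposition used in this work. The function name `main_effect_shares`, the array layout (one axis per stage), and the synthetic effect sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed ordering of array axes: one axis per stage of the modelling chain.
FACTORS = ("forcing", "features", "architecture", "tuning", "seed")


def main_effect_shares(forecasts, factors=FACTORS):
    """Return each factor's main-effect share of the total ensemble variance.

    forecasts: array with one axis per factor (balanced full-factorial design),
    holding forecasts for a single site, time step, and lead time.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    if forecasts.ndim != len(factors):
        raise ValueError("need exactly one array axis per factor")
    total = forecasts.var()  # population variance over all ensemble members
    if total == 0:
        raise ValueError("ensemble has zero variance")

    shares = {}
    for axis, name in enumerate(factors):
        # Average over every other factor, leaving one mean per level.
        other_axes = tuple(a for a in range(forecasts.ndim) if a != axis)
        level_means = forecasts.mean(axis=other_axes)
        shares[name] = level_means.var() / total
    # Whatever the main effects do not explain is lumped into interactions.
    shares["interactions"] = 1.0 - sum(shares.values())
    return shares


# Synthetic, purely additive ensemble (hypothetical effect sizes, not results).
rng = np.random.default_rng(0)
shape = (3, 2, 4, 3, 5)  # number of levels per factor
scales = (2.0, 1.0, 1.0, 0.1, 0.3)
ensemble = np.zeros(shape)
for axis, (n_levels, scale) in enumerate(zip(shape, scales)):
    effect = rng.normal(scale=scale, size=n_levels)
    broadcast_shape = [1] * len(shape)
    broadcast_shape[axis] = n_levels
    ensemble = ensemble + effect.reshape(broadcast_shape)

shares = main_effect_shares(ensemble)
for name, share in shares.items():
    print(f"{name:>12s}: {share:6.3f}")
```

Repeating the calculation for each lead time would give a horizon-dependent uncertainty budget. Because the synthetic ensemble is additive by construction, the interaction share here is zero up to rounding; in a real forecasting chain it would generally be non-zero and worth reporting alongside the main effects.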