Testing data assimilation strategies to enhance short-range AI-based discharge forecasts
Abstract. Effective discharge forecasts are essential in operational hydrology. The accuracy of such forecasts, particularly at short lead times, is generally improved by integrating recently measured discharges through data assimilation (DA) procedures. Recent studies have demonstrated the effectiveness of deep learning (DL) approaches for rainfall-runoff (RR) modeling, particularly Long Short-Term Memory (LSTM) networks, which have outperformed traditional approaches. However, most of these studies do not include DA procedures, which may limit their operational forecast performance. This study proposes and evaluates three DA strategies that incorporate discharge information from past observations, from the forecasts of a pre-trained benchmark model (BM), or from both. The proposed strategies, based on a Multilayer Perceptron (MLP) orchestrator, include: (1) the integration of recent observed discharges, (2) the integration of both recent discharge observations and pre-trained BM forecasts, and (3) the post-processing of BM forecast errors. Experiments are conducted on the CAMELS-US dataset with two established benchmark models: the trained LSTM model from Kratzert et al. (2019) and the conceptual Sacramento Soil Moisture Accounting (SAC-SMA) model from Newman et al. (2017), covering both machine learning and conceptual RR simulation approaches. Lead times of 1, 3, and 7 days, covering short- and mid-term horizons, are considered. The approaches are evaluated in two forecast frameworks: (1) perfect meteorological forecasts over the forecasting lead time and (2) highly uncertain ensemble meteorological forecasts. The two frameworks yield contrasting outcomes. Under the perfect forecast framework, the application of DA leads to substantial improvements in forecast performance, although the magnitude of these gains depends on the initial performance of the BMs and on the forecasting lead time.
Improvements are consistently significant for the SAC-SMA cases, while for the LSTM cases, gains are observed mainly for basins where the LSTM initially underperforms. However, the ensemble forecast evaluation yields unexpected results: the performance ranking of the tested models changes markedly relative to the perfect forecast framework. The LSTM model, in particular, appears penalized by the unreliability of its forecast ensembles, specifically their under-dispersion, meaning that its predictions are insufficiently responsive to meteorological forcing over the forecast lead time. This finding underscores the importance of ensuring reliable ensemble dispersion for the effective operational deployment of AI-based hydrological forecasts.
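
As an illustration of what under-dispersion means in practice, the sketch below computes a spread-skill ratio, a common reliability diagnostic that compares the average ensemble spread with the root-mean-square error of the ensemble mean. A ratio close to 1 suggests a well-dispersed ensemble, whereas a ratio well below 1 suggests under-dispersion. This is a minimal sketch on synthetic data and is not necessarily the metric used in this study; the function name `spread_skill_ratio` and all numerical settings are assumptions made for illustration only.

```python
import numpy as np


def spread_skill_ratio(ensemble, observed):
    """Ratio of mean ensemble spread to RMSE of the ensemble mean.

    ensemble: array of shape (n_times, n_members)
    observed: array of shape (n_times,)
    (Hypothetical helper, not taken from the study.)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observed = np.asarray(observed, dtype=float)
    ens_mean = ensemble.mean(axis=1)
    # Spread: square root of the time-averaged (unbiased) ensemble variance.
    spread = np.sqrt(ensemble.var(axis=1, ddof=1).mean())
    # Skill: root-mean-square error of the ensemble mean.
    rmse = np.sqrt(np.mean((ens_mean - observed) ** 2))
    return spread / rmse


# Synthetic example (assumed values): 2000 forecast dates, 20 members.
rng = np.random.default_rng(0)
n_times, n_members = 2000, 20
center = rng.normal(size=n_times)                      # predictable signal
observed = center + rng.normal(scale=1.0, size=n_times)

# Reliable ensemble: members share the error distribution of the observations.
reliable = center[:, None] + rng.normal(scale=1.0, size=(n_times, n_members))
# Under-dispersive ensemble: members vary far less than the actual errors.
narrow = center[:, None] + rng.normal(scale=0.3, size=(n_times, n_members))

ratio_reliable = spread_skill_ratio(reliable, observed)
ratio_narrow = spread_skill_ratio(narrow, observed)
print(f"reliable ensemble:         {ratio_reliable:.2f}")
print(f"under-dispersive ensemble: {ratio_narrow:.2f}")
```

In this synthetic setting, the narrow ensemble yields a ratio far below that of the reliable one, which is the signature of an ensemble that responds too weakly to the variability of its forcing.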