https://doi.org/10.5194/egusphere-2026-1965
20 Apr 2026
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Better data or better architecture? Improving deep-learning-based prediction in ungauged basins

Benedikt Heudorfer, Hoshin Gupta, Alexander Dolich, and Ralf Loritz

Abstract. Large-sample hydrology has recently been driven by two key developments: first, the introduction of hydrological benchmark datasets such as CAMELS-US and CARAVAN, and second, the emergence of deep-learning modelling frameworks, particularly LSTM-based regional models, which have demonstrated performance on par with, and in some cases exceeding, that of process-based models for streamflow prediction in gauged and ungauged settings. Building on these developments, we investigate whether (i) further enhanced LSTM architectures, (ii) new sets of static features, or (iii) a combination of both enable us to significantly improve Predictions in Ungauged Basins (PUB). We evaluate a state-of-the-art regional LSTM model (base LSTM) against embedded (EMB-LSTM) and cross-attention enhanced (CA-LSTM) variants, in combination with a suite of newly applied static features, namely MODIS surface reflectance bands, ALPHAEARTH embeddings, DEM-, meteorology- and catchment coordinate-derived auxiliary aggregates, and conventional CAMELS attributes. We tested these model-and-data combinations in pseudo-ungauged 5-fold cross-validation across the 531 CAMELS-US catchments. Model performance was quantified by the Nash-Sutcliffe Efficiency (NSE), while latent-space complexity was assessed via the Shannon effective rank (erank). Results show that the quality of static features is more important than architectural improvements. ALPHAEARTH embeddings attained the highest median NSE, but only in combination with auxiliary static feature data (ALPHAEARTHplus). Architectural refinements yielded only modest improvements. Among them, the relatively simple EMB-LSTM, which allowed the LSTM layer to better ingest ALPHAEARTHplus static features, outperformed the other architectures. With this combination, we achieved a median performance of NSE = 0.726, significantly improving the state-of-the-art PUB performance (NSE = 0.69) for the CAMELS-US dataset.
Auxiliary analysis indicates that further improvement is possible when adding MODIS bands as additional dynamic features to the model. In conclusion, our study indicates that, broadly speaking, (a) better data is more important than better architecture, (b) better architecture is necessary only to accommodate better data, (c) the single-layer LSTM remains the most suitable core model as of now, and (d) the Shannon effective rank of the latent space is a useful diagnostic for linking improved PUB performance to improved quality of the latent hydrological representation inside the model. Overall, this highlights the need for improved measurement-derived descriptor datasets, especially for soil and geology.
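
For readers unfamiliar with the two diagnostics and the evaluation split named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the standard NSE definition and the usual definition of the Shannon effective rank as the exponential of the entropy of the normalised singular values. Which matrix the study applies erank to is not stated here, so the input is simply any 2-D array. The function names (nse, effective_rank, basin_folds), the eps threshold and the random seed are hypothetical choices made for illustration.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # Drop time steps where either series is missing (an assumption of this sketch).
    mask = ~(np.isnan(obs) | np.isnan(sim))
    obs, sim = obs[mask], sim[mask]
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def effective_rank(matrix, eps=1e-12):
    """Shannon effective rank: exp of the entropy of the normalised singular values."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # numerically zero singular values contribute nothing
    return float(np.exp(-np.sum(p * np.log(p))))


def basin_folds(basin_ids, k=5, seed=0):
    """Split basins (not time steps) into k disjoint folds.

    Each fold is held out in turn, so the model never sees any data
    from the basins it is evaluated on (pseudo-ungauged setting).
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(basin_ids))
    return [fold.tolist() for fold in np.array_split(shuffled, k)]
```

A perfect simulation gives NSE = 1 and predicting the observed mean gives NSE = 0. An identity matrix of size n has effective rank n, whereas a rank-one matrix has effective rank close to 1. Splitting by basin rather than by time is what makes the cross-validation a test of prediction in ungauged basins.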

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 01 Jun 2026)

Latest update: 20 Apr 2026
Short summary
For most rivers, water level is not measured, making flood prediction difficult. But it's still possible with certain models. We want to improve these models and test if better models or better data help when predicting floods with these models in the United States. Results show that better data (measured by satellites) improves predictions more than better model designs. Actually, simple models often worked best. And we show that we need better measurement of landscape information.