Fusing Satellite Embeddings to Improve Streamflow Reconstruction Across River Networks
Abstract. Reconstructing streamflow across river networks is increasingly difficult where land surface conditions have been heavily modified. Here we present a Data Integration model with Satellite Embeddings (DISE), a reach-scale residual-learning framework that integrates Google Satellite Embeddings (SE; compact learned vector representations of satellite imagery) from the AlphaEarth Foundation Model with a recently developed discharge simulation (GRADES-hydroDL) by learning corrections toward gauge observations. We evaluate DISE at 41 gauging stations in the Yangtze River Basin using leave-one-station-out cross-validation, with embeddings aggregated over each reach’s contributing subcatchment. Simulations incorporating SE consistently outperform the GRADES-hydroDL baseline, with mean aggregation emerging as the most balanced strategy. Improvements are most pronounced for magnitude and bias: relative to GRADES-hydroDL, the median Kling-Gupta efficiency (KGE) increases from 0.485 to 0.594 and the median Nash-Sutcliffe efficiency (NSE) from 0.301 to 0.533, while correlation gains remain modest, suggesting that SE primarily help the model capture streamflow volume and variability rather than timing. Control experiments further show that SE enhance spatial generalization beyond both meteorological forcings and traditional hydro-environmental reach attributes (RiverATLAS): relative to the base configuration without spatial context, adding SE alone increases median KGE from 0.473 to 0.594, and adding SE on top of RiverATLAS increases median KGE from 0.497 to 0.567. Once SE are included, adding RiverATLAS can even slightly reduce performance (median KGE of 0.594 with SE alone versus 0.567 with SE and RiverATLAS combined). Embedding-driven gains weaken where streamflow is governed by processes not directly visible from surface imagery, particularly complex reservoir operations. Nevertheless, SE can still provide useful information when forcing-based corrections are limited.
These results demonstrate that SE provide analysis-ready, information-rich representations of land surface heterogeneity that measurably strengthen streamflow reconstruction across river networks. DISE offers a scalable pathway to inject high-resolution Earth observation context into river-network modeling, improving predictions in basins where conventional forcings and hydro-environmental descriptors are often insufficient.
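For readers unfamiliar with the two skill scores reported above, the following is a minimal sketch of how KGE and NSE are commonly computed for one station. It assumes the original Kling-Gupta formulation (correlation r, variability ratio alpha, bias ratio beta); the abstract does not state which KGE variant the study uses, and the function names and toy discharge values are illustrative only.

```python
import math
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = fmean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var


def kge_components(obs, sim):
    """Return (r, alpha, beta): correlation, variability ratio, bias ratio."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    sd_obs, sd_sim = pstdev(obs), pstdev(sim)
    cov = fmean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (sd_obs * sd_sim)
    return r, sd_sim / sd_obs, mean_sim / mean_obs


def kge(obs, sim):
    """Kling-Gupta efficiency (original formulation, assumed here)."""
    r, alpha, beta = kge_components(obs, sim)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy example (hypothetical values): a simulation with perfect timing
# that underestimates discharge by half.
obs = [10.0, 20.0, 30.0, 40.0, 50.0]
sim_half = [0.5 * q for q in obs]
r, alpha, beta = kge_components(obs, sim_half)
```

In this toy case r stays at 1 while alpha and beta both fall to 0.5, so KGE is about 0.29 and NSE is -0.375. This illustrates how a correction that mainly fixes magnitude and bias can raise KGE and NSE substantially while leaving correlation nearly unchanged.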