https://doi.org/10.5194/egusphere-2025-483
10 Mar 2025

Ensembling Differentiable Process-based and Data-driven Models with Diverse Meteorological Forcing Datasets to Advance Streamflow Simulation

Peijun Li, Yalan Song, Ming Pan, Kathryn Lawson, and Chaopeng Shen

Abstract. Streamflow simulations from different hydrological models have different characteristics and can provide valuable information when ensembled. However, few studies have ensembled simulations from models with significant structural differences and evaluated them under both temporal and spatial tests. Here we systematically evaluated and combined the simulations of two structurally distinct, high-performing models: a purely data-driven long short-term memory (LSTM) network and a physics-informed machine learning ("differentiable") HBV (Hydrologiska Byråns Vattenavdelning) model (δHBV). To bring out the features of the two models, we employed multiple meteorological forcing datasets and used them in two ways. The results show that the LSTM and δHBV simulations have distinct features and complement each other well: compared with within-class ensembles, combining them yields better Nash-Sutcliffe model efficiency coefficients (NSE) and improved high-flow and low-flow metrics across all spatiotemporal tests. Ensembles of models, each trained on a single forcing dataset, outperformed a single model trained on fused forcings, challenging the paradigm of feeding all available data into a single data-driven model. Most notably, incorporating δHBV into the LSTM ensemble significantly improved spatial interpolation, with even larger benefits for spatial extrapolation, where LSTM-only ensembles degraded significantly; this attests to the value of the structural constraints in δHBV. These advances set new benchmark records on the well-known CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) hydrological dataset, reaching median NSE values of ~0.83 for the temporal test (densely trained scenario), ~0.79 for the ungauged basin test (PUB, Prediction in Ungauged Basins), and ~0.70 for the ungauged region test (PUR, Prediction in Ungauged Regions).
This study advances our understanding of how various model types, each with distinct mechanisms, can be effectively leveraged alongside multi-source datasets across diverse scenarios.
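The abstract reports skill as the Nash-Sutcliffe efficiency (NSE) and describes combining LSTM and δHBV simulations. The sketch below shows, under stated assumptions, how NSE is computed for one basin and why averaging two models whose errors partly offset can raise it. The abstract does not say how ensemble members are weighted, so the equal-weight mean, the function names `nse` and `mean_ensemble`, and the toy series are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def nse(sim, obs) -> float:
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (variance of obs about its mean)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Skip time steps with missing observations (NaN), common in gauge records.
    mask = ~np.isnan(obs)
    sim, obs = sim[mask], obs[mask]
    sse = np.sum((sim - obs) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)


def mean_ensemble(*members) -> np.ndarray:
    """Hypothetical equal-weight ensemble: average the members at each time step."""
    return np.mean(np.stack([np.asarray(m, dtype=float) for m in members], axis=0), axis=0)


# Toy daily streamflow for one basin (made-up numbers, for illustration only).
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lstm = np.array([1.2, 1.8, 3.3, 4.1, 4.6])   # hypothetical LSTM output
dhbv = np.array([0.8, 2.3, 2.8, 3.8, 5.3])   # hypothetical δHBV output
ens = mean_ensemble(lstm, dhbv)

print(round(nse(lstm, obs), 3), round(nse(dhbv, obs), 3), round(nse(ens, obs), 3))
```

With these toy series the two members score about 0.966 and 0.970, while their mean scores about 0.999, because the members' errors have opposite signs on most days. Across many basins, a study would typically report the median of the per-basin NSE values, as the abstract does.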

Competing interests: Chaopeng Shen and Kathryn Lawson have financial interests in HydroSapient, Inc., a company that could potentially benefit from the results of this research. This interest has been reviewed by the Pennsylvania State University in accordance with its individual conflict of interest policy for the purpose of maintaining the objectivity and the integrity of research. The other authors have no competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

  • RC1: 'Comment on egusphere-2025-483', Anonymous Referee #1, 05 May 2025
    • AC1: 'Reply on RC1', Peijun Li, 09 Jun 2025
  • RC2: 'Comment on egusphere-2025-483', Aggrey Muhebwa, 20 Jun 2025
    • AC2: 'Reply on RC2', Peijun Li, 18 Jul 2025

Viewed

Total article views: 718 (including HTML, PDF, and XML)
  • HTML: 590
  • PDF: 110
  • XML: 18
  • Total: 718
  • BibTeX: 17
  • EndNote: 31
Views and downloads, including cumulative totals, are calculated since 10 Mar 2025.

Viewed (geographical distribution)

Total article views: 762 (including HTML, PDF, and XML). Of these, 762 have a defined geographic origin and 0 are of unknown origin.
Latest update: 13 Sep 2025
Short summary
This study explores how combining different model types improves streamflow predictions, especially in data-sparse scenarios. By integrating two highly accurate models with distinct mechanisms and leveraging multiple meteorological datasets, we highlight their unique strengths and set new accuracy benchmarks across spatiotemporal conditions. Our findings enhance the understanding of how diverse models and multi-source data can be effectively used to improve hydrological predictions.