Ensembling Differentiable Process-based and Data-driven Models with Diverse Meteorological Forcing Datasets to Advance Streamflow Simulation

Li, Peijun; Song, Yalan; Pan, Ming; Lawson, Kathryn; Shen, Chaopeng

doi:10.5194/egusphere-2025-483

Preprints

https://doi.org/10.5194/egusphere-2025-483

10 Mar 2025

Abstract. Streamflow simulations from different hydrological models have different characteristics and can provide valuable information when ensembled. However, few studies have focused on ensembling simulations from models with significant structural differences and evaluating them under both temporal and spatial tests. Here we systematically evaluated and combined the simulations from two highly different, high-performing models: a purely data-driven long short-term memory (LSTM) network and a physics-informed machine learning (“differentiable”) HBV (Hydrologiska Byråns Vattenavdelning) model (δHBV). To effectively display the features of the two models, multiple forcing datasets were employed in two ways. The results show that the simulations of LSTM and δHBV have distinct features and complement each other well, leading to better Nash-Sutcliffe model efficiency coefficients (NSE) and improved high-flow and low-flow metrics across all spatiotemporal tests, compared to within-class ensembles. Ensembling models that were each trained on a single forcing outperformed a single model using fused forcings, challenging the paradigm of feeding all available data into a single data-driven model. Most notably, incorporating δHBV into the LSTM ensemble significantly enhanced spatial interpolation and brought even more prominent benefits for spatial extrapolation, where the LSTM-only ensembles degraded significantly, attesting to the value of the structural constraints in δHBV. These advances set new benchmark records on the well-known CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) hydrological dataset, reaching median NSE values of ~0.83 for the temporal test (densely trained scenario), ~0.79 for the ungauged basin test (PUB, Prediction in Ungauged Basins), and ~0.70 for the ungauged region test (PUR, Prediction in Ungauged Regions).
This study advances our understanding of how various model types, each with distinct mechanisms, can be effectively leveraged alongside multi-source datasets across diverse scenarios.
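
For readers unfamiliar with the metric, the following is a minimal sketch of how an NSE score and a simple ensemble could be computed. It assumes an equal-weight arithmetic mean across members, which the abstract does not confirm as the authors' ensembling method. The function names (`nse`, `ensemble_mean`) and the toy arrays are hypothetical illustrations, not data or code from the study.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Drop time steps where either series is missing.
    valid = ~np.isnan(sim) & ~np.isnan(obs)
    sim, obs = sim[valid], obs[valid]
    denom = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - np.sum((sim - obs) ** 2) / denom


def ensemble_mean(members):
    """Equal-weight average of member simulations (ASSUMPTION: not necessarily the paper's scheme)."""
    return np.mean(np.stack(members, axis=0), axis=0)


# Toy, hand-made series (NOT CAMELS data): two members with opposite errors.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
err = np.array([0.5, -0.5, 0.5, -0.5, 0.5])
member_a = obs + err  # stands in for one model's simulation
member_b = obs - err  # stands in for a structurally different model

ensemble = ensemble_mean([member_a, member_b])

print("NSE member A:", nse(member_a, obs))
print("NSE member B:", nse(member_b, obs))
print("NSE ensemble:", nse(ensemble, obs))
```

In this contrived case each member scores an NSE of 0.875, while their mean reproduces the observations exactly because the errors cancel. Real model errors are only partly uncorrelated, so any gain from ensembling would be smaller; a benchmark-style summary would then take the median of per-basin NSE values, for example with `np.median`.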

Received: 03 Feb 2025 – Discussion started: 10 Mar 2025

Competing interests: Chaopeng Shen and Kathryn Lawson have financial interests in HydroSapient, Inc., a company that could potentially benefit from the results of this research. This interest has been reviewed by the Pennsylvania State University in accordance with its individual conflict of interest policy for the purpose of maintaining the objectivity and the integrity of research. The other authors have no competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 5660 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

01 Dec 2025

Ensembling differentiable process-based and data-driven models with diverse meteorological forcing datasets to advance streamflow simulation

Peijun Li, Yalan Song, Ming Pan, Kathryn Lawson, and Chaopeng Shen

Hydrol. Earth Syst. Sci., 29, 6829–6861, https://doi.org/10.5194/hess-29-6829-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-483', Anonymous Referee #1, 05 May 2025

Please find my comments in the attachment.

Citation: https://doi.org/10.5194/egusphere-2025-483-RC1
- AC1: 'Reply on RC1', Peijun Li, 09 Jun 2025
  
  Thank you for your constructive comments. Please see the attached file for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-483-AC1
RC2: 'Comment on egusphere-2025-483', Aggrey Muhebwa, 20 Jun 2025
The manuscript presents an innovative ensemble strategy that combines a differentiable process-based model (δHBV) with a data-driven Long Short-Term Memory (LSTM) model, further diversified through the application of multiple meteorological forcing datasets. The approach is evaluated across a wide range of generalization scenarios (temporal extrapolation, PUB, and PUR) using the CAMELS dataset. Although the paper is well-written and the main ideas are clearly communicated, it would benefit from additional details in the methods and a deeper discussion of model complementarity and limitations.
Strengths
The proposed ensemble framework is conceptually strong and offers a well-justified combination of complementary data and algorithmic modeling paradigms.

The study is well evaluated across well-defined training protocols and temporal-spatial splits, which improves confidence in its generalizability.

The use of multiple data sources for meteorological forcings addresses input uncertainty better than traditional single-source modeling.

The results, specifically the finding that δHBV improves spatial generalization, have clear implications for prediction in ungauged regions.

Weaknesses
Interpretability: While the δHBV model’s performance is shown to be beneficial in spatial generalization, the underlying reasons for this complementarity (e.g., structural constraints, parameter smoothness) are not deeply explored. A discussion of how each model contributes to ensemble diversity would strengthen the scientific value of the work.

Robustness and Sensitivity Analysis: The paper lacks an explicit assessment of how ensemble performance responds to errors or biases in the forcing datasets or to uncertainty in model parameters. Including even a limited robustness analysis would improve confidence in the ensemble’s reliability. Additionally, the authors should consider running one or two experiments to understand whether changing the size of the lookback window (i.e., the number of historical timesteps) for the LSTMs affects the overall performance of the ensemble.

Scalability and Practical Deployment: The manuscript does not address the computational or operational feasibility of deploying this ensemble framework in practice, especially over large domains or in real-time forecasting contexts. A short discussion (1-2 sentences) on this topic would add practical relevance.
Citation: https://doi.org/10.5194/egusphere-2025-483-RC2
- AC2: 'Reply on RC2', Peijun Li, 18 Jul 2025
  
  Thank you for your constructive comments. Please see the attached file for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-483-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (13 Aug 2025) by Thomas Kjeldsen

AR by Peijun Li on behalf of the Authors (19 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (31 Aug 2025) by Thomas Kjeldsen

RR by Anonymous Referee #1 (03 Sep 2025)

ED: Publish subject to minor revisions (review by editor) (09 Oct 2025) by Thomas Kjeldsen

AR by Peijun Li on behalf of the Authors (16 Oct 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (24 Oct 2025) by Thomas Kjeldsen

AR by Peijun Li on behalf of the Authors (28 Oct 2025) Manuscript

Viewed

Total article views: 1,177 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
996	150	31	1,177	32	46

Views and downloads (calculated since 10 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	102	19	2	123
Apr 2025	63	16	3	82
May 2025	59	15	1	75
Jun 2025	68	17	7	92
Jul 2025	66	14	4	84
Aug 2025	123	21	1	145
Sep 2025	434	19	8	461
Oct 2025	44	15	2	61
Nov 2025	37	14	3	54

Viewed (geographical distribution)

Total article views: 1,205 (including HTML, PDF, and XML) Thereof 1,205 with geography defined and 0 with unknown origin.

Latest update: 01 Dec 2025

Short summary

This study explores how combining different model types improves streamflow predictions, especially in data-sparse scenarios. By integrating two highly accurate models with distinct mechanisms and leveraging multiple meteorological datasets, we highlight their unique strengths and set new accuracy benchmarks across spatiotemporal conditions. Our findings enhance the understanding of how diverse models and multi-source data can be effectively used to improve hydrological predictions.

