the Creative Commons Attribution 4.0 License.
Ensembling Differentiable Process-based and Data-driven Models with Diverse Meteorological Forcing Datasets to Advance Streamflow Simulation
Abstract. Streamflow simulations from different hydrological models have different features and can provide valuable information when ensembled. However, few studies have ensembled simulations from models with significant structural differences and evaluated them under both temporal and spatial tests. Here we systematically evaluated and utilized the simulations from two highly different, high-performing models: a purely data-driven long short-term memory (LSTM) network and a physics-informed machine learning (“differentiable”) HBV (Hydrologiska Byråns Vattenavdelning) model (δHBV). To effectively display the features of the two models, multiple forcing datasets were employed in two ways. The results show that the simulations of LSTM and δHBV have distinct features and complement each other well, leading to better Nash-Sutcliffe model efficiency coefficients (NSE) and improved high-flow and low-flow metrics across all spatiotemporal tests, compared to within-class ensembles. Ensembling models that were each trained on a single forcing outperformed a single model using fused forcings, challenging the paradigm of feeding all available data into a single data-driven model. Most notably, incorporating δHBV into LSTM ensembles significantly enhanced spatial interpolation and brought even more prominent benefits for spatial extrapolation, where the LSTM-only ensembles degraded significantly, attesting to the value of the structural constraints in δHBV. These advances set new benchmark records on the well-known CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) hydrological dataset, reaching median NSE values of ~0.83 for the temporal test (densely trained scenario), ~0.79 for the ungauged basin test (PUB, Prediction in Ungauged Basins), and ~0.70 for the ungauged region test (PUR, Prediction in Ungauged Regions).
This study advances our understanding of how various model types, each with distinct mechanisms, can be effectively leveraged alongside multi-source datasets across diverse scenarios.
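For readers unfamiliar with the headline metric, the following is a minimal sketch of how NSE is computed and of why averaging two simulations with opposite errors can score higher than either one alone. The equal-weight averaging, the function names, and the toy discharge values are illustrative assumptions; the abstract does not state the ensembling scheme the authors used, and none of the numbers below come from the paper.

```python
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - sum((sim-obs)^2) / sum((obs-mean(obs))^2)."""
    if len(sim) != len(obs):
        raise ValueError("sim and obs must have the same length")
    obs_mean = fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    if sst == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - sse / sst


def ensemble_mean(*members):
    """Equal-weight average of member simulations at each time step (an assumption)."""
    return [fmean(values) for values in zip(*members)]


# Toy, made-up series: member A overestimates and member B underestimates.
obs = [1.0, 2.0, 3.0, 4.0]
sim_a = [1.5, 2.5, 3.5, 4.5]  # hypothetical output of one model
sim_b = [0.5, 1.5, 2.5, 3.5]  # hypothetical output of a second model

nse_a = nse(sim_a, obs)
nse_b = nse(sim_b, obs)
nse_ens = nse(ensemble_mean(sim_a, sim_b), obs)
```

In this contrived case each member scores an NSE of 0.8 while the average reproduces the observations exactly, because the two errors cancel. Real members are only partly complementary, so gains from ensembling would be smaller.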
Competing interests: Chaopeng Shen and Kathryn Lawson have financial interests in HydroSapient, Inc., a company that could potentially benefit from the results of this research. This interest has been reviewed by the Pennsylvania State University in accordance with its individual conflict of interest policy for the purpose of maintaining the objectivity and the integrity of research. The other authors have no competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: closed
- RC1: 'Comment on egusphere-2025-483', Anonymous Referee #1, 05 May 2025
- AC1: 'Reply on RC1', Peijun Li, 09 Jun 2025
- RC2: 'Comment on egusphere-2025-483', Aggrey Muhebwa, 20 Jun 2025
The manuscript presents an innovative ensemble strategy that combines a differentiable process-based model (δHBV) with a data-driven Long Short-Term Memory (LSTM) model, further diversified through the application of multiple meteorological forcing datasets. The approach is evaluated across a wide range of generalization scenarios (temporal extrapolation, PUB, and PUR) using the CAMELS dataset. Although the paper is well-written and the main ideas are clearly communicated, it would benefit from additional details in the methods and a deeper discussion of model complementarity and limitations.
Strengths
- The proposed ensemble framework is conceptually strong and offers a well-justified combination of complementary data and algorithmic modeling paradigms.
- The study is well evaluated across well-defined training protocols and temporal-spatial splits, which improves confidence in its generalizability.
- The use of multiple data sources for meteorological forcings addresses input uncertainty better than traditional single-source modeling.
- The results, specifically the finding that δHBV improves spatial generalization, have clear implications for prediction in ungauged regions.
Weaknesses
- Interpretability: While the δHBV model’s performance is shown to be beneficial in spatial generalization, the underlying reasons for this complementarity (e.g., structural constraints, parameter smoothness) are not deeply explored. A discussion of how each model contributes to ensemble diversity would strengthen the scientific value of the work.
- Robustness and Sensitivity Analysis: The paper lacks an explicit assessment of how ensemble performance responds to errors or biases in the forcing datasets or uncertainty in model parameters. Including even a limited robustness analysis would improve confidence in the ensemble’s reliability. Additionally, the authors should consider running one or two experiments to understand whether changing the size of the lookback window (i.e., the number of historical timesteps) for the LSTMs impacted the overall performance of the ensemble.
- Scalability and Practical Deployment: The manuscript does not address the computational or operational feasibility of deploying this ensemble framework in practice, especially over large domains or in real-time forecasting contexts. A short discussion (1-2 sentences) on this topic would add practical relevance.
Citation: https://doi.org/10.5194/egusphere-2025-483-RC2
- AC2: 'Reply on RC2', Peijun Li, 18 Jul 2025
Please find my comments in the attachment.