Hydrological Auditing of LISFLOOD v4.1.1: Impacts of Model Setup on Water Balance Components in the Po River Basin

Moschini, Francesca; Ficchì, Andrea; Pistocchi, Alberto

doi:10.5194/egusphere-2026-423

Preprints

12 Feb 2026

Abstract. In recent years, large-scale hydrological models have been increasingly used at regional and global scales to support decision making. Their realism in simulating water balance components is crucial for building trust across different use cases. Hydrological models may reproduce streamflow well but misrepresent other fluxes, because of compensation among internal fluxes and equifinality. Therefore, alternative setups can benefit specific applications by improving the representation of relevant water balance components. "Hydrological auditing" of models, i.e. a thorough critical review of their realism beyond the calibration targets (usually streamflow), provides useful insights for both practical applications and process understanding. We present one such exercise in a representative European case study using a physically-based hydrological model (LISFLOOD), widely used for flood forecasting and water resources management. We evaluate LISFLOOD v4.1.1's performance in simulating streamflow, evapotranspiration, and overall water balance in the Po River Basin, a complex and highly managed basin in Northern Italy. Six alternative model setups are tested, including different soil layer depths and preferential flow representations. Results show that the model setup currently used in the European Flood Awareness System (EFAS) v.5 performs best in terms of streamflow simulation, particularly at the daily time step, but tends to underestimate evapotranspiration. In turn, this may lead to an overestimation of groundwater recharge and a poor water balance representation. The use of the Budyko framework as a diagnostic tool reveals that model setups without preferential flow better match the expected long-term water balance, but reduce daily streamflow performance. The study highlights the importance of evaluating model performance and auditing alternative parametrizations to ensure accurate simulations of water balance components, which are crucial for water resources management.
We propose criteria to improve the calibration of the LISFLOOD model in a flexible and target-driven way, to better support water resources management in complex river basins.
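
As an illustration of the Budyko diagnostic mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the original Budyko (1974) curve and measures the deviation as the vertical gap between the simulated evaporative index (AET/P) and the curve; the function names and the example values are hypothetical, and the manuscript may define its Budyko distance differently.

```python
import math


def budyko_evaporative_index(aridity_index: float) -> float:
    """Expected long-term AET/P from the Budyko (1974) curve, given PET/P."""
    if aridity_index <= 0:
        raise ValueError("aridity index (PET/P) must be positive")
    phi = aridity_index
    return math.sqrt(phi * math.tanh(1.0 / phi) * (1.0 - math.exp(-phi)))


def budyko_deviation(precip: float, pet: float, aet: float) -> float:
    """Simulated AET/P minus the Budyko expectation (assumed metric).

    Negative values mean the model evaporates less than the curve suggests.
    All inputs are long-term means in the same units (e.g. mm/yr).
    """
    if precip <= 0:
        raise ValueError("precipitation must be positive")
    return aet / precip - budyko_evaporative_index(pet / precip)


if __name__ == "__main__":
    # Hypothetical long-term means for one sub-catchment, in mm/yr.
    deviation = budyko_deviation(precip=1000.0, pet=800.0, aet=450.0)
    print(f"deviation from Budyko curve: {deviation:+.3f}")
```

For these illustrative numbers the deviation is negative, which is the sign one would expect for a setup that underestimates evapotranspiration relative to the long-term water balance.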

Received: 26 Jan 2026 – Discussion started: 12 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-423', Anneli Guthke, 31 Mar 2026
Summary:
This study assesses a ubiquitous but often overlooked problem: hydrological models are built for a specific purpose (e.g., flood forecasting), and at some point “misused” for other tasks (e.g., drought prediction, water resources management), often without specific re-training and re-evaluation. Since any model is just a coarse abstraction of reality and suffers from model structural errors, compensation for model error happens within the allowed parameter ranges, and unphysical behavior can emerge across compartments, processes and variables. So if trained for streamflow only, a hydrological model might perform poorly on other components of the water balance, and this is the target of the presented analysis in this manuscript. The authors investigate different model setups of a specific distributed model, LISFLOOD, on the Po River Basin, with respect to streamflow prediction performance, but also through diagnostic evaluation of other fluxes.
Overall evaluation:
The authors reveal interesting contradictions between performance, parameter estimation and water balance closure when training different versions of LISFLOOD. The manuscript is very well structured and a pleasure to read. While the conclusions of the study are supported by the findings, unfortunately, the manuscript left me somewhat “uninspired” – I had hoped for more insights. Yet, the findings are worth reporting and the analysis itself is nicely done, so my recommendation is to still consider this manuscript for publication, albeit not the most forward-directed paper.
Specific comments:
Abstract: “Their realism in simulating water balance components is crucial for building trust across different use cases.” Thank you for this statement, this is so true but often overlooked. Glad to see this issue addressed explicitly in this study.

l. 4/5: “Therefore, alternative setups can benefit specific applications by improving the representation of relevant water balance components.” I know what you mean, but I feel this sentence is too dense. Please invest one or two sentences more to explain alternative setups and how they could eventually lead to improved representation of water balance.

l. 17: “… in a flexible and target-driven way,…” This might almost be a philosophical question, but doesn’t that approach contradict your motivation to “build trust across different use cases”? I’ve long been debating with myself and others whether models should be built and tested goal-oriented (this was the word I used in the past, see Guthke (2017)) or open-purpose. Intuitively, a model should do the right things for the right reasons and combine internal realism with best performance, but empirically, we observe that this is not the case, and hence the (also justified) advice to optimize and evaluate the model on those aspects it will be applied for. In that sense, is there any problem with a flood forecasting system that predicts stream discharge really well, but struggles with evapotranspiration? Who cares (to play devil’s advocate here)? I’m curious to read more about the authors’ perspective on this dilemma in the manuscript. – Ok, from reading the introduction I understand that LISFLOOD is a specific case where a model is used beyond its intended purpose; this piece of information would be helpful to integrate into the abstract. It would tell the story that, at the latest when a model is asked to perform tasks other than those it was trained on (especially water resources management), it needs to be reevaluated and perhaps retrained with a focus on a realistic water balance.

l. 74: It would be great to round off the introduction (and the manuscript) with a claim to develop a diagnostic methodology that applies to distributed models in general, if possible; in the current form, the analysis relies heavily on the architecture of LISFLOOD. While such a demo case is certainly interesting, the impact of the study could be increased if the authors invested some effort into more general considerations.

l. 379: “All the experiments including the ByPass do not show any significant difference in the frequency distribution of the calibrated parameters as identified by the Kolmogorov-Smirnov test…” I find this somewhat surprising, because, yes, the distribution seems more stable than without ByPass, but even with ByPass, I can visually identify distinct pattern shifts. For example, the benchmark model seems to favor different values from the alternative models with ByPass for GWPercValue, LZThreshold, bInfilt, and SnowMeltCoef.

Discussion & conclusions: while the conclusions are supported by the findings of the study and the proposed ways forward (constraining certain parameters to shield against “abuse” during parameter calibration) are interesting, the study left me a bit “uninspired” – I would have hoped for a more comprehensive comparative analysis including more diverse model representations, and more insights into how to unify these apparently contradicting model uses.

(You might have noticed that usually I provide many more comments – this is a sign that the manuscript is very nicely written, structured in a clear manner and the analysis is well documented. Congrats!)
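
To make the Kolmogorov-Smirnov comparison raised in the comment on l. 379 concrete, here is a minimal sketch of a two-sample test on calibrated parameter values. It is not taken from the manuscript: the parameter samples are hypothetical, and the decision rule uses the standard large-sample critical value for a 5 % significance level (coefficient 1.358), which is only an approximation for small samples.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Two-sample KS statistic D: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    # Hypothetical calibrated values of one parameter across sub-catchments.
    benchmark = [0.21, 0.35, 0.40, 0.44, 0.52, 0.61, 0.70, 0.74]
    alternative = [0.18, 0.22, 0.25, 0.31, 0.38, 0.41, 0.47, 0.55]
    d = ks_two_sample(benchmark, alternative)
    threshold = ks_critical_value(len(benchmark), len(alternative))
    print(f"D = {d:.3f}, critical value = {threshold:.3f}")
    print("distributions differ" if d > threshold else "no significant difference")
```

With few calibrated catchments the critical value is large, so visually distinct shifts can still fall below the significance threshold; this may be one reason a visual impression and the test outcome disagree.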

Technical comments:
“Auditing” – this term is only used in the title and abstract. Make consistent use of it also in the body of the manuscript, or consider replacing it.

l. 116: Please define GPD.

Eqs. 3 and 4: I guess the left-hand side of Eq. 3 should be named “Infiltr” to be consistent with Eq. 4.

l. 312: Remove one instance of “does not simulate”

References:
Guthke, A. (2017). Defensible Model Complexity: A Call for Data-Based and Goal-Oriented Model Choice. Ground Water, 55(5).
Citation: https://doi.org/10.5194/egusphere-2026-423-RC1
- AC1: 'Reply on RC1', Francesca Moschini, 21 May 2026
  
  We thank the Editor and Dr. Anneli Guthke for their positive assessment of our work and for the constructive comments, which will help us improve the manuscript. In the attached document we provide a point-by-point response to the reviewer comments. Reviewer comments are reported in black, while our responses are shown in blue.
  
  We hope that the revised manuscript and responses satisfactorily address all comments and that the manuscript can now be considered for publication in GMD.
  
  Yours sincerely,
  
  Francesca Moschini, on behalf of all coauthors
  
  Citation: https://doi.org/10.5194/egusphere-2026-423-AC1
RC2:
'Comment on egusphere-2026-423', Anonymous Referee #2, 19 Apr 2026

The authors evaluated LISFLOOD's performance in simulating streamflow, evapotranspiration, and overall water balance in the Po River Basin using six model setups. With the use of the Budyko framework, the authors demonstrated that the current model setup in EFAS performs best in streamflow simulation, but tends to underestimate ET and have a relatively poor water balance representation compared to other setups. The findings in this paper are crucial for future LISFLOOD configuration for different purposes and introduce an interesting and effective Budyko-based diagnostic framework. I recommend a minor revision with the following comments.

Major comments:

1 In the authors' different model setups, could you clarify why the maximum soil depth is set to be 3m? Can any data support this?

2 The authors only used the KGE of streamflow as the objective function. I would suggest the authors use ET or Budyko as an additional constraint to see whether it can help make the prediction accurate for streamflow, ET, and water balance closure alike.

3 The authors evaluated the Budyko distance. The deviation from the Budyko equation is attributed to the model configuration. Is there any missed representation of the model in terms of anthropogenic activities, urbanization, or snow processes that can cause this deviation?

4 The authors compared the model-simulated Budyko relationship with the theoretical Budyko curve. Since PET and AET datasets are available, could the authors also compare against the observed Budyko relationship?
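
The suggestion in major comment 2 can be sketched as follows. This is an illustrative example and not the calibration used in the manuscript: it assumes the Kling-Gupta efficiency in its original form (Gupta et al., 2009) and combines streamflow and ET scores with a hypothetical weight `et_weight`; the function `combined_objective` and the example series are assumptions made for illustration only.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs need equal length of at least 2")
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    if sd_s == 0 or sd_o == 0 or mean_o == 0:
        raise ValueError("KGE is undefined for constant series or zero-mean obs")
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)      # linear correlation
    alpha = sd_s / sd_o          # variability ratio
    beta = mean_s / mean_o       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def combined_objective(q_sim, q_obs, et_sim, et_obs, et_weight=0.5):
    """Hypothetical weighted sum of the streamflow KGE and the ET KGE."""
    if not 0.0 <= et_weight <= 1.0:
        raise ValueError("et_weight must lie in [0, 1]")
    return (1.0 - et_weight) * kge(q_sim, q_obs) + et_weight * kge(et_sim, et_obs)


if __name__ == "__main__":
    # Hypothetical short series; real use would pass daily or monthly data.
    q_obs = [120.0, 300.0, 180.0, 90.0, 60.0]
    q_sim = [110.0, 320.0, 170.0, 95.0, 70.0]
    et_obs = [1.0, 2.5, 4.0, 3.0, 1.5]
    et_sim = [0.7, 1.8, 3.0, 2.2, 1.0]
    print(f"streamflow KGE: {kge(q_sim, q_obs):.3f}")
    print(f"combined score: {combined_objective(q_sim, q_obs, et_sim, et_obs):.3f}")
```

Setting `et_weight` to 0 recovers a streamflow-only objective, so the same function can be used to explore how much weight on ET is needed before the water balance improves.
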
Minor comments:

1 Does GLOFAS have the same model setup as EFAS? If not, which setup in the six evaluated models is the one used by GLOFAS?

2 Both left and right-hand sides of Eq. 2 have AWI; at least one of them is a typo.

3 "Xinanjiang" is misspelled in a couple of places (e.g., as "Xinjang"). Please correct these.

4 L420: "the energy limit (green line)": The energy limit is the black line. Please correct this.

5 L313 has a typo: "does not simulate does not simulate."

Citation: https://doi.org/10.5194/egusphere-2026-423-RC2
- AC2: 'Reply on RC2', Francesca Moschini, 21 May 2026
  
  We thank the Editor and the reviewer for their assessment of our work and for the constructive comments, which will help us improve the manuscript. In the attached document we provide a point-by-point response to the reviewer comments. Reviewer comments are reported in black, while our responses are shown in blue.
  
  We hope that the revised manuscript and responses satisfactorily address all comments and that the manuscript can now be considered for publication in GMD.
  
  Yours sincerely,
  
  Francesca Moschini, on behalf of all coauthors
  
  Citation: https://doi.org/10.5194/egusphere-2026-423-AC2

Viewed

Total article views: 1,670 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,002	604	64	1,670	166	269

Views and downloads (calculated since 12 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	418	188	39	645
Mar 2026	392	254	15	661
Apr 2026	111	110	5	226
May 2026	81	52	5	138
Jun 2026	0

Latest update: 02 Jun 2026

Short summary

We evaluated how different configurations of a large-scale river basin model affect simulations of streamflow, evaporation, soil moisture, and groundwater in the Po River Basin in Italy. We tested alternative soil depths and the inclusion or removal of subsurface flow pathways, and compared results with observations and with an established long-term water balance relationship. Setups that best matched river flow often underestimated evaporation and overestimated deep groundwater recharge.

