Simple Box-Cox probabilistic models for hourly streamflow predictions

Prieto, Cristina; Kavetski, Dmitri; Fenicia, Fabrizio; Kirchner, James; McInerney, David; Thyer, Mark; Álvarez, César

doi:10.5194/egusphere-2026-483

04 Feb 2026

Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Abstract. The increasing availability of hourly-scale hydrological data offers valuable benefits for advancing our scientific understanding of catchment processes and improving operational forecasting capabilities. This work contributes to streamflow predictions at the hourly scale by investigating practical methods for uncertainty quantification using probabilistic predictions. We examine common approaches for representing the heteroscedasticity of streamflow errors using the Box-Cox (BC) transformation and common approaches for representing the persistence of streamflow errors using auto-regressive (AR) models. Case studies based on seven catchments from Spain, Switzerland and the USA that cover humid to semi-arid conditions are reported. The results favor Box-Cox transformations with power parameter values of 0–0.5. Notably, the log transformation achieves the best statistical reliability of predictions, while its precision and volumetric bias are not statistically significantly worse than for the BC02 and BC05 transformations, respectively. The results also tend to favor the AR2 and AR3 models over the AR1 model in representing persistence of errors, with the addition of moving average terms providing little additional benefit. The study findings are broadly consistent with earlier work with daily data, and provide practical guidance for hourly-scale studies in predictive uncertainty quantification that is accessible to a wide range of hydrologists. We also report progress towards "seamless" aggregation from hourly to longer scales, which is a capability that is desirable in many practical operational contexts.
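
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes residuals are defined as the difference between Box-Cox-transformed observed and simulated flows, and it fits AR(p) coefficients by ordinary least squares. The function names, the `offset` argument and the least-squares fitting choice are all hypothetical and are not taken from the preprint. A power parameter `lam` of 0 gives the log transformation; values of 0.2 and 0.5 presumably correspond to the BC02 and BC05 cases.

```python
import numpy as np

def box_cox(q, lam, offset=0.0):
    """Box-Cox transform of flow q; lam = 0 is the log transformation.
    `offset` is a hypothetical additive constant, useful when q can be zero."""
    q = np.asarray(q, dtype=float) + offset
    if lam == 0.0:
        return np.log(q)
    return (q**lam - 1.0) / lam

def inv_box_cox(z, lam, offset=0.0):
    """Inverse of box_cox. For lam > 0 the argument is clipped at zero so that
    sampled values cannot produce NaN (an assumption of this sketch)."""
    z = np.asarray(z, dtype=float)
    if lam == 0.0:
        return np.exp(z) - offset
    return np.power(np.maximum(lam * z + 1.0, 0.0), 1.0 / lam) - offset

def transformed_residuals(q_obs, q_sim, lam, offset=0.0):
    """Residuals in transformed space (assumed definition: observed minus simulated)."""
    return box_cox(q_obs, lam, offset) - box_cox(q_sim, lam, offset)

def fit_ar(eta, p):
    """Least-squares AR(p) coefficients and innovations for residuals eta."""
    eta = np.asarray(eta, dtype=float)
    n = len(eta)
    # Column k holds the lag-(k+1) values aligned with the targets eta[p:].
    X = np.column_stack([eta[p - k - 1 : n - k - 1] for k in range(p)])
    y = eta[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    innovations = y - X @ phi
    return phi, innovations
```

With hourly arrays `q_obs` and `q_sim` in hand, `fit_ar(transformed_residuals(q_obs, q_sim, lam=0.2), p=2)` would return AR2 coefficients and the corresponding innovations, whose partial autocorrelation could then be inspected for any remaining persistence.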

Received: 27 Jan 2026 – Discussion started: 04 Feb 2026

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 11 Apr 2026)

RC1: 'Comment on egusphere-2026-483', Anonymous Referee #1, 09 Mar 2026

The paper “Simple Box-Cox probabilistic models for hourly streamflow predictions” by Prieto et al. aims at improving hourly streamflow prediction by identifying practical methods for quantifying predictive uncertainty, with a focus on how to represent heteroscedasticity and temporal persistence in streamflow errors.
I find the paper interesting and relevant for the readers of the journal, and I recommend its publication after minor revisions. A general comment is that the authors appear to assume familiarity with several concepts and methodological choices that would benefit from being briefly clarified within the manuscript.
For example, I suggest specifying whether the classification of the catchments (e.g., arid, semi-arid, etc.) is based on indicators such as the P/PET and Q/R ratios, in order to improve clarity for the reader.
Additionally, I recommend removing references to equation numbering from other papers, such as in the sentence: “The volumetric bias metric is also taken from Equation 34 of McInerney et al. (2017).” Instead, the equation should be explicitly reported in the manuscript, and the reference to the original numbering should be removed. This will make the paper more self-contained and easier to follow.
I kindly suggest reorganizing the manuscript so that all methodological aspects are presented within a single section, rather than being distributed across different parts of the paper. In particular, the content related to methodological procedures starting from Section 3.2 should be moved from the Case Study section to the Methodology section to improve clarity and readability.
Additionally, I recommend improving the visual quality and formatting of figures and tables. Figures should use color-blind-friendly palettes to ensure accessibility and clarity for all readers.
Line 278: the reference is missing.
Line 347: "Table 1 lists the best metric residual error models and the best-median residual". I think you meant "Table 2".
Lines 382-383: "The hydrographs in Fig. 2 show that the SLS transformation has tighter predictive limits for high flows (i.e. lower uncertainty) and wider predictive limits for low flows (i.e. higher uncertainty)." This is not clear from Figure 2, and the low flows appear to be better predicted than the high flows. Can you please elaborate on that?
Fig. 4 and following: please note that dots may overlap, so please use different markers so that the reader can see both, or explain where the points coincide.
Line 441: "if it is also is reliable": please remove the second "is".
Line 446 and following: "For volumetric bias, our Figure 1 is slightly different from Figure 2 in McInerney et al. [2017]. McInerney et al. [2017] show that BC0 to BC05 are better, whereas in our study none of the transformations are statistically significantly worse than the others. SLS has the best median performance and in McInerney et al. [2017], SLS has the worst median performance between Log, BC02, BC05 and SLS". Can you please elaborate on the reason behind this difference?

Table 3: the caption reports that the values are estimated across catchments. Can you please explain how catchments are represented in the tables? I also wonder whether the tables could be organized better: there are several of them, but only one caption is provided, and it is not fully explanatory.

Citation: https://doi.org/10.5194/egusphere-2026-483-RC1

RC2: 'Comment on egusphere-2026-483', Anonymous Referee #2, 12 Mar 2026

The manuscript explores practical approaches for uncertainty quantification within a probabilistic framework for streamflow prediction. It focuses on two widely used strategies, namely (i) representing the heteroscedasticity of streamflow errors through the Box–Cox (BC) transformation, and (ii) capturing the temporal persistence of errors using autoregressive (AR and ARMA) models. The study examines seven catchments located in Spain, Switzerland, and the United States, covering a gradient from humid to semi‑arid climates.
For the analysis, the authors employ a simple conceptual hydrological model that balances descriptive accuracy with model parsimony, and they implement it at an hourly time step. As the manuscript highlights, hourly predictions—and particularly their uncertainty quantification—have been relatively understudied, largely due to the scarcity of high‑quality hourly data and the challenges involved in characterizing the associated prediction errors.
Against this background, the study investigates how different sources of uncertainty influence model predictions, considering various error model configurations that combine different BC transformations with AR structures of different orders. The authors also evaluate how these uncertainties propagate across temporal scales, from hourly to daily and monthly predictions, in a “seamless” prediction framework. Finally, by analyzing catchments from diverse climatic settings, the manuscript aims to identify results that can be generalized across contrasting hydrological regimes.
The topic is highly relevant for the scientific community. The innovative contribution of the manuscript is clear. The paper is generally well structured and clearly written, presenting its results and conclusions in a coherent and accessible manner. While the findings are not groundbreaking, they are—as the authors themselves point out—highly relevant for the practical application of probabilistic prediction frameworks. The manuscript deserves to be published after minor revisions.
There is, indeed, one aspect that would benefit from a more detailed explanation: the seamless modeling framework. I encourage the authors to expand on how the transition from hourly to daily and monthly time scales is handled, as this additional information would greatly aid in interpreting the results. Some additional effort could be devoted to making the paper fully readable on its own, by providing a few more details rather than directing readers to previous publications. This is, however, a very minor point, and I leave it to the authors to decide whether such additions are necessary.
I also have a few minor corrections, listed below.
Line 255. How was the value of A established?
Line 347. Table 2, not 1.
Figure 1. Capital letters in the legends.
Figure 3. The caption ends with “Persistence model: PACF analysis of innovations”, which, I suppose, is the heading of the following subsection (4.3).
Figures 4-6. I suggest removing comments from the captions, which should only describe the figures.
Table 3. Are reliability and precision comparable across different data sets (time series at the hourly, daily and monthly scales), so that we can compare numerical values?

Citation: https://doi.org/10.5194/egusphere-2026-483-RC2

Supplement

https://doi.org/10.5194/egusphere-2026-483-supplement

Viewed

Total article views: 322 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
206	102	14	322	42	57	49

Views and downloads (calculated since 04 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	134	63	9	206
Mar 2026	72	39	5	116

Latest update: 18 Mar 2026

Short summary

Hourly streamflow data are increasingly available and can improve streamflow predictions. We tested simple ways to describe uncertainty by transforming flow values and by accounting for how errors persist from hour to hour, using seven catchments in Spain, Switzerland and the United States. Simple transformations and short-term error memory improve the reliability of probabilistic predictions and help combine hourly results into longer time scales for practical operational contexts.
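
The aggregation procedure used in the preprint is not described on this page. As a hedged illustration only, one common way to carry hourly probabilistic predictions to longer scales is to aggregate each predictive replicate separately and then compute predictive limits from the aggregated replicates. The sketch below assumes this approach; the function names, the 24-hour block length and the 5–95 % limits are hypothetical choices, not the authors' settings.

```python
import numpy as np

def aggregate_replicates(hourly, block=24):
    """Average consecutive blocks of hourly flow within each replicate.
    hourly has shape (n_replicates, n_hours); a trailing incomplete block is dropped."""
    hourly = np.asarray(hourly, dtype=float)
    n_rep, n_hours = hourly.shape
    n_blocks = n_hours // block
    trimmed = hourly[:, : n_blocks * block]
    return trimmed.reshape(n_rep, n_blocks, block).mean(axis=2)

def predictive_limits(replicates, lower=5.0, upper=95.0):
    """Lower and upper percentiles across replicates at each time step."""
    return np.percentile(replicates, [lower, upper], axis=0)
```

Because each replicate is aggregated before the limits are computed, any persistence of hourly errors within a replicate carries over into the daily or monthly uncertainty, which would not be the case if hourly predictive limits were simply averaged.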

