the Creative Commons Attribution 4.0 License.
UK Hydrological Outlook using Historic Weather Analogues
Abstract. Skilful seasonal hydrological forecasts are beneficial for water resources planning and disaster risk reduction. The UK Hydrological Outlook (UKHO) provides river flow and groundwater level forecasts at the national scale. Alongside the standard Ensemble Streamflow Prediction (ESP) method, a new Historic Weather Analogues (HWA) method has recently been implemented. The HWA method samples high-resolution historical observations for analogue months that match the atmospheric circulation patterns forecast by a dynamical weather forecasting model. In this study, we conduct a hindcast experiment using the GR6J hydrological model to assess where and when the HWA method is skilful across a set of 314 UK catchments for different seasons. We benchmark the skill against the standard ESP and climatology forecasts to understand to what extent the HWA method represents an improvement over existing forecasting methods. Results show that the HWA method yields skilful winter river flow forecasts across the UK, in contrast to the standard ESP method, for which skilful forecasts were only possible in southeast England. Winter river flow forecasts using the HWA method were also more skilful in discriminating high and low flows across all regions. Catchments with the greatest improvement tended to be upland, fast-responding catchments with limited catchment storage, where river flow variability is strongly tied to climate variability. Skilful winter river flow predictability was possible owing to the relatively high forecast skill of atmospheric circulation patterns (e.g. the winter NAO) and the ability of the HWA method to derive high-resolution meteorological inputs suitable for hydrological modelling. However, skill was not uniform across seasons. Improvements in river flow forecast skill for other seasons were modest, with moderate gains in northern England and northeast Scotland during spring and little change in autumn.
Skilful summer flow predictability remains possible only in southeast England, and skill scores elsewhere were mostly reduced compared to the ESP method. This study demonstrates that the HWA method can leverage both climate information from dynamical weather forecasting models and the influence of initial hydrological conditions. Incorporating climate information improved winter river flow predictability nationally, with the advantage of exploring historically unseen weather sequences. The strong influence of initial hydrological conditions contributed to retaining year-round forecast skill for river flows in southeast England. Overall, this study identifies when and where the HWA method is more skilful than existing forecasting approaches and confirms the standard ESP method as a “tough to beat” forecasting system against which future improvements should be tested.
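As context for the abstract's description of the HWA method, the analogue-selection step can be illustrated with a minimal sketch. Assume, for illustration only, that analogue months are ranked by the distance between a forecast circulation index (e.g. a monthly NAO value from each ensemble member) and the observed historical index; the manuscript's actual matching uses forecast sea level pressure patterns from GloSea6, so the scalar index, function name, and data layout here are all simplifying assumptions:

```python
import numpy as np

def select_analogues(forecast_nao, historical_nao, n_analogues=3):
    """Return, for each forecast ensemble member, the historical months
    whose observed NAO index is closest to that member's NAO value.

    forecast_nao   : sequence of NAO values, one per ensemble member
    historical_nao : dict mapping (year, month) -> observed NAO index
    """
    keys = list(historical_nao)
    obs = np.array([historical_nao[k] for k in keys])
    analogues = []
    for f in forecast_nao:
        # rank historical months by |observed NAO - forecast NAO|
        order = np.argsort(np.abs(obs - f))
        analogues.append([keys[i] for i in order[:n_analogues]])
    return analogues
```

In the operational method, the forcings observed during each selected analogue month would then be used as high-resolution meteorological input to the hydrological model.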
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2369', Anonymous Referee #1, 21 Jul 2025
Comments for “UK Hydrological Outlook using Historic Weather Analogues”.
Thank you for this well-written and timely manuscript, which describes a new experiment in the UK national hydrological forecasting system. The method incorporates historical observations from analogue months by matching large-scale circulation patterns, and uses them as forcings to generate hydrological forecasts. The study shows improvements in seasonal forecast skill and event categorization, particularly in winter (the rainy season). This work is both important and valuable for the hydrological forecasting community.
Below are some comments to further discuss the idea with the authors and improve readability:
Line 68, Consider specifying “summer NAO (SNAO)” when first referring to it.
Line 86, section 1.2, The section mentions four forecast categories, but the introduction states there are "three strands." Please clarify this. Also, the methods in the first category might be better described as "descriptive forecasts" to distinguish them from the ensemble-based approaches that come later.
Line 144, Are analogue months considered independently (i.e., monthly NAO indices)? Have you tested using moving-window averages for NAO to account for variability in selecting analogues?
Line 220, Could you clarify why 17 ensemble members were chosen here? A flow chart illustrating the selection process would be helpful.
Line 235, For analogue season selection, have you plotted rainfall patterns for an example season to assess consistency among analogue months? It would be interesting to see such visualization (e.g., a map or time series).
Line 336, The text here continues analyzing results from Figure 3, but it reads as if it refers to Figure S1. Specifying which figure is meant would help.
Line 341, What does “heterogeneity” refer to here: between the areas or between the methods? How do the numbers reflect heterogeneity? Could you explain a bit more?
Line 348, consider adding the catchment numbers together with the ratio, e.g. XX out of YY.
Line 388, Typo: "--0.38" should likely be "-0.38." Is this value statistically significant?
Line 420, Figure 7, This is an excellent visualization. I also noticed that for summer, both high-flow and low-flow events had a drop in performance using HWA. Could this reflect challenges in low-flow forecasting? Since the authors mention later in the discussion that summer is a future target, it may be worth raising this point here while discussing the results for summer months.
Line 452, Would it be better to show the correct ratio for each station instead of the full distribution? Or, if distributions are preferred, please specify the reasons.
Line 499, In some sections, the authors attribute skills in some areas like the south and east to initial hydrological conditions or river memory. Is this based on prior knowledge of basin characteristics?
Some other thoughts:
Given HWA’s success in winter, would you consider a dynamic framework that switches between forecasting methods seasonally (e.g., HWA in winter, other methods in summer)?
And for summer, are there other alternative indices that might outperform NAO for selecting analogues?
Just curious, what is the ratio of autumn/winter rainfall?
Citation: https://doi.org/10.5194/egusphere-2025-2369-RC1 - AC1: 'Reply on RC1', Wilson Chan, 05 Sep 2025
RC2: 'Comment on egusphere-2025-2369', Anonymous Referee #2, 26 Jul 2025
Thank you for the opportunity to review this manuscript evaluating the use of Historic Weather Analogues (HWAs) for improved seasonal streamflow prediction across UK catchments. This work builds on over 25 years of research on incorporating climate information into seasonal forecasts (e.g., Hamlet and Lettenmaier, 1999), providing a systematic hindcast evaluation and nationwide case study of the HWA method that is directly relevant to operational prediction (i.e., the UK Hydrological Outlook). The authors demonstrate how forecasted sea level pressure anomalies from GloSea6 can be used to select HWAs as inputs to the GR6J hydrology model, leading to improved streamflow prediction over traditional ESP methods in regions more influenced by meteorology than initial hydrologic conditions. While the core methodological progress is incremental, the study provides a rigorous benchmarking of the HWA approach against both climatological and ESP baselines, with results showing meaningful wintertime skill improvements.
Major Comments
- The use of retrospective model simulations (here, termed “simulated observations”) in verification rather than actual streamflow observations is unconventional, and not immediately clear. At the very least, this needs to be better described in the methods section, but it should also be disclosed elsewhere. Additionally, I would request that you strongly consider renaming this variable to a more transparent term (e.g., “retrospective simulation” or “retro-sim”), so that it is clear that this is not an observational dataset. Please justify this choice (e.g., incomplete obs. dataset, upstream regulations, etc.) and include a discussion of its limitations in the Discussion section.
- I encourage the authors to consider greater use of the active voice throughout the Methods section. At times, it was unclear who was performing certain actions, which made it difficult to follow some of your methods. For example, when discussing the hindcasts from the GloSea6 prediction system, it was not always clear whether the subject was the UK Met Office or the authors themselves (e.g., “In addition, retrospective forecasts (‘hindcasts’) for each meteorological season over the 1993-2016 period are produced, initialised from a subset of dates (1st, 9th and 17th) each month.“ … and, “Hindcasts were made for each conventional season (DJF - winter, MAM - spring, JJA - summer and SON - autumn)”). Clearer attribution using active voice will make it easier to follow/understand the methodology.
- Consider including brief introductory paragraphs at the start of the Results and Discussion sections. Such introductions can outline the key questions addressed, clarify the structure of each section, and provide context for the analyses that follow. It also provides gentler transitions for the reader.
- Please revise the description of the ESP (and HWA) methods to consistently use standard forecasting terminology such as “meteorological traces,” “ensemble members,” “hindcast initialization,” and “lead time.” For example, clarify that ESP ensemble members are generated by running the hydrological model with observed meteorological input sequences (traces) from different years in the historical record, conditioned on the current initial hydrologic state. It would also be helpful if the authors explicitly relate the GloSea6 climate hindcast initializations (1st, 9th, 17th) to the hydrological forecast initializations (e.g., does each hydrologic forecast correspond to a climate ensemble member initialized on those dates, or are hydrological forecasts always initialized at the start of the season?)
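The ESP construction described in the comment above (historical meteorological traces run through the hydrological model from a common initial hydrologic state) can be sketched as follows. The function names, state layout, and toy forcings are illustrative assumptions, not the authors' GR6J implementation:

```python
def esp_hindcast(model_step, initial_state, met_traces):
    """Generate an ESP ensemble: run the hydrological model once per
    historical meteorological trace, each member starting from the same
    initial hydrologic conditions (IHCs).

    model_step    : function(state, forcing) -> (new_state, flow)
    initial_state : model state at forecast initialisation
    met_traces    : dict mapping year -> sequence of daily forcings
    """
    ensemble = {}
    for year, trace in met_traces.items():
        state = dict(initial_state)  # every member shares the same IHCs
        flows = []
        for forcing in trace:
            state, q = model_step(state, forcing)
            flows.append(q)
        ensemble[year] = flows
    return ensemble
```

With this framing, the HWA method differs only in which traces enter `met_traces`: analogue months selected by circulation matching rather than all historical years.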
Minor Comments
Introduction
Line 39 - “potential risk during flood-prone seasons”
Lines 43-47 - This sentence feels incomplete. Please be more explicit about the implications of the dependencies of IHC and seasonal weather predictability on seasonal hydrologic forecasting. Explicitly stating these implications will help set the stage for the rest of the paper.
Line 53 - consider adding commas after each e.g. (e.g., Hulme and Barrow, 1997), here and elsewhere in the manuscript. Also, it may be the case that the e.g. is overused in your citations in Section 1.1
Line 60 - Instead of just the eastern US, this may be more broadly defined as from eastern North America to Scandinavia
Line 64 - consider condensing the West et al. (2019) citation to the end of line 66 only, to remove redundancy.
Line 68 - Please define SNAO more explicitly
Line 74 - Broaden the topic sentence to the hydrologic response variability of all catchments across the UK, then home in on regional differences, e.g., between the SE and NW
Line 88 - Existing approaches for forecasting what precisely? Weather? Streamflow?
Line 89 - While the term “analogy” is used, the more common terminology in the literature is “analogue” or “analog” forecasts.
Line 95 - Consider citing the foundational LSTM paper on rainfall-runoff modeling (Kratzert et al 2019)
Line 102 - Please consider citing Day (1985), which documents the original US National Weather Service ESP methodology.
Line 104/105 - break this into two sentences, 1) describing the role of IHCs in providing skill for ESP forecasts (perhaps with more emphasis on this point), and 2) the dominant processes influencing IHCs across the UK
Line 129 - Rather than implying that detailed hydrologic modeling is not possible with current NWP output, emphasize the importance of downscaling methods when using NWP as hydrologic model forcing due to discrepancies in spatial resolution
Section 1.2: This section could benefit from a revised organization. I suggest the following structure:
#1: Simple statistical methods
#2: ESP-based methods
#3: Stylised scenario approaches
#4: NWP-forced hydrologic modeling
Additionally, consider the placement of the discussions of LSTMs in this scheme. LSTMs are not traditional statistical methods (as currently categorized, though they are data-driven) and are maybe better thought of as a model type that could be applied within any of the other forecasting approaches. Maybe it would be more appropriate to have a brief discussion of different types of hydrologic models (conceptual, process/physics-oriented, and data-driven including both simple statistical and deep learning methods) all of which can be applied with any of these forecasting methods
Line 131 - “climate information into ESP forecasts, often referred to as conditional ESP”. Or similar.
Line 132 - “sub-sampling meteorological traces” (good to define met trace first)
Line 134 - Please discuss the mechanism of improvements found in W&L (2006) and Beckers et al. (2016)
Line 138 - Small typo: “studies have shown”
Line 139 - Define horizon of “long lead times”
Line 155 - missing em dash after (in-prep)
Section 1.3: The transition from conditioned ESP approaches to the HWA method is logical but could be made clearer. Consider adding a sentence or two to explicitly link the evolution of methods, e.g.:
“While conditioned ESP methods rely on sub-sampling or weighting historical traces based on large-scale climate signals, the HWA approach further advances this concept by identifying specific historical weather patterns that closely match forecasted atmospheric circulation states. This enables forecasts to more directly leverage reliable dynamical model outputs and can provide higher spatial resolution than traditional ESP-based methods.”
Methods
Line 168 - I would not capitalize Chalk and Limestone
Line 169/170 - Exactly how many of your study catchments are part of the UK Benchmark Network?
Section 2.1 - Consider breaking this section into two paragraphs, and being more explicit about what was used as a model input vs. simply a descriptive variable/catchment attribute. The current paragraph reads like a dense list of data sources – more context would be helpful.
Line 200 - this should be two sentences.
Line 205 - Citation on the mKGE?
Line 207 - This i.e. seems out of place. Consider removing.
Line 207 - GR6J model results from whom/where? Be more explicit please.
Line 229 - This paragraph (and others in this section, perhaps) could benefit from a clearer problem statement as a topic sentence
Line 230 - What do you define as high spatial resolution? Or at least, what is the average catchment size? This might help us better understand discrepancies between seasonal climate model outputs and needed hydrology model inputs.
Line 238 - not just simulated monthly patterns, but predicted monthly patterns from the hindcasts. It is important to make this clear to help the reader with understanding the key method.
Line 245 - Similarly, this paragraph would benefit from first defining the problem statement, e.g., that the signal to noise of NAO in seasonal climate systems is too small during the winter, and then discussing how you address this challenge.
Section 2.4.1 - Is this a daily timestepped model? Line 274 suggests so, but please state.
Line 272 - It would be helpful to use more conventional terms to describe your ESP approach, as you did in Section 1.2. For example, you could write: “For each month in the hindcast period, three-month lead time seasonal ESP hindcasts were generated using the GR6J model forced with meteorological traces from the historical observation record.” This would make your methods more transparent and easier to follow for readers familiar with ESP-based forecasting.
Section 2.4.2 - Consider breaking into two paragraphs for improved readability
Section 2.5 - Consider breaking into two or more paragraphs for improved readability
Results
Line 229 - At the beginning of a section, it may be helpful to be specific about what types of forecasts (and at what lead times) you are talking about, e.g. “seasonal streamflow forecasts”, not weather forecasts (for example)
Line 330 - Have you defined this positive skill threshold of 0.05 yet?
Line 346 - NI? Not defined.
Line 350 - Use proper name and then define abbreviation
Figure 3 - Consider including key details to allow figures to stand alone more effectively, e.g., that this is across 314 UK study catchments. Same for hindcast period years. Also, add a CRPSS label to the colorbars, and a key for the arrow direction.
Figure 4 - Add colorbar labels. In the caption, consider revising to: “Blue colours indicate the HWA method is better than the ESP benchmark reference, red colours…” Additionally, please clarify whether the symbol direction (triangle up/down) represents the same information as the color (i.e., skill difference). If not, consider using the symbol to convey complementary information – such as the sign of the HWA skill score relative to climatology. For example, an upward triangle could indicate HWA is skillful compared to climatology, while a downward triangle could indicate it is not. This would allow readers to quickly assess not only where HWA outperforms ESP, but also where it is meaningfully skillful in an absolute sense.
Figure 5 - Add a descriptive label on colorbar
Figure 6 - Please consider updating the y-axis label to “DJF Mean Daily Flow” or a similar more descriptive term. Additionally, consider adding a shaded region indicating the 25th–75th percentile range around the ESP and HWA ensemble means. This would provide a sense of ensemble dispersion and improve the interpretability of the forecast spread. Please also update “Obs Sim” to “Retro Sim”.
Figure 7 - Is the colorbar incorrectly labeled (e.g. AUC instead of ROC)?
Line 425 - Develop this logic just a bit further – why are you highlighting winter flow predictability over other seasons?
Figure 9 - The probability bar plots are hard to see and even harder to compare against the observed outlook categories. That said, I do think the case studies are really valuable, so I think it’s worth considering how to improve this figure. What about colored pie charts? Or aggregation of results to regions? I would also define the NAO phase in the “Winter 1994/95” and “Winter 2009/10” subtitles.
Figure 10 - Please clarify the scientific value of including jet speed as an intermediate variable in this figure. Does examining jet speed, in addition to NAO index, provide insight into the added value of HWA versus ESP? Additionally, is there a statistically significant difference in HWA-ESP skill between different NAO phases? Might you be able to explain differences in skill between the two methods using total catchment storage (e.g., Harrigan et al. (2018), line 503)?
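Several of the comments above concern skill scores, e.g. the 0.05 positive-skill threshold queried at Line 330. For readers unfamiliar with the metric, a minimal sketch of the ensemble CRPS and the derived skill score (CRPSS) is given below; the kernel estimator shown is one standard formulation and may differ in detail from the one used in the manuscript:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast against a single observed value,
    using the kernel (energy-form) estimator:
    mean|x_i - y| - 0.5 * mean|x_i - x_j|. Lower is better; 0 is perfect."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

def crpss(forecast_members, reference_members, obs):
    """Skill score relative to a reference ensemble (e.g. climatology or ESP):
    1 = perfect, 0 = no better than reference, negative = worse."""
    return 1.0 - crps_ensemble(forecast_members, obs) / crps_ensemble(reference_members, obs)
```

In practice the CRPS would be averaged over all hindcast start dates before forming the skill score; that averaging step is omitted here for brevity.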
Discussion
Line 502 - Please clarify that these conclusions about the role of IHCs are specific to the UK context, as the cited studies (Svensson, 2016; Svensson et al., 2015) are focused on UK catchments. Otherwise, broaden citations.
Line 510 - What was the study domain of the Baker et al. (2018) study?
Section 4.2 - The current text details the comparative skill of HWA and ESP methods, but could be strengthened by more explicitly discussing the implications of the role of IHCs in ESP forecasts. Please elaborate on what your findings suggest about when and where IHCs are most critical for skill, how this influences forecast design and operational use, and what this means for improving seasonal prediction in regions dominated by IHC versus meteorological predictability.
Conclusion
Line 629 - Suggested addition: “... in South East England, where initial hydrologic conditions related to groundwater storage provide seasonal predictability”.
Citation: https://doi.org/10.5194/egusphere-2025-2369-RC2 - AC2: 'Reply on RC2', Wilson Chan, 05 Sep 2025