Warnings based on risk matrices: a coherent framework with consistent evaluation
Abstract. Risk matrices are widely used across a range of fields and have found increasing utility in warning decision practices globally. However, their application in this context presents challenges, which range from potentially perverse warning outcomes to a lack of objective verification (i.e., evaluation) methods. This paper introduces a coherent framework for generating multi-level warnings from risk matrices to address these challenges. The proposed framework is general, is based on probabilistic forecasts of hazard severity or impact and is compatible with the Common Alerting Protocol (CAP). Moreover, it includes a family of consistent scoring functions for objectively evaluating the predictive performance of risk matrix assessments and the warnings they produce. These scoring functions enable the ranking of forecasters or warning systems and the tracking of system improvements by rewarding accurate probabilistic forecasts and compliance with warning service directives. A synthetic experiment demonstrates the efficacy of these scoring functions, while the framework is illustrated through warnings for heavy rainfall based on operational ensemble prediction system forecasts for Tropical Cyclone Jasper (Queensland, Australia, 2023). This work establishes a robust foundation for enhancing the reliability and verifiability of risk-based warning systems.
Status: closed
RC1: 'Comment on egusphere-2025-323', Samar Momin, 12 Apr 2025
General Comments:
This paper introduces a mathematically rigorous framework for issuing and evaluating multi-level warnings derived from risk matrices. It addresses critical weaknesses in current risk matrix-based warning systems, such as inconsistency, lack of objectivity, and absence of formal verification mechanisms. The framework is probabilistic, hazard-agnostic, and compatible with the Common Alerting Protocol (CAP), making it widely applicable in disaster risk management.
The manuscript is technically strong, well-written, and well-structured. It clearly explains the conceptual foundation and mathematical formulation, with practical examples and synthetic experiments demonstrating real-world and theoretical robustness, and provides an open-source Python-based code.
Strengths:
1. Innovation and Relevance:
The paper presents a coherent warning framework that resolves known inconsistencies in traditional risk matrices. The risk matrix score and warning score are introduced as consistent, theoretically grounded methods for evaluation.
2. Operational Usability:
The framework is flexible and compatible with real-time systems (e.g., CAP-based alerting), and can be applied across hazards and domains.
3. Synthetic Experiment and Case Study:
The use of six distinct synthetic forecasters in a probabilistic setup illustrates the scoring method’s discriminative power. The Tropical Cyclone Jasper case study shows practical feasibility in a high-impact, real-world scenario.
4. Clarity and Depth:
The manuscript does an excellent job explaining the logic behind severity-certainty structuring, lead-time sensitivity, and score weighting using realistic examples.
5. Open-Source Tooling:
Providing a Python implementation in the scores package adds major value and supports reproducibility.
Specific Comments:
1. Terminology and Framing:
While the mathematical rigor is a strength, early sections could benefit from briefly reinforcing why these inconsistencies in risk matrices matter for public safety and policy credibility. Consider simplifying the initial explanation of “forecast directive” and “warning directive” for non-technical readers.
2. Comparison with Existing Systems:
The distinction from the UK Met Office (UKMO) and other operational frameworks is clear, but it might help to include a side-by-side visual comparison in an appendix or supplementary material (if possible).
3. Evaluation Weights:
The method for deriving weights from stakeholder input (e.g., community consultation on false alarm vs. miss costs) is strong. However, a brief reflection on the subjectivity and variability in such consultations would add depth.
4. Scalability to Multi-Hazard Systems:
Although the framework is hazard-agnostic, a discussion of how it could scale or adapt to multi-hazard interactions (e.g., flood + wind) would strengthen its applicability. In addition, it would be helpful to shed light on how this framework could be extended to earthquake hazards, given their growing frequency (if possible).
5. Lead Time Scaling:
The use of distinct matrices for LONG-, MID-, and SHORT-range phases is excellent. It would be helpful to mention how this could be dynamically updated as new ensemble data arrives.
Citation: https://doi.org/10.5194/egusphere-2025-323-RC1
AC1: 'Reply on RC1', Robert Taggart, 06 May 2025
Thank you for taking the time to review the manuscript and provide critical feedback.
Below, we reproduce your comments/suggestions in bold font, followed by our response in non-bold font. Italicized text indicates proposed additional material that will be inserted into the revised manuscript.
While the mathematical rigor is a strength, early sections could benefit from briefly reinforcing why these inconsistencies in risk matrices matter for public safety and policy credibility. Consider simplifying the initial explanation of “forecast directive” and “warning directive” for non-technical readers.
Thanks for this suggestion. We will add a couple of extra sentences in the manuscript to help orientate the reader with the "forecast directive" terminology. When the term "forecast directive" is first introduced (L25), we will give the following simple example:
For example, a forecast directive for a warning service for damaging wind gusts might be "Issue a warning if and only if the probability of a wind gust exceeding 90 km/h is at least 10%".
We will also elaborate on why directives are important by inserting an additional sentence after the existing sentence starting at L43:
When the warning decision process lacks adequate definition, two forecasters with identical probabilistic assessments of the hazard could issue two different warning levels. This may lead to warning messages that fluctuate unnecessarily, compromising both public safety and service credibility.
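To make the directive example above concrete, here is a minimal Python sketch of such a rule. The function name and the estimation of the exceedance probability as an ensemble member fraction are illustrative assumptions, not part of the manuscript or of the scores package API.

```python
import numpy as np

def wind_gust_directive(ensemble_gusts_kmh, threshold_kmh=90.0, min_prob=0.10):
    # Estimate the exceedance probability as the fraction of ensemble
    # members whose gust forecast exceeds the severity threshold.
    prob_exceed = np.mean(np.asarray(ensemble_gusts_kmh) > threshold_kmh)
    # The directive: warn if and only if that probability is at least min_prob.
    return bool(prob_exceed >= min_prob)

# 2 of 10 members exceed 90 km/h, so p = 0.2 >= 0.1 and a warning is issued.
ensemble = [62, 71, 78, 80, 83, 85, 88, 89, 95, 102]
print(wind_gust_directive(ensemble))  # True
```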
The distinction from the UK Met Office (UKMO) and other operational frameworks is clear, but it might help to include a side-by-side visual comparison in an appendix or supplementary material (if possible).
We think implementing this suggestion will be helpful for the reader. We have prepared a side-by-side visual comparison which fits naturally in Section 2.2 as a new figure. The text of Section 2.2 will also be updated to reference this visual comparison.
The method for deriving weights from stakeholder input (e.g., community consultation on false alarm vs. miss costs) is strong. However, a brief reflection on the subjectivity and variability in such consultations would add depth.
We believe that a detailed discussion of this is beyond the scope of the current work. However, we will include a brief sentence at the end of Section 3.1 to note that the development of our framework motivates further research:
Although the process for determining weights in this fictitious flood example was presented straightforwardly, this framework motivates further research into developing best practices for eliciting thresholds and weights through stakeholder consultation.
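As a purely schematic illustration of how elicited weights could enter an evaluation (this is not the risk matrix score or warning score defined in the manuscript), a per-case penalty might weight misses and false alarms asymmetrically. All names and numerical values below are hypothetical.

```python
def warning_penalty(warned, event_occurred, w_miss=4.0, w_false_alarm=1.0):
    # A miss (event occurs but no warning) is penalised w_miss; a false
    # alarm (warning but no event) is penalised w_false_alarm; correct
    # decisions score zero. The ratio w_miss / w_false_alarm encodes a
    # stakeholder judgement that misses are worse than false alarms.
    if event_occurred and not warned:
        return w_miss
    if warned and not event_occurred:
        return w_false_alarm
    return 0.0

# Mean penalty over four cases: hit, false alarm, miss, correct rejection.
cases = [(True, True), (True, False), (False, True), (False, False)]
print(sum(warning_penalty(w, o) for w, o in cases) / len(cases))  # 1.25
```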
Although the framework is hazard-agnostic, a discussion of how it could scale or adapt to multi-hazard interactions (e.g., flood + wind) would strengthen its applicability. In addition, it would be helpful to shed light on how this framework could be extended to earthquake hazards, given their growing frequency (if possible).
We will include the following comment near the end of Section 2.1 on the applicability of the framework to a generic index, which may account for multi-hazard interactions:
More generally, the framework could be applied to an index that itself represents complex multi-hazard interactions. An example of such an index is the Fire Behaviour Index (FBI) used in the Australian Fire Danger Ratings System (AFDRS), which combines weather and fuel state information to determine the severity of fire behaviour.
Although the framework is applicable to earthquake hazards, we believe it is not appropriate to discuss this in detail, as earthquakes lie outside the authors' area of expertise.
The use of distinct matrices for LONG-, MID-, and SHORT-range phases is excellent. It would be helpful to mention how this could be dynamically updated as new ensemble data arrives.
How the arrival of new ensemble data impacts the warning issuance process will depend on how each warning service is designed. Going into such details is beyond the scope of this manuscript, but they could be explored using concrete warning service examples in a follow-up paper. Nonetheless, we note here that there are at least two factors at play. One is where the lead-time phases are a function of the time until onset of the severe phenomena, and new ensemble data shifts the predicted time of onset sufficiently to change the phase. The other is where new ensemble data leads to a re-evaluation of the likelihood and/or severity of the phenomena, which may prompt an update of the warning based on pre-defined amendment criteria for the warning service.
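As a schematic illustration of the two factors described above, the following Python sketch shows one way a phase change or a re-assessed warning level could trigger an update. The phase boundaries and function names are placeholder assumptions, not values taken from the manuscript or any operational service.

```python
from datetime import timedelta

def lead_time_phase(time_to_onset,
                    short_range=timedelta(hours=24),
                    mid_range=timedelta(hours=72)):
    # Map the time remaining until hazard onset to a lead-time phase.
    # The 24 h and 72 h boundaries are placeholders only.
    if time_to_onset <= short_range:
        return "SHORT"
    if time_to_onset <= mid_range:
        return "MID"
    return "LONG"

def warning_needs_update(current_level, reassessed_level, old_phase, new_phase):
    # New ensemble data prompts an update if it shifts the lead-time phase
    # or changes the warning level implied by the risk matrix assessment.
    return new_phase != old_phase or reassessed_level != current_level

# New data brings onset forward from 30 h to 20 h: MID -> SHORT, so the
# warning is re-issued even though the assessed level is unchanged.
print(warning_needs_update(2, 2,
                           lead_time_phase(timedelta(hours=30)),
                           lead_time_phase(timedelta(hours=20))))  # True
```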
Citation: https://doi.org/10.5194/egusphere-2025-323-AC1
RC2: 'Comment on egusphere-2025-323', Anonymous Referee #2, 18 Apr 2025
This paper proposes a probabilistic framework for multi-level warnings based on risk matrices and illustrates it with an example for Tropical Cyclone Jasper. The paper is well organized and well written. It also provides open-source code and presents all the mathematical algorithms in the appendix, making the paper clear and concise.
Citation: https://doi.org/10.5194/egusphere-2025-323-RC2
AC2: 'Reply on RC2', Robert Taggart, 06 May 2025
Thank you for reading the manuscript and for your positive review.
Citation: https://doi.org/10.5194/egusphere-2025-323-AC2
RC3: 'Comment on egusphere-2025-323', Anonymous Referee #3, 20 Apr 2025
This is an excellent paper, which outlines an innovative method for presentation and evaluation of warning predictions. It is strongly based on theoretical concepts but also provides a methodology that is intuitive. I highly recommend publication in EGUsphere.
Citation: https://doi.org/10.5194/egusphere-2025-323-RC3
AC3: 'Reply on RC3', Robert Taggart, 06 May 2025
Thank you for reading the manuscript and for your positive review.
Citation: https://doi.org/10.5194/egusphere-2025-323-AC3
Data sets
Data and code for risk matrix score paper, Robert J. Taggart, http://doi.org/10.5281/zenodo.14668723
Model code and software
Data and code for risk matrix score paper, Robert J. Taggart, http://doi.org/10.5281/zenodo.14668723
Viewed
- HTML: 185
- PDF: 64
- XML: 15
- Total: 264
- BibTeX: 12
- EndNote: 21