the Creative Commons Attribution 4.0 License.
Probabilistic and Machine Learning Methods for Uncertainty Quantification in Power Outage Prediction due to Extreme Events
Abstract. Strong hurricane winds damage power grids and cause cascading power failures. Statistical and machine learning models have been proposed to predict the extent of power disruptions due to hurricanes. Existing outage models use inputs including power system information and environmental and demographic parameters. This paper reviews the existing power outage models, highlighting their strengths and limitations. Existing models were developed and validated with data from a few utility companies and regions, limiting their applicability across geographies and hurricane events. Instead, we train and validate these existing outage models using power outages for multiple regions and hurricanes, including Hurricanes Harvey (2017), Michael (2018), and Isaias (2020), in 1,833 cities along the U.S. coastline. The dataset includes outage data from 39 utility companies in Texas, 5 in Florida, 5 in New Jersey, and 11 in New York. We discuss the limited ability of state-of-the-art machine learning models to (1) make bounded outage predictions, (2) extrapolate predictions to high winds, and (3) account for physics-informed outage uncertainties at low and high winds. For example, we observe that existing models can predict outages as high as 25 times more than the number of customers and cannot adequately capture the outage variance for wind speeds over 70 m/s. Finally, we present a Beta regression outage modeling framework to address the shortcomings of existing power outage models.
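The contrast between bounded and unbounded outage predictions can be sketched in a few lines. The snippet below is only an illustration, not the paper's model: it assumes a Beta distribution in the common mean-precision parametrization with a logit link for the mean outage fraction, and compares it with a generic log-link count model. All coefficients (B0, B1, PHI, C0, C1), the customer count, and the wind values are hypothetical and chosen for illustration; none of them are fitted to the paper's data.

```python
import math


def logistic(z):
    """Inverse logit; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def beta_mean_and_variance(wind_speed, b0, b1, phi):
    """Mean and variance of the outage fraction under a Beta model.

    Assumes the mean-precision parametrization, with shape parameters
    a = mu * phi and b = (1 - mu) * phi, and a logit link for mu.
    """
    mu = logistic(b0 + b1 * wind_speed)
    variance = mu * (1.0 - mu) / (1.0 + phi)
    return mu, variance


def log_link_outages(wind_speed, c0, c1):
    """Expected outage count under a log link, which has no upper bound."""
    return math.exp(c0 + c1 * wind_speed)


# Hypothetical coefficients, chosen only for illustration (not fitted).
B0, B1, PHI = -6.0, 0.15, 20.0
C0, C1 = 2.0, 0.12
CUSTOMERS = 10_000

for wind in (10, 40, 70, 100):
    mu, var = beta_mean_and_variance(wind, B0, B1, PHI)
    bounded = mu * CUSTOMERS  # can never exceed CUSTOMERS
    unbounded = log_link_outages(wind, C0, C1)  # can exceed CUSTOMERS
    print(f"wind={wind:>3}  beta mean={bounded:8.0f}  "
          f"beta sd={math.sqrt(var):.3f}  log-link={unbounded:12.0f}")
```

Under these assumptions the Beta mean stays strictly between 0 and 1, so the implied outage count cannot exceed the number of customers, and the variance mu(1 - mu)/(1 + phi) shrinks as the mean approaches either bound. The log-link count, by contrast, grows without limit as wind speed increases.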
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-975', Anonymous Referee #1, 11 Nov 2022
Review for “Probabilistic and Machine Learning Methods for Uncertainty Quantification in Power Outage Prediction due to Extreme Events”
This is a comprehensive work that compares different machine learning methods. I find the methods and presentation solid, and the authors did a nice job of summarizing the substantial work they have completed. However, some critical information is missing. Namely, the paper needs a comparison of the performance of all ML models on separate testing data, because this would show whether the models overfit and how they perform on data they have never encountered. The authors also need to explain in more detail how the model input data were obtained and what uncertainties are associated with them. Based on these points, I suggest a major revision of this version. Please see the detailed suggestions below.
Line 34, “including hurricane”: is it the wind field or something else? Please clarify.
Uncertainty in the poweroutage.us data.
Lines 105 to 111: what are the possible uncertainties in interpolating all covariates to the city scale? Which interpolation method is used? Please be specific.
Line 113: number of outages, is it the same as customers without power?
Line 125: Uncertainty in wind speed estimates since the sizes of cities vary.
Line 147: Which rescaling technique is used for this one?
Line 175: Please fix the citation.
Table 2: how should the difference between R2_DEV and R2_ψ be interpreted?
Why is random forest used only for the fraction of customers without power? Is the number of power outages not suitable for the RF algorithm?
Figure 5: what are R2 and other error statistics in the holdout test? It will be helpful to report them in the same figure.
Do you have any prediction vs observation plot for the RF model like Figure 5?
Section 8.2: there is a heavy discussion of how winds control power outages in the models. However, how precipitation relates to power outages is not shown, even though it is the second most important variable in the RF model. You have demonstrated some nonlinear relationships between wind speed and power outage fraction. Therefore, it is worthwhile to show precipitation’s relationship to outage fraction, or to show precipitation and wind jointly with the outage prediction in a separate partial dependence plot (PDP). That may explain some of the nonlinear relationships in Figure 7b.
Section 9, the author mentioned beta regression may have better performance. But no comparison is made with the previous method. I suggest shortening the arguments after line 454 because there is no evidence in the paper supporting them.
Lines 469 to 470: unlike linear models, RF does not assume non-collinearity.
Citation: https://doi.org/10.5194/egusphere-2022-975-RC1
AC1: 'Reply on RC1', Prateek Arora, 19 Jan 2023
RC2: 'Comment on egusphere-2022-975', Anonymous Referee #2, 15 Dec 2022
This paper investigated the limitations of existing power outage models, including bounded prediction, out-of-distribution prediction, and physics-aware uncertainties. The authors found that some of the existing state-of-the-art models may generate unrealistic predictions and cannot generalize well to extreme events that are not sufficiently represented in the training datasets. The authors discuss some potential ways to address the shortcomings of these models. I have some major comments that the authors need to address before publication:
1. The problems mentioned by the authors, including limited generalization ability, unbounded predictions, and unreasonable uncertainty variations, are common problems for general machine learning models. Many researchers in the machine learning community have proposed different methods to address these problems. How unique and critical are they for power outage predictions?
2. There is now a variety of more complex power outage prediction models [1]. Are there any specific reasons why the authors chose to evaluate traditional machine learning models? These traditional models are known to be less representative.
3. It is unclear to me why beta regression should perform well in general cases. I think it also has its own problems, such as a strict distributional assumption, and it does not address the representativeness issues that eventually cause the poor generalization. Could you provide justification and a performance comparison showing why Beta regression should be used?
[1] Xie, Jian, Inalvis Alvarez-Fernandez, and Wei Sun. "A review of machine learning applications in power system resilience." In 2020 IEEE Power & Energy Society General Meeting (PESGM), pp. 1-5. IEEE, 2020.
Citation: https://doi.org/10.5194/egusphere-2022-975-RC2
AC2: 'Reply on RC2', Prateek Arora, 19 Jan 2023
Luis Ceferino