This work is distributed under the Creative Commons Attribution 4.0 License.
A robust error correction method for numerical weather prediction wind speed based on Bayesian optimization, Variational Mode Decomposition, Principal Component Analysis, and Random Forest: VMD-PCA-RF (version 1.0.0)
Abstract. Accurate wind speed prediction is crucial for the safe utilization of wind resources. However, the single-value deterministic numerical weather prediction methods currently employed by wind farms do not adequately meet the actual needs of power grid dispatching. In this study, we propose a new hybrid forecasting method for correcting 10-meter wind speed predictions made by the Weather Research and Forecasting (WRF) model. Our approach incorporates Variational Mode Decomposition (VMD), Principal Component Analysis (PCA), the Bayesian Optimization Algorithm (BOA), and five artificial intelligence algorithms: Deep Belief Network (DBN), Multilayer Perceptron (MLP), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and light Gradient Boosting Machine (lightGBM). We first obtain WRF-predicted 10-meter wind speeds from simulations driven by Global Forecast System (GFS) output. We then perform two sets of experiments with different input factors, apply BOA to tune the artificial intelligence models, and ultimately build the final models. Furthermore, we compare the five optimized artificial intelligence models across five provinces in southern China in wintertime: VMD-PCA-RF performs best in December 2021 and VMD-PCA-lightGBM in January 2022. We find that the VMD-PCA-RF evaluation indices remain relatively stable over nearly a full year: the correlation coefficient (R) is above 0.6, the accuracy rate (FA) is above 85 %, the mean absolute error (MAE) is below 0.6 m/s, the root mean square error (RMSE) is below 0.8 m/s, the relative mean absolute error (rMAE) is below 60 %, and the relative root mean square error (rRMSE) is below 75 %. Given its promising performance and excellent year-round robustness, we recommend the proposed VMD-PCA-RF method for improving model wind speed predictions.
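To make the abstract's workflow concrete, here is a minimal, hypothetical sketch of a VMD-PCA-RF style correction pipeline in Python, assuming the vmdpy and scikit-learn packages; the synthetic data, mode count, and hyperparameters are illustrative placeholders and do not reproduce the authors' configuration, multi-variable inputs, or the BOA tuning step.

```python
# Hypothetical sketch of a VMD-PCA-RF style correction pipeline (not the authors' code).
# Assumes the vmdpy and scikit-learn packages; data, mode count, and hyperparameters
# are illustrative placeholders rather than the paper's configuration.
import numpy as np
from vmdpy import VMD
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
wrf_speed = 5.0 + 2.0 * rng.standard_normal(1024)        # placeholder WRF 10-m wind series
obs_speed = wrf_speed + 0.5 * rng.standard_normal(1024)  # placeholder observed wind series

# Step 1: decompose the WRF wind speed series into K intrinsic mode functions (IMFs).
K = 9
imfs, _, _ = VMD(wrf_speed, 2000, 0.0, K, 0, 1, 1e-7)    # alpha, tau, K, DC, init, tol

# Step 2: compress the IMFs into principal components retaining 95 % of the variance.
pca = PCA(n_components=0.95)
features = pca.fit_transform(imfs.T)                     # shape: (time, n_components)

# Step 3: fit a Random Forest mapping the components to observations, then correct.
# (In the paper, hyperparameters such as n_estimators would be tuned with BOA.)
n_train = 768
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(features[:n_train], obs_speed[:n_train])
corrected = rf.predict(features[n_train:])
```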
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-945', Anonymous Referee #1, 07 Jul 2023
In this paper, the authors present interesting methods for wind speed corrections from the NWP model with multi-step methods. Below are a few minor suggestions for revision:
1. The main issue that I see in this paper is the short period for training and testing of the model, and the authors claim from this that the model is robust. Similar studies for wind speed correction from NWP models usually use several years for training and at least one year for testing. As I understood, this paper is trained only on data from February 2022, and the main conclusions are based on testing in December 2021 and January 2022, with some additional verification of stability over 10 months.
2. Order of figures in the text: Fig. 1, Fig. 2, Fig. 3, Fig. 6, Fig. 4, Fig. 5, ... Fig. 11, Fig. 14, Fig. 12.
3. Sometimes the authors refer to figures in the text as "Fig. NN", in other cases as "Figure NN", and even once as "figure NN". According to journal rules, I think it should always be "Fig. NN." Figs. 6 and 9 are unreadable.
4. On lines 56–57, the authors state that "Currently,..." and cite a publication from 1999, but there are more recent publications for the HIRLAM model or consortium.
5. The authors claim in line 520 that "In general, VMD-PCA-RF is the best wind speed correction model for winter and even throughout the entire year in the five southern provinces," while on Fig. 14 for 2022-01, VMD-PCA-lightGBM is better.
6. There should be more clarification about observational data. In line 132, the authors wrote "For the purposes of this paper, the 10-meter wind speed data is interpolated across 410 sites". Are those 410 sites the weather stations? Why did the authors use interpolation from this database instead of observations from stations?
Citation: https://doi.org/10.5194/egusphere-2023-945-RC1
AC1: 'Reply on RC1', shaohui zhou, 10 Jul 2023
Dear Reviewer,
We are grateful for your constructive comments and suggestions. We have carefully revised this manuscript and provided the following point-to-point responses. Please see the attached document.
We look forward to continuing this exchange and addressing any further questions or remarks prompted by the interactive discussion. With thanks and best wishes,
Shaohui Zhou (on behalf of the co-authors)
RC2: 'Comment on egusphere-2023-945', Anonymous Referee #2, 17 Jul 2023
Zhou et al. present a series of machine learning models including VMD-PCA-RF, a combination of Variational Mode Decomposition, Principal Component Analysis, and Random Forest, for correction of errors in WRF-predicted wind speeds. The manuscript presents various machine learning algorithms and uses two sets of experiments with different approaches to arrive at the model with better predictive capabilities. Accurate prediction of wind speed is important for the wind energy market for effective harvesting of wind energy, and this manuscript has potential in improving predictive capabilities for such uses. As a modeling paper it fits the scope of GMD, but the manuscript as presented has major shortcomings, primarily in its presentation, that require major revisions before further consideration.
Major comments:
- Many of the figures in the text are unclear both in presentation and in purpose. Generally, the use of figures to illustrate points and their order in the text should be deliberate and help the flow of the reader to understand the text. For example, Figure 1 shows the elevation map of five southern provinces in China where observational data is used. Figure 2 shows the WRF simulation domain, which appears to be a direct figure output from the WRF Pre-Processor (WPS). What is the purpose of these figures? They could be merged into one figure where the observation sites and provinces are marked. The purpose of the elevation maps in the analysis only shows up very late in the text, in Section 4.2 about the RF feature importance, and is not immediately clear to the reader.
Why was Lechang, Guangdong, chosen for Figure 5? Where is this site, and was it specially chosen? What is the purpose of the figure to the reader? In terms of presentation, the bottom half of Figure 3 is very unclear. The right section of Step 2 is completely unreadable. Step 3: what do the colored boxes mean? Does their width represent some information? Define the error metrics (FA, ...) before presenting the figure.
Figure 4 text is unreadable and the colors do not help discern the lines. Make the lines bolder. The backgrounds could just be white and grey to represent the training+validation & the test sets (label them with a legend).
Figure 6 text on the right side is unreadable. Is the specific correlation coefficient text useful to the reader? The colorbar could be sufficient to illustrate the importance. The colorbars of the left and right panels could be the same size. Also, define the feature abbreviations in the text, as it is impossible to understand the figure and the corresponding feature names in the text if they are not clearly defined. Label experiments 1 and 2 in Figure 6.
Text in Figures 9 and 14 is too small.
- As voiced by Reviewer #1, the model was trained mostly based on winter data (DJF). Would the use of data from other seasons help the prediction?
- The observational dataset presented in Section 2.1 is unclear. Where does this observational dataset come from? Did the authors create this blended data set, and if so what is the source data and the relevant citations?
Specific comments:
- Figure 2 has a contour map but it is not labeled (I assume this is topography), only the unit (m) is specified.
- Line 16: "safe"? Elaborate on the purpose of wind speed prediction for use of wind speed resources.
- Line 26: Define "BOA" here.
- Line 26: Why "debug"? Is there a bug in the models? I suggest "analyze".
- Line 33 shows many metrics of the presented model compared to observations. How much better is this against WRF-predicted values before correction?
- Lines 43-45 talk about the decline of wind markets. Could the authors elaborate on the relationship of this to wind speed prediction? It could be more useful for the reader to understand how better wind speed prediction serves the wind energy markets.
- Line 59: Cite the original WRF whitepapers as well (Skamarock et al.) instead of just the wind speed prediction part.
- Line 116: Define DBN here.
- Line 140-141: WRF is not just developed by NCEP. The WRF website states it is a "collaborative partnership of the National Center for Atmospheric Research (NCAR), the National Oceanic and Atmospheric Administration (represented by the National Centers for Environmental Prediction (NCEP) and the Earth System Research Laboratory), the U.S. Air Force, the Naval Research Laboratory, the University of Oklahoma, and the Federal Aviation Administration (FAA)."
- Line 145-146: WRF can use other input fields other than GFS. I suggest just stating that your run of WRF uses GFS as initial and lateral boundary conditions.
- Overall, section 2.2.1 could be improved to be more relevant and shortened. The background of WRF is well stated in literature and the manuscript should focus on parts relevant to wind speed prediction. "Boilerplate" text about WRF (e.g., L166-167 about "WRFOUT") is not exactly relevant and could be shortened (authors already state previously in text that output frequency is 1-hour to line up with observational data).
- In Section 2.2.4, first summarize the major difference (and purpose) of experiments 1 & 2. It is hard for the reader to see the importance of the two experiments when it is mixed together with the analysis.
- Line 234: "Missing and outlier values are removed from the dataset" - isn't this WRF model outputs, why would there be missing values?
- Section 3.1: Better to describe the RMSE, R, and other error metric values of the different model configurations in a table for clarity. A table only appears in Section 3.3, in the form of Tables 3 & 4, and the relationship of these to experiments 1 & 2 is unclear. The flow could be much improved here.
- A lot of the feature labels could be better explained instead of being just listed in the text (e.g., in the conclusion, Line 542). What do pca0 and IMF0 represent physically?
Citation: https://doi.org/10.5194/egusphere-2023-945-RC2
AC2: 'Reply on RC2', shaohui zhou, 25 Jul 2023
Dear Reviewer2,
We are grateful for your constructive comments and suggestions. We have carefully revised this manuscript and provided the following point-to-point responses. Please see the attached document.
We look forward to continuing this exchange and addressing any further questions or remarks prompted by the interactive discussion. With thanks and best wishes,
Shaohui Zhou (on behalf of the co-authors)
RC3: 'Comment on egusphere-2023-945', Anonymous Referee #3, 24 Jul 2023
General comments:
This paper is a description of several candidate post-processing approaches for producing point wind speed forecasts from numerical weather prediction simulations for sites in southern China. While the overall methods appear reasonable, and the conclusions appear valid, the paper needs some work to clarify the approach in some regards.
Specific comments:
I think more detail is needed on the gridded meteorological dataset that you describe creating in section 2.1 “Data”. In particular, what is the source of the meteorological in situ observations? Are these wind towers all at a consistent height? How do you combine the surface observations with satellite data? Can you show some proof that your dataset “exhibits superior quality compared to other products”, or at least provide some references that evaluate the dataset?
In the step 1 description (first part of Fig. 3, and text description in lines 222-242), it seems like a lot is being changed between Exp. 1 and Exp. 2. How can you control for this? It makes it somewhat hard to interpret the results.
I was confused about why no bias statistics were shown in the verification section. Showing only mean absolute error and root mean square error type verification is only part of the story; can you say anything about the mean biases of the different approaches explored in this study? I think that is an important part of the analysis that is not shown yet.
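For concreteness, a minimal sketch of the mean-bias statistic requested here, reported alongside MAE and RMSE; the arrays `obs` and `pred` are hypothetical, and this is not the authors' verification code:

```python
# Hypothetical sketch: mean bias reported alongside MAE and RMSE (NumPy only).
import numpy as np

def verification_stats(obs, pred):
    """Return mean bias, MAE, and RMSE for observed vs. corrected wind speeds."""
    err = np.asarray(pred) - np.asarray(obs)
    return {
        "bias": err.mean(),                 # sign indicates systematic over/under-prediction
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
    }
```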
I think somewhere (maybe in Fig. 1 and/or Fig. 2) you need to label the provinces, as readers from outside of China may not know which is which.
Minor comments:
Page 1, line 24: GFS stands for Global Forecast System.
Line 29: indexes > indices.
Page 3, line 83: training the > training on the
Page 5, line 118: Can you specify that these provinces are in China?
Page 6, line 141: NCEP does not develop WRF, but rather NCAR (National Center for Atmospheric Research). While there are contributors to WRF from NCEP, there are also contributors from universities and many other organisations.
Page 6, lines 144-147: This section about GFS is confusing. Are you saying WRF uses GFS initial and lateral boundary conditions? It has the capability, but is not required to use GFS data. Also, NCAR did not have a role in developing GFS to my knowledge.
Page 7, line 166: surface process plan > land surface model
Page 9, line 206: Can you define and capitalize your acronym “pcs”.
Fig. 3: validing > validating.
Page 10, line 228: selected WRF field forecast data, including > selected WRF field forecast data to include only…
Page 10, line 235: 8+9+3 does not equal 12. Are you counting the 9 IMF components as one set of meteorological elements? Please clarify your wording here.
Page 13, line 276: Where does the “FA” acronym come from? I normally interpret that as false alarm, but it seems you have a different definition.
Line 278: index > indices.
Lines 303-308: Can you put these verification results in a Table? That would make it much easier to read, and to compare the different approaches. The same goes for further lists of results in other sections.
Line 309: Indexes > indices.
Lines 309-311: These sentences don’t make much sense. It would be better to say “in” instead of “is that”. For example FA in January 2022 is generally higher than in December 2021.
Lines 356-359: Can you point out which figure this text refers to? I see some panels in Figs. 7 and 8 have blue and red scatter plots. Which model(s) are you specifically referring to about the day vs. night issues?
Figs. 7 and 8: Please clarify in the caption that the scatter plots are by hour. Is there some pattern to the models and months that are being shown in each panel? If so, it is above my head. Also, what is the difference between Fig. 7 and Fig. 8?
Fig. 11: Can you clarify in the caption which panels show FA and which show RMSE? It is not clear.
Line 477: I think it would be clearer to say “elevation above sea level” rather than “height”. When I read “height” in this sort of study, it makes me think of anemometer height above ground level.
Line 494-495: This sentence is poorly worded and doesn’t make sense.
Lines 493-498: The use of the word “unstable” or “instability” in this section is confusing. I might say something more like “variability”.
Fig. 14: The text claims this figure shows the actual wind speed in each month, but I cannot find that.
Line 513: Indexes > indices
Citation: https://doi.org/10.5194/egusphere-2023-945-RC3
AC3: 'Reply on RC3', shaohui zhou, 26 Jul 2023
Dear Reviewer3,
We are grateful for your constructive comments and suggestions. We have carefully revised this manuscript and provided the following point-to-point responses. Please see the attached document.
We look forward to continuing this exchange and addressing any further questions or remarks prompted by the interactive discussion. With thanks and best wishes,
Shaohui Zhou (on behalf of the co-authors)
RC4: 'Comment on egusphere-2023-945', Anonymous Referee #4, 25 Jul 2023
General comments
The manuscript "A robust error correction method for numerical weather prediction wind speed based on Bayesian optimization, Variational Mode Decomposition, Principal Component Analysis, and Random Forest: VMD-PCA-RF (version 1.0.0)" by Zhou et al. introduces a hybrid method for correcting 10-meter wind speed predicted by WRF. The authors compare the performance of two sets of experiments with different predictors and report the best model for wind speed correction during December 2021 to January 2022. In general, this manuscript fits the scope of Geoscientific Model Development. However, after reading the manuscript, I find it still has a few major flaws. Firstly, the descriptions of the observation data and methods are unclear and ambiguous, and some citations should be added to the main text. Secondly, the information in the main text, figures, and tables is repeated. For example, the authors simply report many statistics for model validation and comparison in Section 3, which are also shown in the tables. I would suggest that the authors summarize the key points and analyze the potential reasons for the differences in the main text rather than listing the statistics, which would be better for readers' understanding. Finally, the writing and figures should be improved. Some figures should be combined, e.g., Figures 1 and 2. The captions for some figures are very simple, e.g., Figures 5, 6, and 10. The labels and legends should be enlarged for better readability. This reviewer requests major revisions as listed below.
Specific comments
P5, Section 2.1: The description of the observation data is unclear. I would suggest that the authors give more details on this dataset. What are the data sources for the ground and satellite data? How do the authors process the data? How do the authors interpolate the data across 410 sites? Please cite the data sources and related techniques.
P6, Line 155: Do the authors consider the spin-up time for WRF simulations?
P6, Line 162-166: Please add the citations for these WRF parameterizations and schemes.
P5 and P7: Figure 1 and Figure 2 both show the terrain heights in the study region. What's the difference between the two figures? I would suggest that the authors combine the two figures.
P10, Line 234: What are the criteria for the outliers?
P10, Line 235: There are only 7 meteorological elements in Figure 4. Please add the missing one in Figure 7.
P10, Line 245: Why do the authors only use the data in February as the training and validation dataset? I think there may be some seasonal variability in the meteorological fields across the three months. I wonder whether these machine learning models can successfully capture the relationship between the predictors and target variables in the other two months.
P16, Figure 6: How is the feature importance calculated? What do the correlation coefficients represent? Please add more details in the main text or in the caption.
P21, Line 382: I think there are no significant differences in statistics for most models except for DBN and VMD-PCA-DBN, based on the Taylor chart in Figure 9. The Taylor chart and Table 3 provide the same information. I would suggest that the authors remove the Taylor chart in Figure 9.
P24, Figure 10: What do the shading areas and colored curves in Figures 10c and 10d represent? Please clarify in the caption.
Technical corrections
P6, Line 140: Please add citations for the WRF v4.2 model.
P6, Line 140: It should be “National Centers for Environmental Prediction (NCEP)”.
P10, Line 224: Please rearrange the order of the figures and tables, which should be numbered in the order of their appearance in the main text.
P13, Line 276: Please spell out the acronym "FA".
P14: Please change the “m/s” to “m s-1".
Citation: https://doi.org/10.5194/egusphere-2023-945-RC4
AC4: 'Reply on RC4', shaohui zhou, 27 Jul 2023
Dear Reviewer4,
We are grateful for your constructive comments and suggestions. We have carefully revised this manuscript and provided the following point-to-point responses. Please see the attached document.
We look forward to continuing this exchange and addressing any further questions or remarks prompted by the interactive discussion. With thanks and best wishes,
Shaohui Zhou (on behalf of the co-authors)
Authors: Shaohui Zhou, Zexia Duan, Xingya Xi, and Yubin Li