the Creative Commons Attribution 4.0 License.
Improving multi-modal wind speed prediction of short and medium term with a bi-clustered machine learning method
Abstract. Accurate wind speed prediction is of great importance for the stable and reliable operation of wind farms. However, a single numerical model forecast cannot provide precise wind speed outputs because of deficiencies in its physical parameterization scheme, and its error grows gradually with increasing prediction time. Therefore, we proposed a model named Bi-clustered Recursive Bayesian Forest (BCRBR) for wind speed prediction and correction. The approach incorporated sea-land breeze and weather stability effects and integrated an atmospheric circulation index as input features; wind farm data underwent modal classification via bi-clustering to mitigate wind speed magnitude interactions, followed by machine learning-based correction of wind speed. The method proved effective for correcting wind speed predictions. Compared to forecasts from the Weather Research and Forecasting model, wind speed error indicators were reduced by more than 60 %, and the forecast precision increased from 30.2 % to 78.4 %, a more than twofold improvement. Compared to other models, the proposed model presented favorable correction results in different types of wind field, indicating greater versatility and competitiveness.
Status: open (until 18 May 2026)
RC1: 'Comment on egusphere-2025-5370', Anonymous Referee #1, 02 May 2026
AC1: 'Reply on RC1', Weixiao Lu, 13 May 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-5370/egusphere-2025-5370-AC1-supplement.pdf
RC2: 'Comment on egusphere-2025-5370', Anonymous Referee #2, 15 May 2026
This article presents a hybrid method for forecasting wind speed based on a statistical correction of WRF estimates through ML. A methodology based on bi-clustering and ML was trained on WRF simulations and experimental data gathered at an off-shore wind farm for short- and medium-term wind speed prediction. The results achieved by the presented methodology substantially improve the accuracy of WRF estimates and surpass classic ML methods such as Random Forest or Gradient Boosting. Although the results seem promising, this article should be reconsidered after a major revision. The article requires a clearer motivation of the methodology used, a deeper explanation of the measurement data, a clearer description of the training and validation data structure, and a more adequate comparison with state-of-the-art methods. Below I provide some comments that I hope can help the authors improve their work.
MAJOR COMMENTS
- For the experimental data, more information should be provided. What is the instrumental set-up? Is only one measurement location used (90 m hub height of one turbine)? Results could differ considerably depending on where these measurements are taken, since turbine wakes have a great effect on the stochastic behavior of the wind. Also, the difference in resolution between the WRF outputs and the actual experimental data can influence the predictions. Please comment on these points.
- Motivation for the use of BCMMC, RFE, BO, and RF is missing. Why not other methods? Also, a quantitative comparison of the achieved results with the state of the art is missing.
- Same for lines 88-110 and 111-133: a more concise motivation of the paper is required. Please specify more clearly what the limitations of the existing methods are. As you suggest, features such as SLB should be incorporated in the prediction. What is the magnitude of the errors caused by not incorporating SLB, for instance?
- In Sect. 3.4, was hyper-parameter tuning carried out for the different models compared? If not, the analysis would not be valid. Please provide the hyper-parameter tuning results as well. Are they consistent with the accuracy reported in other state-of-the-art studies that use the same methods?
- The input features and labels to predict are hard to follow as they are explained in different sections of the manuscript. I suggest expanding Table 1 with the prediction labels, time and spatial resolutions, the data source, and the forecast time (i.e., 1-hour ahead forecasting).
- In Sect. 3.4, the manuscript comments on the performance of classic methods such as XGB for time-series forecasting. However, the typical data structure used for time-series forecasting with XGB follows an auto-regressive approach. That is, past measurements are used to predict future ones. I suggest trying this approach or deleting these lines.
- Regarding the robustness experiment, please comment on how the trained model behaves with respect to the SLB, since there is none in a mountain zone.
- Please modify the discussion and conclusions sections according to the comments given.
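For reference, the auto-regressive data structure mentioned in the comment on Sect. 3.4 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the manuscript's method: the function name `make_lagged_samples`, the number of lags, the forecast horizon, and the wind speed values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    series: Sequence[float], n_lags: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for auto-regressive forecasting.

    Each row of X holds the n_lags most recent observations; the matching
    entry of y is the value `horizon` steps after the last of those lags.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the last observation known when the forecast is issued
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(list(series[t - n_lags + 1 : t + 1]))
        y.append(series[t + horizon])
    return X, y


# Made-up hourly wind speeds in m/s, used only to show the shape of the output
wind = [5.1, 5.4, 6.0, 6.3, 5.9, 5.2]
X, y = make_lagged_samples(wind, n_lags=3, horizon=1)
for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting `X` and `y` could then be passed to any regressor, such as a gradient-boosting model, provided the train/test split stays chronological so that no future observation leaks into training.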
MINOR COMMENTS
- Lines 54-72: Please discuss WRF errors quantitatively. That is, provide statistical figures for the “significant errors” mentioned in line 67.
- Line 45: Please add a reference.
- Line 46: “Complex terrain of the ocean” may be misleading. Typically, complex terrain is associated with mountain/valley areas.
- Please indicate in the introduction the amount of data that will be used, the length of the campaign, etc. Also, what are the coordinates of the wind farms, their names, etc.
- Lines 223-226: Please formulate or expand on the wind speed evaluation index, comparison experiment, robustness experiment, etc.
- 3 would benefit from the off-shore wind farm coordinates.
- Sect 2.1. Data. Please indicate from which up to which day data is available, which measurement instruments are used, the source of ACI, etc.
- Sect. 2.2.3, lines 244-245: The motivation for using the BCMMC model is insufficient. You could provide references that demonstrate its good performance in similar scenarios.
- Line 367: There is a typo: “The formula is as follows” appears twice.
Citation: https://doi.org/10.5194/egusphere-2025-5370-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 168 | 58 | 13 | 239 | 11 | 14 |
1. Many wind speed forecasting methods are available in the literature. How does this method specifically differ from other works in terms of prediction performance? Did the authors compare it with existing works in the literature? This should be included in the results section.
2. How is the historical data from April to July 2023 alone sufficient to predict data for August 2023? Does this work consider any uncertainties?
3. Was any other data, such as moisture or air density, considered for wind speed prediction?
4. In Figure 6, which characterizes the results of the BCMMC model, the colors used to distinguish the individual clusters may not be easily distinguishable. It is recommended to use an alternative color scheme for better recognition.
5. Why don't the authors try normalized data for training instead of actual data?
6. Although the paper summarizes the advantages and potential applications of the BCRBR model in the conclusion section, the discussion section lacks an in-depth exploration of the model's potential limitations and directions for future improvements. It is recommended that the authors further discuss the model's limitations in the discussion section, such as its dependence on specific wind field conditions and computational resource requirements, and propose possible directions for future research improvements.