Improving multi-modal wind speed prediction of short and medium term with a bi-clustered machine learning method

Zhang, Yan; Li, Lei; Xiong, Xiong; Yin, Xiang; Zhang, Xiaojun; Cui, Fuhai; Dang, Rui; Liu, Wei; Zhai, Liang; Wang, Pengzhao; Sun, Peng; Lu, Weixiao; Zhang, Wenjie

doi:10.5194/egusphere-2025-5370

Preprints

13 Apr 2026

Abstract. Accurate wind speed prediction is essential for the stable and reliable operation of wind farms. However, a single numerical model forecast cannot provide precise wind speeds because of deficiencies in its physical parameterization schemes, and its error grows with increasing lead time. We therefore propose a model named Bi-clustered Recursive Bayesian Forest (BCRBR) for wind speed prediction and correction. The approach accounts for sea-land breeze and weather stability effects and includes an atmospheric circulation index among the input features. Wind farm data undergo modal classification via bi-clustering to mitigate interactions between wind speed magnitudes, followed by machine learning-based correction of wind speed. The method proved effective for correcting wind speed forecasts: compared to forecasts from the Weather Research and Forecasting (WRF) model, wind speed error indicators were reduced by more than 60 %, and forecast precision increased from 30.2 % to 78.4 %, more than doubling. Compared to other models, the proposed model gave favorable correction results for different types of wind field, indicating greater versatility and competitiveness.
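To make the "classify into modes, then correct each mode" idea concrete, here is a minimal sketch. It is not the BCRBR method: fixed wind speed bins stand in for bi-clustering and a per-bin least-squares line stands in for the Bayesian-optimized forest. The bin edges, function names, and sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a * x + b for one regime."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all forecasts identical: fall back to a pure offset
        return 1.0, my - mx
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def regime(forecast, edges=(4.0, 8.0)):
    """Hypothetical regime label: index of the speed bin (m/s) of the forecast."""
    return sum(forecast >= e for e in edges)


def fit_corrector(forecasts, observations):
    """Group training pairs by regime and fit one correction line per group."""
    groups = defaultdict(lambda: ([], []))
    for f, o in zip(forecasts, observations):
        xs, ys = groups[regime(f)]
        xs.append(f)
        ys.append(o)
    return {r: fit_line(xs, ys) for r, (xs, ys) in groups.items()}


def correct(models, forecast):
    """Apply the regime-specific correction; pass unseen regimes through."""
    a, b = models.get(regime(forecast), (1.0, 0.0))
    return a * forecast + b


# Synthetic training pairs (made up): each regime has a different bias.
train_forecasts = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
train_observed = [1.5, 2.5, 3.5, 4.0, 4.8, 5.6, 7.0, 8.0, 9.0]
models = fit_corrector(train_forecasts, train_observed)
corrected = [correct(models, f) for f in (2.5, 6.5, 10.5)]
```

Fitting one corrector per regime keeps the bias pattern of strong winds from leaking into the correction of weak winds, which is the interaction the abstract says the modal classification is meant to mitigate.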

Received: 30 Oct 2025 – Discussion started: 13 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-5370', Anonymous Referee #1, 02 May 2026

1. Many wind speed forecasting methods are available in the literature. How does this method specifically differ from other work in terms of prediction performance? Did the authors compare it with existing work in the literature? Such a comparison should be included in the results section.
2. How are historical data from April to July 2023 alone sufficient to predict data for August 2023? Does this work consider any uncertainties?
3. Were any other data, such as moisture or air density, considered for wind speed prediction?
4. In Figure 6, which characterizes the results of the BCMMC model, the colors used to distinguish the individual clusters may not be easily distinguishable. It is recommended to use an alternative color scheme for better recognition.
5. Why did the authors not train on normalized data instead of the actual (unnormalized) data?
6. Although the paper summarizes the advantages and potential applications of the BCRBR model in the conclusion section, the discussion section lacks an in-depth exploration of the model's potential limitations and directions for future improvements. It is recommended that the authors further discuss the model's limitations in the discussion section, such as its dependence on specific wind field conditions and computational resource requirements, and propose possible directions for future research improvements.

Citation: https://doi.org/10.5194/egusphere-2025-5370-RC1
- AC1: 'Reply on RC1', Weixiao Lu, 13 May 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-5370/egusphere-2025-5370-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-5370-AC1
RC2:
'Comment on egusphere-2025-5370', Anonymous Referee #2, 15 May 2026
This article presents a hybrid method to forecast wind speed based on a statistical correction of WRF estimates through ML. A methodology based on bi-clustering and ML was trained on WRF simulations and experimental data gathered at an off-shore wind farm for short- and medium-term wind speed prediction. The results achieved by the presented methodology greatly improve the accuracy of WRF estimates and outperform classic ML methods such as Random Forest or Gradient Boosting. Although the results seem promising, this article should be reconsidered after a major revision. The article requires a clearer motivation of the methodology used, a deeper explanation of the measurement data, a clearer description of the training and validation data structure, and a more adequate comparison with state-of-the-art methods. Below I provide some comments that I hope can help the authors improve their work.
MAJOR COMMENTS
For the experimental data, more information should be provided. What is the instrumental set-up? Is only one measurement location used (90 m hub height of one turbine)? Results could differ considerably depending on where these measurements are taken, since turbine wakes have a great effect on the stochastic behavior of the wind. Also, the difference in resolution between WRF output and the actual experimental data can influence the predictions. Please comment on these points.

Motivation on the use of BCMMC, RFE, BO, and RF is missing. Why not other ones? Also, a quantitative comparison with the state of the art for the results achieved is missing.

Same for lines 88-110 and 111-133: More concise motivation of the paper is required. Please specify more clearly what are the limitations of the existing methods. As you suggest, features such as SLB should be incorporated in the prediction. What is the magnitude of errors due to not incorporating SLB for instance?

In Sect. 3.4, was hyper-parameter tuning carried out for the different models compared? If not, the analysis would not be valid. Please provide the hyper-parameter tuning results as well. Do they correspond to the accuracy observed in other state-of-the-art studies that use the same method?

The input features and labels to predict are hard to follow as they are explained in different sections of the manuscript. I suggest expanding Table 1 with the prediction labels, time and spatial resolutions, the data source, and the forecast time (i.e., 1-hour ahead forecasting).

In Sect. 3.4, the manuscript comments on the performance of classic methods such as XGB for time-series forecasting. However, the typical data structure used in time-series forecasting with XGB follows an auto-regressive approach. That is, past measurements are used to predict future ones. I suggest trying this approach or deleting these lines.
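As an illustration of the auto-regressive data structure referred to in this comment, the sketch below turns a wind speed series into lagged feature rows and future targets. The function name, lag count, horizon, and sample values are hypothetical and not taken from the manuscript. The resulting rows could be passed to any tabular regressor.

```python
def make_lagged_samples(series, n_lags=3, horizon=1):
    """Turn a 1-D series into (features, target) pairs for autoregressive learning.

    Each feature row holds the n_lags most recent values (oldest first);
    the target is the value `horizon` steps after the last feature.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    rows, targets = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        rows.append(list(series[t - n_lags:t]))
        targets.append(series[t + horizon - 1])
    return rows, targets


# Hourly wind speeds in m/s (made-up values for illustration only).
speeds = [5.0, 5.5, 6.1, 6.0, 5.4, 4.9]
X, y = make_lagged_samples(speeds, n_lags=3, horizon=1)
```

With this framing, every target is strictly later than all of its features, so a model trained on the rows only ever sees past measurements when predicting a future one.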

Regarding the robustness experiment, please comment on how the trained model behaves with respect to the SLB, since there is none in a mountain zone.

Please modify the discussion and conclusions sections according to the comments given.

MINOR COMMENTS
Lines 54-72: Please discuss WRF errors quantitatively. That is, provide statistical figures for the “significant errors” mentioned in line 67.

Line 45: Please add a reference.

Line 46: “Complex terrain of the ocean” may be misleading. Typically, complex terrain is associated with mountain/valley areas.

Please indicate in the introduction the amount of data that will be used, the length of the campaign, etc. Also, what are the coordinates of the wind farms, their names, etc.

Lines 223-226: Please formulate or expand on the wind speed evaluation index, comparison experiment, robustness experiment, etc.

3 would benefit from the off-shore wind farm coordinates.

Sect 2.1. Data. Please indicate from which up to which day data is available, which measurement instruments are used, the source of ACI, etc.

2.2.3. Lines 244-245: Motivation on the usage of BCMMC model is insufficient. You could provide some reference to motivate the good performance of it under similar scenarios.

Line 367: There is a typo: “The formula is as follows” two times.
Citation: https://doi.org/10.5194/egusphere-2025-5370-RC2
- AC2: 'Reply on RC2', Weixiao Lu, 10 Jun 2026
  
  Dear Editor and Reviewer,
  Thank you very much for your thoughtful feedback on our manuscript (ID: egusphere-2025-5370). We have addressed all the comments and provided detailed, point-by-point responses in the attached PDF document. Please find our revisions and responses therein. We greatly appreciate your guidance and patience.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5370-AC2

Viewed

Total article views: 379 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
241	112	26	379	18	19

Views and downloads (calculated since 13 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	136	51	13	200
May 2026	81	50	7	138
Jun 2026	15	9	6	30
Jul 2026	9	2	0	11

Viewed (geographical distribution)

Total article views: 370 (including HTML, PDF, and XML) Thereof 370 with geography defined and 0 with unknown origin.

Latest update: 16 Jul 2026

Short summary

1. Developed a Bi-clustered Recursive Bayesian Forest model that improved wind speed forecast accuracy by 48.2 percentage points and reduced error metrics by more than 60 %.
2. Incorporated sea-land breeze, weather stability, and an atmospheric circulation index as features, using bi-clustering modal classification to mitigate wind speed magnitude interactions.
3. Proposed a machine learning-based correction technique that outperforms traditional numerical models for more reliable wind speed forecasting.
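Read together with the two precision values in the abstract, the 48.2 figure is consistent with an absolute difference rather than a relative gain. A short check of the arithmetic, assuming both values refer to the same precision metric:

```latex
\[
78.4\,\% - 30.2\,\% = 48.2 \text{ percentage points}, \qquad
\frac{78.4}{30.2} \approx 2.60 .
\]
```

The ratio of about 2.6 is what the abstract describes as an improvement of more than twice.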

