Application of Multivariate Selective Bandwidth Kernel Density Estimation for Data Correction

Bui, Hai; Bakhoday-Paskyabi, Mostafa

doi:10.48550/arXiv.2306.16043

Preprints

28 Oct 2025

Abstract. This paper presents an intuitive application of multivariate kernel density estimation (KDE) for data correction. The method utilizes the expected value of the conditional probability density function (PDF) and a credible interval to quantify correction uncertainty. A selective KDE factor is proposed to adjust both kernel size and shape, determined through least-squares cross-validation (LSCV) or mean conditional squared error (MCSE) criteria. The selective bandwidth method can be used in combination with the adaptive method to potentially improve accuracy. Two examples, involving a hypothetical dataset and a realistic dataset, demonstrate the efficacy of the method. The selective bandwidth methods consistently outperform non-selective methods, while the adaptive bandwidth methods improve results for the hypothetical dataset but not for the realistic dataset. The MCSE criterion minimizes root mean square error but may yield under-smoothed distributions, whereas the LSCV criterion strikes a balance between PDF fitness and low RMSE.
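The correction step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kernel_weights` and `correct`, the Gaussian product kernel, and the fixed bandwidths `h_x` and `h_y` are assumptions made for illustration. In the paper the bandwidths would instead come from the LSCV or MCSE criteria, and the per-predictor `h_x` only hints at how kernel shape can vary between dimensions. The sketch returns the expected value of the conditional PDF together with a credible interval read off the conditional CDF.

```python
import numpy as np


def kernel_weights(x_train, x_query, h_x):
    """Normalised Gaussian kernel weights of training points for one query.

    x_train: (n, d) predictors, x_query: (d,), h_x: (d,) bandwidths
    (hypothetical fixed values, one per predictor dimension).
    """
    z = (np.asarray(x_train) - np.asarray(x_query)) / np.asarray(h_x)
    w = np.exp(-0.5 * np.sum(z**2, axis=1))
    return w / w.sum()  # assumes the query is not far outside the data


def correct(x_train, y_train, x_query, h_x, h_y, level=0.9, n_grid=512):
    """Conditional mean and credible interval of y given x = x_query."""
    y_train = np.asarray(y_train)
    w = kernel_weights(x_train, x_query, h_x)

    # For a symmetric kernel in y, the mean of the conditional PDF is the
    # kernel-weighted average of the training targets.
    mean = np.sum(w * y_train)

    # Conditional PDF on a grid: weighted mixture of Gaussians in y.
    y_grid = np.linspace(y_train.min() - 4 * h_y, y_train.max() + 4 * h_y, n_grid)
    u = (y_grid[:, None] - y_train[None, :]) / h_y
    pdf = (np.exp(-0.5 * u**2) / (h_y * np.sqrt(2 * np.pi))) @ w

    # Equal-tailed credible interval from the numerically integrated CDF.
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    lo, hi = np.interp([(1 - level) / 2, (1 + level) / 2], cdf, y_grid)
    return mean, lo, hi


# Synthetic demonstration (made-up data, not from the paper): y = 2x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(2000, 1))
y = 2.0 * x[:, 0] + rng.normal(0.0, 0.1, size=2000)
mean, lo, hi = correct(x, y, x_query=[0.5], h_x=[0.05], h_y=0.05)
print(f"corrected value: {mean:.3f}, 90% interval: [{lo:.3f}, {hi:.3f}]")
```

Larger bandwidths widen the interval and smooth the conditional PDF, while smaller ones track the data more closely, which is the trade-off the abstract attributes to the choice between the LSCV and MCSE criteria.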

Received: 15 Sep 2025 – Discussion started: 28 Oct 2025

Status: final response (author comments only)

RC1: 'Comment on egusphere-2025-4312', Anonymous Referee #1, 19 Nov 2025

The manuscript provides a clear and useful application of multivariate KDE for data correction. The selective bandwidth strategy and the comparison of LSCV and MCSE criteria are well explained, and the examples effectively demonstrate the method’s advantages. Although the manuscript addresses wind-speed–related applications and may attract interest from some atmospheric researchers, I believe it is not an ideal fit for AMT. The overall style and the core contributions align more closely with work in mathematical or theoretical modeling rather than atmospheric measurement technique development.

Specific comments are as follows:

Figure 3 is difficult to interpret. Please provide additional explanation in the main text or immediately after the caption to help readers better understand the figure. The same issue also applies to Figure 2.

In Tables 2 and 3, the meaning of the boldface numbers is unclear. Please clarify this in the caption or text.

In Section 4.2, the wind-speed measurement experiment appears limited. I recommend adding more experimental evidence to more convincingly demonstrate the effectiveness of the proposed method.

The legend in Figure 1 is missing and should be included.

Citation: https://doi.org/10.5194/egusphere-2025-4312-RC1
RC2: 'Comment on egusphere-2025-4312', Anonymous Referee #2, 10 Dec 2025
Overall Assessment
The manuscript presents a methodological development in multivariate kernel density estimation (KDE), introducing a “selective bandwidth” approach to adjust kernel shape and size for data correction. While the method is technically sound and the study includes an application to wind speed data from a meteorological mast, I find that the core contribution is primarily methodological and theoretical. The work is more aligned with the fields of statistical modeling, kernel density estimation, or general data science, rather than with the specific focus of Atmospheric Measurement Techniques on novel measurement techniques, instrumentation, observational methods, or their direct validation. Therefore, I cannot recommend publication in this journal.

Major Concerns:
Misalignment with Journal Scope. AMT emphasizes the development, evaluation, and application of measurement techniques and instruments in the atmospheric and related sciences. The core of this paper is a general statistical method (multivariate KDE with bandwidth selection) that is applied to a meteorological dataset, but does not introduce, improve, or critically evaluate a measurement technique itself. The “data correction” presented is a purely statistical post-processing step. The paper does not address instrumental limitations, propose a new sensing principle, validate a new instrument, or enhance the fundamental process of atmospheric measurement.

Atmospheric Case Study is Illustrative, Not Central. The wind speed correction example serves as a demonstration of the method, but the methodological development is not driven by or uniquely tailored to the specific challenges of atmospheric measurement (e.g., complex error structures of instruments, spatial representativeness, temporal autocorrelation). The analysis could be replaced with data from any other field (e.g., hydrology, oceanography, engineering) without altering the methodological core. This indicates that the atmospheric context is incidental rather than integral to the study's advancement.

Lack of Measurement-Centric Innovation or Insight. The paper does not provide new insights into the behavior of the cup anemometer or LiDAR under shading conditions, nor does it propose a method to physically mitigate or characterize the mast shadow effect. The correction is entirely data-driven and statistical. For this journal, a more suitable study might involve using the KDE-corrected data to validate or improve a physical model of flow distortion, or to develop a new instrumental setup to avoid shading. The current contribution remains in the realm of data processing.

Suggestions for the Authors
The work has merit and could be suitable for publication in a journal focused on applied statistics, geoscientific data analysis, wind energy resource assessment, or environmental modelling. I encourage the authors to consider submitting to a journal with a broader emphasis on statistical methods for geophysical applications. In such venues, the methodological contribution would be more directly valued and reach a more appropriate audience.

Conclusion
While the developed KDE method is interesting and well-presented, its primary contribution is statistical and methodological rather than pertaining to atmospheric measurement technology. Therefore, it does not meet the specific scope and aims of Atmospheric Measurement Techniques.

Decision: Reject (out of scope)
Citation: https://doi.org/10.5194/egusphere-2025-4312-RC2

Short summary

We developed a new way to improve the correction of data by using a probability-based method called kernel density estimation. Our approach adapts the size and shape of the core probability functions, leading to more accurate corrections while also allowing us to measure the uncertainty of the results. Tests on both simple and realistic datasets show that our method outperforms existing ones, offering a practical tool for more reliable data analysis.