https://doi.org/10.48550/arXiv.2306.16043
28 Oct 2025
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Application of Multivariate Selective Bandwidth Kernel Density Estimation for Data Correction

Hai Bui and Mostafa Bakhoday-Paskyabi

Abstract. This paper presents an intuitive application of multivariate kernel density estimation (KDE) for data correction. The method utilizes the expected value of the conditional probability density function (PDF) and a credible interval to quantify correction uncertainty. A selective KDE factor is proposed to adjust both kernel size and shape, determined through least-squares cross-validation (LSCV) or mean conditional squared error (MCSE) criteria. The selective bandwidth method can be used in combination with the adaptive method to potentially improve accuracy. Two examples, involving a hypothetical dataset and a realistic dataset, demonstrate the efficacy of the method. The selective bandwidth methods consistently outperform non-selective methods, while the adaptive bandwidth methods improve results for the hypothetical dataset but not for the realistic dataset. The MCSE criterion minimizes root mean square error but may yield under-smoothed distributions, whereas the LSCV criterion strikes a balance between PDF fitness and low RMSE.
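The core idea in the abstract, correcting a value by taking the expected value of a KDE-based conditional PDF and reporting a credible interval around it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function name `conditional_kde_correct`, the Gaussian product kernel with fixed, hand-picked bandwidths `hx` and `hy`, the equal-tailed interval, and the synthetic data. The selective and adaptive bandwidth choices (LSCV or MCSE) described in the paper are not reproduced here.

```python
import numpy as np


def conditional_kde_correct(x_train, y_train, x_new, hx, hy, level=0.9, n_grid=512):
    """Estimate E[y | x_new] and an equal-tailed interval from a Gaussian KDE.

    Hypothetical helper: fixed bandwidths hx (predictors) and hy (target)
    stand in for the bandwidth selection discussed in the paper.
    """
    y_train = np.asarray(y_train, dtype=float)
    x_train = np.asarray(x_train, dtype=float).reshape(len(y_train), -1)
    x_new = np.asarray(x_new, dtype=float).reshape(-1)

    # Kernel weight of each training point at x_new (product Gaussian kernel).
    z = (x_train - x_new) / hx
    log_w = -0.5 * np.sum(z**2, axis=1)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()

    # The conditional PDF is a weighted mixture of Gaussians centred on y_i,
    # so its expected value is the weighted mean of the training targets.
    mean = float(np.sum(w * y_train))

    # Evaluate the conditional PDF on a grid and integrate it numerically.
    grid = np.linspace(y_train.min() - 4 * hy, y_train.max() + 4 * hy, n_grid)
    kernels = np.exp(-0.5 * ((grid[None, :] - y_train[:, None]) / hy) ** 2)
    pdf = np.sum(w[:, None] * kernels, axis=0)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]

    # Equal-tailed interval (an assumed interval type) at the requested level.
    tail = 0.5 * (1.0 - level)
    lower, upper = np.interp([tail, 1.0 - tail], cdf, grid)
    return mean, float(lower), float(upper)


# Synthetic demonstration: the "true" value is 2 * x plus Gaussian noise.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=2000)
y_ref = 2.0 * x_obs + rng.normal(0.0, 0.1, size=2000)

mean, lower, upper = conditional_kde_correct(x_obs, y_ref, x_new=0.5, hx=0.05, hy=0.05)
print(f"corrected value: {mean:.3f}, 90% interval: [{lower:.3f}, {upper:.3f}]")
```

With a diagonal bandwidth, the conditional mean reduces to a kernel-weighted average of the reference values, so the bandwidths control how strongly distant observations influence the correction and how wide the reported interval becomes.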


Status: open (until 03 Dec 2025)

Latest update: 28 Oct 2025
Short summary
We developed a new way to improve the correction of data by using a probability-based method called kernel density estimation. Our approach adapts the size and shape of the core probability functions, leading to more accurate corrections while also allowing us to measure the uncertainty of the results. Tests on both simple and realistic datasets show that our method outperforms existing ones, offering a practical tool for more reliable data analysis.