Quality Control of Historical Temperature Data for Pure Rotational Raman Lidar Using Density-Based Clustering
Abstract. This paper is the first to apply two density-based clustering algorithms, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), to the screening of historical detection data from pure rotational Raman (PRR) temperature lidar. To address the choice of threshold radius in DBSCAN and the processing of output values in OPTICS, three automated processing methods suited to the characteristics of PRR temperature lidar data are proposed: the k-distance Fast Change Region (k-FCR) method, based on DBSCAN; the Reachability Distance (RD) method, based on OPTICS; and the Predecessor Divergence (PD) method, also based on OPTICS. These three methods were used for quality control of the historical data detected by a PRR temperature lidar from March 2021 to May 2024, demonstrating their effectiveness for automated quality control of historical data and the complementary nature of their results. Under the reliable threshold set in this paper, compared with the traditional Signal-to-Noise Ratio (SNR) method, the RD method increased the True Positive Rate (TPR) by 23.7 %, the PD method increased the True Negative Rate (TNR) by 6.0 %, and the k-FCR method increased the TPR by 72.1 % at the cost of some TNR loss. The influence of the SNR of data points and of the number of continuous observation profiles on the quality control results is also explored, providing further guidance for selecting and applying the different quality control methods. The methods presented here allow researchers to filter PRR lidar atmospheric temperature data according to their own needs, and they can also be applied to the automated processing of future atmospheric temperature data from detection networks.
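For readers unfamiliar with the two scores used to compare the methods, the following is a minimal sketch of how a TPR and a TNR can be computed from quality control flags. It assumes, purely for illustration, that a "positive" is a data point that a reference labelling marks as reliable and that the quality control method keeps. The function name `tpr_tnr`, the argument names, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def tpr_tnr(reference, kept):
    """Return (TPR, TNR) for two equally long sequences of booleans.

    reference[i] is True when point i is reliable in the reference labelling.
    kept[i] is True when the quality control method retained point i.
    Both conventions are assumptions made for this illustration.
    """
    if len(reference) != len(kept):
        raise ValueError("reference and kept must have the same length")

    pairs = list(zip(reference, kept))
    tp = sum(1 for r, k in pairs if r and k)          # reliable, retained
    fn = sum(1 for r, k in pairs if r and not k)      # reliable, rejected
    tn = sum(1 for r, k in pairs if not r and not k)  # unreliable, rejected
    fp = sum(1 for r, k in pairs if not r and k)      # unreliable, retained

    # A rate is undefined (NaN) when its class is absent from the reference.
    tpr = tp / (tp + fn) if (tp + fn) else math.nan
    tnr = tn / (tn + fp) if (tn + fp) else math.nan
    return tpr, tnr


# Made-up flags for six data points: four reliable, two unreliable.
reference = [True, True, True, True, False, False]
kept = [True, True, True, False, False, True]
print(tpr_tnr(reference, kept))  # (0.75, 0.5)
```

Under these conventions, a higher TPR means that more reliable points are retained and a higher TNR means that more unreliable points are rejected, which is the trade-off reported for the k-FCR method.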