the Creative Commons Attribution 4.0 License.
Quality Control of Historical Temperature Data for Pure Rotational Raman Lidar Using Density-Based Clustering
Abstract. This paper is the first to use two density-based clustering algorithms, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), to screen the historical detection data of pure rotational Raman (PRR) temperature measurement lidar. To address the issues of threshold radius in DBSCAN and output value processing in OPTICS, three automated processing methods suitable for PRR temperature lidar detection data characteristics are proposed. These methods are the k-distance Fast Change Region (k-FCR) Method based on the DBSCAN, the Reachability Distance (RD) Method based on the OPTICS, and the Predecessor Divergence (PD) Method based on the OPTICS. Using these three methods, quality control was conducted on the historical data detected by a PRR temperature lidar from March 2021 to May 2024, demonstrating the effectiveness of these methods in automated quality control of historical data and the complementary nature of their quality control effects. Under the reliable threshold set in this paper, compared with the traditional Signal-to-Noise Ratio (SNR) method, the RD method increased the True Positive Rate (TPR) by 23.7 %, the PD method increased the True Negative Rate (TNR) by 6.0 %, and the k-FCR method increased the TPR by 72.1 % at the cost of some TNR loss. The influence of the SNR of data points and the number of continuous observation profiles on the quality control results is also explored, providing further references for the selection and application of different quality control methods. The methods provided in this paper will allow relevant researchers to filter PRR lidar data of atmospheric temperature according to their own needs, and these methods can also be applied to the automated processing of future atmospheric temperature data from detection networks.
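The True Positive Rate and True Negative Rate quoted in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not confirmed by the abstract: a data point counts as "reliable" when its absolute deviation from the ERA5 temperature is within a threshold (the referee comments mention 5 K and 10 K), a "positive" is a reliable point that the quality control keeps, and a "negative" is an unreliable point that it rejects. The function name `confusion_rates` and all sample values are hypothetical.

```python
def confusion_rates(lidar_k, era5_k, kept, threshold_k=5.0):
    """Return (TPR, TNR) for a quality-control mask.

    Assumed definitions (not taken from the paper):
      reliable   : |lidar - ERA5| <= threshold_k
      TPR        : reliable points kept / all reliable points
      TNR        : unreliable points rejected / all unreliable points
    """
    tp = fn = tn = fp = 0
    for t_lidar, t_ref, is_kept in zip(lidar_k, era5_k, kept):
        reliable = abs(t_lidar - t_ref) <= threshold_k
        if reliable and is_kept:
            tp += 1      # reliable point correctly kept
        elif reliable:
            fn += 1      # reliable point wrongly rejected
        elif is_kept:
            fp += 1      # unreliable point wrongly kept
        else:
            tn += 1      # unreliable point correctly rejected
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Made-up temperatures in kelvin, for illustration only.
lidar = [250.0, 252.0, 243.0, 270.0, 255.0, 230.0]
era5 = [251.0, 250.0, 251.0, 252.0, 254.0, 253.0]
kept = [True, False, False, True, True, False]  # output of some QC method

for threshold in (5.0, 10.0):
    tpr, tnr = confusion_rates(lidar, era5, kept, threshold_k=threshold)
    print(f"threshold {threshold:>4} K: TPR={tpr:.3f}, TNR={tnr:.3f}")
```

Changing the threshold reclassifies borderline points (here the third sample, which deviates by 8 K), so the same quality-control mask can score differently under the 5 K and 10 K definitions.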
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2650', Anonymous Referee #1, 13 Jan 2025
The paper provides no physical foundation for the applicability of the considered algorithms or for the physical consistency of the achieved results. In its present form, it fails to state the specific scientific questions that the proposed algorithms are meant to address (why should a quality-control check based on a black-box algorithm be preferable to traditional Cal-Val efforts based on comparison with independent measurements?), and it gives no comprehensive physical motivation for applying these algorithms to the data set considered. I cannot believe that the only motivation for applying these approaches to long-term series of temperature profile measurements is that “… Atmospheric temperature, similar to wind fields, also exhibits temporal and spatial continuity, making density-based clustering methods potential for screening PRR lidar temperature detection data”, which is the only motivation the authors put forward. The authors state that: “Density-based clustering classifies data based solely on its features without the aid of external data sources, and it is a form of unsupervised learning.” This is a very strong statement that, in my opinion, can be endorsed only if substantial physical evidence is provided, which I do not see in the paper. The paper is primarily dedicated to illustrating and applying two density-based clustering methods for quality control of temperature lidar data, with the only argument being that this approach has been used in the literature for wind lidar data. Most of the paper is dedicated to illustrating the algorithms. To validate the algorithms and assess the quality control effects of the different methods, the authors set the threshold for reliable data at deviations from ERA5 of less than or equal to 5 K and less than or equal to 10 K.
The authors correctly identify that “… detailed and high-resolution temperature structure observations are urgently needed for studying atmospheric energy balance, dynamics, and chemistry …, The troposphere … requires precise temperature detection for studying the atmospheric transport of pollutants … and for short to medium term weather forecasting”. However, the chosen thresholds are far too loose to validate temperature measurements for these scientific objectives. The paper should undergo substantial modifications along the lines specified above, with substantial additions to the text that carefully illustrate the physical motivations behind the application of the present algorithms and substantiate the assumptions made in the validation of the results. I will be happy to reconsider the paper after these fundamental additions.
Citation: https://doi.org/10.5194/egusphere-2024-2650-RC1
RC2: 'Comment on egusphere-2024-2650', Anonymous Referee #2, 09 Feb 2025
This manuscript by Cao et al. discusses the use of algorithms for the quality control of temperature measurements with the rotational Raman lidar at the Beijing Institute of Technology, China. ECMWF Reanalysis v5 (ERA5) profiles are used as the reference data set. The temperature profiles of the lidar are compared with ERA5 temperature profiles using density-based clustering algorithms.
The lidar at Beijing Institute of Technology allows for temperature measurements at night but not during daytime (in contrast to some other rotational Raman lidar systems). Therefore, the comparisons which are presented here are limited to nighttime cases only.
Unfortunately, the manuscript is not well written. The language is partly difficult to understand. Furthermore, essential information is missing. However, my main concern is that the lidar and ERA5 data are not independent. Therefore, the ERA5 data simply cannot be used for quality control of the lidar data.
A rotational Raman temperature lidar needs to be calibrated to obtain atmospheric temperature profiles from the atmospheric backscatter signals. For this, data from radiosondes launched near the lidar site are usually used. This essential information regarding the calibration procedure is missing in this manuscript. Assuming that the lidar calibration has been made in the usual way, the radiosonde profiles cannot be considered independent and thus cannot be used as an independent reference for the quality control of the lidar measurements. The same is true for the ERA5 data, which are also based on the same radiosonde data.
Furthermore, the horizontal grid resolution of ERA5 is only 31 km, which is difficult to compare with the local lidar profiles: the lidar observations are influenced by local effects – especially in the complex urban terrain of this lidar site in Beijing.
My recommendation is thus to reject the manuscript in its present form.
Minor points:
References to original publications of the rotational Raman lidar technique for atmospheric temperature measurements including the determination of the uncertainties of the lidar data are missing. The references given for the equations in section 2.2.3 and Appendix A are not the original ones.
References to other state-of-the-art rotational Raman lidar systems could be added.
Figure 9a: There is a discontinuity near the center of the plot. I assume this is because the time series is not continuous. It would be important to mark the different periods in the data set.
Citation: https://doi.org/10.5194/egusphere-2024-2650-RC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
118 | 35 | 26 | 179 | 34 | 9 | 8 |