the Creative Commons Attribution 4.0 License.
Aerosol type classification with machine learning techniques applied to multiwavelength lidar data from EARLINET
Abstract. Aerosol typing is essential for understanding atmospheric composition and its impact on the climate. Lidar-based aerosol typing has often been addressed with manual classification using optical property ranges. However, few works have addressed it using automated classification with machine learning (ML), mainly due to the lack of annotated datasets. In this study, a high-vertical-resolution dataset is generated and annotated for the University of Granada (UGR) station in Southeastern Spain, which belongs to the European Aerosol Research Lidar Network (EARLINET), identifying five major aerosol types: Continental Polluted, Dust, Mixed, Smoke and Unknown. Six ML models – Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM and Neural Network – were applied to classify aerosol types using multiwavelength lidar data from EARLINET, for two system configurations: with and without depolarization data. LightGBM achieved the best performance, with precision, recall, and F1-Score above 90 % (with depolarization) and close to 87 % (without depolarization). The performance for each aerosol type was evaluated, and dust classification improved by ~30 % with depolarization, highlighting its critical role in distinguishing aerosol types. Validation against an independent dataset from a Saharan dust event confirmed robust classification under real and extreme conditions. Compared to NATALI, a neural network-based EARLINET algorithm, the approach presented in this work shows improved aerosol classification accuracy, which emphasizes the benefits of using high-resolution multiwavelength lidar data from real measurements. This highlights the potential of ML-based methods for robust and accurate aerosol typing, establishing a benchmark for future studies using high-resolution multiwavelength lidar data from EARLINET.
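As an illustration of the metrics quoted in the abstract, the following minimal sketch computes per-class precision, recall and F1-score for the five aerosol types and then averages them. It assumes macro averaging, which the abstract does not specify; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

# The five aerosol types named in the abstract.
AEROSOL_TYPES = ["Continental Polluted", "Dust", "Mixed", "Smoke", "Unknown"]


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes (an assumption)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels, made up for illustration only (not data from the study).
y_true = ["Dust", "Dust", "Smoke", "Mixed", "Unknown", "Continental Polluted"]
y_pred = ["Dust", "Mixed", "Smoke", "Mixed", "Unknown", "Continental Polluted"]

scores = per_class_scores(y_true, y_pred, AEROSOL_TYPES)
macro = macro_average(scores)
```

Reporting the per-class tuples alongside the average is what makes a statement such as the ~30 % gain for dust visible, since an average alone can hide a weak class.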
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2025-269', Anonymous Referee #1, 11 Mar 2025
  - AC1: 'Reply on RC1', Ana del Águila, 28 May 2025
 
- RC2: 'Comment on egusphere-2025-269', Anonymous Referee #2, 18 Apr 2025

  This paper presents a very innovative and relevant study, showing the possibility of using ML techniques to predict the type of aerosols. The work is very well written and structured. However, the following points need to be better detailed.

  - Line 135: Why was the median used to fill in the gaps? Did you try to use other techniques? Perhaps the use of machine learning techniques could generate a more robust filling.
  - Table 1: Why were these groups of hyperparameters exclusively selected? Was any analysis of the importance of hyperparameters performed? This can severely affect the final performance of the models, especially for neural networks.
  - Figure 4: I recommend increasing the font size of the axes.
  - Figure 4: How did you deal with the problem of the imbalance of the dataset? Because "continental polluted" tends to have worse performance due to the smaller number of data.
  - Line 285: I expected better results from the NN model with depolarization data, since you have more information about the particle analyzed. Isn't this difference associated with the data input format in the model? Was any preprocessing performed to normalize them?
  - Figure 5: Considering the use by other users, I think it is important to comment on the computational cost of each model.
  - Section 3.2.3: Was an analysis of multicollinearity between the features performed? This can affect the importance of each one in the model, as well as the performance of the final model.
  - Line 323: Because of this statement, I expected that depolarization would present better results in the MLP Classifier.
  - Line 363: I recommend reviewing the imbalanced dataset issue because if this is not corrected, the cases that are less present in the training tend to perform worse.

  Citation: https://doi.org/10.5194/egusphere-2025-269-RC2

  - AC2: 'Reply on RC2', Ana del Águila, 28 May 2025
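Two of the referee's points, median gap-filling and class imbalance, can be made concrete with a short sketch. It shows one common way to do each: filling missing values in a feature column with the median of the observed values, and deriving inverse-frequency class weights (the same heuristic scikit-learn applies for `class_weight="balanced"`). This page does not describe how the authors actually implemented either step, so the function names, the sample values and the class counts below are all hypothetical.

```python
from collections import Counter
from statistics import median


def fill_gaps_with_median(column):
    """Replace None entries in one feature column with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to take a median from")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical depolarization-like values with two gaps (not data from the study).
depol = [0.25, None, 0.05, 0.30, None, 0.10]
filled = fill_gaps_with_median(depol)

# Hypothetical, deliberately imbalanced label set.
labels = ["Dust"] * 4 + ["Smoke"] * 3 + ["Continental Polluted"]
weights = balanced_class_weights(labels)
```

With weights of this kind, a rare class such as "Continental Polluted" contributes more per sample to the training loss, which is one standard response to the concern raised for Figure 4 and Line 363.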
 
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 859 | 160 | 31 | 1,050 | 40 | 62 |
Pablo Ortiz-Amezcua
Siham Tabik
Juan Antonio Bravo-Aranda
Sol Fernández-Carvelo
Lucas Alados-Arboledas
See the attached file with the comments.