the Creative Commons Attribution 4.0 License.
Machine learning-driven characterization and prescription of aerosol optical properties for atmospheric models
Abstract. Accurate modeling of aerosol optical properties is critical to simulating aerosol radiative effects. However, uncertainties in the simulation of aerosol intensive optical properties are still significant. Therefore, the use of observations to constrain aerosol optical properties in models has been indicated as an option. Also, explicit computations of optical properties are still too costly for operational models, which makes observation-based prescriptions a convenient solution. We developed an observation-based prescription of aerosol optical properties, driven by machine-learning techniques, that can be applied in models. The Iberian Peninsula (IP) was taken as the reference domain, and the aerosol products from the AERONET sites across the IP as the main dataset. First, clustering was applied to define the typical aerosol optical regimes affecting the IP atmosphere. Five typical regimes were identified. Two of them were dominated by the coarse mode and were associated with Saharan dust. One was found to be close to pure dust, while the other indicated a mixed scenario of dust and pollution. Two of the non-dust regimes, one strongly and one moderately absorbing, were found to be associated with smoke. The remaining non-dust regime, with no clear association, occurs mostly in the eastern portion of the IP. Afterward, using aerosol-type columnar mass density from MERRA-2, a model was trained as a predictor of the optical regimes using the Random Forest method. The model was tested under distinct aerosol scenarios. Prediction accuracy ranged from 60 to 75 %, depending on the regime, with an average accuracy of 70 %.
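The two-step workflow described in the abstract (unsupervised clustering of optical properties into regimes, then a supervised model that predicts the regime from mass densities) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the array names `optical` and `mass` are hypothetical stand-ins for AERONET intensive properties and MERRA-2 column mass densities, and K-means with scikit-learn defaults is assumed only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 600

# Synthetic stand-in for observed intensive optical properties (4 features).
optical = rng.normal(size=(n_samples, 4))

# Synthetic stand-in for column mass densities of 5 aerosol species.
# They are loosely tied to the optical features so the classifier has a signal to learn.
weights = rng.normal(size=(4, 5))
noise = rng.normal(size=(n_samples, 5))
mass = np.exp(0.5 * (optical @ weights) + 0.1 * noise)

# Step 1: cluster the standardized optical properties into 5 regimes.
scaled_optical = StandardScaler().fit_transform(optical)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
regimes = kmeans.fit_predict(scaled_optical)

# Step 2: train a Random Forest to predict the regime from mass densities.
X_train, X_test, y_train, y_test = train_test_split(
    mass, regimes, test_size=0.25, random_state=0, stratify=regimes
)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Held-out accuracy and per-feature importance of the synthetic predictors.
accuracy = forest.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
print("feature importances:", forest.feature_importances_)
```

The accuracy printed here says nothing about the real problem; it only shows where the regime labels come from and how a held-out score of the kind quoted in the abstract would be computed.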
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-454', Anonymous Referee #1, 22 May 2025
  - AC1: 'Reply on RC1', Nilton Rosario, 26 Aug 2025
    The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-454/egusphere-2025-454-AC1-supplement.pdf
 
- RC2: 'Comment on egusphere-2025-454', Anonymous Referee #2, 15 Jul 2025
 The study characterizes the typical aerosol intensive optical properties affecting the Iberian Peninsula (IP), comprising Spain and Portugal, using the atmospheric column inversion products from the AERONET sites. The authors employed K-means clustering to analyze historical aerosol intensive properties across all AERONET sites that operated for at least 2 years and have the highest-quality Level 2.0 dataset available. Five distinct aerosol optical regimes affecting the IP were identified based on the clustering technique. Aerosol-type columnar mass density data (dust, organic carbon, black carbon, sea-salt, and sulphates) from the MERRA-2 reanalysis were then used to predict the aerosol optical regime with the Random Forest supervised learning methodology. The performance of the trained model was tested under various aerosol scenarios: prediction accuracy ranged from 60% to 75%, and exceeded 90% when predicting solely dust or non-dust optical regimes. Overall, the study is very interesting and fits the journal scope. The manuscript is well written but requires some improvement in clarity on certain aspects before re-consideration. Recent literature needs to be cited.
 Comments:
 Line 37: Statement starting with 'Via'?
 Line 70: compositions -> composition,
 Line 70: It should be 'microphysical properties'
 Line 70: computations -> computation
 Line 76: What parameters are being referred to in 'aerosol simulation'?
 Line 197: What do you mean by observation-constrained approaches? Are you referring to the threshold-based aerosol type classification methods? Please clarify.
 Lines 211-215: What is the rationale for choosing these aerosol intensive properties? How are the Lidar Ratio (LR) and Linear Depolarization Ratio (LDR) derived from AERONET sky radiance measurements? How reliable are the LR and LDR derived from AERONET?
 Line 235: Which climate models are being referred to here?
 Table 1: Are VMR-F, VMR-C, STD-F, STD-C, Reff-F, and Reff-C provided by the AERONET inversion products, or are they derived by the authors? Please clarify. Since these intensive properties are inversion products of AERONET, how did you account for the impact of their uncertainty on the aerosol optical regimes identified through K-means clustering (Section 2.4)? There is not much discussion of the influence of the observational/inversion uncertainty of the aerosol intensive properties on the identified clusters and on the interpretation of your results.
 Line 286: Use Sulphate or sulfate consistently throughout the manuscript.
 Lines 285-290: It was mentioned that the MERRA-2 Aerosol Diagnostic Product (ADP) for aerosol types is considered in this study. Dust, Black Carbon, Organic Carbon, Sea-Salt and Sulphate aerosol mass concentrations at specific levels are integrated over the entire atmospheric column to obtain columnar aerosol optical properties such as extinction, scattering and absorption optical depth. It is not clear how the mass concentrations of individual species are converted to optical depths. At least a proper citation of references for the method adopted should have been included. At which wavelength are these obtained? Did you validate the extinction optical depth derived from MERRA-2 against the aerosol optical depth from AERONET? Similarly, how does the SSA from MERRA-2 compare with the corresponding SSA from AERONET?
 Line 309: There exist several methods and indices to decide on the appropriate number of clusters such as Elbow, Silhouette, Davies Bouldin, and Calinski-Harabasz indices. I have noticed that in the following study: https://doi.org/10.1016/j.atmosres.2022.106518, the authors have stated that the correct number of clusters derived from different approaches may not lead to a single solution. What is the rationale for adopting the Elbow method, except the fact that it is a widely used method for determining the optimal number of clusters?
 Lines 325-328: What do you mean by 'clusters average'?
 Lines 334-336: It was not mentioned anywhere how the times were synchronized between the AERONET inversion parameters and MERRA-2 data of aerosol species column mass density. Each of the AERONET inversion parameters and MERRA-2 aerosol species column mass densities might have different ranges of variability and units. How is this accounted for in the ML model while identifying the clusters? I mean to ask if the ML model does any scaling and normalization of different parameters. If not, won't the range of variability and units have any impact on the aerosol classification?
 Line 461: Large radius spread for C3 ... What does this imply?
 Lines 505-506: Up to this point, there has been no mention of how seasons are categorized. How are months categorized into seasons?
 Table 3: What do the values in brackets correspond to? Standard deviation or error? This may be mentioned in the table caption.
 Line 555: How can you say that this would not introduce a substantial error in the radiative effect calculations? In terms of what metrics is the radiative effect calculated? Radiative forcing or heating rates? It would be better to quantify this error. I suggest checking this study: https://doi.org/10.1016/j.jqsrt.2024.109179, to see if it provides some insights on the errors associated with direct radiative effects.
 Lines 573-583: It appears that these details are repeated again. Please check and avoid repetitions.
 Line 598: reanalyzes -> reanalyses
 Figure 10: The short forms (dst, oac, ssl, so4, so2, bcc) used as x-axis labels for features should be defined in the Figure 10 caption. It is also not clear whether this relative importance is obtained for the entire IP region or only for the grid cells containing the AERONET sites. Can you provide similar figures to ascertain the relative importance of the aerosol intrinsic parameters (Table 1) for the different clusters (or aerosol scenarios) identified in this study, together with the predictor variables from MERRA-2?
 Line 625: All of a sudden MERRA-2 AOD field is taken as a reference. AERONET sites also provide the AOD and SSA values, which could have been checked during the period of various scenarios (Case#01, Case#02, Case#03, Case#04).
 Lines 673-674: 'lower computational cost' --> How is this quantified? Have you compared with any other methods of aerosol classification?
 Figure 11 Caption: MODIS Terra?
 Lines 697-699: Earlier it was mentioned AERONET AOD > 0.4 but for MERRA-2 AOD > 0.3. Why?
 Citation: https://doi.org/10.5194/egusphere-2025-454-RC2
  - AC2: 'Reply on RC2', Nilton Rosario, 26 Aug 2025
    The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-454/egusphere-2025-454-AC2-supplement.pdf
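
RC2's comments on Line 309 (choice of the number of clusters) and Lines 334-336 (scaling of features with different units) can be made concrete with a short sketch. It compares the four indices the referee names (Elbow via K-means inertia, Silhouette, Davies-Bouldin, Calinski-Harabasz) on standardized synthetic data. It is an illustration under assumed settings, not the procedure used in the manuscript: the data come from `make_blobs`, and the range of candidate cluster counts (2 to 8) is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic features; one column is inflated to mimic a variable with different units.
features, _ = make_blobs(n_samples=500, centers=5, n_features=4, random_state=0)
features[:, 0] *= 1000.0

# Standardize so that no single feature dominates the Euclidean distances.
scaled = StandardScaler().fit_transform(features)

scores = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    labels = model.labels_
    scores[k] = {
        "inertia": model.inertia_,  # plotted against k for the Elbow method
        "silhouette": silhouette_score(scaled, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(scaled, labels),  # lower is better
        "calinski_harabasz": calinski_harabasz_score(scaled, labels),  # higher is better
    }

# Each index may favour a different k; report them side by side.
best_by_silhouette = max(scores, key=lambda k: scores[k]["silhouette"])
best_by_davies_bouldin = min(scores, key=lambda k: scores[k]["davies_bouldin"])
best_by_calinski_harabasz = max(scores, key=lambda k: scores[k]["calinski_harabasz"])

for k, row in scores.items():
    print(k, {name: round(value, 3) for name, value in row.items()})
print("silhouette:", best_by_silhouette)
print("davies-bouldin:", best_by_davies_bouldin)
print("calinski-harabasz:", best_by_calinski_harabasz)
```

Reporting the indices side by side, rather than a single one, makes it visible when they disagree on the number of clusters, which is the situation the referee raises.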
 
The authors use K-means clustering on AERONET data to identify different regimes of aerosol optical properties over the Iberian Peninsula, and subsequently train random forests to predict these regimes from aerosol column densities provided by MERRA-2. While the paper is interesting and fits the journal, I would still recommend major revisions; please refer to the comments below:
Major comments:
Minor comments & technical corrections