the Creative Commons Attribution 4.0 License.
Reconstructing the Full-Physics Model with Machine Learning for Aerosol Composition Retrieval
Abstract. Accurate aerosol composition retrievals support radiative forcing assessment, source attribution, air quality analysis, and improved modeling of aerosol–cloud–radiation interactions. Aerosol retrievals based solely on visible-wavelength aerosol optical depth (AOD) observations provide limited spectral sensitivity, which may be insufficient to reliably distinguish among aerosol types with similar optical properties. In this study, we present a new retrieval framework that combines multi-wavelength AOD observations from both the visible and infrared spectrum, enhancing aerosol type discrimination. A neural network forward model trained on simulations from the Model for Optical Properties of Aerosols and Clouds (MOPSMAP), which relates aerosol optical properties to spectral AOD, is embedded in an optimal estimation method (OEM) to retrieve aerosol composition. This machine learning-based forward model achieves computational efficiency without compromising accuracy. The neural network forward model achieves a mean R² of 0.99 with a root-mean-square error below 0.01. The retrieval resolves up to four independent aerosol components, with degrees of freedom for signal of about 3.75. The forward model contributes less than 10 % of the total retrieval uncertainty, confirming its robustness. We apply this hybrid method to ground-based observations, including data from the Aerosol Robotic Network (AERONET) and Fourier Transform Infrared spectrometer (FTIR) measurements. The retrieved aerosol compositions are consistent with physical expectations and validated through backward trajectory analysis. Furthermore, we successfully apply this method to satellite AOD observations and demonstrate its potential for global aerosol composition retrievals. The full development of a global dataset will be further addressed in future work.
Competing interests: Justus Notholt is a member of the editorial board of Atmospheric Measurement Techniques. The authors declare that they have no other competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3289', Anonymous Referee #1, 10 Aug 2025
- RC2: 'Interesting idea but methodology is unclear and the manuscript is missing quantitative information needed to assess it', Anonymous Referee #2, 02 Oct 2025
This paper presents a method to retrieve aerosol composition (as a mixture of 5 components from the GEOS aerosol modeling system) from multispectral aerosol optical depth (AOD). The forward model converting between aerosol composition and AOD is MOPSMAP, approximated for speed using a neural network. There are example applications to data from Ny-Ålesund and to some satellite retrievals. The authors state that a larger scale application of this idea to data is forthcoming.
The manuscript is in scope for AMT. The idea presented is interesting. However, there are some gaps which make it hard to assess the work, and some errors. I recommend major revisions and would be willing to review the revision. I don’t feel that I can provide a full review until the methodology described in Section 3 and associated subsections is clearer, because many assumptions and quantitative parameters are not stated, which makes it difficult to interpret the results of the method. My comments in support of this recommendation are as follows:
- The title is vague and doesn’t clearly reflect the content of the manuscript. I suggest something like “Retrieval of aerosol component fraction from spectral aerosol optical depth”. This describes more clearly what is actually done in the paper. The use of “machine learning” in the title honestly feels like an attempt to add this buzzword in, because the machine learning aspect is in my view not a novel aspect of the work (it’s an emulator for MOPSMAP, the work could be done without it, and it’s not a particularly complicated or conceptually new use of machine learning).
- Throughout, especially in the introduction, the term “infrared” is used to describe the FTIR measurements. It is only when we get to the data description that we learn this is about shortwave infrared (SWIR, solar spectral region, 1-3 microns). To me and I suspect to most readers, “infrared” without further specification implies thermal infrared. These are quite different spectral regions with different aerosol behavior. I suggest specifying SWIR throughout when talking about these data, otherwise it is somewhat misleading for the casual reader.
- Lines 59-68: there are a lot more examples than these. For example, the GRASP and RemoTAP algorithms use machine learning emulators to replace online radiative transfer calculations and have been widely applied to e.g. POLDER data (and more recently RemoTAP for PACE). There are other approaches (e.g. FastMAPOL, MAPP) which have been extensively published on in recent years, either for airborne polarimeters or more recently also for satellite observations from PACE. So there is already a fairly widespread use of these techniques in routine satellite aerosol data processing.
- Lines 92-95: I am not sure of the purpose of this sentence. Are the authors saying that, in this work, the AOD is used to determine SSA, etc? In that case I think the sentence is not necessary because the method is described later. Or are they saying that AERONET provides these? In that case references should be provided, and note that these derived properties aren’t determined by AERONET solely from spectral AOD (and not from all of those channels) but also from sky-scanning radiance measurements.
- Section 2.1 (and tying in to methodology later): to me, it looks like FTIR is providing an AOD at 2 microns but the other bands are the same as or match AERONET values. So the value of the FTIR seems a bit overstated in the manuscript, as it is emphasized repeatedly. None of the analysis shows the value of adding this band in particular. I would rather have seen the use of the UV bands in the network. For an eventual application to AERONET data globally, 2 micron data are not available, while the 340/380 nm band pair more often are. It reads a bit like the choice of wavelengths was motivated by the specific bands available for the case study at Ny-Ålesund, but it isn’t justified that this makes sense more generally. It feels a bit like the authors had access to these measurements and then tried to find something to do with them, as opposed to developing the approach and then finding appropriate case studies to look at. Perhaps the history is otherwise but it is not well-justified by the manuscript as submitted. Otherwise why pick Ny-Ålesund and why add the FTIR observation when it isn’t present at the hundreds of other AERONET sites in the network?
- Section 2.2, title: It is not accurate to say the satellite provides AOD “measurements”, they are retrievals.
- Section 2.2: to be clear, is it the case that the analysis takes the VIIRS Deep Blue monthly spectral AOD over ocean and uses the Ångström power law to adjust the wavelengths to match the ones chosen for the MOPSMAP-based training? Is that right? I don’t know that it makes sense to do the decomposition based on monthly AOD data. To me that seems equivalent to the assumption that aerosol composition is constant over a month, which will not be valid in some locations (including the North Atlantic chosen for the case study later). It is not obvious that calculating composition on an instantaneous or daily basis and then averaging the results would give the same results as calculating composition based on monthly-averaged AOD. Most of the error in satellite AOD is not random noise but rather factors related to geometry, surface type, and optical property assumptions at the given location and time, which do not necessarily decrease on temporal averaging, so I don’t think there is a justification on those grounds. So this choice should be better justified and its advantages and limitations discussed.
- Asymmetry factor is sometimes summarized as AF and sometimes as Af. This should be made consistent.
- Section 3: I think the whole methodology section should be rewritten in a different order and with more information. Much of the text and Figure 1 are confusing, and some of the information needed to understand them and Section 3.0 is not given until Sections 3.1 and 3.2. I think the authors need to be clear about what exactly is taken from the MERRA-2 components: is it just realistic mixing fractions? Or is it also the spectral complex refractive index and size distribution (and therefore also SSA, asymmetry factor)? Are the fractions in terms of mass, volume, area, AOD at some wavelength? What distributions were drawn from to sample these parameters, and how are they justified?
- Section 3: I guess one of my issues with the manuscript is I don’t really understand the point of adding machine learning to this, as opposed to using MOPSMAP directly, aside from speed. If it is just speed then I think the machine learning nature of this work is a bit overhyped. It might be that there is some detail I am missing about what is done, but the manuscript is not clear enough to say.
- Section 3.2: what is the justification for this network architecture? Some more references to the NN methods/code bases should be added as well. Can we see training and validation subset loss functions from the training process? Can the authors demonstrate that 10,000 simulations is enough for a comprehensive sampling and train:val:test subset, given this seems a fairly high dimensional problem, and does not lead to overfitting?
- Section 3.3: what values, specifically, are taken for x_a and S_a and how are they justified? These are important, particularly for the later discussion about averaging kernels and uncertainty estimates, both of which depend on the strengths of the prior constraints. It is mentioned that S_y is the measurement (input AOD) uncertainty; what numbers specifically are used here? Is it 0.01 or is there a more detailed description of AERONET and FTIR uncertainty? Is there any spectral correlation assumed?
- Line 235: are these 1500 cases the “Test” split of the original data, or another randomly-chosen 15%? This should be written more explicitly.
- Section 4.2 and Figure 4: are these results shown for the “Test” subset? I am not sure that R2 (which the text focuses on) is the relevant metric here; I’d think that the AOD reconstruction RMSE is. Also, as a practical matter, the AODs shown in Figure 4 are all very high. The lowest 500 nm AOD in Figure 4(a) is about 0.7. This is a magnitude rarely seen except in an extreme aerosol event like a fire or a dust storm. So an uncertainty analysis of the posterior composition based on this distribution will greatly overstate the actual practical utility of the algorithm, because the AODs are so high that measurement uncertainty is negligible. In practice the true AOD is often likely to be a factor of 3 or so lower, so the relative uncertainty would be about a factor of 3 higher. In short, the results of this theoretical uncertainty analysis based on simulations are likely to overstate the performance of the method. This will influence the discussion in these sections, including e.g. averaging kernels and relative contributions of different terms to overall posterior uncertainty.
- Section 4.4/Figure 5: again, are the “component fractions” defined in terms of mass, volume, area, number, AOD at some wavelength, something else?
- Line 310: I’m not sure we need this many significant figures for the average averaging kernel. It would be good also to show somehow the variability of the averaging kernel matrix between these simulations (e.g. standard deviations of each element). That will help to show whether the information content varies significantly across the ensemble of cases.
- Line 314: instead of just the average degrees of freedom, how about also showing the mode? What’s the interquartile range or standard deviation or similar? This ties in to the above point.
- Section 4.5: the discussion here and figures talk about MODIS, but the introduction to the paper says VIIRS data were used. Which is it, VIIRS or MODIS? My intuition says VIIRS because I don’t think the classification shown in Figure 6(b) is provided in MODIS, only VIIRS (though I could be wrong). Also see previous discussion about whether it makes sense to do this on monthly data as opposed to daily then averaging. Also, what assumed satellite uncertainty is taken for this example retrieval, and what is its assumed spectral correlation?
- Line 338: another option is taking SSA, asymmetry factor etc from the values used by the algorithm. This would keep consistency with what the retrieval assumed.
- Figure 7: This should be redrawn to use the same map projection and latitude/longitude boundaries for both panels. Having them different makes it difficult to compare the results.
- As a general methodological point: I could not fully judge this study because of the missing information described above. But conceptually, I find the idea that one can take spectral AOD and use it to get at the weights of 5 components unrealistic. Since aerosol extinction is spectrally smooth, the different AOD wavelengths are not orthogonal and really there are maybe 3 pieces of information in the AOD spectrum (AOD magnitude and maybe two parameters related to spectral curvature, as the Ångström exponent is often represented as a log-log quadratic function). So to get 5 component weights out of this seems speculative and it must lean heavily on the a priori constraints (which are not discussed in detail in the current version of the manuscript). This is borne out somewhat by the averaging kernel analysis, which shows the prior is fairly important, especially for black carbon. For realistic aerosol loadings (as mentioned previously, about a factor of 3 lower than in the analysis presented), the uncertainty seems likely to be very high. I think we need a lot more detail on the underlying distributions all these cases were drawn from before we know how robust the results are, and there should be examples of averaging kernels drawn from more realistic aerosol loadings.
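The information-content concern raised in the comments above (how many independent pieces of information a spectrally smooth AOD spectrum carries, and how this changes at lower aerosol loading) can be illustrated with the standard linear OEM diagnostics. The following is a minimal sketch and not the authors' forward model: the wavelengths, the five power-law component spectra, the prior standard deviation of 0.3, and the uncorrelated AOD uncertainty of 0.01 are all illustrative assumptions.

```python
import numpy as np

def oem_diagnostics(K, S_a, S_y):
    """Linear OEM diagnostics for Jacobian K, prior covariance S_a,
    and measurement covariance S_y."""
    S_y_inv = np.linalg.inv(S_y)
    S_a_inv = np.linalg.inv(S_a)
    S_hat = np.linalg.inv(K.T @ S_y_inv @ K + S_a_inv)  # posterior covariance
    A = S_hat @ K.T @ S_y_inv @ K                       # averaging kernel
    dfs = float(np.trace(A))                            # degrees of freedom for signal
    return A, dfs, S_hat

# Hypothetical measurement wavelengths in micrometres (assumed, not from the paper).
wavelengths_um = np.array([0.44, 0.50, 0.675, 0.87, 1.02, 2.0])
# Hypothetical Angstrom exponents, one per component (assumed, smooth spectra).
angstrom = np.array([0.1, 0.5, 1.0, 1.5, 2.0])

def jacobian(total_aod_500):
    """Toy linear model: AOD(lambda) = total_aod_500 * sum_i x_i (lambda/0.5)^(-alpha_i),
    so the Jacobian with respect to the fractions x_i scales with total AOD."""
    spectra = (wavelengths_um[:, None] / 0.5) ** (-angstrom[None, :])
    return total_aod_500 * spectra

S_a = np.eye(5) * 0.3 ** 2    # assumed prior covariance of the component fractions
S_y = np.eye(6) * 0.01 ** 2   # assumed uncorrelated AOD uncertainty of 0.01

A_high, dfs_high, S_hat_high = oem_diagnostics(jacobian(0.7), S_a, S_y)
A_low, dfs_low, S_hat_low = oem_diagnostics(jacobian(0.7 / 3.0), S_a, S_y)

print(f"DFS at 500 nm AOD of 0.70: {dfs_high:.2f}")
print(f"DFS at 500 nm AOD of 0.23: {dfs_low:.2f}")
print("Averaging kernel diagonal at lower AOD:", np.round(np.diag(A_low), 2))
```

In this toy setup the Jacobian scales with total AOD while the assumed measurement uncertainty stays fixed, so the degrees of freedom for signal are lower at the reduced loading. The actual numbers depend entirely on the assumed prior and uncertainties, which is why the values of x_a, S_a and S_y used in the manuscript need to be stated.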
 Citation: https://doi.org/10.5194/egusphere-2025-3289-RC2 
Viewed
- HTML: 607
- PDF: 93
- XML: 19
- Total: 719
- BibTeX: 14
- EndNote: 18
This study presents a novel hybrid machine learning (ML) and physics-based framework for retrieving aerosol composition from multi-wavelength Aerosol Optical Depth (AOD) observations. The approach effectively bridges ML and physical modeling, offering a scalable solution with significant scientific and environmental applications. The research is well-conducted, and the results appear robust. I recommend acceptance after minor revisions addressing the following points:
1) The paper should clarify whether Single Scattering Albedo (SSA), asymmetry parameter (ASY), and relative humidity (RH) are necessary for the retrieval process. If these parameters are required, please briefly discuss how they can be obtained (e.g., from ancillary datasets, reanalysis products, or simultaneous measurements).
2) The manuscript claims that AOD in infrared (IR) wavelengths provides additional information on aerosol composition. Please elaborate on this point, for example by explaining how IR absorption features are linked to specific aerosol types (e.g., dust, organic carbon) or how they complement visible/UV observations.
3) The text refers to MOPSMAP as a "radiative transfer model," but it appears to be a bulk aerosol optical property calculator based on size distribution and refractive index inputs. Please correct this terminology. Additionally, the study relies solely on Mie scattering, neglecting non-spherical scattering methods (e.g., T-matrix for dust). Since dust aerosols are often nonspherical, this simplification may introduce errors. A brief discussion of this limitation and its potential impact should be included.
4) The Optimal Estimation Method (OEM) requires prior information and its associated covariance matrix. The manuscript should clarify (a) whether prior estimates are sourced from MERRA-2 or other datasets, and (b) how the covariance matrix of the prior is defined (e.g., based on climatological variability, instrument uncertainty, or empirical assumptions).
These revisions would strengthen the manuscript’s clarity and methodological rigor.
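As one illustration of the "climatological variability" option mentioned in point 4, a prior state and its covariance could be estimated from an ensemble of component fractions. The following is a minimal sketch under stated assumptions: the ensemble here is synthetic (drawn from a Dirichlet distribution with made-up parameters as a stand-in for a real climatology such as one taken from reanalysis), and the regularization constant is hypothetical. It does not describe how the authors actually define their prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a climatology of five component fractions
# (rows: samples, columns: components). The Dirichlet parameters are made up.
climatology = rng.dirichlet(alpha=[2.0, 1.0, 1.0, 0.5, 0.5], size=1000)

# Prior state: climatological mean fraction of each component.
x_a = climatology.mean(axis=0)

# Prior covariance: sample covariance across the ensemble.
S_a = np.cov(climatology, rowvar=False)

# Fractions that sum to one make the sample covariance singular, so it
# cannot be inverted directly in the OEM cost function. One simple option
# (assumed here) is to add a small diagonal term before inversion.
S_a_reg = S_a + 1e-4 * np.eye(S_a.shape[0])

print("x_a:", np.round(x_a, 3))
print("smallest eigenvalue of S_a:    ", np.linalg.eigvalsh(S_a).min())
print("smallest eigenvalue of S_a_reg:", np.linalg.eigvalsh(S_a_reg).min())
```

The sketch also shows why the definition of the fractions matters: if they are constrained to sum to one, the prior covariance is rank-deficient and some additional choice (regularization, or retrieving one fewer component) has to be made and documented.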