the Creative Commons Attribution 4.0 License.
Particulate Matter Concentrations Derived from Airborne High Spectral Resolution Lidar Measurements Using Machine Learning Regression
Abstract. We use measurements of near-surface aerosol backscatter, extinction, and depolarization acquired by four NASA Langley Research Center airborne High Spectral Resolution Lidars (HSRLs) in machine learning (ML) regression algorithms to derive concentrations of particulate matter (PM) with aerodynamic diameters less than 2.5 µm (PM2.5) and 10 µm (PM10), as well as the PM2.5/PM10 ratio. The ML regression models are trained using airborne HSRL measurements acquired over major metropolitan regions in the United States and Asia that are coincident with hourly surface PM2.5 and PM10 measurements from the EPA air quality system and similar networks in other countries. We examine several regression methods and find that exponential Gaussian Process regression (GPR) algorithms consistently give the best performance in terms of the lowest root-mean-square (RMS) errors and the highest correlations. When evaluated using surface measurements withheld from the training sets, ML models that use the HSRL near-surface measurements of aerosol backscatter and aerosol intensive properties such as depolarization, backscatter color ratio, and lidar ratio typically give the best performance, with RMS differences in PM2.5 retrievals around 5 µg m⁻³ and correlation coefficients above 0.8. The corresponding RMS differences and correlation coefficients are 11 µg m⁻³ and 0.7 for PM10 retrievals, and 0.17 and 0.75 for PM2.5/PM10. This retrieval performance is achieved using airborne HSRL measurements alone and so does not depend on external knowledge of or assumptions regarding aerosol type, aerosol mass extinction efficiency, aerosol hygroscopic growth, the ratio of PM2.5 to PM10, particle density, or relative humidity. PM2.5 values in the training set range from about 5 to 80 µg m⁻³; PM10 values range from about 10 to 100 µg m⁻³. Accurate retrievals of PM outside these ranges would require commensurate training data.
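To make the regression step concrete, the following is a minimal sketch of Gaussian Process regression with an exponential kernel, k(a, b) = σ² exp(−‖a − b‖ / ℓ), evaluated on withheld samples with an RMS difference and a correlation coefficient. It is not the authors' implementation: the two input features, the synthetic PM2.5 relation, the hyperparameter values, and all function names are assumptions chosen only for illustration, and the hyperparameters are fixed rather than optimized.

```python
import math

def exp_kernel(a, b, length_scale, signal_var):
    """Exponential kernel: signal_var * exp(-||a - b|| / length_scale)."""
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return signal_var * math.exp(-dist / length_scale)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gpr_fit(X, y, length_scale=5.0, signal_var=100.0, noise_var=0.01):
    """Fit a GP with constant mean; hyperparameters are assumed, not tuned."""
    mean = sum(y) / len(y)
    K = [[exp_kernel(a, b, length_scale, signal_var)
          + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    weights = solve(K, [v - mean for v in y])  # (K + noise*I)^-1 (y - mean)
    return {"X": X, "weights": weights, "mean": mean,
            "length_scale": length_scale, "signal_var": signal_var}

def gpr_predict(model, X_new):
    """Posterior mean at each new input."""
    out = []
    for x in X_new:
        k_star = [exp_kernel(x, xt, model["length_scale"], model["signal_var"])
                  for xt in model["X"]]
        out.append(model["mean"]
                   + sum(w * k for w, k in zip(model["weights"], k_star)))
    return out

def rms_difference(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson_r(pred, obs):
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

# Synthetic stand-in data (hypothetical): backscatter and depolarization
# as inputs, with a made-up relation to PM2.5. Not real HSRL or EPA data.
samples = []
for i in range(20):
    bsc = 0.5 + 0.25 * i           # backscatter, arbitrary units
    dep = 0.02 + 0.01 * (i % 5)    # depolarization ratio
    samples.append(([bsc, dep], 8.0 * bsc * (1.0 - 2.0 * dep)))

train, test = samples[0::2], samples[1::2]   # withhold every other sample
model = gpr_fit([x for x, _ in train], [y for _, y in train])
pred = gpr_predict(model, [x for x, _ in test])
obs = [y for _, y in test]
rms = rms_difference(pred, obs)
corr = pearson_r(pred, obs)
print(f"RMS difference: {rms:.2f}  correlation: {corr:.3f}")
```

In practice the kernel hyperparameters would be estimated from the training data, and a library routine would replace the hand-written linear solver; both are omitted here to keep the sketch dependency-free.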
We present examples of PM retrievals in the United States and Asia from flights in which the aircraft flew systematic "raster-scan" patterns for several hours over major urban areas. We show that these PM2.5 retrievals are in good agreement with PM2.5 derived from coincident airborne in situ measurements near the surface as well as aloft. We also describe how the distribution of PM2.5 varies with aerosol type and altitude over these regions. We use the HSRL measurements of aerosol extinction and retrievals of surface PM2.5 along with HSRL retrievals of aerosol type to derive estimates of the fine mode aerosol mass extinction efficiency (MEEf) for major aerosol types identified by an updated HSRL aerosol classification method. MEEf ranges from about 2.6 ± 0.5 m² g⁻¹ for maritime aerosol to 5.0 ± 0.7 m² g⁻¹ for smoke. These estimates of MEEf are also in good agreement with values derived from airborne in situ measurements. We also discuss how this methodology may be applied to measurements from the Atmospheric Lidar (ATLID) on the EarthCARE satellite.
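As a unit check on the mass extinction efficiency quoted above, the sketch below computes only the definitional ratio of extinction to mass concentration. The function name and the input values are hypothetical, and the authors' actual procedure (for example, how the fine mode extinction is isolated) is not described in this abstract and is not reproduced here.

```python
def mass_extinction_efficiency(ext_inv_Mm, mass_ug_m3):
    """Return extinction / mass concentration in m^2 g^-1.

    ext_inv_Mm : aerosol extinction in Mm^-1 (1 Mm^-1 = 1e-6 m^-1)
    mass_ug_m3 : mass concentration in ug m^-3 (1 ug m^-3 = 1e-6 g m^-3)
    The two 1e-6 factors cancel, so the plain ratio is already in m^2 g^-1.
    """
    if mass_ug_m3 <= 0:
        raise ValueError("mass concentration must be positive")
    return ext_inv_Mm / mass_ug_m3

# Illustrative values only (not from the paper):
# 100 Mm^-1 of extinction with 25 ug m^-3 of PM2.5.
mee_example = mass_extinction_efficiency(100.0, 25.0)
print(mee_example)
```

With these illustrative inputs the ratio is 4 m² g⁻¹, which falls inside the 2.6 to 5.0 m² g⁻¹ range reported above.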
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4812', Anonymous Referee #1, 20 Oct 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4812/egusphere-2025-4812-RC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-4812-RC1
RC2: 'Comment on egusphere-2025-4812', Anonymous Referee #2, 07 Nov 2025
The study uses the MATLAB Regression Learner software to explore various regression methods applied to several large observational airborne lidar (HSRL) data sets. These data were collected during large field campaigns over major urban areas in the USA, South Korea, the Philippines, Taiwan, and Thailand. The goal was to investigate which combination of lidar information (backscatter, extinction, lidar ratio, depolarization ratio at single or multiple wavelengths) allows a good estimation of PM2.5 and PM10 at heights close to the surface. In these machine learning (ML) studies, dense sets of network in situ observations of PM2.5 and PM10 were used. It was found that the Exponential Gaussian Process algorithms consistently showed the best performance. Twelve different lidar configurations (models 1-12) were defined. However, in the result section only the optimum model (model 11) was applied.
This is an excellent and well elaborated study done by experienced researchers and lidar experts!
I have only minor remarks. As a reviewer, my role is to be critical and to criticize if I find something that should be mentioned. All the positive aspects remain widely uncommented.
The Abstract may be too long. According to the AMT/ACP rules the abstract should not exceed 250 words.
Lines 93-111: What about all the ground-based lidars and lidar networks? Why are they not mentioned? All the multiwavelength Raman polarization lidars, EARLINET? Ground-based lidars are ideal for monitoring the diurnal, weekly, and seasonal cycle of the aerosol pollution state in urban areas and, in contrast to airborne and satellite lidars, they do so continuously! Airborne field campaigns are very useful, no doubt, but they are snapshots! Spaceborne lidar observations provide global coverage, but are also snapshot-like. In my opinion, such a general introduction should provide a broader overview of the available lidar techniques and networks, e.g., MPLNET, ADNET, EARLINET.
Lines 139-150: To continue with my general comment: I was surprised that the Raman lidar technique was not mentioned at all, although the first author Rich Ferrare grew up as an aerosol Raman lidar specialist. The robust and very stable Raman lidar technique is, in my opinion, the optimum approach for long-term monitoring of aerosol pollution, even at low heights of 100-200 m above ground (by using near-range receiver units). Meanwhile, rotational Raman channels allow coverage of the lower part of the atmosphere even in daytime.
To avoid misunderstanding: the development of all the different airborne HSRL lidars at NASA LaRC is unique! The lidar team as a whole has done a fantastic job during the last 10-15 years.
Back to the manuscript. Later on, I was also surprised that none of the defined models 1-12 covers the CALIOP lidar configuration. I think that should be improved. Or does it make no sense at all, when there is no lidar-ratio information? The CALIOP model would be model 7 without information on 532 nm lidar ratio and 532-1064 nm depolarization ratio. The comparison of model 7 (without the lidar ratio and 1064 nm depol ratio information) and model 12 would be the perfect opportunity to demonstrate the big step forward in spaceborne lidar development from CALIOP to ATLID (EarthCARE lidar)!
Line 216: Table 1! It is not easy to find out what the HSRL-2 (the main lidar in all the field campaigns discussed in this paper, model 11) can measure. A better, clearer overview of the different systems would be helpful.
Line 218: What do you mean by self-calibration? In the backscatter coefficient retrieval, you always need to assume a reference backscatter value at the reference height.
Line 290: So, the basic goal was to use 193 flights (conducted from 2010 to 2024) over major metropolitan regions to explore various machine learning regression models for deriving PM concentrations. The result section, however, mainly contains HSRL-2 observations and applications of model 11.
How many flights were conducted with the HSRL-2?
By using these 193 flights over urban areas, did you investigate the link between lidar observations and in situ observations for only ONE aerosol type, even if PM2.5/PM10 ranged from 0.1 to 0.9? Please comment on that!
To cover the entire globe (in the case of global observations with CALIOP or ATLID) would that mean we need global sets of in situ PM observations in the machine learning studies?
Line 316: Is there a good reference available so that the reader can learn more about the Exponential Gaussian Process Algorithm?
Line 329, Table 2: Model 11 has the most crosses and is obviously the best model in this study. Model 12 is the EarthCARE model! Why is there no CALIOP model? … model 7 (without lidar ratio and 1064 nm depol ratio information)? Please comment on that!
In the case of models 7-11: either BSC or EXT is used, but always together with LR! Does that mean that when BSC and LR are included, the information about EXT is automatically available and is not needed? Please explain why a model that uses BSC plus EXT plus LR makes no sense!
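For reference, the standard lidar definition behind this question can be written out as follows. It is stated here as background only, not as the authors' reply; the symbols and the numerical values are illustrative assumptions.

```latex
% Lidar ratio S (LR) links aerosol extinction \alpha (EXT) and backscatter \beta (BSC):
S = \frac{\alpha}{\beta}
\qquad\Longleftrightarrow\qquad
\alpha = S\,\beta
% Illustrative values: \beta = 2~\mathrm{Mm^{-1}\,sr^{-1}},\; S = 50~\mathrm{sr}
% \Rightarrow \alpha = 50 \times 2 = 100~\mathrm{Mm^{-1}}
```

Under this definition, any two of BSC, EXT, and LR at a given wavelength fix the third, so a third input at the same wavelength would add no independent information.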
Line 351: Figure 4 shows models 1, 2, 3, and 11! I think one should show model 7 in this figure!
Line 366, Figure 5: I am surprised that the use of BSC gives better results than the use of EXT. The extinction coefficient (overall scattering effect) is closely linked to the cross section of the particles, and PM is also well correlated with the particle cross section and thus with EXT. Is the reason that BSC is the better parameter related to the fact that the study only concentrates on the urban-haze aerosol type (mostly fine-mode aerosol)?
Section 3: The result section shows interesting results and the full potential of airborne aerosol HSRL observations to quantify the pollution state close to the ground. I have no questions here!
Figure 8: The in-situ observations (EPA surface stations) are not easy to see. Maybe a bit larger symbols will help?
Figures 12-15 show convincing (excellent) results. But as a critical reviewer, my question would be: can we use the developed approach if we have totally independent data sets, e.g., lidar observations over Beijing, Shanghai, Wuhan, or the Pearl River Delta in China, over polluted Cairo, Egypt, Dakar, Senegal, or Nairobi, Kenya, over Paris and London in Europe, or over Tomsk in Siberia or Fairbanks in Alaska? Any comment on that would be fine! Do we always need complex data sets of lidar and in situ observations in ML efforts for each region of the world before we can make trustworthy use of lidar observations?
Citation: https://doi.org/10.5194/egusphere-2025-4812-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 234 | 211 | 14 | 459 | 13 | 13 |