Particulate Matter Concentrations Derived from Airborne High Spectral Resolution Lidar Measurements Using Machine Learning Regression
Abstract. We use measurements of near-surface aerosol backscatter, extinction, and depolarization acquired by four NASA Langley Research Center airborne High Spectral Resolution Lidars (HSRLs) in machine learning (ML) regression algorithms to derive concentrations of particulate matter (PM) with aerodynamic diameters less than 2.5 µm (PM2.5) and 10 µm (PM10), as well as the PM2.5/PM10 ratio. The ML regression models are trained using airborne HSRL measurements acquired over major metropolitan regions in the United States and Asia that are coincident with hourly surface PM2.5 and PM10 measurements from the EPA Air Quality System and similar networks in other countries. We examine several regression methods and find that exponential Gaussian Process regression (GPR) algorithms consistently give the best performance, with the lowest root-mean-square (RMS) errors and the highest correlations. When evaluated using surface measurements withheld from the training sets, ML models that use the HSRL near-surface measurements of aerosol backscatter and aerosol intensive properties such as depolarization, backscatter color ratio, and lidar ratio typically give the best performance, with RMS differences in PM2.5 retrievals around 5 µg m⁻³ and correlation coefficients above 0.8. The corresponding RMS differences and correlation coefficients are 11 µg m⁻³ and 0.7 for PM10 retrievals and 0.17 and 0.75 for the PM2.5/PM10 ratio. This retrieval performance is achieved using airborne HSRL measurements alone and so does not depend on external knowledge of, or assumptions regarding, aerosol type, aerosol mass extinction efficiency, aerosol hygroscopic growth, the ratio of PM2.5 to PM10, particle density, or relative humidity. PM2.5 values in the training set range from about 5 to 80 µg m⁻³; PM10 values range from about 10 to 100 µg m⁻³. Accurate retrievals of PM outside these ranges would require commensurate training data.
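To make the regression step concrete, the following is a minimal sketch of Gaussian Process regression with an exponential kernel, written in plain Python. It is not the implementation used in this work. The function names, the two-feature layout (a backscatter-like value and a depolarization-like value), the fixed hyperparameters, and the four training points are all assumptions, and the numbers are invented toy values, not measurements. In practice the features would be standardized and the kernel hyperparameters fitted, for example by maximizing the marginal likelihood.

```python
import math

def exp_kernel(a, b, sigma_f=1.0, length=1.0):
    """Exponential kernel: sigma_f^2 * exp(-||a - b|| / length)."""
    r = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sigma_f ** 2 * math.exp(-r / length)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gpr(X, y, sigma_f=1.0, length=1.0, noise=0.1):
    """Fit a GPR with a constant mean and fixed (not optimized) hyperparameters."""
    mean_y = sum(y) / len(y)
    K = [[exp_kernel(xi, xj, sigma_f, length) + (noise ** 2 if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve_cholesky(cholesky(K), [yi - mean_y for yi in y])
    return {"X": X, "alpha": alpha, "mean": mean_y,
            "sigma_f": sigma_f, "length": length}

def predict_gpr(model, x_new):
    """Posterior mean prediction at a single new feature vector."""
    k_star = [exp_kernel(x_new, xi, model["sigma_f"], model["length"])
              for xi in model["X"]]
    return model["mean"] + sum(k * a for k, a in zip(k_star, model["alpha"]))

def rms_error(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Invented toy data (NOT measurements): [backscatter-like, depolarization-like]
X_train = [[0.5, 0.05], [1.0, 0.04], [2.0, 0.10], [3.0, 0.03]]
y_train = [6.0, 12.0, 20.0, 35.0]  # hypothetical PM2.5-like targets

model = fit_gpr(X_train, y_train, sigma_f=15.0, length=1.0, noise=0.1)
pred = predict_gpr(model, [1.5, 0.05])
print(f"Predicted PM2.5-like value: {pred:.1f}")
```

Evaluating such a model on surface measurements withheld from training, as described above, would amount to calling `predict_gpr` on the withheld feature vectors and passing the results to `rms_error`.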
We present examples of PM retrievals over the United States and Asia from flights in which the aircraft flew systematic "raster-scan" patterns for several hours over major urban areas. We show that these PM2.5 retrievals are in good agreement with PM2.5 derived from coincident airborne in situ measurements near the surface as well as aloft. We also describe how the distribution of PM2.5 varies with aerosol type and altitude over these regions. We use the HSRL measurements of aerosol extinction and retrievals of surface PM2.5, along with HSRL retrievals of aerosol type, to derive estimates of the fine-mode aerosol mass extinction efficiency (MEEf) for major aerosol types identified by an updated HSRL aerosol classification method. MEEf ranges from about 2.6 ± 0.5 m² g⁻¹ for maritime aerosol to 5.0 ± 0.7 m² g⁻¹ for smoke. These estimates of MEEf are also in good agreement with values derived from airborne in situ measurements. We also discuss how this methodology may be applied to measurements from the Atmospheric Lidar (ATLID) on the EarthCARE satellite.
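As a rough guide to the units involved, a mass extinction efficiency can be written as the ratio of an extinction coefficient to a mass concentration. The expression below is a sketch under that assumption only. The abstract does not state the exact formulation, and the symbol $\alpha_f$ (fine-mode aerosol extinction) is introduced here for illustration.

```latex
\mathrm{MEE}_f \approx \frac{\alpha_f}{\mathrm{PM}_{2.5}},
\qquad
\frac{\mathrm{Mm^{-1}}}{\mu\mathrm{g\,m^{-3}}}
  = \frac{10^{-6}\,\mathrm{m^{-1}}}{10^{-6}\,\mathrm{g\,m^{-3}}}
  = \mathrm{m^{2}\,g^{-1}}
```

With hypothetical values of $\alpha_f = 50\ \mathrm{Mm^{-1}}$ and $\mathrm{PM}_{2.5} = 10\ \mu\mathrm{g\,m^{-3}}$, this ratio gives $5\ \mathrm{m^{2}\,g^{-1}}$, which is of the same order as the smoke value quoted above.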