Machine learning significantly improves the simulation of hourly-to-yearly scale cloud nuclei concentration and radiative forcing in polluted atmosphere
Abstract. Accurate prediction of cloud condensation nuclei (CCN) number concentration (NCCN) over large spatiotemporal scales is challenging but critical for evaluating aerosol-cloud interaction effects. Combining multi-source datasets with the NCCN simulated by the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), we have developed a Random Forest Regression method (RFRM) model that predicts hourly-to-yearly scale NCCN well at typical supersaturations in the polluted North China Plain (NCP). We show that the prediction bias of NCCN relative to observations during the campaigns is reduced from -59 % with the WRF-Chem model to approximately -31 % with the RFRM model, a corresponding 1.6-fold improvement in prediction precision. The greatest improvements are seen in both very polluted and clean cases. The RFRM model captures the spatial variation of NCCN well and better describes its long-term trends. More importantly, the prediction reveals a significant long-term decreasing trend of NCCN in the NCP, driven by a rapid reduction in aerosol concentrations from 2014 to 2018, a period during which the Chinese government implemented a series of strict emission reduction measures. This reflects the climate benefit of pollution control. Our study further shows that the RFRM model reduces the uncertainty in simulated cloud radiative forcing, from an overestimation of 1.89 ± 0.78 W m⁻² to one of 0.81 ± 0.63 W m⁻², which highlights the high sensitivity of climate forcing to changes in NCCN. This work offers a new modeling framework that can guide the simulation of CCN in other regions around the world and has the potential to effectively fill the observational gap in CCN concentrations.
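To make the signed percentage biases quoted above concrete, the following is a minimal sketch of one common way such a bias can be computed. It assumes the normalized mean bias (NMB) definition, which may differ from the metric actually used in the study. The function name and the sample concentrations are hypothetical, chosen only so that the arithmetic is easy to follow; they are not campaign data.

```python
def normalized_mean_bias(predicted, observed):
    """Return the normalized mean bias in percent: 100 * sum(p - o) / sum(o).

    A negative value means the predictions underestimate the observations
    on average. This definition is an assumption made for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("sum of observations must be non-zero")
    total_difference = sum(p - o for p, o in zip(predicted, observed))
    return 100.0 * total_difference / total_observed


# Hypothetical NCCN values (cm^-3), invented for this example only.
observed = [1000.0, 2000.0, 3000.0, 4000.0]
baseline_model = [400.0, 800.0, 1200.0, 1600.0]     # strong underestimate
corrected_model = [700.0, 1400.0, 2100.0, 2800.0]   # weaker underestimate

print(f"baseline bias:  {normalized_mean_bias(baseline_model, observed):.1f} %")
print(f"corrected bias: {normalized_mean_bias(corrected_model, observed):.1f} %")
```

Under this definition, a bias moving from a large negative value toward zero indicates that the systematic underprediction of NCCN has shrunk, which is the sense in which the abstract compares the two models.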