the Creative Commons Attribution 4.0 License.
Machine Learning-Based Downscaling of Aerosol Size Distributions from a Global Climate Model
Abstract. Air pollution, particularly exposure to ultrafine particles (UFPs) with diameters below 100 nm, poses significant health risks, yet their spatial and temporal variability complicates impact assessments. This study explores the potential of machine learning (ML) techniques in enhancing the accuracy of a global aerosol-climate model's outputs through statistical downscaling to better represent observed data. Specifically, the study focuses on the particle number size distributions from the global aerosol-climate model ECHAM-HAMMOZ. The coarse horizontal resolution of ECHAM-HAMMOZ (approx. 200 km) makes modeling sub-gridscale phenomena, such as UFP concentrations, highly challenging. Data from three European measurement stations were used as downscaling targets, covering the nucleation, Aitken, and accumulation particle size modes. Six different ML methods were employed, with hyperparameter optimization and feature selection integrated for model improvement. Results showed a notable improvement in prediction accuracy for all particle modes compared to the original global model outputs, particularly for the accumulation mode, which achieved the highest fit indices. Challenges remained in downscaling the nucleation mode, likely due to its high variability and the discrepancy in spatial scale between the climate model representation and the underlying processes. Additionally, the study revealed that the choice of downscaling method requires careful consideration of spatial and temporal dimensions as well as the characteristics of the target variable, as different particle size modes or variables in other studies may necessitate tailored approaches. The study demonstrates the feasibility of ML-based downscaling for enhancing air quality assessments. This approach could support future epidemiological studies and inform policies on pollutant exposure.
Future integration of ML models dynamically into global climate model frameworks could further refine climate predictions and health impact studies.
Competing interests: One author is a member of the editorial board of the journal "Atmospheric Measurement Techniques".
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 17 May 2025)
RC1: 'Comment on egusphere-2025-774', Anonymous Referee #1, 17 Apr 2025
This work explores the potential of machine learning techniques to enhance the accuracy of a global aerosol-climate model’s outputs through statistical downscaling. The study focuses on the particle number size distributions from ECHAM-HAMMOZ, with data from three European measurement stations used as downscaling targets. The results show an improvement in prediction accuracy compared to the original global model outputs. It is a complex and extensive study, and the results fit within the scope of AMT and are of interest to the international research community. However, I suggest that the following aspects be considered to improve the manuscript and strengthen its impact before it is published in AMT.
Major comments
General – The authors state that the ML methods would certainly improve the spatial accuracy of PNSD derived from models. However, in this study, the ML methods are only applied to specific measurement sites. The models are trained using measurements and global model data, but the horizontal resolution of the climate model is 1.9° × 1.9°, which corresponds to approximately 150 km. So the question is: how would the methods perform at other locations within the same grid cell where no measurement data are available for training? For example, in both Helsinki and Leipzig, there are at least two stations measuring PNSD (urban and traffic). I suggest comparing the model performance at two nearby sites, or at least clarifying the intended utility and applicability of the methods and results presented. From the reviewer's perspective, based on the results shown in this manuscript, these methods do not necessarily improve the spatial accuracy of the model but rather enhance the model’s ability to reproduce observations at specific measurement locations.
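As a rough check on the grid-cell size discussed in this comment, the extent of a 1.9° × 1.9° cell can be estimated with a spherical-Earth approximation. The sketch below is illustrative only: the function name and the station latitudes are assumptions, not values taken from the manuscript.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def grid_cell_extent_km(resolution_deg, latitude_deg):
    """Return (north-south, east-west) extent in km of a lat-lon grid cell."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km per degree
    north_south = resolution_deg * km_per_degree
    # Meridians converge towards the poles, so the east-west extent shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Approximate station latitudes in degrees north (assumed for illustration only).
for name, lat in [("Helsinki", 60.2), ("Leipzig", 51.3)]:
    ns, ew = grid_cell_extent_km(1.9, lat)
    print(f"{name}: {ns:.0f} km north-south, {ew:.0f} km east-west")
```

Under these assumptions the north-south extent is a little over 200 km, while the east-west extent is roughly 105 km near Helsinki and roughly 130 km near Leipzig, so a single "approximate" figure depends on direction and latitude. In either case, several measurement sites in the same city fall within one cell.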
L130 – The authors defined the nucleation, Aitken, and accumulation mode size ranges using uncommon values (<7.7 nm, 7.7–50 nm, and 50–700 nm). This choice appears to be driven by the bin structure of the SALSA model. If that is the case, I would suggest either avoiding the terms "nucleation," "Aitken," and "accumulation" throughout the manuscript, or adjusting the size ranges to align with the commonly accepted definitions associated with different aerosol processes (e.g., <25 nm, 25–100 nm, and 100–1000 nm). I would also suggest replacing “These size ranges correspond to the SALSA bins...” with “These size ranges were selected to correspond to the SALSA bins…”
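To illustrate why the choice of mode boundaries matters, the following sketch sums the same size distribution into modes under both sets of limits quoted above. It is a minimal example with hypothetical bin diameters and concentrations; it assumes per-bin number concentrations (not dN/dlogDp) and treats each upper limit as exclusive, neither of which is specified in the comment.

```python
# Mode boundaries in nm as (lower bound inclusive, upper bound exclusive).
MANUSCRIPT_MODES = {
    "nucleation": (0.0, 7.7),
    "Aitken": (7.7, 50.0),
    "accumulation": (50.0, 700.0),
}
CONVENTIONAL_MODES = {
    "nucleation": (0.0, 25.0),
    "Aitken": (25.0, 100.0),
    "accumulation": (100.0, 1000.0),
}


def mode_totals(diameters_nm, number_conc, modes):
    """Sum per-bin number concentrations into the given size modes."""
    totals = {name: 0.0 for name in modes}
    for diameter, conc in zip(diameters_nm, number_conc):
        for name, (lower, upper) in modes.items():
            if lower <= diameter < upper:
                totals[name] += conc
                break  # each bin belongs to at most one mode
    return totals


# Hypothetical bin mid-point diameters (nm) and number concentrations (cm^-3).
diameters = [5.0, 15.0, 40.0, 80.0, 300.0]
counts = [1000.0, 3000.0, 2000.0, 800.0, 200.0]

print(mode_totals(diameters, counts, MANUSCRIPT_MODES))
print(mode_totals(diameters, counts, CONVENTIONAL_MODES))
```

With these made-up numbers, the 15 nm bin counts as Aitken mode under the manuscript's limits but as nucleation mode under the conventional ones, which is the kind of relabelling the comment is concerned about.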
Minor comments
Introduction – The first paragraph of the introduction is unclear regarding the distinction between particle number concentrations and mass concentrations. The two terms appear to be used interchangeably or without clear differentiation. I recommend that the authors clarify when they are referring to number concentrations versus mass concentrations and ensure consistent use of these terms throughout the paragraph. It is important to bear in mind that while UFPs dominate ambient particle concentrations in terms of number, coarser particles dominate particle concentrations in terms of mass (PM10 and PM2.5).
L123-129 – Are the PNSDs measured in Germany obtained using a DMPS or a scanning instrument? What size ranges does each instrument cover? The comparison with the model would be site-dependent if the size distributions differ in their lower and upper diameter limits. Are the measurement uncertainties taken into account?
Structure – I suggest reconsidering the structure of the sections. For example, the results of the best-performing method are presented in Section 5.1 before the performance of all methods is discussed in Section 5.2. It may be more logical to first present the comparison across all methods, followed by a deeper look at the best-performing one. Additionally, the title of Section 2, “Climate simulation,” may not be the most appropriate for the modelling setup. I would suggest something more descriptive, such as “Global model simulations”.
Nucleation range differences – In several instances, the authors suggest or conclude that “the nucleation mode proved more challenging to downscale due to high spatial variability and limitations in the underlying large-scale climate model output”. From the reviewer’s perspective, the Aitken mode could also exhibit substantial variability, particularly due to urban emissions. Therefore, a more plausible explanation for the difficulty in downscaling the nucleation mode may lie in the limitations of global models in representing new particle formation (such as the treatment of organics, nitrates, sulfuric acid, or nucleation schemes) rather than primarily in the spatial variability of the sources.
Technical corrections
L275 – What does the “-” at the end of the reference mean?
L130-131 – Change “These size ranges correspond to the SALSA bins...” to “These size ranges were selected to correspond to the SALSA bins…”
L280 – Should σ(·) and µ(·) be σ(x) and µ(x)? Also, “x” in Eq. 1 is not defined.
Citation: https://doi.org/10.5194/egusphere-2025-774-RC1
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
101 | 30 | 5 | 136 | 12 | 4 | 4 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 50 | 38 |
China | 2 | 11 | 8 |
Finland | 3 | 11 | 8 |
Germany | 4 | 6 | 4 |
Spain | 5 | 5 | 3 |