This work is distributed under the Creative Commons Attribution 4.0 License.
Machine Learning-Based Downscaling of Aerosol Size Distributions from a Global Climate Model
Abstract. Air pollution, particularly exposure to ultrafine particles (UFPs) with diameters below 100 nm, poses significant health risks, yet their spatial and temporal variability complicates impact assessments. This study explores the potential of machine learning (ML) techniques in enhancing the accuracy of a global aerosol-climate model's outputs through statistical downscaling to better represent observed data. Specifically, the study focuses on the particle number size distributions from the global aerosol-climate model ECHAM-HAMMOZ. The coarse horizontal resolution of ECHAM-HAMMOZ (approx. 200 km) makes modeling sub-grid-scale phenomena, such as UFP concentrations, highly challenging. Data from three European measurement stations were used as the downscaling target, covering the nucleation, Aitken, and accumulation particle size modes. Six different ML methods were employed, with hyperparameter optimization and feature selection integrated for model improvement. Results showed a notable improvement in prediction accuracy for all particle modes compared to the original global model outputs, particularly for the accumulation mode, which achieved the highest fit indices. Challenges remained in downscaling the nucleation mode, likely due to its high variability and the discrepancy in spatial scale between the climate model representation and the underlying processes. Additionally, the study revealed that the choice of downscaling method requires careful consideration of spatial and temporal dimensions as well as the characteristics of the target variable, as different particle size modes or variables in other studies may necessitate tailored approaches. The study demonstrates the feasibility of ML-based downscaling for enhancing air quality assessments. This approach could support future epidemiological studies and inform policies on pollutant exposure.
Future work integrating ML models dynamically into global climate model frameworks could further refine climate predictions and health impact studies.
Competing interests: One author is a member of the editorial board of the journal "Atmospheric Measurement Techniques".
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-774', Anonymous Referee #1, 17 Apr 2025
This work explores the potential of machine learning techniques in enhancing the accuracy of a global aerosol-climate model’s outputs through statistical downscaling. The study focuses on the particle number size distributions from ECHAM-HAMMOZ, and data from three European measurement stations were used for downscaling. The results show an improvement in prediction accuracy compared to the original global model outputs. It is a complex and extensive study, and the results fit within the scope of AMT, being of interest to the international research community. However, I would suggest some aspects to be considered in order to improve the manuscript and/or strengthen its impact before it is published in AMT.
Major comments
General – The authors state that the ML methods would certainly improve the spatial accuracy of PNSD derived from models. However, in this study, the ML methods are only applied to specific measurement sites. The models are trained using measurements and global model data, but the horizontal resolution of the climate model is 1.9° × 1.9°, which corresponds to approximately 150 km. So the question is: how would the methods perform at other locations within the same grid cell where no measurement data are available for training? For example, in both Helsinki and Leipzig, there are at least two stations measuring PNSD (urban and traffic). I suggest comparing the model performance at two nearby sites, or at least clarifying the intended utility and applicability of the methods and results presented. From the reviewer's perspective, based on the results shown in this manuscript, these methods do not necessarily improve the spatial accuracy of the model but rather enhance the model’s ability to reproduce observations at specific measurement locations.
L130 – The authors defined the nucleation, Aitken, and accumulation mode size ranges using uncommon values (<7.7 nm, 7.7–50 nm, and 50–700 nm). This choice appears to be driven by the bin structure of the SALSA model. If that is the case, I would suggest either avoiding the use of the terms "nucleation," "Aitken," and "accumulation" throughout the manuscript, OR adjusting the size ranges to align with the commonly accepted definitions associated with different aerosol processes (e.g., <25 nm, 25–100 nm, and 100–1000 nm). I would also suggest rephrasing “These size ranges correspond to the SALSA bins...” as “These size ranges were selected to correspond to the SALSA bins…”
Minor comments
Introduction – The first paragraph of the introduction is unclear regarding the distinction between particle number concentrations and mass concentrations. The two terms appear to be used interchangeably or without clear differentiation. I recommend that the authors clarify when they are referring to number concentrations versus mass concentrations and ensure consistent use of these terms throughout the paragraph. It is important to note that while UFPs mainly control ambient particle concentrations in terms of number, coarser particles control particle concentrations in terms of mass (PM10 and PM2.5).
L123-129 – Were the PNSDs measured in Germany obtained using a DMPS or a scanning instrument? What size ranges does each instrument cover? The comparison with the model would be site-dependent if the size distributions differ in their lower and upper diameter limits. Are measurement uncertainties considered?
Structure – I suggest reconsidering the structure of the sections. For example, the results of the best-performing method are presented in Section 5.1 before the performance of all methods is discussed in Section 5.2. It may be more logical to first present the comparison across all methods, followed by a deeper look at the best-performing one. Additionally, the title of Section 2, "Climate simulation," may not be the most appropriate for the modelling setup. I would suggest something more descriptive, such as "Global model simulations”.
Nucleation range differences – In several instances, the authors suggest or conclude that “the nucleation mode proved more challenging to downscale due to high spatial variability and limitations in the underlying large-scale climate model output”. From the reviewer’s perspective, the Aitken mode could also exhibit substantial variability, particularly due to urban emissions. Therefore, a more plausible explanation for the difficulty in downscaling the nucleation mode may lie in the limitations of global models in representing new particle formation (such as the treatment of organics, nitrates, sulfuric acid, or nucleation schemes) rather than primarily in the spatial variability of the sources.
Technical corrections
L275 – What does the “-“ at the end of the reference mean?
L130-131 – change “These size ranges correspond to the SALSA bins...” to “These size ranges were selected to correspond to the SALSA bins…”
L280 - Should σ(·) and µ(·) be σ(x) and µ(x)? Actually, “x” (Eq. 1) is not defined.
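For clarity, if Eq. (1) is the usual z-score standardization, the missing definition would presumably read (with x the raw feature value; this is a guess based on the notation, not the authors' stated equation):

```latex
x' = \frac{x - \mu(x)}{\sigma(x)}
```

where µ(x) and σ(x) denote the mean and standard deviation of x over the training data.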
Citation: https://doi.org/10.5194/egusphere-2025-774-RC1
RC2: 'Comment on egusphere-2025-774', Anonymous Referee #2, 03 Jun 2025
EGUsphere-2025-774: “Machine Learning-Based Downscaling of Aerosol Size Distributions from a Global Climate Model”, by Vartiainen et al.
This manuscript presents an interesting application of machine learning (ML) methods for downscaling particle number size distributions from a global climate model. The topic is relevant for improving exposure estimates in air quality studies and refining global climate model outputs. The authors provide a detailed description of the methods and an extensive discussion of the results. However, the manuscript would benefit from significant revisions to improve its clarity in the methodology and results. In particular, several aspects of the study design, including the choice and justification of ML methods, the feature selection strategy and the interpretation of SHAP results, require clearer explanations and additional justification. In its current form, I believe the manuscript requires major revisions before it can be considered for publication.
General comments:
- In Section 2, the description of the climate simulation with the global climate model would benefit from a clearer explanation of which model outputs are compared with aerosol measurements. The text mentions that the SALSA 2.0 module discretizes the aerosol size distribution into ten size classes, but it is not entirely clear how these classes are defined and whether any conversions or postprocessing steps are applied before comparison with the measurements. While some of this information is briefly introduced later in Section 3 (lines 130–132), it would improve clarity to include this explanation earlier in Section 2, when the SALSA module is first introduced.
- The selected Machine Learning (ML) algorithms are often used for regression and classification tasks, but it would be valuable for the authors to clarify whether they considered alternative models specifically designed for time series prediction, such as Long Short-Term Memory (LSTM) networks or Recurrent Neural Networks (RNNs). These models are well-suited for capturing temporal dependencies and trends, which may be relevant for predicting daily Particle Number Concentration (PNC). A brief discussion or justification of the model selection, particularly regarding the temporal structure of the data, would strengthen the manuscript.
- The feature selection procedure described in Section 4.3, based on iterative removal of features with the highest number of correlations above a threshold (red_thresh), is not a standard approach in the literature. It would be helpful if the authors could clarify the reasons for choosing this specific method over more standard approaches, such as filtering correlated pairs directly or using model-based feature importance metrics. Additionally, a brief discussion on the potential risks of this approach, such as removing features that may provide complementary information, would strengthen the methodology.
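To make the concern concrete, the iterative procedure as this reviewer understands Section 4.3 could be sketched as follows (a rough sketch only; `red_thresh`, the helper names, and the toy features are placeholders, not the authors' implementation):

```python
import itertools
import math

def pearson(a, b):
    # Plain Pearson correlation between two equal-length sequences
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def iterative_drop(features, red_thresh=0.9):
    """Repeatedly drop the feature that exceeds |r| > red_thresh with the
    largest number of other features, until no such feature remains."""
    feats = dict(features)
    while True:
        counts = {name: 0 for name in feats}
        for a, b in itertools.combinations(feats, 2):
            if abs(pearson(feats[a], feats[b])) > red_thresh:
                counts[a] += 1
                counts[b] += 1
        worst = max(counts, key=counts.get)
        if counts[worst] == 0:
            return sorted(feats)
        del feats[worst]  # ties are broken arbitrarily -- the risk noted above
```

Note that when two features tie on the correlation count, which one is removed is essentially arbitrary, which is exactly where complementary information can be lost.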
- While the authors provide an extensive description of the Bayesian Optimization (BO) framework, the methodology for hyperparameter tuning (Section 4.5.1) appears excessively complex for the problem and the size of the data. The reasoning for using kernel modifications for integer and categorical hyperparameters, rather than more standard methods such as grid or random search, could be better explained. It would be helpful for the authors to explain why such a sophisticated method was necessary, and whether it led to substantial improvements in model performance. Additionally, a workflow diagram summarizing the hyperparameter tuning process could improve clarity and reproducibility.
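For comparison, a random-search baseline of the kind this comment has in mind needs no kernel machinery to handle mixed hyperparameter types (an illustrative sketch; the search space and objective are hypothetical, not the authors' setup):

```python
import random

def random_search(objective, space, n_iter=50, seed=0):
    """Draw n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Continuous, integer, and categorical hyperparameters side by side:
space = {
    "learning_rate": lambda r: 10 ** r.uniform(-3, -1),       # log-uniform
    "max_depth":     lambda r: r.randint(2, 10),              # integer
    "booster":       lambda r: r.choice(["gbtree", "dart"]),  # categorical
}
```

A comparison of the BO results against such a baseline would make it easy to show whether the extra sophistication paid off.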
- The logical flow of the results section could be improved. Currently, the downscaling performance (Section 5.1) is presented before the comparison of ML models (Section 5.2). However, it seems more logical to first compare the models, justify the selection of the best performing one, and then present the downscaling results. Reorganizing the sections accordingly would help readers follow the reasoning behind model selection and evaluation.
Specific comments:
- Abstract:
- The abstract would benefit from a clearer explanation of the data used as ground truth for training and validation of the ML models. It is important to specify which datasets were used as reference for the predictions.
- The six ML models tested should be mentioned in the abstract.
- Line 11: It would be desirable to provide a value of the “highest fit indices” mentioned.
- Lines 11-13: The abstract suggests that challenges in downscaling were observed only for the nucleation mode. Could the authors clarify why these challenges were specific to this mode and not evident in the others? A more detailed explanation in the relevant section would be beneficial.
- Sect. 2:
- Line 112-113: Could the authors clarify whether the layer nearest to the surface was used for the analysis? Please, specify.
- Sect. 4.1:
- The inputs to the ML models in Section 4.1 are not clearly described. Please clarify which ECHAM-HAMMOZ outputs were used, and whether the models were trained separately for each station or using combined data. This is explained later but it would be beneficial to include that information also earlier in this section.
- Line 167-168: Was k-fold cross-validation used in any stage of the analysis? Please, provide details.
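For reference, plain k-fold cross-validation of the kind this question refers to amounts to the following index split (a generic sketch, not the authors' code):

```python
def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation,
    distributing any remainder over the first folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For time series data, a variant that respects temporal ordering (e.g. blocked or forward-chaining splits) would be worth reporting as well.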
- Sect. 4.2:
- Lines 181-182: If two of the stations fall within the same ECHAM grid cell, does this mean the simulated data for these two stations is identical, while the observed data differs? If this is the case, how was this handled in the analysis?
- Lines 186-187: I suggest explaining more clearly which variable the authors refer to with “winter-summer variability” and “spring-summer variability”.
- Lines 193-198: The manuscript mentions splitting the dataset by year (one split per year). Is this a common practice in similar studies? Furthermore, how were missing values handled in the data?
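For context, the per-year split described in Lines 193-198 presumably amounts to holding out one calendar year at a time, along these lines (a minimal sketch with hypothetical record fields, not the authors' code):

```python
def split_by_year(records, test_year):
    """Hold out one calendar year as the test set; train on the rest."""
    train = [r for r in records if r["year"] != test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test
```

Compared with a random split, this keeps temporally adjacent (and hence autocorrelated) samples out of both sets at once, which is presumably the motivation; the question of whether it is standard practice in similar studies still stands.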
- Lines 199-200: Were any other data preprocessing techniques (apart from normalization) applied? If so, please specify.
- Sect. 4.3:
- Lines 209-211: When two variables were found to be highly correlated, which one was dropped? Additionally, is there a risk that intercorrelations among other variables led to unintended feature removal? Including a correlation matrix for all variables would help clarify and visualize the feature selection process.
- Sect. 4.6:
- In general, this section could benefit from a clearer and more concise organization of ideas. Please, revise it.
- Sect. 5.1:
- Figure 3: Please clarify in the caption and text that the figure corresponds to the test dataset.
- Sect. 5.2:
- Figure 4: The overall performance of the ML models appears low. A comparison with previous studies or a discussion of whether a ρ² of ~3 represents a significant improvement would strengthen the interpretation of the results.
- Lines 422-423: The text suggests that the particle number size distribution in Helsinki is the sole factor affecting modeling complexity. Could the authors consider that the input data and variable selection might also contribute to this issue? Focusing solely on a single factor may oversimplify the problem.
- Line 425: The text mentions feature selection, but this is not clearly indicated in Tables S2–S8. Please revise.
- Lines 423-432: It is unclear how the number of features selected for each model was determined. Please clarify.
- Sect. 5.3:
- The section would benefit from a more structured presentation, perhaps summarizing the main conclusions at the end. Clarifying how the feature selection approach interacts with the optimization process and providing a clearer link to the main results of the study would improve the clarity and relevance of this discussion.
- Lines 439-441: Could the authors clarify whether these hyperparameters were intended to be the final optimized values?
- Sect. 5.4:
- Line 496-498: It would be helpful if the authors could specify which features were selected for each model.
- Line 504-508: While the authors use SHAP analysis to indicate feature importance, it is important to note that SHAP values reflect each feature’s contribution to the model’s predictions, not necessarily a causal relationship with the predicted quantity (PNC in this case). For example, the importance of sea salt may result from correlations with other variables (e.g., num_2a6, num_2a7, WAT_2a6, WAT_2a7) more directly linked to PNC. The authors should clarify this distinction and discuss the implications of such proxy features in interpreting the model outputs.
- Supplementary Material:
- The hyperparameter values reported in Tables S2-S8 raise questions. For example, in Table S4, learning rates such as 0.547 (Leipzig Acc) or 0.189 (Helsinki Nuc) are atypically high compared to standard practices in XGBoost modeling, where learning rates are typically in the range of 0.01 to 0.1. Similarly, the large values for regularization parameters (e.g., reg_alpha = 708 for Leipzig Acc) seem unusual and potentially indicative of overfitting or instability. The authors should discuss the implications of these unusual values and whether they align with expected behavior in aerosol-climate model downscaling.
Minor comments:
- Lines 43-44: This sentence looks incomplete.
- Line 91: “were” instead of “was”.
- Line 145-147: Please, revise this sentence.
- Line 164: “a” instead of “an”.
- Line 393-394: Where is this comparison?
- Caption of Figure 3: Is it orange?
Citation: https://doi.org/10.5194/egusphere-2025-774-RC2