This work is distributed under the Creative Commons Attribution 4.0 License.
Detection of structural deficiencies in a global aerosol model to explain limits in parametric uncertainty reduction
Abstract. Understanding and reducing uncertainty in model-based estimates of aerosol radiative forcing is crucial for improving climate projections. A key challenge is that differences between model output and observations can stem from uncertainties in input parameters (parametric uncertainty) or from deficiencies in model code and configuration (structural uncertainty), and these two causes are difficult to distinguish. Structural deficiencies limit efforts to reduce parametric uncertainty through observational constraint because they prevent models from being simultaneously consistent with multiple observations. However, no framework exists to detect structural deficiencies and assess their impact on parametric uncertainty. We propose a workflow to identify structural inconsistencies between observational constraints and diagnose potential structural deficiencies. Using a perturbed parameter ensemble, we sample uncertainty in aerosols, clouds, and radiation in the UK Earth System Model (UKESM), and evaluate model bias against in-situ observations of sulfate aerosol, sulfur dioxide, aerosol optical depth, and particle number concentration across Europe. Applying observational constraints reveals inconsistencies that no combination of the perturbed parameters can resolve. For example, sulfate concentrations in different regions cannot be matched simultaneously, and enforcing a compromise between regions reduces skill across most variables. Additional examples include an inter-region inconsistency in SO2 and an inter-variable inconsistency between aerosol optical depth and sulfate. By examining the parameter sets retained by constraints, we trace inconsistencies to the parameterisations that may cause them and propose targeted changes to address the underlying deficiencies. This approach offers a pathway for evidence-based model development that supports more robust uncertainty reduction and improves climate projection skill.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Atmospheric Chemistry and Physics.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 18 Nov 2025)
- RC1: 'Comment on egusphere-2025-4795', Anonymous Referee #1, 02 Nov 2025
- RC2: 'Comment on egusphere-2025-4795', Hunter Brown, 08 Nov 2025
Review of “Detection of structural deficiencies in a global aerosol model to explain limits in parametric uncertainty reduction” by Prévost et al.
This study describes a framework for diagnosing and identifying potential sources of structural uncertainties within models using a combination of Perturbed Parameter Ensemble (PPE) data (generated using the UKESM) and observational data constraint. The structural uncertainty in question focuses on aerosol-radiation parameterizations within UKESM, and targets European winter observations of sulfate aerosol, sulfur dioxide, AOD, and aerosol particle number concentration. The framework outlined here (1) carefully identifies the key causes of parametric uncertainty in the simulation of the four observable quantities mentioned above; this uncertainty then (2) informs k-means spatial clustering of key parameter influence into three to four main modes over Europe, which are then emulated to produce a robust array of parameter combinations for each cluster and to calculate emulator bias at corresponding observational data points; next (3) observational constraints are applied to corresponding clusters to isolate members that reduce bias, while simultaneously identifying how the single-cluster constraints contribute to other clusters' bias and illuminating structural bias through diagnosis of model precision vs. accuracy; finally, this leads to (4) exploration of structural uncertainties and their potential causes.
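As an illustration of how I read steps (2)-(3), a minimal sketch might look like the following (Python; all names, array sizes, and values are hypothetical, not the authors' code):

```python
# Minimal sketch of the spatial-clustering step, assuming per-gridbox
# parameter-influence scores are available as a (n_gridboxes, n_parameters)
# array. All names and sizes here are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
influence = rng.random((500, 10))  # stand-in for variance-based sensitivities

# Group grid boxes into a few spatial modes that share parameter influence;
# the paper reports three to four such modes over Europe.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(influence)

# Each cluster then gets its own emulator (e.g. a Gaussian process trained on
# the PPE members) to predict the observable for dense parameter samples and
# to quantify emulator bias at the collocated observation points.
for k in range(4):
    print(f"cluster {k}: {np.sum(labels == k)} grid boxes")
```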
This work had several key findings that stood out. One is that constraint to observations in a given sector does not necessarily improve model-observational agreement in other sectors. While this is not the first study to describe this, they support previous work with similar findings (summarized in their introduction) by adding a quantitative analysis of this phenomenon. This also informed inter-sector comparisons showing that the constraints within a sector could lead to competing behavior in different variables. In some cases, improving overall model constraints meant greatly weakening the constraint to observations, introducing more error in the model-to-observation comparisons. This all laid the groundwork for characterizing structural uncertainty in aerosol-radiation interactions in UKESM based on cluster and parameter overlap, identifying different degrees of structural error and potential causes.
While my expertise doesn’t lie in many of the statistical methods applied in this work, I found the paper compelling, interesting, well written, and logically organized. I thought the findings were well-supported by analysis and references, and the figures and application told a cohesive story. The one concern I had stems from a question that I became a bit fixated on as I read the paper: how do you know if the uncertainty is structural or related to the design of your PPE (i.e., missing parameter(s))? In some regards, this is still a structural uncertainty, but it comes from the structure of your PPE instead of your model. The authors do acknowledge the gray area of the structural uncertainty quantification before diving into their results as well as identifying parametric choice as a possible structural flag (they mention this briefly in Section 3.6.1). However, I would have liked to hear their thoughts regarding the potential contribution to inferred structural uncertainty that could come from the actual PPE design. Perhaps this was deemed small through history matching? This was unclear to me, and barring some misunderstanding on my part, it seemed an especially important part of the discussion to include in Section 3.6.1 and in the flowchart. Please see my one major comment and the minor comments below for more information.
Overall, I recommend this paper be accepted after minor revisions.
Major comment:
It seems like the PPE design can have strong implications for the structural uncertainty quantification proposed herein, and the degree to which this is contributing wasn’t always clear. This seemed most apparent when reading the interpretation of the AOD-Sulfate discrepancy in Section 3.6.1, where the authors do mention that this spread could be caused by lack of exploration of the parameter space. While they mention that nitrate and carbonaceous emissions may be factors, could this also be due to a lack of dust emission and RI parameters in their PPE? Assuming dust has a significant contribution to AOD in the European winter along with nitrate and carbonaceous aerosol, if their PPE had included dust emission and RI parameters it seems the discrepancy between AOD and sulfate may not have existed as dust could have been changed within its uncertainty while sulfate could remain unchanged (same for nitrate and carbon).
I think this should be explored briefly by the authors through some supplemental description of aerosol species contributions to AOD across their time period. If dust or carbonaceous aerosol have a larger impact than sulfate then this may inform the PPE design. I would also appreciate a bit more discussion as to the role that the PPE plays in interpretation of structural error. If it is significant and could be identified by some key characteristics such as the divergent parametric behavior noted in Section 3.6.1, it might also be worth adding a connection in Fig. 16 (potentially between “Identify related parameterization” and “create PPE”) that describes some expert elicitation on PPE design. Please see minor comments for in-text details.
Minor comments
Line 75: Does history matching operate on an unchanging set of selected parameters that vary in their values (i.e., a single PPE), or does it operate on multiple PPEs with different parameter lists? Please clarify here and/or in the paper. This gets at a concern I have throughout reading this paper which is how one separates the unexplored parametric uncertainty (i.e., from missing parameters in your PPE) from structural uncertainty. Perhaps history matching gives some confidence in that separation, but if the list of parameters remains the same, it seems one may be missing or mischaracterizing the parametric uncertainty from parameters that haven't been included in the parameter set.
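For context on how this separation is usually attempted: in standard history matching, each candidate parameter setting is scored with an implausibility metric that explicitly budgets for observation, emulator, and structural variance. A generic sketch (not the paper's implementation; all numbers are illustrative):

```python
# Generic history-matching implausibility test; the variance terms and the
# conventional threshold of 3 are illustrative defaults, not the paper's values.
import numpy as np

def implausibility(obs, emu_mean, obs_var, emu_var, struct_var):
    """|z - E[f(x)]| / sqrt(total variance); I(x) > 3 rules the variant out."""
    return np.abs(obs - emu_mean) / np.sqrt(obs_var + emu_var + struct_var)

# A variant must stay plausible under *every* constraint simultaneously, so a
# parameter missing from the PPE (whose effect is absorbed nowhere) behaves
# much like an underestimated struct_var term and can empty the
# not-ruled-out space.
print(implausibility(obs=1.2, emu_mean=2.0, obs_var=0.04, emu_var=0.01,
                     struct_var=0.09) < 3.0)
```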
Lines 282-284: “it means that no amount of parameter retuning will bring the model into agreement with the observations” – this is true, but this is tied to the chosen parameter ranges. How much confidence is there in the preexisting parameter ranges, and could they be expanded? Also, could it be that the inclusion of another parameter to the PPE might change model sensitivity to preexisting parameter ranges, potentially changing the status of the structurally deficient members?
Lines 498-500: Are you able to speak to the variation in AOD bias across Europe? Is the lower bias in Northern Europe related to being close to the source with more consistent seasalt exposure, while the other regions are impacted more by the more unique dynamical conditions that might transport seasalt into the mainland of Europe?
Line 522: change ‘parametrisations’ to ‘parameterisations’
Line 594: “…model variants that match high sulfate concentrations…” - I'm fairly certain this is in reference to Figure 7, but it would be nice for the reader to have a reference to that figure for clarification.
Line 598: “…likely because sulfate is not strongly biased there…” - Please justify this statement with a figure reference or clarification. Fairly certain you are referencing Fig. 2 but being more explicit in this section will make it easier to follow for the reader.
Line 657: Please consider referencing the black points in this citation. Something like: '(black points; Fig. 11d)'.
Line 708: ‘Aerosol sulfate is a large component of AOD in polluted regions…’ - Please quantify in supplementary or cite where this statement comes from. How large a contribution does sulfate have on average to AOD in the time periods analyzed here? I think something like a mass weighted contribution of all aerosol species to AOD could serve as a good reference.
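One simple form this diagnostic could take (hypothetical variable names, placeholder numbers, assuming per-species AOD or vertically integrated extinction diagnostics are available):

```python
# Hedged sketch of the suggested diagnostic: fractional contribution of each
# species to total AOD. Values below are placeholders only, not model output.
species_aod = {
    "sulfate": 0.06, "sea_salt": 0.05, "dust": 0.02,
    "carbonaceous": 0.03, "nitrate": 0.02,
}
total = sum(species_aod.values())
for name, aod in sorted(species_aod.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {100 * aod / total:.0f}% of winter-mean AOD")
```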
Lines 726-729: This is interesting. I wonder if a high sensitivity to dust is driving your AOD overestimation. This may be structural, but it seems it could also be parametric. On this note, how do you differentiate structural deficiencies from parametric uncertainty that wasn't addressed? Couldn't a dust emission parameter be contributing to structural uncertainty due to its not being included in the perturbed parameter list?
Lines 758-760: I think this statement is very important and may require additional elaboration. What stands out to me is that dust/nitrate/carbonaceous emissions and dust optical properties were not perturbed within your PPE framework, both of which could have a large impact on your AOD. I'm not sure how sensitive dust in the UKESM is to meteorological conditions or if it is directly emitted, but I see this as a potential target in this comparison. If dust is indeed the culprit for the AOD overestimate, then sulfate is getting pushed into unrealistic concentrations to account for it. This is also a parametric source of uncertainty, but does this get lumped in with structural uncertainty by virtue of its not being included in the parameter list?
Figure 16, ‘identify related parameterization’ ‘what structural deficiency affects the parameterisation’: Could another option be that the parameter space may be missing a key contributor to the chosen model diagnostic (i.e., AOD)? In this case, would it be necessary to reformulate the PPE or add an additional parameter? Not sure how to identify when this is the case as it may be considered a structural issue but I see this as a potential addition to this workflow.
Line 815: “…they often point directly to process-level assumptions that are missing, misrepresented, or oversimplified.” - I think this is a key comment that could be built on for interrogating the PPE design.
Citation: https://doi.org/10.5194/egusphere-2025-4795-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 272 | 50 | 12 | 334 | 14 | 13 |
The authors provide a workflow for identifying structural uncertainties. They use emulators to create surrogate models and constrain plausible parameter combinations. By examining inconsistencies in observational constraints across variables and regions, they trace these inconsistencies back to the underlying parameterisations, link them to likely structural model issues, and explore their possible causes.
The paper is well written and the figures provide sufficient visual context. There are a few minor concerns that the authors should address before publication.
Comments:
Line 23: change 'region' to 'regions'
Line 25: repeated “them” is vague — does it refer to inconsistencies or parameterisations?
Line 108: suggested
Line 216: Give a brief definition of Generalised Additive Models (GAMs)
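For reference, the generic form such a definition could take (standard GAM notation, not specific to the paper's implementation):

```latex
% A GAM models a transformed conditional mean as a sum of smooth functions
% f_j (e.g. penalised splines) of the predictors, with link function g:
g\!\left(\mathbb{E}[y \mid x]\right) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```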
Line 238: Does “six” refer to the number of grid boxes? Alternatively, is the question about how the number of clusters can be compared with the size of the region?
Line 255: Is linear interpolation applied spatially or temporally during collocation?
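To make the two readings of this question concrete, a typical collocation step distinguishes these spatial choices (a hypothetical xarray sketch, not the authors' code):

```python
# Two common spatial collocation choices: selecting the containing grid box
# (no spatial interpolation) versus linearly interpolating the model field to
# the station coordinates. Names are hypothetical.
import xarray as xr

def collocate(field: xr.DataArray, lat: float, lon: float, linear: bool):
    if linear:
        # bilinear-in-space interpolation of the model field to the site
        return field.interp(lat=lat, lon=lon, method="linear")
    # nearest (containing) grid box, i.e. no spatial interpolation
    return field.sel(lat=lat, lon=lon, method="nearest")
```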
Line 275: what is the definition of ‘model variants that are common’?
Line 299: If the parameter space does not converge across different observational constraints, does this indicate structural uncertainty and suggest that the model structure needs refinement rather than relying on tuning?
Line 326-7: Can the inconsistency also be attributed to emulator uncertainty?
Line 370 (Figure 3): Does each circle or triangle represent one observational site collocated with a model grid box? You may want to clarify this in the caption.
Figure 3: The number of available data points in 3d (N₃) is significantly lower than in the other plots (sulfate, SO₂, and AOD). Could you clarify the reason for this difference? In addition, the spatial locations of data points do not appear to match across the observed variables. This brings me back to a previous question: are the observational data points collocated with the model grid when comparing observations to model outputs?
Lines 503-505: Could the possible explanations also include removal processes (e.g., dry_dep_acc, cloud_drop_acidity) as indicated by Figure 5c?
Lines 655-665: You may want to label the three groups in Figure 11d; you could also use boxes to highlight the different groups, so that it's easier to link the figure with the corresponding text descriptions.
Figure 2B: The emulator uncertainty for N₃ seems large. Does this substantially affect your observational constraints?