the Creative Commons Attribution 4.0 License.
DDA-BNN v1.0: A Morphology-Aware Surrogate Model for the Optical Properties of Black Carbon–Containing Particles
Abstract. Black carbon (BC) is the most strongly absorbing component of atmospheric aerosol and significantly impacts Earth's energy balance. The optical properties of BC-containing particles depend on particle-level variability in size, chemical composition, and internal morphology. Such particle-level details are not easily represented in large-scale atmospheric models. Existing parameterizations typically assume idealized particle geometries (e.g., homogeneous spheres or concentric core–shell spheres) and homogeneous mixing, which can yield biased predictions and provide no quantitative estimate of model-form uncertainty at the single-particle level. In this work, we present a probabilistic framework for predicting the optical properties of individual BC-containing particles using a hybrid Bayesian neural network (BNN) model trained on numerically exact discrete dipole approximation (DDA) simulations. The hybrid BNN is a flexible combination of deterministic and Bayesian layers allowing for more realistic treatment of particle optical properties and quantification of uncertainty. The hybrid BNN model predicts extinction efficiency, single-scattering albedo, and asymmetry parameter and returns predictive uncertainty that can be decomposed into aleatoric (data-driven variability) and epistemic (uncertainty due to limited training coverage) components. We show that the hybrid BNN outperforms homogeneous-sphere and core–shell Mie approximations for calculating extinction and scattering-sensitive quantities (Qext, SSA and g), while maintaining comparable accuracy for absorption-related metrics. We further demonstrate how epistemic uncertainty highlights under-sampled regions of particle parameter space, enabling targeted design of future DDA simulations that most effectively reduce model uncertainty. 
This uncertainty-aware surrogate provides a practical pathway for incorporating realistically complex particle morphologies into parameterizations of aerosol optical properties, which will ultimately improve the reliability of model-based assessments of BC impacts on the atmosphere.
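The aleatoric/epistemic split mentioned in the abstract is commonly computed with the law of total variance over repeated stochastic forward passes. The following is a minimal sketch under that assumption, not the DDA-BNN implementation: the function name, the array layout, and the numbers are hypothetical, and the abstract does not confirm that the authors use this exact estimator.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance using the law of total variance.

    means, variances: arrays of shape (n_passes, n_outputs), one row per
    stochastic forward pass (hypothetical layout, not taken from DDA-BNN).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average predicted noise variance
    epistemic = means.var(axis=0)       # spread of predicted means across passes
    return aleatoric, epistemic, aleatoric + epistemic

# Synthetic stand-in for 4 weight draws and 3 outputs (Qext, SSA, g)
means = np.array([[2.0, 0.50, 0.60],
                  [2.2, 0.52, 0.60],
                  [1.8, 0.48, 0.60],
                  [2.0, 0.50, 0.60]])
variances = np.full_like(means, 0.01)
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
```

In this toy input the third output has identical means in every pass, so its epistemic term is zero and the total variance is purely aleatoric.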
Status: open (extended)
CEC1: 'Comment on egusphere-2026-1270 - No compliance with the policy of the journal', Juan Antonio Añel, 28 Mar 2026
AC1: 'Reply on CEC1', Laura Fierce, 30 Mar 2026
We have revised the manuscript to comply with the GMD Code and Data Policy.
The DDA-BNN codebase, including the training pipeline and inference tools, is publicly available at:
https://github.com/pnnl/DDA-BNN (MIT License).
The exact version of the code used in this study has been permanently archived on Zenodo and assigned a DOI:
https://doi.org/10.5281/zenodo.19324375
(https://zenodo.org/records/19324375)
The training dataset used in this work is publicly available and has also been permanently archived on Zenodo with a DOI:
https://doi.org/10.5281/zenodo.19324185
(https://zenodo.org/records/19324185)
We have updated the “Code and Data Availability” section of the revised manuscript to include these repository links and DOIs, and have added the corresponding citations to the bibliography. All code and data are publicly accessible at the links above for the duration of the discussion phase.
Citation: https://doi.org/10.5194/egusphere-2026-1270-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 30 Mar 2026
Dear authors,
Thanks for addressing this issue so quickly. I have checked the repositories, and we can now consider the current version of your manuscript to be in compliance with the code policy of the journal.
Juan A. Añel
Citation: https://doi.org/10.5194/egusphere-2026-1270-CEC2
CC1: 'Details of the DDA simulations', Maxim A. Yurkin, 03 Apr 2026
The authors have performed a lot of DDA simulations and carefully described the training and validation starting from the DDA dataset (available at Zenodo). However, I couldn't find a description of the ADDA parameters used for simulations. It would be great to specify them for the overall reproducibility, including the ADDA version, DDA formulation, and discretization. Mentioning that some parameters are set to default values will also be fine.
Putting all raw DDA output online would probably be overkill, but the authors may consider sharing the scripts that perform these runs (to build a dataset). They will necessarily contain all the command line options for ADDA. Total or per-run computational requirements of the DDA can also be interesting for readers.
A related question is that of the uncertainty of the DDA simulations (expected errors). Is there any estimate of that, at least for a few representative cases? It can be obtained, e.g., by several steps of discretization refinement for the same problem. I suspect that the corresponding DDA errors are small enough not to influence any conclusions or further steps in the manuscript. Still, it would be great to quantify them.
The latter seems especially natural, since the authors already perform an advanced error analysis, separating aleatoric and epistemic uncertainties. The DDA uncertainty probably falls into the aleatoric class, but I am not sure if it needs any special consideration.
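As an illustration of the discretization-refinement idea raised above, one possible sketch fits the computed quantity against a dimensionless dipole-size parameter and extrapolates to zero dipole size. Everything here is an assumption made for illustration: the quadratic form of the fit, the parameter y = |m|kd, the helper name, and the synthetic values. It does not call ADDA and does not reflect the settings used in the manuscript.

```python
import numpy as np

def extrapolate_to_zero_dipole_size(y, q):
    """Fit q(y) = a0 + a1*y + a2*y**2 and return (a0, |q_finest - a0|).

    y: discretization parameter per run (smaller means finer)
    q: the optical quantity computed at each refinement level
    """
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    a2, a1, a0 = np.polyfit(y, q, deg=2)  # coefficients, highest power first
    finest = q[np.argmin(y)]              # value from the finest discretization
    return a0, abs(finest - a0)

# Synthetic values only: four refinement levels of a made-up Qext
y = np.array([0.8, 0.4, 0.2, 0.1])
q = 2.0 + 0.05 * y + 0.1 * y**2
q_extrapolated, error_estimate = extrapolate_to_zero_dipole_size(y, q)
```

The difference between the finest-grid value and the extrapolated value serves as a rough error estimate for the finest run; with real DDA output the fit would not be exact, and the residuals would need to be inspected as well.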
Finally, I have a minor stylistic note concerning the name of the ADDA code. The official guideline is not to deabbreviate it - see https://github.com/adda-team/adda/wiki/FAQ#what-is-the-official-name-of-the-code-what-does-a-stands-for . In other words, the standard naming is just ADDA.
Similar aspects were recently discussed in another GMD paper - https://doi.org/10.5194/gmd-19-887-2026 .
Citation: https://doi.org/10.5194/egusphere-2026-1270-CC1
RC1: 'Comment on egusphere-2026-1270', Myungje Choi, 18 May 2026
Overall assessment
This study introduces a hybrid Bayesian neural network (BNN) model trained on discrete dipole approximation (DDA) simulations for black carbon (BC)-containing particles. The description of the proposed model is generally clear. A total of eight input features are considered: five particle-related inputs (number of BC particles, coating amount, fractal dimension, volume-equivalent size parameter, and imaginary refractive index of the coating) and three additional inputs derived from homogeneous-sphere Mie calculations (extinction coefficient, single-scattering albedo, and asymmetry parameter). The three model outputs consist of optical properties (extinction coefficient, single-scattering albedo, and asymmetry parameter) from DDA simulations. In addition, the treatment and decomposition of aleatoric and epistemic uncertainty are generally well explained.
Overall, the manuscript is well written, well organized, and easy to follow, except for some details in the machine-learning-related sections where I am not an expert. The scope is well addressed, and the general methodology and evaluation approaches appear reasonable. However, several aspects would benefit from additional clarification. Therefore, I recommend publication after the authors address the comments listed below.
Specific comments
Different assumptions between Ensemble I and II (Table 1 and related discussion): Why do differences exist with respect to wavelength, coating refractive index, AAE of the coating, and other parameters?
Figure 1: What is the distinction between the blue and brown particles? I assume the partially transparent regions indicate coatings, but it was not entirely clear whether the figure simply represents uncoated versus coated particles.
How is the core fractal dimension (Df,c) calculated? It would be helpful to provide at least a brief definition or formulation in the Appendix. In addition, the definition or relationship of (Df,c) with other parameters appears to differ between Ensembles I and II, which was somewhat difficult to follow.
L79: Is it reasonable to assume a single fixed value for the BC refractive index? Does the BC refractive index have a known uncertainty range or variability that should be considered?
L130: Naming of input properties. For clarity, it may be helpful to explicitly distinguish the two groups of input features. For example, the first five inputs could be described as “microphysical/morphological descriptors,” while Qext, SSA, and g could be referred to as “optical-property descriptors” or “physics-based optical descriptors.” The authors do not necessarily need to follow these specific suggestions, but clearer categorization of the input features would improve readability. In particular, I found it somewhat difficult to follow the transition from optical-property-based inputs to the more general term “particle properties.”
Sections 5 & 6: These results appear to demonstrate the superiority of the proposed hybrid BNN over the homogeneous-sphere and core-shell assumptions. However, this outcome seems somewhat expected because the DDA simulations are treated as the reference truth, and the hybrid BNN is directly trained on those DDA-generated optical properties. Therefore, Figures 3–5 may reflect the superiority of morphology-resolving DDA physics itself rather than the unique advantage of the hybrid BNN architecture. I wonder if the authors could clarify more explicitly what aspects of the improvement should be attributed to the Bayesian/surrogate framework versus the underlying DDA-based physical representation.
Figure 5: The SHAP analysis suggests that the homogeneous-sphere optical-property inputs contribute substantially to the hybrid BNN predictions. Since the framework conceptually combines homogeneous-sphere optical information with morphology-aware particle descriptors, it would be very helpful to more directly quantify the relative influence of these two groups of inputs on the final predictions. For example, an analysis comparing the hybrid BNN predictions with the corresponding homogeneous-sphere optical properties could help clarify the extent to which morphology-dependent particle information contributes beyond the baseline optical physics.
Figure 6: In panels B, E, and H, the hybrid BNN appears to exhibit relatively larger errors within portions of the parameter range represented in the training dataset. I may be misunderstanding the interpretation of this figure, but I initially expected the prediction errors to be minimized near the training-domain conditions. Additional clarification on how these perturbed-parameter experiments relate to the original training manifold versus the shaded “training range” would help improve interpretation of the figure.
Uncertainty decomposition and Monte Carlo sampling: The uncertainty decomposition is an important component of the proposed hybrid BNN framework, but the manuscript provides relatively limited examples and interpretation of its behavior across different particle conditions. Are there general patterns or tendencies in the aleatoric versus epistemic uncertainty components with respect to particle size, morphology, coating amount, or wavelength? In addition, how sensitive are the uncertainty estimates to the Monte Carlo sampling sizes used during inference? Some discussion regarding convergence behavior and the choice of sampling size would help clarify the robustness of the uncertainty characterization.
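The question about Monte Carlo sample size could be examined with a simple convergence check: estimate the epistemic spread from the first T stochastic passes and observe how the estimate settles as T grows. The sketch below uses synthetic per-pass means drawn from a normal distribution as a stand-in; the function name, the sample sizes, and the distribution are hypothetical and do not come from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-pass predicted means of one output at one input;
# a real check would collect these from repeated stochastic forward passes.
pass_means = rng.normal(loc=0.5, scale=0.05, size=2000)

def epistemic_std_vs_samples(pass_means, sample_sizes):
    """Epistemic standard deviation estimated from the first T passes."""
    return {T: float(np.std(pass_means[:T])) for T in sample_sizes}

estimates = epistemic_std_vs_samples(pass_means, [10, 50, 200, 1000, 2000])
for T, s in estimates.items():
    print(f"T={T:5d}  epistemic std ~ {s:.4f}")
```

A sample size could be considered adequate once the estimate changes by less than a chosen tolerance between successive values of T; the tolerance itself is a judgment call.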
Section 8: Since epistemic uncertainty is interpreted as uncertainty associated with limited training coverage, it would be interesting to examine how the epistemic and aleatoric uncertainty components change when the size of the training dataset is artificially reduced (e.g., using half of the current training samples). Such an experiment may help demonstrate whether the epistemic uncertainty behaves consistently with the proposed interpretation, and it may be possible without requiring additional DDA simulations.
From the perspective of future radiative transfer and remote sensing applications, it would be interesting to know whether the framework could eventually be extended to predict more detailed angular scattering information beyond the asymmetry parameter, such as single-particle phase functions or phase-matrix-related quantities. Although the current study focuses on single-particle optical properties, such extensions could further increase the applicability of the framework to aerosol optics studies.
Citation: https://doi.org/10.5194/egusphere-2026-1270-RC1
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 618 | 178 | 64 | 860 | 39 | 50 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In your manuscript, you do not provide a repository containing the data used and produced in your work. The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance of replicability of the published papers for years after their publication. Please, therefore, publish your data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The 'Code and Data Availability' section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor