This work is distributed under the Creative Commons Attribution 4.0 License.
Deep learning representation of the aerosol size distribution
Abstract. Aerosols influence Earth's radiative balance by scattering and absorbing solar radiation, affect cloud formation, and play important roles in precipitation, ocean seeding, and human health. Accurate modeling of these effects requires knowledge of the chemical composition and size distribution of aerosol particles present in the atmosphere. Computationally intensive applications like remote sensing and weather forecasting commonly use simplified representations of aerosol microphysics that prescribe the aerosol size distribution (ASD), introducing uncertainty in climate predictions and aerosol retrievals. This work develops a neural network model, termed MAMnet, to predict the ASD and mixing state from the bulk aerosol mass and the meteorological state. MAMnet can be driven by the output of single-moment, mass-based aerosol schemes or by reanalysis products. We show that MAMnet is able to accurately reproduce the predictions of a two-moment aerosol microphysics model as well as field measurements. Our model paves the way to improving the physical representation of aerosols in physical models while maintaining the versatility and efficiency required in large-scale applications.
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-482 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Apr 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you have archived both the GEOS-ESM and the MAMnet code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.

Also, in the Data Availability section of your manuscript you provide generic links to the main web pages of the full datasets, rather than to the specific data that you have used in your work. We cannot accept this. You must provide the exact data that you have used to develop your work; importantly, in the case of the work that you present, the exact data used for the training of the neural network. This is critical to ensure the replicability of your work, and therefore its scientific character.
I have to note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Finally, please, remember that you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the DOI of the new repositories that you create to solve the issues pointed out here.
Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-482-CEC1
RC1: 'Review on egusphere-2025-482', Anonymous Referee #1, 28 Apr 2025
The manuscript describes the development of a neural-network model, MAMnet, trained on model output from GEOS+MAM, with the goal of creating a computationally cheap platform to estimate aerosol size distributions using outputs from bulk aerosol models, with MERRA-2 used as an example. The work is interesting and worthy of publication after my comments below are addressed; most are minor, but some might qualify as major, especially those on the evaluation.
General comments
How does the trained model perform during a different time period? Aerosol concentrations in the 1990s were much higher than they are today; is the model, which is practically driven only by temperature (and air density, which does not change much with climate change), able to capture that time period? More generally, what is the validity range of the model, given its training dataset?
How much computational time is saved? There is no MERRA-2+MAM model, but the comparison between GEOS, GEOS+MAM, MERRA-2, and MERRA-2+MAMnet should be able to provide the necessary information.
I assume it is MAM7 that is used in this work; shouldn't you use this name to distinguish it from other MAM versions?
I am really surprised that only temperature and air density have been used for the meteorological state. I would expect that 3-dimensional wind fields (long-range transport), clouds and precipitation (wet removal, CCN, activation), and surface type (dry deposition) would be of key importance. Clouds can be also important for sulfate formation in the aqueous phase, and then cloud evaporation should affect sulfate size distribution. How can a model be accurate without these processes included?
The lifetime of a single species in MAM (e.g. SU) would depend on the removal rates in each mode, which differ in mode solubility (a function of mode composition) and sedimentation velocity (a function of mode size). The NN training implicitly uses this information, but the NN application in a bulk model like GOCART does not have that distinction when calculating SU mass, so SU is inherently different across models by design. The NN will likely try to compensate for that, but could you comment on this?
Specific comments
Line 9: Replace “physical representation” with “aerosol microphysics representation”. A machine-learned approach is not physics.
Line 24: “of the same size” should be “in the same bin”. Bulk approaches allow particles in different bins to have the same size but different composition, e.g. sulfate vs. nitrate.
Line 25: “they fail to distinguish” is too harsh, please replace with “they are not designed to resolve”. They would fail if they would try to resolve ASD, but they don’t.
Lines 38-39: “These models offer the most physically consistent representation of the ASD” is not necessarily correct, since modal models assume a shape of the size distribution per mode, typically a lognormal, which is an approximation of reality. One could argue that sectional models, which are even more expensive than modal ones, are better, since they can freely calculate the ASD shape without the need of a lognormal, but they also suffer from assumptions needed when moving mass and number from one section to another. Particle-resolved models might be the most realistic ones, but these are practically impossible to use in large-scale models. The point is that mentioning that modal schemes are the most physically consistent is incorrect.
Line 96: Which years were simulated, and 72 vertical levels up to what altitude?
Line 97: Please elaborate on the choice of 9 AM/PM UTC time for the output and especially the 12-hour frequency. Understandably this is a lot of output already, but I would argue that sampling any individual location just twice a day has a high probability to miss the diurnal variability of ASD. I would expect that 4 times a day would be the minimum reasonable sampling frequency, as a first guess.
Section 2.2.1: I do not follow the file counting and usage. 25 were "randomly selected without replacement for training" (what does that mean?), 10 were used "for the testing of the trained model", 100 were "not used during training" (how were they used?). What are these files? Each instantaneous output produces one file, so 2 per day × 365 days × 5 years of files? If yes, what happens with the remaining thousands of files? And how many have been used for training? I see later (lines 139-140) the statement "5 output files for training, 2 for validation", which makes even less sense. Please explain.
Figure 1: Please explain what MAMnet loss is. It is not referenced anywhere else in the manuscript. Also, why is GOCART mentioned? This figure is for the development of the NN, not its application. Isn't GOCART only used for the application?
Table 3: Too many new concepts appear there without explanation. Please help the reader understand what these are, or move this table to an appendix if you consider it too technical to expand on.
Section 3: I would recommend adding a section 3.1, "evaluation against GEOS+MAM", analogous to the current section 3.1, "evaluation against observations", instead of leaving this material under the generic section 3.
Figure 2: Are these global means per layer? Assuming so, is this a good metric, especially for number concentration? Wouldn't doing this regionally be much more meaningful? I appreciate the zonal means and maps later, but my question stands. To be more specific, how can you say "systematic errors emerge" in line 199 without knowing whether this error is widespread or just some very large scattered errors that overwhelm the mean?
Figures 2-3, regarding mass concentrations: what is the model performance in terms of mass conservation? The results per mode do not need to conserve mass, but per species across modes mass conservation is paramount. Thinking even further, how will the mass conservation concept be applied when using MAMnet in production runs?
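One simple way such a constraint could be enforced, sketched here only as an illustration and not as anything the authors describe (array names are hypothetical), is to rescale the predicted per-mode masses of each species so that their sum matches the input bulk mass:

```python
import numpy as np

def renormalize_species_mass(pred_mode_mass, bulk_mass, eps=1e-30):
    """Rescale predicted per-mode masses so a species conserves its bulk mass.

    pred_mode_mass : (n_samples, n_modes) predicted masses for one species
    bulk_mass      : (n_samples,) input bulk mass for the same species
    """
    total = pred_mode_mass.sum(axis=1, keepdims=True)
    scale = bulk_mass[:, None] / np.maximum(total, eps)
    return pred_mode_mass * scale
```

Such a post-hoc renormalization guarantees conservation per species, at the cost of redistributing any prediction error proportionally across modes.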
Lines 253-262, and Figure 6: These are an evaluation against MERRA-2, not observations, as the title of section 3.1 denotes. This whole paragraph and figure would serve well as a conclusion to the discussion just before this section, so consider moving them right after line 247, before section 3.1 starts.
Section 3.1: Although I agree with the motivational first paragraph of this section (lines 249-252), it sounds more like wishful thinking. MAMnet is trained with model data, not measurements, so at its peak performance it will be able to emulate the modeled data. In terms of measurements, it can only be as good as the GEOS+MAM or MERRA-2 models, and any improvement in skill when compared with measurements (if at all evident) will be coincidental, thus irrelevant. What is really missing from both sections 3.1.1 and 3.1.2 is a baseline discussion: how does MERRA-2 alone perform when compared with measurements? Of course MERRA-2 does not simulate the ASD, but biases in the total aerosol mass (per species or not) will impact the ASD. Even more, GEOS+MAM does not include assimilation, so other sorts of biases are likely present in the ASD of the training dataset. Since this paper is about MAMnet, and since section 3.1 as a whole is meant to demonstrate its overall skill, not knowing the skill of the training dataset is a major shortcoming. At the very least, GEOS+MAM should be presented in figures 7 and 8, but a mass concentration comparison (or a citation of past evaluation efforts) should be presented as well.
Section 3.2: please explain what Shapley values are exactly. There is some information in the figure legend, but a short introduction would be useful. Also, since this is a comparison against the model data, I would recommend moving it before the observations sections, so swapping sections 3.1 and 3.2.
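For readers unfamiliar with the term: a Shapley value additively attributes a single prediction to the input features, relative to the expected prediction over a background sample, so a positive value means the feature pushed that prediction upward. A typical computation with the shap package, shown here only as an illustration with hypothetical variable names, looks like:

```python
import shap

# background: a modest reference sample of training inputs,
# used to estimate the expected model output
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test)  # per-sample, per-feature attributions
shap.summary_plot(shap_values, X_test)       # beeswarm summary of feature importance
```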
Line 334: What do you mean by “possibly by promoting secondary aerosol formation” here? Secondary organics will evaporate more at higher temperatures, while secondary inorganic aerosols will have a more complex relationship depending on relative humidity as well.
Technical corrections
Line 44: Change “ML models, we can” to “ML models can”.
Line 79: Add “of different sizes” after “five mass bins”.
Line 80: Replace “hydrophilics” with “hydrophilic”.
Line 86: Table 2 is referenced before Table 1.
Line 97: Replace “these” with “that”.
Figure 1: rho_air is mentioned in the legend, but it is termed AIRD in the figure.
Line 109: Replace “Kg” with “kg”.
Lines 179 and 181: “the original MAM” and “GEOS+MAM” are the same thing, right? Please use one terminology throughout, for clarity.
Line 214: “smaller and less massive” is the same, why not just say “smaller”?
Line 217: Replace “near-perfect” with “very high”.
Line 223: Replace “sulfates” with “sulfate”.
Line 260: Replace “accurate” with “accurately”.
Line 311: Replace “tends align” with “tends to align”.
Figure 9: Please add a figure legend that explains the color lines, on top of the verbal description present in the caption.
Line 363: Replace “predicted concentrations” with “predicted number concentrations”.
Citation: https://doi.org/10.5194/egusphere-2025-482-RC1
RC2: 'Comment on egusphere-2025-482', Anonymous Referee #2, 12 May 2025
This manuscript uses a global aerosol microphysics model to train a neural network model to estimate aerosol size distributions and mixing state from bulk aerosol masses. This is overall a useful contribution. However, I feel that the overview of microphysics methods and understanding needs to be improved, and I'd like the authors to evaluate their results against the typical way of estimating the size distribution and CCN from bulk masses. Once these and the specific issues below have been addressed, I am supportive of this manuscript being published.
Editorial note: Often, the figures are quite far from where they are discussed in the manuscript, which requires a lot of scrolling or flipping. In most cases, it seems like it would have been straightforward to have arranged the figures to be closer to their discussion.
Major comment
I feel that the paper is missing the #1 test of ML size distributions that I’d want to see. The easiest (and usual) way to get size distributions (and CCN etc.) from a bulk model is simply to assume a fixed size distribution for each species. In your case, it would be good to just use the global average distributions from MAM as a global conversion from bulk to a size distribution. I’d like to know how much better MAMnet is compared to this simplest approximation, which to me is the test of the value of the ML.
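For concreteness, such a baseline, sketched here under the assumption of one fixed lognormal per species with illustrative parameter values, would convert bulk mass directly to number concentration:

```python
import numpy as np

def bulk_mass_to_number(mass_conc, rho_p, d_pg, sigma_g):
    """Number concentration implied by a fixed lognormal size distribution.

    mass_conc : aerosol mass concentration [kg m-3]
    rho_p     : particle density [kg m-3]
    d_pg      : geometric mean (number) diameter [m]
    sigma_g   : geometric standard deviation [-]
    """
    # Mean particle volume of a lognormal: <V> = (pi/6) Dpg^3 exp(4.5 ln^2 sigma_g)
    mean_volume = (np.pi / 6.0) * d_pg**3 * np.exp(4.5 * np.log(sigma_g) ** 2)
    return mass_conc / (rho_p * mean_volume)

# e.g. 1 ug m-3 of sulfate with Dpg = 0.1 um, sigma_g = 1.8 -> ~2e8 m-3
n = bulk_mass_to_number(1e-9, 1800.0, 1e-7, 1.8)
```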
Specific comments
L8: This needs more information than “two-moment”. You track two moments for 7 different modes, so this is a two-moment *modal* scheme. There are just plain two-moment (or X-moment) schemes (without assuming a modal shape, e.g., the MATRIX scheme in the GISS climate model) or two-moment sectional schemes (two moments in each size section; e.g., Adams and Seinfeld, 2002 referenced in the manuscript), so I wasn’t sure what you were referring to when originally reading the abstract.
L22: Bulk models don’t need to have bins (in the model you describe later, most species don’t have bins). The key is that even if there are bins, there are no microphysics calculations (nucleation, condensation, coagulation, etc.) that would let the size distribution evolve.
L23: Bulk models do not need to assume external mixing. You can assume that at any given size (you would need to assume a size distribution), all species are mixed into the same particles (internal mixing). Just because GOCART assumes external mixing doesn't mean that you need to assume external mixing with a bulk model.
L32: two moments of the ASD *for each mode*.
L34: Again, bulk schemes can “handle” internal mixing (you just assume it). What modal schemes can do (assuming they are simulating multiple modes) is to have an explicit calculation of which particles are internally vs. externally mixed that varies in space and time. You also haven’t established that models can have multiple modes yet.
L46 and throughout. Like the other reviewer said, it’s common with MAM to put the number of modes after (in this case MAM7). However, I believe there may be multiple MAM configurations with 7 modes in the literature (e.g., I believe that one has a nucleation mode, and this one does not).
L45: There is other work to parameterize bulk aerosol mass to the size distribution, such as https://doi.org/10.1029/2021GL094133 and https://doi.org/10.5194/acp-23-5023-2023
L97: “From these simulation*s*”
L110: “a a”
L116: My brain wants to read “During training data” as one phrase. Please add a comma after “training”.
L168-177: Please elaborate more on how these methods estimate CCN. Do you expect them to be better than your model estimates? I would guess a lot of assumptions go into these products, so I wouldn’t necessarily expect them to be a useful evaluation. (Note: Figures 8 and 9 show huge disagreements between the datasets.)
L185: [-1, 1] would be the range for “within an order of magnitude of the target value”. [-0.5, 0.5] is an order of magnitude window around the target value (or within a factor of about 3.2).
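In symbols, assuming the usual definition of the mean log bias:

$$\mathrm{MLB} = \frac{1}{n}\sum_{i=1}^{n}\log_{10}\frac{\hat{y}_i}{y_i}, \qquad |\mathrm{MLB}| \le 1 \;\Leftrightarrow\; \text{within an order of magnitude}, \qquad |\mathrm{MLB}| \le 0.5 \;\Leftrightarrow\; \text{within a factor of } 10^{0.5} \approx 3.16.$$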
Figure 2: (1) The colormap is a strange choice for a diverging colorbar in the lower panel. It would be better to make it white in the middle. (2) Would be easier to interpret if it were rotated such that height was the y axis. (3) Have the surface (1000 hPa) be at the origin rather than the model top.
Figure 2 and discussion around line 200: I suspect that the challenge in MAMnet predicting the Aitken mode may stem from the difficulty of predicting when/where nucleation is occurring. If other inputs that may help predict nucleation, like solar radiation and SO2, were included, it might do a better job with the Aitken mode. Also possible is that fresh fossil-fuel combustion emissions (vs. aged in the accumulation mode) into the Aitken mode might be hard to predict, and NOx as an input might help with this as the NOx lifetime is on a similar order as a typical aging timescale (12-24 hours) and they tend to be co-emitted.
L234-236: This sentence overstates things. Relative variability in Dpg is strongly buffered to relative variability in Mass/Number since it goes with the cube root of this ratio. For example, factor-of-2 error in M/N would only be a 26% error in Dpg. Is the MLB of 0.01 really that remarkable or surprising given that it’s much more stable than M/N?
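To spell out the arithmetic behind this buffering:

$$D_{pg} \propto \left(\frac{M}{N}\right)^{1/3} \quad\Rightarrow\quad \text{a factor-of-2 error in } \frac{M}{N} \text{ changes } D_{pg} \text{ by only } 2^{1/3} - 1 \approx 26\%.$$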
L255: Do we expect the observational constraints (just polar-orbiter AOD in cloud-free regions, right?) on MERRA-2 to improve the relative balance of species masses? My understanding is it just scales the mass of all species in the column up/down until AOD is pushed closer towards the obs.
L271: How did you sample the model for the high-altitude sites? These sites are tricky since they are often at a much higher altitude than the gridbox mean altitude. Sometimes they are in the PBL, sometimes they aren’t. I recommend just leaving them out.
Figure 6 and some other discussions: Is there a way to make MAMnet conserve the mass of the inputs? This seems like a critical thing to do. Also, I recommend ug kg-1 or sm-3 rather than kg kg-1 since people are used to thinking of aerosol masses in ug m-3.
Figure 7: Is this any better or worse than how MAM itself does? I suspect they both have similar issues.
Figures 8 and 9: Are these products that are used for comparison any good? They vary so much, and I’m guessing that there are a lot of assumptions that go into getting CCN from the products.
Figure 9: Please add a legend to the figure rather than stating the colors in the caption.
L313: Please explain what a Shapley value is. What does a high or low feature value mean?
L348-349: Isn’t there a way to just force MAMnet to conserve the mass of the inputs?
L352-353: Are these better than the reference test (fixed size dist for each species) that I described above?
L356-359: Like my earlier comment, “exceedingly well” is an overstatement. Dpm is buffered to errors in Mass/Number.
Citation: https://doi.org/10.5194/egusphere-2025-482-RC2
RC3: 'Comment on egusphere-2025-482', Anonymous Referee #3, 27 May 2025
The use of a deep learning model to approximate the aerosol size distribution from bulk mass inputs is interesting and operationally valuable. Integration with MERRA-2 opens opportunities for reanalysis and assimilation improvements.
Following are my line-by-line comments:
L24-27: You may also flag that some modal schemes like GLOMAP in UK Met Office Unified Model assume a lognormal shape for each mode with prescribed geometric standard deviation and each mode is internally mixed. (in L87-88 you do mention something similar for another model)
L35: Consider specifying orders of magnitude, or citing a study quantifying what is really "better" for representing the ASD in models.
L56-57: Clarify what is meant by “meteorological state”—mention that it includes only temperature and air density up front, since this is unexpectedly minimal and a key methodological decision.
L77-79: Clarify whether the model includes any simplified representation of aerosol growth, aging, or wet removal in GOCART (even if parametrized), because "transport and evolution" may be construed to cover many physical phenomena.
L87-88: Suggest clarifying whether the geometric mean diameter is prognosed or computed diagnostically.
L96-97: What years were simulated? Why only two time points per day? This sparsity might miss diurnal features. How were the 25 output files selected, and what does "one file" correspond to (a single timestamp across the globe?)? The use of only 25 files for training seems low given the mention of >100M samples later. Please clarify.
L98-100: Add a sentence on whether aerosols evolve freely in these simulations or are constrained by observations. Can you clarify to which model levels these "horizontal winds" were nudged, and to what extent they affect aerosol number concentrations?
L104-105: Better to specify (a sketch of the alternatives follows the bullets):
- Was standardization applied before or after the log10 transformation?
- Are temperature and air density standardized globally or per level?
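For concreteness, the two alternatives these bullets distinguish might look like the following (purely illustrative data and shapes):

```python
import numpy as np

# Illustrative (samples, levels) field of positive mass concentrations
x = np.random.lognormal(mean=-20.0, sigma=2.0, size=(1000, 72))

x_log = np.log10(np.clip(x, 1e-30, None))  # log10 first; clip guards against zeros

# Option 1: global statistics over all samples and levels
x_std_global = (x_log - x_log.mean()) / x_log.std()

# Option 2: per-level statistics (one mean/std per vertical level)
x_std_per_level = (x_log - x_log.mean(axis=0)) / x_log.std(axis=0)
```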
L110-115: Clarify whether this Dpg was compared only during evaluation, or whether it was ever used in the loss function. Please state your loss function, as some physics-informed neural network models have modified it as well.
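For reference, a physics-informed variant of a standard regression loss, shown here purely as an illustration and not as the manuscript's actual formulation, could add a Dpg-consistency penalty:

```python
import torch

def composite_loss(pred, target, dpg_pred, dpg_target, alpha=0.1):
    # Standard MSE on the (transformed) mass and number outputs ...
    mse = torch.mean((pred - target) ** 2)
    # ... plus a penalty tying the diagnosed Dpg to its target value
    dpg_penalty = torch.mean((dpg_pred - dpg_target) ** 2)
    return mse + alpha * dpg_penalty
```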
L116-120: Clarify whether the flattened fields are shuffled across time and space, or whether there’s structure preserved (e.g., batches by time or region). Were any vertical or horizontal correlations exploited or lost?
L125-134: Were other architectures considered (e.g., transformers, residual connections)? If not, briefly justify.
L139-140: The earlier statement (line 97-98) says 25 files used for training, but here it says “5 for training, 2 for validation.” I think I am missing something here?
L163-175: Briefly discuss how errors in Dpg propagate to aerosol number concentration errors for CCN.
L244: The underestimation of Dpg in the SH is attributed to low data availability. Could it also be due to extrapolation error, since MAMnet may have learned associations biased toward NH-dominant training data? Could this be tested by applying class reweighting in the loss function (see the sketch below)?
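Such a reweighting could be as simple as a per-sample weighted MSE, sketched here with hypothetical hemisphere-frequency weights:

```python
import torch

def weighted_mse(pred, target, weights):
    # weights: per-sample, e.g. inverse hemisphere sample frequency, so that
    # underrepresented Southern Hemisphere samples contribute larger gradients
    return torch.mean(weights * (pred - target) ** 2)
```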
On reading the conclusion, the following questions came to mind:
- L340-344: Under what conditions does this input feature set (bulk mass, T, ρ) suffice? Where do predictions degrade (e.g., strong vertical motions, boundary layer transitions)? Why were other physically relevant predictors (RH, precipitation, cloud fraction, wind) excluded? Does this limit the model's use in complex meteorological regimes? Without input features tied to wet/dry removal, nucleation, or chemical aging, can the model really be used in weather forecasting or satellite retrievals across diverse regions, even when we find high correlations?
- L345-350: MAMnet is trained only on MAM model outputs, so how does the model avoid learning MAM's own biases? Can we say that evaluation against MERRA-2 is not necessarily independent since the training data is nudged to MERRA-2 meteorology?
- L351-357: Can MAMnet conserve total aerosol mass by design, or does this emerge from the calculation? This is never proven numerically, only implied via Dpg.
L358-370: There’s no attribution of error—how much is due to MAMnet, and how much due to MERRA-2 inputs?
Other than these, my overarching general comments are as follows:
- Unclear what one “file” represents—single timestep? Single day? Entire global field?
- It is also unclear whether any temporal or spatial overlap exists between train/test sets.
- No analysis on extrapolation over different time periods (e.g., pre-2000). The network is trained on a 5-year window using meteorology from MERRA-2 (likely post-2000). How would the model perform in periods with different emissions (e.g., 1980s)? Alternatively, discuss potential limitations in extrapolating to past or future climate states.
- How is the SHAP analysis computed over such a high-dimensional sample space (using any explainer method)? Was it computed on the flattened single-level dataset? How do you deal with feature correlation?
- Is MAMnet architecture resolution-agnostic? Though you use single-level training to make the model resolution-independent, how would MAMnet perform in coarser (~2.5°) or finer (<1°) gridded input?
Citation: https://doi.org/10.5194/egusphere-2025-482-RC3