the Creative Commons Attribution 4.0 License.
CLEO: The Fundamental Design for High Computational Performance of a New Superdroplet Model
Abstract. CLEO is a Super-Droplet Model (SDM) designed for performance portability on high performance computer architectures and with the intention of modelling warm-clouds in domains large enough to resolve shallow mesoscale cloud organisation O(100 km). This paper introduces CLEO’s novel C++ implementation of SDM, in particular how we map SDM theory to computations which optimise performance, primarily by conservative memory usage and efficient memory access patterns. To further speed-up simulations and to ensure a portable and maintainable code, we avoid conditional code branching and implement thread parallelism through the Kokkos library. As a result CLEO shows optimal linear scaling with increasing number of superdroplets and can use CPU and GPU thread-parallelisation across a diverse range of computer architectures. But CLEO is not just a model for computational performance, it is also designed for warm-cloud process understanding. CLEO possesses a high degree of flexibility, especially with regard to the configuration of microphysical processes and data output, that makes it well-suited to analysing sensitivity to microphysics. CLEO is therefore a new SDM ready to be used for understanding warm-cloud processes.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-4398', Anonymous Referee #1, 30 Nov 2025
- RC2: 'Comment on egusphere-2025-4398', Anonymous Referee #2, 01 Dec 2025
I commend the authors for filling what is generally a gap in the cloud microphysics literature, namely detailing implementation choices and challenges - in this case for a particle-resolved cloud model.
The paper discusses such aspects as memory structures, time-stepping logic, adaptation of language-specific abstractions to domain-relevant challenges, and performance bottlenecks in parallel computing. Documented discussion of these aspects is valuable for the community, matches the scope of GMD, and is not easily discoverable in the documentation resources for other cloud modelling projects.
The key aspect of the paper where I see room for improvement (assuming that the scope and simulations stay unchanged) is the provision of broader context, namely:
- mention alternative technologies and how they compare to the employed C++20;
- mention alternative parallelisation strategies for the SDM (e.g., where Monte-Carlo collisional growth parallelisation is done in parallel across all pairs in multiple cells of a given subdomain, rather than - IIUC - serially within cell);
- elaborate on the technological challenges stemming from the spatially uneven resource requirements of particle-based simulation (i.e., grid cells with clouds vs. grid cells without clouds), and from the fact that the employed simulations inherently cover aerosol budget (implementing aerosol sources should also benefit from the proposed memory layout);
- elaborate on the implications of the implementation choices in terms of maintenance (e.g., the Monoids should facilitate unit testing, although the contents of the `tests` folder in CLEO does not seem to leverage it yet...).
Furthermore, I suggest making it clear upfront, in the abstract and introduction, that the development - at the stage presented in the paper - is not tested at exascale, not even in a multi-node setup, and only with one-way coupling with a prescribed-flow driver (the Edmond dataset metadata keywords misleadingly include: Climate Modelling, LES, Exascale Computing, ICON).
Starting with the abstract, there are several bold statements regarding optimal performance, efficient access patterns, cache efficiency, load balancing, and speed-ups, which later on in the text are clarified as based on intuition rather than comparison against alternative implementations. Matching the abstract and introduction with the content of the paper will improve reception.
Given that the paper's intended audience seems to be the tech-aware research software engineering community, I suggest revising the technical parts of the paper to include references to alternative technologies. For instance, the "flexibility" featured in the title of Section 4 would likely be termed "working around the stiffness of C++" from a just-in-time-compiled language perspective. Covering it in the paper is fine, but I suggest rephrasing - making sure that the paper is readable, approachable and appreciable for audiences ranging from (modern) Fortran to Julia coders. For instance, when writing about C++20 concepts, it would be apt to refer to Julia's multiple-dispatch alternative.
The title is missing software version, which is required by GMD guidelines. Moreover, the title suggests that a new model is introduced, but the paper outlines a new implementation of a "classic" model.
The abstract focuses on "marketing", is repetitive, and does not convey a summary of what is in the paper (e.g., what kind of simulations were performed); I suggest rewriting it to focus on the key results of the presented development and to avoid vague phrasing (e.g., "a diverse range of computer architectures", "a high degree of flexibility", "ready to be used for understanding warm-cloud processes"; all of these would best be changed from "a" to "the" statements: which architectures, what kind of flexibility, which processes, and what do you mean by understanding?)
The SDM acronym was introduced by Shima et al. as Super-Droplet Method; here it is expanded as Super-Droplet Model (not Method) - is this intentional? Also, it seems unclear what the authors mean by SDM. On page 1, lines 31-32, SDM is used in the broad sense of super-particle microphysics, with works employing contrasting particle-collision representations cited (probabilistic, deterministic, super-particle-conserving and not); on page 17, line 418, SDM is used to label three works not modelling particle collisions at all; and on page 1, line 42, SDM is associated with "its coalescence algorithm", implying that SDM is used in the specific sense of the Shima et al. 2009 Monte-Carlo algorithm. Yet, on page 6, line 146, there is a statement about "CLEO's SDM algorithms" (plural). This is in many ways misleading. Please state at the beginning what is meant by SDM: particle-based microphysics methods (which predate SDM, and would likely better be called particle-based microphysics), the Shima et al. Monte-Carlo algorithm, the whole set of schemes from the SDM paper, or something else? This is also important given the later conclusions about random-shuffling bottlenecks - the paper does not provide enough context to understand where, when and why the shuffling is needed.
Figures 1, 2, 3, 6, 7, 8 and 9 are supplied in raster graphics format; please replace all figures with vector graphics suitable for publication. When replotting the figures, ensure that the font size roughly matches the font size of the main body text - currently most labels in the plots are too small.
Other comments:
- abstract: "conservative memory usage" is mentioned; however, first: it is not explained in the paper how an alternative memory usage would imply more memory needs (in contrast, it is mentioned that the employed sorting algorithm doubles the memory footprint!); second: the paper states that CLEO abstracts away such details as if the storage layer is based on linked lists or contiguous arrays.
- page 1 / line 13: "observations of warm-rain" is a very broad statement; this very first sentence of the paper would benefit from clarification on what kind of observations you have in mind, what is the type of disagreement (total amount, timing, size spectrum?). For instance, the abstract of the vanZanten work cited to support the disagreement statement states: "The ensemble average of the simulations plausibly reproduces many features of the observed clouds..."
- page 1 / line 16: "obscurity in cloud microphysics" calls for elaboration; obscurity suggests a lack of process understanding (next sentence also mentions "fundamental gaps in our microphysics knowledge"). It is fair to say we don't know everything about cloud microphysical processes, but especially given that the study deals with warm-rain clouds, it would be best to explain what is unknown and how much are the knowledge gaps limiting us vs. how complex are the challenges in modelling well understood but multi-scale and non-linear processes.
- page 1 / first paragraph: overall, I find the first three sentences (with 15 references) "obscuring" the idea of the paper, rather than introducing it. Clearly subjective, but perhaps the two above points can help in rephrasing it.
- page 1 / line 21: super-particle model is later on labelled as Lagrangian, perhaps labeling the "conventional" methods as Eulerian could help in pointing out the difference?
- page 1 / line 43: the commonly used term is "embarrassingly" rather than "extremely parallelisable"; given the scope of the paper, it seems reasonable to point here out that the reason for it is the non-overlapping pair sampling, while the GPUs enable one to leverage it.
- page 1 / line 46: this statement applies to bin models only, while "conventional" has been associated with both bulk and bin earlier; moreover it seems worth mentioning that for "bin" models, two attributes are already a rarity (Lebo & Seinfeld 2011), while employing more attributes is needed to resolve mixed-phase processes or chemical ageing in the context of aerosol-cloud interactions.
- page 4 / line 118: suggest splitting SCALE and ICON references into separate parentheses;
- page 6 / line 153: "sequentially" suggests serial execution;
- page 7 / line 170: I do not agree with the statement "SoA layout would need to perform sorting/shuffling on the array for each individual sub-component of the superdroplets separately", since an SoA layout could feature an indirection layer mapping particle id through a (single) permutation vector;
- page 7 / line 189: "to advect thermodynamics" sounds too jargonic (and suggests that it is CLEO which does the advection, but IIUC it does not);
- page 16 / line 363: please elaborate on how a "2-D kinematic flow" was adapted to a 3-D domain?
- page 16 / line 364: feedback from microphysics is a different thing than relaxation - likely "rather than" is a wrong phrase here;
- page 16 / line 381: suggest rephrasing "suffers from microphysics" and "suffers from motion";
- page 17 / line 374: if only serial random shuffling is featured, it is worth highlighting earlier on;
- since serial shuffling of particles has been identified as a bottleneck, parallel alternatives for permutations (e.g., MergeShuffle, arXiv:1508.03167) or parallel pseudorandom number generation (e.g., cuRAND for CUDA) should be considered... but on page 17, line 406 "creation of thread-safe random number generators" is mentioned - unclear;
- page 17 / line 411: "can divided" missing "be";
- page 19 / lines 472-475: the "Code availability" and "Code and data availability" sections should be merged, and the licensing terms of the code and its key dependencies should be stated.
- in references, 19 entries have malformed URLs: https://doi.org/https://doi.org/...
- in references, there are discussion-stage papers cited, for which accepted peer-reviewed papers are available: Yin et al. 2023, Matsushima et al. 2023;
- in references: please double check bibliography formatting, e.g., the Takasuka et al. 2024 paper has its 2023MS003701 id given four times
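The indirection layer mentioned in the comment on page 7 / line 170 can be sketched as follows. This is a hypothetical illustration, not CLEO code, and all names are invented: the SoA attribute arrays stay in place, and sorting (e.g. by grid-box index) permutes only a single index vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch of an SoA layout with a permutation vector: the
// attribute arrays themselves are never reordered.
struct SuperdropletsSoA {
    std::vector<int> gridbox;       // grid-box index of each superdroplet
    std::vector<double> radius;     // one further attribute, as an example
    std::vector<std::size_t> perm;  // indirection: logical -> physical index

    SuperdropletsSoA(std::vector<int> gb, std::vector<double> r)
        : gridbox(std::move(gb)), radius(std::move(r)), perm(gridbox.size()) {
        std::iota(perm.begin(), perm.end(), std::size_t{0});
    }

    // Sorting touches only perm; no attribute array is moved, so only one
    // shuffle is needed regardless of the number of attributes.
    void sort_by_gridbox() {
        std::sort(perm.begin(), perm.end(),
                  [this](std::size_t a, std::size_t b) {
                      return gridbox[a] < gridbox[b];
                  });
    }

    int gridbox_at(std::size_t logical) const { return gridbox[perm[logical]]; }
    double radius_at(std::size_t logical) const { return radius[perm[logical]]; }
};
```

The trade-off, which the paper could discuss, is that every attribute access then goes through one extra indexed load, which may or may not outweigh the cost of sorting each attribute array separately.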
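On the shuffling/random-number points above, one standard pattern worth contrasting with the paper's approach is per-cell (or per-thread) generators with deterministic seeds, so that shuffles in different grid cells share no random-number state and could run concurrently. A hypothetical sketch, not CLEO code:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical illustration: each grid cell shuffles its superdroplet ids
// with its own deterministically seeded generator (Fisher-Yates), so
// per-cell shuffles are independent and parallelisable across cells.
void shuffle_cell(std::vector<int>& ids, std::uint64_t cell_seed) {
    std::mt19937_64 gen(cell_seed);  // private to this cell/thread
    for (std::size_t i = ids.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(ids[i - 1], ids[pick(gen)]);
    }
}
```

Whether this corresponds to the "thread-safe random number generators" mentioned on page 17, line 406, or whether a parallel in-cell permutation (e.g. MergeShuffle) is intended, is exactly what the paper should clarify.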
Citation: https://doi.org/10.5194/egusphere-2025-4398-RC2
CEC1: 'Comment on egusphere-2025-4398 - No compliance with the policy of the journal', Juan Antonio Añel, 08 Dec 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Also, you have archived the data used and produced in your work in Edmond; however, we can not accept Edmond as a platform to publish your data.
Therefore, the current situation with your manuscript is irregular. Please, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we can not accept manuscripts in Discussions that do not comply with our policy.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the information of the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4398-CEC1
CC1: 'Reply on CEC1 (repost from comment on companion manuscript)', Bjorn Stevens, 10 Dec 2025
This comment follows and summarizes some offline discussions. First I want to express my appreciation for the journal's efforts to promote open science. My concern is that an overly principled approach becomes artificial and creates a false sense of openness without addressing the real problems, which are cultural. First I address this point in general, and then more specifically for the manuscript at hand.
For the general issue: a strict and formal approach creates a bureaucratic solution (or attempted solution) to a cultural problem. The cultural problem is open science. Using frozen repos to provide snapshots of an implementation only creates the appearance of reproducibility. Worse is that it puts the onus on the implementation rather than the idea. I say 'appearance of reproducibility' because most complex codes only run on the machines they were developed on. We estimated that we spent a good person year porting a version of ICON that ran on NV GPUs (P100s) on Piz Daint to Juwels Booster, which used A100s and a different software stack. We estimated we spent 15 person years to get ICON to run on a different architecture (chip vendor and compiler) on LUMI. So for all practical purposes, for the complex codes, where really implementing an idea oneself is difficult, posting the code snapshot to a repo becomes a bureaucratic act that often is a misleading waste of time and energy.
Meaningful reproducibility is a manuscript that describes the methods well enough that someone can reproduce the results. Here the results are not the exact numbers that the author produces, but the ideas that the numbers are used to advance. Ideas should not depend on the details of the implementation and if they are then they are not robust, and this sort of testing is something the scientific process is designed to sort out. Simply posting code snapshots, if they run at all, confuses the implementation with the idea, and there is a real scientific interest in having them separated.
Furthermore having code as part of a living repo, for instance as a tagged version in an open development on GitHub (or similar), provides a much more organic connection between the developers and eventual users of a code, as the users can then contribute, incorporate bug fixes and so on. If a bug fix changes the numbers but not the conclusions of a study it isn't something that should imply an addendum and comment in the journal, but it might change the conclusions of a different study that someone else does, and if so they should be rather directed by the manuscript to the open development as the reference on record.
How we share our science is a cultural issue. GMD would do itself and the community a great favor by revisiting its policies, something that could be initiated through an open discussion. In revisiting this I would rather look for ways to communicate best practices, by highlighting authors and reviewers who practice them. In the end however, we don't want to restrict the field of ideas by saying that we don't want contributions that only present the idea but not the implementation. If our policies become overly prescriptive we effectively make assumptions that have the effect of making science less open.
As for the more specific issue. The concern with EDMOND seems to be that the authors can retain editing rights and hence delete their repo after posting it. Why not simply ask the authors to say that they won't delete the code on the repo. I understand that moving the trust issue from the authors to the maintainers of the repo seems to give an additional layer of protection, but we also know there are many cases where data is deleted from public repos accidentally by the maintainers. We also know that institutions that maintain repos will go out of business, so why do we emphasize institutional trust rather than personal trust and behavior. All the more so why this singular focus on the implementation of an idea (the code) which in the end is supplementary. Maybe this distinction is more clear if we recognize that no one would envision that every implementation of the same idea merits publication. The idea not the implementation is the contribution.
In the present case a practical solution would be to work with repositories to give them more options to support good practice. All the more so if the repositories are funded by major underwriters of the journal, as is the case here with MPDL being a major supporter of the Copernicus journals. I ask the chief editors to take these points into consideration and review their policies, and also let the manuscript proceed through review as they do so. In the end if they decide not to revise the policies, or if a satisfactory solution cannot be worked out with EDMOND, then we as authors would be faced with the decision to comply (upload a snapshot to zenodo) before final publication.
Citation: https://doi.org/10.5194/egusphere-2025-4398-CC1
CC2: 'Flaws of the GMD archive standards', Tobias Kölling, 11 Dec 2025
I agree, that reproducibility is not copy-and-paste-ability. The priority should clearly be reproducibility of ideas and scientific results. But still I think there’s some value in making sure there’s a way to get the exact version back, e.g. when trying to re-implement something on a different system, but you can’t reproduce anything, it’s super valuable to be able to dig through the precise version of the code, even if you can’t run it anymore. So, while in practice, most people would like to be able to access the ongoing development process, being able to retrieve a verifiable copy forever is a good (secondary) goal. I however think that if one would go formalizing this strictly (and thus put some significant burden on authors), we should at least do this properly.
So my understanding of this goal for archiving code and data is twofold: There should be an immutable reference to immutable content (e.g. the exact version of the code used), and this version should be permanent (i.e. impossible to delete by anyone particular). I understand that the GMD “archive standards” are aiming to help here, but I think the current formulation is not particularly sound. This is why I fully support the idea of revisiting these policies from CC1.
The archive standards:
- recommend a few links to find appropriate repositories: "Springer Nature, PLOS or ESSD"
- The Springer Nature Link contains a reference to this up-to-date article, which says “Springer Nature’s previous list of recommended repositories. Please note these are no longer enforced, and the list is no longer updated, but the list is retained for those who may require suggestions”, and refers to e.g. "DataCite’s Repository Finder", which of course lists EDMOND, so this I guess should be fine.
- The PLOS and ESSD links go straight to a 404 Not Found page…
- don’t give a reason for recommending Zenodo, apart from “Many GMD authors find Zenodo a suitable archival location”, which is a bit short.
- In particular, Zenodo allows manual deletion of records by the author within 30 days of publication and past this timeline by contacting support. The former case is realistic if a change is published late in the review. In the latter case, it must be justified, but I guess if someone really wants to do this, one could find reasons (e.g. based on the right to be forgotten, or by discovering that some piece of code might have been unlawfully copy-and-pasted, which may not be shared)
- To my knowledge, it’s not easy to externally verify that Zenodo actually keeps a given git commit hash without having GitHub available, as Zenodo usually doesn’t record the commit itself, but just the contents of a particular tree in the git repositories history. However, being able to externally verify the stored contents actually match the cited version would obviously help in ensuring the referenced content is immutable.
- don’t give a reason why “Usually, a third-party archive is preferable.”, and in particular, no reason why any single one should suffice. In order to make it impossible to delete the contents, there should be multiple independent copies.
- don’t clarify what “institutional support providing reasonable confidence that the material will remain available for many years/decades” actually means (e.g. does 3 count as many years?, what is “reasonable”?)
- state as a requirement: “mechanisms for identifying the precise version of the material referred to in a persistent way. This will usually be a DOI.”
- The DOI Handbook says “The DOI name is persistent over time. Its persistence is provided by the independence of the identifier name from the element values, in particular from the entity localization or ownership. These elements can change over time: through the DOI name resolution, users will always get the up-to-date element values (This requires that the DOI record data be regularly maintained.).”
- I would already conclude that a DOI is not at all a mechanism for “identifying the precise version of the material referred to in a persistent way”, precisely because it “requires that the DOI record data be regularly maintained”… A DOI is a persistent identifier for objects, not an identifier for persistent objects. If one is about to impose a large bureaucratic burden aiming to ensure immutability and persistence of the objects, I think building the entire system on top of trusting anyone to regularly maintain some record without any further externally verifiable consistency checks is not at all justified.
- Assuming we can trust people following best practices isn’t easy here, as already this GMD discussion prominently shows that the journal doesn’t care too much about best practices with respect to DOIs, e.g. the comment above is referenced by this DOI: https://doi.org/10.5194/egusphere-2025-4398-CC1, but both Crossref (head of Paper-DOIs) and DataCite (head of Data-DOIs) prominently state in their best practices that DOIs should not carry semantic information. Still, I can very clearly map egusphere to a preprint, 2025 to this year and CC1 to the first community comment.
- state “… GitHub … are made for code development but not suitable for archiving frozen code versions”. In a way, this is literally wrong, as there’s the GitHub Arctic Code Vault, where code is literally frozen in an archive in the permafrost of Svalbard.
I mean, yes, one could say “put it to Zenodo and we guess it’ll be fine”, and maybe that’s actually true - not because of the reasons stated in the “archive standards”, but because I think they are good people. But basically saying “everyone is bad, Zenodo saves us all” without any reasoning seems to be a bit short for a scientific community. Especially on the grounds that apparently the referenced Springer, PLOS and ESSD have already backed off…
To move this forward in a constructive way:
- I agree, we should first evaluate the pros and cons of introducing bureaucracy.
- To my current understanding, the best way to ensure immutable content is to compute a cryptographic hash (or even better multiple) of the content and keep it with the reference (e.g. print it in the paper, e.g. by embedding it in the reference link).
- To make content permanently available, it’s best to keep the content at multiple, independent places.
- I would like to learn about why distributed version control systems like git and associated hosting sites don’t make up a good archiving solution. Git already provides e.g. a SHA1 hash, and e.g. the Journal could create a fork of every repository which has been referenced in a publication, thus preventing the original authors from deleting the content. If that’s not enough, we could go to another git hosting platform and have a second fork there.
- I would like to learn why DOIs help in providing immutability and persistence more than a permanent hash-based ID.
- I would like to see a revised version of the “archive standards”, which
- reflect on the pros and cons of using a DOI for this
- if necessary, contain actually working references to recommended repositories
- explain the advantages of non git-based repositories
- If DOIs are used, I’d ask to follow the best practices
Citation: https://doi.org/10.5194/egusphere-2025-4398-CC2
CEC2: 'Reply on CC2', Juan Antonio Añel, 12 Dec 2025
Dear authors,
Regarding the compliance of the manuscript, and considering the communication offline via email: given that Edmond is capable of removing the permissions for modification from the authors of a deposited asset, we could accept Edmond to publish the assets of your manuscript if the managers of Edmond confirm to us that they have removed the mentioned permissions for the authors. Therefore, we kindly ask you to contact the Edmond managers so that they can do this and then reply to this comment with confirmation, ideally providing some kind of evidence of the modification.
Regarding the broad discussion on the policy of the journal, many thanks for the insights, we will take them into account. We are very aware of the many limitations it has, and we work on it continuously to try to ensure the replicability of the manuscripts submitted to GMD. To try to address a minor question, GitHub is not acceptable because it is a product of a for-profit company that could end its operations or delete all their contents without any kind of control, whenever and without explanations. Also, GitHub instructs users to migrate to Zenodo repositories when they do academic work and need to get permanent identifiers for their assets.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4398-CEC2
CC3: 'Reply on CEC2', Bjorn Stevens, 13 Dec 2025
Dear Editor Team,
thanks for this constructive and pragmatic decision.
Certainly I appreciate the problems with more of our intellectual property somehow being managed by the private sphere, which is one of the main reasons for supporting journals like the EGU family of journals and (the new) Tellus, which are not only scholarly run and edited, but also work with presses that share the same values, e.g., Copernicus and Ubiquity. On the longer term, I very much hope that GMD takes on the challenge of continually revisiting policies and mechanisms to improve the culture and practice of open access.
Citation: https://doi.org/10.5194/egusphere-2025-4398-CC3
CC4: 'Reply on CEC2', David Walter, 16 Dec 2025
Dear editor,
I hereby confirm that I have removed all write permissions for the Edmond dataset https://doi.org/10.17617/3.LNRKSJ, meaning that the author can no longer make any changes to the dataset or remove data. If any further update should be needed in the dataset (e.g. updating the reference to the paper by adding the final paper DOI), that would be done by us (Edmond management team) in a new dataset version (same DOI), while keeping the current dataset version 1.1 unchanged. For any questions you can contact us via edmond@mpdl.mpg.de.
Kind regards,
David
----------------------------------------------------------------
David Walter
Research Data Management | Collections | rdm.mpdl.mpg.de
Edmond service lead | https://edmond.mpg.de
Max Planck Digital Library (MPDL) | www.mpdl.mpg.de
Landsberger Straße 346, 80687 München, Deutschland
E-Mail: d.walter@mpdl.mpg.de
----------------------------------------------------------------
Citation: https://doi.org/10.5194/egusphere-2025-4398-CC4
CEC3: 'Reply on CC4', Juan Antonio Añel, 17 Dec 2025
Dear authors,
Many thanks for this confirmation. We can consider now your manuscript in compliance with the policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4398-CEC3
Review of "The Fundamental Design for High Computational Performance of a New Superdroplet Model" by Bayley et al.
This is an interesting manuscript, but very computer-science oriented. It describes a framework for a new super droplet code. There is not much scientific content. Normally this would be supplementary material to an actual set of scientific simulations, even for Geoscientific Model Development. It seems basically a proof of concept. It almost seems like this needs to be in a computer science journal. I am not sure how GMD should treat that; it's up to the editor on that front. There is nothing really wrong with the manuscript as a description of the technical aspects of a framework for a new superdroplet code. I also note that much of the computer science material is beyond my expertise. I was hoping for more scientific content. I'm not sure how ready for scientific content this code is.
I had some minor specific questions and clarifications. The larger issue of whether this is appropriate for GMD I leave to the editor.
Minor comments:
Page 1, L16: ‘obscurity’ is not the right word. Uncertainty?
Page 2, L41: However, SDM still requires use of collision / collection kernels, which are uncertain….might need to note that. Bin schemes have the same issue (and bulk schemes don’t represent this at all).
Page 3, L65: Define CLEO with first use in text as well as the abstract.
Page 3, L68: Define exascale computer
Page 3, L77: What other aspects of performance?
Page 3, L82: Please describe what a monoid set is. Maybe with an example. See below.
Page 7, L191: How will the array of super droplets with different grid positions be handled by advection? Is this a problem for efficiency if that part of the code wants to loop over grid boxes?
Page 9, L227: Is the basic intent then to run the SDM on a different grid than the dynamics? See comments above about advection. Can the SDM do its own advection if one-way coupled?
Page 9, L230: For the non-mathematically inclined. Can you give a simple example of a monoid? The description is not that clear. What are some binary operations? What is a semi group? What is an identity element?
Page 11, L289: But what if process A is estimated with a rate that would result in say complete removal of drops, and then halfway through that A step process B depletes more? How do you harmonize different process rates with different timeteps?
Page 15, L319: How does Figure 5a relate to Figure 5b? It is not clear how Figure 5b plugs into 5a.
Page 16, L360: This drop concentration is quite high and represents very polluted conditions; 100 cm-3 would be more reasonable over land. Does that affect the results? You would likely get more precipitation. Are you trying to delay it?
Page 30, Figure 6: What time in the simulation is this? 80 min? Also, why does # superdroplets = # grid boxes? One per grid box?
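The kind of monoid example requested above (P3/L82 and P9/L230) could be as minimal as the following. This is a hypothetical illustration, not the paper's definition: a monoid is a set together with an associative binary operation and an identity element; a semigroup is the same structure without the identity.

```cpp
#include <functional>

// Hypothetical illustration of a monoid: a set T, an associative binary
// operation op on it, and an identity element such that
// op(x, identity) == op(identity, x) == x for every x in T.
template <typename T>
struct Monoid {
    std::function<T(T, T)> op;
    T identity;
};

// Example: the integers under addition form a monoid with identity 0.
// (Dropping `identity` would leave a semigroup.)
const Monoid<int> int_sum{[](int a, int b) { return a + b; }, 0};
```

In the paper's context, one might imagine microphysical processes combined by an associative "apply one, then the other" operation whose identity is the do-nothing process; whether that matches the authors' intended definition is what the text should spell out.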