This work is distributed under the Creative Commons Attribution 4.0 License.
CLEO: The Fundamental Design for High Computational Performance of a New Superdroplet Model
Abstract. CLEO is a Super-Droplet Model (SDM) designed for performance portability on high performance computer architectures and with the intention of modelling warm-clouds in domains large enough to resolve shallow mesoscale cloud organisation O(100 km). This paper introduces CLEO’s novel C++ implementation of SDM, in particular how we map SDM theory to computations which optimise performance, primarily by conservative memory usage and efficient memory access patterns. To further speed-up simulations and to ensure a portable and maintainable code, we avoid conditional code branching and implement thread parallelism through the Kokkos library. As a result CLEO shows optimal linear scaling with increasing number of superdroplets and can use CPU and GPU thread-parallelisation across a diverse range of computer architectures. But CLEO is not just a model for computational performance, it is also designed for warm-cloud process understanding. CLEO possesses a high degree of flexibility, especially with regard to the configuration of microphysical processes and data output, that makes it well-suited to analysing sensitivity to microphysics. CLEO is therefore a new SDM ready to be used for understanding warm-cloud processes.
Status: open (until 17 Dec 2025)
- RC1: 'Comment on egusphere-2025-4398', Anonymous Referee #1, 30 Nov 2025
- RC2: 'Comment on egusphere-2025-4398', Anonymous Referee #2, 01 Dec 2025
I commend the authors for filling what is generally a gap in cloud microphysics literature, namely detailing the implementation choices and challenges - in this case for a particle-resolved cloud model.
The paper discusses such aspects as memory structures, time-stepping logic, adaptation of language-specific abstractions to domain-relevant challenges, and performance bottlenecks in parallel computing. Documented discussion of these aspects is valuable for the community, matches the scope of GMD, and is not easily discoverable in the documentation resources for other cloud modelling projects.
The key aspect of the paper where I see room for improvement (assuming that the scope and simulations stay unchanged) is the need to provide a broader context, namely:
- mention alternative technologies and how they compare to the employed C++20;
- mention alternative parallelisation strategies for the SDM (e.g., where Monte-Carlo collisional growth parallelisation is done in parallel across all pairs in multiple cells of a given subdomain, rather than - IIUC - serially within cell);
- elaborate on the technological challenges stemming from the spatially uneven resource requirements of particle-based simulation (i.e., grid cells with clouds vs. grid cells without clouds), and from the fact that the employed simulations inherently cover aerosol budget (implementing aerosol sources should also benefit from the proposed memory layout);
- elaborate on the implications of the implementation choices in terms of maintenance (e.g., the Monoids should facilitate unit testing, although the contents of the `tests` folder in CLEO does not seem to leverage it yet...).
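As a technical aside on the second bullet above, the alternative pair-parallel strategy could be sketched as follows (illustrative only, not CLEO's actual code; `all_pairs` and `cell_offsets` are hypothetical names): because shuffled candidate pairs within a cell never share a superdroplet, pairs from all cells form one flat index space whose iterations are mutually independent, suitable for a single parallel kernel over all pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattening of Monte-Carlo collision pairs across cells:
// `cell_offsets[c]` is the first superdroplet id of cell c (plus one
// trailing end offset). After per-cell shuffling, consecutive droplets
// (i, i+1) form candidate pairs; since no two pairs share a droplet,
// every pair in every cell can be processed by one parallel kernel.
struct Pair { std::size_t a, b; };

std::vector<Pair> all_pairs(const std::vector<std::size_t>& cell_offsets) {
    std::vector<Pair> pairs;
    for (std::size_t c = 0; c + 1 < cell_offsets.size(); ++c)
        for (std::size_t i = cell_offsets[c]; i + 1 < cell_offsets[c + 1]; i += 2)
            pairs.push_back({i, i + 1});  // non-overlapping within cell
    return pairs;  // one flat index space for a parallel loop over pairs
}
```

With an odd number of superdroplets in a cell, the last one is simply left unpaired for that collision step.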
Furthermore, I suggest making it clear upfront, in the abstract and introduction, that the development - at the stage presented in the paper - is not tested at exascale, nor even in a multi-node setup, and only with one-way coupling to a prescribed-flow driver (the Edmond dataset metadata keywords misleadingly include: Climate Modelling, LES, Exascale Computing, ICON).
Starting with the abstract, there are several bold statements regarding optimal performance, efficient access patterns, cache efficiency, load balancing, and speed-ups, which later on in the text are clarified as based on intuition rather than comparison against alternative implementations. Matching the abstract and introduction with the content of the paper will improve reception.
Given that the paper's intended audience seems to be the tech-aware research software engineering community, I suggest supplementing the technical parts of the paper with references to alternative technologies. For instance, the "flexibility" featured in the title of Section 4 would likely be termed "working around the stiffness of C++" from a just-in-time-compiled language perspective. Covering it in the paper is fine, but I suggest rephrasing - making sure that the paper is readable, approachable and appreciable for audiences ranging from (modern) Fortran to Julia coders. For instance, when writing about C++20 concepts, it would be apt to refer to Julia's multiple dispatch as an alternative.
The title is missing software version, which is required by GMD guidelines. Moreover, the title suggests that a new model is introduced, but the paper outlines a new implementation of a "classic" model.
The abstract focuses on "marketing", is repetitive, and does not convey a summary of what is in the paper (e.g., what kind of simulations were performed); I suggest rewriting it to focus on the key results from the presented development and to avoid vague phrasing (e.g., "a diverse range of computer architectures", "a high degree of flexibility", "ready to be used for understanding warm-cloud processes"; all of these would best be changed from "a" statements to "the" statements: which architectures, what kind of flexibility, which processes, and what do you mean by understanding?).
The SDM acronym was introduced by Shima et al. as Super-Droplet Method; here it is expanded as Super-Droplet Model (not Method) - is this intentional? It is also unclear what the authors mean by SDM. On page 1, lines 31-32, SDM is used in the broad sense of super-particle microphysics, with works employing contrasting particle-collision representations cited (probabilistic, deterministic, super-particle-conserving and not); on page 17, line 418, SDM labels three works not modelling particle collisions at all; on page 1, line 42, SDM is associated with "its coalescence algorithm", implying the specific sense of the Shima et al. 2009 Monte-Carlo algorithm. Yet on page 6, line 146, there is a statement about "CLEO's SDM algorithms" (plural). This is in many ways misleading. Please state at the beginning what is meant by SDM: particle-based microphysics methods in general (which predate SDM and would likely better be called particle-based microphysics), the Shima et al. Monte-Carlo algorithm, the whole set of schemes from the SDM paper, or something else? This is also important given the later conclusions about random-shuffling bottlenecks - the paper does not provide enough context to understand where, when and why the shuffling is needed.
Figures 1, 2, 3, 6, 7, 8 and 9 are supplied in raster graphics format; please replace all figures with vector graphics suitable for publication. When replotting the figures, please ensure that the font size roughly matches the font size in the main body of the paper - currently most labels in the plots are too small.
Other comments:
- abstract: "conservative memory usage" is mentioned; however, first: it is not explained in the paper how an alternative memory usage would imply more memory needs (in contrast, it is mentioned that the employed sorting algorithm doubles the memory footprint!); second: the paper states that CLEO abstracts away such details as if the storage layer is based on linked lists or contiguous arrays.
- page 1 / line 13: "observations of warm-rain" is a very broad statement; this very first sentence of the paper would benefit from clarification on what kind of observations you have in mind, what is the type of disagreement (total amount, timing, size spectrum?). For instance, the abstract of the vanZanten work cited to support the disagreement statement states: "The ensemble average of the simulations plausibly reproduces many features of the observed clouds..."
- page 1 / line 16: "obscurity in cloud microphysics" calls for elaboration; obscurity suggests a lack of process understanding (next sentence also mentions "fundamental gaps in our microphysics knowledge"). It is fair to say we don't know everything about cloud microphysical processes, but especially given that the study deals with warm-rain clouds, it would be best to explain what is unknown and how much are the knowledge gaps limiting us vs. how complex are the challenges in modelling well understood but multi-scale and non-linear processes.
- page 1 / first paragraph: overall, I find the first three sentences (with 15 references) "obscuring" the idea of the paper, rather than introducing it. Clearly subjective, but perhaps the two above points can help in rephrasing it.
- page 1 / line 21: super-particle model is later on labelled as Lagrangian, perhaps labeling the "conventional" methods as Eulerian could help in pointing out the difference?
- page 1 / line 43: the commonly used term is "embarrassingly" rather than "extremely parallelisable"; given the scope of the paper, it seems reasonable to point out here that the reason for it is the non-overlapping pair sampling, which GPUs enable one to leverage.
- page 1 / line 46: this statement applies to bin models only, while "conventional" has been associated with both bulk and bin earlier; moreover it seems worth mentioning that for "bin" models, two attributes are already a rarity (Lebo & Seinfeld 2011), while employing more attributes is needed to resolve mixed-phase processes or chemical ageing in the context of aerosol-cloud interactions.
- page 4 / line 118: suggest splitting SCALE and ICON references into separate parentheses;
- page 6 / line 153: "sequentially" suggests serial execution;
- page 7 / line 170: I do not agree with the statement "SoA layout would need to perform sorting/shuffling on the array for each individual sub-component of the superdroplets separately", since an SoA layout could feature an indirection layer mapping particle id through a (single) permutation vector;
- page 7 / line 189: "to advect thermodynamics" sounds too jargonic (and suggests that it is CLEO which does the advection, which it does not, if I understand correctly);
- page 16 / line 363: please elaborate on how a "2-D kinematic flow" was adapted to a 3-D domain?
- page 16 / line 364: feedback from microphysics is a different thing than relaxation - likely "rather than" is a wrong phrase here;
- page 16 / line 381: suggest rephrasing "suffers from microphysics" and "suffers from motion";
- page 17 / line 374: if only serial random shuffling is featured, it is worth highlighting earlier on;
- since serial shuffling of particles has been identified as a bottleneck, parallel alternatives for permutations (e.g., MergeShuffle, arXiv:1508.03167) or parallel pseudorandom number generation (e.g., cuRAND for CUDA) should be considered; however, on page 17, line 406, "creation of thread-safe random number generators" is mentioned - the relationship between the two is unclear;
- page 17 / line 411: "can divided" missing "be";
- page 19 / lines 472-475: the "Code availability" and "Code and data availability" sections should be merged, and the licensing terms of the code and its key dependencies should be stated.
- in references, 19 entries have malformed URLs: https://doi.org/https://doi.org/...
- in references, there are discussion-stage papers cited, for which accepted peer-reviewed papers are available: Yin et al. 2023, Matsushima et al. 2023;
- in references: please double check bibliography formatting, e.g., the Takasuka et al. 2024 paper has its 2023MS003701 id given four times
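Regarding the page 7 / line 170 comment above, the indirection-layer alternative could be sketched as follows (a hypothetical layout, not CLEO's code; `SuperdropsSoA`, `order` and `sort_by_gridbox` are illustrative names): a single permutation vector records the cell-sorted traversal order, so no attribute array is physically shuffled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA superdroplet store with an indirection layer: each
// attribute lives in its own array, and a single permutation vector
// `order` records the cell-sorted traversal order, so none of the
// attribute arrays is shuffled when superdroplets move between cells.
struct SuperdropsSoA {
    std::vector<double> radius;
    std::vector<double> multiplicity;
    std::vector<int> gridbox;        // cell index of each superdroplet
    std::vector<std::size_t> order;  // permutation: ids in sorted order

    // Re-sort by gridbox after motion by permuting `order` alone.
    void sort_by_gridbox() {
        std::sort(order.begin(), order.end(),
                  [this](std::size_t a, std::size_t b) {
                      return gridbox[a] < gridbox[b];
                  });
    }

    // Radius of the k-th superdroplet in cell-sorted order.
    double radius_sorted(std::size_t k) const { return radius[order[k]]; }
};
```

The trade-off is one extra indirection on every sorted-order access, versus one sort of a single index array instead of one per attribute.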
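On the thread-safe random number generation point above, one common pattern (a sketch under assumptions, not what CLEO does; `uniform_for_pair` is an illustrative name) is deterministic per-work-item seeding, mimicking what counter-based generators such as Philox in cuRAND provide: each collision pair gets an independent, reproducible stream regardless of thread scheduling.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Sketch of thread-safe randomness via deterministic per-pair seeding:
// every (seed, pair index) combination yields its own generator state,
// so parallel threads never share mutable RNG state and results are
// reproducible independently of execution order.
double uniform_for_pair(std::uint64_t global_seed, std::uint64_t pair_idx) {
    std::seed_seq seq{global_seed, pair_idx};
    std::mt19937_64 gen(seq);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}
```

Constructing a generator per work item costs more than a true counter-based generator, but illustrates the same statelessness property.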
Citation: https://doi.org/10.5194/egusphere-2025-4398-RC2
Review of "The Fundamental Design for High Computational Performance of a New Superdroplet Model" by Bayley et al.
This is an interesting manuscript, but very computer science oriented. It describes a framework for a new super-droplet code. There is not much scientific content. Normally this would be supplementary material to an actual set of scientific simulations, even for Geoscientific Model Development. It seems basically a proof of concept; it almost seems like this needs to be in a computer science journal. I am not sure how GMD should treat that. It's up to the editor on that front. There is nothing really wrong with the manuscript as a description of the technical aspects of a framework for a new superdroplet code. I also note that much of the computer science material is beyond my expertise. I was hoping for more scientific content, and I'm not sure how ready for scientific content this code is.
I had some minor specific questions and clarifications. The larger issue of whether this is appropriate for GMD I leave to the editor.
Minor comments:
Page 1, L16: ‘obscurity’ is not the right word. Uncertainty?
Page 2, L41: However, SDM still requires use of collision / collection kernels, which are uncertain….might need to note that. Bin schemes have the same issue (and bulk schemes don’t represent this at all).
Page 3, L65: Define CLEO with first use in text as well as the abstract.
Page 3, L68: Define exascale computer
Page 3, L77: What other aspects of performance?
Page 3, L82: Please describe what a monoid set is. Maybe with an example. See below.
Page 7, L191: How will the array of superdroplets with different grid positions be handled by advection? Is this a problem for efficiency if that part of the code wants to loop over grid boxes?
Page 9, L227: Is the basic intent then to run the SDM on a different grid than the dynamics? See comments above about advection. Can the SDM do its own advection if one-way coupled?
Page 9, L230: For the non-mathematically inclined: can you give a simple example of a monoid? The description is not that clear. What are some binary operations? What is a semigroup? What is an identity element?
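The kind of simple example asked for here might look like the following sketch (illustrative only, not CLEO's code; `State`, `Process`, `combine` and `do_nothing` are hypothetical names): processes acting on a state form a monoid under sequential composition, with the do-nothing process as the identity element.

```cpp
#include <cassert>
#include <functional>

// Illustrative sketch of a monoid: the set of "processes" acting on a
// state, the binary operation being sequential composition (which is
// associative), and the do-nothing process as the identity element.
using State = double;                         // stand-in for droplet state
using Process = std::function<State(State)>;  // one microphysical process

// Binary operation: apply p, then q.
Process combine(Process p, Process q) {
    return [p, q](State s) { return q(p(s)); };
}

// Identity element: leaves any state unchanged.
const Process do_nothing = [](State s) { return s; };
```

A semigroup is the same structure without the requirement of an identity element; here `combine(do_nothing, p)` and `combine(p, do_nothing)` behave identically to `p`, which is what makes the structure a monoid rather than just a semigroup.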
Page 11, L289: But what if process A is estimated with a rate that would result in, say, complete removal of drops, and then halfway through that A step, process B depletes more? How do you harmonize different process rates with different timesteps?
Page 15, L319: How does Figure 5a relate to Figure 5b? It is not clear how Figure 5b plugs into 5a.
Page 16, L360: This drop concentration is quite high and represents very polluted conditions; 100 cm-3 would be more reasonable over land. Does that affect the results? You likely would get more precipitation. Are you trying to delay it?
Page 30, Figure 6: what time in the simulation is this? 80 min? Also, why does # superdroplets = # grid boxes? One per grid box?