This work is distributed under the Creative Commons Attribution 4.0 License.
The Ocean Model for E3SM Global Applications: Omega Version 0.1.0. A New High-Performance Computing Code for Exascale Architectures
Abstract. Here we introduce Omega, the Ocean Model for E3SM Global Applications. Omega is a new ocean model designed to run efficiently on high performance computing (HPC) platforms, including exascale heterogeneous architectures with accelerators, such as Graphics Processing Units (GPUs). Omega is written in C++ and uses the Kokkos performance portability library. These were chosen because they are well-supported, and will help future-proof Omega for upcoming HPC architectures. Omega will eventually replace the Model for Prediction Across Scales-Ocean (MPAS-Ocean) in the US Department of Energy's Energy Exascale Earth System Model (E3SM). Omega runs on unstructured horizontal meshes with variable-resolution capability and implements the same horizontal discretization as MPAS-Ocean. In this paper, we document the design and performance of Omega Version 0.1.0 (Omega-V0), which solves the shallow water equations with passive tracers and is the first step towards the full primitive equation ocean model. On Central Processing Units (CPUs), Omega-V0 is 1.4 times faster than MPAS-Ocean with the same configuration. Omega-V0 is more efficient on GPUs than CPUs on a per-watt basis, by a factor of 5.3 on Frontier and 3.6 on Aurora, two of the world's fastest exascale computers.
Status: open (until 04 Jan 2026)
- RC1: 'Comment on egusphere-2025-3500', Seiya Nishizawa, 01 Dec 2025
- RC2: 'Comment on egusphere-2025-3500', Anonymous Referee #2, 18 Dec 2025
The manuscript “The Ocean Model for E3SM Global Applications: Omega Version 0.1.0. A New High-Performance Computing Code for Exascale Architectures” describes the design and testing, and demonstrates the performance and performance portability, of the Omega ocean model, which is intended to replace MPAS-Ocean in the future. The model solves the shallow water equations with passive tracers and is therefore in an early phase of development.
The manuscript is well organized and has all the contents that I would expect from a model documentation paper for modern architectures. It describes the usage of Kokkos in an ocean model, which is probably one of the first such attempts. I recommend publication, as I have only a few minor questions, comments, and suggestions.
- Writing style: I strongly suggest that the authors review the introduction. It reads more like a personal account than a scientific introduction. Specifically, the three options for porting the model are described in three large paragraphs detailing the concerns of the group involved. I think the community is well past that stage and does not need a long introduction to the status quo.
- Line 53 and a few other places: it would be good to avoid using “our group” and “we”, if possible.
- Why did the authors decide to publish at this early stage? Why not wait for the full ocean model to be ready?
- Line 54: The authors write that MPAS-Ocean is only half ported to OpenACC because of the code structure. Could you please elaborate?
- Lines 90+: “The choice of Kokkos required our …” – this again reflects your experience. See whether the sentence can be rephrased to make it more objective.
- Line 105+: It seems that the team has added an additional layer of abstraction on top of Kokkos. What will happen when these people leave? Is there a strategy behind this?
- Related, for my own understanding: why did you decide to write the code yourselves (domain experts) when there is a trend in the community to have it written by software engineers?
- 11: is not discrete but the line above it says it is. Please check.
- Line 164: “operator convergence rates” – do you mean grid convergence of individual operators? Please check.
- Lines 190+, on multiple domain decompositions: I understand the benefits of supporting multiple domain decompositions, but the sentence is not clear to me. Could you please rephrase?
- Line 202: “We have” – try rephrasing without “we”.
- Trott et al., 2022b has been cited a few times for Kokkos. Please check.
- Line 251+, for my own understanding: it is mentioned that manual tuning of kernel execution resulted in a 10–20% performance gain over MDRangePolicy. How and when did the team realize that more could be achieved by avoiding what Kokkos offers?
- Lines around 275: The authors mention that the ability to fuse or not fuse functors is advantageous, as it allows one to try out different configurations. I wonder whether this is practically feasible when there are several functors and many possible combinations to try.
- Figure 18 and related text: the comparison with MPAS-Ocean on GPUs is unfair, since it is not fully ported and likely not tuned. I think it is fine to use it as a reference, but I would suggest stating this explicitly in the text.
Citation: https://doi.org/10.5194/egusphere-2025-3500-RC2
Comments on
Title: The Ocean Model for E3SM Global Applications: Omega Version 0.1.0. A New High-Performance Computing Code for Exascale Architectures
Authors: Mark R. Petersen et al.
MS No.: egusphere-2025-3500
MS type: Model description paper
General Comments
This manuscript describes Omega-V0.1.0, a new C++/Kokkos-based ocean model for E3SM targeting performance portability across heterogeneous CPU/GPU architectures. The paper provides a clear scientific motivation for the rewrite from MPAS-Ocean, presents the governing equations and discretization in sufficient detail, and includes a broad set of verification tests and multi-platform performance benchmarks. The performance results, especially on multiple exascale-class GPU systems, are a valuable contribution to the community and align well with the objectives of GMD model description papers.
That said, several clarifications are still needed to strengthen reproducibility and to help readers interpret key results. In particular, the paper should provide more concrete explanations of why OpenACC offloading was limited in MPAS-Ocean, supply missing experimental details for the benchmarks, and expand the discussion of some performance claims (for example, regular versus unstructured mesh equivalence, CPU–GPU work partitioning). I also encourage the authors to discuss how the current performance conclusions are expected to extend to Omega-V1 when more complex physical parameterizations are added.
Overall, the manuscript is strong and suitable for publication after minor-to-moderate revisions focused on clarification and consistency.
Specific Comments
The authors list four competing GPU programming approaches. Given the focus on portability, it would be useful to briefly mention recent language-standard based parallel models (for example, C++ and Fortran standard parallelism), and position them relative to the four categories already listed.
The manuscript explains that only about half of MPAS-Ocean could be accelerated with OpenACC and that this led to small kernels and poor throughput. Please add a concise, concrete explanation of which specific structural aspects of MPAS-Ocean prevented directive-based offload (for example, dynamic data structures, or control-flow complexity).
The text states that Omega was developed by a small group mainly composed of domain scientists, and that Kokkos abstractions were simplified for legibility. Given that Omega-V1/V2 will require substantial physics and infrastructure development, it would be valuable to comment on how the Omega developer community is expected to grow (e.g., anticipated contributors from E3SM and the broader ocean/atmosphere community) and on practical strategies for enabling uptake by scientists less familiar with C++.
The description of GPU-aware MPI and the observed 4–6× speedup is clear, but key experimental parameters are missing. Please specify halo width, number and type of variables communicated per step, whether variables were packed separately or aggregated, and total and per-call message sizes.
Please clarify whether you tested or considered other memory orderings of the 3-D fields in Kokkos (changing which index is contiguous), and why the current choice (vertical index contiguous) is expected to be optimal across CPU and GPU architectures. In particular, for vertically dependent physics, non-coalesced access on GPUs could become a bottleneck; a short justification or discussion of tested layouts would be helpful.
The claim that performance is “equivalent” between regular Cartesian and unstructured spherical meshes is not explained. Please clarify what metric “equivalent” refers to and why indirect or irregular accesses in unstructured meshes do not measurably degrade performance.
The mesh is described as a regular hexagonal grid, and the test cases are labeled as 1024×1024×96 and 2048×2048×96. However, the mapping between the “1024×1024” notation and the reported horizontal cell counts (approximately one million and four million, respectively) is not obvious for a hexagonal mesh. Please add a brief explanation of what the 1024 and 2048 represent and how these translate to the stated horizontal cell numbers.
The manuscript notes full utilization of CPUs and GPUs. Please describe how workload sharing between CPU and GPU is determined: automatic or manually tuned.
Table 5 uses fewer CPUs in GPU simulations than in CPU-only simulations. Please explain why the CPU count differs.
Figure 7 is not cited in the text. Please either reference and explain it or remove it.
Omega’s tracer transport tests are conducted without FCT, whereas the manuscript reports the MPAS-Ocean convergence rate only for the FCT case (2.42). To enable a clearer like-for-like comparison, please also provide the MPAS-Ocean convergence rate without FCT and discuss whether that baseline is comparable to Omega’s 1.36 rate.
CPU runtimes on Frontier and Perlmutter are identical despite different compilers being used. Please double-check and add a brief comment confirming correctness if intended.
The reported GPU speedups of Omega over MPAS-Ocean are very large. However, the benchmark configuration targets a relatively simple shallow-water system with passive tracers and does not include the more complex, branching-heavy physical parameterizations that often challenge directive-based approaches. For such a comparatively regular workload, one might expect OpenACC to achieve reasonably high GPU efficiency as well. It is therefore unclear why the performance gap remains so dramatic. Please expand the discussion to identify which kernels or design choices dominate the difference (e.g., memory layout, kernel fusion/granularity, indirect addressing, communication overlap, or data movement), and explain concretely why OpenACC fails to reach similar efficiency for this specific configuration.
The performance analysis is currently presented almost entirely in terms of relative comparisons (across machines and against MPAS-Ocean). While these are useful, the absence of absolute performance metrics makes it difficult to assess efficiency against hardware limits or to compare with other studies. Please add at least one absolute metric (e.g., achieved memory bandwidth/FLOPS, or fraction of peak) to complement the relative results and strengthen the performance section.
Omega-V0 benchmarks a relatively regular, shallow-water workload with passive tracers. Omega-V1 is expected to include more complex processes such as vertical advection and mixing, equation of state, pressure computation, and physics parameterizations. These additions often introduce more branching, irregular memory access, and heterogeneous kernel costs than the current configuration. Please include a short discussion of how the present performance conclusions are expected to translate to Omega-V1.
Even a qualitative outlook would help readers assess the generality of the current performance results.
Technical Corrections
Overall recommendation: Minor revision. The required changes are mainly clarification for reproducibility and a small set of consistency and formatting fixes, with an added request to outline how performance expectations extend to Omega-V1 physics.