This work is distributed under the Creative Commons Attribution 4.0 License.
Toward Exascale Climate Modelling: A Python DSL Approach to ICON’s (Icosahedral Non-hydrostatic) Dynamical Core (icon-exclaim v0.2.0)
Abstract. A refactored atmospheric dynamical core of the ICON model implemented in GT4Py, a Python-based domain-specific language designed for performance portability across heterogeneous CPU-GPU architectures, is presented. Integrated within the existing Fortran infrastructure, the GT4Py core achieves throughput slightly exceeding the optimized OpenACC version, reaching up to 213 simulation days per day when using a quarter of CSCS’s ALPS GPUs.
A multi-tiered testing strategy has been implemented to ensure the numerical correctness and scientific reliability of the model code. Validation has been performed through global aquaplanet and prescribed sea-surface-temperature simulations, demonstrating the model's capability to simulate the mesoscale and its interaction with the larger-scale circulation at km-scale grid spacing. This work establishes a foundation for an architecture-agnostic ICON global climate and weather model, and highlights poor strong scaling as a potential bottleneck on the path toward exascale performance.
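The multi-tiered verification described above rests on comparing ported GT4Py output against the Fortran reference within roundoff-level tolerances rather than bit-for-bit. A minimal, hypothetical sketch of such a tolerance check (the function names and tolerance are illustrative, not the project's actual test harness):

```python
def max_rel_error(ref, new, eps=1e-300):
    # Largest element-wise relative error between a reference field
    # (e.g. from the Fortran code) and a ported field (e.g. from GT4Py),
    # both given as flat sequences of floats. eps guards division by zero.
    return max(abs(a - b) / max(abs(a), abs(b), eps) for a, b in zip(ref, new))

def fields_agree(ref, new, tol=1e-12):
    # A ported kernel is accepted when it agrees with the reference
    # within a roundoff-level tolerance rather than bit-for-bit.
    return max_rel_error(ref, new) <= tol
```

In practice such a check would run per kernel and per output field, with tolerances chosen from the expected accumulation of rounding differences between architectures.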
Status: open (until 09 Dec 2025)
- RC1: 'Comment on egusphere-2025-4808', Anonymous Referee #1, 11 Nov 2025
- RC2: 'Comment on egusphere-2025-4808', Anonymous Referee #2, 17 Nov 2025
This is a clear, well-written paper describing a GT4Py implementation of the ICON dynamical core, running in the existing ICON Fortran modeling system, enabling km-scale atmospheric simulations on the ALPS GPU supercomputer. The authors describe their porting approach, including thorough testing from the kernel level up to full-physics simulations. They provide a sober analysis of the potential of GPUs and their strong-scaling limitations. I only have minor comments:
1. Section 4.3: what is "the implementation of horizontal blocking"? Does that refer to the loop blocking in the Fortran loops (which was removed in the Python code)?

2. Section 4.3: "...testing is tricky as the results are different due to rounding..."
The authors have a good port-testing strategy in the presence of roundoff error, but this statement implies that these rounding differences are unavoidable. The E3SM dycore porting work (Bertagna et al., GMD 2019 and Bertagna et al., SC 2020) showed that it is possible to obtain BFB agreement between CPUs and GPUs with careful coding, allowing for a different porting approach which simplifies some aspects of code porting.

3. Section 5.1: For the final model, I assume all significant code is running on the GPUs, with the dycore using GT4Py and the physics using OpenACC. I believe this is implied, but I didn't see it clearly stated. Were there any software challenges running the two different GPU programming models in the same executable?

4. Line 400: "GT4Py synchronization"
I know of two types of synchronization: across MPI nodes, as well as synchronization among thread teams running on the GPU. Which is this referring to?

5. Section 5.1: How does the GT4Py code compare with the Fortran code on CPUs? It would be interesting to add CPU-only performance numbers to Figure 7.

Citation: https://doi.org/10.5194/egusphere-2025-4808-RC2
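On the bit-for-bit point raised in comment 2: one common ingredient of BFB reproducibility across architectures is fixing the floating-point reduction order. A toy sketch of a fixed-tree pairwise sum (illustrative only; this is not the E3SM or ICON implementation):

```python
def pairwise_sum(xs):
    # Sum with a fixed binary reduction tree. Because the tree shape
    # depends only on len(xs), the floating-point result is the same
    # on any hardware that rounds IEEE-754 additions identically,
    # unlike a naive parallel reduction whose order may vary with
    # thread count or scheduling.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])
```

Techniques of this kind let CPU and GPU ports be compared bit-for-bit, at the cost of constraining how reductions may be parallelized.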
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 250 | 67 | 13 | 330 | 12 | 10 |
This manuscript presents stage one of a multi-tiered plan to support heterogeneous (mixed CPU/GPU) architectures for running the ICON model. The authors utilize GT4Py, a domain-specific language, to modernize the ICON dynamics core from the existing Fortran code base. The outcome is a more performant code, which is also easier to read and develop compared to the equivalent Fortran implementation. The paper is well written and well reasoned, demonstrating promising results that are on par with the current state of GPU-ready Earth System modeling. I recommend that this manuscript be published, as I have only a few minor questions and technical corrections to suggest.
First, I want to commend the authors for their attention to (a) the hardware-based challenges that arise when running these models at scale, and (b) the importance of robust testing. In my experience, these topics are not typically the most exciting to discuss, but they are essential considerations for any group undertaking a similar effort.
Minor Comments:
Introduction
Section 2
Section 3
Section 4