28 Sep 2022

Pace v0.1: A Python-based Performance-Portable Implementation of the FV3 Dynamical Core

Johann Dahm1,*, Eddie Davis1,*, Florian Deconinck1,*, Oliver Elbert1,*, Rhea George1,*, Jeremy McGibbon1,*, Tobias Wicky1,*, Elynn Wu1,*, Christopher Kung2, Tal Ben-Nun3, Lucas Harris4, Linus Groner5, and Oliver Fuhrer1,6
  • 1Allen Institute of Artificial Intelligence, Seattle, U.S.A.
  • 2Global Modeling and Assimilation Office, NASA, Greenbelt MD, U.S.A.
  • 3Department of Computer Science, ETH Zurich, Zurich, Switzerland
  • 4Geophysical Fluid Dynamics Laboratory, NOAA, Princeton NJ, U.S.A.
  • 5Swiss National Supercomputing Centre (CSCS), ETH Zurich, Lugano, Switzerland
  • 6Federal Institute of Meteorology and Climatology MeteoSwiss, Zurich, Switzerland
  • *These authors contributed equally to this work and are listed in alphabetical order.

Abstract. Progress in leveraging current and emerging high-performance computing infrastructures with traditional weather and climate models has been slow. This has become known more broadly as the software productivity gap. With the end of Moore's Law driving rapid specialization of hardware architectures, building simulation codes in a low-level language with hardware-specific optimizations is a significant risk. As a solution, we present Pace, an entirely Python-based implementation of the nonhydrostatic FV3 dynamical core. To achieve high performance on a diverse set of hardware architectures, Pace is written using the GT4Py domain-specific language. We demonstrate that this approach achieves both portability and performance while significantly improving the readability and maintainability of the code compared to the Fortran reference implementation. We show that Pace can run at scale on leadership-class supercomputers and that, on a GPU-accelerated supercomputer, it runs 3.5–4 times faster on the GPUs than the Fortran code does on the CPUs. Furthermore, we demonstrate how a Python-based simulation code facilitates existing use cases and workflows and enables entirely new ones. Pace demonstrates how a high-level language can insulate us from disruptive changes, provide a more productive development environment, and facilitate integration with new technologies such as machine learning.
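The portability idea in the abstract, writing the numerical kernel once and leaving the choice of execution strategy to a separate layer, can be illustrated with a minimal sketch. This is not GT4Py or Pace code: the names `laplacian`, `run_loops`, `run_rows`, `BACKENDS`, and `apply` are hypothetical, only the Python standard library is used, and the two "backends" here merely traverse the grid differently, whereas a real DSL toolchain would generate optimized code for CPUs or GPUs.

```python
from typing import Callable, Dict, List

Field = List[List[float]]
# A point-wise stencil: takes the field and an interior index, returns the new value.
Stencil = Callable[[Field, int, int], float]


def laplacian(f: Field, i: int, j: int) -> float:
    """Five-point Laplacian on a unit grid (illustrative stencil only)."""
    return f[i - 1][j] + f[i + 1][j] + f[i][j - 1] + f[i][j + 1] - 4.0 * f[i][j]


def run_loops(stencil: Stencil, f: Field) -> Field:
    """Hypothetical reference backend: explicit nested loops over the interior."""
    ni, nj = len(f), len(f[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            out[i][j] = stencil(f, i, j)
    return out


def run_rows(stencil: Stencil, f: Field) -> Field:
    """Hypothetical alternative backend: builds each interior row in one pass."""
    ni, nj = len(f), len(f[0])
    zero_row = [0.0] * nj
    interior = [
        [0.0] + [stencil(f, i, j) for j in range(1, nj - 1)] + [0.0]
        for i in range(1, ni - 1)
    ]
    return [zero_row[:]] + interior + [zero_row[:]]


# The stencil never changes; only the entry selected here does.
BACKENDS: Dict[str, Callable[[Stencil, Field], Field]] = {
    "loops": run_loops,
    "rows": run_rows,
}


def apply(stencil: Stencil, f: Field, backend: str = "loops") -> Field:
    """Run the same stencil definition on the selected backend."""
    return BACKENDS[backend](stencil, f)


# Test field f(i, j) = i^2 + j^2, whose discrete Laplacian is 4 at every interior point.
field = [[float(i * i + j * j) for j in range(5)] for i in range(5)]
result_a = apply(laplacian, field, backend="loops")
result_b = apply(laplacian, field, backend="rows")
```

Both calls use the same `laplacian` definition and should produce identical fields, which is the property a performance-portable code relies on when switching hardware targets.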

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2022-943', Anonymous Referee #1, 31 Oct 2022
  • RC2: 'Comment on egusphere-2022-943', Anonymous Referee #2, 11 Nov 2022

Model code and software

ai2cm/pace: v0.1.0 GMD release Rhea George, Elynn Wu, Jeremy McGibbon, Johann Dahm, Eddie Davis, Tobias Wicky, Florian Deconinck, Christopher Kung, Oliver Fuhrer, Oliver Elbert, Ajda Savarin, Noah D. Brenowitz, Mark Cheeseman, Brian Henn, Spencer Clark, and Yannick Niedermayr

ai2cm/gt4py: v0.1.0 GMD release Johann Dahm, Linus Groner, Enrique G. Paredes, Felix Thaler, Hannes Vogt, Eddie Davis, Rico Haeuselmann, Till Ehrengruber, Stefano Ubbiali, Tobias Wicky, Florian Deconinck, Tal Ben-Nun, and Rhea George


Total article views: 1,161 (including HTML, PDF, and XML)
  • HTML: 790
  • PDF: 360
  • XML: 11
  • Total: 1,161
  • BibTeX: 5
  • EndNote: 1
Views and downloads calculated since 28 Sep 2022.

Viewed (geographical distribution)

Total article views: 1,152 (including HTML, PDF, and XML) Thereof 1,152 with geography defined and 0 with unknown origin.
Latest update: 29 Nov 2022
Short summary
It is hard for scientists to write code that runs efficiently on all kinds of supercomputers. They like writing Python because it is easier to read and use. We rewrote a Fortran code that simulates weather and climate in Python. The Python code translates itself into a much faster language so that it can run on either normal processors or graphics cards. On one big computer system, our code runs 3.5–4 times faster on the graphics cards than the original code does on the processors.