https://doi.org/10.5194/egusphere-2026-3368
29 Jun 2026
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Enhancing the advection module performance in the EPICC-Model V1.6.0 via GPU-HADVPPM4HIP V1.0 coupling and GPU-optimized strategies

Kai Cao, Qizhong Wu, Xiao Tang, Jinxi Li, Xueshun Chen, Huansheng Chen, Wending Wang, Huangjian Wu, Lei Kong, Jie Li, Jiang Zhu, and Zifa Wang
Note on duplicated preprint: although EGUsphere's policy does not support double preprinting, this preprint has a former version on EGUsphere.

Abstract. The rapid development of graphics processing units (GPUs) has opened new computational paradigms for improving the efficiency of air quality modeling. In this study, the Heterogeneous-compute Interface for Portability (HIP) was used to parallelize the piecewise parabolic method (PPM) advection solver (HADVPPM) on China’s domestic GPU-like accelerators (hereafter GPU-like), resulting in a GPU-accelerated version denoted GPU-HADVPPM4HIP V1.0. Computational performance was enhanced through three optimizations: reducing the frequency of data transfers between the central processing unit (CPU) and the GPU, thread-block coordinated indexing, and hybrid parallelization with the Message Passing Interface (MPI) and HIP (“MPI+HIP”) across heterogeneous computing clusters. After validating the offline computational consistency of GPU-HADVPPM4HIP V1.0 and the pollutant simulation performance of the Emission and atmospheric Processes Integrated and Coupled Community Model version 1.6.0 (EPICC-Model V1.6.0) on the Earth System Numerical Simulation Facility (EarthLab), comprehensive performance tests were conducted. Offline benchmarks showed that GPU-HADVPPM4HIP V1.0, built with the compiler optimization option and run on a GPU-like accelerator, achieved a maximum speedup of 556.5× over the Fortran HADVPPM compiled with the baseline option, for a data size of 10⁸. Integrating GPU-HADVPPM4HIP V1.0 into EPICC-Model V1.6.0 yielded three distinct versions: the initial HIP-based version (HIP-Ori), a version with reduced CPU-GPU communication frequency (HIP-Opt1), and a further-optimized version employing the thread-block coordinated indexing strategy (HIP-Opt2). Compared with HIP-Ori, HIP-Opt1 improved model-level computational efficiency by a factor of 17.0. Building on HIP-Opt1, HIP-Opt2 delivered an additional 1.5× improvement. At the module level, when CPU-GPU data transfer overhead is included, the GPU implementation improves the computational efficiency of the advection module by 39.3 %; when communication cost is excluded, the advection module attains a 20.5× acceleration relative to its CPU counterpart. This coupling establishes a foundational framework for adapting air quality models to GPU-like architectures and identifies critical optimization pathways. Moreover, the methodology provides essential technical support for a full-model GPU implementation of the EPICC-Model, addressing both current computational constraints and future demands for high-resolution air quality simulations.
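
The abstract does not spell out how thread-block coordinated indexing is implemented. Purely as an illustration, the sketch below emulates in plain Python the common GPU pattern in which each thread combines its block index and its in-block thread index into one global grid-cell index, so that all horizontal cells can be processed in parallel. The function names, the 16 by 16 block shape, and the two-dimensional layout are assumptions made for this example and are not taken from GPU-HADVPPM4HIP V1.0.

```python
# CPU-side emulation of how a GPU kernel launch maps (block, thread) pairs
# to horizontal grid cells. All names and sizes are illustrative assumptions.

def launch_config(nx, ny, block=(16, 16)):
    """Return the number of blocks per dimension needed to cover an nx-by-ny grid."""
    bx, by = block
    return ((nx + bx - 1) // bx, (ny + by - 1) // by)  # ceiling division


def global_cell(block_idx, thread_idx, block):
    """Combine block and thread indices into one (i, j) cell, as a kernel would."""
    i = block_idx[0] * block[0] + thread_idx[0]
    j = block_idx[1] * block[1] + thread_idx[1]
    return i, j


def emulate_kernel(nx, ny, block=(16, 16)):
    """Visit every (block, thread) pair and count how often each cell is touched."""
    grid = launch_config(nx, ny, block)
    hits = [[0] * ny for _ in range(nx)]
    for bi in range(grid[0]):
        for bj in range(grid[1]):
            for ti in range(block[0]):
                for tj in range(block[1]):
                    i, j = global_cell((bi, bj), (ti, tj), block)
                    if i < nx and j < ny:  # bounds guard for partial edge blocks
                        hits[i][j] += 1
    return hits
```

Calling `emulate_kernel(50, 37)` covers a 50 by 37 grid with 4 by 3 blocks and touches every cell exactly once; threads in the edge blocks that fall outside the grid are skipped by the bounds guard, which is the usual price of rounding the block count up.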

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 24 Aug 2026)

Latest update: 29 Jun 2026
Short summary
This study accelerates the advection module of the Emission and atmospheric Processes Integrated and Coupled Community Model (EPICC-Model) on graphics processing units (GPUs). By implementing thread-block indexing, minimizing communication between the central processing unit (CPU) and the GPU, and adopting a hybrid parallel framework, we demonstrate speedups of 556.5× in offline tests of the advection solver built on the Heterogeneous-compute Interface for Portability (HIP) and 20.5× for the advection module in coupled simulations.