https://doi.org/10.5194/egusphere-2025-2918
22 Sep 2025
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Enhancing the advection module performance in the EPICC-Model V1.0 via GPU-HADVPPM4HIP V1.0 coupling and GPU-optimized strategies

Kai Cao, Qizhong Wu, Xiao Tang, Jinxi Li, Xueshun Chen, Huansheng Chen, Wending Wang, Huangjian Wu, Lei Kong, Jie Li, Jiang Zhu, and Zifa Wang

Abstract. The rapid development of graphics processing units (GPUs) has opened new computational paradigms for improving the efficiency of air quality modeling. In this study, the Heterogeneous-compute Interface for Portability (HIP) was used to parallelize the piecewise parabolic method (PPM) advection solver (HADVPPM) on China’s domestic GPU-like accelerators, yielding GPU-HADVPPM4HIP V1.0. Computational performance was enhanced through three optimizations: reducing the frequency of data transfers between the central processing unit (CPU) and the GPU, thread-block coordinated indexing, and hybrid parallelization combining the Message Passing Interface and HIP (“MPI+HIP”) across heterogeneous computing clusters. Once the offline computational consistency of GPU-HADVPPM4HIP V1.0 and the pollutant simulation performance of the Emission and atmospheric Processes Integrated and Coupled Community Model version 1.0 (EPICC-Model V1.0) on the Earth System Numerical Simulation Facility (EarthLab) had been validated, comprehensive performance testing was conducted. Offline benchmark results showed that GPU-HADVPPM4HIP V1.0 achieved a maximum speedup of 556.5x on a domestic GPU-like accelerator with compiler optimization. Integrating GPU-HADVPPM4HIP V1.0 into EPICC-Model V1.0, combined with the optimized CPU-GPU communication frequency and the thread-block coordinated indexing strategy, improved model-level computational efficiency by 17.0x and 1.5x, respectively. At the module level, GPU-HADVPPM4HIP V1.0 achieved a 39.3 % computational efficiency gain when CPU-GPU data transfer overhead was included, rising to a 20.5x acceleration when communication costs were excluded. This coupling establishes a foundational framework for adapting air quality models to China’s domestic GPU-like architectures and identifies critical optimization pathways. Moreover, the methodology provides essential technical support for a full-model GPU implementation of the EPICC-Model, addressing both current computational constraints and future demands for high-resolution air quality simulations.
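
The abstract does not describe how thread-block coordinated indexing is implemented. As a rough illustration only, the sketch below emulates in Python the generic GPU pattern of combining a block index and a thread index into one global array index, with a bounds guard for the last partial block. The function names (`global_index`, `launch_1d`) and the sizes are hypothetical and are not taken from GPU-HADVPPM4HIP V1.0.

```python
def global_index(block_idx, block_dim, thread_idx):
    """Global 1-D index of a thread: block_idx * block_dim + thread_idx."""
    return block_idx * block_dim + thread_idx


def launch_1d(n_cells, block_dim):
    """Emulate a 1-D kernel launch (hypothetical helper, not the paper's code).

    Returns the list of cell indices handled, in block-then-thread order.
    """
    # Integer ceiling division: enough blocks to cover every cell.
    n_blocks = (n_cells + block_dim - 1) // block_dim
    covered = []
    for b in range(n_blocks):
        for t in range(block_dim):
            i = global_index(b, block_dim, t)
            if i < n_cells:  # threads past the end of the array do nothing
                covered.append(i)
    return covered


if __name__ == "__main__":
    # 10 cells with 4 threads per block: 3 blocks, the last one partly idle.
    print(launch_1d(n_cells=10, block_dim=4))
```

In an actual HIP kernel the two loops disappear: each thread computes its own index from the built-in block and thread coordinates and processes a single cell, so the bounds guard is what keeps surplus threads from touching memory outside the array.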

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 17 Nov 2025)


Short summary
This study achieves significant acceleration by developing an optimized advection module for the Emission and atmospheric Processes Integrated and Coupled Community Model on GPU-like accelerators. By implementing thread-block coordinated indexing, minimizing CPU-GPU communication, and adopting a hybrid parallelization framework, we demonstrate substantial speedups: 556.5× faster offline performance for the HIP-based PPM solver and 20.5× module-level acceleration in coupled simulations when CPU-GPU transfer costs are excluded.