https://doi.org/10.5194/egusphere-2023-410
04 Apr 2023

GPU-HADVPPM V1.0: high-efficient parallel GPU design of the Piecewise Parabolic Method (PPM) for horizontal advection in air quality model (CAMx V6.10)

Kai Cao, Qizhong Wu, Lingling Wang, Nan Wang, Huaqiong Cheng, Xiao Tang, Dongqing Li, and Lanning Wang

Abstract. With semiconductor technology gradually approaching its physical and thermal limits, graphics processing units (GPUs) are becoming an attractive solution in many scientific applications due to their high performance. This paper presents an application of GPU accelerators in an air quality model. We demonstrate an approach that runs a PPM solver of horizontal advection (HADVPPM) for the air quality model CAMx on GPU clusters. Specifically, we first convert HADVPPM to new Compute Unified Device Architecture C (CUDA C) code so that it can run on the GPU (GPU-HADVPPM). Then, a series of optimization measures are taken, including reducing the CPU-GPU communication frequency, increasing the size of data computation on the GPU, optimizing GPU memory access, and using thread and block indices, in order to improve the overall computing performance of the CAMx model coupled with GPU-HADVPPM (named the CAMx-CUDA model). Finally, a heterogeneous, hybrid programming paradigm is presented and utilized with GPU-HADVPPM on GPU clusters with the Message Passing Interface (MPI) and CUDA. Offline experiment results show that running GPU-HADVPPM on one NVIDIA Tesla K40m GPU and one NVIDIA Tesla V100 GPU can achieve up to 845.4x and 1113.6x acceleration, respectively. By implementing a series of optimization schemes, the CAMx-CUDA model achieved a 29.0x and 128.4x improvement in computational efficiency using a GPU accelerator card on a K40m cluster and a V100 cluster, respectively. In terms of the single-module computational efficiency of GPU-HADVPPM, it achieves 1.3x and 19.4x speedup on the NVIDIA Tesla K40m GPU and the NVIDIA Tesla V100 GPU, respectively. The multi-GPU acceleration algorithm enables a 4.5x speedup with 8 CPU cores and 8 GPU accelerators on the V100 cluster.
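
As a rough illustration of the "thread and block indices" measure mentioned in the abstract, the following minimal C++ sketch emulates on the CPU how a one-dimensional CUDA launch could assign one grid row to each thread. It is not the authors' code: the function names `advect_row` and `advect_field`, the row-per-thread decomposition, and the periodic boundaries are assumptions made for illustration, and a simple first-order upwind update stands in for the actual PPM scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-row work: first-order upwind advection
// with periodic boundaries. This is NOT the PPM scheme used by HADVPPM.
void advect_row(const double* in, double* out, std::size_t nx, double courant) {
    for (std::size_t i = 0; i < nx; ++i) {
        std::size_t im1 = (i + nx - 1) % nx;  // upwind neighbour (periodic)
        out[i] = in[i] - courant * (in[i] - in[im1]);
    }
}

// CPU emulation of a 1-D CUDA launch. Each (block, thread) pair computes
// row = block * threads_per_block + thread, mirroring what a kernel would
// do with blockIdx.x * blockDim.x + threadIdx.x, and skips rows past the end.
std::vector<double> advect_field(const std::vector<double>& field,
                                 std::size_t ny, std::size_t nx,
                                 double courant,
                                 std::size_t threads_per_block) {
    std::vector<double> result(field.size());
    // Round up so that every row is covered by some thread.
    std::size_t blocks = (ny + threads_per_block - 1) / threads_per_block;
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            std::size_t row = block * threads_per_block + thread;
            if (row >= ny) continue;  // bounds guard, as in a real kernel
            advect_row(&field[row * nx], &result[row * nx], nx, courant);
        }
    }
    return result;
}
```

On a GPU the two outer loops would disappear, since the hardware runs one thread per (block, thread) pair; only the index computation and the bounds guard would remain inside the kernel. Because rows are independent under this assumed decomposition, no synchronization between threads is needed.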

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

01 Aug 2023
GPU-HADVPPM V1.0: a high-efficiency parallel GPU design of the piecewise parabolic method (PPM) for horizontal advection in an air quality model (CAMx V6.10)
Kai Cao, Qizhong Wu, Lingling Wang, Nan Wang, Huaqiong Cheng, Xiao Tang, Dongqing Li, and Lanning Wang
Geosci. Model Dev., 16, 4367–4383, https://doi.org/10.5194/gmd-16-4367-2023, 2023

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-410', Anonymous Referee #1, 18 Apr 2023
  • CC1: 'Comment on egusphere-2023-410', Kai Cao, 20 Apr 2023
    • RC3: 'Reply on CC1', Anonymous Referee #1, 06 May 2023
  • RC2: 'Comment on egusphere-2023-410', Anonymous Referee #2, 05 May 2023
    • CC2: 'Reply on RC2', Kai Cao, 19 May 2023
  • AC1: 'Comment on egusphere-2023-410', Qizhong Wu, 01 Jun 2023

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Qizhong Wu on behalf of the Authors (01 Jun 2023)  Author's response   Author's tracked changes   Manuscript 
ED: Publish subject to minor revisions (review by editor) (05 Jun 2023) by Xiaomeng Huang
AR by Qizhong Wu on behalf of the Authors (12 Jun 2023)  Author's response   Author's tracked changes   Manuscript 
ED: Publish as is (20 Jun 2023) by Xiaomeng Huang
AR by Qizhong Wu on behalf of the Authors (24 Jun 2023)  Manuscript 

Viewed

Total article views: 442 (including HTML, PDF, and XML)
  • HTML: 324
  • PDF: 100
  • XML: 18
  • Total: 442
  • Supplement: 29
  • BibTeX: 5
  • EndNote: 5
Views and downloads, and cumulative views and downloads, calculated since 04 Apr 2023.

Viewed (geographical distribution)

Total article views: 442 (including HTML, PDF, and XML), of which 442 have geography defined and 0 are of unknown origin.

Latest update: 03 Sep 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Offline performance experiment results show that GPU-HADVPPM on a V100 GPU achieves a speedup of up to 1113.6x over its original version on an E5-2682 v4 CPU. After a series of optimization measures, the CAMx-CUDA model improves computing efficiency by 128.4x on a single V100 GPU card. A parallel architecture with an MPI+CUDA hybrid paradigm is presented, and it achieves up to a 4.5x speedup when launching 8 CPU cores and 8 GPU cards.