GPU-HADVPPM V1.0: high-efficient parallel GPU design of the Piecewise Parabolic Method (PPM) for horizontal advection in air quality model (CAMx V6.10)
Abstract. With semiconductor technology gradually approaching its physical and thermal limits, graphics processing units (GPUs) are becoming an attractive solution in many scientific applications due to their high performance. This paper presents an application of GPU accelerators in an air quality model. We endeavor to demonstrate an approach that runs a PPM solver of horizontal advection (HADVPPM) for the air quality model CAMx on GPU clusters. Specifically, we first convert HADVPPM to new Compute Unified Device Architecture C (CUDA C) code so that it can run on the GPU (GPU-HADVPPM). Then, a series of optimization measures are taken, including reducing the CPU-GPU communication frequency, increasing the size of data computation on the GPU, optimizing GPU memory access, and using thread and block indices, in order to improve the overall computing performance of the CAMx model coupled with GPU-HADVPPM (named the CAMx-CUDA model). Finally, a heterogeneous, hybrid programming paradigm is presented and used with GPU-HADVPPM on GPU clusters with Message Passing Interface (MPI) and CUDA. Offline experiment results show that running GPU-HADVPPM on one NVIDIA Tesla K40m GPU and one NVIDIA Tesla V100 GPU can achieve up to 845.4x and 1113.6x acceleration, respectively. By implementing a series of optimization schemes, the CAMx-CUDA model achieved a 29.0x and 128.4x improvement in computational efficiency using a GPU accelerator card on a K40m and V100 cluster, respectively. In terms of the single-module computational efficiency of GPU-HADVPPM, it can achieve 1.3x and 19.4x speedup on the NVIDIA Tesla K40m GPU and NVIDIA Tesla V100 GPU, respectively. The multi-GPU acceleration algorithm enables a 4.5x speedup with 8 CPU cores and 8 GPU accelerators on the V100 cluster.
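To make the optimization measures listed in the abstract concrete, the following is a minimal CUDA sketch and not the authors' GPU-HADVPPM code. It assumes a first-order upwind update as a stand-in for the PPM scheme, and the names `advect_x_upwind` and `advect_field_on_gpu`, the row-major layout, and the 32x8 block shape are all hypothetical. It illustrates three of the listed ideas: thread and block indices select the cell each thread updates, the whole 2-D field is processed in one launch rather than row by row, and the host transfers data once in and once out.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in scheme: first-order upwind along x, NOT the PPM scheme of HADVPPM.
// Each thread updates one cell (i, j) of a row-major nx-by-ny field.
__global__ void advect_x_upwind(const float* c_in, float* c_out,
                                int nx, int ny, float courant)
{
    // Thread and block indices give every thread its own cell.
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // index along x
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i >= nx || j >= ny) return;                 // guard partial blocks

    // Neighbouring threads in x read neighbouring addresses (coalesced access).
    int idx = j * nx + i;
    // At the inflow boundary (i == 0) the cell is left unchanged.
    float left = (i > 0) ? c_in[idx - 1] : c_in[idx];
    c_out[idx] = c_in[idx] - courant * (c_in[idx] - left);
}

// Hypothetical host wrapper: one copy in, one launch for the whole field,
// one copy out. Returns true if every CUDA call succeeded.
bool advect_field_on_gpu(const std::vector<float>& in, std::vector<float>& out,
                         int nx, int ny, float courant)
{
    assert(in.size() == static_cast<std::size_t>(nx) * ny);
    std::size_t bytes = sizeof(float) * in.size();

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(32, 8);  // assumed block shape, not tuned
        dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
        advect_x_upwind<<<grid, block>>>(d_in, d_out, nx, ny, courant);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        out.resize(in.size());
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Calling `advect_field_on_gpu(in, out, nx, ny, courant)` with a Courant number between 0 and 1 advances the whole field by one step. In a hybrid MPI and CUDA setup of the kind the abstract describes, each MPI rank would plausibly make such a call on its own subdomain and its own GPU, although the sketch does not show that part.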
Kai Cao et al.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2023-410', Anonymous Referee #1, 18 Apr 2023
CC1: 'Comment on egusphere-2023-410', Kai Cao, 20 Apr 2023
- RC3: 'Reply on CC1', Anonymous Referee #1, 06 May 2023
RC2: 'Comment on egusphere-2023-410', Anonymous Referee #2, 05 May 2023
- CC2: 'Reply on RC2', Kai Cao, 19 May 2023
- AC1: 'Comment on egusphere-2023-410', Qizhong Wu, 01 Jun 2023
This paper presents the implementation and optimization of the air quality model CAMx using CUDA C, targeting GPU clusters. Experimental results show that GPU-HADVPPM can achieve about 1000x acceleration, that the series of optimizations can achieve speedups of dozens of times, and that the final version of CAMx-GPU can achieve a 4.5x speedup with 8 CPU cores and 8 GPU accelerators on the V100 cluster. Here are some specific comments.