High-Performance Geodynamics on GPUs Using the PETSc CUDA Backend (GAUZZ v1.0.0)
Abstract. The GPU is a computing architecture designed for parallelism in vector and matrix operations. To solve the Stokes equations in geodynamics, we apply a preconditioned conjugate-gradient (PCG) method to the pressure Schur complement system. The algorithm is dominated by sparse matrix-vector products, vector inner products, vector updates, and repeated velocity subproblem solves, all of which map onto GPU architectures. We implement this solver on GPUs by coupling CUDA-enabled PETSc with FEniCS. PETSc AIJCUSPARSE matrices with CUDA vectors keep operators and field vectors resident in VRAM, which limits CPU-GPU transfers during iterations. The P1 mass matrix, the gradient operator, and the divergence operator are reused in the pressure-correction L2-projection step. HYPRE BoomerAMG was adopted for preconditioning the velocity-only subproblem. Accuracy and performance are evaluated on manufactured solutions, the SolCx benchmark, the sticky-air benchmark, 2D and 3D Rayleigh-Taylor instabilities, 3D thermal convection, and a 2D subduction model with nonlinear viscosity. We confirm that the GPU-based implementation reproduces CPU solutions. We distinguish between the execution time, which includes only the pressure correction step, and the wall time, which includes the workflow from mesh generation to Stokes solves. A single-GPU environment achieves a 5–11× reduction in execution time relative to a 32-core CPU. For nonlinear viscosity cases with per-step matrix updates, the wall time is reduced by 1.14–3.46× on a single GPU relative to a 32-core CPU. Using Multi-Process Service to coordinate two-GPU with a 16-core CPU reduces the wall time by 15 5.83× compared to a 32-core CPU.