High-Performance Geodynamics on GPUs Using the PETSc CUDA Backend (GAUZZ v1.0.0)

Lee, Kyeong-Min; Jang, Deok-Kyu; Thieulot, Cedric; So, Byung-Dal

doi:10.5194/egusphere-2026-2528

Preprints

https://doi.org/10.5194/egusphere-2026-2528

Preprints

20 May 2026

| 20 May 2026

Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

High-Performance Geodynamics on GPUs Using the PETSc CUDA Backend (GAUZZ v1.0.0)

Kyeong-Min Lee, Deok-Kyu Jang, Cedric Thieulot, and Byung-Dal So

Abstract. The GPU is a computing architecture designed for parallelism in vector and matrix operations. To solve the Stokes equations in geodynamics, we apply a preconditioned conjugate-gradient (PCG) method to the pressure Schur complement system. The algorithm is dominated by sparse matrix-vector products, vector inner products, vector updates, and repeated velocity subproblem solves, all of which map onto GPU architectures. We implement this solver on GPUs by coupling CUDA-enabled PETSc with FEniCS. PETSc AIJCUSPARSE matrices with CUDA vectors keep operators and field vectors resident in VRAM, which limits CPU-GPU transfers during iterations. The P₁ mass matrix, the gradient operator, and the divergence operator are reused in the pressure-correction L₂-projection step. HYPRE BoomerAMG was adopted for preconditioning the velocity-only subproblem. Accuracy and performance are evaluated on manufactured solutions, the SolCx benchmark, the sticky-air benchmark, 2D and 3D Rayleigh-Taylor instabilities, 3D thermal convection, and a 2D subduction model with nonlinear viscosity. We confirm that the GPU-based implementation reproduces CPU solutions. We distinguish between the execution time, which includes only the pressure correction step, and the wall time, which includes the workflow from mesh generation to Stokes solves. A single-GPU environment achieves a 5–11× reduction in execution time relative to a 32-core CPU. For nonlinear viscosity cases with per-step matrix updates, the wall time is reduced by 1.14–3.46× on a single GPU relative to a 32-core CPU. Using Multi-Process Service to coordinate two-GPU with a 16-core CPU reduces the wall time by 15 5.83× compared to a 32-core CPU.

Received: 01 May 2026 – Discussion started: 20 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1919 KB)

Supplement (8911 KB)

Download & links

Kyeong-Min Lee, Deok-Kyu Jang, Cedric Thieulot, and Byung-Dal So

Status: open (until 16 Jul 2026)

Post a comment Subscribe to comment alert

Kyeong-Min Lee, Deok-Kyu Jang, Cedric Thieulot, and Byung-Dal So

Supplement

https://doi.org/10.5194/egusphere-2026-2528-supplement

Kyeong-Min Lee, Deok-Kyu Jang, Cedric Thieulot, and Byung-Dal So

Metrics will be available soon.

Latest update: 21 May 2026

Short summary

Geodynamic simulations of mantle flow and plate tectonics repeatedly solve Stokes equations, demanding large memory and long runtimes on central processing units (CPUs). We built a graphics processing unit (GPU) solver coupling FEniCS, a finite-element library, with PETSc, a numerical toolkit, while keeping matrices and vectors on the GPU. One GPU ran 5–11× faster than a 32-core CPU across seven benchmarks, and two GPUs cut wall time sixfold, enabling higher-resolution geodynamic modeling.