Preprints
https://doi.org/10.5194/egusphere-2026-695
05 Mar 2026
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

GPU-accelerated Finite-Element Method for the Three-dimensional Unstructured Mesh Atmospheric Dynamic Framework

Leisheng Li, Ximeng Fu, Xiyu Zheng, Huiyuan Li, and Jinxi Li

Abstract. The three-dimensional unstructured-mesh finite-element atmospheric dynamical framework is gaining significance owing to its flexibility in representing complex topography and its capability for multi-scale simulations at high resolution. However, this framework has substantial performance bottlenecks. Unlike structured-grid models, the unstructured finite element method (FEM) must frequently access irregular mesh connectivity among nodes, edges, and elements, which leads to indirect memory addressing, poor data locality, and substantial memory bandwidth bottlenecks on conventional CPU architectures. Consequently, element-wise computations and global assembly are the primary contributors to the runtime of high-resolution simulations.
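To make the access pattern described above concrete, the following is a minimal sketch of element-wise computation followed by global assembly. It is not taken from Fluidity-Atmosphere: it assumes one-dimensional linear elements and a dictionary-based sparse matrix purely for illustration, and the names `element_stiffness`, `assemble`, `coords`, and `connectivity` are hypothetical.

```python
# Illustrative sketch only: 1D linear elements, not the Fluidity-Atmosphere code.

def element_stiffness(x0, x1):
    """Local 2x2 stiffness matrix of a linear element on the interval [x0, x1]."""
    k = 1.0 / (x1 - x0)
    return [[k, -k], [-k, k]]


def assemble(coords, connectivity):
    """Assemble a global sparse matrix stored as {(row, col): value}."""
    global_matrix = {}
    for nodes in connectivity:
        # Gather: node data is reached indirectly through the connectivity table.
        x0, x1 = coords[nodes[0]], coords[nodes[1]]
        local = element_stiffness(x0, x1)
        # Scatter-add: elements sharing a node update the same global entries.
        for a, row in enumerate(nodes):
            for b, col in enumerate(nodes):
                key = (row, col)
                global_matrix[key] = global_matrix.get(key, 0.0) + local[a][b]
    return global_matrix


coords = [0.0, 0.5, 1.0, 1.5]
connectivity = [(2, 3), (0, 1), (1, 2)]  # deliberately not in spatial order
K = assemble(coords, connectivity)
```

The gather step is where indirect addressing appears, and the scatter-add is where elements that share a node write to the same matrix entry. In a parallel setting those shared writes have to be coordinated in some way, for example with atomic additions or by grouping elements so that no two in a group share a node; the abstract does not state which approach the authors use.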

This study develops a GPU-parallel implementation of the Fluidity-Atmosphere dynamical core to address these challenges. GPU-oriented data structures and optimized kernels are designed to exploit the computing power of GPUs efficiently. The kernels parallelize element integration and include efficient solvers for matrices of specific sizes, while a parallel assembly strategy improves memory throughput during global sparse matrix construction. On an NVIDIA A100 GPU, the optimized kernels achieve speedups of over 100× for element-wise computations and up to 389.02× for global matrix assembly, resulting in an overall speedup of 8.57× with four Message Passing Interface (MPI) processes. The proposed framework demonstrates that tailored GPU parallelization is effective in overcoming the computational bottleneck of unstructured FEM-based atmospheric models, facilitating high-resolution simulations on heterogeneous architectures.
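The gap between kernel speedups in the hundreds and a single-digit overall speedup can be read, very roughly, through Amdahl's law. This is an illustrative reading rather than an analysis from the paper: the accelerated runtime fraction p and the single representative kernel speedup s are assumptions, the abstract reports neither, and the MPI configuration of the baseline is not modelled.

```latex
\[
  S = \frac{1}{(1 - p) + \dfrac{p}{s}},
  \qquad
  \lim_{s \to \infty} S = \frac{1}{1 - p}.
\]
% Setting S = 8.57 gives the lower bound
%   p > 1 - 1/8.57 \approx 0.883,
% and, assuming s = 100 for every accelerated kernel,
%   p = (1 - 1/8.57) / (1 - 1/100) \approx 0.892.
```

Under these assumptions, roughly 11% of the baseline runtime would remain outside the accelerated kernels, and that remainder, not the kernel speed, would bound the overall gain.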

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 30 Apr 2026)

Latest update: 05 Mar 2026
Short summary
Scientists use irregular-grid models for accurate weather simulation. Such grids help capture fine details but make the calculations slow on traditional computers. We redesigned one such model for GPUs by reorganizing its data and calculations. This makes the slowest parts hundreds of times faster and the whole simulation over ten times faster, which allows for higher-resolution simulations.