A High-Fidelity CUDA Implementation of the Shchepetkin Density-Jacobian Pressure Gradient Scheme for ROMS
Abstract. Baroclinic pressure gradients in terrain-following ocean models require careful numerical treatment to avoid generating spurious currents over steep topography. The Shchepetkin density Jacobian algorithm achieves high accuracy through harmonic mean density slope reconstruction and fourth-order monotonized cubic polynomial corrections, but has largely remained on CPU architectures despite growing GPU adoption in Earth system models. This work presents the first publicly available CUDA implementation of the complete Shchepetkin density-Jacobian pressure gradient scheme as used in the Regional Ocean Modeling System (ROMS). No algorithmic simplifications were introduced during GPU porting. Validation against CPU reference solutions on the "Tall Isolated Seamount" benchmark (Beckmann and Haidvogel, 1993) with steep topography (r ≈ 0.4) demonstrates maximum relative error of πͺ (10−6) across the 54 × 51 × 13 computational domain. A "lake at rest" test using a horizontally uniform density field over the same Seamount bathymetry confirms the well-balanced property: the kernel produces exactly zero pressure gradient force to machine precision. The small discrepancies in the Seamount comparison reflect floating-point precision and architectural differences between GPU and CPU rather than algorithmic deficiencies. The standalone kernel architecture enables integration into existing ocean models. Source code and validation data are made publicly available under an open-source license.