swLICOM: the multi-core version of an ocean general circulation model on the new generation Sunway supercomputer and its kilometer-scale application
Abstract. Global ocean general circulation models (OGCMs) with kilometer-scale resolution are of great significance for understanding the climate effects of mesoscale and submesoscale eddies. To address the exponential growth in computational and storage demands associated with kilometer-scale simulation of global OGCMs, we develop an enhanced and deeply optimized OGCM, namely swLICOM, on the new generation Sunway supercomputer. We design a novel split I/O scheme that effectively partitions tripole grid data across processes for reading and writing, resolving the I/O bottleneck encountered in kilometer-scale resolution simulation. We also develop a new domain decomposition strategy that effectively removes land points to enhance the simulation capability. In addition, we upgrade the code translation tool swCUDA to convert the LICOM3 CUDA kernels to Sunway kernels efficiently. With further optimization using mixed precision, we achieve a peak performance of 453 Simulated Days per Day (SDPD) with 59 % parallel efficiency at 1 km resolution, scaling up to 25 million cores. A simulation with 2 km horizontal resolution shows that swLICOM is capable of capturing vigorous mesoscale eddies and active submesoscale phenomena.
Status: open (until 25 Dec 2025)
- RC1: 'Comment on egusphere-2025-2231', Anonymous Referee #1, 13 Oct 2025
- AC1: 'Reply on RC1', Kai Xu, 03 Dec 2025
Thank you very much for your thorough review and the constructive comments on our manuscript. We sincerely appreciate the time and effort you have devoted to providing these insightful suggestions, which have significantly improved the quality of our work. We have carefully considered all the points raised. Below, we provide a point-by-point response to your comments:
1. Line 36: It is unclear who or what “Kinaco” refers to. Please clarify.
Response: Thanks. Kinaco in Line 36 is a non-hydrostatic ocean model that was developed for high-resolution numerical ocean studies. We will add further explanations and citations in the revised paper as follows.
- Yamagishi, T. and Matsumura, Y.: GPU Acceleration of a Non-hydrostatic Ocean Model with a Multigrid Poisson/Helmholtz Solver, Procedia Computer Science, 80, 1658–1669, https://doi.org/10.1016/j.procs.2016.05.502, International Conference on Computational Science 2016 (ICCS 2016), 6–8 June 2016, San Diego, California, USA, 2016.
- Matsumura, Y. and Hasumi, H.: A non-hydrostatic ocean model with a scalable multigrid Poisson solver, Ocean Modell., 24, 15–28, https://doi.org/10.1016/j.ocemod.2008.05.001, 2008.
2. Line 58: LICOM2-GPU, LICOM3-HIP, and LICOM3-CUDA are model versions, not heterogeneous supercomputers; please adjust the wording accordingly.
Response: Thanks. The clarification will be modified in the revised paper as follows.
“The development of LICOM for heterogeneous supercomputers is evidenced by three key versions: LICOM2-GPU (Jiang et al., 2019), LICOM3-HIP (Wang et al., 2021), and LICOM3-CUDA (Wei et al., 2023), each specifically ported to a different computing architecture.”
3. Section 2.2: The paper refers to the Sunway system as a “heterogeneous” architecture, but this is not clearly explained. Please clarify that heterogeneity arises from two distinct core types within each chip, the general-purpose MPEs and lightweight CPEs with separate memory hierarchies and instruction sets, rather than from separate CPU and GPU components. The section would also benefit from citing one or more detailed references on the SW26010 Pro system architecture.
Response: Thanks. We will clarify that heterogeneity arises from two distinct core types within each chip, the general-purpose MPEs and lightweight CPEs with separate memory hierarchies and instruction sets, rather than from separate CPU and GPU components. The reference, which contains the details of the SW26010 Pro architecture, will be added to the article.
Lin, R., Yuan, X., Xue, W., Yin, W., Yao, J., Shi, J., Sun, Q., Song, C., and Wang, F.: 5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Million-Core Sunway Supercomputer, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’23, Association for Computing Machinery, New York, NY, USA, ISBN 9798400701092, https://doi.org/10.1145/3581784.3607030, 2023.
4. Section 2.2: Please indicate the overall size of the Sunway supercomputer (e.g., total nodes, processors, or cores) to give readers a clearer sense of the system scale used for the simulations presented here.
Response: Thanks. The Sunway OceanLight is equipped with more than 100,000 SW26010 Pro processors, each with 390 cores, and is one of the fastest supercomputers globally. An additional clarification will be included in the revised manuscript.
5. Line 132: Please clarify what specific programming challenges are referred to, e.g., related to memory hierarchy, data communication between CPEs and MPEs, or algorithm adaptation to the Sunway architecture.
Response: Thanks. The primary challenge in optimizing for the SW26010 Pro processor stems from its heterogeneous architecture. In this architecture, the Management Processing Element (MPE) handles inter-process communication and controls the overall application workflow. The main computing power, however, resides in the Core Processing Elements (CPEs). Each CPE is equipped with a manually managed Local Data Memory (LDM) that offers access speeds comparable to the L1 cache. CPEs can communicate directly via Remote Memory Access (RMA). To leverage the CPEs' computational capacity, code executed initially on the MPE must be ported to an Athread kernel. The MPE is responsible for launching this kernel and subsequently waiting for its completion. Consequently, effectively leveraging the unique characteristics of the CPEs is the key to achieving high performance.
6. Line 139: The term “Athread kernel” refers to the parallel programming model on Sunway, but most readers may not be familiar with it. Please provide a brief explanation of Athread and its role in parallel execution.
Response: Thanks. The Athread programming model is a parallel programming model for the Sunway architecture. It provides an abstraction that maps closely to the Sunway hardware and offers explicit control over the DMA (Direct Memory Access) controller on the CPEs. This allows programmers to efficiently move data between the main memory (controlled by the MPE) and the Local Data Memory (LDM) of each CPE, which is crucial for overcoming memory bandwidth bottlenecks. In a typical execution flow, the main program runs on an MPE; the MPE calls the athread_spawn interface to create “slave” threads that execute a specified function on the CPEs. All threads in the team execute the same function, but on different portions of the data. The Athread model provides synchronization primitives (e.g., barriers) to coordinate these threads, and the MPE calls athread_join to wait for the kernel execution to finish.
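For illustration, a minimal sketch of this MPE/CPE flow is given below. It is a hedged example rather than swLICOM code: the kernel name scale_kernel, the argument struct, and the slave_ naming convention are assumptions, and a real kernel would additionally stage its data into LDM with DMA (athread_get/athread_put).

```c
/* Illustrative sketch only; kernel and struct names are hypothetical, and the
 * Athread signatures are simplified assumptions based on the workflow described
 * above. The two parts would normally share a common header and be compiled
 * separately for the MPE and the CPEs. */

/* ---- CPE ("slave") side ---- */
#include <slave.h>

typedef struct { double *a; double s; int n; } kernel_arg_t;

void scale_kernel(void *p) {
    kernel_arg_t *arg = (kernel_arg_t *)p;
    int tid   = _MYID;                       /* CPE thread id within the core group */
    int chunk = (arg->n + 63) / 64;          /* static partition across 64 CPEs      */
    int beg   = tid * chunk;
    int end   = (beg + chunk < arg->n) ? beg + chunk : arg->n;
    /* A real kernel would first DMA a[beg:end] into LDM (athread_get), compute on
     * the LDM buffer, and write the result back (athread_put). */
    for (int i = beg; i < end; ++i)
        arg->a[i] *= arg->s;
}

/* ---- MPE ("master") side ---- */
#include <athread.h>
extern void slave_scale_kernel(void *);      /* CPE entry point; the slave_ prefix
                                                naming convention is an assumption  */

void scale_on_cpes(double *a, double s, int n) {
    kernel_arg_t arg = { a, s, n };
    athread_init();                          /* set up the CPE thread team
                                                (once per program in practice)      */
    athread_spawn(scale_kernel, &arg);       /* launch the kernel on all CPEs       */
    athread_join();                          /* MPE waits until the CPEs finish     */
}
```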
7. Figure 1: The text and labels in Fig. 1a are too small to read clearly when printed. Please enlarge the figure or adjust the layout for better legibility.
Response: Thanks. The text and labels in Fig. 1a will be modified in the revised paper.
8. Line 156: Suggest placing JK decomposition in quotation marks (“JK decomposition”) to indicate it is a specific term introduced by the authors.
Response: Thanks. Quotation marks will be added in the revised paper.
9. Lines 186 and Fig. 5: The discussion of IJ, IK, and WKK decomposition is confusing. Please clarify how these decomposition strategies differ and what “WKK” specifically represents.
Response: Thanks. In the LICOM model, space is discretized into 3-D grid points. Horizontal grid points are indexed by (I, J), and each horizontal point carries multiple vertical levels, indexed by K. Most data structures (arrays) in LICOM are three-dimensional arrays with the layout (I, J, K). Fortran arrays are stored in column-major order, so elements along dimension I are contiguous in memory. Because LICOM contains different computational patterns, different decomposition schemes are used for different patterns. For example, “JK decomposition” means that the computation is decomposed by assigning tasks with different J and K ranges to different CPEs. WKK is simply a variable name appearing in Figure 5a, not a decomposition scheme.
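To illustrate the idea, the sketch below shows how a “JK decomposition” might map the flattened (J, K) index space onto the 64 CPEs of a core group while keeping the contiguous I dimension innermost. This is a hypothetical example rather than the swLICOM implementation: the indexing macro, the field u, and the static tiling are assumptions for illustration.

```c
/* Hypothetical sketch of a "JK decomposition" on one core group: the (J, K)
 * plane is flattened and split among the 64 CPEs, and each CPE sweeps the full
 * I dimension in its innermost loop, since LICOM arrays use Fortran
 * column-major (I, J, K) layout and I is contiguous in memory. */
#define NCPE 64
#define IDX(i, j, k, imax, jmax) ((i) + (imax) * ((j) + (jmax) * (k)))  /* column-major */

void jk_decomp_kernel(int tid, double *u, int imax, int jmax, int kmax) {
    int ncol = jmax * kmax;                  /* total number of (j, k) columns   */
    int per  = (ncol + NCPE - 1) / NCPE;     /* columns assigned to each CPE     */
    int beg  = tid * per;
    int end  = (beg + per < ncol) ? beg + per : ncol;

    for (int c = beg; c < end; ++c) {        /* this CPE's (j, k) range          */
        int k = c / jmax;
        int j = c % jmax;
        for (int i = 0; i < imax; ++i)       /* contiguous I innermost           */
            u[IDX(i, j, k, imax, jmax)] *= 0.5;   /* placeholder computation     */
    }
}
```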
10. Line 214: Please clarify the phrase “across tens of thousands of machines.” Do you mean compute nodes?
Response: Thanks. Yes, we refer to the compute nodes. We will clarify the phrase in the revised manuscript.
11. Line 228: The term “Canuto parametrization” appears without prior introduction or reference. Please briefly explain or cite the source when first mentioning it.
Response: Thanks. “Canuto parametrization” refers to the vertical viscosity and diffusivity schemes in LICOM. A reference for the "Canuto parametrization" will be added in the revised paper as follows.
Canuto, V. M., Howard, A., Cheng, Y., and Dubovikov, M. S.: Ocean Turbulence. Part II: Vertical Diffusivities of Momentum, Heat, Salt, Mass, and Passive Scalars, J. Phys. Oceanogr., 32, 240–264, https://doi.org/10.1175/1520-0485(2002)032<0240:OTPIVD>2.0.CO;2, 2002.
12. Line 245: It appears that an equation is missing at this point in the manuscript.
Response: Thanks. We will add the equation back in the revised manuscript.
13. Tables 1 and 3: The timestep units (presumably seconds) are missing. Please also explain why all configurations use the same timestep despite large differences in horizontal resolution. Typically, finer grids require smaller timesteps for stability.
Response: Thanks. The timestep unit is seconds, and the tables will be updated accordingly in the revised paper. To ensure a fair comparison in our scalability tests, we used the time step of the highest-resolution simulation for all configurations, so that resolution is the only factor that changes across these experiments.
14. Sections 4.3–4.6: These sections are quite brief. Consider merging them into one cohesive section summarizing the scaling and benchmarking results to improve readability.
Response: Thanks. We will merge Sections 4.3–4.6 into one section in the revised paper.
15. Line 276: The term “super large parallel scale” likely refers to the largest simulations conducted in this study, but please state this explicitly to avoid ambiguity.
Response: Thanks. We will revise "super large parallel scale" to "large parallel scale" to avoid ambiguity. At our largest parallel scale, the 1 km resolution configuration, the new domain decomposition saves more than 13 million cores.
16. Figures 11–13: The units of the displayed quantities (e.g., sea surface height, temperature, salinity) are missing. Please add appropriate units to the color bars or captions.
Response: Thanks. The units of the displayed quantities will be added to captions.
17. Code and Data Availability: The “project website” and the citation “Xu (2025)” both seem to refer to the same Zenodo record (10.5281/zenodo.15494635). Please clarify whether these are distinct (e.g., project page vs. archived version) or consolidate them to avoid redundancy.
Response: Thanks. We will consolidate “project website” and the citation “Xu (2025)” to avoid redundancy.
18. Technical corrections
A careful proofreading or light English edit is recommended to improve readability and ensure consistent terminology.
Please follow the Copernicus manuscript composition guidelines for capitalization, abbreviations, and formatting when referring to Figures, Tables, and Sections:
https://publications.copernicus.org/for_authors/manuscript_preparation.html
Line 106: Please correct or complete the reference “Y.Q. et al.” to match the proper citation format.
Line 114: The degree symbol (°) is missing, please add.
Line 176: The sentence beginning “Inout the attribute is used…” should be revised for clarity, e.g., “The inout attribute indicates whether the array is read-only or modified within the kernel.”
Line 221: Please fix the broken equation references (“equation ??”).
Line 265: The sentence beginning “Whenever the…” is unclear or incomplete; please revise.
Line 277: The manuscript frequently uses “mix precision,” but the correct term is “mixed precision.” Please revise throughout.
Line 336: Replace "double-only implementation" with "double-precision implementation" for accuracy.
Response: Thanks. We will correct the errors based on your suggestions. The revisions will include careful proofreading, consolidation of code references, clarification of terminology ("large parallel scale"), addition of the requested citation, and full compliance with the Copernicus formatting guidelines for Figures, Tables, and Sections.
Once again, we would like to express our sincere gratitude for your thoughtful comments and guidance. We believe these revisions have substantially improved the manuscript. We hope that our responses and the revised manuscript will meet with your approval.
Citation: https://doi.org/10.5194/egusphere-2025-2231-AC1
Review of “swLICOM: the multi-core version of an ocean general circulation model on the new generation Sunway supercomputer and its kilometer-scale application” by Kai Xu et al.
General comments
This study presents swLICOM, a high-performance, multi-core version of the LASG/IAP Climate System Ocean Model (LICOM3) optimized for the new-generation Sunway supercomputer. It enables kilometer-scale global ocean simulations, which are critical for resolving mesoscale and submesoscale eddies that influence ocean circulation and climate.
The authors introduce several key innovations: an automatic CUDA-to-Sunway code translation tool (swCUDA) for efficient porting, a domain decomposition method that removes land grid points, a split I/O scheme to alleviate data bottlenecks, and mixed-precision computing to balance accuracy and performance. These optimizations allow swLICOM to achieve up to 453 simulated days per day (SDPD) with 59% efficiency at 1 km resolution using over 25 million cores. The model captures vigorous mesoscale and submesoscale features, demonstrating excellent scalability and efficiency.
Overall, the paper is clearly written and well-structured, effectively communicating substantial technical work. The study shows significant and comprehensive efforts to enhance the computational performance of LICOM when ported to the Sunway system. The methods are sound, and the results convincingly support the claims. I recommend publication in GMD after minor revisions addressing the specific points below.
Specific comments
Line 36: It is unclear who or what “Kinaco” refers to. Please clarify.
Line 58: LICOM2-GPU, LICOM3-HIP, and LICOM3-CUDA are model versions, not heterogeneous supercomputers; please adjust the wording accordingly.
Section 2.2: The paper refers to the Sunway system as a “heterogeneous” architecture, but this is not clearly explained. Please clarify that heterogeneity arises from two distinct core types within each chip, the general-purpose MPEs and lightweight CPEs with separate memory hierarchies and instruction sets, rather than from separate CPU and GPU components. The section would also benefit from citing one or more detailed references on the SW26010 Pro system architecture.
Section 2.2: Please indicate the overall size of the Sunway supercomputer (e.g., total nodes, processors, or cores) to give readers a clearer sense of the system scale used for the simulations presented here.
Line 132: Please clarify what specific programming challenges are referred to, e.g., related to memory hierarchy, data communication between CPEs and MPEs, or algorithm adaptation to the Sunway architecture.
Line 139: The term “Athread kernel” refers to the parallel programming model on Sunway, but most readers may not be familiar with it. Please provide a brief explanation of Athread and its role in parallel execution.
Figure 1: The text and labels in Fig. 1a are too small to read clearly when printed. Please enlarge the figure or adjust the layout for better legibility.
Line 156: Suggest placing JK decomposition in quotation marks (“JK decomposition”) to indicate it is a specific term introduced by the authors.
Lines 186 and Fig. 5: The discussion of IJ, IK, and WKK decomposition is confusing. Please clarify how these decomposition strategies differ and what “WKK” specifically represents.
Line 214: Please clarify the phrase “across tens of thousands of machines.” Do you mean compute nodes?
Line 228: The term “Canuto parametrization” appears without prior introduction or reference. Please briefly explain or cite the source when first mentioning it.
Line 245: It appears that an equation is missing at this point in the manuscript.
Tables 1 and 3: The timestep units (presumably seconds) are missing. Please also explain why all configurations use the same timestep despite large differences in horizontal resolution. Typically, finer grids require smaller timesteps for stability.
Line 276: The term “super large parallel scale” likely refers to the largest simulations conducted in this study, but please state this explicitly to avoid ambiguity.
Sections 4.3–4.6: These sections are quite brief. Consider merging them into one cohesive section summarizing the scaling and benchmarking results to improve readability.
Figures 11–13: The units of the displayed quantities (e.g., sea surface height, temperature, salinity) are missing. Please add appropriate units to the color bars or captions.
Code and Data Availability: The “project website” and the citation “Xu (2025)” both seem to refer to the same Zenodo record (10.5281/zenodo.15494635). Please clarify whether these are distinct (e.g., project page vs. archived version) or consolidate them to avoid redundancy.
Technical corrections
A careful proofreading or light English edit is recommended to improve readability and ensure consistent terminology.
Please follow the Copernicus manuscript composition guidelines for capitalization, abbreviations, and formatting when referring to Figures, Tables, and Sections:
https://publications.copernicus.org/for_authors/manuscript_preparation.html
Line 106: Please correct or complete the reference “Y.Q. et al.” to match the proper citation format.
Line 114: The degree symbol (°) is missing, please add.
Line 176: The sentence beginning “Inout the attribute is used…” should be revised for clarity, e.g., “The inout attribute indicates whether the array is read-only or modified within the kernel.”
Line 221: Please fix the broken equation references (“equation ??”).
Line 265: The sentence beginning “Whenever the…” is unclear or incomplete; please revise.
Line 277: The manuscript frequently uses “mix precision,” but the correct term is “mixed precision.” Please revise throughout.
Line 336: Replace “double-only implementation” with “double-precision implementation” for accuracy.