This work is distributed under the Creative Commons Attribution 4.0 License.
swLICOM: the multi-core version of an ocean general circulation model on the new generation Sunway supercomputer and its kilometer-scale application
Abstract. The global ocean general circulation model (OGCM) with kilometer-scale resolution is of great significance for understanding the climate effects of mesoscale and submesoscale eddies. To address the exponentially growing computational and storage demands of kilometer-scale global OGCM simulations, we develop an enhanced and deeply optimized OGCM, swLICOM, on the new-generation Sunway supercomputer. We design a novel split I/O scheme that effectively partitions tripole grid data across processes for reading and writing, resolving the I/O bottleneck encountered in kilometer-scale simulations. We also develop a new domain decomposition strategy that effectively removes land points to enhance simulation capability. In addition, we upgrade the code translation tool swCUDA to convert the LICOM3 CUDA kernels into Sunway kernels efficiently. With further mixed-precision optimization, we achieve a peak performance of 453 Simulated Days per Day (SDPD) with 59 % parallel efficiency at 1 km resolution, scaling up to 25 million cores. Simulation results at 2 km horizontal resolution show that swLICOM captures vigorous mesoscale eddies and active submesoscale phenomena.
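The split I/O scheme itself is not reproduced in this discussion. As a rough illustration of the general pattern the abstract describes, the sketch below partitions a global grid into contiguous latitude bands so that each I/O process reads and writes only its own slab instead of funneling all data through a single rank. The function name and the banding choice are hypothetical, not swLICOM's code.

```python
# Hypothetical illustration of a split I/O partition: the global
# (nlat x nlon) tripole grid is divided into contiguous latitude
# bands, one per I/O process, so each process handles only its
# own slab of the file.

def split_rows(nlat: int, nprocs: int) -> list[tuple[int, int]]:
    """Return [start, end) row ranges, balanced to within one row."""
    base, extra = divmod(nlat, nprocs)
    ranges = []
    start = 0
    for rank in range(nprocs):
        rows = base + (1 if rank < extra else 0)
        ranges.append((start, start + rows))
        start += rows
    return ranges

# Example: a 10-row grid split across 4 I/O processes.
print(split_rows(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In a real parallel I/O layer, each rank would translate its row range into a file offset and issue an independent (or collective) read/write for that slab only.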
Status: open (until 29 Nov 2025)
- RC1: 'Comment on egusphere-2025-2231', Anonymous Referee #1, 13 Oct 2025
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,710 | 22 | 13 | 1,745 | 16 | 19 |
Review of “swLICOM: the multi-core version of an ocean general circulation model on the new generation Sunway supercomputer and its kilometer-scale application” by Kai Xu et al.
General comments
This study presents swLICOM, a high-performance, multi-core version of the LASG/IAP Climate System Ocean Model (LICOM3) optimized for the new-generation Sunway supercomputer. It enables kilometer-scale global ocean simulations, which are critical for resolving mesoscale and submesoscale eddies that influence ocean circulation and climate.
The authors introduce several key innovations: an automatic CUDA-to-Sunway code translation tool (swCUDA) for efficient porting, a domain decomposition method that removes land grid points, a split I/O scheme to alleviate data bottlenecks, and mixed-precision computing to balance accuracy and performance. These optimizations allow swLICOM to achieve up to 453 simulated days per day (SDPD) with 59% efficiency at 1 km resolution using over 25 million cores. The model captures vigorous mesoscale and submesoscale features, demonstrating excellent scalability and efficiency.
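To make the land-removal idea concrete for readers: conceptually, the grid is tiled into blocks, and any block containing only land points is dropped before work is assigned, so no process is wasted on cells the ocean model never updates. The following is a toy sketch of that idea, not swLICOM's actual decomposition code; all names are hypothetical.

```python
# Conceptual sketch of a land-removing domain decomposition:
# keep only the bs x bs blocks that contain at least one ocean cell.

def active_blocks(mask, bs):
    """Return (block_i, block_j) indices of blocks with >= 1 ocean cell.

    mask[i][j] is True for ocean, False for land.
    """
    nrow, ncol = len(mask), len(mask[0])
    kept = []
    for bi in range(0, nrow, bs):
        for bj in range(0, ncol, bs):
            block = [mask[i][bj:bj + bs] for i in range(bi, min(bi + bs, nrow))]
            if any(any(row) for row in block):
                kept.append((bi // bs, bj // bs))
    return kept

# Toy 4x4 mask: the lower-left 2x2 block is all land and is dropped.
ocean = [[True,  True,  True, True],
         [True,  True,  True, True],
         [False, False, True, True],
         [False, False, True, True]]
print(active_blocks(ocean, 2))  # [(0, 0), (0, 1), (1, 1)]
```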
Overall, the paper is clearly written and well-structured, effectively communicating substantial technical work. The study shows significant and comprehensive efforts to enhance the computational performance of LICOM when ported to the Sunway system. The methods are sound, and the results convincingly support the claims. I recommend publication in GMD after minor revisions addressing the specific points below.
Specific comments
Line 36: It is unclear who or what “Kinaco” refers to. Please clarify.
Line 58: LICOM2-GPU, LICOM3-HIP, and LICOM3-CUDA are model versions, not heterogeneous supercomputers; please adjust the wording accordingly.
Section 2.2: The paper refers to the Sunway system as a “heterogeneous” architecture, but this is not clearly explained. Please clarify that heterogeneity arises from two distinct core types within each chip, the general-purpose MPEs and lightweight CPEs with separate memory hierarchies and instruction sets, rather than from separate CPU and GPU components. The section would also benefit from citing one or more detailed references on the SW26010 Pro system architecture.
Section 2.2: Please indicate the overall size of the Sunway supercomputer (e.g., total nodes, processors, or cores) to give readers a clearer sense of the system scale used for the simulations presented here.
Line 132: Please clarify what specific programming challenges are referred to, e.g., related to memory hierarchy, data communication between CPEs and MPEs, or algorithm adaptation to the Sunway architecture.
Line 139: The term “Athread kernel” refers to the parallel programming model on Sunway, but most readers may not be familiar with it. Please provide a brief explanation of Athread and its role in parallel execution.
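For context, the execution model the comment refers to is a spawn/join pattern: the general-purpose MPE launches a kernel onto the 64 CPEs of a core group (via the C-level Athread library, e.g. athread_spawn/athread_join) and waits for them to finish. The Python stand-in below only mimics that control flow with threads; it is a conceptual analogy, not Sunway code.

```python
# Conceptual stand-in for the Athread spawn/join model: a
# "management" thread spawns one worker per simulated CPE, each
# worker processes an interleaved slice of the loop iterations,
# and the manager joins them all before continuing.
import threading

N_CPES = 64  # CPEs per core group on SW26010-class chips

def kernel(cpe_id, data, out):
    # Each simulated CPE handles every N_CPES-th element.
    for i in range(cpe_id, len(data), N_CPES):
        out[i] = data[i] * 2

data = list(range(256))
out = [0] * len(data)
workers = [threading.Thread(target=kernel, args=(c, data, out))
           for c in range(N_CPES)]
for w in workers:   # analogous to athread_spawn
    w.start()
for w in workers:   # analogous to athread_join
    w.join()
print(out[:5])  # [0, 2, 4, 6, 8]
```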
Figure 1: The text and labels in Fig. 1a are too small to read clearly when printed. Please enlarge the figure or adjust the layout for better legibility.
Line 156: I suggest placing “JK decomposition” in quotation marks to indicate that it is a specific term introduced by the authors.
Line 186 and Fig. 5: The discussion of IJ, IK, and WKK decomposition is confusing. Please clarify how these decomposition strategies differ and what “WKK” specifically represents.
Line 214: Please clarify the phrase “across tens of thousands of machines.” Do you mean compute nodes?
Line 228: The term “Canuto parametrization” appears without prior introduction or reference. Please briefly explain or cite the source when first mentioning it.
Line 245: It appears that an equation is missing at this point in the manuscript.
Tables 1 and 3: The timestep units (presumably seconds) are missing. Please also explain why all configurations use the same timestep despite large differences in horizontal resolution. Typically, finer grids require smaller timesteps for stability.
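The stability point above can be made concrete with the CFL condition dt <= C * dx / c. The sketch below uses the barotropic gravity wave speed sqrt(g * H) for a 4000 m deep ocean; these numbers are generic illustrations, not values taken from the manuscript's tables.

```python
import math

# CFL-limited timestep for three horizontal resolutions:
# a 10x finer grid requires a roughly 10x smaller timestep
# (at fixed Courant number and wave speed).
g, depth = 9.81, 4000.0          # m/s^2, m
c = math.sqrt(g * depth)         # ~198 m/s barotropic wave speed
courant = 1.0                    # assumed maximum stable Courant number

for dx_km in (10.0, 2.0, 1.0):
    dt = courant * dx_km * 1000.0 / c
    print(f"dx = {dx_km:4.0f} km -> dt <= {dt:5.1f} s")
```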
Line 276: The term “super large parallel scale” likely refers to the largest simulations conducted in this study, but please state this explicitly to avoid ambiguity.
Sections 4.3–4.6: These sections are quite brief. Consider merging them into one cohesive section summarizing the scaling and benchmarking results to improve readability.
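A merged scaling section could also state explicitly how parallel efficiency is computed. The standard bookkeeping is observed speedup divided by the ideal speedup implied by the core-count ratio; the sketch below uses toy numbers, not the paper's measurements.

```python
# Generic scaling bookkeeping: parallel efficiency is the observed
# throughput gain divided by the ideal gain from adding cores.
def parallel_efficiency(sdpd_base, cores_base, sdpd_big, cores_big):
    speedup = sdpd_big / sdpd_base
    ideal = cores_big / cores_base
    return speedup / ideal

# Toy numbers: doubling cores but gaining only 1.7x throughput
# gives 85 % efficiency.
print(round(parallel_efficiency(100.0, 1.0, 170.0, 2.0), 2))  # 0.85
```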
Figures 11–13: The units of the displayed quantities (e.g., sea surface height, temperature, salinity) are missing. Please add appropriate units to the color bars or captions.
Code and Data Availability: The “project website” and the citation “Xu (2025)” both seem to refer to the same Zenodo record (10.5281/zenodo.15494635). Please clarify whether these are distinct (e.g., project page vs. archived version) or consolidate them to avoid redundancy.
Technical corrections
A careful proofreading or light English edit is recommended to improve readability and ensure consistent terminology.
Please follow the Copernicus manuscript composition guidelines for capitalization, abbreviations, and formatting when referring to Figures, Tables, and Sections:
https://publications.copernicus.org/for_authors/manuscript_preparation.html
Line 106: Please correct or complete the reference “Y.Q. et al.” to match the proper citation format.
Line 114: The degree symbol (°) is missing; please add it.
Line 176: The sentence beginning “Inout the attribute is used…” should be revised for clarity, e.g., “The inout attribute indicates whether the array is read-only or modified within the kernel.”
Line 221: Please fix the broken equation references (“equation ??”).
Line 265: The sentence beginning “Whenever the…” is unclear or incomplete; please revise.
Line 277: The manuscript frequently uses “mix precision,” but the correct term is “mixed precision.” Please revise throughout.
Line 336: Replace “double-only implementation” with “double-precision implementation” for accuracy.