the Creative Commons Attribution 4.0 License.
Towards a real-time modeling of global ocean waves by the fully GPU-accelerated spectral wave model WAM6-GPU
Abstract. The spectral wave model WAM (Cycle 6) is a commonly used code package for ocean wave modeling. However, including it in long-term Earth system modeling remains a challenge because of its large computing requirement. In this study we have developed a GPU-accelerated version of the WAM model that runs all of its computationally demanding components on GPUs, with a significant performance increase over the original CPU version. Substantial code refactoring was needed to exploit the GPUs, and it reduces the computing time of a 7-day global 1/10° wave simulation to 7.6 minutes. The study provides an approach to energy-efficient computing for ocean wave modeling. A preliminary evaluation suggests that approximately 90 % of power can be saved.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2024-169', Anonymous Referee #1, 07 Apr 2024
I wish to congratulate the authors on a job well done. This is a very comprehensive piece of work, and the speedups achieved on GPU are indeed impressive. What I particularly enjoyed was the level of detail in which code optimisations were discussed. Some of the computational and memory access patterns present in WAM6 are also relevant to many scientific algorithms, and thus the optimisations demonstrated herein have a wider utility beyond the wave modelling community.
I have attached an annotated copy of the paper with some detailed comments. I have mainly requested further clarifications and/or evidence for some of the points made in the paper. The most significant of these is the request for further detail into the optimisation of the non-linear wave interaction. There are also a few typographical and grammar corrections.
-
AC1: 'Reply on RC1', Ye Yuan, 23 Apr 2024
Dear anonymous referee,
Thank you for your constructive comments on our manuscript entitled ‘Towards a real-time modeling of global ocean waves by the fully GPU-accelerated spectral wave model WAM6-GPU’. We are grateful for the time you have spent on this manuscript. In the attached response, we reply to all comments and concerns point by point.
Kind Regards,
Ye Yuan, on behalf of the co-authors.
-
RC2: 'Comment on egusphere-2024-169', Anonymous Referee #2, 13 Apr 2024
Review of “Towards a real-time modeling of global ocean waves by the fully GPU-accelerated spectral wave model WAM6-GPU”
Summary
The authors have fully ported the spectral wave model (WAM6) to GPU using OpenACC with a substantial amount of code refactoring. On a GPU cluster with 32-core Intel Xeon6326 and 8 NVIDIA A100 GPUs, the WAM6-GPU code achieved a speed-up of 37x when utilizing all the resources on a node. As a result, they achieved around 90% reduction in power consumption.
This is an important study that would enable century-long global simulations with a stand-alone wave model and also facilitate the integration of wave models into Earth system models. However, before accepting this manuscript, the authors need to address the following issues thoroughly.
Main:
- In the abstract, the authors need to state the speed-up value based on a node comparison, e.g., 32-core Intel Xeon6326 and 8 NVIDIA A100.
- Looking into the code, I saw that most of the subroutines/modules were refactored. A rough estimate of how much the original CPU code has been refactored should be discussed within the manuscript.
- One important thing missing from this paper is the structure of the WAM code. The authors should include a skeletal code structure of both the CPU and GPU versions of some parts of the code. This would greatly improve the manuscript for readers, especially for understanding the SNL optimization explained in lines 245–255.
- The use of two CPU-only HPC clusters is confusing. Given that the study focuses on GPU and not the optimization of the CPU code on the CPU, I think there is no need to run the CPU code on two CPU-only HPCs. Since the NMEFC’s GPU server does not have more than one node needed for scalability of the GPU code, the authors should only keep the NMEFC’s HPC cluster for comparing resource usage needed to achieve the GPU execution time.
- Fig. 7: The authors should show the spatial difference between the output parameters generated by WAM6-GPU and by the CPU version. A mean difference (Fig. 8) can average out spatial differences, if there are any.
- Apart from running on the NVIDIA H100 GPU, are there any further optimization strategies to improve the WAM6-GPU code on the A100?
- Just curious. Considering this study started in 2020, I wonder if the authors used P100 and V100. If so, what were the achieved speed-ups?
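The point about Fig. 7 and Fig. 8 in the list above can be illustrated with a minimal sketch. It uses small synthetic fields, not WAM6 or WAM6-GPU output; the function names `make_fields` and `difference_stats` and the ±0.05 offset are hypothetical choices made only for illustration. The sketch shows how a domain-mean difference can be close to zero while point-wise differences are not.

```python
import math


def make_fields(nlat=4, nlon=8):
    """Build two synthetic 2-D fields (hypothetical stand-ins for CPU and GPU output)."""
    # "CPU" field: a smooth gradient in latitude.
    cpu = [[2.0 + 0.1 * i for _ in range(nlon)] for i in range(nlat)]
    # "GPU" field: offset by +0.05 on even columns and -0.05 on odd columns,
    # so the positive and negative differences cancel in the domain mean.
    gpu = [
        [cpu[i][j] + (0.05 if j % 2 == 0 else -0.05) for j in range(nlon)]
        for i in range(nlat)
    ]
    return cpu, gpu


def difference_stats(cpu, gpu):
    """Return (mean difference, maximum absolute difference, RMS difference)."""
    diffs = [g - c for row_c, row_g in zip(cpu, gpu) for c, g in zip(row_c, row_g)]
    mean_diff = sum(diffs) / len(diffs)
    max_abs = max(abs(d) for d in diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_diff, max_abs, rms


if __name__ == "__main__":
    cpu, gpu = make_fields()
    mean_diff, max_abs, rms = difference_stats(cpu, gpu)
    print(f"mean difference:    {mean_diff:.6f}")
    print(f"max abs difference: {max_abs:.6f}")
    print(f"RMS difference:     {rms:.6f}")
```

In this synthetic case the mean difference cancels while the maximum absolute and RMS differences stay at the size of the offset, which is why a map of point-wise differences carries information that a single mean value does not.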
Minor:
- Line 11: This is a scientific dataset. Cite Cavaleri et al., 2012 as in Line 20.
- Line 13: Check citation format.
- Line 33: The new U.S. Department of Energy (DOE) Energy Exascale Earth System Model (E3SM) has also included WW3 as a default component. Cite Ikuyajolu et al., 2024 and Brus et al., 2021.
- Line 228: Define all terms in the equation.
- Figure 6: Check the caption for an incorrect LaTeX degree symbol.
Reference:
Ikuyajolu, O. J., L. Van Roekel, S. R. Brus, E. E. Thomas, Y. Deng, and J. J. Benedict, 2024: Effects of Surface Turbulence Flux Parameterizations on the MJO: The Role of Ocean Surface Waves. J. Climate, https://doi.org/10.1175/JCLI-D-23-0490.1, in press.
Brus, S. R., Wolfram, P. J., Van Roekel, L. P., and Meixner, J. D.: Unstructured global to coastal wave modeling for the Energy Exascale Earth System Model using WAVEWATCH III version 6.07, Geosci. Model Dev., 14, 2917–2938, https://doi.org/10.5194/gmd-14-2917-2021, 2021.
Cavaleri, L., Fox-Kemper, B., and Hemer, M.: Wind Waves in the Coupled Climate System, Bulletin of the American Meteorological Society, 93, 1651–1661, https://doi.org/10.1175/BAMS-D-11-00170.1, 2012.
-
AC2: 'Reply on RC2', Ye Yuan, 23 Apr 2024
Dear anonymous referee,
Thank you for your constructive comments on our manuscript entitled ‘Towards a real-time modeling of global ocean waves by the fully GPU-accelerated spectral wave model WAM6-GPU’. We are grateful for the time you have spent on this manuscript. In the response below, we reply to all comments and concerns point by point, and the revised manuscript is attached as a PDF supplement.
Kind Regards,
Ye Yuan, on behalf of the co-authors.
-
AC3: 'Comment on egusphere-2024-169', Ye Yuan, 26 Apr 2024
To potential model users,
WAM6-GPU has been updated to version 1.2 over the past few months and can be downloaded from the Zenodo repository. WAM6-GPU now supports nested cases, and some important bugs have been fixed. Please see the attached PDF for the version history.
Model code and software
The WAM6-GPU: an OpenACC version of the third-generation spectral wave model WAM (Cycle 6) Ye Yuan https://zenodo.org/records/10453369
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
363 | 260 | 43 | 666 | 23 | 19 |
Fujiang Yu
Zhi Chen
Xueding Li
Fang Hou
Yuanyong Gao
Zhiyi Gao
Renbo Pang