This work is distributed under the Creative Commons Attribution 4.0 License.
Porting the Meso-NH atmospheric model on different GPU architectures for the next generation of supercomputers (version MESONH-v55-OpenACC)
Abstract. The advent of heterogeneous supercomputers with multi-core central processing units (CPUs) and graphics processing units (GPUs) requires geoscientific codes to be adapted to these new architectures. Here we describe the porting of the Meso-NH version 5.5 community weather research code to GPUs, named MESONH-v55-OpenACC, with guaranteed bit reproducibility thanks to its own MPPDB_CHECK library. The porting includes the use of OpenACC directives, specific memory management, communication optimization, the development of a geometric multigrid solver and the creation of an in-house preprocessor. Performance on the AMD MI250X GPU partition of the Adastra platform shows up to a 6.0× speedup (4.6× on the NVIDIA A100 Leonardo platform) and a 2.3× gain in energy efficiency compared to the AMD Genoa CPU partition of Adastra, using the same configuration with 64 nodes. The code is even 17.8× faster when halving the precision and quadrupling the number of nodes, with a 1.3× gain in energy efficiency. First scientific simulations of three representative storms using 128 GPU nodes of Adastra show a successful cascade of scales for horizontal grid spacings down to 100 m and grid sizes up to 2.1 billion points. For one of these storms, Meso-NH is also successfully coupled to the WAVEWATCH III wave model via the OASIS3-MCT coupler without any extra computational cost. This GPU porting paves the way for Meso-NH to be used on future European exascale machines.
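As context for the directive-based approach summarized in the abstract, here is a minimal OpenACC sketch in the same spirit. It is illustrative only: the array names and loop are hypothetical, not an excerpt from the Meso-NH source.

```fortran
! Illustrative sketch of directive-based GPU porting (not Meso-NH code).
! A data region keeps the arrays on the device; the loop nest is offloaded
! with a parallel loop directive, and default(present) asserts that every
! array touched in the kernel already lives on the device.
program acc_sketch
  implicit none
  integer, parameter :: ni = 128, nj = 128, nk = 64
  real, allocatable :: theta(:,:,:), tend(:,:,:)
  integer :: i, j, k

  allocate(theta(ni,nj,nk), tend(ni,nj,nk))
  theta = 300.0
  tend  = 1.0e-3

  !$acc data copy(theta) copyin(tend)
  !$acc parallel loop collapse(3) default(present)
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        theta(i,j,k) = theta(i,j,k) + tend(i,j,k)
      end do
    end do
  end do
  !$acc end data

  print *, 'theta(1,1,1) =', theta(1,1,1)
end program acc_sketch
```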
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-2879', Anonymous Referee #1, 04 Nov 2024
- AC1: 'Reply on RC1', Jean-Pierre Chaboureau, 13 Dec 2024
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2879/egusphere-2024-2879-AC1-supplement.pdf
- RC2: 'Comment on egusphere-2024-2879', Pedro Costa, 11 Dec 2024
This work represents a major GPU porting effort of the Meso-NH model. This model is originally written in Fortran with MPI for distributed-memory parallelization. The work ports significant parts of the model for NVIDIA and AMD architectures using OpenACC. In addition to the directives needed to port GPU kernels, a pre-processor was developed, along with a multi-grid pressure solver as an alternative to the FFT-based one. An extensive performance analysis on different systems is provided.
I found the work insightful and the paper well organized and well written. However, some parts lack the detail needed to fully understand the numerical and computational approach. Without clarifying these details, it becomes quite hard to understand some of the choices made in the effort. I will elaborate below point by point.
1. L 78-79: "The current pressure solver consists on [of] a conjugate-residual algorithm accelerated by a flat fast Fourier transform (FFT) precondition." This is insufficient to fully understand the numerical approach to solving the pressure equation. Could you provide more (mathematical) background and mention in which directions FFT(s) are being used and the consequences for the grid spacing and boundary conditions along this and the other directions? Moreover, could you illustrate how this equation is solved in parallel (I could not find a clear answer in the cited references either)?
2. L. 80. Can you not simply state that it is written in Modern Fortran? If you want to be pedantic, you'd need to state that it has features from older standards (77, 90), too.
3. L. 122. Just to comment that I found that using `default(present)` in all OpenACC kernel loops really helps with debugging, as one would get a runtime error whenever something is accessed in a kernel that is not on the device (see the sketch after this list).
4. I found Figure 1 quite hard to understand. Could you improve the captions so that it is clear what we are looking at? Is the left a serial computation, and the right one an MPI-decomposed one with 2D pencils?

5. L 213. Same spirit as comment 1. "the FFT algorithm requires all-to-all communications between MPI processes (...)" Is the FFT algorithm requiring all-to-all communications, or is it the Poisson solver? It is unclear how the pressure equation is being solved numerically (1D or 2D FFTs? + CR along which direction?), and how that is implemented in a distributed-memory paradigm.
6. L 218. "The most promising alternative for solving this type of elliptic equation is the use of a geometric multigrid solver for regular structured grid". This claim needs to be substantiated or reconsidered, as it is not obvious, especially for GPUs: as you coarsen in an MG method, the GPU occupancy is being massively reduced, making it perform extremely poorly on GPU-based systems. So, I would say that geometric multigrid solvers do not pair that well with GPUs.

7. L 235. I see that along one direction the (direct) Thomas algorithm is used, while in the other two an iterative (MG) method is used. The linear algebra behind this approach is quite unclear to me, so please provide more mathematical details so a reader can easily follow the method without navigating into the code or other references.
8. L. 250. A comparison between FFT-based and multigrid is performed, but I am missing a lot of details needed for reproducibility and better understanding. What kind of tolerance is being used in the FFT-based flavor (CR method), and in the MG one? What kind of smoother is being used in the geometric multigrid method? These details need to be clear for better interpreting the results.

9. L 288. "The test case uses advection, turbulence, cloud microphysics, pressure solver and other components". Consider being more exhaustive here.
10. L 322. I read that there can be several MPI tasks per GPU. It is unclear how this is implemented in practice. A sketch with the domain decomposition colored by MPI tasks, along with the GPUs that handle each group of tasks, would be very insightful.
11. Please reconsider the performance analysis in light of the fact that with MG the GPU occupancy decreases at coarse levels, and whether this can explain some of the observations.

12. Finally, in other fluid dynamics domains, direct FFT-based solvers (i.e., FFT factorization along two directions, and Gauss elimination along the last one; see the sketch after this list) show 3× to 100× speed-ups compared to multigrid approaches. While their communication patterns are more complex, their fast performance and good GPU utilization make them quite attractive for GPU-based systems. This goes a bit in contrast with the present observations, though the baseline FFT-based solver is not direct here. I would recommend putting this work in perspective w.r.t. other efforts in the literature that have made similar comparisons.
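Regarding point 3, a minimal sketch of the `default(present)` pattern (hypothetical arrays, not Meso-NH code):

```fortran
! Sketch only: with default(present), the OpenACC runtime typically aborts
! with a clear "not present" error when an array was never copied to the
! device, instead of generating a silent (and slow) implicit transfer.
program present_check
  implicit none
  integer, parameter :: ni = 64, nj = 64
  real :: zthe(ni,nj), zrho(ni,nj), zbuo(ni,nj)
  integer :: i, j

  zthe = 300.0
  zrho = 1.2

  !$acc data copyin(zthe) copyout(zbuo)   ! zrho deliberately omitted
  !$acc parallel loop collapse(2) default(present)
  do j = 1, nj
    do i = 1, ni
      zbuo(i,j) = 9.81 * zthe(i,j) / zrho(i,j)  ! fails fast: zrho not present
    end do
  end do
  !$acc end data

  print *, zbuo(1,1)
end program present_check
```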
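For points 5, 7 and 12, a generic sketch of the solver class being discussed, assuming a second-order discretization on a uniform grid with periodic boundaries in x and y (not necessarily the Meso-NH formulation): a 2-D FFT in (x, y) diagonalizes the horizontal operators,

```latex
% Eigenvalues of the second-order difference operators under the 2-D FFT:
\[
  \lambda_i = \frac{2\left[\cos(2\pi i/N_x) - 1\right]}{\Delta x^{2}},
  \qquad
  \mu_j = \frac{2\left[\cos(2\pi j/N_y) - 1\right]}{\Delta y^{2}},
\]
% leaving, for each wavenumber pair (i, j), an independent system along z:
\[
  \frac{\hat{p}_{ij,k-1} - 2\,\hat{p}_{ij,k} + \hat{p}_{ij,k+1}}{\Delta z^{2}}
  + \left(\lambda_i + \mu_j\right)\hat{p}_{ij,k}
  = \hat{f}_{ij,k}.
\]
```

Each system in k is tridiagonal and can be solved directly with the Thomas algorithm, after which inverse FFTs recover p; this is the "FFT factorization along two directions, Gauss elimination along the last one" of point 12.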
Feel free to contact me directly if something is unclear at P.SimoesCosta@tudelft.nl.
Citation: https://doi.org/10.5194/egusphere-2024-2879-RC2
- AC2: 'Reply on RC2', Jean-Pierre Chaboureau, 17 Dec 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2879/egusphere-2024-2879-AC2-supplement.pdf
Data sets
Code and data for Porting the Meso-NH atmospheric model on different GPU architectures for the next generation of supercomputers (version MESONH-v55-OpenACC) Juan Escobar, Philippe Wautelet, Joris Pianezze, Thibaut Dauhut, Christelle Barthe, Florian Pantillon, and Jean-Pierre Chaboureau https://zenodo.org/doi/10.5281/zenodo.13759713
Model code and software
Meso-NH version MESONH-v55-OpenACC The Meso-NH developers http://mesonh.aero.obs-mip.fr/gitweb/?p=MNH-git_open_source-lfs.git;a=commit;h=498cd0cb968041038ff6c5b0f2a76d5066c55bfd
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
258 | 126 | 43 | 427 | 6 | 7