https://doi.org/10.5194/egusphere-2023-756
11 May 2023
Status: this preprint is open for discussion.

Multiple execution of the same MPI application to exploit parallelism at hotspots with minimal code changes: a case study with FESOM2-Iceberg and FESOM2-REcoM

Kai Himstedt

Abstract. For a typical climate model, parallelization based on domain decomposition is a predominant technique for speeding up its computation as an MPI (Message Passing Interface) application on an HPC (High Performance Computing) system. This contribution shows how simultaneously executing multiple instances of such an MPI application can be exploited to achieve a further speedup by additionally parallelizing suitable compute-intensive loops. In contrast to a parallelization based on OpenMP (Open Multi-Processing), no special synchronization effort is required if MPI calls occur in the iterations of the original loop. Splitting the work at such hotspots between the instances represents an independent level of parallelization on top of the domain decomposition. The simple implementation can be carried out within the familiar MPI world, where the climate model can largely be treated as a black box. Outside the hotspots, however, the same computations are performed in all instances. Several examples show that deliberately accepting redundant computations in this way to reduce the time-to-solution is quite common in parallelization approaches in other disciplines; these approaches are also the main inspiration for the approach presented here. Experimental results for the additional parallelization of an iceberg model and a biogeochemical model, each embedded in FESOM2, show how the time-to-solution can be further reduced with a small number of instances at reasonable efficiency. Because the part outside the hotspots is not parallelized, however, a larger number of instances cannot easily be used effectively in practice; this is explained in more detail in efficiency considerations with reference to Amdahl's Law.
Nevertheless, implementing the approach for other simulation models with similar properties seems promising when a further reduction of the time-to-solution is the focus but the scalability limit of the domain decomposition has already been reached.
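The Amdahl's Law argument in the abstract can be made concrete with the textbook formula. As an illustrative sketch (the symbols below are introduced here and are not taken from the paper), let p be the fraction of one instance's runtime spent in the parallelized hotspots and N the number of instances. The remaining fraction 1 − p is computed redundantly by every instance, so, ignoring the cost of merging results between instances, the ideal speedup S and efficiency E are:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
E(N) = \frac{S(N)}{N}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For a hypothetical hotspot fraction p = 0.8, two instances give S = 1/0.6 ≈ 1.67 at E ≈ 0.83, four instances give S = 2.5 at E = 0.625, and no number of instances can exceed S = 5. This is why a small number of instances can be efficient while a large number is not.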

Status: open (until 06 Jul 2023)

Latest update: 04 Jun 2023
Short summary
There is a constant need to speed up the execution of climate models in order to simulate longer time scales more quickly. On current computing systems, the typical decomposition of a region into smaller parts that are computed in parallel can easily reach a limit beyond which no further speedup is achieved. Using two FESOM2-based models, it is shown how runtimes can still be reduced with reasonable efficiency by executing the simulation application multiple times to additionally parallelize suitable loops.
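The idea of running the application several times and splitting a hotspot loop between the copies can be pictured with a small serial simulation. This is a minimal sketch under stated assumptions, not the FESOM2 implementation: it ignores the domain decomposition inside each instance and any MPI calls in the loop body, it assumes a contiguous block distribution of iterations, and it assumes that partial results are merged by a sum reduction, as an MPI_Allreduce with MPI_SUM across instances would do. All function names (rank_layout, iteration_range, run_hotspot, combine, hotspot_body) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rank_layout(world_rank: int, ranks_per_instance: int) -> Tuple[int, int]:
    """Map a global rank to (instance_id, domain_rank).

    Hypothetical layout: all instances share one MPI job, each using
    ranks_per_instance ranks for its domain decomposition. In MPI code the
    two values could serve as `color` arguments of two MPI_Comm_split calls.
    """
    return world_rank // ranks_per_instance, world_rank % ranks_per_instance


def iteration_range(n_iters: int, n_instances: int, instance_id: int) -> range:
    """Contiguous block of hotspot iterations owned by one instance.

    The first (n_iters % n_instances) instances receive one extra iteration.
    """
    base, extra = divmod(n_iters, n_instances)
    start = instance_id * base + min(instance_id, extra)
    stop = start + base + (1 if instance_id < extra else 0)
    return range(start, stop)


def run_hotspot(instance_id: int, n_instances: int, n_iters: int,
                body: Callable[[int], float]) -> List[float]:
    """Hotspot loop as executed by one instance.

    Only the owned iterations are computed; all other entries stay 0.0,
    so that a sum over instances reconstructs the full result.
    """
    partial = [0.0] * n_iters
    for i in iteration_range(n_iters, n_instances, instance_id):
        partial[i] = body(i)
    return partial


def combine(partials: Sequence[List[float]]) -> List[float]:
    """Element-wise sum over instances.

    Stands in for an MPI_Allreduce with MPI_SUM across the instances, after
    which every instance again holds the complete, identical state.
    """
    return [sum(values) for values in zip(*partials)]


def hotspot_body(i: int) -> float:
    """Hypothetical per-iteration work, e.g. updating one iceberg."""
    return float(i * i)


if __name__ == "__main__":
    n_iters, n_instances = 10, 3
    partials = [run_hotspot(k, n_instances, n_iters, hotspot_body)
                for k in range(n_instances)]
    result = combine(partials)
    # Splitting the loop and merging gives the same result as the serial loop.
    assert result == [hotspot_body(i) for i in range(n_iters)]
    print(rank_layout(world_rank=5, ranks_per_instance=4))  # (1, 1)
```

Zero-filling the non-owned entries keeps the merge a single reduction; everything outside run_hotspot would still be executed identically by every instance, which is the redundant part that limits the achievable speedup.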