Preprints
https://doi.org/10.5194/egusphere-2023-756
11 May 2023

Multiple execution of the same MPI application to exploit parallelism at hotspots with minimal code changes: a case study with FESOM2-Iceberg and FESOM2-REcoM

Kai Himstedt

Abstract. For a typical climate model, parallelization based on domain decomposition is a predominant technique for speeding up its computation as an MPI (Message Passing Interface) application on an HPC (High Performance Computing) system. This contribution shows how simultaneously executing multiple instances of such an MPI application can yield a further speedup through an additional parallelization of suitable compute-intensive loops. In contrast to a parallelization based on OpenMP (Open Multi-Processing), no special synchronization effort is required if MPI calls occur in the iterations of the original loop. Splitting the work at such hotspots between the instances represents an independent level of parallelization on top of the domain decomposition. The implementation is simple and stays within the familiar MPI world, and the climate model can largely be treated as a black box. Outside of the hotspots, however, the same computations are performed in all instances. Some examples show that such a deliberate acceptance of redundant computations is quite common in parallelization approaches of other disciplines to reduce the time-to-solution; these approaches are the main inspiration for the approach presented here. Experimental results for the additional parallelization of an iceberg model and a biogeochemical model, each embedded into FESOM2, show how the time-to-solution can be further reduced with a small number of instances at appropriate efficiency. Because of the non-parallelized part outside of the hotspots, however, a larger number of instances cannot easily be utilized meaningfully in practice, which is explained in more detail in efficiency considerations with reference to Amdahl’s Law. Nevertheless, applying the approach to other simulation models with similar properties seems promising if a further reduction of the time-to-solution is the focus but the scalability limit of the domain decomposition has been reached.
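
The efficiency argument with reference to Amdahl’s Law can be illustrated with a small calculation. The following is a minimal sketch, not taken from the preprint: the function names are hypothetical, and the hotspot fraction of 0.8 is an assumed value chosen only for illustration, not a measured result for FESOM2-Iceberg or FESOM2-REcoM.

```python
def amdahl_speedup(hotspot_fraction: float, instances: int) -> float:
    """Ideal speedup if only the hotspot share of the runtime is split across instances.

    hotspot_fraction: share of the single-instance runtime spent in the hotspots (0..1).
    instances: number of simultaneously executed instances of the application.
    """
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The part outside the hotspots is computed redundantly in every instance,
    # so it contributes its full runtime regardless of the number of instances.
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / instances)


def efficiency(hotspot_fraction: float, instances: int) -> float:
    """Speedup divided by the number of instances (resources grow with each instance)."""
    return amdahl_speedup(hotspot_fraction, instances) / instances


if __name__ == "__main__":
    assumed_fraction = 0.8  # assumption for illustration only
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(assumed_fraction, n), 3),
              round(efficiency(assumed_fraction, n), 3))
```

Under this assumed fraction the speedup can never exceed 5, and the efficiency falls as instances are added, which is consistent with the abstract's statement that only a small number of instances is meaningful in practice.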

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-756', Anonymous Referee #1, 23 Jul 2023
    • AC3: 'Reply on RC1', Kai Himstedt, 31 Aug 2023
  • RC2: 'Comment on egusphere-2023-756', Anonymous Referee #2, 10 Aug 2023
    • AC1: 'Reply on RC2', Kai Himstedt, 24 Aug 2023
  • EC1: 'Comment on egusphere-2023-756', Christoph Knote, 15 Aug 2023
    • AC2: 'Reply on EC1', Kai Himstedt, 24 Aug 2023


Viewed

Total article views: 527 (including HTML, PDF, and XML)
  • HTML: 360
  • PDF: 142
  • XML: 25
  • Total: 527
  • BibTeX: 6
  • EndNote: 13
Views and downloads calculated since 11 May 2023.

Viewed (geographical distribution)

Total article views: 522 (including HTML, PDF, and XML) Thereof 522 with geography defined and 0 with unknown origin.
Latest update: 28 Mar 2024
Short summary
There is a constant need to speed up the execution of climate models to simulate longer time scales more quickly. With current computing systems, the typical decomposition of a region into smaller parts for their parallel computation can easily reach a limit for achieving a further speedup. Using two FESOM2-based models, it is shown how runtimes can still be reduced with reasonable efficiency by executing the simulation application multiple times to additionally parallelize suitable loops.
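
The idea of splitting a suitable loop between several executions of the same application can be sketched as follows. This is a simplified illustration under stated assumptions, not the preprint's implementation: the instances are simulated one after another in a single Python process, `expensive_step` is a hypothetical stand-in for one hotspot iteration, a block partition of the loop indices is assumed, and the exchange of results between instances is reduced to writing into one shared list.

```python
def block_range(n_items: int, instance_id: int, n_instances: int) -> range:
    """Contiguous share of the loop indices assigned to one instance (assumed block partition)."""
    base, extra = divmod(n_items, n_instances)
    # The first `extra` instances each take one additional iteration.
    start = instance_id * base + min(instance_id, extra)
    stop = start + base + (1 if instance_id < extra else 0)
    return range(start, stop)


def expensive_step(i: int) -> int:
    """Hypothetical stand-in for one compute-intensive loop iteration."""
    return i * i


def run_hotspot(n_items: int, n_instances: int) -> list:
    """Compute all iterations, each one by exactly one (simulated) instance."""
    results = [None] * n_items
    # Sequential stand-in for instances that would run simultaneously.
    for instance_id in range(n_instances):
        for i in block_range(n_items, instance_id, n_instances):
            results[i] = expensive_step(i)
    # After the hotspot, every instance would need the complete results,
    # because the code outside the hotspot runs identically in all instances.
    return results


if __name__ == "__main__":
    print(run_hotspot(10, 3))
```

Each loop index is handled by exactly one instance, so the combined result equals that of the original unsplit loop.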