Multiple execution of the same MPI application to exploit parallelism at hotspots with minimal code changes: a case study with FESOM2-Iceberg and FESOM2-REcoM
Abstract. For a typical climate model, parallelization based on a domain decomposition is a predominant technique to speed up its computation as an MPI (Message Passing Interface) application on an HPC (High Performance Computing) system. In this contribution, it is shown how the potential of simultaneously executing multiple instances of such an MPI application can be exploited to achieve a further speedup with an additional parallelization of suitable compute-intensive loops. In contrast to a parallelization based on OpenMP (Open Multi-Processing), no special synchronization effort is required if MPI calls occur in the iterations of the original loop. Splitting the work at such hotspots between the instances represents an independent level of parallelization on top of the domain decomposition. The simple implementation can be performed within the familiar MPI world, where the climate model can largely be considered as a black box. Outside of the hotspots, however, the same computations are performed in all instances. Some examples will show that such a conscious acceptance of redundant computations for parallelization approaches is quite common in other disciplines to reduce the time-to-solution. These approaches thus also represent the main inspiration for the approach presented in this contribution. Experimental results show for the example of the additional parallelization of an iceberg and a biogeochemical model, each embedded into FESOM2, how the time-to-solution can be further reduced with a small number of instances at appropriate efficiency. With the non-parallelized part outside of hotspots, however, the meaningful utilization of a larger number of instances will not be easily possible in practice, which will be explained in more detail in some efficiency considerations with the reference to Amdahl’s Law. Nevertheless, the implementation of the approach for other simulation models with similar properties seems promising, if the further reduction of the time-to-solution is in the focus, but a limit for the scalability based on the domain decomposition is reached.
Status: open (until 06 Jul 2023)
Viewed (geographical distribution)