Multiple execution of the same MPI application to exploit parallelism at hotspots with minimal code changes: a case study with FESOM2-Iceberg and FESOM2-REcoM

Himstedt, Kai

doi:10.5194/egusphere-2023-756

Preprints

https://doi.org/10.5194/egusphere-2023-756

11 May 2023

Abstract. For a typical climate model, parallelization based on domain decomposition is a predominant technique for speeding up its computation as an MPI (Message Passing Interface) application on an HPC (High Performance Computing) system. This contribution shows how simultaneously executing multiple instances of such an MPI application can be exploited to achieve a further speedup through an additional parallelization of suitable compute-intensive loops. In contrast to a parallelization based on OpenMP (Open Multi-Processing), no special synchronization effort is required if MPI calls occur in the iterations of the original loop. Splitting the work at such hotspots between the instances represents an independent level of parallelization on top of the domain decomposition. The implementation is simple and can be performed within the familiar MPI world, where the climate model can largely be treated as a black box. Outside of the hotspots, however, the same computations are performed in all instances. Examples show that such a conscious acceptance of redundant computations is quite common in parallelization approaches in other disciplines to reduce the time-to-solution; these approaches are the main inspiration for the approach presented in this contribution. Experimental results for the additional parallelization of an iceberg model and a biogeochemical model, each embedded into FESOM2, show how the time-to-solution can be further reduced with a small number of instances at appropriate efficiency. Because of the non-parallelized part outside of the hotspots, however, a meaningful use of a larger number of instances is not easily possible in practice, which is explained in more detail in efficiency considerations with reference to Amdahl's law. Nevertheless, applying the approach to other simulation models with similar properties seems promising if a further reduction of the time-to-solution is the focus but the scalability limit of the domain decomposition has been reached.
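
The idea of splitting a hotspot loop between instances can be illustrated with a minimal sketch. It is not the paper's implementation: the instances are simulated inside a single Python process instead of being separate MPI runs, the round-robin distribution of iterations is an assumption, and the names `hotspot_iteration`, `my_iterations` and `run_hotspot` are hypothetical.

```python
from typing import Dict, List


def hotspot_iteration(i: int) -> int:
    """Stand-in for one expensive, independent loop iteration (hypothetical)."""
    return i * i


def my_iterations(n_iter: int, instance: int, n_instances: int) -> range:
    """Round-robin share of the loop owned by one instance (assumed scheme)."""
    return range(instance, n_iter, n_instances)


def run_hotspot(n_iter: int, n_instances: int) -> List[int]:
    """Simulate all instances in one process and merge their partial results."""
    # Each simulated instance computes only its own share of the iterations.
    partial: List[Dict[int, int]] = [
        {i: hotspot_iteration(i) for i in my_iterations(n_iter, k, n_instances)}
        for k in range(n_instances)
    ]
    # Stand-in for the exchange between instances: afterwards every instance
    # would hold the complete result and continue with the redundant part.
    merged: Dict[int, int] = {}
    for part in partial:
        merged.update(part)
    return [merged[i] for i in range(n_iter)]


if __name__ == "__main__":
    print(run_hotspot(n_iter=8, n_instances=2))
```

In this sketch the merged result is the same for any number of instances; only the share of hotspot work per instance changes, while everything outside `run_hotspot` would be computed redundantly by all instances.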

Received: 17 Apr 2023 – Discussion started: 11 May 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2023-756', Anonymous Referee #1, 23 Jul 2023

The paper presents an approach for speeding up computer simulation programs that are parallelized by a domain decomposition beyond the speedup limits that are implicit to that parallelization scheme. The new approach is the exploitation of trivial parallelism that such programs might have. By referring to Amdahl's law it is explained under which circumstances the method is applicable and how much additional speedup can be achieved.

The author describes how similar ideas have been used in other disciplines, presents his approach in detail and shows results of successfully applying the approach to problems in climate research. He points out that the method is completely general, i.e. it is not restricted to climate modeling, which was used as an example.

The manuscript adds a useful method to reduce time-to-solution in parallel computer simulations and is sufficiently detailed to allow others to apply the method. It is appropriate for publication without any changes.
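
The reference to Amdahl's law can be made concrete with a small sketch. The hotspot fraction of 0.8 used in the example is purely illustrative and is not a value from the manuscript; the function names are hypothetical, and communication overhead between instances is ignored.

```python
def amdahl_speedup(parallel_fraction: float, n_instances: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the runtime
    is divided among `n_instances`; the rest is executed by all instances."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_instances)


def efficiency(parallel_fraction: float, n_instances: int) -> float:
    """Speedup per instance; 1.0 would be perfect scaling."""
    return amdahl_speedup(parallel_fraction, n_instances) / n_instances


if __name__ == "__main__":
    hotspot_fraction = 0.8  # illustrative assumption, not a measured value
    for n in (1, 2, 4, 8):
        s = amdahl_speedup(hotspot_fraction, n)
        e = efficiency(hotspot_fraction, n)
        print(f"instances={n}  speedup={s:.2f}  efficiency={e:.2f}")
```

With a fixed hotspot fraction the speedup is bounded by 1 / (1 - fraction) however many instances are added, which is why only a small number of instances can be used at reasonable efficiency.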

Citation: https://doi.org/10.5194/egusphere-2023-756-RC1
- AC3: 'Reply on RC1', Kai Himstedt, 31 Aug 2023
  
  Many thanks for the effort in providing the review and for the positive assessment of the manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2023-756-AC3
RC2:
'Comment on egusphere-2023-756', Anonymous Referee #2, 10 Aug 2023

Thanks for the submission.

My main concern is that I don't understand the underlying mechanism that allows for performance gains with multiexeMPI.

It seems that the communication is enormous (the whole data must be synced) compared to traditional domain decomposition.

Further, considering multiple groups and a decent strong scalability, the processes in the groups could be employed to further speed up the computations with domain decomposition; instead they perform redundant computations. With a strong scalability of 0.75 and with 2 groups, the authors could get a speedup of 1.5 = (2*0.75) if they used the redundant group to actively participate in the computations.

Please explain with a code example and the resulting communication size (on a dummy example) how it helps your computations.

Also, the first sections of the paper need a major refactoring. The author should focus on the actual contribution of the paper.

For example, the contribution of the multi-threaded MPI description is not clear to me, and it also seems outdated.
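
The arithmetic in this comment can be put side by side with Amdahl's law in a short sketch. It is an illustration only and comes neither from the manuscript nor from the author's reply: it ignores all communication costs (which are the referee's first concern), and the function names are hypothetical.

```python
def domain_decomposition_speedup(core_factor: float, scaling_efficiency: float) -> float:
    """Speedup from multiplying the core count by `core_factor` when strong
    scaling retains `scaling_efficiency` (the referee's 2 * 0.75 example)."""
    return core_factor * scaling_efficiency


def multi_instance_speedup(hotspot_fraction: float, n_instances: int) -> float:
    """Amdahl-type speedup when only the hotspot is split between instances."""
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / n_instances)


def breakeven_hotspot_fraction(target_speedup: float, n_instances: int) -> float:
    """Hotspot fraction at which the multi-instance speedup equals
    `target_speedup`; obtained by solving Amdahl's law for the fraction."""
    if n_instances < 2:
        raise ValueError("n_instances must be at least 2")
    if not 1.0 <= target_speedup <= n_instances:
        raise ValueError("target_speedup must lie in [1, n_instances]")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n_instances)


if __name__ == "__main__":
    target = domain_decomposition_speedup(core_factor=2, scaling_efficiency=0.75)
    fraction = breakeven_hotspot_fraction(target, n_instances=2)
    print(f"domain decomposition speedup: {target:.2f}")
    print(f"break-even hotspot fraction:  {fraction:.3f}")
```

Under these simplifying assumptions, two instances match the referee's 1.5 only if the hotspot accounts for two thirds of the runtime; whether strong scaling still delivers an efficiency of 0.75 at the core counts in question is a separate, empirical matter.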

Citation: https://doi.org/10.5194/egusphere-2023-756-RC2
- AC1: 'Reply on RC2', Kai Himstedt, 24 Aug 2023
  
  Many thanks for the effort in providing the review.
  Please find my reply in the supplement.
  I would be grateful if you could give me your opinion on what I have explained there.
  
  Citation: https://doi.org/10.5194/egusphere-2023-756-AC1
EC1:
'Comment on egusphere-2023-756', Christoph Knote, 15 Aug 2023

Dear author,
we have now received two reviews, one quite favorable and one very critical of your work. The second reviewer rated your manuscript poor in three out of four categories, which usually leads to the paper being rejected. I concur with the second reviewer and do not encourage you to submit a revised version.
The changes and extension necessary to bring this manuscript into a state in which it could be accepted are too extensive and would lead to a very different manuscript. Instead, I suggest to you, if you are willing, to carefully go through all suggestions made by reviewer 2 and, if interested, start anew and resubmit a substantially rewritten manuscript.
With best regards,
Christoph Knote

Citation: https://doi.org/10.5194/egusphere-2023-756-EC1
- AC2: 'Reply on EC1', Kai Himstedt, 24 Aug 2023
  
  Dear editor,
  I am grateful to RC2 for the effort in providing the review. In my reply I have now dealt with the frank criticism of RC2 in some detail.
  Since you are willing to follow RC2's assessment, I would like to ask you to consider my reply to RC2's points of criticism. I followed your advice and have not yet revised the manuscript.
  Best regards,
  
  Kai Himstedt
  
  Citation: https://doi.org/10.5194/egusphere-2023-756-AC2

Viewed

Total article views: 1,374 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
635	690	49	1,374	55	89

Views and downloads (calculated since 11 May 2023)

Month	HTML	PDF	XML	Total
May 2023	105	27	3	135
Jun 2023	86	23	0	109
Jul 2023	44	10	4	58
Aug 2023	57	15	8	80
Sep 2023	26	11	3	40
Oct 2023	16	12	4	32
Nov 2023	2	5	1	8
Dec 2023	8	19	2	29
Jan 2024	3	6	0	9
Feb 2024	1	5	0	6
Mar 2024	13	10	0	23
Apr 2024	8	4	2	14
May 2024	10	8	1	19
Jun 2024	23	4	3	30
Jul 2024	8	4	4	16
Aug 2024	8	0	8
Sep 2024	9	2	0	11
Oct 2024	6	2	0	8
Nov 2024	2	3	0	5
Dec 2024	6	6	0	12
Jan 2025	4	6	0	10
Feb 2025	15	3	0	18
Mar 2025	4	7	0	11
Apr 2025	4	13	0	17
May 2025	5	9	0	14
Jun 2025	12	21	0	33
Jul 2025	8	16	1	25
Aug 2025	3	11	0	14
Sep 2025	12	9	2	23
Oct 2025	3	16	0	19
Nov 2025	21	36	0	57
Dec 2025	15	14	0	29
Jan 2026	19	58	3	80
Feb 2026	18	27	5	50
Mar 2026	12	161	0	173
Apr 2026	38	103	3	144
May 2026	1	4	0	5

Viewed (geographical distribution)

Total article views: 1,349 (including HTML, PDF, and XML), thereof 1,349 with geography defined and 0 with unknown origin.

Latest update: 02 May 2026

Short summary

There is a constant need to speed up the execution of climate models to simulate longer time scales more quickly. With current computing systems, the typical decomposition of a region into smaller parts for their parallel computation can easily reach a limit for achieving a further speedup. Using two FESOM2-based models, it is shown how runtimes can still be reduced with reasonable efficiency by executing the simulation application multiple times to additionally parallelize suitable loops.


Cited

2 citations as recorded by crossref.