Multiple execution of the same MPI application to exploit parallelism at hotspots with minimal code changes: a case study with FESOM2-Iceberg and FESOM2-REcoM
Abstract. For a typical climate model, parallelization based on domain decomposition is the predominant technique for speeding up its computation as an MPI (Message Passing Interface) application on an HPC (High Performance Computing) system. This contribution shows how simultaneously executing multiple instances of such an MPI application can be exploited to achieve a further speedup by additionally parallelizing suitable compute-intensive loops. In contrast to a parallelization based on OpenMP (Open Multi-Processing), no special synchronization effort is required if MPI calls occur within the iterations of the original loop. Splitting the work at such hotspots between the instances represents an independent level of parallelization on top of the domain decomposition. The implementation is simple and stays within the familiar MPI world, where the climate model can largely be treated as a black box. Outside of the hotspots, however, the same computations are performed redundantly in all instances. Several examples show that such a conscious acceptance of redundant computation to reduce the time-to-solution is quite common in other disciplines; these approaches are the main inspiration for the one presented in this contribution. Experimental results for the additional parallelization of an iceberg model and a biogeochemical model, each embedded into FESOM2, show how the time-to-solution can be further reduced with a small number of instances at acceptable efficiency. Because of the non-parallelized part outside the hotspots, however, meaningfully utilizing a larger number of instances is not easily possible in practice, which is explained in more detail in efficiency considerations with reference to Amdahl's Law. Nevertheless, implementing the approach for other simulation models with similar properties appears promising when a further reduction of the time-to-solution is the focus but the scalability limit of the domain decomposition has been reached.
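To make the scheme described in the abstract more concrete, the following is a minimal, hypothetical MPI sketch in C: several identical copies ("instances") of the same application run side by side, execute everything outside the hotspot redundantly, and divide only the hotspot loop among themselves before re-synchronizing. All identifiers (num_instances, model_comm, inter_comm, the placeholder per-item work) are illustrative assumptions, not the actual FESOM2-Iceberg or FESOM2-REcoM implementation.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assumption: the job is launched as num_instances identical copies,
     * e.g. "mpirun -n 8 ./model 2", and world_size is divisible by it. */
    int num_instances = (argc > 1) ? atoi(argv[1]) : 2;
    int ranks_per_instance = world_size / num_instances;
    int instance_id = world_rank / ranks_per_instance;

    /* model_comm: the usual communicator for the domain decomposition inside
     * one instance; inter_comm: connects the corresponding ranks (same
     * subdomain) across all instances. */
    MPI_Comm model_comm, inter_comm;
    MPI_Comm_split(MPI_COMM_WORLD, instance_id, world_rank, &model_comm);
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % ranks_per_instance,
                   world_rank, &inter_comm);

    /* ... ordinary time stepping: every instance redundantly runs the same
     * code on its subdomain, so no extra synchronization is needed here ... */

    /* Hotspot: a loop over independent work items (e.g. individual icebergs).
     * Each instance handles only every num_instances-th item; afterwards the
     * partial results are summed across the instances so that all copies
     * continue with an identical state. */
    enum { N_ITEMS = 1000 };
    double results[N_ITEMS] = { 0.0 };
    for (int i = instance_id; i < N_ITEMS; i += num_instances)
        results[i] = 2.0 * i;        /* stand-in for expensive per-item work */

    MPI_Allreduce(MPI_IN_PLACE, results, N_ITEMS, MPI_DOUBLE, MPI_SUM,
                  inter_comm);

    /* ... remainder of the time step, again executed redundantly ... */

    MPI_Comm_free(&model_comm);
    MPI_Comm_free(&inter_comm);
    MPI_Finalize();
    return 0;
}
```

In such a sketch only the fraction p of the runtime spent in the hotspot loop is divided among the n instances, so the attainable speedup is bounded in Amdahl's sense by 1 / ((1 - p) + p/n), which is consistent with the abstract's expectation that only a small number of instances can be used efficiently.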
Status: closed
- RC1: 'Comment on egusphere-2023-756', Anonymous Referee #1, 23 Jul 2023
The paper presents an approach for speeding up computer simulation programs that are parallelized by a domain decomposition beyond the speedup limits that are implicit to that parallelization scheme. The new approach is the exploitation of trivial parallelism that such programs might have. By referring to Amdahl's law it is explained under which circumstances the method is applicable and how much additional speedup can be achieved.
The author describes how similar ideas have been used in other disciplines, presents his approach in detail and shows results of successfully applying the approach to problems in climate research. He points out that the method is completely general, i.e. it is not restricted to climate modeling, which was used as an example.
The manuscript adds a useful method to reduce time-to-solution in parallel computer simulations and is sufficiently detailed to allow others to apply the method. It is appropriate for publication without any changes.
Citation: https://doi.org/10.5194/egusphere-2023-756-RC1
- AC3: 'Reply on RC1', Kai Himstedt, 31 Aug 2023
Many thanks for the effort in providing the review and for the positive assessment of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2023-756-AC3
- RC2: 'Comment on egusphere-2023-756', Anonymous Referee #2, 10 Aug 2023
Thanks for the submission.
My main concern is that I don't understand the underlying mechanism that allows for performance gains with multiexeMPI.
It seems that the communication is enormous (the whole data must be synced) compared to traditional domain decomposition.
Further, considering multiple groups and decent strong scalability, the processes in the groups could be employed to further speed up the computations with domain decomposition; instead they perform redundant computations. With a strong scalability of 0.75 and with 2 groups, the authors could get a speedup of 1.5 (= 2 * 0.75) if they used the redundant group to actively participate in the computations.
Please explain, with a code example and the resulting communication size (on a dummy example), how it helps your computations.
Also, the first sections of the paper need a major refactoring. The author should focus on the actual contribution of the paper.
For example, the contribution of the multi-threaded MPI description, which also seems outdated, is not clear to me.
Citation: https://doi.org/10.5194/egusphere-2023-756-RC2
- AC1: 'Reply on RC2', Kai Himstedt, 24 Aug 2023
- EC1: 'Comment on egusphere-2023-756', Christoph Knote, 15 Aug 2023
Dear author,
we have now received two reviews, one quite favorable and one very critical of your work. The second reviewer rated your manuscript poor in three out of four categories, which usually leads to the paper being rejected. I concur with the second reviewer and do not encourage you to submit a revised version.
The changes and extensions necessary to bring this manuscript into a state in which it could be accepted are too extensive and would lead to a very different manuscript. Instead, I suggest, if you are willing, that you carefully go through all suggestions made by reviewer 2 and, if interested, start anew and resubmit a substantially rewritten manuscript.
With best regards,
Christoph Knote
Citation: https://doi.org/10.5194/egusphere-2023-756-EC1
- AC2: 'Reply on EC1', Kai Himstedt, 24 Aug 2023
Dear editor,
I am grateful to RC2 for the effort in providing the review. In my reply I have now addressed RC2's frank criticism in some detail.
Since you are inclined to follow RC2's assessment, I would ask you to consider my reply to RC2's points of criticism. Following your advice, I have not yet revised the manuscript.
Best regards,
Kai Himstedt
Citation: https://doi.org/10.5194/egusphere-2023-756-AC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
435 | 170 | 35 | 640 | 15 | 33