Parflow 3.9: development of lightweight embedded DSLs for geoscientific models

Piotrowski, Zbigniew P.; Hokkanen, Jaro; Caviedes-Voullieme, Daniel; Stein, Olaf; Kollet, Stefan

doi:https://doi.org/10.5194/egusphere-2023-1079

31 Jul 2023

Status: this preprint has been withdrawn by the authors.

Abstract. Recognizing the leap in high-performance computing with accelerated co-processors, we propose a lightweight approach to adapt legacy codes to next generation hardware and efficiently achieve a high degree of performance portability. We focus on abstracting the computing kernels at the loop level, based on the lightweight, preprocessor-based embedded Domain Specific Language (eDSL) concept in conjunction with Unified Memory management. We outline a set of code pre-adaptations that facilitate the proposed abstraction. In two geophysical code applications programmed in C and Fortran, we demonstrate the efficiency of the eDSL approach in adaptation to NVIDIA GPUs with native CUDA and Kokkos eDSL backends, achieving up to 10–30-fold speedup. Our experience suggests that the proposed lightweight eDSL code adaptation is less expensive in terms of Full Time Equivalent of effort than adaptation based on complex DSL approaches, even if no earlier GPU competence exists.

Received: 22 May 2023 – Discussion started: 31 Jul 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-1079', Anonymous Referee #1, 22 Sep 2023

The paper claims to propose a novel approach for the adaptation of legacy codes to next generation hardware by using an embedded Domain Specific Language (eDSL) concept. It also presents two application examples.

The main questions the paper would need to answer are what the novelty of the approach is, how it differs from existing approaches and why it is favorable. If it is a general approach, it should be applicable to very different kinds of existing code and achieve performance portability, i.e. the program should run with acceptable efficiency on different hardware platforms (not just run at all).

The approach presented by the authors unfortunately is simply the use of preprocessor macros to encapsulate rather basic constructs like memory allocation or loops. This is a programming technique, which is hardly a novelty. Preprocessor macros were used intensively in the last century. However, programming experts do not recommend the use of macros, as they can circumvent major syntax checks of the compiler. When the constructs get more complicated than in the examples presented by the authors, they are also hard to read, and the code is often hard to debug. Calling something as old-fashioned as precompiler macros "eDSL" does not make them more modern. I doubt that many programmers would regard memory allocation or loops as part of a "kernel".
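
As a minimal illustration of the technique under discussion, a loop and a memory allocation can be hidden behind preprocessor macros whose expansion is selected at compile time. This is only a sketch: the macro names (`ALLOC`, `DEALLOC`, `FOR_EACH`, `LOOP_PRAGMA`), the `USE_OPENMP` flag and the function `scaled_sum` are hypothetical and are not taken from the manuscript or from ParFlow.

```c
#include <assert.h>
#include <stdlib.h>

/* All macro names below are hypothetical, chosen for illustration only. */

/* Backend selection at compile time: building with -DUSE_OPENMP -fopenmp
   turns the loop into an OpenMP parallel loop; otherwise it stays serial. */
#ifdef USE_OPENMP
#define LOOP_PRAGMA _Pragma("omp parallel for")
#else
#define LOOP_PRAGMA
#endif

/* Allocation wrappers: a GPU backend could redirect these to a
   unified-memory allocator without touching the calling code. */
#define ALLOC(type, count) ((type *)calloc((size_t)(count), sizeof(type)))
#define DEALLOC(ptr) free(ptr)

/* Loop wrapper: the body is pasted textually into the loop. */
#define FOR_EACH(i, n, ...)           \
    LOOP_PRAGMA                       \
    for (int i = 0; i < (n); ++i) {   \
        __VA_ARGS__                   \
    }

/* Application code only sees the macros, never the backend. */
double scaled_sum(int n, double factor)
{
    double *a = ALLOC(double, n);
    assert(a != NULL);

    FOR_EACH(i, n, a[i] = factor * (double)i;)

    /* Serial reduction, kept outside the macro for simplicity. */
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += a[i];

    DEALLOC(a);
    return total;
}
```

Calling `scaled_sum(5, 2.0)` fills the array with 0, 2, 4, 6, 8 and returns 20. Because the loop body is substituted as text, the compiler checks it only after expansion, which is the property both the referee and the authors are arguing about.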

The authors then present two examples for the application of their "eDSL": the hydrology code ParFlow and the flow solver MPDATA. The presented graphs demonstrate that the codes run on both GPUs and CPUs and that they are considerably faster on GPUs. However, based on the given information, it is hard to assess how valid this information is. Usually GPUs require a different organisation of memory and program code than CPUs for optimal performance, and some algorithms are more easily transferred to GPUs than others. It would therefore be necessary to know more about the numerical algorithms used to solve the problems. Was an explicit or an implicit time stepping scheme used? If an implicit scheme, which linear solver? Some solvers operate well on GPUs but require many more iterations than better solvers, which are not easily transferable to the simplified architecture of a GPU. It would also be interesting to know whether the codes achieve a significant fraction of the peak performance on both architectures. However, this information is missing.

As modern simulation codes are complex pieces of software consisting of components such as grid manager, matrix assembly, nonlinear and linear solvers, etc., it is not clear how this macro-based eDSL should be applied. For many problems, the solution of the linear equation systems is the most expensive part of the software. Usually highly optimized libraries like Hypre, PETSc... are used to perform this task. How should the eDSL of the authors be generalized to software like this? The approach seems most suitable for rather simple stencil-based problems on regular grids. However, these kinds of problems are easily rewritten in more powerful DSLs, which do not just produce different loop commands, but performance-optimized code for different platforms.

The article is written in a rather vague and imprecise style. The introduction reads more like a short history of the development of high performance computing (with too few citations) and the chapter on application agnostic eDSL for accelerators is also not very concrete. The title of the paper is not really fitting. According to the text, the authors want to present a general approach for geoscience models, not a new version of Parflow.

Overall, the authors present precompiler macros, a decades-old programming technique, as a new approach to modernize legacy codes and demonstrate performance gains, which cannot really be put in perspective. To me this looks like old wine in new skins. As I see neither the novelty nor the added scientific value of this approach, I recommend rejecting the paper.

Citation: https://doi.org/10.5194/egusphere-2023-1079-RC1
- AC2: 'Reply on RC1', Zbigniew Piotrowski, 25 Oct 2023
  
  We thank the referee for preparing the review. Upon careful consideration of the comments, we acknowledge that the term 'eDSL' may not align well with established computer science naming conventions.
  In the manuscript, we intentionally emphasized that the proposed solution is neither of general applicability nor aimed at maximizing efficiency (e.g. by using the terms simple, lightweight and minimal in several instances). We believe the reviewer's comments in fact support our perspective, highlighting the need to rethink the alternative paradigm of full-blown DSL (or merely code-to-code translator) vs. vendor-locked code adaptation. Porting medium and large codes to modern architectures is complicated, and suboptimal GPU performance is better than none. Lack of funding for dedicated software engineering is a common fact as well. Full-scale porting efforts of large legacy codes, with substantial code redesign (as often required by general solutions), may imply an excessively long implementation and re-validation phase. The reality is that a majority of research codes operate at a fraction of the (case-relevant) peak performance, and even the largest well-optimized operational packages receive a tailored optimization with new hardware procurements.
  For practical and historical reasons, numerous top-class codes rely on code preprocessing. While readability concerns vary case by case, it is already recognized that hosting several sets of compiler directives (which the proposed approach attempts to replace) is not a desired option either. Moreover, preprocessing directives are a valid, non-deprecated part of the language standard and we can't see how their use could inhibit compiler syntax-checking. In our opinion, C/Fortran codes should not be judged by the choice of the language constructs, as it is often a matter of personal taste, level of compiler support (especially Fortran) or nature of the computational problem at hand, and in the case of legacy codes, it is simply the baseline. It is easy to find a counterexample, where using an advanced language construct inhibits straightforward GPU porting or produces error logs that are difficult to process (e.g. with C++ templates).
  The memory organization is indeed crucial for achieving good performance on manycore architectures. The proposed solution, similar to directive-based porting, does not offer full automation, although it (comparably) enables several such optimizations without code duplication, thus favouring readability and maintainability.
  Parflow employs the Kinsol and Hypre packages to solve nonlinear and linear problems; however, for GPU it relies on an internal implementation of a Newton-Krylov non-linear solver and multigrid-preconditioned conjugate gradient (MGCG) for the linear solver. The details are readily available in several publications, and summarised in https://doi.org/10.5194/gmd-13-1373-2020. While performance optimisations are always possible and desirable, the presented solution is sufficient to provide portability, and in the particular case of ParFlow, extends beyond a CUDA backend to an even more agnostic state-of-the-art portability layer such as Kokkos.
  
  In turn, the MPDATA example consists of a purely explicit advection method that does not lead here to any implicit formulation. Thus, discussing the details of the numerics, together with the (already addressed) question of peak performance, seems out of scope for this manuscript.
  
  On a side note, fully relying on library-based linear solvers is often not possible in general-application codes. The reason is that numerical modelling of physical systems is often an art of imposing correct boundary conditions bespoke to the problem at hand. Since resorting to a matrix-free formulation of linear problems is often necessary for large meshes, accurate implementation of the boundary conditions leads to specialized forms of linear operators at the borders, which are usually not supported by general-purpose libraries.
  It is true that complex codes have a modular structure. However, it is rather common to, at least initially, port only the timeloop. Furthermore, generalization of the discussed eDSL concept to cooperate with high-performance libraries seems no more complex than coupling such libraries with any OpenMP/OpenACC code.
  We strongly disagree with the statement that the eDSL concept is only for simple stencil codes, and Parflow serves as a perfect counterexample. Furthermore, it is absolutely untrue that such complex geophysical codes may be "easily rewritten in more powerful DSLs". Practice demonstrates that most often the DSL is missing some required features. From the authors' experience, these might be: efficient support of specialized boundary stencils, global reductions, arrays with more than 3 dimensions. Moreover, the current programming language may not be easily compatible with the otherwise potentially optimal DSL, enforcing a complete rewrite. Known examples of the porting effort, such as the implementation of the COSMO, ICON or FV3 Fortran-based weather dynamical cores in the STELLA, GridTools or GT4Py DSLs, clearly prove how difficult this task is and how the DSL needs to be extended along the way. While the aforementioned DSLs seem to offer performance that is hard to beat, the corresponding porting effort is an order of magnitude greater than in the proposed approach, and the promise of performance portability is still difficult to fulfil. Moreover, our approach is not necessarily incompatible with full-blown DSLs, which we believe is shown by our use of Kokkos within the eDSL approach in ParFlow. Bluntly (albeit hypothetically) stated, if the Kokkos project were to fade away, it would still be possible to reach an alternative solution within the eDSL without modifying the vast majority of the ParFlow code. While it is not our role to criticise other approaches, remarks on difficulties with full-blown DSLs are present in the manuscript, e.g. lines 251-259.
  We agree that the manuscript could benefit from a more refined style, and we will ensure the title better reflects its content. Inclusion of the Parflow version number was an attempt to address a direct request from the editor at the initial submission stage.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1079-AC2
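
The remark in the reply above on matrix-free operators with problem-specific boundary rows can be illustrated with a minimal C sketch. It applies a 1D second-difference operator without storing a matrix. The function name, the choice of operator and the homogeneous Dirichlet boundary treatment are assumptions made for illustration and are not taken from ParFlow or MPDATA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y = A x for the 1D operator -d2/dx2 on n interior
   nodes with spacing h, computed without ever assembling the matrix A.
   Values outside the domain are assumed to be zero (homogeneous Dirichlet). */
void apply_neg_laplacian_1d(size_t n, double h, const double *x, double *y)
{
    assert(n >= 2 && h > 0.0);
    const double inv_h2 = 1.0 / (h * h);

    /* Boundary rows: the neighbour outside the domain is dropped. A code
       with bespoke boundary conditions would replace exactly these lines. */
    y[0]     = (2.0 * x[0]     - x[1])     * inv_h2;
    y[n - 1] = (2.0 * x[n - 1] - x[n - 2]) * inv_h2;

    /* Interior rows: the standard three-point stencil. */
    for (size_t i = 1; i + 1 < n; ++i)
        y[i] = (2.0 * x[i] - x[i - 1] - x[i + 1]) * inv_h2;
}
```

With `n = 4`, `h = 1` and a constant input of ones, the interior entries of `y` are 0 and the two boundary entries are 1, which shows that the boundary rows are the only place where the operator differs from the plain stencil.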
RC2: 'Comment on egusphere-2023-1079', Anonymous Referee #2, 10 Oct 2023

In this manuscript, the authors propose a lightweight embedded DSL method (Parflow 3.9) for geoscientific models on next generation hardware that can achieve a certain degree of speedup compared to the traditional baseline. The method takes advantage of the embedded Domain Specific Language (eDSL) concept to improve the computing kernels at the loop level. However, this manuscript lacks innovation and falls far short of the standard of excellent work.

1. The contribution of this proposal is not adequate. As for the structure of this manuscript, the authors pile up lots of related work and tedious background knowledge about eDSL, rather than a concrete illustration of the design of the proposed method itself. Furthermore, when it comes to the methodology, defining a series of macros and wrappers simply seems far from the contribution requirement of a scientific paper. It looks more like some kind of incremental work.

2. The authors fail to compare the proposed method with SOTA (State-of-the-Art) methods in the field of eDSL. In the manuscript, the authors implement their eDSL methods in ParFlow and EULAG respectively. Then, they compare a series of improved versions of their programs with the baseline to demonstrate the performance gains. Although the performance is improved compared to the baseline (CPU version), it is still questionable whether the proposed method can surpass the SOTA methods.

3. The content does not match the title. The title of the manuscript is "Parflow 3.9: development of lightweight embedded DSLs for geoscientific models", while the Parflow eDSL is just a single implementation of the eDSLs given by the authors. In fact, in section 4, the authors discuss at length the other eDSL implementation (EULAG/MPDATA) in Fortran, which is unrelated to the Parflow 3.9 named in the title. Such a conflict between the title and the layout may confuse the audience.

4. A methodology that merely makes use of macros may only be suited to the paralleling of toy/simple serial programs, while whether it is valid for large-scale projects is still a question. Even though the authors give some explicit code examples of macros and wrappers to show the portability of their method, such easy examples may be far from the scenario of many large-scale parallel programs.

In a word, the authors propose a lightweight eDSL method, which is actually a series of macros, to improve geoscientific models. In my opinion, it is not novel enough and far from the frontier technique, and the contribution is insufficient. In addition, there are some problems with the writing of this manuscript. The authors ought to revise the title or content to make the whole proposal consistent.

Citation: https://doi.org/10.5194/egusphere-2023-1079-RC2
- AC1: 'Reply on RC2', Zbigniew Piotrowski, 25 Oct 2023
  
  We sincerely thank the Referee for their thorough review. We acknowledge that the eDSL techniques may not be the ideal context for describing our specific case. While we concur that the technical methods we employed are relatively straightforward, they were applied to a complex and intricate geophysical Parflow code. The simplicity of the technical approach is in fact its advantage. The extension of such a well-established community code to enable GPU execution, which has demonstrated performance improvements, should, in our view, not be considered merely incremental.
  Our primary objective in this work is not to position a relatively simple concept as a direct competitor to state-of-the-art eDSLs. Instead, we aim to showcase its applicability to large legacy codes programmed in C and Fortran, allowing for a swift move to integration on GPUs with greater flexibility and portability than the typical directive-based approaches. Consequently, it is not our intention to claim superiority over state-of-the-art methods, as these usually require employing dedicated software engineers, who are simply not available to many research groups. We also acknowledge the fact that new scientific codes may be built over state-of-the-art DSLs effectively, when the design already has contemporary HPC requirements and coding practices in mind. But when addressing legacy codes, many constraints can appear. One such constraint may be the programming language of the legacy code (e.g., porting a FORTRAN code via Kokkos) and the data structures used. We claim simply that our approach is potentially advantageous for porting codes with such restrictions, for which more sophisticated solutions may require a much higher effort.
  We acknowledge that including the specific Parflow version number in the title may have been misleading. Nevertheless, the modification to the title was an effort to address the Editor's comments during the early stages of manuscript submission. We intend to work with the Editor to rectify this issue.
  Regarding the reviewer's comment about the methodology's suitability for "paralleling toy/simple serial programs," we would like to clarify our intent. Firstly, our work is not primarily focused on parallelization, as both Parflow and MPDATA are already parallel. Furthermore, Parflow is by no means a simple serial program on its own. While we do not claim the eDSL approach to be universally applicable, our experience suggests its potential relevance to a significant range of geoscientific software. We specifically report on the value of the eDSL approach to enhance the portability in ParFlow, and how using the same principles, portability is achieved in MPDATA.
  We understand that our manuscript could benefit from a more focused approach on the context of the legacy geophysical software codebase, with less emphasis on the computer science aspects of the eDSL approach. Nonetheless, we firmly believe that the advancements we have reported hold significant practical value and should not be dismissed as merely incremental due to their relative simplicity.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1079-AC1
AC3: 'Comment on egusphere-2023-1079', Zbigniew Piotrowski, 03 Nov 2023

We again sincerely thank the reviewers for their comments and suggestions.

We recognise that, from the point of view of computer science, the portability approach based on preprocessor macros is far from the capabilities offered by state-of-the-art DSL methods. Notably, the manuscript proposes a concept/approach for putting large-scale legacy code on the path to performance portability. This concept is generic and can be implemented in many different ways. Since we were dealing with C and Fortran legacy code, in the former a macro-based approach was used, also because this type of infrastructure already existed. We realise as well that the terminology of the concept as eDSL does not resonate with the reviewers. This will be changed in the revisions.

We agree that the manuscript should be revised to better expose the current situation of legacy geoscientific codes in the context of porting to modern supercomputing architectures. In turn, the manuscript should not portray the proposed solution as competing in the field of programming techniques or achieving peak performance, as it apparently does.

Based on the personal experience of the authors, however, we do not agree with the reviewers that the proposed approach should be dismissed due to its "simplicity" (n.b. it is still easily expandable to address - at least in part - memory organization concerns). We express our knowledge of and respect for the cutting-edge implementations of DSLs for performance portability that offer production using GPUs. We recognise that such solutions often do not gain wide community acceptance and need further evolution to address their - rather fundamental - limitations, not to mention the many codes that purposefully remain serial or MPI-only to retain maximal clarity and flexibility. In this work we address the pool of codes for which trading simplicity for performance is simply not useful or feasible, yet which may benefit from an efficient and low-cost extension of their portability.

We agree with the reviewers that inclusion of the Parflow version number may be misleading and will work with the editor to resolve this issue.

Citation: https://doi.org/10.5194/egusphere-2023-1079-AC3

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-1079', Anonymous Referee #1, 22 Sep 2023

The paper claims to propose a novel approach for the adaptation of legacy codes to next generation hardware by using an ebedded Domain Specific Language (eDSL) concept. It also presents two application examples.

The main questions the paper would need to answer are what the novelity of the approach is, how it differs from existing approaches and why it is favorable. If it is a general approach it should be applicable to very different kinds of existing code and achieve performance portability, i.e. the program should run with an acceptable efficiency on different hardware platforms (not just run at all).

The approach presented by the authors unfortunately is simply the use of preprocessor macros to encapsulate rather basic constructs like memory allocation or loops. This is a programming technique, which is hardly a novelity. Preprocessor macros have been used intensively in the last century. However, programming experts do not recommend the usage of macros as the can circumvent major syntax checkings of the compiler. When the constructs get more complicated than in the examples presented by the authors they are also hard to read and often the code is hard to debug. Calling something as oldfashioned as precompiler macros "eDSL" does not make them more modern. I doubt that many programmers would call memory allocation or loops as part of a "kernel".

The authors then present two examples for the application of their "eDSL": the hydrology code ParFlow and the flow solver MPDATA. The presented graphs demonstrate, that the codes run both on GPUs and CPUs and that they are considerably faster on GPUs. However, based on the given information it is hard to assess how valid this information is. Usually GPUs require a different organisation of memory and program code than CPUs for optimal performance and some algorithms are easier transferable to GPUs than other. Therefore would be necessary to know more about the numerical algorithms used to solve the problems. Was an explicit or an implicit time stepping scheme used? If an implicit scheme, which linear solver? Some solvers operate well on GPUs, but require much more iterations, than better solvers, which are not easily transferable to the simplified architecture of a GPU. It would also be interesting, if the codes achieve a significant fraction of the peak performance on both architectures. However, this informations are missing.

As modern simulation codes are complex pieces of software consisting components as grid manger, matrix assembly, nonlinear and linear solvers etc. it is not clear, how this macro-based eDSL should be applied. For many problems the solution of the linear equation systems is the most expensive part of the software. Usually highly optimized libraries like Hypre, PETSc... are used to perform this task. How should the eDSL of the authors be generalized to software like this? The approach seems most suitable for rather simple stencil-based problems on regular grids. However, these kinds of problems are easily rewritten in more powerful DSLs, which do not just produce different loop commands, but performance optimized code for different platforms.

The article is written in a rather vague and imprecise style. The introduction reads more like a short history of the development of high performance computing (with too few citations) and the chapter on application agnostic eDSL for accelerators is also not very concrete. The title of the paper is not really fitting. According to the text, the authors want to present a general approach for geoscience models, not a new version of Parflow.

Overall, the authors present precompiler macros, a decades old programming technique as new approach to modernize legacy codes and demonstrate performance gains, which can not really be put in perspective. To me this looks like old wine in new skins. As I see neither the novelty nor the added scientific value of this approach, I recommend to reject the paper.

Citation: https://doi.org/10.5194/egusphere-2023-1079-RC1
- AC2: 'Reply on RC1', Zbigniew Piotrowski, 25 Oct 2023
  
  We thank the referee for preparing the review. Upon careful consideration of the comments, we acknowledge that the term 'eDSL' may not align well with established computer science naming conventions.
  In the manuscript, we intentionally emphasized that the proposed solution is neither of general applicability, nor aiming at maximizing efficiency (e.g. by using terms: simple, lightweight, minimal in several instances). We believe the reviewer's comments in fact support our perspective, highlighting the need to rethink the alternative paradigm of full-blown DSL (or merely code-to-code translator) vs. vendor-locked code adaptation. Porting medium and large codes to modern architectures is complicated, and suboptimal GPU performance is better than none. Lack of funding for dedicated software engineering is a common fact as well. Full scale porting efforts of large legacy codes, with substantial code redesign (as often required by general solutions) may imply an excessively long implementation and re-validation phase. The reality is that a majority of research codes operates at a fraction of the (case-relevant) peak performance, and even the largest well-optimized operational packages receive a tailored optimization with new hardware procurements.
  For practical and historical reasons, numerous top-class codes rely on code preprocessing. While readability concerns vary case-by-case, it is already recognized that hosting several sets of compiler directives is not a desired option either (which the proposed approach attempts to replace). Moreover, preprocessing directives are a valid, non-deprecated part of the language standard and we can't see how their use could inhibit compiler syntax-checking. In our opinion, C/Fortran codes should not be judged by the choice of the language constructs, as it is often a matter of a personal taste, level of compiler support (especially Fortran) or nature of the computational problem at hand, and in the case of legacy codes, it is simply the baseline. It is easy to find a counterexample, where using advanced language construct inhibits straightforward GPU porting or produce error logs that are difficult to process (e.g. with C++ templates).
  The memory organization is indeed crucial for achieving good performance on manycore architectures. The proposed solution, similar to directive-based porting does not offer full automation, although it (comparably) enables several such optimizations without the code duplication, thus favouring readibility and maintainability.
  Parflow employs Kinsol and Hypre packages to solve nonlinear and linear problems, however, for GPU it relies on internal implementation of Newton-Krylov non-linear solver, and multigrid-preconditioned conjugate gradient (MGCG) for the linear solver. The details are readily available in several publications, and summarised in https://doi.org/10.5194/gmd-13-1373-2020. While performance optimisation are always possible and desirable, the presented solution is sufficient to provide portability, and in the particular case of ParFlow, extending beyond a CUDA backend to an even more agnostic state-of-the-art portability layer as is Kokkos.
  
  In turn, MPDATA example consists of a pure explicit advection method that does not lead here to any implicit formulation. Thus, discussing the details of numerics, together with the (already addressed) question of peak performance seem out of scope for this manuscript.
  
  On a side note, fully relying on library-based linear solvers is often not possible in general-application codes. The reason is that numerical modelling of physical systems is often an art of imposing correct boundary conditions bespoke to a problem at hand. With resorting to a matrix-free formulation of linear problems often necessary for large meshes, accurate implementation of the boundary conditions leads to the specialized forms of linear operators at the borders, which are usually not supported by the general-purpose libraries.
  It is true that complex codes have a modular structure. However, it is rather common to, at least initially, port only the timeloop. Furthermore, generalization of the discussed eDSL concept to cooperate with high-performance libraries seems not any more complex than coupling of such libraries with any OpenMP/OpenACC code.
  We strongly disagree with the statement that the eDSL concept is only for simple stencil codes, and Parflow serves as a perfect counterexample. Furthermore, it is absolutely untrue that such complex geophysical codes may be "easily rewritten in more powerful DSLs". Practice demonstrates that most often the DSL is missing some required features. From the authors experience, these might be: efficient support of specialized boundary stencils, global reductions, arrays with more than 3 dimensions. Moreover, the current programming language may not be easily compatible with the otherwise potentially optimal DSL, enforcing complete rewrite. Known examples of the porting effort, such as implementation of COSMO, ICON or FV3 weather Fortran-based dynamical cores in STELLA, GridTools or GT4Py DSLs, clearly prove how difficult this task is and how the DSL needs to be extended along the way. While aforementioned DSLs seem to offer performance that is hard to beat, the corresponding porting effort required is an order of magnitude greater than in the proposed approach, and performance portability promise is still difficult to be fulfilled. Moreover, our approach is not necessarily incompatible with full blown DSLs, which we believe is shown by our use of Kokkos within the eDSL approach in ParFlow. Bluntly (albeit hypothetically) stated, if the Kokkos project would fade away, it would still be possible to reach an alternative solution within the eDSL without modifying the vast majority of the ParFlow code. While it is not our role to criticise other approaches, remarks on difficulties with full-blown DSLs are present in the manuscript, e.g. lines 251-259.
  We agree that the manuscript could benefit from a more refined style, and we will ensure the title better reflects its content. Inclusion of Parflow version number was an attempt to address direct editor request in the initial submission stage.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1079-AC2
RC2:
'Comment on egusphere-2023-1079', Anonymous Referee #2, 10 Oct 2023

In this manuscript, the authors propose an lightweight embedded DSLs method (Parflow 3.9) for geoscientific models on next generation hardware that can achieve a certain degree of speedup compared to the traditional baseline. The method takes the advantage from the embedded Domain Specific Language (eDSL) concept to improve the computing kernels at the loop levels. Howerver, this manuscript is lack of innovation and far from the criterion of an excellent work.

1. The contribution of this proposal is not adequate. As for the structure of this manuscript, the authors pile up a lot of related work and tedious background knowledge about eDSLs, rather than giving a concrete illustration of the design of the proposed method itself. Furthermore, when it comes to the methodology, simply defining a series of macros and wrappers seems far from the contribution required of a scientific paper. It looks more like some kind of incremental work.

2. The authors fail to compare the proposed method with SOTA (State-of-the-Art) methods in the field of eDSL. In the manuscript, the authors implement their eDSL methods in ParFlow and EULAG respectively. Then, they compare a series of improved versions of their programs with the baseline to demonstrate the performance gains. Although the performance is improved compared to the baseline (CPU version), it is still questionable whether the proposed method can surpass the SOTA methods.

3. The content does not match the title. The title of the manuscript is "Parflow 3.9: development of lightweight embedded DSLs for geoscientific models", while the Parflow eDSL is just one eDSL implementation given by the authors. In fact, in section 4, the authors discuss at length the other eDSL implementation (EULAG/MPDATA) in Fortran, which is unrelated to the Parflow 3.9 named in the title. Such a conflict between the title and the layout may confuse the audience.

4. A methodology that merely makes use of macros may only be suited to the paralleling of toy/simple serial programs, while whether it remains valid for large-scale projects is still an open question. Even though the authors give some explicit code examples of macros and wrappers to show the portability of their method, such simple examples may be far from the reality of many large-scale parallel programs.

In summary, the authors propose a lightweight eDSL method, which is actually a series of macros, to improve geoscientific models. In my opinion, it is not novel enough and is far from the state of the art, and the contribution is insufficient. In addition, there are some problems with the writing of this manuscript. The authors ought to revise the title or the content to make the whole proposal consistent.

Citation: https://doi.org/10.5194/egusphere-2023-1079-RC2
- AC1: 'Reply on RC2', Zbigniew Piotrowski, 25 Oct 2023
  
  We sincerely thank the Referee for their thorough review. We acknowledge that eDSL techniques may not be the ideal context for describing our specific case. While we concur that the technical methods we employed are relatively straightforward, they were applied to ParFlow, a complex and intricate geophysical code. The simplicity of the technical approach is in fact its advantage. The extension of such a well-established community code to enable GPU execution, which has demonstrated performance improvements, should, in our view, not be considered merely incremental.
  Our primary objective in this work is not to position a relatively simple concept as a direct competitor to state-of-the-art eDSLs. Instead, we aim to showcase its applicability to large legacy codes programmed in C and Fortran, allowing a swift move to GPUs with greater flexibility and portability than typical directive-based approaches. Consequently, it is not our intention to claim superiority over state-of-the-art methods, as these usually require employing dedicated software engineers, who are simply not available to many research groups. We also acknowledge that new scientific codes may be built effectively on state-of-the-art DSLs when the design already has contemporary HPC requirements and coding practices in mind. But when addressing legacy codes, many constraints can appear. Such constraints may include the programming language of the legacy code (e.g., porting a Fortran code via Kokkos) and the data structures used. We claim simply that our approach is potentially advantageous for porting codes with such restrictions, for which more sophisticated solutions may require a much greater effort.
  We acknowledge that including the specific Parflow version number in the title may have been misleading. Nevertheless, the modification to the title was an effort to address the Editor's comments during the early stages of manuscript submission. We intend to work with the Editor to rectify this issue.
  Regarding the reviewer's comment about the methodology's suitability for "paralleling toy/simple serial programs," we would like to clarify our intent. Firstly, our work is not primarily focused on parallelization, as both ParFlow and MPDATA are already parallel. Furthermore, ParFlow is by no means a simple serial program. While we do not claim the eDSL approach to be universally applicable, our experience suggests its potential relevance to a significant range of geoscientific software. We specifically report on the value of the eDSL approach for enhancing portability in ParFlow, and on how, using the same principles, portability is achieved in MPDATA.
  We understand that our manuscript could benefit from a more focused approach on the context of the legacy geophysical software codebase, with less emphasis on the computer science aspects of the eDSL approach. Nonetheless, we firmly believe that the advancements we have reported hold significant practical value and should not be dismissed as merely incremental due to their relative simplicity.
  
  Citation: https://doi.org/10.5194/egusphere-2023-1079-AC1
AC3: 'Comment on egusphere-2023-1079', Zbigniew Piotrowski, 03 Nov 2023

We again sincerely thank the reviewers for their comments and suggestions.

We recognise that, from the point of view of computer science, the portability approach based on preprocessor macros is far from the capabilities offered by state-of-the-art DSL methods. Notably, the manuscript proposes a concept/approach for putting large-scale legacy code on the path to performance portability. This concept is generic and can be implemented in many different ways. Since we were dealing with C and Fortran legacy code, a macro-based approach was used in the former, partly because this type of infrastructure already existed. We also realise that describing the concept as an eDSL does not resonate with the reviewers. This terminology will be changed in the revision.

We agree that the manuscript should be revised to better convey the current situation of legacy geoscientific codes in the context of porting to modern supercomputing architectures. In turn, the manuscript should not portray the proposed solution as competing in the field of programming techniques or as achieving peak performance, as it apparently does.

Based on the personal experience of the authors, however, we do not agree with the reviewers that the proposed approach should be dismissed due to its "simplicity" (n.b. it is still easily expandable to address, at least in part, memory organization concerns). We are aware of and respect the cutting-edge implementations of DSLs for performance portability that enable production runs on GPUs. We recognise that such solutions often do not gain wide community acceptance and need further evolution to address their rather fundamental limitations, not to mention that many codes purposefully remain serial or MPI-only to retain maximal clarity and flexibility. In this work we address the pool of codes for which trading simplicity for performance is simply not useful or feasible, yet which may benefit from an efficient and low-cost extension of their portability.

We agree with the reviewers that inclusion of the Parflow version number may be misleading and will work with the editor to resolve this issue.

Citation: https://doi.org/10.5194/egusphere-2023-1079-AC3
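
As an illustration of the macro-based loop abstraction debated in the comments above, the following is a minimal sketch in C. It is not taken from ParFlow: the macro name `EDSL_PARALLEL_FOR`, the switch `EDSL_USE_OPENMP`, and the kernel `edsl_axpy` are hypothetical names introduced here, and an OpenMP backend stands in for the CUDA and Kokkos backends named in the abstract purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop-level abstraction: the kernel source below is written
 * once, and the backend is chosen at compile time by the preprocessor.
 * The loop body is passed as a variadic argument so that commas inside
 * the body do not split it into several macro arguments. */
#if defined(EDSL_USE_OPENMP)
#define EDSL_PARALLEL_FOR(i, n, ...)                 \
    _Pragma("omp parallel for")                      \
    for (long i = 0; i < (long)(n); ++i) __VA_ARGS__
#else
/* Serial fallback: identical kernel source, plain C loop. */
#define EDSL_PARALLEL_FOR(i, n, ...)                 \
    for (long i = 0; i < (long)(n); ++i) __VA_ARGS__
#endif

/* Illustrative kernel (assumed name): y[i] = y[i] + a * x[i]. */
void edsl_axpy(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    EDSL_PARALLEL_FOR(i, n, {
        y[i] += a * x[i];
    });
}
```

With GCC or Clang, compiling with `-DEDSL_USE_OPENMP -fopenmp` selects the threaded variant; without that definition the same kernel compiles to a plain serial loop. The kernel body is identical in both cases, which is the property the author replies emphasise when arguing that a backend could be replaced without touching most of the application code.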

Viewed

Total article views: 556 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
371	147	38	556	28	24

Views and downloads (calculated since 31 Jul 2023)

Month	HTML	PDF	XML	Total
Jul 2023	50	5	1	56
Aug 2023	115	37	4	156
Sep 2023	44	18	3	65
Oct 2023	65	14	8	87
Nov 2023	20	10	4	34
Dec 2023	10	6	4	20
Jan 2024	9	4	1	14
Feb 2024	6	14	0	20
Mar 2024	15	11	1	27
Apr 2024	8	3	2	13
May 2024	6	12	2	20
Jun 2024	13	11	5	29
Jul 2024	10	2	3	15

Viewed (geographical distribution)

Total article views: 536 (including HTML, PDF, and XML) Thereof 536 with geography defined and 0 with unknown origin.

Latest update: 26 Jul 2024

Short summary

Computer programs that simulate components of the Earth system evolve, adopting new fundamental science concepts and more observational data on increasingly powerful computer hardware. Adapting a large scientific program to a new type of hardware is costly. In this work we propose a cheap and simple but effective strategy that enables computation on graphics processing units, based on automated program code modification. This allows for better resolution and/or longer predictions.

