This work is distributed under the Creative Commons Attribution 4.0 License.
ShellSet v1.1.0 – Parallel Dynamic Neotectonic Modelling: A case study using Earth5-049
Abstract. We present a parallel combination of existing, well known, and robust software used in modelling the neotectonics of planetary lithosphere, which we call ShellSet. The added parallel framework allows multiple models to be run at the same time with varied input parameters. Additionally, we have added a grid search option to automatically generate models within a given parameter space. ShellSet offers significant advantages over the original programs through its simplicity, efficiency, and speed. We demonstrate the speedup obtained by ShellSet's parallel framework by presenting timing information for a parallel grid search, varying the number of threads and models, on a typical computer. A possible use case for ShellSet is shown using two examples in which we improve upon an existing global model: initially we improve the model using the same data, and then further improve it through the addition of a new scoring data set.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1164', Anonymous Referee #1, 10 Nov 2023
This is a review of the manuscript titled "ShellSet v1.1.0 – Parallel Dynamic Neotectonic Modelling: A case study using Earth5-049" by Jon B. May, Peter Bird, and Michele M. C. Carafa.
The authors present a development of previous software for geodynamic modelling. The new software is a single program designed to take advantage of modern computing capabilities, with improvements in terms of time saving and simplified user-machine interaction. The work is well structured and easy to read, and the products are well documented. The final product of this work, the ShellSet code, though it does not represent a substantially new modelling method but rather a more performant pre-existing one, would allow the geodynamic community to afford more complex and finer models, thus stimulating scientific discussion in the near future.
I would recommend the publication of this work with minor revisions.
A few comments are listed as follows; other minor comments are in the attached pdf.
- The ShellSet program, following its predecessor SHELLS, is able to model lateral variations of lithospheric strength in terms of laterally varying crustal and lithospheric thickness, elevation, and heat flow. However, to my understanding, crustal and mantle density values, or fault parameters (e.g. fault friction), must be kept constant over the whole numerical domain, meaning that, for instance, within each model all fault elements have the same friction coefficient. This can be an issue in large continental-scale areas. Is there a way, in this new program, to laterally vary these parameters, in the sense that users can, for instance, model each fault with customized friction parameters?
- Rows 35-37, and 150: Is ShellSet able to run on Mac OS, and which Python version is needed?
- Rows 60-61: what kind of test is performed?
- Section 3: the inclusion of a flow diagram (like the one in Figure 1 of the User Guide), to help readers/users understand how to use the program optimally, would be helpful.
- Rows 120-122: can the mentioned “separate file” be modified by the user? Which are “the most general conditions on the variables”, and which of them can be modified by users? Please explain this better.
AC1: 'Reply on RC1', Jon Bryan May, 28 Nov 2023
We thank the reviewer for their comments & suggestions. We will endeavour to update the article using the suggestions noted both within the comment and the supplied pdf file. We post the following responses to the online comments:
1) ShellSet does not assume laterally-constant density in either the crust or the mantle-lithosphere. Firstly, density at every point in the lithosphere is affected by ambient temperature, which depends on depth, local heat-flow, radioactive heat production in that layer, and any locally non-steady-state component of the geotherm (where heat-flow and non-steady components are computed by OrbData, the heat-production of each layer is an input parameter, and depth is relative to known topography). Secondly, the program OrbData also adds a “lithospheric density anomaly of chemical origin” at each node, which is adjusted to achieve isostasy with other nodes, given the crust and mantle-lithosphere thicknesses that have already been determined by OrbData. These chemical density anomalies are limited to small amplitudes (-50 to +50 kg/m^3) to avoid unreasonable results over trenches and mantle plumes, which are allowed to be non-isostatic instead.
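To make the second point concrete, the sketch below (ours, in Python; OrbData itself is Fortran and this is not its source code) illustrates the effect of limiting a chemical density anomaly to the ±50 kg/m³ range, with any remainder left as a non-isostatic departure:

```python
# Illustrative sketch only (not OrbData source): cap a per-node chemical
# density anomaly at +/-50 kg/m^3; anything beyond that limit is left as a
# non-isostatic departure (e.g. over trenches or mantle plumes).

def capped_density_anomaly(required_anomaly_kg_m3, limit=50.0):
    """Return (applied_anomaly, non_isostatic_residual) in kg/m^3."""
    applied = max(-limit, min(limit, required_anomaly_kg_m3))
    residual = required_anomaly_kg_m3 - applied
    return applied, residual

# Hypothetical node over a trench that would need -120 kg/m^3 for perfect isostasy
applied, residual = capped_density_anomaly(-120.0)
print(applied, residual)  # -50.0 -70.0
```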
As for fault friction, ShellSet also allows different friction on different faults, if the Lithospheric Rheology #n (LRn) input option is used. Potentially, every fault element could be assigned a different friction. The qualification to this flexibility is that ShellSet will only vary and optimize Lithospheric Rheology 0 (the baseline, or default rheology), so in a model with different frictions on different faults, only one set of faults (or the surrounding domain) could be optimized in any run.
2) We are quite confident that, if an environment is correctly set up for ShellSet (guest Linux OS, e.g. using Oracle VM VirtualBox, with the required compilers, libraries, etc.), it will function on any host OS which can provide these (including Mac OS). However, we do not have access to any Mac machine to test this and therefore cannot state for certain whether it would function or not. We agree that functionality on a Mac OS system would be a useful addition to ShellSet and will actively work to secure access to a suitable machine to test this for future updates. In the meantime, we remain contactable to aid users should issues arise on any OS.
The Python version used is Python 3.8, the routine “ShellSetScatter” uses the libraries: numpy, re, os, matplotlib and mpl_toolkits.mplot3d, while “ShellSetGUI” uses: tkinter and os. We will note the Python version within the article text.
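As a convenience for users, a minimal check that these libraries are importable (our snippet, not part of ShellSet; it assumes a Python 3.8 environment) could be:

```python
# Minimal environment check for the two Python utilities shipped with ShellSet.
# ShellSetScatter needs: numpy, re, os, matplotlib (incl. mpl_toolkits.mplot3d)
# ShellSetGUI needs:     tkinter, os
import importlib

for module in ("numpy", "re", "os", "matplotlib", "mpl_toolkits.mplot3d", "tkinter"):
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError as err:
        print(f"{module}: MISSING ({err})")
```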
3) We use "test" to mean a set of models, each model being a fixed set of parameters. So, for example, a grid search is a single test of N models where we search a parameter space for the optimal set of values. We will make this clearer in the text.
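For illustration, a single grid-search test over two parameters can be pictured as the Cartesian product of their trial values, each combination being one model (schematic only; the parameter values below are hypothetical and this is not ShellSet's input format):

```python
# Schematic only: one grid-search "test" = N models, each a fixed parameter set.
from itertools import product

fFric_values = [0.02, 0.06, 0.10]          # fault friction trial values (unitless)
tauMax_values = [1.0e13, 2.0e13, 3.0e13]   # tauMax trial values (hypothetical numbers)

models = [{"fFric": f, "tauMax": t} for f, t in product(fFric_values, tauMax_values)]
print(len(models), "models in this test")  # 9 models
```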
4) We will add a simple flow diagram to show how the elements of the ShellSet program link together.
5) The separate file (like all source code) is modifiable by the user. The currently set conditions are: the fault friction must be less than the continuum friction; the crustal mean density must be less than the mantle mean density. We recommend that any other conditions which users may require be added to this file, in order to 1) avoid modifications at other source-code locations to correctly apply the checks and 2) maintain a consistent location for this type of program update, simplifying further modification and debugging. We have placed no limit on the number of conditions or on how the parameters within each condition interact.
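Restating the two current conditions as executable logic (a Python illustration only; the actual checks reside in a Fortran source file within ShellSet):

```python
# Illustration of the two checks currently applied to each parameter combination
# (the real implementation is a Fortran routine within the ShellSet source).

def combination_is_valid(fault_friction, continuum_friction,
                         crust_density, mantle_density):
    """Reject parameter sets that violate the current ShellSet conditions."""
    if not fault_friction < continuum_friction:
        return False  # fault friction must be less than continuum friction
    if not crust_density < mantle_density:
        return False  # crustal mean density must be less than mantle mean density
    return True

print(combination_is_valid(0.10, 0.85, 2850.0, 3300.0))  # True
print(combination_is_valid(0.90, 0.85, 2850.0, 3300.0))  # False: fault friction too high
```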
Citation: https://doi.org/10.5194/egusphere-2023-1164-AC1
CC1: 'About ShellSet performance profiling', Daniele Melini, 17 Nov 2023
I have a couple of comments on this really interesting work.
As discussed in the paper, interpreting the results of the ShellSet performance tests shown in Table 3 and Figure 3 is not straightforward, since the code uses a hybrid MPI/OpenMP model in which the number of MKL threads is adjusted dynamically with the number of MPI workers. If technically possible, a more insightful understanding of the code scaling could be obtained by running an additional set of tests in which, for 64 models, the number of MKL threads is kept fixed. In that way it would be possible to fit the speedup curve with Amdahl's law to get an estimate of the parallel fraction of the code (which I expect should be quite large). Moreover, by running this test with the number of MKL threads fixed to different values (for instance 1, 2, 4) it should also be possible to get an idea of whether, in large-scale runs, it is more computationally efficient to leverage the MKL parallelism or to run all the MPI workers as purely serial tasks.
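As a sketch of the suggested analysis (not results from ShellSet; the timings below are placeholders and SciPy is assumed to be available), the parallel fraction could be estimated as follows:

```python
# Sketch: fit Amdahl's law S(n) = 1 / ((1 - p) + p / n) to speedups measured
# with a FIXED number of MKL threads. The runtimes below are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def amdahl(n, p):
    """Speedup on n workers if a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

workers = np.array([1, 2, 4, 8, 16])
runtime = np.array([120.0, 62.0, 33.0, 18.0, 11.0])  # minutes, hypothetical
speedup = runtime[0] / runtime

(p_fit,), _ = curve_fit(amdahl, workers, speedup, p0=[0.9], bounds=(0.0, 1.0))
print(f"estimated parallel fraction p = {p_fit:.3f}")
```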
I am also wondering what the level of integration between ShellSet and its dependencies is: if a new version of one of the three codes is released, how easy is it to drop it into ShellSet?
Best regards
Daniele Melini
Citation: https://doi.org/10.5194/egusphere-2023-1164-CC1
AC2: 'Reply on CC1', Jon Bryan May, 28 Nov 2023
Thank you for taking the time to read and comment on our article.
It is true that performance testing of ShellSet is a little more complicated than usual for the reasons which you have noted. Currently Intel MKL dynamically selects the number of threads based on the problem size and number of physical cores available. As noted in the article we have added a control on this to prevent an over-request of resources. We originally omitted this test since, as we state, the target for ShellSet is a 'typical', likely non-technical user who would be able to rely on the MKL dynamic option, which is set within ShellSet, to automatically select the best number of MKL threads.
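The kind of cap described here amounts to dividing the physical cores among the MPI workers. A minimal sketch of such a heuristic (ours, for illustration only; the actual control inside ShellSet is implemented in Fortran via MKL's runtime settings) is:

```python
# Illustration of capping MKL threads so that (MPI workers x MKL threads)
# does not exceed the physical cores available. Not the actual ShellSet code.
def mkl_thread_cap(physical_cores, mpi_workers):
    return max(1, physical_cores // mpi_workers)

for workers in (2, 4, 8, 16):
    print(workers, "workers ->", mkl_thread_cap(16, workers), "MKL threads each")
```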
One obvious drawback to this testing is that any results, while providing insight, will only be true for the specific problem size on which they are performed. In this article we tackle a global model with 16,008 nodes, which is likely to have different performance scaling with respect to MKL thread numbers than a local model with 2,000 nodes. Another drawback is that, since ShellSet was not optimised for cluster use, the potential speedup from varying MKL threads could be partially lost within the program.
We recognise that MKL thread testing could be of interest to more technical users, and we will attempt to perform this analysis if a suitable machine can be found within publication deadlines. We note that this performance test could also be provided in future updates to the source code and user guide.
The three original codes are kept as 'whole' as possible, with only the required changes made to each one for some efficiency savings, necessary new subroutine calls, error handling, etc. Also, to simplify the source code for the user, we have separated the original 3 programs and their subroutines/functions into separate files. Unfortunately, for these reasons it is not possible to simply exchange an updated program version for its counterpart in ShellSet. However, the vast majority of each of the 3 programs remains the same, and so typical (not root-and-branch) updates made to each can simply be copied into ShellSet.
Fortunately, amongst our authors we have the owner of the three programs and so in partnership will endeavour to keep ShellSet up to date with its constituent parts. It is for this reason that we would encourage users to check the ShellSet GitHub page (https://github.com/JonBMay/ShellSet) which will continue to maintain the most up to date version, while the source code available on Zenodo (https://zenodo.org/records/7986808) will be periodically updated.
Citation: https://doi.org/10.5194/egusphere-2023-1164-AC2
RC2: 'Comment on egusphere-2023-1164', Rene Gassmoeller, 25 Jan 2024
The authors present a combination of existing computational tools (OrbData, Shells, OrbScore) into a new framework, which they name ShellSet. The existing tools allow the dynamic modeling of tectonic environments in 2D spherical geometries (surface grids), and the new software described in the manuscript is a combination of the tools that allows (among other new options) a parallel dynamic grid search through a vast number of models as well as an automatic coordination of the individual software components.
I applaud the authors for the continuation of the development of SHELLS and its connected software tools and think the presented software marks a significant improvement that deserves publication in GMD. However, the manuscript has some weaknesses in its structure and cross-referencing, and the presented performance analysis has some inconclusive results that need to be addressed before I can recommend the publication of the manuscript. I list my concerns below and look forward to seeing them addressed in a revised version of the manuscript.
Line-by-line comments:
+ The manuscript is well written and has a logical flow, starting with the description of the components, proceeding with a benchmark (reproducing the results of a paper on the individual components) and extension of functionality, and ending with results on scaling and performance. However, some information is not in the places where I would have expected it. I left my original comments in place below to show you which places are initially confusing to a reader and marked my later additions with **Later edit:**.
- Line 34: The authors claim that ShellSet is entirely based on open-source software, with a reference to a personal website of one of the authors. Unfortunately this is not sufficient to fulfil the definition of open-source software as maintained by the Open Source Initiative (https://opensource.org/osd/). In particular, the linked software does not include any license information (after a somewhat exhaustive search). This means that while the author clearly intends to make the software freely available, legally it is not spelled out what users are allowed to do (technically, the strictest copyright applies and users are not allowed to make any modification or distribute the software). This is not a problem for ShellSet itself, though, since ShellSet is clearly licensed under GPL 3. I would suggest either modifying this statement to say "freely-distributed" dependencies, or including an open-source license on the linked website to make clear to users under what license this software is distributed. I am aware that much of this software was written before clear definitions of open-source software existed.
- Line 35: ShellSet requires the Intel compiler, MPI, and MKL, but only provides a Makefile with hard-coded include paths to Intel MKL. This approach is prone to problems on user systems since Intel MKL may be installed anywhere, in particular if the user is not an administrator of their system. A more portable solution would be to use cmake or autoconf as an operating-system-independent configuration system that allows the location of Intel MKL to be found automatically. At least the authors should modify the Makefile so that it points to an include directory that also includes the Intel MPI headers and document how to change this directory. On my system mpif.h could not be found after adjusting the include paths, because the Makefile links to the MKL include directory, not the general Intel oneAPI include directory. I had to modify -I"/opt/intel/oneapi/mkl/2021.3.0/include" to -I/opt/intel/oneapi/2024.0/include, which is an unnecessary hurdle for new or inexperienced users. The (necessary) inclusion of version numbers in this path is another obstacle that results from using hand-written Makefiles.
- Section 2.3: The description of OrbScore in the manuscript is insufficient. It is not laid out how exactly OrbScore compares to existing datasets and computes the scores of model results, and the only reference to a description of OrbScore is given as a link to a personal website that cannot give a guarantee of future availability. At least the short text that is already available on the website should be included in the manuscript. In addition, even on the linked website it is not spelled out "how" the score is computed (e.g. is it an RMS difference, or some other error norm? Is it the same norm for all criteria? How are the different criteria scaled against each other for the combined score?). I understand that some of the other tools have been published elsewhere, but there is no reference given to the original publication of OrbScore. **Later edit:** After continuing to read the manuscript I found the necessary reference in lines 181-182, which refer to the original description in Bird et al. (2008). I think at least the paragraph that describes the grading procedure in lines 179-183 should be generalized and moved into Section 2.3 to explain OrbScore to the reader. My preference would be to also list all the available datasets (this gives the chance to explain any new datasets since the Bird et al. 2008 paper), the ways the individual scores are computed, and the procedure (and reasoning) for forming the final score as a geometric mean. Lines 179-183 could then be shortened to say that OrbScore was run with the specific subset of datasets as in Bird et al. (2008).
- Line 100: This sentence is somewhat confusing. I think what you are saying is that "by default" Intel MKL will automatically choose the maximum number of threads (e.g. as many as there are cores), but since you also run multiple MPI ranks in parallel it is better to let ShellSet choose the number of threads manually (e.g. as the number of available cores divided by the number of running MPI ranks). Please clarify the first part of the sentence.
- Lines 102+103: Usually we speak of MPI processes (or ranks, if you refer to the specific number), not threads. Threads are a different parallelization model, so "MPI threads" does not usually make sense (I think you refer to MPI processes in these two lines).
- Lines 107+108: Either provide the name of the command-line argument (if it is important), or remove the statement that it is a command-line argument (because it is clear that it is activated somehow). The more interesting question here is how a user knows which "value" to choose to get a sufficiently accurate match. Is there a reasonable default value provided, or would a user have to perform manual testing to get "a feel" for a good value?
- Line 111: The grid search is an interesting addition, but its description is missing crucial details about the algorithm. In particular, you should spell out somewhere that your grid search is actually a contracting grid search (which searches on multiple levels, narrowing in on the optimal results). It would also be useful to provide some context on other possible search algorithms, or why you chose a grid search (see my references on the conclusion for examples). I can think of several different ways to perform this grid search, and the choice of algorithm presumably influences how well it scales in parallel. E.g. does the algorithm create one grid level, then execute all models in this level on the available cores? Then, once the level is finished, does it create the next level (and how? using the best value as the center point in parameter space for a finer grid? or trying to find the neighboring points with the best values and spanning a grid between them?) and finish that level before refining further? And what happens if the algorithm identifies two disjunct regions with similar error values? **Later edit:** After reading the rest of the paper I found the algorithm description in Appendix A (which is not referenced from the paper). I would suggest moving Appendix A as a subsection into Section 3 (e.g. 3.1 could be the MPI parallelism part of Section 3, 3.2 could be the grid-search algorithm, 3.3 could be the user-interface changes). At the very least, reference Appendix A from Section 3 when you mention the grid-search algorithm.
- Line 120: "theoretically" and "possible" is duplicative and makes the sentence harder to understand.
- Line 121: Please specify the conditions that are currently implemented and checked. As a simple user it is impossible to know what "the most general conditions" are.
- Line 124: It is not clear what "its variable values" refers to. Do you mean the input control parameters?
- Line 125: "improved user satisfaction" is a very general term and hard to quantify without tools like user surveys etc. What are your sources? Maybe reword to "This combination ... has reduced error-prone manual operations and saved us and our collaborators valuable research time. This allows ... ." or similar.
- Line 137: unnecessary "the" before "Shells".
- Line 140: Not necessarily a comment on this manuscript, but for future versions of the software: you write that you have simplified the controls of the program, but you still require the user to prepare 3 (or so?) different files and up to 9 different command-line arguments. In my opinion it should be possible to combine these parameters into a single input file and maybe one or two CLAs. Some of the CLAs look like runtime configuration options rather than something that should go into a CLA. And one of your input files only contains paths to other input files, which seems repetitive in design (I understand that this is for historic reasons, because the software controls several submodules).
- Line 145: This is what I was looking for before (how to compute the combined misfit), but you note that the geometric mean is the new option, and you do not spell out what the old option was. Also the individual scores are still not explained. **Later edit:** After reading Section 4 and Bird et al. (2008) I am more confused. Bird et al. (2008) already used a geometric mean to compute the combined score; how does this new method differ from the previously published one?
- Section 3.1: This section seems unnecessary as most of the important information is already given in lines 35-37. Maybe include the remaining bits in the introduction (like the info about the specific oneAPI toolkits necessary) and delete this section. Also it is generally not considered good style to have a section numbered 3.1 if there is no 3.2. In addition, it is ok to only list the dependency names in the manuscript; however, the GitHub repository of the software should additionally list the minimum version numbers of all the dependencies (Fortran compiler, MKL, MPI; or simply the combined Intel oneAPI version). I had to test the program on oneAPI 2024, but the authors clearly developed on some other version, so if 2024 wouldn't have worked for me I had no information on which version to use instead.
- Line 177: It is not quite clear what "the authors" is referring to (I presume the authors of the original Bird et al. paper); please clarify in the text.
- Lines 206-209: This is the description of the grid-search algorithm that I would have expected in a general form in Section 3. However, the description is not specific enough in its current form to answer all of my questions above. E.g. you select the best 2 models for the creation of the next grid layer, forming new 3x3 grids on the lower level. From the statement that you have 18 models on the lower level I suppose the 2 selected models do not act as corner points of the next level, but you do not spell this out. Instead I suppose you create two independent 3x3 grids around the model parameters of the best models on the coarse level by varying the parameter values from their old values, e.g. as x_(i+1) = x_i +/- dx/2 if x is a model input parameter, i the level index of the grid hierarchy, and dx the step size of the parameter variation? I also assume that the same parameter combination that was run on the coarse grid is not repeated on the finer grid? This is implied in line 209, but it would be worth spelling out more clearly (e.g. add a sentence that explains how the input parameters on the lower-level grids are chosen, and that one of the models on the lower level is identical to the "parent" model on the coarser level and therefore not recomputed). **Later edit:** This comment is now obsolete after I found Appendix A; maybe you can still use some of the ideas to improve upon Appendix A. One question that is still open to me: in Appendix A you describe a 2x2 grid search, in the main text a 3x3 grid search. How does the algorithm proceed in a 2x2 grid search if the central model (the one from the coarser level) is the best model? This model is not associated with any of the 4 new cells that were generated. (This is not a problem in 3x3 grids, because the original model is also a cell center on the finer level.)
- Figs. 1+2: The color-scale label "Geometric mean" is clear to the authors, but to the reader it is not clear the geometric mean "of what" is shown here. Please reword to "Geometric mean score" or something similar. Also the tauMax axis label is missing its unit (I suspect fFric, the other axis, is unitless).
- Line 296: speedup performance -> performance speedup.
- Lines 298+299: This performance result (as well as the table), while showing some benefit of the MPI parallelization, is slightly concerning. I understand the complications of the performance measurement that were also already mentioned in community comment 1 (CC1) on the discussion page of the manuscript. I also agree with the authors that the final optimal setting for most users will be to let ShellSet automatically select the number of MKL threads to optimally use the available compute cores. However, I also think the request raised in CC1 (a performance table for a fixed number of MKL threads) needs to be addressed. My reasoning for this is the following: the MPI problem solved by ShellSet is very close to a scenario called 'embarrassingly parallel', which means the models that need to be computed are independent of each other and require minimal communication. Therefore, we would expect the compute time to scale inversely proportional to the number of available MPI worker processes as long as a sufficient number of compute cores and models to be computed are available and the number of MKL threads per model is constant. However, due to the changing number of MKL threads between the rows of Table 3 this is impossible to check from the table. The only case that can be used as a test for this is the transition from 8 workers / 8 models to 16 workers / 16 models (both of which use 1 MKL thread according to the table caption), for which we would expect the compute time to remain roughly constant within some uncertainties (due to changes in model setups). However, the table clearly shows a doubling of compute time from 15 to 29.5 minutes. This implies that either (i) the MPI ranks are not optimally distributed among the available compute cores (e.g. all ranks are always distributed among the 8 performance cores and the efficiency cores are ignored), (ii) the MPI implementation is incorrect or doesn't scale beyond 8 cores, or (iii) the given number of MKL threads is incorrect. A rerun of the table with a fixed MKL thread number of 1 on the same hardware (no need for HPC) could distinguish between these cases. This could prove that the authors' MPI implementation is correct and the weird timing results from the complications of modern consumer CPU architecture. It would also show that running ShellSet on HPC CPUs (either workstations or HPC clusters) will be more efficient than this table can show. This is because splitting a model into more and more MKL threads will decrease the parallel efficiency, while distributing more and more models among the available MPI workers will not (on modern HPC clusters we can run >100,000 MPI ranks efficiently in parallel, but for the foreseeable future we will not be able to split a model into >100,000 MKL threads efficiently).
- Line 305: "We have shown in Sect. 4 ...".
- Line 306: Specify which new data set.
- Line 311: Move the project name out of the conclusion; this is what the acknowledgements section is for. Keep the future application here.
- Lines 314-316: This statement is very vague and seems disconnected from the earlier description of the search algorithm. In particular, you have not described earlier which part of the grid-search algorithm is currently not efficient. I assume it is the bottleneck of choosing the next grid level after one level has been completed, which limits the total number of models that can be run in parallel. The current formulation of the sentence implies you already know the algorithms you want to try ("altering the search algorithm" instead of "exploring other search algorithms"), so you should either name the alternatives you want to explore and why (see the introduction of Baumann et al. 2014 for a list of algorithms that have been used in geodynamics and Reuber 2021 for a wider list of sampling algorithms), or at least clarify which bottlenecks exist that you have to overcome with a new algorithm.
References: https://doi.org/10.1016/j.tecto.2014.04.037, https://doi.org/10.1007/s13137-021-00186-y
- Appendices A and B seem like important additions to the manuscript; leaving them in the appendix made the main paper harder to read and understand. Appendix A should certainly move into Section 3. Depending on your decision on my comment about Section 2.3, Appendix B should either move into Section 2.3 or at least be referenced from there as a new dataset available for OrbScore.
Citation: https://doi.org/10.5194/egusphere-2023-1164-RC2
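For readers following this discussion of the algorithm, a minimal sketch of a generic contracting grid search of the kind described above (our illustration with a toy misfit function; it is not the ShellSet algorithm documented in Appendix A of the manuscript) is:

```python
# Sketch of a contracting grid search: each level builds a finer grid around the
# best model of the previous level. Illustration only, not the ShellSet algorithm.
from itertools import product

def run_model(params):
    # Toy misfit: pretend the optimum lies at fFric = 0.03, tauMax = 2e13 (hypothetical).
    fFric, tauMax = params
    return abs(fFric - 0.03) + abs(tauMax - 2.0e13) / 1.0e13

def level_grid(center, half_width, points=3):
    # Build a points x points grid centered on `center`, spanning +/- half_width per axis.
    axes = [[c - hw + 2.0 * hw * i / (points - 1) for i in range(points)]
            for c, hw in zip(center, half_width)]
    return list(product(*axes))

center, half_width = (0.05, 2.5e13), (0.05, 1.5e13)
for level in range(3):
    scores = {p: run_model(p) for p in level_grid(center, half_width)}  # run in parallel in ShellSet
    center = min(scores, key=scores.get)               # best model becomes the next level's center
    half_width = tuple(hw / 2.0 for hw in half_width)  # contract the search range
    print(f"level {level}: best parameters so far {center}")
```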
AC3: 'Reply on RC2', Jon Bryan May, 11 Feb 2024
We thank Dr Gassmoeller for his time in creating his very detailed review.
Since the review is quite long we have placed our responses into the attached pdf file, which includes a copy of the review text for simplicity. Our responses to each comment are in bold and placed immediately after the comment to which they respond.
Model code and software
ShellSet – Parallel Dynamic Neotectonic Modelling. Jon Bryan May, Peter Bird, and Michele Matteo Cosimo Carafa. https://zenodo.org/record/7986808