the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Highly Scalable Geodynamic Simulations with HyTeG
Abstract. High-resolution geodynamic simulations of mantle convection are essential to quantitatively assess the complex physical processes driving the large-scale tectonic phenomena that shape Earth’s surface. Accurately capturing small-scale features such as unstable thermal boundary layers requires global resolution on the order of 1 km, which renders traditional sparse matrix methods impractical due to prohibitive memory demands and low arithmetic intensity. Matrix-free methods offer a scalable alternative, enabling the solution of large-scale linear systems efficiently. In this work, we leverage the matrix-free Finite Element framework HyTeG to conduct large-scale geodynamic simulations that incorporate realistic physical models. We validate the framework through a combination of convergence studies and geophysical benchmarks. These include verifying the convergence rates of Finite Element solutions against analytical solutions and through community benchmarks, including test cases with temperature-dependent and nonlinear rheologies. Our scalability studies demonstrate excellent performance, scaling up to problems with about 1011 unknowns in the Stokes system.
- Preprint
(2425 KB) - Metadata XML
- BibTeX
- EndNote
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-2552', Shijie Zhong, 30 Jul 2025
- AC2: 'Reply on RC1', Ponsuganth Ilangovan, 18 Sep 2025
-
CC1: 'Comment on egusphere-2025-2552', Shangxin Liu, 11 Aug 2025
I glimpsed this interesting geodynamic technical paper from the GMD article alert and further read it in detail. This study presents a new finite element modeling framework called HYTEG, which is based on matrix-free geometric multigrid preconditioner (like the one used by the current version of ASPECT) to overcome the need of the large memory for the storage of the stiffness matrix in the classic geometric multigrid preconditioner. The authors show the benchmark results against analytic or previous numerical codes in both 2D and 3D geometries through instantaneous and time-dependent calculations. This new framework provides a useful finite element tool for the community. While the work presented here shows the various capabilities of this new software, I have some comments and suggestions for the authors to consider to further improve the robustness of this study and the HYTEG framework.
1. While the mesh refinement is introduced in the main text, I still found that the number of mesh of each refinement level is not very clear. What are the numbers of the radial elements in each triangulation (refinement) level in each convergence plot for the errors, such as Figs. 1, 2, and 3? For example, in ASPECT, global refinement n means that there is 2^(n+1) number of radial elements in the 3D spherical shell in the default setting. It’s worthwhile to describe the mesh refinement in a clearer way for the convenience of readers of this paper and potential users of the HYTEG framework.
2. The mantle response (delta) function benchmark in section 4.1.
First, the authors should make it clear that how the velocity and pressure errors presented (Fig. 1) are calculated. Are they the errors of the averaged velocity and (dynamic) pressure errors of the whole domain? Which wavelengths (spherical harmonic degree and orders) are presented? These are not quite clear when I read through this part. Although the details may be presented in the earlier studies the authors refer to, I think it’s worthwhile to clarify these details in the main text of the paper as well.
Second, only the velocity and pressure solutions are shown. I strongly suggest the authors also calculate the responses of the surface and CMB dynamic topography and geoid. The calculation of the dynamic topography includes the radial derivatives of the velocity, which requires second-order accuracy of the velocity solutions. The geoid solutions are even more sensitive to the computational accuracy because of the counterbalance effect between the buoyancy from density integral and the dynamic topography. If the response functions of the surface and CMB dynamic topography and geoid are also shown to be accurate, the robustness of the Stokes solver for this code can be verified completely.
Third, at lines 305-306, several previous efforts in the formulation of the response function benchmark in 3D spherical shell geometry are referenced. However, other previous relevant studies on the same topic by the peers should also be acknowledged as well in this paper. For example, Liu and King, 2019 systematically benchmarked the Stokes solver for the open-source community code ASPECT in 3D spherical shell domain using the similar approach, following the formulation of the earlier work of Zhong et al., 2008 for another popular community code CitcomS. It’s noteworthy that Zhong et al., 2008 and Liu and King, 2019 both calculated the response functions not only in isoviscous Stokes system, but also in two-layer viscosity profile with a stiff top lid. I suggest that the authors also calculate the response functions in a two-layer viscosity profile to show that the Stokes solvers of the HYTEG is able to properly handle the radial viscosity jump.
The calculation of the dynamic topography may also involve the use of consistent-boundary-flux (CBF) method to help improve the accuracy of the stress compared with the straightforward pressure smoothing method (Zhong et al., 1993). Including the effects of self-gravity will also significantly change the long-wavelength dynamic topography and geoid. The incorporation of CBF method and self-gravity into a finite element code will require considerable extra work. If not yet, I don’t intend to push the authors to add them into HYTEG for this paper, but it would be necessary to make it clear in the main text that whether CBF method and self-gravity has been included in the calculation of the dynamic topography and geoid solutions. Zhong et al., 2008 and Liu and King, 2019 use CBF method to calculate dynamic topography. The two studies present the response function benchmark for Citcoms and ASPECT for both cases with and without self-gravity.
Liu, S., & King, S. D. (2019). A benchmark study of incompressible Stokes flow in a 3-D spherical shell using ASPECT. Geophysical Journal International, 217(1), 650-667.
3. Time-dependent thermal convection benchmark in 3D spherical shell.
It’s nice to see a match of the average temperature profiles between HYTEG and our recent study presented in Euen et al., 2023. However, to form a complete evaluation of the performance in 3D spherical shell thermal convection, it’s necessary to also show the comparison of the RMS velocity profiles, Nusselt numbers at the top, and Nusselt numbers at the bottom between HYTEG and Euen et al., 2023. The Nusselt numbers require the calculation of the heat flux, i.e., temperature gradient. This diagnostic is a better criterion to evaluate the second-order accuracy of the temperature field. Again, CBF method can improve the accuracy of the heat flux calculation as well (Gresho et al., 1987). Whether CBF method is used in the calculation of heat flux needs to be clarified. The heat flux calculated in the Nusselt numbers of Euen et al., 2023 use the CBF algorithm incorporated into the early version of ASPECT.
4. Lines 270-278. The authors talked about the use of the element-wise viscosity averaging for the matrix-free geometric multigrid method. It would be better to further strengthen the purpose of using the element-wide viscosity averaging for this method. It will especially reduce the memory needed for largely variable viscosity cases compared with the same cases without viscosity averaging. This is similar to the handling of this problem in the current version of ASPECT code (Clevenger and Heister, 2021). In addition, the reason that why harmonic averaging works better than arithmetic averaging in nonlinear rheology needs to be specified.
Minor comments
1. Line 272, computing the an average -> computing an average
2. Euen 2023 -> Euen et al., 2023. This issue appears in some places, such as Figs. 4 and 5.
3. From the equations (5) and (6), it appears that HYTEG solves the Stokes system for dynamic pressure instead of total pressure. I suggest making this point clear in the main text. For example, are the pressure terms in the later response function benchmark in section 4.1 the dynamic pressure or the total pressure?
Shangxin Liu
Citation: https://doi.org/10.5194/egusphere-2025-2552-CC1 - AC3: 'Reply on CC1', Ponsuganth Ilangovan, 18 Sep 2025
-
RC2: 'Comment on egusphere-2025-2552', G. Stadler, 14 Aug 2025
The paper contains benchmark results obtained for the HYTEG code
framework, which solves the typical equations governing mantle
flows. Such comparisons and benchmarks are valuable, even if they do
not add real new methods/algorithms, or new geophysics
results. Generally the paper is well written, but I've a list of
comments and suggestions below.
Main comments:
----------------* Line 108: I would not claim that the surface velocity components are
typically taken from plate-tectonic observations. This is one option
to force the system, but it should arguably be the other way round,
i.e., the forces in the mantle together with the physics should
result in plate-like surface velocities. Imposing the plate motion
as Dirichlet bdry conditions can result in unphysical forces in the
system; I suggest that you at least not claim that this is the
usual/only thing done in mantle flow.
It is again pointed out as the normal case in line 308/309--I dont
think this is the only choice.* Eq 30, line 396 -- something does not seem to be correct in the
nonlinear viscosity here, there might be a missing square root. I
would think that the highest power the denominator should be 1 in
terms of the strain rate, corresponding to plastic yielding.* Regarding nonlinear Stokes solves: I assume all nonlinearity is
treated with Picard fixed points iterations, which has limitations
for stronger nonlinearities. Can you give some information about how
many orders of viscosity variations occur in these benchmarks? Since
Picard just solves a sequence of variable viscosity Stokes problems,
I dont understand the comment made on line 274, namely that harmonic
averages work better for nonlinear problems. These averages only
affect the multigrid coarse grids, which is a purely linear
issue--or am I missing something here?* Rigid body modes, Sec 3.4: Makes sense to penalize rigid body modes
as you discuss and thus save an MPI reduction since you avoid
projecting out the rotation. It is surprising that, as discussed in
Sec 4.1.2 there is sensitivity to the penalty parameter--if systems
are solved reasonably accurate, one would expect that there is no
sensitivity as these modes are in the null space of the PDE
operator, so they should be pretty easy to control (and the solution
should not degenerate if the penalty parameter is large). Does the
sensitivity (and thus the need to tweak this parameter) have to do
with the iterative solver?* Regarding scalability in Sec 5: Since you focus on the Stokes
solves, the task should be the scalability of those solves, and not
(only) at the scalability of a fixed number of iterations towards
that solve. What you study is sometimes called the parallel
scalability and it's a purely implementation issue; but it could be
that more and more iterations are needed for finer discretization
(in weak or even strong scaling), degrading the overall scalability
of the complete solve (that second component is sometimes called
algebraic scalability or simply mesh independence/robustness). For
problems with severely varying coefficients (or even
nonlinearities!) it's challenging and sometimes impossible to
achieve that, and it requires an effective and scalable smoother,
often with more than a few smoothing steps etc. This issue that two
components contribute to the scalability does not come up with
explicit time stepping which typically just amounts to matrix-vector
applications, but it matters for implicit systems (like Stokes) and
makes them much harder to scale. Please at least add a discussion of
that issue to emphasize that just being able to do Krylov iterations
in a scalable fashion does not necessarily imply a scalabe solver!
For the larger scalability test you even reduce the number of
smoothing steps and the GMRES history to save memory; I'm not
convinced that you get that second part needed for overall
scalability of the solver for reasonably challenging problems.* It's great that you can scale to 1e11 unknowns, but it would be
interesting to see settings where such a resolution is actually
needed. I would guess that it only matters for extremely nonlinear
rheologies, extremely varying coefficients, very fine geometric
structures etc. Can you comment on your solver being used (or plans
to being used) for such problems? My concern is that you vastly
overresolve problems and thus do not gain any significant accurarcy
by going to so many DOFs.
Minor:
------* line 83: "A fact to which ..." This isnt a sentence.
* Equation 9: The dot denotes the inner product between two vectors,
but you also seem to use a dot for the matrix-vector multiplication
between tau and \hat r -- is that intentional?* line 167: ...a brief *overview* of some...
* line 185: a Stokes -> Stokes
* line 308: setup more closer -> setup closer
* line 344: in same as -> in the same as
* line 483: upto -> up to
* Missing commas make reading some sentences tricky--please add commas
in the following spots (and probably some more spots):
- line 100 after mantle
- line 110 after no-outflow
- line 148 after Here
- line 197 after factor
- line 216 after optimisations
- line 226 after enough
- line 310 after Next
- line 479 after FirstCitation: https://doi.org/10.5194/egusphere-2025-2552-RC2 - AC1: 'Reply on RC2', Ponsuganth Ilangovan, 18 Sep 2025
Viewed
HTML | XML | Total | BibTeX | EndNote | |
---|---|---|---|---|---|
1,063 | 37 | 15 | 1,115 | 17 | 30 |
- HTML: 1,063
- PDF: 37
- XML: 15
- Total: 1,115
- BibTeX: 17
- EndNote: 30
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1
This paper introduced a finite element modeling framework HyTeG for modeling mantle convection. The framework appears to be very versatile with a lot of capabilities. For example, it is capable of 2-D and 3-D modeling of thermal convection with variable viscosity. It uses triangle/tetrahedral elements with quadratic/linear shape functions for velocity/pressure for numerical stability. It employs a matrix-free solver with multi-grid solver capability that does not require storage of the stiffness matrix, enabling modeling problems with extremely large number of unknowns (10^11). It uses an Eulerian-Lagrangian approach (or semi-Lagrangian method?) to solve the energy equation. The paper covers a lot of topics, as this type of papers often do, including governing equations of compressible mantle convection, numerical methods, and some benchmark results. In general, I support technical effort like what this paper presents. I can see that this paper will be eventually published, but there are a few issues I think that the authors should address and improve before it can be published.
First, some main comments:
1) On the benchmark. For the stationary benchmark in section 4.1 (i.e., for the Stokes flow problem), I think that the authors should present the dynamic topography and geoid benchmark results for two important reasons: a) they are geophysically important and relevant, and b) the geoid anomalies are very sensitive to the solution quality for the pressure and velocity. Analytical solutions for the geoid and dynamic topography are widely available. For example, CitcomS benchmarks in Zhong et al., [2008] have a big section on this type of benchmarks (or even in Zhong et al., [2000]). Fig. 1 for the norm-2 for flow velocity and pressure errors is encouraging, but dynamic topography and the geoid benchmarks will be much more relevant, making the code more useful and appealing to potential users.
On the same topic, in lines 305-306, authors referenced several recent papers (since 2016) on developing semi-analytical solutions for incompressible Stokes flow problem in spherical shell using spectral methods. However, such solutions were developed in geodynamics well before 2016. For example, Tan et al. (2011) showed such analytical solutions for compressible Stokes flow in spherical shell geometry and used them to benchmark compressible version of CitcomS. Zhong et al. (2008, 2000) did the same for incompressible Stokes’ flow and used them for benchmarks of incompressible CitcomS. I think that the authors should acknowledge these studies. Actually, the way that the delta function was treated in numerical benchmark calculations presented in 4.1.1 is actually the same as how it was done in CitcomS benchmark in Zhong et al., 2008. Again, some suitable reference is needed for this (see more comments on this type of issue later).
2) On the benchmark result in Figure 2’s Nussult numbers vs different resolutions for 2-D compressible mantle convection, I believe that there are some errors in how King et al. (2010)’s results were presented in this Figure. The figure showed the current study’s Nu’s are slightly less than 7.4 at the highest resolution, but the authors presented a range of possible solutions from King et al. (2010) between 7.3 and 7.7, thus justifying their results. However, the supplementary Table S6 from King et al. (2010) showed that Nu for this case (Di=0.5 and Ra=1e5) ranges from 7.587 to 7.63 for 5 of 6 different codes, including three well-known finite element codes Citcom, Conman and UM that would be very similar to the code in this paper. Only one code in King’s benchmark study had Nu at 7.50. In any case, the authors need to clarify the origin of King’s benchmark results of Nu from 7.3 to 7.7. From my view, Nu=7.4 for this low Ra number case differs quite significantly from King’s benchmark results, suggesting to me a concerning level of numerical error in their solutions.
3) For the four spherical shell convection benchmark cases listed in Table 1 that Euen et al., 2023 used for ASPECT (originally from Zhong et al., 2008), the authors should presented the steady state results of Nu, Vrms, and other quantities, as Zhong et al., (2008) and Euen et al. (2023) did. Currently, the authors only compared the temperature profiles with Euen et al. (2023) in figures. This can be improved by presenting numerical values.
4) Section 2.3.1 and 2.3.2 discussed treatments of divergence of rho*u and u for compressible convection with no reference. However, the same treatments in finite element codes for compressible mantle convection can be found in Tan et al. 2011 for spherical shell code CitcomS or Leng and Zhong for 2-D Cartesian code. The authors may want to give some references in presenting their methods.
5) The authors highlighted their method as a matrix free method which they stated was the key for solving for a problem with exceedingly large numbers of unknowns. The way I see from reading this paper is that in this method, the elemental stiffness matrix Ke is generated and used to compute Ke*u without the need to assemble elemental Ke to a global stiffness matrix K and without the need to store K of course. If so, I suppose that this will significantly add to the cost of the calculation of Ku, especially if Ke needs to be formed every iteration within a given time step, if you do not store any K or Ke. I can see for constant viscosity and identical element, Ke is probably the same for all the element and only needs to be computed once. However, for variable viscosity, each element may have different Ke which would need to be computed. Therefore, in this matrix-free method, there seems to be a trade-off between computer memory saving and calculation speed. The authors may want to discuss this issue and give an order of magnitude estimate of CPU time for calculations with 10^11 unknowns.
Some minor comments:
1) The authors mentioned in a few places the need for mantle convection to achieve 1 km resolution. I was curious how small the time increment would have to be for such a small element, based on the criterion for picking up the time increment. Is it really feasible to use such a small time increment in global models with such a high resolution? Can a different model be formulated if 1 km is indeed needed.
2) The authors may want to double-check the writing. There are quite some grammar issues.
3) Line 31, I am not sure if Baumgardner (1985) was for a mantle convection code — its mathematical formulation was very different from any of the mantle convection formulations we know, especially how the continuity equation was treated, as I recalled.
4) For Fig. 1 and 2, can the authors explain what the resolution is for each triangulation level?
Shijie Zhong