the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
HOPE: An Arbitrary-Order Non-Oscillatory Finite-Volume Shallow Water Dynamical Core with Automatic Differentiation
Abstract. This study presents the High Order Prediction Environment (HOPE), an automatically differentiable, non-oscillatory finite-volume dynamical core for shallow water equations on the cubed-sphere grid. HOPE integrates four key features: (1) arbitrary high-order accuracy through genuine two-dimensional reconstruction schemes; (2) essential non-oscillation via adaptive polynomial order reduction in discontinuous regions; (3) exact mass conservation inherited from finite-volume discretization; (4) automatically differentiable and (5) GPU-native scalability through PyTorch-based implementation. Another innovation is the intensive panel boundary treatment, which eliminates numerical instability during using high order reconstruction scheme, meanwhile, simplifies the interpolation process to a matrix-vector multiplication without losing accuracy. Numerical experiments demonstrates the capabilities of HOPE: The 11th-order scheme reduces errors to near double-precision round-off levels in steady-state geostrophic flow tests on coarse 1°×1° grids. Maintenance of Rossby-Haurwitz waves over 100 simulation days without crashing. A cylindrical dam-break test case confirms the genuinely two-dimensional WENO scheme exhibits significantly better isotropy compared to dimension-by-dimension approaches. Two implementations are developed: a Fortran version for convergence analysis and a PyTorch version leveraging automatic differentiation and GPU acceleration. The PyTorch implementation maps reconstruction and quadrature operation to 2D convolution and Einstein summation respectively, achieving about 2× speedup on single NVIDIA RTX3090 GPU versus Dual Intel E5-2699v4 CPUs execution. This design enables seamless coupling with neural network parameterizations, positioning HOPE as a foundational tool for next-generation differentiable atmosphere models.
- Preprint
(2769 KB) - Metadata XML
- BibTeX
- EndNote
Status: open (until 22 Jul 2025)
-
RC1: 'Comment on egusphere-2025-1889', Anonymous Referee #1, 16 Jun 2025
reply
Summary
The manuscript presents a new framework known as HOPE (High Order Prediction Environment) for the numerical solution of the shallow water equations on a cubed sphere grid. HOPE has several novel or interesting features, including options for very high order numerics, options to use inherently 2D WENO schemes, and an implementation in PyTorch that facilitates running on GPUs and provides automatically differentiability. The PyTorch implementation is timely as it facilitates integration of the model within a machine learning system, a topic that is currently of great interest.
I believe these novel features should eventually be sufficient to justify publication after some revision to clarify the presentation and discussion in the manuscript.
Points related to interpretation and understanding
Line 74. It is important to make sure that you compare like with like. A k'th order 1D finite difference derivative requires (generally) a stencil of k+1 points or cells. In HOPE you mostly discuss reconstructions rather than derivatives; a k'th order reconstruction can be done with a stencil of k cells. But if you take a difference of two reconstructions to compute a derivative then you will have used two different stencils and at least k+1 data points.
High order
- It is good to see the requirement for smoothness of data mentioned for high order to be more accurate (line 65, 81-82). Advocates of high order schemes don't always mention this.
- However, it is important to be precise with terminology to avoid confusion for readers (and authors!) The phrase 'convergence accuracy' (line 69) mixes up two ideas that should be kept distinct: order of accuracy and convergence rate. Convergence rate agrees with order of accuracy only for sufficiently smooth data. It is very common for the convergence rate to be less than the order of accuracy.
- Line 77. 'arbitrary accuracy' -> arbitrary order of accuracy. Check for other places where you have used 'accuracy' when you mean 'order of accuracy' (e.g., lines 424, 433, 458, 461, 463). Order of accuracy is not the same thing as accuracy; there are even situations where a higher order of accuracy produces a less accurate solution.
Line 224. It is not obvious that negative values of (an element of) \gamma could cause instability. Presumably the elements of R_H must be allowed to be negative, otherwise we would not be able to achieve more than second order? So why should there be such a restriction on \gamma? Please give some discussion or a reference.
At first glance (41) seems to be dimensionally inconsistent, since it mixes derivatives of different orders. It might be good to remind the reader that the computational coordinates x and y have effectively been non-dimensionalized by the grid spacing so that \Delta x = \Delta y = 1. (Thus, (41) is dimensionally correct, after all.) This non-dimensionalization is also appropriate to ensure that the smoothness indicators \beta_i scale with resolution in an appropriate way. What is \epsilon in (39)?
Section 5, general comment (to the whole community, really!): make sure you extract good, useful information from your test cases, not just attractive-looking plots! For example, see the next point, as well as the suggestion below to diagnose dissipation quantitatively.
Some of the test cases in section 5 are run with the high-order reconstruction and some are run with the WENO scheme, and the flow over the mountain case does not say which scheme is used. In an operational model one must make up one's mind which scheme to use, though, in research mode, having different options available allows one to explore sensitivities. It would be valuable for readers if you could share any knowledge and understanding you have gleaned by comparing WENO vs high order on the different test cases. Even if you don't show figures and tables for all combinations, it would be good to comment on any differences. For example, do WENO3 and WENO5 give 3rd order and 5th order convergence for the steady geostrophic flow case? Do high order schemes produce oscillations in the flow over a mountain case? Does WENO3 produce similar solutions to the third order scheme on the Rossby-Haurwitz test case?
Line 407. 'prone to collapse due to factors such as...'. To be clear, the R=4 Rossby-Haurwitz wave is unstable. Those factors, or even roundoff error, can provide a perturbation that initiates the instability, but they are not the fundamental cause of the collapse - that is the instability itself.Line 491. That test case is actually dominated by Rossby waves, not gravity waves.
Points for discussion; the authors may or may not wish to address these in the manuscript.
Riemann solver- The Riemann solver is applied at every quadrature point before doing the quadrature. Would there be any advantage in doing the quadrature first and then applying the Riemann solver just once at each interface?
High order- Line 378. Do you have any data on how much more expensive high order is? Of course the answer will depend on implementation, computing platform, resolution, etc, but it would be good to have an idea.
- Line 380. I am inclined to agree that 3rd or 5th order will be the practical choice, but I would be interested to know your reasoning.
- One very useful potential application of a code like HOPE would be to help answer that question in a quantitative way. Flows of realistic complexity (therefore not very smooth), like the flow over an isolated mountain case, generally don't have exact solutions available. But if you could compute an accurate high-resolution reference solution then you would be able to plot error versus computational cost as you vary both resolution and order of accuracy.
Rossby-Haurwitz wave collapse- Lines 414-418. How long the Rossby-Haurwitz wave is sustained is a measure of how strongly numerical errors project onto the growing mode(s) at early times. After that the instability grows at its own rate until the RH wave collapses. Note that a cubed sphere (in the usual orientation) has an advantage (compared to an icosahedral grid, for example) in that its discretization errors should project onto zonal wavenumber 4 and higher harmonics, whereas zonal wavenumbers 1, 3, and 5 project onto the instability. Thus, presumably, it is roundoff errors that break the wavenumber 4 symmetry and trigger the instability for HOPE(?) If that is the case, then higher precision should delay the collapse. Have you tested that? It sounds like you are set up to be able to do that easily. Conversely, if higher precision does not delay the collapse, then that begs the question: what is breaking the wavenumber 4 symmetry to trigger the instability, and could it be an implementation bug?
- Also, it is good to be aware of what the time of collapse is really telling you about the model formulation, and to look at other informative aspects of the solution. For example, you mention apparent 'dissipation' of the solution. You could measure that dissipation quantitiatively by diagnosing a conserved quantity like energy or potential enstrophy, for example, and look at how their conservation depends on resolution and order of accuracy.
Points related to improving the clarity of the explanations
Line 18. '...reduces interpolation to matrix-vector multiplication'. When I first read this I thought it was stating the obvious: interpolation is a linear operation. The significance only became clear when I read section 3.3: even though the panel boundary treatment couples ghost points on the two sides of the boundary, it can be reduced to a straightforward matrix-vector multiplication. Perhaps you can briefly mention this two-way coupling in the abstract.Abstract line 23. It is unclear why a separate fortran code version is needed for `convergence analysis'. The fortran version is not mentioned in the main text (though the source code is provided).
Line 44. Comment: whether a spectral method conserves mass depends on which variables are chosen to be represented by a spectral expansion. For example, predicting a spectral representation of surface pressure (rather than the more usual log of surface pressure) should conserve mass in a hydrostatic model.
Line 60. At this point it is unclear which Jacobian matrix you mean. Which derivatives are computed automatically? Some more explanation is needed. Similarly on line 320: which gradients can be computed efficiently?
Line 70, also 243. 'does not surpass 7th order'. Please clarify whether you were using the MCORE code or your own implementation of something similar. Also, is this a fundamental limitation of the mathematical formulation or an issue with a particular implementation? It would be good to clarify what is meant by one-sided interpolation. You could avoid ghost points altogether by doing one-sided reconstruction, but I don't think that is what you mean. Do you mean that with one-sided interpolation there is no coupling between ghost points on the two sides of a panel boundary?
Line 71. I can guess what you mean by ghost interpolation scheme, but many readers will need more explanation at this point, or at least a forward reference to where it is discussed in more detail.
Line 84+. The WENO scheme is (or can be) used whenever the model needs to compute a flux. Is that correct? It was not clear to me.
Line 125. It is not yet clear where 'reconstructing' is used in the algorithm, hence this discussion is hard to follow. It would be good to give a brief overview of the method before getting into details. Also, if the reader does not already know what the 'C-property' is then line 125 does not help them. Either explain or omit. It would be worth adding that, although you use \phi_t in the momentum equation, in (13) you still predict \sqrt{G} \phi for mass conservation.
Line 138. It could be worth mentioning that, although LMARS is an approximate Riemann solver, it combines two high-order estimates to obtain the flux, so the result is high order.
Equation (26). A few words of explanation would be helpful. Here we know the \bar{q}_i, since they are predicted by the time stepping, and we wish to determine the coefficients a.
To be clear, do we need a version of the matrix R (31) for every grid cell, or is a single matrix R applicable to all grid cells?
Can you clarify whether the 2D WENO scheme is arbitrary order too, or is the implementation currently limited to 3rd and 5th order?. (The namelist file suggests the latter.)
Lines 225 to 229. This section is confusing: You define \gamma^+ and \gamma^- then jump to expressing q(x,y) in terms of \omega^+ and \omega^-.
Line 238. '...eight panel boundaries...'. Please check!
The scheme for ghost cell interpolation neatly exploits the auto-differentiation capability of PyTorch! What do you do near panel corners? Section 3.3.1: could you please clarify, is the iterative scheme used once at setup to obtain the matrix G, and then matrix multiplication is used subsequently at run time? Presumably there can be lots of zeros in G, since cells near the centre of a panel do not affect any ghost cells; thus, could some compact representation of G be used?
Line 312. Comment: the Wicker-Skamarock RK scheme is 3rd order only for linear problems.
Which scheme was used to produce figure 8? Was it one of the WENO schemes or the arbitrary order (non-WENO) scheme? In either case, what was the order of accuracy? It is encouraging that there are no numerical oscillations in the vorticity or other fields. Is that true for all the schemes discussed, or only for the WENO schemes?
Points related to equations and mathematical notation
Equation (21). r is a dummy subscript in the middle expression; it should not appear in the final expression. Similarly for equation (22).
Line 162-163 does not make sense, since you have not specified any relation between k and n. There are many inconsistencies in notation in this section. n is the number of terms in a polynomial (line 162), then it is the stencil width in the x-direction (23). m is the stencil width in the y-direction (23), then it is equal to n^2 (line 169, 207, 213). k is the width of the stencil (line 163) then a dummy index for coefficients (23). Line 207: the stencil width is now h (but h is not mentioned again). In section 4 the stencil width is s_w.
Inconsistent fonts are used for the matrix \gamma (compare (32) and (35)).
Presumably (36) refers to individual elements of the matrix \gamma, not the entire matrix?
(37) and (38) don't seem to be correct. (38) implies that \sum_i \omega_i^+ = 1 and \sum_i \omega_i^- = 1. However, in order for (37) to be a proper weighted average of the p_i's we would need \sum_i (\omega_i^+ - \omega_i^-) = 1. \omega_i is mentioned in the text, but only \omega_i^+ and \omega_i^- are defined by equations. Should we assume \omega_i = \omega_i^+ - \omega_i^- ? Please check.
Equation (53) seems to be dimensionally inconsistent. Should sign(m) not be abs(m), which would pick out the upwind value of q? See also line 346. Actually, taking careful note of parentheses, the (fortran) source code seems to be correct, but is inconsistent with equation (53).
The notation G is used for the metric (section 2) and also for the matrix to compute ghost cell values (section 3.3.1). Line 262: g should be bold font.
Lines 324 and 327: can I just check that there should be no comma between n_v and n_p, i.e., the first dimension is of size n_v \times n_p? The code (if I understand it correctly) suggests that these arrays are 5-dimensional. Also, comparing lines 327 and 334, n_{poc} seems to be the size of the second (or perhaps third) dimension, not the first.
Points related to phrasing, typos, etc
Line 17, line 53. 'intensive' panel boundary treatment. What is meant by intensive? Perhaps a different word would be better?
Line 34. 'Unlike...' is not a complete sentence. Perhaps the preceding full stop should be a comma?
Line 78. Does 'its' refer to the new ghost interpolation scheme?
Line 92. If I understand correctly, GPU optimization and automatic differentiabilty are two different things; PyTorch happens to provide them both. The sentence as written implies that automatic differentiation is needed for GPU implementation, which I don't think is correct.
Line 133. Can you clarify: Gaussian quadrature along the interface (rather than, say, over some upwind region).
Line 162. The term 'order' is already overloaded. It is not necessary to talk about a k'th-order square stencil. It is enough to say k \times k stencil. See also line 206.
Line 163: n^2 is the number of cells in the stencil ('cell number in the stencil' is ambiguous). Similarly, it is the number of terms in the TPP.
Line 210. The phrase 'determine the unique weights' suggests that (32) can be solved and has a unique solution. As soon becomes clear, (32) is overdetermined and has no exact solution, and only a least squares approximate solution can be found.
Line 227. 'stencil i is smooth ... stencil i is discontinuous...'. Don't you mean the data sampled or reconstructed on stencil i is smooth or discontinuous?
Line 301. 'location, since'. Full stop and a new sentence would be better.
Equation (47). Since Einstein summation is mentioned in various places, perhaps note that there is no summation over i in (47).
Line 310. I cannot find any other mention of H.
Line 318-321. 'Both of these operations are highly optimized for execution on GPUs...' Do you mean highly optimized in the PyTorch implementation? The next sentence seems to be confusing two distinct ideas: (i) PyTorch has built-in commands for convolutions and matrix-vector multiplication, streamlining implementation (without explicit loop commands); (ii) PyTorch offers automatic differentiation, enabling efficient gradient computation.
Line 323. To be clear, n_v prognostic variables per cell.
Line 359: widely?
Line 363. You haven't said what \alpha is, other than a number that is set to zero.
Line 387. The phrase 'we set' makes it seem like you have made your own choice for \lambda_c and \theta_c. But aren't those values the standard ones for this test case?
Line 401. zonal advection -> zonal propagation. (The wave structure is not simply advected in the zonal direction; it propagates through the Rossby wave propagation mechanism.)
Line 404. Please check the units for c.
Line 481. 'handling of anomalous anisotropic characateristics'. I think the problem is that the 1D scheme lacks isotropy, rather than the data.
References: Kochkov et al.; the year should be 2024.
Citation: https://doi.org/10.5194/egusphere-2025-1889-RC1
Model code and software
HOPE: High Order Predition Environment Lilong Zhou https://gitee.com/DwyaneChou/FVM/tree/Pytorch/
Viewed
HTML | XML | Total | BibTeX | EndNote | |
---|---|---|---|---|---|
70 | 15 | 3 | 88 | 3 | 2 |
- HTML: 70
- PDF: 15
- XML: 3
- Total: 88
- BibTeX: 3
- EndNote: 2
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1