Mapping Antarctic Geothermal Heat Flow with Deep Neural Networks optimized by Particle Swarm Optimization Algorithm
Abstract. The spatial distribution of geothermal heat flow (GHF) beneath the Antarctic Ice Sheet is a major source of uncertainty in projections of ice sheet dynamics and sea-level rise. Direct measurements are sparse, necessitating robust modeling approaches. In this study, we developed a neural network framework whose architecture and hyperparameters are optimized using a particle swarm optimization (PSO) algorithm. Trained on a global heat flow compilation and a suite of geophysical datasets, our model generates a new GHF map for the entire continent. The model's accuracy in regions lacking direct measurements was confirmed through training density validation, with prediction errors constrained to within 20 %. The resulting map delineates a distinct dichotomy: East Antarctica exhibits predominantly low GHF values (<60 mW m⁻²) with notable exceptions of high heat flow (>80 mW m⁻²) in the Vostok Subglacial Highlands and Gamburtsev Subglacial Mountains. In contrast, West Antarctica is characterized by widespread high heat flow (>60 mW m⁻²), especially in tectonically active regions like the Transantarctic Mountains and the Amundsen Sea sector. These predictions show agreement when compared with direct borehole measurements. Our work offers a new, robust estimate of Antarctic GHF, providing a critical boundary condition for ice sheet models. We suggest that future improvements in accuracy and interpretability can be gained by assimilating more high-resolution drilling data and integrating physical constraints into the model framework.
The study demonstrates systematic optimization of neural network architecture using PSO for hyperparameter tuning, making it a significant methodological advancement over some previous work. The systematic optimization represents a robust alternative to ad-hoc tuning methods commonly used in Earth sciences applications, particularly for such challenging problems as geothermal heat flow prediction in data-sparse regions. The research validates the established understanding of Antarctic thermal structure by confirming the East-West pattern, with predominantly low heat flow values (30-60 mW m⁻²) in East Antarctica and higher values (>60 mW m⁻²) in West Antarctica. This consistency with previous studies strengthens confidence in Antarctic crustal thermal architecture. The combination of automated optimization and independent validation of the first-order approximation of the heat flow distribution makes this work valuable for advancing predictive modelling in polar geophysics.
As a model of the actual geothermal heat flow to be expected and used downstream, e.g., in interdisciplinary models, I am more skeptical. I have several questions regarding the observables used (listed below). The authors primarily include legacy data (e.g., the data available when Aq1 was generated six years ago) along with a few additional datasets that I believe are not very robust. Some choices are not geologically meaningful, as outlined below, and the lack of qualitative assessment of the observables unfortunately invalidates the otherwise sensitive tests conducted. PSO is a valuable tool for DNN optimization, and transparent enough to generate meaningful uncertainty metrics. However, the robustness that PSO is otherwise known for doesn't really help if the features are not meaningful and we treat interpolated grid values with the same weight as high-quality, representative observations (discussed by Al-Aghbary et al., 2025, link below). In general, gradient-based optimizers often outperform PSO in similar setups; however, there is certainly value in testing and comparing various methods, and I believe there will be more development in this field over the coming years, including hybrid strategies (as introduced here with the Adam optimizer).
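For reference, the global-best PSO variant under discussion can be sketched in a few lines. The toy objective below stands in for a validation loss over two hyperparameters (e.g., log learning rate and layer width); all names, bounds, and coefficient values are illustrative and not taken from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Stand-in for a validation loss over two hyperparameters
    # (log10 learning rate, layer width); a simple bowl with
    # its minimum at (-3.0, 64.0).
    return (x[0] + 3.0) ** 2 + ((x[1] - 64.0) / 32.0) ** 2

def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5):
    """Canonical global-best PSO: inertia weight w,
    cognitive coefficient c1, social coefficient c2."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()            # global best
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)                # keep in bounds
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

bounds = np.array([[-5.0, -1.0], [8.0, 256.0]])
best, best_val = pso(objective, bounds)
```

A hybrid strategy, as in the manuscript, would then hand each candidate architecture to a gradient-based optimizer such as Adam and use the resulting validation loss as `objective`.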
The ROI analysis offers a reasonable approach to address variations in in-situ data point density; however, a fundamental problem persists regarding how well a single heat flow measurement can represent an entire grid cell. Studies from West Antarctica demonstrate very large local variations in geothermal heat flow. While averaging measurements within global database cells could theoretically mitigate this issue, many cells contain only a single measurement. In these cases, we lack insight into the local conditions that were actually sampled, whether the measurement represents typical regional conditions or a localized anomaly. This spatial representativeness problem becomes particularly acute in Antarctica, where individual point measurements must characterize grid cells spanning millions of square kilometres, potentially introducing significant bias into the training dataset through disproportionate weighting of these sparse observations. However, those issues are not for this paper to resolve, and the methods and analysis are communicated very transparently and clearly. The paper contains many insightful comments regarding concerns and limitations, which are very welcome and still rare.
Some sections of the introduction are challenging to read and don't really make much sense, reading as if they were written by a language model rather than a scientist. The figures are very good; however, I suggest that Fig. 7 be updated (as below).
I am supportive of the publication. However, I am not sure that this is the optimal journal, as the paper’s main quality lies in the development of the DNN methods; however, I leave this for the editor to consider.
Main Items to Address Before Publication
Detailed Comments
L37: Mareschal and Jaupart (2013) is a good overview; however, it is not a relevant reference for reconstructing Antarctic tectonic history. Reading et al. (2022, NREE) is probably the most suitable example here.
L43: The text here is not clear; I understand the intent, but it needs some editing.
L47: Citing Lösing, Ebbing et al. (2020; should this be 2021?) here also appears a bit out of place. Rather, acknowledge how this study helped us contextualise previous temperature-gradient-based studies. Or is it Lösing and Ebbing (2021)?
L50: The statement that “process-based modelling [depends on] complex mathematical formulation” requires some explanation of what this means, how this is a problem, and why the study at hand addresses this.
L50-55: This section is very hard to follow and doesn’t really make any sense. It appears to contradict the previous sentence somewhat.
L61: I respectfully disagree that deep learning has been particularly successful in polar regions. Whilst there have been a few very useful studies recently (notably by Prof. Tang), and a lot of method development, applications have largely been limited by data availability and a lack of consistency and structure. Compared to other regions of the world, DL/ML methods in polar regions have often failed to generate outputs that have been widely accepted as advancing our understanding. The statement requires some supporting analysis of why deep learning would have been more successful in polar regions than elsewhere. In general, I believe that extra caution is required, and uncertainty must be communicated well when dealing with unknown subglacial geology and in interdisciplinary studies. In the past, we have seen many examples of research outputs finding interdisciplinary applications that they are not suited for. This is bound to change, of course.
Section 2.1: One major problem for empirical heat flow models, and related models, is that the training set, or reference set, is not an unbiased representation of Earth’s surface. Some settings are highly overrepresented (Stål et al, 2022, Frontiers). How does this impact your results?
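One simple mitigation worth discussing is inverse-frequency sample weighting by tectonic setting, so that overrepresented settings do not dominate the loss. A minimal sketch (the setting labels are illustrative, not from the manuscript):

```python
from collections import Counter

def inverse_frequency_weights(settings):
    """Per-sample training weights that downweight samples from
    overrepresented tectonic settings; weights sum to len(settings)."""
    counts = Counter(settings)
    n, k = len(settings), len(counts)
    return [n / (k * counts[s]) for s in settings]

# Ridges heavily overrepresented relative to cratons and orogens.
settings = ["ridge"] * 6 + ["craton"] * 2 + ["orogen"] * 2
w = inverse_frequency_weights(settings)
```

Such weights could be passed directly to the training loss; the harder question, as noted above, is that the setting of many subglacial targets is itself unknown.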
L87: “marine measurements excluded,” but the map in Figure 1 shows many marine measurements, e.g., in the North Atlantic, Mediterranean, and East China Sea. Are those included or not?
Fig. 1: “Dataset obtained from IHFC and NGHF”; however, the text only mentions IHFC. Were the two databases merged? Wouldn’t that duplicate most records in NGHF?
Table 1 (and general comments on features used):
Why are you using such relatively old datasets? With all respect to the legacy, CRUST1.0, Schaeffer and Lebedev (2015), and An et al. (2015) are all good studies; however, they are over ten years old, and a lot of data have been collected since then. I notice a significant similarity with the observables used to produce Aq1 back in 2019-20; however, I would have used different datasets today.
Rock type is likely not a very useful observable, as most of Antarctica is classified as ice, and we know that the crustal geology is important but challenging to model (Stål et al., 2024, GRL).
Some observables, e.g., CTD from Li et al. (2017), have very little coverage in Antarctica.
What depths are the P-wave and S-wave speeds taken from? Can the tomographic model provide values for the crust, as suggested on L117?
The distance-to-hotspot feature cites Anderson (2016); however, this study is not in the reference list, only Anderson (1998), which, to my understanding, doesn't provide the spatial data referred to here. Is it the Complete Hot Spot Table? That list, as far as I know, has not been peer-reviewed, and I am rather sceptical of it. As above, it should probably be regarded as legacy work, as there was very little to constrain some of those suggestions 25-30 years ago.
Distance to volcanoes and hotspots is, as I understand it, not distance-weighted in any way. Hence, heat flow values and target locations are linked equally strongly whether the distance to the nearest volcano is 2000 km or 20 km. This is not a useful predictor of geothermal heat. The Adam optimizer compounds this issue by learning to exploit statistical correlations between raw distance and heat flow in the training data, regardless of physical plausibility. Since Adam operates purely on numerical gradients, I suppose it will adjust network weights to minimize prediction error even when the learned relationships violate fundamental geology. This creates a model that may perform well statistically but might also generate physically meaningless patterns.
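To make the suggestion concrete: a bounded proximity transform, e.g., an exponential decay of distance, saturates far-field distances while preserving near-field contrast. A sketch, with an assumed decay length that is mine and not derived from the manuscript:

```python
import math

def proximity(distance_km, decay_km=150.0):
    """Exponential-decay transform of distance to the nearest
    volcano or hotspot. Distances beyond a few decay lengths all
    map to ~0, so 2000 km and 2500 km become indistinguishable,
    while 20 km vs 100 km remain well separated.
    decay_km = 150 is purely illustrative."""
    return math.exp(-distance_km / decay_km)

# Far-field distances collapse together...
far = proximity(2000.0) - proximity(2500.0)
# ...while near-field distances stay distinct.
near = proximity(20.0) - proximity(100.0)
```

With such a transform, a volcano 2000 km away contributes essentially nothing to the feature, which is closer to the physics of a localized heat source than the raw distance used here.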
L207: What procedure is applied to avoid overfitting? Here, I would urge the authors to consider alternative and informative metrics of uncertainty. The recent paper by Al-Aghbary et al. (preprint link below) would be a good starting point. What uncertainty and error could/should we actually aim to reduce?
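As one example of an informative uncertainty metric, a bootstrap ensemble exposes where predictions are extrapolations beyond the data. The sketch below uses a toy 1-D polynomial regressor in place of the DNN, so everything here is illustrative of the idea rather than the authors' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "heat flow" observations on a 1-D transect.
x = np.linspace(0.0, 1.0, 40)
y = 60.0 + 20.0 * x + rng.normal(0.0, 3.0, x.size)

def ensemble_predict(x, y, x_new, n_members=50, deg=2):
    """Bootstrap ensemble: refit a member model on each resample
    and report the mean prediction plus the ensemble spread."""
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, x.size, x.size)     # bootstrap resample
        coef = np.polyfit(x[idx], y[idx], deg)    # member "model"
        preds.append(np.polyval(coef, x_new))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

# 0.5 is interpolation; 1.5 mimics an unsampled interior region.
x_new = np.array([0.5, 1.5])
mean, std = ensemble_predict(x, y, x_new)
```

The spread grows sharply at the extrapolated point, which is exactly the kind of map-able uncertainty that a single point estimate of GHF hides.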
L281: Including the few in situ measurements in Antarctica is very problematic:
1. Most of them don't reach the bed and represent paleoclimate and hydrology rather than geothermal heat.
2. They are very sparse, and there are no measurements to average within each grid cell.
3. Some measurements should be treated with particular care, as they are either old or were associated with large technical challenges when made.
4. Most importantly, they get a disproportionate weight, as the surrounding region will be predicted to be very similar to them.
Figure 6: There appear to be gridding artifacts from the projection of observables, such as lineations pointing toward the South Pole.
L340 and L438: This statement does not seem to agree with Fig. 7 (?). Instead, it appears that your distribution most resembles Lösing et al. (2020), which I believe should be Lösing and Ebbing (2021).
L353: Are the values you get at Lake Vostok simply the in-situ measurement extrapolated? This measurement is likely to get a very high weight.
Figure 7: Show the signed difference rather than the absolute difference.
L361: As explained above, I agree with Fisher, whom you cite, and I don't think this is a valid test. The measurements are too sparse and represent very local conditions. The measurements in the interior don't reach the bedrock. We need further discussion and evidence to claim that the measurements “nonetheless” provide good support. Two papers to consider here are Talalay et al. (2020, Cryosphere) and Mony et al. (2020, Glaciology).
The discussion section is very well written and insightful.
The Conclusion is too long and mainly repetitive. I suggest shortening it and merging some items with the Discussion if required.
Bibliography:
The list is not organised and rather chaotic. A few key studies appear to be missing, and even some works cited in the text are missing from the list. Please check Lösing et al. (2020) and Lösing and Ebbing (2021); I suspect that the papers have been mixed up a few times in the manuscript. Anderson (2016) is also missing, or might have been confused with another study.
I would recommend that the authors have a look at the following suggestions:
Al-Aghbary et al. (in review, https://www.authorea.com/doi/full/10.22541/au.175373261.14525669)
Mony et al. (2020, Glaciology)
Reading et al. (2022, NREE)
Stål et al. (2022, Frontiers)
Stål et al. (2024, GRL)
Talalay et al. (2020, Cryosphere)