Uncertainty quantification of deep learning model for mineral prospectivity mapping
Abstract. Deep learning techniques have significantly advanced mineral prospectivity mapping (MPM) by facilitating automated feature extraction and capturing nonlinear relationships among multi-source geological datasets. However, most deep learning models in MPM neglect the intrinsic uncertainties arising from incomplete geological knowledge, limited sampling, and model variability, leading to overconfident and potentially unreliable predictions. To address this limitation, this study proposes a comprehensive uncertainty quantification framework that jointly evaluates data, model, and prediction uncertainties in deep learning-based MPM. Data uncertainty, originating from sparse of geochemical/geophysical sampling and subjective interpretations of geological features, is characterized through stochastic simulation of evidential layers. Model uncertainty, arising from variability in network architecture and parameters estimation, is captured through a Bayesian convolutional neural network (CNN) employing Monte Carlo Dropout. The proposed framework is demonstrated through a real-world case study of gold prospectivity mapping in western Henan Province, China. These uncertainties are quantified using statistical measures including mean, variance, and entropy. The results indicate that areas exhibiting high prospectivity and low uncertainty represent robust and reliable exploration targets, whereas those with high uncertainty highlight regions requiring improved metallogenic interpretation or model refinement. Furthermore, uncertainty contribution analysis reveals that data uncertainty contributes more to total prediction uncertainty than model uncertainty, suggesting that enhancing the quality and representativeness of evidence layers is more effective for reducing uncertainty than merely optimizing network architecture or parameters. 
Overall, by modeling and visualizing both data and model uncertainties, the proposed framework transforms deep learning-based MPM from deterministic prediction to probabilistic decision-making, thereby enabling more reliable and trustworthy mineral exploration.
General comments:
Uncertainty quantification is a commonly studied aspect of mineral prospectivity mapping, and this manuscript presents a potentially valuable framework for individually quantifying different types of uncertainty. This is an approach that could be adopted by geologists in both academia and industry to guide and improve their own MPM studies. However, several key methodological steps regarding the treatment of geochemical and geophysical data are either insufficiently described or ambiguously presented, making it difficult to assess the validity of the uncertainty analysis and the robustness of the conclusions. Clarifying these aspects is essential, as they directly underpin the central claims of the study. Furthermore, the discussion section does not adequately consider relevant prior work, nor does it clearly position this study in relation to existing methods. Although the code used to produce the results is included as supplementary material, the data provided are insufficient to resolve the issues described above or to fully reproduce the authors’ workflow. Many sections of the manuscript are repetitive and contain redundant information. The figures are of variable quality, and the figure captions lack sufficient description. With major revisions to improve the structure of the manuscript, clarify the methodology, and strengthen the discussion of data analysis and interpretation, this manuscript has the potential to make a meaningful contribution to the field.
Specific comments:
While individual sections of the manuscript are well written, several sections could be combined and simplified due to repetitive content. In particular, Section 2.1 (lines 129-139) repeats information from earlier sections (lines 75-90), and much of the methodology section provides general background rather than describing the specific methods the authors used. The actual workflow is described later (Sections 4 and 5, lines 301-404). Sections 4 and 5 are well written and descriptive, and their content is what the methodology should present. The authors are encouraged to streamline the methodology by reducing background material and consolidating relevant content into a single, coherent section.
Many figure captions are vague and insufficiently descriptive. This often leads to confusion about what is being shown in the figures and how they relate to the workflow and results. Figure captions should be expanded to clearly explain what is being shown and its relevance to the discussion in the text. Figure 3 should be redrafted: it is not clear from the text, the figure caption, or the illustration itself what is being shown or how it relates to the implementation of a Bayesian CNN. It is also unclear what is shown in Figure 16: do the mean vs uncertainty scatter plots represent pixel-wise values from the mean and variance maps? These points should be clarified in both the text and figure captions.
The inclusion of data and code in the appendices is appreciated and supports reproducibility. However, the provided datasets (two .tif files) are insufficient. In particular, the file evidence.tif is not georeferenced, and it is unclear what it represents.
Terminology relating to the types of uncertainty is inconsistent throughout the manuscript. Four types of uncertainty are initially defined: conceptual uncertainty, data uncertainty, variable uncertainty, and integration uncertainty (after Zuo et al., 2021; lines 55-60). The stated goal of the study is to quantify data uncertainty and “model” uncertainty (line 71). The authors then define data uncertainty as “limited sampling, observational errors, and the selection of negative samples, which is primarily expressed during the construction of evidential features, including ore-controlling geological structures, geochemical anomalies, and geophysical anomalies” (lines 72-74), which combines aspects of both data uncertainty and variable uncertainty. They go on to define model uncertainty as “the variability in deep learning architecture and parameters used to integrate these evidential features” (lines 74-75), which corresponds more closely to the earlier definition of integration uncertainty (line 59). This inconsistency persists throughout the manuscript and creates confusion, as it is unclear which types of uncertainty the different methods are intended to address. For example, on line 332, “parameter uncertainty” is used to describe what was previously defined as variable uncertainty. The authors are encouraged to adopt a clear terminology framework and apply it consistently throughout the manuscript.
It is not clearly justified why MPS-DS is applied to geochemical and geophysical data. The authors state that MPS-DS is used to interpolate these datasets (explicitly for geochemical data in line 345 and implicitly for geophysical data in line 354) and that generating multiple realizations allows them to quantify both data and model uncertainty (lines 364-365). However, more commonly used interpolation methods such as kriging also allow uncertainty to be quantified. The authors should therefore clarify why their approach is advantageous relative to established interpolation methods, particularly in relation to uncertainty quantification.
The text gives no indication of the origin or source of the aeromagnetic data, and the treatment of this dataset raises several concerns. The authors suggest that uncertainty arises from interpolation due to low sampling density, which can lead to over-smoothed results (lines 83-86, 194-196, 583-585). However, aeromagnetic data typically consist of dense measurements collected along flight lines, with regular spacing of flight lines across the study area. Figure 8b instead shows equidistant data points with extensive coverage over the entire region, which does not appear consistent with how aeromagnetic data are usually collected, nor does it reflect the “limited sampling density” issue the authors wish to address. Additionally, it is unclear how the vertical first-order derivative (1VD) was applied to the aeromagnetic data (lines 295-296). It should be clarified whether:
Given the regular spacing shown in Figure 8b, the latter seems more likely. In either case, the processing workflow should be clearly described in both the main text and the figure captions.
Similar concerns apply to the geochemical data. The authors state that “geochemical data provide direct evidence of elemental anomalies linked to gold mineralization” (lines 281-282), yet the data are stream sediment samples (line 288). These samples are not in situ and therefore do not provide direct evidence of the magmatic-hydrothermal style of gold mineralisation in the study area (line 273). Rather, the data reflect a combination of processes, including erosion, transport, and mixing of multiple source lithologies. As such, these samples likely reflect the aggregate geochemical composition of multiple outcrops sourced from multiple locations, not just mineralised outcrop. This distinction is not addressed in the manuscript.
Furthermore, the stream sediment samples appear to be located on a regular grid (Figure 8a), which is atypical for stream sediment sampling. This suggests that the data shown in Figure 8a are either a mix of outcrop, regolith, and stream sediment samples collected on a grid, or stream sediment samples that have been interpolated prior to simulation. Again, the processing workflow should be clearly explained in both the text and figure captions.
Considering the issues discussed above (vague and undescriptive figure captions regarding geophysical and geochemical data, ambiguous descriptions of how the data were collected and treated, and the presentation of geochemical samples and geophysical data on apparently regular grids, which is atypical of how such data are collected), the current presentation gives the reader the impression that these datasets were interpolated prior to MPS-DS. If both geochemical and geophysical datasets were interpolated prior to MPS-DS, this undermines the claim that MPS-DS “alleviates smoothing effects at unsampled locations” (lines 195-196), as it would instead be sampling from already smoothed data. If this is not the case, the authors should explicitly clarify what is represented in Figure 8 and provide a clear description in the text of how the geochemical and geophysical datasets were collected and processed prior to simulation to avoid confusion.
While the quantification of uncertainty in MPM (and geological models in general) is an active area of research, the discussion section contains only one reference, which is to support the use of ROC-AUC (line 532). There is no mention or discussion of similar studies, or of how this study either complements or improves upon the methods and findings of previous research. Furthermore, there are two claims made in the discussion and conclusion that are not supported by the data and results presented:
Line 78: The statement that weights-of-evidence uses “binary weights (0 or 1)” is inaccurate. The evidence layers in WoE are commonly binary, but the weights themselves are logarithms of likelihood ratios. The authors are referred to Bonham-Carter (1994) for a discussion of the method.
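To make the point concrete, the following is a minimal sketch of the usual weights-of-evidence calculation for a single binary evidence pattern B and deposit occurrence D, where W+ = ln[P(B|D) / P(B|not D)] and W- = ln[P(not B|D) / P(not B|not D)]. The function name and the cell counts are hypothetical and are not taken from the manuscript.

```python
import math

def woe_weights(n_total, n_deposit, n_pattern, n_both):
    """Weights of evidence for one binary pattern B and deposits D.

    n_total   : number of unit cells in the study area
    n_deposit : cells containing a deposit (D)
    n_pattern : cells where the evidence pattern is present (B)
    n_both    : cells where both pattern and deposit are present
    """
    p_b_given_d = n_both / n_deposit
    p_b_given_not_d = (n_pattern - n_both) / (n_total - n_deposit)
    # W+ applies where the pattern is present, W- where it is absent.
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1.0 - p_b_given_d) / (1.0 - p_b_given_not_d))
    contrast = w_plus - w_minus
    return w_plus, w_minus, contrast

# Hypothetical counts, chosen only for illustration.
w_plus, w_minus, contrast = woe_weights(
    n_total=10000, n_deposit=50, n_pattern=2000, n_both=30
)
print(f"W+ = {w_plus:.3f}, W- = {w_minus:.3f}, C = {contrast:.3f}")
```

With these counts the pattern is positively associated with deposits, so W+ is positive and W- is negative; neither weight is restricted to 0 or 1.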
Line 129-130: The statement that mineralization is controlled by “geological, geochemical, and geophysical features” is conceptually misleading. Mineralization is controlled by geological processes; geochemical and geophysical data provide evidence for these processes. The authors are referred to Hronsky & Kreuzer (2019) and McCuaig et al. (2010) for a discussion on the distinction of mineral system processes and the use of evidential maps in mineral targeting.
Lines 204-205: The use of randomly selected background samples as negative data is an appropriate and widely used technique in multiple disciplines, not just geoscience. Additional references could be included to strengthen this section, for example Montsion et al. (2024), as well as literature from spatial ecology on pseudo-absence data (e.g. Wisz & Guisan, 2009; Barbet-Massin et al., 2012). Other methods of generating negative training data for MPM could also be mentioned, such as using drill hole assay data depleted in a commodity of interest, or the locations of mineral systems that are geologically different from the system of interest (see Lindsay et al., 2022).
Line 358: The claim that Figure 9 “clearly demonstrates” that DS reproduces the spatial structure of geochemical and geophysical data is not adequately supported. A cumulative frequency distribution alone does not capture spatial structure, and the same cumulative frequency distribution could be reproduced by random data. More appropriate methods would include capture efficiency curves, Ripley’s K function, or nearest-neighbour G function.
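The limitation of the cumulative frequency comparison can be illustrated with a small synthetic example. The sketch below builds a smooth grid, randomly permutes its cells, and shows that the two grids have identical value distributions but very different spatial continuity. Lag-1 semivariance is used here only as a simple stand-in for a spatial statistic; it is not one of the measures suggested above, and the grid and its size are invented for illustration.

```python
import math
import random

def lag1_semivariance(grid):
    """Mean of 0.5 * (z_i - z_j)^2 over horizontally and vertically adjacent cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    total, pairs = 0.0, 0
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                total += 0.5 * (grid[r][c] - grid[r][c + 1]) ** 2
                pairs += 1
            if r + 1 < n_rows:
                total += 0.5 * (grid[r][c] - grid[r + 1][c]) ** 2
                pairs += 1
    return total / pairs

n = 30  # hypothetical grid size
# Smooth synthetic field with clear spatial structure.
structured = [
    [math.sin(r / 5.0) + math.cos(c / 5.0) for c in range(n)]
    for r in range(n)
]

# Same values, randomly reassigned to cells: spatial structure is destroyed.
values = [v for row in structured for v in row]
random.Random(0).shuffle(values)
shuffled = [values[i * n:(i + 1) * n] for i in range(n)]

# Identical sorted values imply identical cumulative frequency distributions.
same_cdf = sorted(values) == sorted(v for row in structured for v in row)
gamma_structured = lag1_semivariance(structured)
gamma_shuffled = lag1_semivariance(shuffled)

print("identical cumulative distribution:", same_cdf)
print("lag-1 semivariance, structured:", gamma_structured)
print("lag-1 semivariance, shuffled:  ", gamma_shuffled)
```

Because the shuffled grid passes the cumulative frequency check while having much higher short-range semivariance, a matching distribution cannot by itself demonstrate that spatial structure has been reproduced.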
Lines 432-435 state that, for data uncertainty quantification, “a deterministic CNN model is trained on simulated evidence layers to produce 100 prospectivity predictions.” I assume that these correspond to the 100 geological evidential layers produced through Monte Carlo sampling of maximum controlling distances, and the 100 geochemical and geophysical maps produced by MPS-DS. If this is the case, the authors state on lines 323-324, and again on lines 364-365, that multiple realizations of the evidential layers allow both the influence of data uncertainty and model uncertainty to be quantified. If the methods used to create these evidential maps incorporate both data uncertainty and model uncertainty, it is unclear in the text how they are subsequently used to quantify only data uncertainty.
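For clarity on what a clean separation could look like, the sketch below applies the law of total variance to synthetic predictions at a single pixel. It assumes a nested design in which each simulated evidence-layer set is passed through several stochastic forward passes; this design, the variable names, and all numbers are assumptions made for illustration and are not necessarily the authors' procedure.

```python
import random
import statistics

rng = random.Random(42)

n_realizations = 20  # hypothetical number of simulated evidence-layer sets
n_passes = 30        # hypothetical number of stochastic forward passes per set

# Synthetic prospectivity values for ONE pixel: preds[i][t], where i indexes
# the evidence realization and t indexes the stochastic forward pass.
preds = []
for _ in range(n_realizations):
    centre = rng.uniform(0.3, 0.7)  # shift attributed to the evidence realization
    row = [min(1.0, max(0.0, rng.gauss(centre, 0.05))) for _ in range(n_passes)]
    preds.append(row)

# Within-realization spread: attributable to the model (stochastic passes).
model_var = statistics.fmean([statistics.pvariance(row) for row in preds])

# Spread of per-realization means: attributable to the evidence layers (data).
data_var = statistics.pvariance([statistics.fmean(row) for row in preds])

# Total spread across all predictions at this pixel.
total_var = statistics.pvariance([p for row in preds for p in row])

print("model component:", model_var)
print("data component: ", data_var)
print("total:          ", total_var)
```

With equal numbers of passes per realization and population variances, the two components sum to the total, so each contribution is explicit. If only one deterministic prediction is made per realization, the within-realization term is not observed, and it would help for the manuscript to state how the two sources are then distinguished.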
Technical corrections: