Evaluating uncertainty and predictive performance of probabilistic models devised for grade estimation in a porphyry copper deposit
Abstract. Probabilistic models are used extensively in geoscience to describe random processes, as they allow prediction uncertainties to be quantified in a principled way. These probabilistic predictions are valued in a variety of contexts ranging from geological and geotechnical investigations to understanding subsurface hydrostratigraphic properties and mineral distribution. However, there are no established protocols for evaluating the uncertainty and predictive performance of univariate probabilistic models, and few examples that researchers and practitioners can draw on. This paper aims to bridge this gap by developing a systematic approach that targets three objectives. First, geostatistical measures are used to check whether the probabilistic predictions are reasonable given validation measurements. Second, image-based views of these statistics facilitate large-scale simultaneous comparisons for a multitude of models across space and time, for instance, spanning multiple regions and inference periods. Third, variogram ratios are used to objectively measure the spatial fidelity of models. In this study, the model candidates include ordinary kriging and Gaussian processes, with and without sequential or correlated random field simulation. A key outcome is a set of recommendations that encompass the FLAGSHIP statistics, which examine the fidelity, likelihood, accuracy, goodness, synchronicity, histogram, interval tightness and precision of the model predictive distributions. These statistics are standardised, interpretable and amenable to significance testing. The proposed methods are demonstrated using extensive data from a real copper mine in a grade estimation task, and accompanied by an open-source implementation. The experiments are designed to emphasise data diversity and convey insights, such as the increased difficulty of future-bench prediction (extrapolation) relative to in-situ regression (interpolation).
Overall, this work presents a holistic approach that enables modellers to evaluate the merits of competing models and deploy them with greater confidence, by assessing the robustness and validity of probabilistic predictions under challenging conditions.
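To make the variogram-ratio idea from the abstract concrete, the sketch below computes an empirical semivariogram for validation data and for a simulated realisation, then forms their ratio; values near 1 indicate that the model reproduces the spatial structure of the data. This is an illustrative sketch only, using synthetic coordinates and values and a hypothetical `empirical_semivariogram` helper; it is not the paper's actual implementation.

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Classical (Matheron) semivariogram estimate at the given lag distances.

    coords: (n, d) sample locations; values: (n,) measurements;
    lags: lag distances to evaluate; tol: half-width of each lag bin.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    gamma = []
    for h in lags:
        mask = (np.abs(d - h) <= tol) & (d > 0)  # pairs within the lag bin
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# Synthetic stand-in for validation measurements and one simulated realisation
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
data = np.sin(coords[:, 0] / 15) + 0.1 * rng.standard_normal(200)
sim = data + 0.05 * rng.standard_normal(200)  # hypothetical model realisation

lags = np.array([5.0, 10.0, 20.0, 40.0])
g_data = empirical_semivariogram(coords, data, lags, tol=2.5)
g_sim = empirical_semivariogram(coords, sim, lags, tol=2.5)
ratio = g_sim / g_data  # spatial-fidelity ratio: near 1 means structure preserved
```

In practice the ratio would be computed per lag (and possibly per direction) for each candidate model, giving an objective, standardised fidelity score to compare, for example, kriging against Gaussian process simulation.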