Using Monte Carlo conformal prediction to evaluate the uncertainty of deep learning soil spectral models
Abstract. Uncertainty quantification is a crucial step for the practical application of soil spectral models, particularly in supporting real-world decision making and risk assessment. While machine learning has made remarkable strides in predicting various physicochemical properties of soils using spectroscopy, predictions without quantified uncertainty offer limited utility in guiding critical decisions. Nevertheless, uncertainty quantification remains underutilised in the reporting of soil spectral models, and existing methods face significant limitations: they are computationally demanding, fail to achieve the desired coverage of observed data, or struggle to handle out-of-domain uncertainty. This study introduces Monte Carlo conformal prediction (MC-CP) as a novel approach to quantifying uncertainty in the prediction of clay content from mid-infrared spectroscopy. We compared MC-CP with two established methods: (1) Monte Carlo dropout and (2) conformal prediction. Monte Carlo dropout generates prediction intervals for each sample and is effective at capturing the larger uncertainties associated with out-of-domain data. However, it falls short of the desired coverage: its 90 % prediction intervals covered the observed values in only 74 % of cases. Conformal prediction, on the other hand, guarantees the desired coverage of true values but generates unnecessarily wide prediction intervals, making it overly conservative for many practical applications. In contrast, MC-CP combines the strengths of both methods. It achieved a prediction interval coverage probability of 91 %, closely matching the expected 90 % coverage and far exceeding that of Monte Carlo dropout. 
Additionally, the mean prediction interval width for MC-CP was 9.05 %, narrower than the 11.11 % obtained with conformal prediction, while still reflecting the higher uncertainty of out-of-domain samples. By generating accurate prediction intervals alongside point predictions, MC-CP delivers practical and reliable uncertainty quantification. This enhances the real-world applicability of soil spectral models and represents a significant advance in soil science. The success of MC-CP paves the way for its integration into large-scale machine-learning models, such as soil inference systems, further supporting decision-making and risk assessment in soil science.
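
The two evaluation metrics reported above, prediction interval coverage probability (PICP) and mean prediction interval width (MPIW), and the general idea of calibrating Monte Carlo intervals with a conformal step can be illustrated with a minimal sketch. It is not the implementation used in this study: the data are synthetic, the function names (`picp`, `mpiw`, `mc_intervals`, `conformal_margin`) are hypothetical, and the calibration step assumes a conformalized-quantile style adjustment, which may differ from the MC-CP procedure evaluated here.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of observed values that fall inside [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))

def mpiw(lower, upper):
    """Mean width of the prediction intervals."""
    return float(np.mean(upper - lower))

def mc_intervals(draws, alpha=0.1):
    """Per-sample interval from Monte Carlo draws of shape (n_draws, n_samples)."""
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return lower, upper

def conformal_margin(y_cal, lower_cal, upper_cal, alpha=0.1):
    """Margin added to both interval ends, from calibration-set scores.

    Assumption: scores measure how far each observation lies outside its
    Monte Carlo interval (negative when it lies inside).
    """
    scores = np.sort(np.maximum(lower_cal - y_cal, y_cal - upper_cal))
    n = len(scores)
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))  # finite-sample rank
    return float(scores[k - 1])

# Synthetic stand-in for stochastic forward passes (hypothetical data):
# the spread of the draws is half the true error, so raw intervals are too narrow.
rng = np.random.default_rng(0)
n, n_draws, alpha = 4000, 100, 0.1
y = rng.uniform(0.0, 60.0, size=n)            # "observed" clay content, %
scale = rng.uniform(0.5, 1.5, size=n)         # per-sample uncertainty level
mu = y + 2.0 * scale * rng.normal(size=n)     # point predictions
draws = mu[None, :] + scale[None, :] * rng.normal(size=(n_draws, n))

lower, upper = mc_intervals(draws, alpha)
cal, test = slice(0, n // 2), slice(n // 2, n)  # calibration / test split

margin = conformal_margin(y[cal], lower[cal], upper[cal], alpha)
lower_cp, upper_cp = lower - margin, upper + margin

cov_raw = picp(y[test], lower[test], upper[test])
cov_cp = picp(y[test], lower_cp[test], upper_cp[test])
width_raw = mpiw(lower[test], upper[test])
width_cp = mpiw(lower_cp[test], upper_cp[test])

print(f"raw Monte Carlo intervals: PICP={cov_raw:.2f}, MPIW={width_raw:.2f}")
print(f"after conformal step:      PICP={cov_cp:.2f}, MPIW={width_cp:.2f}")
```

In this sketch the per-sample interval widths still vary with the spread of the Monte Carlo draws, while the single calibrated margin moves the empirical coverage towards the nominal 90 % level.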