Preprints
https://doi.org/10.5194/egusphere-2024-3703
07 Jan 2025

Using Monte Carlo conformal prediction to evaluate the uncertainty of deep learning soil spectral models

Yin-Chung Huang, José Padarian, Budiman Minasny, and Alex B. McBratney

Abstract. Uncertainty quantification is crucial for the practical application of soil spectral models, particularly in supporting real-world decision making and risk assessment. While machine learning has made remarkable strides in predicting various physicochemical properties of soils using spectroscopy, predictions without quantified uncertainty offer limited utility in guiding critical decisions. Uncertainty quantification nevertheless remains underutilised in the reporting of soil spectral models, and existing methods face significant limitations: they are computationally demanding, fail to achieve the desired coverage of observed data, or struggle to handle out-of-domain uncertainty effectively. This study introduces Monte Carlo conformal prediction (MC-CP) as a novel approach to quantifying uncertainty in the prediction of clay content from mid-infrared spectroscopy. We compared MC-CP with two established methods: (1) Monte Carlo dropout and (2) conformal prediction. Monte Carlo dropout generates prediction intervals for each sample and is effective at addressing the larger uncertainties associated with out-of-domain data. However, it falls short of the desired coverage: its 90 % prediction intervals covered the observed values in only 74 % of cases, well below the expected 90 %. Conformal prediction, on the other hand, guarantees ideal coverage of true values but generates unnecessarily wide prediction intervals, making it overly conservative for many practical applications. In contrast, MC-CP combines the strengths of both methods. It achieved a prediction interval coverage probability of 91 %, closely matching the expected 90 % coverage and far surpassing that of Monte Carlo dropout. Additionally, the mean prediction interval width for MC-CP was 9.05 %, narrower than conformal prediction’s 11.11 %, while still reflecting the higher uncertainty in out-of-domain samples. By generating accurate prediction intervals alongside point predictions, MC-CP demonstrated its ability to deliver practical and reliable uncertainty quantification. This breakthrough enhances the real-world applicability of soil spectral models and represents a significant advancement in the field of soil science. The success of MC-CP paves the way for its integration into large-scale machine-learning models, such as soil inference systems, further revolutionising decision-making and risk assessment in soil science.
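The two diagnostics reported above, prediction interval coverage probability (PICP) and mean prediction interval width (MPIW), can be illustrated with a minimal Python sketch. The sketch also shows one common way of combining Monte Carlo dropout with conformal prediction: calibration residuals are divided by the Monte Carlo standard deviation, so that intervals widen for samples the network is less sure about. This scaling scheme is an assumption made for illustration and is not necessarily the MC-CP procedure used in the preprint. All function names are hypothetical, and the Monte Carlo means and standard deviations are assumed to come from repeated stochastic forward passes of an already-trained model.

```python
import numpy as np

def picp(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

def mpiw(lower, upper):
    """Mean width of the prediction intervals, in the units of the target."""
    return float(np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)))

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: the interval is unbounded.
        return float("inf")
    return float(scores[k - 1])

def scaled_conformal_intervals(cal_y, cal_mean, cal_std,
                               test_mean, test_std, alpha=0.1, eps=1e-8):
    """Conformal intervals scaled by the Monte Carlo standard deviation.

    Illustrative assumption, not the preprint's confirmed method.
    cal_* are calibration-set values; test_* are values for new samples.
    """
    cal_y, cal_mean, cal_std = (np.asarray(a, dtype=float)
                                for a in (cal_y, cal_mean, cal_std))
    test_mean = np.asarray(test_mean, dtype=float)
    test_std = np.asarray(test_std, dtype=float)
    # Nonconformity score: absolute residual relative to the MC spread.
    scores = np.abs(cal_y - cal_mean) / (cal_std + eps)
    q = conformal_quantile(scores, alpha)
    half_width = q * (test_std + eps)
    return test_mean - half_width, test_mean + half_width
```

With `alpha=0.1` the intervals target 90 % coverage. Calling `picp` and `mpiw` on the resulting test-set intervals yields the kind of coverage and width figures quoted in the abstract, and a PICP well below `1 - alpha` signals overconfident intervals.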

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Short summary
Uncertainty quantification plays a crucial role in reporting machine learning models in soil...