This work is distributed under the Creative Commons Attribution 4.0 License.
Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach
Abstract. Machine learning (ML) models have become key ingredients of digital soil mapping. To improve the interpretability of their predictions, diagnostic tools have been developed, such as the widely used local attribution approach known as SHAP (SHapley Additive exPlanations). However, analysing the prediction is only one part of the problem, and there is interest in gaining deeper insights into the drivers of the prediction uncertainty as well, i.e. in explaining why the ML model is confident given the chosen covariates' values, in addition to why it delivered some particular result. We show in this study how to apply SHAP to local prediction uncertainty estimates for a case of urban soil pollution, namely the presence of petroleum hydrocarbons in soil at Toulouse (France), which pose a health risk via vapour intrusion into buildings, direct soil ingestion, or groundwater contamination. To alleviate the computational burden posed by the multiple covariates (typically more than 10) and by the large number of grid points on the map (typically several tens of thousands), we propose an approach that combines screening analysis (to filter out non-influential covariates) with the grouping of dependent covariates by means of generic kernel-based dependence measures. Our results show that the drivers of the best-estimate prediction are not necessarily the ones that drive the confidence in that prediction, which justifies making decisions about data collection, covariate characterisation, and communication of the results accordingly.
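The screening and grouping steps both rely on a kernel-based dependence measure, the Hilbert-Schmidt Independence Criterion (HSIC). As a rough, generic illustration (a textbook estimator, not the authors' implementation), the following NumPy sketch computes the biased empirical HSIC with Gaussian kernels and the median bandwidth heuristic; it is near zero for independent samples and larger for dependent ones, which is what makes it usable for screening covariates:

```python
import numpy as np

def rbf_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix for a 1-D sample; the bandwidth defaults
    to the median heuristic over the non-zero pairwise distances."""
    d2 = (x[:, None] - x[None, :]) ** 2
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]) / 2)
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y):
    """Biased empirical HSIC estimate trace(K H L H) / n^2, where H is the
    centring matrix: ~0 under independence, larger under dependence."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(x) @ H @ rbf_gram(y) @ H) / n**2

rng = np.random.default_rng(0)
x = rng.normal(size=300)
dep = hsic(x, x**2)                    # nonlinearly dependent pair
ind = hsic(x, rng.normal(size=300))    # independent pair
```

Note that the dependent pair here is nonlinear (x versus x squared), a case where a Pearson-correlation screen would fail but a kernel-based measure does not.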
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2024-323', Anonymous Referee #1, 19 Mar 2024
This is a review for the manuscript Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach by Rohmer et al. The authors use SHAP, a common tool for assessing machine learning predictions at a local scale, to investigate the contribution of covariates (or rather groups of covariates) to the uncertainty of a random forest model. It is well known that Shapley values are very expensive to compute, and so the authors propose to reduce the number of covariates to speed up computations. This is done before model training (a rather odd proposal) by using a statistical dependence test (i.e., HSIC), and then after model training by grouping covariates (again with the same dependence test). The main aim of investigating covariates with respect to the model's uncertainty is intriguing within the field of digital soil mapping, but the manuscript has some major flaws. Major concerns are related to the methodology of the entire covariate selection procedure as well as to the presented case study. The quality of the writing is also unfortunately poor.
Main methodological concerns
• My first criticism is related to the first step, that is, the elimination of covariates before model training. This is a common pitfall within machine learning in DSM. The problem is data leakage, which may cause bias, and this occurs when covariates are removed using the entire training data set rather than within each fold of, for example, a cross-validation. Note that any data preprocessing (e.g., normalisation) dealt with in such a way can lead to data leakage. Data leakage may also cause the model's uncertainty to be lower, and this is then also problematic if interpretable machine learning (IML) methods (like SHAP) are used to analyse the relationships between covariates and the model's uncertainty. In addition, with a model such as random forest, covariate selection is not really required, especially with so few covariates (i.e., 15). I invite the authors to refer to work such as that of Zhu et al. (2023) for guidance on data preparation so that data leakage is avoided.
• Linking to my previous point: if the goal is to speed up computations, then removing covariates should not be the first choice. In addition, in typical DSM projects the number of covariates is usually more than 100, so the presented case study, which has only 15 covariates, is not the best choice to showcase the proposed methodology. One could instead estimate Shapley values at a sample of grid cells, as in the Wadoux et al. (2023) paper, for example. Again, in many DSM projects maps are created over millions of grid cells, so the presented case study is not the best one to showcase this methodology. Therefore, to speed up computations with a small data set (like the one in this study), I would rather use a stronger machine to do the calculations than omit potentially important parts of my data. If that is not possible, then let the computations run for a few days.
• The grouping of covariates is a practical way of speeding up computation, but I am afraid it holds no meaning for DSM practitioners. The authors acknowledge this in the discussion, starting at Line 519. Doing inference on machine learning output with IML methods is hard enough. I cannot see how the grouping of covariates could hold much interpretive meaning.
• To sum up, exploring the relationship between covariates and model uncertainty is intriguing and worth pursuing. However, the paper's emphasis on reducing computation with (questionable?) methods distracts from its main goal. That is, I would have liked to see a more in-depth analysis of the covariates related to SHAP (prediction) versus SHAP (uncertainty), with more emphasis on whether we expect the same covariates to be related to both and on why different covariates appear for predictions versus uncertainty.
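The data-leakage pitfall raised in the first bullet above can be sketched in a few lines (generic NumPy, with standardisation standing in for the preprocessing step; the same per-fold pattern applies to covariate selection):

```python
import numpy as np

def kfold(n, k, seed=0):
    """Shuffled indices split into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def fit_scaler(X):
    """Learn preprocessing statistics on the training fold ONLY.
    The leaky anti-pattern would call this on all of X before splitting."""
    return X.mean(axis=0), X.std(axis=0)

def apply_scaler(X, mu, sd):
    return (X - mu) / sd

X = np.random.default_rng(1).normal(loc=5.0, size=(100, 3))
folds = kfold(len(X), k=5)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    mu, sd = fit_scaler(X[train_idx])        # no test-fold information used
    Xtr = apply_scaler(X[train_idx], mu, sd)
    Xte = apply_scaler(X[test_idx], mu, sd)  # transformed with train statistics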
Some other concerns / suggestions
• The synthetic case study adds no value to the paper. I suggest removing it as the paper is already a bit long for the topic at hand.
• Section 3.1 is difficult to follow without knowledge of HSIC and some of the information in the many cited references. Consider restructuring the manuscript to include the essential methodology.
• Random forests are standard and already widely known in DSM. The sections on RF and QRF can be removed, and replaced with brief references to RF and QRF.
• The maps presented in this manuscript are of poor quality and not visually appealing. Captions and legends can also be improved. In Figure 3, show more information; not everyone is familiar with this region of France. The histogram is not very clear; the long right tail in particular could be enhanced visually.
• The general writing of the manuscript is poor. Some examples: the overuse of "etc.", too many brackets used to give additional information, brief introductions at each section.
• The mathematical writing can also be improved. For example, are the authors sure that the ML model is just y = f(x)? See Line 142.
• Figure 6 does not make sense. Why is there an arrow from Step 2 to 4?
References:
Wadoux, A., Saby, N., Martin, M. (2023). Shapley values reveal the drivers of soil organic carbon stock prediction. SOIL, 9, 21-38. doi: 10.5194/soil-9-21-2023.
Zhu et al. (2023). Machine Learning in Environmental Research: Common Pitfalls and Best Practices. https://pubs.acs.org/doi/10.1021/acs.est.3c00026
Citation: https://doi.org/10.5194/egusphere-2024-323-RC1
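The contrast the reviewer asks for, SHAP (prediction) versus SHAP (uncertainty), can be made concrete with a toy sketch (my own construction, not the paper's case study or code): exact interventional Shapley values computed by coalition enumeration, applied once to a point-prediction function and once to a simple heteroscedastic spread used as an uncertainty proxy. Enumeration is feasible here because there are only a handful of covariates:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley(f, x, background):
    """Exact interventional Shapley values for one point x:
    v(S) = mean over the background sample of f with features in S pinned to x."""
    d = len(x)
    def v(S):
        Xb = background.copy()
        Xb[:, list(S)] = x[list(S)]
        return f(Xb).mean()
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in combinations(others, r):
                w = factorial(r) * factorial(d - r - 1) / factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
predict = lambda X: 3 * X[:, 0] + X[:, 1]   # point prediction: driven by x0, x1
spread  = lambda X: np.abs(X[:, 2])         # uncertainty proxy: driven by x2 only
x = np.array([1.0, 1.0, 2.0])
phi_pred = shapley(predict, x, background)
phi_unc  = shapley(spread, x, background)
```

By construction, phi_pred assigns exactly zero to the third covariate while phi_unc assigns exactly zero to the first two: the drivers of the best estimate and of the confidence in it need not coincide, which is the paper's central point. On a map, this computation would be run at a subsample of grid cells, as the reviewer suggests.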
AC1: 'Reply on RC1', Jeremy Rohmer, 29 Apr 2024
We would like to thank Referee #1 for the constructive comments. We agree with most of the suggestions and have therefore modified the manuscript to take them on board. In the attached document we recall the reviews and reply to each comment in turn (our replies outlined in blue). The main corrections made to the manuscript are described in a specific section of each response.
RC2: 'Comment on egusphere-2024-323', Anonymous Referee #2, 11 Apr 2024
This manuscript is well written, clear and relevant, and presents methods that could provide stakeholders with valuable insights into where the uncertainty comes from: this has the potential to make uncertainty more concrete for them.
I appreciate the use of a synthetic test case, which makes the whole procedure a lot easier to understand.
I don’t have any major criticisms. I would be pleased to see this manuscript published after attention to the following minor details:
Line 44: However, at a local scale, these methods don’t (?) provide any information for a prediction at a certain spatial location.
Line 157: pushes the prediction uncertainty?
Line 442: I don’t see any circular pattern on the bottom middle panel of Figure 13 (in the bottom right one however, they are really clear).
Synthetic test case: couldn’t the fact that in Z1 the biggest contributor to uncertainty is Tmean-Tmax (and, respectively, that in Z2 the biggest contributor is Pwettest) be linked to the fact that these covariates have uniquely high (respectively low) values there that are not represented in the dataset? If you agree, this would in my opinion be interesting to put in the discussion.
Citation: https://doi.org/10.5194/egusphere-2024-323-RC2
AC2: 'Reply on RC2', Jeremy Rohmer, 29 Apr 2024
We would like to thank Referee #2 for the positive analysis and the constructive comments. We agree with most of the suggestions and, therefore, we have modified the manuscript to take on board their comments. We recall the reviews in the attached document and we reply to each of the comments in turn.
Peer review completion
Journal article(s) based on this preprint
Data sets
Data to run the synthetic test case, Hannah Meyer, https://github.com/HannaMeyer/CAST/tree/master/inst/extdata
Model code and software
R markdown (synthetic test case), Jeremy Rohmer, https://github.com/anrhouses/groupSHAP-uncertainty
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
357 | 82 | 26 | 465 | 18 | 16
Stephane Belbeze
Dominique Guyonnet