Preprints
https://doi.org/10.5194/egusphere-2024-323
21 Feb 2024

Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach

Jeremy Rohmer, Stephane Belbeze, and Dominique Guyonnet

Abstract. Machine learning (ML) models have become key ingredients for digital soil mapping. To improve the interpretability of their predictions, diagnostic tools have been developed, such as the widely used local attribution approach known as ‘SHAP’ (SHapley Additive exPlanation). However, analysing the prediction is only one part of the problem: there is also interest in gaining deeper insight into the drivers of the prediction uncertainty, i.e. in explaining why the ML model is confident given the values of the chosen covariates (in addition to why the ML model delivered a particular result). We show in this study how to apply SHAP to the local prediction uncertainty estimates for a case of urban soil pollution, namely the presence of petroleum hydrocarbons in soil in Toulouse (France), which poses a health risk via vapour intrusion into buildings, direct soil ingestion or groundwater contamination. To alleviate the computational burden posed by the multiple covariates (typically >10) and by the large number of grid points on the map (typically several tens of thousands), we propose an approach that combines screening analysis (to filter out non-influential covariates) with grouping of dependent covariates by means of generic kernel-based dependence measures. Our results provide evidence that the drivers of the best-estimate prediction are not necessarily the ones that drive the confidence in these predictions. Decisions regarding data collection, the characterisation of covariates and the communication of results should therefore be made accordingly.
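
The idea of attributing a local uncertainty estimate to groups of covariates can be illustrated with a minimal sketch. It is not the authors' implementation (their R markdown is linked under "Model code and software"). It computes exact Shapley values by enumerating coalitions of groups, and it swaps in a simple baseline replacement for the expectations that SHAP estimates. The function names `shapley_by_group` and `toy_uncertainty`, the covariate names and the toy uncertainty formula are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_by_group(value_fn, x, baseline, groups):
    """Exact Shapley values in which each player is a group of covariates.

    Assumption: covariates outside a coalition are set to `baseline`
    (a simplification of the expectations used by SHAP).
    """
    names = list(groups)
    n = len(names)

    def coalition_value(coalition):
        # Start from the baseline and switch on the covariates of each group.
        z = dict(baseline)
        for g in coalition:
            for cov in groups[g]:
                z[cov] = x[cov]
        return value_fn(z)

    phi = {}
    for g in names:
        others = [h for h in names if h != g]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k that excludes g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                gain = coalition_value(subset + (g,)) - coalition_value(subset)
                total += weight * gain
        phi[g] = total
    return phi


# Hypothetical stand-in for a local uncertainty estimate at one grid point,
# e.g. the width of a prediction interval returned by an ML model.
def toy_uncertainty(z):
    return 1.0 + 0.5 * z["dist_road"] + 0.2 * z["clay"] * z["sand"]


x = {"dist_road": 2.0, "clay": 1.0, "sand": 3.0}         # grid point to explain
baseline = {"dist_road": 0.0, "clay": 0.0, "sand": 0.0}  # reference point
groups = {"access": ["dist_road"], "texture": ["clay", "sand"]}

phi = shapley_by_group(toy_uncertainty, x, baseline, groups)
```

The group contributions in `phi` sum to the difference between the uncertainty at `x` and at `baseline`. Treating the dependent covariates `clay` and `sand` as a single player cuts the number of coalitions to evaluate, which is the kind of saving that grouping aims for.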

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

30 Sep 2024
Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach
Jeremy Rohmer, Stephane Belbeze, and Dominique Guyonnet
SOIL, 10, 679–697, https://doi.org/10.5194/soil-10-679-2024, 2024

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2024-323', Anonymous Referee #1, 19 Mar 2024
    • AC1: 'Reply on RC1', Jeremy Rohmer, 29 Apr 2024
  • RC2: 'Comment on egusphere-2024-323', Anonymous Referee #2, 11 Apr 2024
    • AC2: 'Reply on RC2', Jeremy Rohmer, 29 Apr 2024

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
ED: Reconsider after major revisions (14 May 2024) by Alexandre Wadoux
AR by Jeremy Rohmer on behalf of the Authors (25 Jun 2024)
ED: Referee Nomination & Report Request started (01 Jul 2024) by Alexandre Wadoux
RR by Anonymous Referee #1 (16 Jul 2024)
ED: Revision (19 Jul 2024) by Alexandre Wadoux
AR by Jeremy Rohmer on behalf of the Authors (12 Aug 2024)
ED: Publish as is (13 Aug 2024) by Alexandre Wadoux
ED: Publish as is (13 Aug 2024) by Rémi Cardinael (Executive editor)
AR by Jeremy Rohmer on behalf of the Authors (20 Aug 2024)

Data sets

Data to run the synthetic test case, Hanna Meyer, https://github.com/HannaMeyer/CAST/tree/master/inst/extdata

Model code and software

R markdown - synthetic test case, Jeremy Rohmer, https://github.com/anrhouses/groupSHAP-uncertainty

Viewed

Total article views: 465 (including HTML, PDF, and XML)
  • HTML: 357
  • PDF: 82
  • XML: 26
  • Total: 465
  • BibTeX: 18
  • EndNote: 16

Viewed (geographical distribution)

Total article views: 479 (including HTML, PDF, and XML) Thereof 479 with geography defined and 0 with unknown origin.
Latest update: 30 Sep 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Machine learning (ML) models have become key ingredients for digital soil mapping. To explain why an ML model is confident in its predictions, we apply a popular method from the field of explainable artificial intelligence, based on Shapley values, to the prediction uncertainty of hydrocarbon pollutants in an urban soil. To alleviate the implementation difficulties (number of factors, complex relationships between the factors, high-resolution maps), a simple but efficient grouping approach is tested.
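
Before factors are grouped, the dependence between them needs to be scored. The sketch below shows one kernel-based dependence measure, the Hilbert-Schmidt independence criterion (HSIC) with a Gaussian kernel. Choosing HSIC, the biased 1/n² estimator, the fixed bandwidth and the covariate values are all assumptions made for illustration. The text above does not say which measure or settings the authors use.

```python
from math import exp


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian kernel for a one-dimensional sample."""
    return [[exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-centre a symmetric Gram matrix (H K H with H = I - 11^T / n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, bandwidth=1.0):
    """Biased HSIC estimate, trace(K H L H) / n^2, for two paired samples."""
    n = len(x)
    kc = center(gaussian_gram(x, bandwidth))
    lc = center(gaussian_gram(y, bandwidth))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


# Hypothetical covariate samples at five locations.
clay = [0.1, 0.4, 0.5, 0.9, 1.3]
sand = [1.0 - c for c in clay]       # fully determined by clay
flat = [2.0, 2.0, 2.0, 2.0, 2.0]     # constant, carries no information

score_dependent = hsic(clay, sand)
score_constant = hsic(clay, flat)
```

Pairs of covariates with a large score would be candidates for the same group. Where to place the threshold is a modelling decision that this sketch leaves open.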