Quantifying spatial uncertainty to improve soil predictions in data-sparse regions

Rau, Kerstin; Eggensperger, Katharina; Schneider, Frank; Blaschek, Michael; Hennig, Philipp; Scholten, Thomas

doi:10.5194/egusphere-2025-166

https://doi.org/10.5194/egusphere-2025-166

17 Mar 2025

Abstract. Artificial Neural Networks (ANNs) are valuable tools for predicting soil properties using large datasets. However, a common challenge in soil science is the uneven distribution of soil samples, which often results from past sampling projects that heavily sampled certain areas while leaving similar yet geographically distant regions under-sampled. One potential solution to this problem is to transfer an already trained model to other, similar regions. Robust spatial uncertainty quantification is crucial for this purpose, yet it is often overlooked in current research. We address this issue by using a Bayesian deep learning technique, the Laplace approximation, to quantify spatial uncertainty. This produces a probability measure encoding where the model’s prediction is deemed reliable and where a lack of data should lead to high uncertainty. We train such an ANN on a soil landscape dataset from a specific region in southern Germany and then transfer the trained model to another unseen but to some extent similar region, without any further model training. The model effectively generalized alluvial patterns, demonstrating its ability to recognize repetitive features of river systems. However, the model showed a tendency to favor overrepresented soil units, underscoring the importance of balancing training datasets to reduce overconfidence in dominant classes. Quantifying uncertainty in this way allows stakeholders to better identify regions and settings in need of further data collection, supporting decision-making and the prioritization of sampling efforts. Our approach is computationally lightweight and can be added post hoc to existing deep learning solutions for soil prediction. It thus offers a practical tool to improve soil property predictions in under-sampled areas and to optimize future sampling strategies, so that resources are allocated efficiently for maximum data coverage and accuracy.
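The core idea of the abstract, a Laplace approximation over the last layer of a trained network that turns a single prediction into a distribution over predictions, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a diagonal generalized Gauss-Newton approximation of the last-layer posterior precision (the manuscript's Last-Layer Laplace Approximation may use a richer Hessian structure), the probit approximation for the predictive distribution, and predictive entropy as the uncertainty score. All function names, the prior precision and the toy numbers are hypothetical.

```python
import math

# Hypothetical sketch, not the authors' code: a *diagonal* last-layer Laplace
# approximation (LLLA) for a softmax classifier. The network body is frozen;
# `features` are assumed penultimate-layer outputs and (W, b) the trained
# (MAP) last-layer weights and biases.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, phi):
    # f_k = sum_d W[k][d] * phi[d] + b[k]
    return [sum(w * x for w, x in zip(W[k], phi)) + b[k] for k in range(len(W))]


def fit_diag_laplace(W, b, features, prior_precision=1.0):
    """Diagonal generalized Gauss-Newton (GGN) posterior precision of the last layer."""
    K, D = len(W), len(W[0])
    prec_W = [[prior_precision] * D for _ in range(K)]
    prec_b = [prior_precision] * K
    for phi in features:
        p = softmax(logits(W, b, phi))
        for k in range(K):
            h = p[k] * (1.0 - p[k])  # k-th diagonal entry of diag(p) - p p^T
            for d in range(D):
                prec_W[k][d] += h * phi[d] ** 2
            prec_b[k] += h
    return prec_W, prec_b


def predict(W, b, prec_W, prec_b, phi):
    """Approximate predictive class probabilities via the probit approximation."""
    f = logits(W, b, phi)
    scaled = []
    for k in range(len(W)):
        # Variance of logit k under the diagonal Gaussian posterior.
        var_k = sum(x * x / prec_W[k][d] for d, x in enumerate(phi)) + 1.0 / prec_b[k]
        scaled.append(f[k] / math.sqrt(1.0 + math.pi / 8.0 * var_k))
    return softmax(scaled)


def entropy(p):
    """Predictive entropy in nats; higher values mean less certain predictions."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


# Toy usage with made-up numbers (not data from the manuscript).
W = [[2.0, 0.0], [-2.0, 0.0]]
b = [0.0, 0.0]
train_features = [[1.0, 0.0], [-1.0, 0.0], [1.2, 0.1], [-0.8, -0.1]]
prec_W, prec_b = fit_diag_laplace(W, b, train_features)

p_near = predict(W, b, prec_W, prec_b, [1.0, 0.0])  # resembles the training data
p_far = predict(W, b, prec_W, prec_b, [1.0, 5.0])   # large along a barely covered feature
print(entropy(p_near) < entropy(p_far))
```

The network body is never retrained: only statistics of the frozen last layer are accumulated, which is what keeps such an approach post hoc and lightweight. A query whose features extend in directions the training data barely covered receives a larger logit variance and therefore a flatter, higher-entropy prediction, even though its MAP logits are identical to those of the familiar query.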

Received: 14 Jan 2025 – Discussion started: 17 Mar 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 5107 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

15 Oct 2025

Quantifying spatial uncertainty to improve soil predictions in data-sparse regions

Kerstin Rau, Katharina Eggensperger, Frank Schneider, Michael Blaschek, Philipp Hennig, and Thomas Scholten

SOIL, 11, 833–847, https://doi.org/10.5194/soil-11-833-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-166', Anonymous Referee #1, 02 Apr 2025

The use of the uncertainty quantification approach through the Last-Layer Laplace Approximation (LLLA) is a novel and much-needed addition to Digital Soil Mapping (DSM). Artificial Neural Networks (ANNs) are often overconfident, but this approach appears to mitigate that risk. The importance of uncertainty quantification in DSM is increasingly recognized. Nowadays, many people use machine learning algorithms without fully considering the risks of overfitting or overconfidence, which highlights the need for accurate uncertainty measurement, whether for interpolation or extrapolation. Overall, I find the general concept of the paper to be quite interesting. However, it could be improved by providing more clarity and adding further details to the methodology section. The results and discussion sections are well written, but readability would be enhanced if the authors referenced specific figures more frequently. I would recommend this paper for publication in EGUsphere, pending minor adjustments.

Citation: https://doi.org/10.5194/egusphere-2025-166-RC1
- AC1: 'Reply on RC1', Kerstin Rau, 30 May 2025
  
  We thank the reviewer for the positive and supportive feedback. We're pleased that you find the concept of using LLLA for uncertainty quantification in DSM both relevant and promising. Your comments align well with our motivation to address overconfidence in ANN-based soil models. In the attached PDF, we briefly respond to your suggestions for improvement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-166-AC1
RC2:
'Comment on egusphere-2025-166', Anonymous Referee #2, 12 May 2025

The manuscript by Rau and co-authors addresses an important issue for the use of machine learning (ML) models in digital soil mapping, namely the problem of spatial uncertainty. They propose an approach based on previously published work combining artificial neural networks (ANNs), Bayesian learning and the Laplace approximation. The advantage of this approach is that it provides information on spatial uncertainty. The proposed approach is applied to soil classification in central Baden-Württemberg, Germany.
The manuscript is well organized, and the presentation of the methods and results is clear. Yet many aspects (listed below) remain unclear or even unjustified. They should be clarified and further elaborated before publication. Therefore, I recommend major revisions, incorporating, where possible, the following recommendations.
Main comments
1. Position of the study
In several places in the main text, the authors refer to their previous work published in 2024. It is difficult to see the differences because in that work the authors also address the problem of uncertainty with ANN by combining it with techniques similar to those described in this new study.
Could the authors elaborate more on the differences between that work and this study, and on the originality of this new work?
2. Definition of uncertainty
My second comment may be related to my first one. I am really confused about the type of uncertainty that the authors aim to tackle:

- The authors underline that the proposed method addresses the problem of spatial uncertainty,

- At line 155, the authors discuss model parameter and structural uncertainty in a way that resembles the problem of tuning machine learning models (e.g. Probst et al., 2019).

- The title rather suggests a problem of data scarcity.

- The application case with two separate regions seems more related to a problem of transferability (e.g. Ludwig et al., 2023).
Could the authors clarify the notion of uncertainty that they intend to address? The introduction should be expanded on this aspect, and a discussion of the wide range of uncertainties is also welcome.
References:

Ludwig, M., et al. (2023). Assessing and improving the transferability of current global spatial prediction models. Global Ecology and Biogeography, 32(3), 356–368.

Probst, P., Boulesteix, A.-L., & Bischl, B. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20(53), 1–32.
3. Protocol to address spatial uncertainty
If the main objective is to address the problem of spatial uncertainty, I would encourage the authors to carry out more experiments by varying the key factors of the problem: number of training samples, level of similarity between training and test regions, etc. Could the authors propose and carry out a more extensive series of experiments in order to demonstrate the robustness and effectiveness of their approach in a larger number of situations?
4. Comparison to existing methods
From what I understand of the proposed method, the ANN is equipped with a final layer that predicts class probabilities. This feature is shared by many other techniques, e.g. logistic regression, decision trees, random forest, XGBoost, neural networks with Monte Carlo dropout, neural networks combined with a deep set, generative models, and so on.

Could the authors elaborate more on the state of the art and discuss the benefits of their method compared to alternative methods?
Minor comments:
Line 55: the authors underline that the ANNs make predictions through complex internal processes that are difficult to understand and interpret. Here references to recent studies improving the interpretability of such methods for digital soil mapping should be added.
Suggested references

Padarian, J., McBratney, A. B., and Minasny, B.: Game theory interpretation of digital soil mapping convolutional neural networks, SOIL, 6, 389–397, 2020.

Wadoux, A. M. J.-C. and Molnar, C.: Beyond prediction: methods for interpreting complex models of soil variation, Geoderma, 422, 115953, 2022.

The results in Fig. 5 are very convincing. Despite the efficiency of LLLA, a long tail in the probability distribution remains. I wonder whether this could be further reduced by an additional calibration of the predicted probabilities. See, for instance, Niculescu-Mizil & Caruana (2005).
Suggested reference

Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning (pp. 625-632).
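The additional calibration step suggested above can be illustrated with a small sketch. Niculescu-Mizil & Caruana (2005) evaluate Platt scaling and isotonic regression; because soil-unit maps are multiclass, the sketch instead uses temperature scaling, a one-parameter multiclass extension of Platt scaling, and reports the expected calibration error (ECE) before and after. It is not part of the manuscript or the review, and all function names, the temperature grid and the toy logits are hypothetical.

```python
import math

# Hypothetical sketch: temperature scaling (a one-parameter multiclass relative
# of Platt scaling) fitted on held-out logits, plus expected calibration error.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels after dividing logits by T."""
    total = 0.0
    for z, y in zip(logit_rows, labels):
        p = softmax([v / T for v in z])
        total -= math.log(max(p[y], 1e-12))  # clip to avoid log(0)
    return total / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    """Choose the temperature T > 0 that minimizes held-out NLL by grid search."""
    if grid is None:
        grid = [0.1 * i for i in range(1, 101)]  # T from 0.1 to 10.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))


def expected_calibration_error(prob_rows, labels, n_bins=10):
    """Bin predictions by confidence; average |accuracy - confidence| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(prob_rows, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == y))
    ece = 0.0
    for bucket in bins:
        if bucket:
            acc = sum(ok for _, ok in bucket) / len(bucket)
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            ece += len(bucket) / len(labels) * abs(acc - avg_conf)
    return ece


# Made-up held-out logits for three classes: very confident, but right 3 of 5 times.
val_logits = [[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [6.0, 0.0, 0.0],
              [0.0, 0.0, 6.0], [6.0, 0.0, 0.0]]
val_labels = [0, 1, 1, 2, 2]

T = fit_temperature(val_logits, val_labels)
ece_before = expected_calibration_error([softmax(z) for z in val_logits], val_labels)
ece_after = expected_calibration_error(
    [softmax([v / T for v in z]) for z in val_logits], val_labels)
print(T > 1.0, ece_after < ece_before)
```

Dividing all logits by one positive scalar leaves the predicted class unchanged and only softens or sharpens confidence, so such a step targets overconfidence rather than misclassification. Fitting T requires held-out labelled data, which is exactly what is scarce in the transfer region.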

Citation: https://doi.org/10.5194/egusphere-2025-166-RC2
- AC2: 'Reply on RC2', Kerstin Rau, 30 May 2025
  
  We thank the reviewer for the constructive and detailed feedback on our manuscript. We appreciate the recognition of the importance of addressing spatial uncertainty in digital soil mapping and the clear acknowledgment that the manuscript is well organized and clearly presented. We agree that several aspects required further elaboration and clarification, and we have revised the manuscript accordingly. In the attached PDF file, we respond point by point to each of the major and minor comments. For each comment, we detail how we have addressed it in the revised version of the manuscript. Where changes were made, we indicate the relevant sections and figures.
  
  Citation: https://doi.org/10.5194/egusphere-2025-166-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (10 Jun 2025) by Nicolas P.A. Saby

AR by Kerstin Rau on behalf of the Authors (11 Jun 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (30 Jun 2025) by Nicolas P.A. Saby

ED: Publish as is (17 Jul 2025) by Raphael Viscarra Rossel (Executive editor)

AR by Kerstin Rau on behalf of the Authors (19 Jul 2025): Author's response, Manuscript

Viewed

Total article views: 3,458 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,845	492	121	3,458	106	146

Views and downloads (calculated since 17 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	116	44	6	166
Apr 2025	118	54	8	180
May 2025	110	68	14	192
Jun 2025	136	36	8	180
Jul 2025	86	26	2	114
Aug 2025	238	14	2	254
Sep 2025	988	24	20	1,032
Oct 2025	74	14	2	90
Nov 2025	166	28	8	202
Dec 2025	210	28	10	248
Jan 2026	116	22	12	150
Feb 2026	134	32	10	176
Mar 2026	198	58	12	268
Apr 2026	81	15	2	98
May 2026	57	19	4	80
Jun 2026	14	4	0	18
Jul 2026	3	6	1	10

Viewed (geographical distribution)

Total article views: 3,456 (including HTML, PDF, and XML) Thereof 3,456 with geography defined and 0 with unknown origin.

Latest update: 20 Jul 2026

Short summary

Uneven data collection can make it hard to predict soil properties accurately in new areas. We developed a method to show where predictions are reliable and where more data is needed. By training a model in one region and applying it to another, we found that our approach effectively recognized river patterns but was biased toward overrepresented soil types. This tool can guide smarter data collection, helping improve predictions and make better use of resources for soil management.

