the Creative Commons Attribution 4.0 License.
Quantifying spatial uncertainty to improve soil predictions in data-sparse regions
Abstract. Artificial Neural Networks (ANNs) are valuable tools for predicting soil properties using large datasets. However, a common challenge in soil science is the uneven distribution of soil samples, which often results from past sampling projects that heavily sample certain areas while leaving similar yet geographically distant regions under-sampled. One potential solution to this problem is to transfer an already trained model to other similar regions. Robust spatial uncertainty quantification is crucial for this purpose, yet often overlooked in current research. We address this issue by using a Bayesian deep learning technique, Laplace Approximations, to quantify spatial uncertainty. This produces a probability measure encoding where the model’s prediction is deemed reliable, and where a lack of data should lead to a high uncertainty. We train such an ANN on a soil landscape dataset from a specific region in southern Germany and then transfer the trained model to another unseen but to some extent similar region, without any further model training. The model effectively generalized alluvial patterns, demonstrating its ability to recognize repetitive features of river systems. However, the model showed a tendency to favor overrepresented soil units, underscoring the importance of balancing training datasets to reduce overconfidence in dominant classes. Quantifying uncertainty in this way allows stakeholders to better identify regions and settings in need of further data collection, enhancing decision-making and prioritizing efforts in data collection. Our approach is computationally lightweight and can be added post-hoc to existing deep learning solutions for soil prediction, thus offering a practical tool to improve soil property predictions in under-sampled areas and to optimize future sampling strategies, so that resources are allocated efficiently for maximum data coverage and accuracy.
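As a rough illustration of the idea described in the abstract, the sketch below applies a Laplace approximation to the last layer of a toy binary classifier. It is not the authors' implementation. The paper concerns multi-class soil units, whereas this example assumes two classes, uses synthetic two-dimensional features as a stand-in for the penultimate-layer activations of a trained ANN, and uses the probit approximation for the predictive probability. All names (`fit_map`, `laplace_predict`, `prior_precision`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_map(phi, y, prior_precision=1.0, n_iter=50):
    """Newton iterations for the MAP weights of a logistic last layer.

    Returns the MAP weights and the Laplace covariance, i.e. the inverse
    Hessian of the negative log posterior at the MAP estimate.
    """
    n, d = phi.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(phi @ w)
        grad = phi.T @ (p - y) + prior_precision * w
        hess = phi.T @ (phi * (p * (1 - p))[:, None]) + prior_precision * np.eye(d)
        w = w - np.linalg.solve(hess, grad)
    p = sigmoid(phi @ w)
    hess = phi.T @ (phi * (p * (1 - p))[:, None]) + prior_precision * np.eye(d)
    return w, np.linalg.inv(hess)

def laplace_predict(phi_new, w, cov):
    """Predictive probability under the Gaussian weight posterior."""
    mean = phi_new @ w
    # Variance of the logit: phi^T cov phi, computed per row.
    var = np.einsum("nd,de,ne->n", phi_new, cov, phi_new)
    # Probit approximation: shrink the logit where the variance is large.
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mean), var

# Synthetic "features" for two classes (assumed, for illustration only).
rng = np.random.default_rng(0)
phi_train = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),
                       rng.normal(1.0, 0.5, (50, 2))])
y_train = np.concatenate([np.zeros(50), np.ones(50)])
w_map, cov = fit_map(phi_train, y_train)

phi_near = np.array([[1.0, 1.0]])  # inside the sampled feature range
phi_far = np.array([[5.0, 5.0]])   # far outside the sampled feature range

p_map_far = sigmoid(phi_far @ w_map)  # point-estimate prediction
p_la_near, var_near = laplace_predict(phi_near, w_map, cov)
p_la_far, var_far = laplace_predict(phi_far, w_map, cov)

print("logit variance near / far:", var_near[0], var_far[0])
print("far point, MAP vs Laplace:", p_map_far[0], p_la_far[0])
```

In this toy setting the logit variance grows for inputs far from the training features, and the Laplace prediction is pulled toward 0.5 relative to the point estimate, which is the qualitative behaviour the abstract describes for under-sampled settings.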
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-166', Anonymous Referee #1, 02 Apr 2025
The use of the uncertainty quantification approach through the Last-Layer Laplace Approximation (LLLA) is a novel and much-needed addition to Digital Soil Mapping (DSM). Artificial Neural Networks (ANNs) are often overconfident, but this approach appears to mitigate that risk. The importance of uncertainty quantification in DSM is increasingly recognized. Nowadays, many people use machine learning algorithms without fully considering the risks of overfitting or overconfidence, which highlights the need for accurate uncertainty measurement, whether for interpolation or for extrapolation. Overall, I find the general concept of the paper to be quite interesting. However, it could be improved by providing more clarity and adding further details to the methodology section. The results and discussion sections are well written, but the readability would be enhanced if the authors more frequently referenced specific figures. I would recommend this paper for publication in EGUsphere, pending minor adjustments.
-
AC1: 'Reply on RC1', Kerstin Rau, 30 May 2025
We thank the reviewer for the positive and supportive feedback. We're pleased that you find the concept of using LLLA for uncertainty quantification in DSM both relevant and promising. Your comments align well with our motivation to address overconfidence in ANN-based soil models. In the attached PDF, we briefly respond to your suggestions for improvement.
-
RC2: 'Comment on egusphere-2025-166', Anonymous Referee #2, 12 May 2025
The manuscript by Rau and co-authors addresses an important issue for the use of machine learning (ML) models for digital soil mapping, namely the problem of spatial uncertainty. They propose an approach that builds on a previously published method combining neural networks (ANN), Bayesian learning and Laplace approximation. The advantage is that this approach provides information on spatial uncertainty. The proposed approach is applied to soil classification in central Baden-Württemberg in Germany.
The manuscript is well organized, and the presentation of the methods and results is clear. Yet, many aspects (listed below) remain unclear and even unjustified. They should be clarified and further elaborated before publication. Therefore, I recommend major corrections by incorporating, if possible, the following recommendations.
Main comments
1. Position of the study
In several places in the main text, the authors refer to their previous work published in 2024. It is difficult to see the differences because in that work the authors also address the problem of uncertainty with ANN by combining it with techniques similar to those described in this new study.
Could the authors elaborate more on the differences with this study and on the originality of this new work?
2. Definition of uncertainty
My second comment may be related to my first one. I am really confused about the type of uncertainty that the authors aim to tackle:
- The authors underline that the proposed method addresses the problem of spatial uncertainty,
- At line 155, the authors speak about model parameters and structural uncertainty in a way that resembles the problem of tuning machine learning models (e.g. Probst et al., 2019).
- The title suggests more a problem of data scarcity.
- The application case with two separate regions seems more related to a problem of transferability (e.g. Ludwig et al., 2023).
Could the authors clarify the notion of uncertainty that they intend to address? The introduction should be expanded on this aspect, and a discussion of the wide range of uncertainties is also welcome.
References:
Ludwig, M., et al. (2023). Assessing and improving the transferability of current global spatial prediction models. Global Ecology and Biogeography, 32(3), 356-368.
Probst, P., Boulesteix, A. L., & Bischl, B. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20(53), 1-32.
3. Protocol to address spatial uncertainty
If the main objective is to address the problem of spatial uncertainty, I would encourage the authors to carry out more experiments by varying the key factors of the problem: number of training samples, level of similarity between training and test regions, etc. Could the authors propose and carry out a more extensive series of experiments in order to demonstrate the robustness and effectiveness of their approach in a larger number of situations?
4. Comparison to existing methods
From what I understand of the method proposed by the authors, the ANN is equipped with a final layer for predicting the probability of classification. This is a feature shared by many other techniques, i.e. logistic regression, decision trees, random forest, xgboost, neural networks with Monte Carlo dropout, neural networks combined with a deep set, generative models, and so on.
Could the authors elaborate more on the state of the art and discuss the benefits of their method compared to alternative methods?
Minor comments:
Line 55: the authors underline that the ANNs make predictions through complex internal processes that are difficult to understand and interpret. Here references to recent studies improving the interpretability of such methods for digital soil mapping should be added.
Suggested references
Padarian, J., McBratney, A. B., and Minasny, B.: Game theory interpretation of digital soil mapping convolutional neural networks, Soil, 6, 389–397, 2020.
Wadoux, A. M. J.-C. and Molnar, C.: Beyond prediction: methods for interpreting complex models of soil variation, Geoderma, 422, 115953, 2022.
The results in Fig. 5 are very convincing. Despite the efficiency of LLLA, a long tail in the probability distribution remains. I wonder whether this could be further alleviated with an extra calibration of the probability. See for instance Niculescu-Mizil & Caruana (2005).
Suggested reference
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning (pp. 625-632).
Citation: https://doi.org/10.5194/egusphere-2025-166-RC2
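To make the referee's calibration suggestion concrete, the following sketch shows one of the post-hoc methods discussed by Niculescu-Mizil & Caruana (2005), Platt scaling, on synthetic data. It is only an assumed illustration and is not taken from the manuscript or the review: the scores are simulated to be overconfident by a factor of three, the function names (`fit_platt`, `nll`) are hypothetical, the target smoothing used in Platt's original formulation is omitted, and the fit and the evaluation use the same synthetic set, whereas in practice a held-out calibration set would be used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(p, y, eps=1e-12):
    """Mean negative log-likelihood of binary labels y under probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_platt(scores, y, n_iter=50):
    """Fit p = sigmoid(a * score + b) by Newton's method; returns (a, b)."""
    x = np.column_stack([scores, np.ones_like(scores)])
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(x @ theta)
        grad = x.T @ (p - y)
        # Small ridge term keeps the Hessian invertible.
        hess = x.T @ (x * (p * (1 - p))[:, None]) + 1e-8 * np.eye(2)
        theta -= np.linalg.solve(hess, grad)
    return theta

# Synthetic data: labels follow the true logit, but the "model" reports a
# score that is three times too large, i.e. it is overconfident.
rng = np.random.default_rng(0)
true_logit = rng.normal(0.0, 1.0, 2000)
y = (rng.random(2000) < sigmoid(true_logit)).astype(float)
scores = 3.0 * true_logit

a, b = fit_platt(scores, y)
nll_before = nll(sigmoid(scores), y)
nll_after = nll(sigmoid(a * scores + b), y)

print("fitted slope a and offset b:", a, b)
print("NLL before / after calibration:", nll_before, nll_after)
```

A fitted slope below one indicates that the raw scores were too extreme and have been shrunk toward zero. Whether such a step would shorten the long tail noted for Fig. 5 would have to be checked on the authors' data.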
AC2: 'Reply on RC2', Kerstin Rau, 30 May 2025
We thank the reviewer for the constructive and detailed feedback on our manuscript. We appreciate the recognition of the importance of addressing spatial uncertainty in digital soil mapping and the clear acknowledgment that the manuscript is well organized and clearly presented. We agree that several aspects required further elaboration and clarification, and we have revised the manuscript accordingly. In the attached PDF, we respond point-by-point to each of the major and minor comments. For each comment, we detail how we have addressed it in the revised version of the manuscript. Where changes were made, we indicate the relevant sections and figures.
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
251 | 106 | 18 | 375 | 15 | 20 |