This work is distributed under the Creative Commons Attribution 4.0 License.
Operationalizing fine-scale soil property mapping with spectroscopy and spatial machine learning
Abstract. One challenge in soil mapping is the transfer of new techniques and methods into operational practice, integrating them with traditional field surveys, reducing costs, and increasing the quality of the soil maps. The latter is paramount, as they form the basis for many thematic maps. As part of a novel approach to soil mapping, we integrate various technologies and pedometric methodologies to create soil property maps for soil surveyors, which they can utilize as a reference before beginning their pedological fieldwork. This gives the surveyors considerably more detailed and accurate prior information, reducing the subjectivity inherent in soil mapping. Our approach comprises a novel soil sampling design that effectively captures the spatial and feature spaces, mid-infrared spectroscopy, and spatial machine learning based on a comprehensive set of covariates generated through various feature engineering approaches. We employ multi-scale terrain attributes, temporal multi-scale remote sensing, and Euclidean distance fields to account for environmental correlation, spatial non-stationarity, and spatial autocorrelation in machine learning. Methods to reduce the uncertainties inherent in the spectral and spatial data were integrated. The new sampling design is based on a geographical stratification and focuses on the local soil variability. The method identifies spatially local minima and maxima of the feature space, which is fundamental to soil surveys at the specified scale. The k-means and Kennard-Stone algorithms were applied sequentially within each cell of a hexagonal grid overlaid on the study area. This approach permits a systematic sub-sampling from each cell to analyze predictive accuracy for varying sampling densities. We tested one to three samples per hectare. Our findings indicate that a sample size of two samples per hectare was sufficient for accurately mapping soil properties across 300 hectares. This markedly reduces the financial burden of subsequent projects, given the significant reduction in the time and resources required for surveying. The spectroscopic and spatial models were unbiased and yielded average R2 values of 0.91 and 0.68–0.86, respectively, depending on whether pedotransfer models were used in the mapping. Our study highlights the value of integrating robust pedometric technologies in soil surveys.
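For illustration only, a minimal sketch of how a sequential k-means / Kennard-Stone selection within a single hexagon cell could look. This is not the authors' implementation; the array name `X_cell`, the helper functions, and all parameter values are hypothetical assumptions.

```python
# Hedged sketch: candidate selection within one hexagon cell, assuming a
# (n_candidates x n_covariates) matrix `X_cell` of standardized covariate
# values for all grid points falling inside the cell. Not the authors' code.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def kennard_stone(X, n_select):
    """Select n_select rows of X that are maximally spread in feature space."""
    dist = cdist(X, X)                      # pairwise Euclidean distances
    # start with the two most distant points
    selected = list(np.unravel_index(np.argmax(dist), dist.shape))
    while len(selected) < n_select:
        remaining = [i for i in range(len(X)) if i not in selected]
        # pick the point whose minimum distance to the selected set is largest
        next_idx = max(remaining, key=lambda i: dist[i, selected].min())
        selected.append(next_idx)
    return selected

def sample_cell(X_cell, n_clusters=4, n_samples=2, random_state=0):
    """k-means first to summarize the local structure, then Kennard-Stone on
    the cluster centres to pick the locally most contrasting candidates."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X_cell)
    ks_idx = kennard_stone(km.cluster_centers_, n_samples)
    # map each chosen centre back to its nearest real candidate point
    return [int(np.argmin(cdist(km.cluster_centers_[[i]], X_cell))) for i in ks_idx]
```

Applied cell by cell over the hexagonal grid, such a routine would return the locally most contrasting candidate points, which could then be thinned to densities of one to three samples per hectare.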
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2810', Anonymous Referee #1, 23 Oct 2024
Review EGUsphere
Behrens et al., 2024
---------------------------
The direction of the presented work is relevant, and the connection of the different methodological approaches of pedometrics (sampling design, spectroscopy, mapping) is also very important for practical application. The actual study involved a large amount of sampling as well as laboratory and spectroscopy analysis.
This study claims to present "a methodology for making operational the creation of soil property maps". However, it mostly presents the application of a complex sampling approach that partly lacks justification for why it needs to be done in multiple stages. It is not evident why the sampling design needs to be at such a high level of complexity. Given the sampling density, maybe even a simple random sample would yield the same map accuracy. For example, the framework misses the integration of the prediction errors of the spectroscopy into the subsequent mapping process (a requirement for the claimed framework).
Therefore, the manuscript should be restructured and the introduction needs to be expanded with the relevant context, mainly targeting the actual presented work. It should maybe just focus on the sampling strategy.
Moreover, there are probably major problems with the validation strategy (not fully documented, so hard to judge). Cross-validation, as it was likely applied, gives far too optimistic results, and therefore the results are hard to interpret.
In some parts of the manuscript, relevant information is missing.
Please see the subsequent detailed comments, which support my points above:
Abstract:
-----------------
L3: The authors claim a novel approach to soil mapping. The sampling design seems to be new, but the rest are established methods.
L6: Subjectivity of soil mapping, or rather of "field soil description"? Soil mapping by pedometric methods as proposed in the current article should already have reduced subjectivity.
L24: Soil maps are available at coarse scales (e.g. European or global maps, national maps?), but their information content and/or resolution/scale is not sufficient.
Introduction
-----------------
The introduction is very poorly structured, i.e. every paragraph introduces a new objective of the present study that was not well introduced at the beginning of that paragraph. Moreover, the line of thought is neither well supported by existing research on the subject nor well argued for. Some examples:
First paragraph: the authors detail parts of the mental model used in conventional survey and how it can be supported by soil property maps. It remains unclear what role the mental model plays in today's digital soil mapping approaches. The process of the Gestalt shift and how it can be supported by the proposed method does not become completely clear from the description. In L45, reference scales are mentioned; however, the study then presents a digital soil mapping approach with a final pixel resolution (unclear, likely 2 m, as the predictors were prepared at 2 m). It is not introduced which assumptions are often made regarding scale and point density in conventional surveys (see e.g. Legros, La cartographie de sols), which might be relevant as the study compares different point densities.
L36-37: It remains unclear whether or why end-users need a higher density of analytical data, or that it improves the quality of thematic maps. No argument or citation/evidence is provided.
L36: What exactly is meant by thematic maps? Soil ecosystem services?
L46: Do you maybe mean soil wetness/waterlogging instead of soil moisture? The link with soil quality is likely weaker with the latter (depending on definition).
L50: What about the 3 remaining observations, are they not recorded by a surveyor?
L52: This study is not the first to investigate the relationship between sampling density and predictive accuracy; however, no link to the findings of other studies is drawn at all (see e.g. Kempen et al., 2014).
L55: Brus, 2022, references a whole textbook. It remains unclear whether this reference supports the whole sentence or just the mentioned methods. Please give at least a chapter or section to make it clear.
L55ff: Either the reasoning needs to be more detailed, providing evidence for the statements in the text, or it needs to be supported by the literature. Does the overrepresentation of the small areas depend on the configuration of the study area or on the variables chosen for the sampling design?
L59: k-means and Kennard-Stone are not sampling designs; they can be used to create them.
L62: ".. not relevant in most cases. ... To address both of these issues..": I have difficulties identifying the two issues; please clarify. Moreover, if the issues are not relevant, why are they addressed at all?
Overall introduction: strong focus on the sampling design; however, the study covers many other aspects as well. As clear, concise objectives are missing, it remains unclear what the authors truly want to present, or whether the reader is simply presented with a large number of methods combined around a sampling design.
Methods
--------------
Section 2.1: It remains unclear what soil types are to be expected in the study area. There is no information on climate, geology, or geomorphological processes. For the transferability, i.e. the limitations to a specific study area, such background information is very relevant. It therefore remains unclear whether there is geological variation within the study area that has been neglected. Moreover, sampling was done at fixed depth intervals. Neglecting genetic soil horizons may be acceptable, but not for soil types that have thin horizons with abrupt or large changes in properties (e.g. diagnostic horizons in podzols).
L88: What are exclusion areas? Will those be mapped, but not sampled?
L95: What were the five different settings?
L115: It seems very strange that bare-soil reflectance can be extrapolated to permanent grasslands. But this seems to be published work by a co-author.
L117: The resolution of the Landsat-derived data was changed by "spatial modelling with machine learning". Please add details on how this was done. It does not seem to be a default method.
L120: How was the selection of the predictors made? "carefully" does not inform about the approach. How was the de-correlation approached?
L128: Is this a rank transformation? If yes, maybe mention it to make it easier to understand for the readers.
Section 2.3.1: Using hexagons has the mentioned advantages, but, in the given study area, the area to be sampled is irregular as streets are removed from the hexagons. The reduction of sampling points seems somewhat arbitrary. Would a clustering by spatial coordinates, as proposed in Brus (2022), not yield a better distribution of spatial sub-areas?
L67: It remains unclear how n and p were determined and what the rationale behind it would be.
L190: How were alternative areas defined, and of what size? Why not alternative sampling points?
Section 2.3.2: Were these samples taken from the original sample set of 812, or were these an additional new set of 45?
L212: Using ground soil samples for the subsequent analysis is very unusual; what is the justification for that? And maybe indicate how finely the grinding was done?
L214: texture by sedimentation, do you mean the pipette method? Please give a reference, also for SOC and carbonates.
L218: Were the replicates removed based on the Euclidean distance between the replicates, or on distances computed from within one spectral response?
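As an illustration of the first reading only (an assumption about what may have been done, not the authors' procedure), flagging a replicate scan by its Euclidean distance to the mean of its sibling scans could look like the sketch below; the array name and threshold are hypothetical.

```python
# Hedged sketch of one possible "Euclidean distance between replicates" check:
# flag a replicate whose spectrum deviates strongly from the mean of the
# other scans of the same sample. Hypothetical names and threshold.
import numpy as np

def flag_replicate_outliers(spectra: np.ndarray, max_dist: float) -> np.ndarray:
    """spectra: (n_replicates x n_wavenumbers) scans of one sample."""
    mean_spectrum = spectra.mean(axis=0)
    dist = np.linalg.norm(spectra - mean_spectrum, axis=1)  # Euclidean distance per scan
    return dist > max_dist                                  # True = candidate for removal
```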
Section 2.5.1: It remains unclear what hyperparameters were tuned and how (what candidate values and what procedure to select them, likely cross-validation).
L232: Most likely, the model performance results are too optimistic. According to this section, 5-times repeated 10-fold cross-validation was applied. Since nothing further is mentioned, splitting was probably done at random, ignoring the fact that samples from different soil depths are not independent observations. Cross-validation would need to be done at least with leave-full-locations-out splitting. Moreover, it remains unclear how the model tuning was done, and especially the stacking. Most likely, the selection of model predictors and model parameters involved using the cross-validation sets repeatedly. Therefore, the final reported cross-validation error metrics are no longer independent from the fitted model and are too optimistic compared to a single run of cross-validation, and certainly compared to an independent randomly sampled data set (e.g. Brus 2011). Moreover, the "pedotransfer" approach (see next comment) also strongly confounds the cross-validation, as maps produced with data that was left out for "independent" validation were most likely used. Reported cross-validation results for the final maps are therefore likely far too optimistic.
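To make the suggested leave-full-locations-out splitting concrete, here is a minimal sketch with synthetic data and scikit-learn as a stand-in for the caret workflow described in the manuscript; all variable names are hypothetical, and the point is only the grouping mechanics, not the authors' models.

```python
# Hedged sketch: random 10-fold CV vs. leave-full-locations-out CV when
# several depth increments per location share nearly identical covariates.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(42)
n_locations, n_depths, n_features = 100, 3, 20

# profile-level covariates, nearly identical for the depth increments of one location
X_loc = rng.normal(size=(n_locations, n_features))
X = np.repeat(X_loc, n_depths, axis=0) + rng.normal(scale=0.1, size=(n_locations * n_depths, n_features))
location_id = np.repeat(np.arange(n_locations), n_depths)
loc_noise = np.repeat(rng.normal(size=n_locations), n_depths)   # unexplained profile effect
y = X[:, 0] + loc_noise + rng.normal(scale=0.2, size=len(X))

def cv_r2(splitter, groups=None):
    scores = []
    for train, test in splitter.split(X, y, groups=groups):
        model = RandomForestRegressor(random_state=0).fit(X[train], y[train])
        scores.append(r2_score(y[test], model.predict(X[test])))
    return float(np.mean(scores))

# random folds mix depth increments of one profile into train and test -> optimistic
print("random 10-fold CV      :", round(cv_r2(KFold(n_splits=10, shuffle=True, random_state=0)), 2))
# grouped folds keep whole profiles together -> more honest estimate
print("leave-locations-out CV :", round(cv_r2(GroupKFold(n_splits=10), groups=location_id), 2))
```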
Section 2.5.2 in general: It remains unclear if and how uncertainty was quantified for the stacking approach. Moreover, it also remains unclear how previous models were included as a pedotransfer function. Were maps created from all soil properties, and then in a second step those maps were used as predictors for another model fit? This seems rather unusual and should be clearly explained, and the improvement should be transparently discussed in the results (currently it is not clear whether "pedotransfer function" in the results and plots refers to the spectral transfer functions or to this assumed approach).
Section 2.5.3: Maybe I overlooked it, but for the mapping scenarios with a reduced number of sampling locations it remains unclear how the validation was done. Was cross-validation applied to the data used for training? This would then mean that the CV sets are not all the same for the different scenarios and that possibly considerable variation is to be expected solely due to the different sets. As the variation of the 5-times repeated CV is not shown, it is difficult to estimate the variability of different, i.e. also smaller, CV sets.
Model evaluation overall: It is not clear how R2 was computed; was it computed as the model efficiency coefficient (MEC), whose use is now widespread? In addition, the predictions could be more biased for low sampling densities; this is not computed/presented.
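A small illustration of why the definition of R2 matters (toy numbers only, not taken from the manuscript): the model efficiency coefficient penalizes bias, whereas the squared Pearson correlation does not.

```python
# Hedged sketch: MEC (1 - SSE/SST) vs. squared Pearson correlation on
# perfectly correlated but biased predictions. Illustrative numbers only.
import numpy as np

obs  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = obs + 1.5                        # perfectly correlated, constant bias of 1.5

mec = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
r2_corr = np.corrcoef(obs, pred)[0, 1] ** 2

print(f"MEC: {mec:.2f}")                       # negative here: the bias is penalized
print(f"squared correlation: {r2_corr:.2f}")   # 1.00: the bias is invisible
```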
Results
---------------
L260ff: The background on imbalanced data situations should rather be part of the introduction. Instead, a proper discussion of the findings would be better here.
Figures 14ff: Barplots displaying R2. Report how R2 was obtained in the figure caption: is it a mean or a median of the 5 repeated cross-validation runs? Moreover, barplots are not suitable to display the results, because for a 5-times repeated cross-validation a strip plot showing the variability would be more suitable. If only one R2 value per response is shown, as in Figure 14, a table might be more suitable, as the information density of the graph is minimal given the space it takes up in the article.
Section 3.4: Spectroscopy models have non-negligible errors. How were those considered, if at all, in the subsequent analysis?
Figure 11, 12: There is a lot of information left for the reader to interpret. These are the only arguments for why the sampling design should outperform a simpler one, and not enough prepared evidence is presented. I am not sure whether the distribution argument holds (the more similar the distributions of the population and the sampled locations, the better the R2 of the mapping); at least the introduction does not give evidence that this relationship is strong enough to justify the complex approach.
Further comments:
-------------------------
Overall, there are too many figures. Please evaluate whether they are truly all needed; some information could be combined into one figure (e.g. for the barplots).
L84: For datasets, use citations that also appear in the reference list.
L220: Use uppercase in the title.
Figure 1: It remains unclear what the colors mean. Not all steps are quite clear; there is a lack of detail in the figure, e.g. field work is completely missing.
Figures in general: color scales are not color blind friendly.
Figure 14: The information content does not justify a figure of this size.
Figure 17ff: Legends are too small. The units are missing.
Figure 19ff: Use figure captions instead of figure titles (which are partly incomplete, e.g. what is the meaning of "...; pedotransfer"?).
Citation: https://doi.org/10.5194/egusphere-2024-2810-RC1
RC2: 'Comment on egusphere-2024-2810', Anonymous Referee #2, 04 Nov 2024
The paper introduces a framework for high-resolution soil mapping of various properties at small to medium scales, with a small emphasis on cost-efficiency. Its novelty and relevance come from the combination of multiple Digital Soil Mapping (DSM) techniques in a practical context. The authors propose a new sampling design, employ diverse feature engineering methods for remote sensing data, and utilize a "two-step modeling approach." In this approach, they initially use spectroscopy to generate additional training data for the final spatial model based on remote sensing data. Their pipeline incorporates various state-of-the-art methods.
However, the paper is not always easy to follow because the authors introduce numerous topics and research goals within a single paper. This leads to the drawback that certain aspects important for a functional framework were addressed inadequately, whereas other topics received far too much attention:
The primary aim of the paper, as I understand it, is to present a state-of-the-art framework for DSM modeling that leverages the combination of various DSM methods. Hence, focusing so intensively in the introduction and abstract on how soil surveying could benefit from DSM products seems rather irrelevant to the actual scope of the paper, since “the value of DSM maps for soil surveying” is not addressed in the Methodology or Results & Discussion sections anymore. The paper then introduces a new, complicated sampling design as part of the framework. While a long theoretical rationale behind the sampling design was provided, no real evidence was shown that this sampling design is actually capable of improving predictions. Generally, no evidence was shown that the long pipeline used in this paper was in any way more appropriate than a more simplistic approach. On top of that, the researchers also wanted to address the question of how the sample size may influence prediction performance. On the other hand, information on other aspects like modelling is minimal, which even raises concerns about data leakage.
The stretch of proposing a new sampling design while discussing the importance of digital soil maps for soil surveyors harms the original goal of presenting a functional framework. I propose that the revision should focus entirely on either (1) presenting a clear framework that allows others to reproduce the combination of methods, potentially with code, with less focus on e.g. the sampling design and soil surveying, (2) discussing how and why DSM maps are relevant for soil surveyors, or (3) showing the advantages of the new sampling design using a benchmark against other sampling designs. However, I suspect that the latter is not really possible without a second sampling campaign, and the first would be most interesting given that the novelty comes from combining so many different methods.
Please see below specific comments:
---------------------------------------
Abstract
L. 2 – 3: “The latter is paramount, as they [Soil Maps] form the basis for many thematic maps.”
The authors probably want to say that soil surveyors can use DSM maps to create new “thematic maps”. This only becomes clear after reading the introduction or the follow-up sentences in which they introduce the concept of soil surveyors using DSM maps. When first reading the abstract, it is not clear what is meant by thematic maps.
L. 10 – 11: “Methods to reduce the uncertainties inherent to the spectral and spatial data were integrated.”
This seems too vague and could mean anything, because “uncertainty” has many context-specific definitions. Given that this was only a small part of the actual methodology, this sentence may be dropped.
L. 19 – 20: “Our study highlights the value of integrating robust pedometric technologies in soil surveys.”
The authors did not really give evidence for this (e.g., they did not show how integrating their framework improved soil surveys). Rather the value of this study comes (or should come) from presenting a functional framework in which various pedometric technologies are effectively combined.
Introduction
L. 21 – 33 & L. 40 – 45: In the introduction, the authors extensively discuss how soil surveyors could benefit from DSM products. However, it is unclear how this relates to the paper's primary goal of presenting a framework for DSM modeling. This section should be much shorter, as I, as a reader, expected a paper that integrates the soil surveying aspect within the framework. Yet, the actual paper just focuses on the DSM modelling, which is detached from the introduction. I understand that this work was conducted within a project where the goal is to create DSM maps for soil surveyors but this is not really relevant for a general framework on high-resolution DSM.
L. 48 – 50: Four samples per hectare sounds like a lot. Is this common or best practice in Switzerland? Maybe a citation could help to clarify this.
L. 47 & L. 53 – 69: The authors' main argument is that a targeted sampling design (i.e., a sampling design that covers the feature space) does not provide even geographical stratification. However, it is difficult to understand why the authors put such a great focus on the spatial coverage and the concept of local extremes without any reference that supports their line of argumentation. Spatial coverage might not even be associated with an increase in performance for DSM modelling as for example indicated in Wadoux et al. (2019). Conversely, it has been repeatedly demonstrated that feature coverage can enhance predictive power, at least when compared to Simple Random Sampling (see, for example, the discussion in Žížala et al. 2024). The cited work by Brus (2022) also refers to this concept at the beginning of chapter 18 and the end of chapter 19. Finally, spatial coordinates can be incorporated into targeted sampling to increase spatial coverage if desired.
Wadoux, A. M. C., Brus, D. J., & Heuvelink, G. B. (2019). Sampling design optimization for soil mapping with random forest. Geoderma, 355, 113913.
Brus, D. J. (2022). Spatial sampling with R. Chapman and Hall/CRC.
Žížala, D., Princ, T., Skála, J., Juřicová, A., Lukas, V., Bohovic, R., Zádorová, T., & Minařík, R. (2024). Soil sampling design matters - Enhancing the efficiency of digital soil mapping at the field scale. Geoderma Regional, 39, e00874.
Methods
When introducing a framework, the ultimate goal is for other researchers or DSM practitioners to be able to reproduce the methodology for future DSM campaigns. However, the absence of provided code is a significant drawback. Given the numerous methods employed for feature engineering and the complexity of the sampling design, it would be challenging to reproduce any of the pre-processing steps without access to the code.
Section 2.2.1: A wide range of features have been used and engineered. To better organize and track these different features, an overview table would be beneficial. This table could include columns such as the type of feature data (e.g., DEM, terrain attributes, bare-soil multispectral RS, etc.), the engineering/processing applied (e.g., multi-scale), and the dimensionality of the features as a numerical value.
L. 114 – 115: This is not clear even with the reference. Was the bare-soil multispectral data predicted given the other available features for these affected areas?
L. 120 – 121: There are several questions about the selected covariates for the sampling design that should be addressed by the authors:
(1) It is stated that “a combination of carefully selected uncorrelated covariates” was used for the local feature coverage of the new sampling design. Afterwards, some rationale behind the picked features is given. This implies that the features were handpicked based on expert knowledge and intercorrelation, and not selected by e.g. an automated correlation matrix filter. It would be better to be more explicit about this and make clear that they were handpicked.
(2) Why were Sentinel 2 and Landsat NDVI SD selected? Although it is mentioned that they are based on different time intervals, they appear to be strongly correlated in Fig. 3. As a result, NDVI SD will be heavily overrepresented in the feature space coverage. While this might be a minor issue, using NDVI SD twice seems arbitrary given the wide range of features employed in this study. Was there a specific rationale behind this choice? To a minor degree this also applies to using Flow acc. twice at different scales but at least they appear to be less correlated.
(3) Including a correlation matrix of the selected features in an Appendix could be useful. This addition may also help address some of the other points mentioned.
Section 2.5.1: Was a method used to reduce the dimensionality of the features, such as a correlation matrix filter, feature elimination, PCA, or a similar technique?
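As an example of the kind of correlation-matrix filter this question refers to, here is a sketch only, loosely analogous to caret::findCorrelation in R; the DataFrame name and threshold are hypothetical, and this is not claimed to be what the authors did.

```python
# Hedged sketch of a simple correlation-matrix filter for reducing feature
# dimensionality, assuming a pandas DataFrame `covariates` with one column
# per feature (hypothetical name).
import numpy as np
import pandas as pd

def drop_correlated(covariates: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = covariates.corr().abs()
    # keep only the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return covariates.drop(columns=to_drop)
```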
L. 227: A five-times repeated 10-fold cross-validation (CV) has been applied. However, the methodology seems to suggest that a non-nested CV was used, despite the fact that hyperparameters were tuned. This approach is likely to result in slightly overoptimistic results. Although the caret package does not offer nested CV by default, using a single nested 10-fold (outer) and 5-fold (inner) CV would require the same computational resources given the five-times repeated 10-fold CV. Implementing a nested CV would ensure the independence of the test data during hyperparameter tuning, which is particularly important as the predictions will be used for the pedotransfer function.
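A minimal sketch of the suggested nested structure (10-fold outer CV for evaluation, 5-fold inner CV for hyperparameter tuning), again with synthetic data and scikit-learn as a stand-in rather than the authors' caret setup; all names and parameter grids are hypothetical.

```python
# Hedged sketch: nested cross-validation so that each outer test fold is
# never seen during hyperparameter tuning.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=400)

inner = KFold(n_splits=5, shuffle=True, random_state=1)     # tuning folds
outer = KFold(n_splits=10, shuffle=True, random_state=2)    # evaluation folds

tuned = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),  # small forest to keep the sketch fast
    param_grid={"max_features": [5, 10, 20], "min_samples_leaf": [1, 5]},
    cv=inner,
    scoring="r2",
)
# the outer loop evaluates the whole tuning procedure on data it never touched
scores = cross_val_score(tuned, X, y, cv=outer, scoring="r2")
print(f"nested CV R2: {scores.mean():.2f} +/- {scores.std():.2f}")
```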
L. 233: As a recommendation, in case the modeling is repeated, consider that 600 features represent a large number relative to the sample size. Without feature selection, the sample-to-feature ratio is nearly 1:1, which heightens the risk of overfitting.
L. 234: It is unclear how the "pedotransfer function" is implemented. Are the predictions of the other soil properties used as additional features for the second model? If so, are they used according to the same "training fold" splits? Additional code and/or more detailed information would be necessary. Particularly, if there's leakage during hyperparameter tuning or if different "training folds" are used, the results may be too optimistic.
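One leakage-safe way to implement the reading described above, i.e. using predictions of a first soil property as a covariate for a second one, is to generate them as out-of-fold predictions. The following sketch uses synthetic data and hypothetical names and is not based on the authors' code.

```python
# Hedged sketch: out-of-fold predictions of one property (e.g. clay) used as
# an additional covariate for a second property (e.g. SOC), so that no row
# ever receives a prediction from a model trained on that same row.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 25))                      # remote-sensing / terrain covariates
y_clay = X[:, 0] + rng.normal(scale=0.3, size=300)  # first-stage property
y_soc  = 0.6 * y_clay + X[:, 1] + rng.normal(scale=0.3, size=300)  # second-stage target

cv = KFold(n_splits=10, shuffle=True, random_state=0)
# each out-of-fold value comes from a model that did not see that row
clay_oof = cross_val_predict(RandomForestRegressor(random_state=0), X, y_clay, cv=cv)

X_stacked = np.column_stack([X, clay_oof])          # predicted clay as an extra covariate
soc_model = RandomForestRegressor(random_state=0).fit(X_stacked, y_soc)
```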
Section 2.5.3: Were the models evaluated based on the same test folds? Otherwise, comparability is slightly limited and subject to randomness from the splitting.
Models are evaluated based on a non-probability sampling design. Given the large number of samples and the spread in geographic space, this may not be a large issue but could be a point to consider (see Piikki et al. 2021).
Piikki, K., Wetterlind, J., Söderström, M., & Stenberg, B. (2021). Perspectives on validation in digital soil mapping of continuous attributes—A review. Soil Use and Management, 37(1), 7-21.
Results
The study proposes an interesting framework in which various methods are combined. The results are promising given the high R2. However, this alone is still not convincing that the framework is actually “capable”. It would be useful to have a reference performance. E.g., what if only ordinary kriging is used compared to all the different feature engineering during the spatial modelling or what if samples are selected randomly instead of this “complicated” sub-sampling approach based on locality? Evidence that could demonstrate an increase in accuracy with the proposed framework would benefit the paper significantly.
Section 3.2: The sampling design is an integral part of this paper, and the reader is supposed to believe that this new sampling design is more efficient than other common sampling designs. However, this results section does not contextualize the new sampling design against other commonly used sampling designs. Without evidence and comparisons to other sampling designs, it is simply not convincing to the reader that this new design is actually capable of improving predictions.
L. 258 – 259: “A comparison of the frequency distributions of the input data set (grids) with that of the selected sampling locations reveals a high degree of correspondence (Figure 10)”
Is this supposed to be a “good” thing? Random sampling probably does this best. In contrast, with feature coverage, one may even expect deviation from the actual frequency distribution function.
Conclusion
L. 330 – 336: This part feels out of place in the conclusion. The context of using soil property maps prior to pedological fieldwork has not been the subject of this manuscript apart from the first paragraph of the introduction. In order to keep the conclusion more in touch with the actual discussion and results section, the conclusion should not reiterate the first paragraph of the introduction. L. 337 seems to be a much more appropriate start for the conclusion.
L. 338 – 339: “Developing an effective sampling design is one of the most critical aspects of operationalizing soil mapping.”
The biggest drawback of this study is that the “effectiveness” of the proposed sampling design has not been demonstrated; the study merely proposes a new sampling design within a framework for creating soil property maps. Ultimately, there is no evidence that the new sampling design is “effective”.
Additional comments
L. 24: “Study area” instead of “data set”?
L. 113: White space is missing between ”[…] content(Safanelli […]”.
L. 220: Modeling should be uppercase.
Fig. 18 – 22 may be arranged into a single figure, as this allows better evaluation and comparison.
Some figures could be moved to the Appendix because they may be interesting but do not contribute to the results or discussion (e.g. Fig. 12, 17 & 18).
Citation: https://doi.org/10.5194/egusphere-2024-2810-RC2