the Creative Commons Attribution 4.0 License.
Deep learning with a multitask convolutional neural network to generate a national-scale 3D soil data product: Particle size distribution of the German agricultural soil-landscape
Abstract. Many soil functions and processes are controlled by the soil particle size distribution. The generated three-dimensional continuous data product, which covers the particle size fractions of sand, silt, and clay in the agricultural soil-landscape of Germany, has a spatial resolution of 100 m and a depth resolution of 1 cm. This product is an important component for predicting the effects of agricultural management practices and their adaptability to climate change, as well as for analyzing soil functions and numerous risks. The effectiveness of the convolutional neural network (CNN) algorithm in producing multidimensional, multivariate data products is demonstrated. Even though the potential of this deep learning approach to understand and model the complex soil-landscape relationship is virtually limitless, limitations are data-driven. Further research is needed to assess the required complexity and depth of the CNN and the inclusion of the landscape surrounding each soil profile.
Status: closed

CC1: 'Comment on egusphere-2023-2386', Philippe Lagacherie, 08 Dec 2023
This paper describes the prediction of particle size fractions (clay, silt, and sand) over Germany using a CNN algorithm trained on around 3,000 locations with measured soil properties and classical soil covariates. There have been many nationwide applications of Digital Soil Mapping in the past, and already some applications of CNNs in Digital Soil Mapping. I therefore consider the novelty of this paper to be rather poor. However, I notice that the authors used a genetic algorithm to tune the hyperparameters of their CNN, which could perhaps constitute a novelty for the application of CNNs in Digital Soil Mapping if the added value of this preprocessing step were clearly demonstrated, which is not the case in the present version.
Contrary to what the authors argue in the conclusion, I do not think that the efficiency of the CNN algorithm is clearly demonstrated by the results presented. Indeed, the CNN does not improve on the performance obtained earlier on the same dataset (topsoil particle size fractions) with a simpler-to-use learning algorithm (gradient boosted trees, Gebauer et al.), and it yields very poor prediction performance for particle size fractions beyond 30 cm depth (Figure 3, bottom line).
Furthermore, I have some additional questions and comments on the text:
L94: Explain how the horizon boundaries were taken into account in the vertical sampling scheme.
L173: I do not understand why the authors fed their CNN with soil observations containing 100 soil layers of 1 cm, whereas soil particle fractions were only measured at 5 depth intervals. This needlessly overloads the CNN without adding significant information. It consequently increases the number of parameters and hampers the convergence of the algorithm toward a satisfactory prediction.
L187: A thorough presentation of the extent of missing data per soil property and soil depth increment is necessary. In my experience, the amount of missing data generally increases with depth (soils are not all 100 cm thick). This could also explain why the prediction performance collapses beyond 30 cm.
L190: I disagree with this statement. It is quite easy to find transnational covariates, as shown by the number of papers presenting continental or global applications of DSM.
L206: "All predictors were recoded into dummy variables": is the similarity between the categorical values not taken into account?
L238: No data augmentation? It could easily be done by rotating or mirroring the windows.
L245-247: As a pedologist, I am very surprised to read that 44.8% of the soil observations sampled in Germany are polyphasic soils with more than one parent material. Please have this information checked by an experienced soil scientist.
L247: A table showing the main statistical indicators of the distribution of soil properties (mean, variance, min, max, etc.) and histograms would be more informative than Figure 2. In particular, we need to know the variance to interpret the RMSEs given later.
L276: RMSE should be complemented by other prediction performance indicators such as R², the Model Efficiency Coefficient (MEC), or LCCC. Some scatterplots of measured versus predicted soil properties should be added to give more insight into the behaviour of the model.
L284: The bottom line curves clearly show stair-step shapes. Any interpretation of that?
L287-291: This interpretation does not convince me. Normally, the P covariate should be more closely related to deep horizons than to superficial ones, as the former are expected to be closer to the parent rock described in geological databases. Consequently, the soil property predictions should be better for deep horizons if the limiting factor were the P covariate.
Citation: https://doi.org/10.5194/egusphere-2023-2386-CC1
AC1: 'Reply on CC1', Mareike Ließ, 14 Dec 2023
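The comment on L276 asks for indicators beyond RMSE. As a minimal sketch of how the three named indicators relate, assuming the MEC is defined as 1 - SSE/SST and the LCCC is Lin's concordance correlation coefficient computed with population (1/n) moments, they could be computed as follows. The function names and the clay values are illustrative and are not taken from the manuscript.

```python
import math


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mec(obs, pred):
    """Model Efficiency Coefficient, assumed here to be 1 - SSE / SST.

    1 means a perfect fit; 0 means no better than predicting the mean of obs.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def lccc(obs, pred):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical clay contents in percent, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(rmse(observed, predicted), 3))
print(round(mec(observed, predicted), 3))
print(round(lccc(observed, predicted), 3))
```

Because the MEC divides the squared error by the variability of the observations, it also illustrates the related comment on L247: an RMSE is hard to interpret without knowing the variance of the measured property.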

RC1: 'Comment on egusphere-2023-2386', Philippe Lagacherie, 28 Dec 2023
This paper describes the prediction of particle size fractions (clay, silt, and sand) over Germany using a CNN algorithm trained on around 3,000 locations with measured soil properties and classical soil covariates. There have been many nationwide applications of Digital Soil Mapping in the past, and already some applications of CNNs in Digital Soil Mapping. I therefore consider the novelty of this paper to be rather poor. However, I notice that the authors used a genetic algorithm to tune the hyperparameters of their CNN, which could perhaps constitute a novelty for the application of CNNs in Digital Soil Mapping if the added value of this preprocessing step were clearly demonstrated, which is not the case in the present version.
Contrary to what the authors argue in the conclusion, I do not think that the efficiency of the CNN algorithm is clearly demonstrated by the results presented. Indeed, the CNN does not improve on the performance obtained earlier on the same dataset (topsoil particle size fractions) with a simpler-to-use learning algorithm (gradient boosted trees, Gebauer et al.), and it yields very poor prediction performance for particle size fractions beyond 30 cm depth (Figure 3, bottom line).
Furthermore, I have some additional questions and comments on the text:
L94: Explain how the horizon boundaries were taken into account in the vertical sampling scheme.
L173: I do not understand why the authors fed their CNN with soil observations containing 100 soil layers of 1 cm, whereas soil particle fractions were only measured at 5 depth intervals. This needlessly overloads the CNN without adding significant information. It consequently increases the number of parameters and hampers the convergence of the algorithm toward a satisfactory prediction.
L187: A thorough presentation of the extent of missing data per soil property and soil depth increment is necessary. In my experience, the amount of missing data generally increases with depth (soils are not all 100 cm thick). This could also explain why the prediction performance collapses beyond 30 cm.
L190: I disagree with this statement. It is quite easy to find transnational covariates, as shown by the number of papers presenting continental or global applications of DSM.
L206: "All predictors were recoded into dummy variables": is the similarity between the categorical values not taken into account?
L238: No data augmentation? It could easily be done by rotating or mirroring the windows.
L245-247: As a pedologist, I am very surprised to read that 44.8% of the soil observations sampled in Germany are polyphasic soils with more than one parent material. Please have this information checked by an experienced soil scientist.
L247: A table showing the main statistical indicators of the distribution of soil properties (mean, variance, min, max, etc.) and histograms would be more informative than Figure 2. In particular, we need to know the variance to interpret the RMSEs given later.
L276: RMSE should be complemented by other prediction performance indicators such as R², the Model Efficiency Coefficient (MEC), or LCCC. Some scatterplots of measured versus predicted soil properties should be added to give more insight into the behaviour of the model.
L284: The bottom line curves clearly show stair-step shapes. Any interpretation of that?
L287-291: This interpretation does not convince me. Normally, the P covariate should be more closely related to deep horizons than to superficial ones, as the former are expected to be closer to the parent rock described in geological databases. Consequently, the soil property predictions should be better for deep horizons if the limiting factor were the P covariate.
Citation: https://doi.org/10.5194/egusphere-2023-2386-RC1
AC2: 'Reply on RC1', Mareike Ließ, 03 Jan 2024
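The comment on L238 suggests augmenting the training data by rotating and mirroring the covariate windows. A minimal sketch of that idea, assuming each window is a square grid of values for one covariate with the soil profile at the centre cell, is given below. The helper names are hypothetical, and nested lists are used only to keep the example free of dependencies.

```python
def rotate90(window):
    """Rotate a 2D window (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*window[::-1])]


def mirror(window):
    """Flip a 2D window left to right."""
    return [row[::-1] for row in window]


def augment(window):
    """Return the 8 rotated and mirrored variants of a square window.

    The first entry is a copy of the original window.
    """
    variants = []
    current = [list(row) for row in window]
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rotate90(current)
    return variants


# Hypothetical 3 x 3 window of one covariate; the profile sits at the centre.
window = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
for variant in augment(window):
    print(variant)
```

For windows with an odd edge length the centre cell stays in place, so the measured soil properties at the profile can be reused as the target for every variant. Direction-dependent covariates such as aspect would not remain valid under these transformations and would need separate handling.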

RC2: 'Comment on egusphere-2023-2386', Anonymous Referee #2, 08 Jan 2024
The paper uses a CNN algorithm in a DSM exercise in Germany, using a relatively large collection of soil profiles. In general, it is a well-written manuscript, but it would benefit from less "data science" jargon and more consistent citations.
My main concern with the manuscript is that it fails to demonstrate how the approach is more effective (as stated in the abstract and conclusions). The authors only provide results for two CNN variations without comparing them with conventional DSM models (without spatial context), and they obtain inferior performance compared to previous studies using the same dataset. In addition, they use 1 cm slices instead of a depth function, stating that this is better but without showing any results to support it.
Specific comments
 The abstract needs more work. It reads like the summaries for non-experts that some journals require.
 L56: Behrens et al. (2018) did not use a CNN.
 L61: You are talking about CNNs applied in the context of soil mapping but, again, some of the references are not related to that (Behrens et al. (2010) and Behrens et al. (2014)). Considering that the list is not very long, you are missing some references.
 L68: I am not sure that this is true. Most of the publications I have seen include some hyperparameter optimisation, such as grid or random search.
 L73: Is the 3D model often worse than the 2D one? Any reference?
 L125: You used coordinates as covariates, hoping to represent spatial patterns that other covariates do not capture. What kind of patterns would that be?
 L135-146: I understand that you are trying to summarise a lot of concepts in a single paragraph, but it does not read well and it is very inaccurate. E.g. the description of the learning rate is very simplistic. A high value does not always speed up the learning process, and a low value does not always ensure that the network succeeds in learning the predictor-response relation. In general, I understand what you mean because I have worked with CNNs, but another reader will not get any value from this.
 L150-162: A lot of "data science" jargon here. Also, you are constantly mixing CNNs, CNNs applied to spatial modelling, and the specific CNN architecture that you use. Please do not mix them all in one paragraph. For instance, towards the end you mention that "the output is flattened before it enters a sequence of dense layers". That is specific to your CNN, but the text reads as if it were true for all CNNs.
 L167: "CNNs cannot handle this type of input". I am OK with your pragmatic approach of limiting the window size to avoid missing data, but CNNs can handle missing data. I assume that you are specifically talking about missing data represented by the float "NA".
 Section 2.6: A lot of "new" genetic algorithm jargon. Islands? Migration? I do not think GA is common enough to skip those concepts. The reader would benefit from a brief introduction to the algorithm that you used.
 Section 2.7: No reference to the method? It sounds like an ad hoc implementation of Shapley values, but you do not specify any of the details. Number of permutations? All the predictors simultaneously? Please add more details.
 Table 3: I have seen kernels with an even number of pixels (e.g. 2x2) in a couple of DSM publications and have still not seen a justification for why an asymmetrical convolution would be desirable (they introduce aliasing errors). That is why, in signal processing, kernel operations such as convolutions are often preferred to be symmetrical (e.g. 3x3). You need to be careful when defining the search space for your hyperparameter tuning.
 Section 3.2: You are missing information about the convergence of the GA optimisation. Also, did you get any insights from this process? You have a population of 500 individuals, and assuming that you ran it for at least 20 generations, you trained 10,000 models. In my experience, that takes much longer than a well-defined grid search. For instance, a dropout rate of 0.1019826 (from your 3D, 5 cells model) is not different from a dropout rate of 0.1.
 L274: I would not call that "uncertainty of the model predictions".
 L306: Did you try it and observe artifacts? It is quite common to assign -1 or other values to missing data.
 L309-312: You mention that your method does not introduce additional uncertainty (compared to standard interval methods such as the equal-area spline), but that is not necessarily true. Since you subdivided into 1 cm slices, I assume you have the same value for each slice within the original layer (e.g. 10 slices with the same clay content within a 0-10 cm layer). That procedure is also a depth function, but one defined by you instead of fitted to data. If you could show that this method is actually better than the traditional DSM approach, it would be a valuable contribution.
 Section 3.4: Interesting that the model mostly uses categorical covariates. How many of the 119 predictors are "dummy" classes? Did you normalise/standardise the continuous covariates?
Citation: https://doi.org/10.5194/egusphere-2023-2386-RC2
AC3: 'Reply on RC2', Mareike Ließ, 09 Jan 2024
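The comment on L309-312 argues that repeating one measured value across all 1 cm slices of a sampled layer is itself a depth function, namely a step function. The sketch below illustrates that reading under the referee's stated assumption; it is not the authors' code, and the function name, the layer tuples, and the clay values are hypothetical.

```python
def expand_to_slices(layers, max_depth_cm=100):
    """Expand (top_cm, bottom_cm, value) layers to one value per 1 cm slice.

    Slice i covers the depth interval from i to i + 1 cm. Depths that are not
    covered by any sampled layer stay None (missing).
    """
    slices = [None] * max_depth_cm
    for top, bottom, value in layers:
        for depth in range(max(top, 0), min(bottom, max_depth_cm)):
            slices[depth] = value
    return slices


# Hypothetical profile: clay content in percent for two sampled layers.
profile = [(0, 10, 20.0), (10, 30, 25.0)]
slices = expand_to_slices(profile)
print(slices[:12])
print(slices.count(None))
```

In this form the 100 slices carry no more information than the original layers, which is also the concern raised in the comment on L173, and the trailing missing slices show how gaps grow with depth when a profile is shallower than 100 cm.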
Viewed
 HTML: 261
 PDF: 132
 XML: 35
 Total: 428
 BibTeX: 18
 EndNote: 20