the Creative Commons Attribution 4.0 License.
Reducing Temporal Uncertainty in Soil Bulk Density Estimation Using Remote Sensing and Machine Learning Approaches
Abstract. Soil bulk density (BD), a key physical property affecting soil compaction, porosity, and carbon stock estimation, exhibits considerable spatial and temporal variability. However, current BD estimation methods, especially traditional pedotransfer functions (PTFs), are inherently static and not designed for temporal analysis. This is a significant limitation for soil monitoring across large and heterogeneous regions. In this study, we developed a machine learning (ML) approach integrated with remote sensing data to map and monitor BD across Thailand from 2004 to 2009 at the national scale. We used multispectral indices, topographic variables, climate data, and organic carbon content to train six ML models: Artificial Neural Networks (ANN), Deep Neural Networks, Random Forest, Support Vector Regression, XGBoost, and LightGBM. Model performance was evaluated using in-situ BD measurements from 236 soil samples collected in 2004. For benchmarking purposes, 76 published PTFs were also assessed on the same dataset. Results showed that the ANN model achieved the highest prediction accuracy (R² = 0.986; RMSE = 0.017 g cm⁻³), outperforming both the other ML models and all PTFs. Temporal analysis using the ANN model revealed a 7.27 % increase in mean BD and a 41.23 % reduction in standard deviation between 2004 and 2009, indicating increased soil compaction and reduced variability. Feature importance analysis identified organic carbon, vegetation indices, slope, and temperature as the most influential variables. The resulting high-resolution BD maps captured national-scale spatial and temporal trends and provide a robust foundation for soil quality monitoring, carbon accounting, and sustainable land use planning in tropical agroecosystems.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-2360', Anonymous Referee #1, 14 Sep 2025
- AC1: 'Reply on reviewer RC1', sunantha ousaha, 03 Oct 2025
Dear Reviewer,
Thank you for the opportunity to submit our response regarding the manuscript entitled “Reducing Temporal Uncertainty in Soil Bulk Density Estimation Using Remote Sensing and Machine Learning Approaches” to SOIL. We sincerely appreciate the time and effort you and the reviewers have dedicated to evaluating our work and providing constructive and insightful feedback. We are grateful for the valuable comments and suggestions, which have significantly helped improve the quality and clarity of our manuscript.
While we are not submitting a revised manuscript at this stage, we are pleased to provide a detailed point-by-point response to your valuable and constructive comments. Your feedback has greatly contributed to improving the clarity and scientific quality of our work.
Please find our response attached as a PDF file.
- RC2: 'Comment on egusphere-2025-2360', Anonymous Referee #2, 04 Oct 2025
This manuscript presents a comprehensive and timely study on the estimation of soil bulk density (BD) using machine learning (ML) and remote sensing data, with a specific focus on temporal changes in Thailand between 2004 and 2009. The authors are to be commended for the extensive comparison undertaken, evaluating six different ML models against a very large benchmark of 76 published pedotransfer functions (PTFs). The use of Bayesian Optimization for hyperparameter tuning represents a rigorous and state-of-the-art approach. The paper is well-structured, the research question is significant, and the results, if validated, would be a valuable contribution to the fields of soil science, remote sensing, and land management.
However, there are several major concerns, primarily methodological, that must be addressed before the manuscript can be considered for publication. The most critical issue is a fundamental contradiction between the model development/validation and its application for temporal analysis, which currently undermines the paper's main conclusions regarding temporal trends.
Major Comments:
The title is misleading because the temporal variation of BD is treated in only one subsection (Section 3.5). I therefore suggest modifying the title accordingly.
The contradiction in how the ANN model is applied for the 2009 predictions is the most significant concern. The authors establish that the Artificial Neural Network (ANN) model is superior to other models, including tree-based methods like Random Forest and XGBoost. The feature importance analysis (Section 3.4, Figure 6) is key to this conclusion, showing that the ANN model uses a balanced set of predictors (slope, temperature, vegetation indices, etc.) and does not overly rely on Organic Carbon (OC). This is presented as a major strength, making the model more robust and generalizable. However, when applying this model to the 2009 dataset for temporal analysis, the authors state: "...utilizing only OC data as the sole predictor for BD, as no ground-truth BD measurements were available for validation in that year". This is a critical methodological flaw. The validated high-performance ANN model is a multivariate model that relies on a suite of remote sensing, topographic, and climate inputs. It cannot be applied using only a single input variable (OC). The authors need to clarify precisely how the 2009 predictions were made. Did they train a new, univariate ANN model using only OC? If so, its performance is unknown and unvalidated, and it cannot be claimed to be the "best-performing model". Did they apply the original multivariate model but feed it only OC data, with placeholder values (e.g., zero, mean) for all other inputs? This would be invalid and produce meaningless results. As it stands, the entire temporal analysis (Section 3.5), including the reported 7.27% increase in mean BD and the 41.23% reduction in standard deviation, is not supported by the methodology. The conclusions about increased soil compaction and reduced variability are therefore unsubstantiated. The authors must either provide the full suite of predictor variables for 2009 and re-run the analysis or retract the temporal claims.
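The placeholder concern can be illustrated with a minimal sketch. It uses a toy linear model as a stand-in for a multivariate BD predictor; the coefficients, predictor ranges, and function names are hypothetical and are not taken from the manuscript or its ANN. The sketch only shows that replacing all predictors except OC with zeros or column means shifts the predictions away from those obtained with the full inputs.

```python
import math
import random

# Hypothetical coefficients for a toy linear stand-in for a multivariate BD model.
# These are illustrative assumptions, not values from the manuscript.
WEIGHTS = {"oc": -0.15, "slope": -0.01, "temp": 0.02}
INTERCEPT = 1.0


def predict_bd(site):
    """Predict BD (g cm^-3) from all predictors of one site."""
    return INTERCEPT + sum(WEIGHTS[name] * site[name] for name in WEIGHTS)


def rmse(a, b):
    """Root mean square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def make_sites(n, seed=0):
    """Generate synthetic sites with assumed predictor ranges (seeded for repeatability)."""
    rng = random.Random(seed)
    return [
        {
            "oc": rng.uniform(0.2, 4.0),      # organic carbon, %
            "slope": rng.uniform(0.0, 30.0),  # slope, degrees
            "temp": rng.uniform(22.0, 30.0),  # temperature, deg C
        }
        for _ in range(n)
    ]


def with_placeholders(sites, keep, fill):
    """Copy sites, replacing every predictor except `keep` with fill[name]."""
    return [
        {name: (site[name] if name == keep else fill[name]) for name in site}
        for site in sites
    ]


sites = make_sites(200)
reference = [predict_bd(s) for s in sites]  # predictions with the full input suite

zeros = {name: 0.0 for name in WEIGHTS}
means = {name: sum(s[name] for s in sites) / len(sites) for name in WEIGHTS}

rmse_zero = rmse(reference, [predict_bd(s) for s in with_placeholders(sites, "oc", zeros)])
rmse_mean = rmse(reference, [predict_bd(s) for s in with_placeholders(sites, "oc", means)])

print(f"Deviation from full-input predictions, zero fill: {rmse_zero:.3f} g cm^-3")
print(f"Deviation from full-input predictions, mean fill: {rmse_mean:.3f} g cm^-3")
```

In this toy setting, both placeholder strategies deviate from the full-input predictions, and any spatial variation carried by the replaced predictors is lost. A nonlinear model such as an ANN could respond less predictably still.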
Equation (1) on page 3, which calculates bulk density, is incorrect: the multiplication by 100 should not be there. Soil bulk density is a measure of mass per unit volume, with standard units of g cm−3. Multiplying by 100 would make the values physically meaningless (e.g., the reported mean of 1.28 g cm−3 would become 128). This appears to be a significant typo that should be corrected. Please verify whether this error propagated into any calculations or is merely a display error in the formula.
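For reference, the standard core-method definition of bulk density contains no percentage factor. The symbols below are assumed for illustration and are not taken from the manuscript: m_dry is the oven-dry soil mass in g and V_core is the sampled core volume in cm³.

```latex
\rho_b = \frac{m_{\mathrm{dry}}}{V_{\mathrm{core}}}
\quad [\mathrm{g\,cm^{-3}}],
\qquad
1.28\ \mathrm{g\,cm^{-3}} \times 100 = 128\ \mathrm{g\,cm^{-3}}
\quad \text{(far outside the physically plausible range for soil)}
```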
Minor Comments
- Provide more information on satellite images: how many satellite products did you use in the data analysis after pre-processing? What’s the satellite overpass frequency?
- In Section 2.8, temporal uncertainty is quantified as the absolute difference in standard deviations between the two years (U=∣σ2009−σ2004∣). While this measures the change in the variability or dispersion of BD predictions, it is not a standard definition of model uncertainty (which typically refers to prediction intervals or confidence in the estimates). The authors should consider rephrasing this to "change in spatial variability" to avoid confusion with predictive uncertainty.
- The correlation matrix (Figure 2) shows an exceptionally strong negative correlation between OC and BD (r = −0.92). This suggests that OC by itself explains over 84% (R² ≈ 0.85) of the variance in BD in the 2004 dataset. This may limit the generalizability of the findings to regions where this relationship is less dominant. It would be beneficial for the authors to briefly discuss this in the context of their dataset and how it might influence model performance comparisons.
- Set an objective benchmark for evaluating the RMSE obtained from the different models. De Vos et al. (2005) established that prediction performance is satisfactory for RMSE less than 0.25 g cm−3 (Palladino et al., 2022).
- Figures: In the caption for Fig. 1b, please clarify what the colorbar refers to. The red circles in the texture triangle appear simply to indicate the soil samples, but there are also orange circles and colored texture classes, which is confusing. In Fig. 2, the authors should add a colorbar title (“correlation”). The caption for Figure 4 reads "Loss function curves for neural network regression models... and learning curves for other machine learning models...". However, Figure 4 shows only these curves; the scatterplots are in Figure 5. The caption for Figure 4 should be corrected to describe only its own content. In Figs. 4 and 5, increase the font size of the axis tick labels and add a grid.
- Tables: In Table 2, replace “TPFs” with “PTFs”. In Table 3, please add the coefficient of variation (CV). In Table 6, please specify how many data points were used for calibration (training) and how many for validation (testing).
- Inconsistent units in text: The manuscript occasionally uses inconsistent formatting for units. For example, in the abstract, the RMSE is given as "0.017 g cm³", while in the ML performance table (Table 6), the MAE for the ANN model is listed as "0.012" without clear units in the text, and elsewhere as "0.012 g cm→". Please ensure consistent formatting (g cm−3) throughout the manuscript for clarity and professionalism.
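Three of the quantities discussed in the minor comments above can be sketched in a few lines: the variance share implied by r = −0.92, the |σ2009 − σ2004| quantity from Section 2.8, and a check against the 0.25 g cm−3 RMSE level. The function names and the small BD lists are hypothetical, and the sample (n − 1) standard deviation is an assumption, since the comments do not say which estimator the manuscript uses.

```python
import statistics


def variance_explained(r):
    """Share of variance explained by one linear predictor with Pearson correlation r."""
    return r ** 2


def change_in_spatial_variability(bd_first, bd_second):
    """|sigma_second - sigma_first|, using the sample standard deviation (an assumption)."""
    return abs(statistics.stdev(bd_second) - statistics.stdev(bd_first))


def meets_rmse_benchmark(rmse, threshold=0.25):
    """True if RMSE (g cm^-3) is below the benchmark level (default 0.25 g cm^-3)."""
    return rmse < threshold


# r = -0.92 is the OC-BD correlation quoted from Figure 2.
print(round(variance_explained(-0.92), 4))

# Hypothetical BD predictions (g cm^-3) for two years, for illustration only.
bd_2004 = [1.0, 1.2, 1.4, 1.6]
bd_2009 = [1.3, 1.4, 1.5, 1.6]
print(round(change_in_spatial_variability(bd_2004, bd_2009), 4))

# RMSE values quoted in the discussion: ANN (0.017) and PSOC8 (6.273).
print(meets_rmse_benchmark(0.017), meets_rmse_benchmark(6.273))
```

The second function measures a change in the spread of predictions between two dates. It carries no information about prediction intervals, which is why the comment on Section 2.8 suggests the label "change in spatial variability".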
Recommendation
I recommend Major Revisions.
The manuscript addresses an important research topic and employs a robust and extensive comparative framework. The rigorous hyperparameter tuning and the scale of the PTF benchmark are significant strengths. However, the foundational contradiction in the methodology for the 2009 temporal analysis is a critical flaw that invalidates the paper's primary conclusions regarding temporal trends in soil bulk density.
The authors must resolve this issue by either providing a valid methodology for the 2009 predictions using the full multivariate ANN model or by reframing the paper to focus solely on the 2004 model comparison, removing the temporal analysis. Additionally, the error in the BD formula and the other minor points should be addressed. If the authors can satisfactorily resolve these major concerns, the revised manuscript would, in my opinion, likely be suitable for publication.
REFERENCES
De Vos, B., Van Meirvenne, M., Quataert, P., Deckers, J., Muys, B., 2005. Predictive quality of pedotransfer functions for estimating bulk density of forest soils. Soil Sci. Soc. Am. J. 69 (2), 500–510.
Palladino, M., Romano, N., Pasolli, E., Nasta, P., 2022. Developing pedotransfer functions for predicting soil bulk density in Campania Region. Geoderma 412, 115726. https://doi.org/10.1016/j.geoderma.2022.115726
Citation: https://doi.org/10.5194/egusphere-2025-2360-RC2
Viewed
- HTML: 636
- PDF: 86
- XML: 16
- Total: 738
- BibTeX: 15
- EndNote: 26
This paper presents an approach to estimate soil bulk density (BD) from soil data, environmental data, and remote sensing data. Classical pedotransfer functions (PTFs) are compared with ML methods that also include remote sensing data to predict soil bulk density. An important result of the study is that soil organic matter or soil organic carbon is the most important input variable of PTFs, and PTFs that use only these input variables perform best. As a consequence, these PTFs relate changes in BD over time only to changes in soil organic matter or carbon. ML approaches also include other variables that could be linked to land use and land management. This improves the prediction of BD compared to the classical PTFs. However, the extent to which these extra variables influence the BD estimates depends on the type of method that is used. A difference in sensitivity to different variables affects how predictions of changes in BD respond to changes in input variables over time. Comparing the predicted distributions of BD in 2009 with those of 2004, it seems that both PTFs and ML methods predict similar changes, although there are some differences. The importance or relevance of these differences was not very clear. Furthermore, since no measurements of BD in 2009 were available, it was not possible to verify whether changes in BD were predicted more accurately using the ML method. Given this lack of validation, the authors should give other evidence that demonstrates the additional value of the ML approach they propose. For example, can the differences between the changes in BD predicted by the ML and PTF approaches be related to independent information on management, etc.? What is the correlation between the changes in BD predicted by the two approaches?
Since the change in BD is probably small compared to its spatial variability, it would be interesting to know whether the two approaches predict similar spatial patterns of the change and how these patterns of change are related to which input variables. I think this additional information is needed to give the paper more relevance.
Detailed comments:
Ln 26: ‘Surface BD is dominated factor’ Change to ‘Surface BD is a dominating factor…’
Ln 31: A reference would be needed here.
Ln 32: ‘Pedotransfer Functions (PTFs) have long been used to estimate BD by predicting soil properties based on readily available soil attributes.’ This sentence has a strange structure. Delete ‘by predicting soil properties’.
Ln 50: ‘In contrast, vis–NIR spectra from spectroscopy did not show significant differences in performance compared to PTFs-based models, but were still superior (Katuwal et al., 2020).’ This is contradictory.
Ln 58: ‘leading to issues such as overestimation’ I think this is one specific issue but not a general problem.
Ln 83: ‘Additionally, soil samples with OC data collected in 2009 were used for model implementation.’ How many?
Ln 84: ‘These samples included measurements of’ Which samples? The ones collected in 2004 and in 2009?
Eqs 1 and 2 are nearly identical. Eq 1 can be skipped. Eq 3 is trivial and can be skipped as well.
Figure 1: the color scale of the histogram does not match with that in the figure.
Ln 83: temperate climate: shouldn’t it be tropical climate?
Ln 111: weighted median. Which weights were used?
Ln 135: ‘Root Mean Square Error (RMSE)’ with respect to what? The 2004 BD measurements?
Ln 270 ‘as no ground-truth BD measurements were available for validation in that year’ The main purpose of the study was to investigate if the change in BD over time could be derived using the PTFs. To my understanding, that would require sampling of BD over time.
Ln 272: The 2009 dataset comprised 76,089 soil samples, containing OC percentages at a depth of 30 cm. Were sites where samples were taken in 2004 revisited in the 2009 campaign?
Ln 286: If you want to investigate changes of a variable in time, you best observe the parameter at the same location. Then you do a paired t-test.
Ln 287 𝜇2009 and 𝜎2004 should be 𝜎2009 and 𝜎2004
Ln 321: ‘In contrast, the poorest-performing model, PSOC8, exhibited an RMSE of 6.273 g cm⁻³, highlighting significant predictive errors (Fig. 4).’ This RMSE is far beyond the maximum BD of soils. Could the wrong units have been used for the input or output variables?
Ln 398: ‘This increase may be attributed to factors such as intensified land management practices, reduced soil organic matter, or increased soil compaction over time.’ It would be important to discuss how intensified land management practices and soil compaction are related to variables that are used as input in the ANN.
Ln 400: ‘The minimum BD values also showed a substantial increase, rising from 0.12 g cm⁻³ in 2004 to 0.95 g cm⁻³ in 2009’ If the same sites were not visited, a comparison between the extremes is not very informative.
Ln 407: ‘This transformation suggests a reduction in the occurrence of extreme BD values and a more balanced distribution of BD by 2009.’ See my comment above.
Ln 447 ‘Our findings show that OC-based PTFs, while exhibiting strong alignment with the RS-ANN model in mean BD values, displayed higher variability and greater prediction uncertainty, particularly in regions with fluctuating organic matter content’ Where is that shown?
Ln 458: ‘where extreme BD values can lead’ Do you mean extreme OM?
Ln 551: 4) ‘Skewness and kurtosis analyses revealed that the RS-ANN model improved from a highly skewed distribution in 2004 (skewness = -2.81, kurtosis = 15.37) to a more balanced distribution in 2009 (skewness = -0.58, kurtosis = -0.41).’ If I understand it correctly, this is the skewness of the distribution of predicted BD, and not of the distribution of the difference between observed and predicted BD. It is interesting to note that the ANN and PTFs predict different distributions.
Ln 553: ‘In contrast, PTFs continued to show high skewness and kurtosis, indicating persistent prediction errors for outliers.’ This statement is not in line with the results shown in table 7.
Ln 555: ‘5) The RS-ANN model demonstrated broader applicability across diverse soil types and land uses compared to traditional PTFs and OC-dominant ML models.’ Where is this shown?
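The paired comparison suggested in the comment on Ln 286 can be sketched as follows. This is a minimal standard-library version that returns only the t statistic and degrees of freedom; the function name and the BD values for five revisited sites are hypothetical and serve only as an illustration.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on site-matched values."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("at least two pairs are required")
    # Work on per-site differences, so between-site variability cancels out.
    diffs = [a - b for b, a in zip(before, after)]
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical BD (g cm^-3) at the same five sites in two campaigns.
bd_2004 = [1.20, 1.31, 1.15, 1.40, 1.28]
bd_2009 = [1.27, 1.35, 1.24, 1.44, 1.33]

t_value, dof = paired_t_statistic(bd_2004, bd_2009)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The test is only applicable when each 2009 value comes from the same location as its 2004 counterpart, which is the point of the question on Ln 272 about revisited sites. Where SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns a p-value.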