Estimating soil carbon sequestration potential with mid-IR spectroscopy and explainable machine learning
Abstract. Soil carbon sequestration is the process of capturing atmospheric carbon through plant photosynthesis and storing it in the soil as organic carbon. The primary mechanism of sequestration is the adsorption of organic carbon molecules onto the mineral surfaces of the soil's fine fraction (clay + silt ≤ 20 μm), forming mineral-associated organic carbon (MAOC). Soil has a finite capacity to stabilise and sequester organic carbon, known as its carbon saturation capacity, which depends on the proportion of reactive minerals in the soil. The difference between the current MAOC content and the carbon saturation capacity is the organic carbon saturation deficit (Cdef), or sequestration potential. Fourier-transform infrared (FTIR) spectroscopy in the mid-infrared (mid-IR) range can simultaneously measure the soil properties relevant to carbon stabilisation: organic carbon functional groups, clay and iron-oxide mineralogy, and particle size. We therefore hypothesise that mid-IR spectroscopy can estimate Cdef effectively and accurately. Our aims are to (i) develop spectroscopic models to estimate the MAOC and Cdef of 482 Australian topsoil samples, (ii) model MAOC and Cdef using mid-IR spectra and interpretable machine learning, and (iii) interpret the MAOC and Cdef models using the explainable artificial intelligence (AI) algorithm SHapley Additive exPlanations (SHAP). Using frontier line analysis, we fitted a function to the upper envelope of the MAOC versus clay + silt relationship to derive Cdef. We recorded mid-IR spectra of the samples and used the regression-tree method CUBIST to model MAOC content and Cdef. We interpreted these models by examining the regression trees and by using SHAP. The models were unbiased and estimated MAOC content with an R² of 0.86 and an RMSE of 2.77 g kg⁻¹ soil, and Cdef with an R² of 0.89 and an RMSE of 3.72 g kg⁻¹ soil.
Model interpretation revealed that the Cdef estimates relied on negative interactions with absorptions from organic matter functional groups and positive interactions with absorptions from clay minerals. Our results show that mid-IR spectra can effectively estimate MAOC and soil Cdef, offering a rapid and cost-effective method for assessing and monitoring this critical soil function.
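The frontier line analysis and Cdef derivation summarised in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear frontier in log–log space, uses hypothetical function names and synthetic data, and sets the quantile with `tau` (0.9 here; a 95th-percentile frontier would be `tau=0.95`). For a two-parameter linear model an optimal quantile regression line passes through two observations, so the exact fit can be found by enumerating point pairs, which is practical only for small samples.

```python
import numpy as np


def check_loss(residuals, tau):
    """Pinball (check) loss minimised by quantile regression."""
    return np.sum(np.where(residuals >= 0, tau * residuals, (tau - 1.0) * residuals))


def fit_frontier_line(x, y, tau=0.9):
    """Exact linear quantile regression y ~ a + b * x (hypothetical helper).

    For two parameters, an optimal line interpolates two observations with
    distinct x, so enumerating all such pairs finds the global minimum.
    """
    best_loss, best_a, best_b = np.inf, 0.0, 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # a vertical line cannot be written as y = a + b * x
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            loss = check_loss(y - (a + b * x), tau)
            if loss < best_loss:
                best_loss, best_a, best_b = loss, a, b
    return best_a, best_b


def carbon_deficit(fine_fraction, maoc, a, b):
    """Cdef = saturation capacity on the frontier minus measured MAOC.

    Samples plotting above the frontier receive negative values; whether to
    clip them at zero is left as a modelling choice (assumption).
    """
    capacity = np.exp(a + b * np.log(fine_fraction))  # back-transform from log space
    return capacity - maoc


# Synthetic example (hypothetical data, g/kg soil)
rng = np.random.default_rng(42)
fine = rng.uniform(50.0, 600.0, size=60)             # clay + silt
maoc = 0.05 * fine * rng.uniform(0.2, 1.0, size=60)  # MAOC below a linear ceiling
a, b = fit_frontier_line(np.log(fine), np.log(maoc), tau=0.9)
cdef = carbon_deficit(fine, maoc, a, b)
print(f"log(MAOC_max) = {a:.2f} + {b:.2f} * log(clay + silt)")
```

The pair enumeration keeps the sketch dependency-free; for hundreds of samples a linear-programming or dedicated quantile regression routine would be the practical choice.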
Competing interests: At least one of the (co-)authors is a member of the editorial board of SOIL.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
General comments:
Based on national-scale soil sampling, this manuscript demonstrates the potential of mid-IR spectra and machine learning for predicting MAOC and C deficit. The results show that the CUBIST models perform well for both MAOC and C deficit prediction, supporting their future application. The authors also make these models interpretable by relating absorption features in the mid-IR spectra to the coefficients of the different model rules. Nevertheless, several issues arose during my review that I think should be addressed before publication.
Minor comments:
Line 41: Georgiou et al. used 95th rather than 90th quantile regression. Please check.
Line 116: Was this back-transformation also applied in the uncertainty analysis? Because the authors used a logarithm when fitting the frontier line, the upper and lower uncertainty intervals would differ depending on whether the intervals are calculated first and then back-transformed, or the values are back-transformed first and the intervals then calculated. Please clarify.
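The point about the order of operations can be illustrated numerically. This sketch assumes, hypothetically, that the uncertainty is summarised as mean ± 1.96 standard deviations of a set of log-scale draws (for example, bootstrap replicates of a frontier-line prediction); the function names and values are illustrative only.

```python
import numpy as np


def interval_then_back_transform(log_draws, z=1.96):
    """Compute mean ± z*SD in log space, then back-transform the bounds."""
    m, s = log_draws.mean(), log_draws.std(ddof=1)
    return np.exp(m - z * s), np.exp(m + z * s)


def back_transform_then_interval(log_draws, z=1.96):
    """Back-transform every draw first, then compute mean ± z*SD."""
    draws = np.exp(log_draws)
    m, s = draws.mean(), draws.std(ddof=1)
    return m - z * s, m + z * s


# Hypothetical log-scale draws for one sample (g/kg soil on the original scale)
rng = np.random.default_rng(1)
log_draws = rng.normal(loc=np.log(20.0), scale=0.3, size=5000)

lo1, hi1 = interval_then_back_transform(log_draws)
lo2, hi2 = back_transform_then_interval(log_draws)
print(f"interval, then back-transform: [{lo1:.1f}, {hi1:.1f}]")  # asymmetric around exp(mean)
print(f"back-transform, then interval: [{lo2:.1f}, {hi2:.1f}]")  # symmetric around the mean
```

The first ordering gives bounds that are asymmetric on the original scale; the second gives symmetric bounds that can even fall below zero for wider spreads. Because exp() is monotonic, percentile-based intervals from the same draws would be essentially unaffected by the order, so the distinction matters mainly for mean ± z·SD style intervals.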
Line 124: What specifically are the offset corrections? The SNV transformation is well known in spectroscopy, whereas offset correction can refer to any of a range of mathematical operations on the spectra. Please clarify, or at least provide a reference.
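To make the distinction concrete, the sketch below contrasts SNV, which has a single standard definition, with two hypothetical offset corrections. The manuscript does not state which offset correction was applied; the minimum-subtraction and reference-band variants, the reference band itself and all function names are illustrative assumptions. On the same spectra the two offset variants give different results.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) independently."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def offset_to_minimum(spectra):
    """Hypothetical offset correction: shift each spectrum so that its minimum is zero."""
    return spectra - spectra.min(axis=1, keepdims=True)


def offset_to_reference_band(spectra, wavenumbers, band):
    """Hypothetical offset correction: subtract each spectrum's mean absorbance in a reference band."""
    lo, hi = band
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return spectra - spectra[:, mask].mean(axis=1, keepdims=True)


# Synthetic absorbance spectra (rows = samples, columns = wavenumbers in cm^-1)
wavenumbers = np.linspace(4000.0, 600.0, 1701)
rng = np.random.default_rng(0)
baseline = rng.uniform(0.1, 0.5, size=(3, 1))
peak = np.exp(-((wavenumbers - 2900.0) / 40.0) ** 2)
spectra = baseline + rng.uniform(0.5, 1.5, size=(3, 1)) * peak + 0.05 * wavenumbers / 4000.0

x_snv = snv(spectra)
x_min = offset_to_minimum(spectra)
x_ref = offset_to_reference_band(spectra, wavenumbers, band=(3900.0, 4000.0))  # hypothetical band
```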
Lines 174–176: This result is not intuitive. It is hard to tell whether samples in Rule 3 have higher absorption in the 2946–2850 cm⁻¹ region than those in Rule 4, because the y-axes of the two plots use different scales. Could the authors make this comparison more intuitive (e.g. by plotting both on a common y-axis), so that it better supports the statement?
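One way to make the Rule 3 versus Rule 4 comparison explicit is to summarise absorbance in the 2946–2850 cm⁻¹ region on a single numeric scale for each rule. The sketch below computes that band summary; the rule labels, synthetic spectra and function names are hypothetical.

```python
import numpy as np


def band_mean(spectra, wavenumbers, band=(2850.0, 2946.0)):
    """Mean absorbance of each spectrum (row) within a wavenumber band."""
    lo, hi = band
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return spectra[:, mask].mean(axis=1)


def compare_rules(spectra, wavenumbers, rule_labels, band=(2850.0, 2946.0)):
    """Return {rule: (mean, SD)} of band absorbance, on a common scale for all rules."""
    values = band_mean(spectra, wavenumbers, band)
    return {
        rule: (values[rule_labels == rule].mean(), values[rule_labels == rule].std(ddof=1))
        for rule in np.unique(rule_labels)
    }


# Synthetic example: 10 spectra per rule, with a stronger 2946-2850 cm^-1 band in Rule 3
wavenumbers = np.linspace(4000.0, 600.0, 1701)
rng = np.random.default_rng(7)
rule_labels = np.repeat([3, 4], 10)
band_height = np.where(rule_labels == 3, 0.4, 0.2)[:, None]
peak = np.exp(-((wavenumbers - 2900.0) / 30.0) ** 2)
spectra = 0.3 + band_height * peak + rng.normal(0.0, 0.01, size=(20, wavenumbers.size))

summary = compare_rules(spectra, wavenumbers, rule_labels)
for rule, (mean, sd) in summary.items():
    print(f"Rule {rule}: mean absorbance {mean:.3f} (SD {sd:.3f})")
```

For the figure itself, drawing both rule panels with a shared y-axis (for example, matplotlib's `plt.subplots(..., sharey=True)`) would make the visual comparison direct.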
Line 255: The authors state that they have propagated the uncertainties from the frontier line fits and the CUBIST models to their final predictions. How do the uncertainties of the frontier line fits relate to the uncertainty of the C deficit CUBIST model? The latter is reported only through statistics such as the RMSE of the C deficit model itself, not for CUBIST models of its upper or lower 95 % confidence intervals. The grey areas in Figure 5 do not match the statistical parameters of the C deficit CUBIST model, which suggests that the intervals were not propagated to the final C deficit prediction. Please clarify.
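For reference, a minimal Monte Carlo sketch of what propagating both uncertainty sources into a final Cdef interval could look like. It assumes, hypothetically, that the frontier-line uncertainty for a sample can be summarised as a standard deviation on the derived Cdef, that CUBIST prediction error is represented by the model RMSE (3.72 g kg⁻¹ soil for Cdef, as reported in the abstract), and that both errors are independent and Gaussian. Under these assumptions a propagated interval is wider than one reflecting the CUBIST error alone.

```python
import numpy as np


def propagate_cdef_interval(cdef_pred, frontier_sd, model_rmse, n_draws=100_000, seed=0):
    """Monte Carlo 95 % interval for one Cdef prediction (hypothetical helper).

    Adds independent Gaussian errors for the frontier-line fit and the CUBIST model.
    """
    rng = np.random.default_rng(seed)
    draws = (
        cdef_pred
        + rng.normal(0.0, frontier_sd, n_draws)  # uncertainty in the saturation capacity
        + rng.normal(0.0, model_rmse, n_draws)   # CUBIST prediction error
    )
    return np.percentile(draws, [2.5, 97.5])


# Hypothetical values: predicted Cdef of 10 g/kg soil, frontier SD of 2 g/kg soil
lo_model, hi_model = propagate_cdef_interval(10.0, frontier_sd=0.0, model_rmse=3.72)
lo_both, hi_both = propagate_cdef_interval(10.0, frontier_sd=2.0, model_rmse=3.72)
print(f"CUBIST error only: [{lo_model:.1f}, {hi_model:.1f}] g/kg soil")
print(f"Frontier + CUBIST: [{lo_both:.1f}, {hi_both:.1f}] g/kg soil")
```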