PeatDepth-ML: A Global Map of Peat Depth Predicted using Machine Learning
Abstract. Peatlands are major carbon stores that are sensitive to climate change and increasingly affected by human activity. Accurate assessment of carbon stocks and modelling of peatland responses to future climate scenarios require robust information on peat depth. We developed PeatDepth-ML, a machine learning framework that predicts global peat depths using a comprehensive database of peat depth measurements for training and validation. Building on an existing framework for mapping peatland extent, we incorporated new environmental datasets relevant to peat formation, revised cross-validation procedures, and introduced a custom scoring metric to improve predictions of deep peat deposits. To evaluate model sensitivity to sampling bias inherent in the training data, we applied a bootstrapping approach. Model performance, assessed using a blocked leave-one-out approach, yielded a root mean square error of 70.1 ± 0.9 cm and a mean bias error of 2.1 ± 0.7 cm, on par with or better than previously published models. The global map produced by PeatDepth-ML predicts a median peat depth of 134 cm (IQR: 87–187 cm) over areas with more than 30 cm of peat. Like other regression-based models, PeatDepth-ML tended to predict toward mean training depths. An area of applicability analysis suggests the model is applicable across most of the globe, with the exception of some coastal areas and several mountainous regions such as the Andes and the highlands of Borneo and New Guinea. Predictor selection was highly sensitive to the training data subsets that arose from the bootstrapping approach, occasionally resulting in regional variations in accuracy. The bootstrapping approach and our area of applicability analysis thus clearly demonstrate the prime importance of quality training data in data-driven approaches like PeatDepth-ML. Using our predicted peat depth map, together with peatland extent and literature-derived estimates of bulk density and organic carbon content, we estimate global peat carbon stocks at 327–373 Pg C, consistent with previous global estimates.
The authors present PeatDepth-ML, a machine-learning framework for predicting global peat depth using a large compilation of peat depth measurements and environmental covariates. They extend existing peatland mapping approaches by incorporating additional predictors, revised spatial cross-validation, a custom metric targeting deep peat, and a bootstrapping strategy to assess sensitivity to sampling bias. Model performance is evaluated with blocked leave-one-out validation, and the resulting global peat depth map is used to estimate global peat carbon stocks, which are found to be consistent with previous studies.
I think the work is relevant for the journal and generally well executed, though some revisions are in order prior to publication. I give a detailed list of comments below. Thank you for your work.
Detailed comments:
Lines 49 and 66: "machine learning" --> use the abbreviation "ML" (defined at first use).
Line 92, Figure A1: I think Figure A1 is quite important, as it presents the peat data distributions. Why not include it in the main text instead of the appendix?
Line 97: "However, grid cells with zero peat depth consistently dominate..." --> explicitly state the percentage of grid cells with zero peat depth, as it is the substantial majority of the data. I think it is good to state this, since the data are quite, though naturally, imbalanced.
Line 185: "machine learning" --> "ML"
Line 189: Which hyperparameters were optimized? I did not see them listed.
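Even a short list of the tuned parameters and their search ranges (in the text or an appendix table) would help reproducibility. Purely for illustration, a random search over a LightGBM-style model might look like the sketch below; the parameter names and ranges are my assumptions, not necessarily your settings:

```python
# Illustrative only: a typical LightGBM search space and random search.
# The ranges below are assumptions, not the authors' actual settings.
from lightgbm import LGBMRegressor
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "num_leaves": [15, 31, 63, 127],       # tree complexity
    "learning_rate": [0.01, 0.05, 0.1],    # shrinkage
    "n_estimators": [200, 500, 1000],      # boosting rounds
    "min_child_samples": [10, 20, 50],     # leaf-level regularization
    "subsample": [0.7, 0.9, 1.0],          # row subsampling
    "colsample_bytree": [0.7, 0.9, 1.0],   # feature subsampling
}

search = RandomizedSearchCV(
    LGBMRegressor(objective="regression"),
    param_distributions,
    n_iter=50,
    scoring="neg_root_mean_squared_error",
    cv=5,  # the manuscript uses spatially blocked CV rather than plain k-fold
)
# search.fit(X, y)  # X, y: predictor matrix and observed peat depths
```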
Line 192: "cross validation" --> "cross-validation"
Line 205: "don't" --> "do not"
Line 209: Add a reference for LightGBM (e.g., Ke et al., 2017) and spell the term out in full (Light Gradient Boosting Machine) at first mention. Let's not assume the reader knows all the abbreviations by default.
Line 247: Did you state anywhere how many predictors were available in total for the ML runs? I would be curious to know this.
Figures 8 and A1: I am not used to histograms or distributions being presented horizontally. Was there a particular reason for this orientation? If not, why not use the standard orientation (vertical bars), which, in my experience, is more common.
Figure A1 caption: extra whitespace before the period: "...desert data ."
Line 357: Spell out the abbreviations RMSE, MBE, and NME at first use, even though they are well known. They are treated in more detail in the appendix, but each abbreviation should be clarified once it is introduced.
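For the reader's benefit, the definitions could even be stated inline. Presumably (my assumption; please confirm against your appendix) the first two are the standard

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right),$$

where $y_i$ are observed and $\hat{y}_i$ predicted peat depths. Definitions of NME vary between studies, so that one in particular deserves an explicit formula.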
Line 362: Could you please elaborate a bit on the null models; do you mean baseline models? Also, on the same line, note the extra period: ". ."
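If by null model you mean a trivial baseline such as always predicting the mean training depth, stating this explicitly would remove the ambiguity. A minimal sketch of such a baseline (assuming scikit-learn; the variable names are hypothetical):

```python
# A trivial "null" baseline: always predict the mean of the training depths.
# Sketch only; I am assuming this is what "null model" refers to.
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

null_model = DummyRegressor(strategy="mean")
# null_model.fit(X_train, y_train)  # learns only mean(y_train)
# rmse_null = mean_squared_error(y_test, null_model.predict(X_test)) ** 0.5
```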
Line 370: "BLOOCV": did you define this abbreviation anywhere? It is clear to me (presumably blocked leave-one-out cross-validation), but it should be defined earlier in the text, where cross-validation is first mentioned.
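A compact description (or pseudo-code) of the blocking would also help readers reproduce the validation. A minimal sketch of blocked leave-one-out CV, assuming each sample carries a spatial block label (all names here are hypothetical, not from the manuscript):

```python
# Blocked leave-one-out CV sketch: hold out one spatial block per fold.
# X, y: NumPy arrays of predictors and depths; block_ids: one spatial
# block label per sample (hypothetical name).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def bloocv_rmse(model, X, y, block_ids):
    rmses = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=block_ids):
        model.fit(X[train_idx], y[train_idx])
        err = model.predict(X[test_idx]) - y[test_idx]
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return np.mean(rmses), np.std(rmses)
```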
Figure 9: The legend is a little unclear to me. What are the "bootstrap results", i.e., results of what? Consider rephrasing more explicitly, e.g., "distribution of predictions across bootstrap replicates", if that is indeed what is shown.