the Creative Commons Attribution 4.0 License.
Estimating soil organic carbon stocks in Pinus halepensis mill. stands using lidar data and field inventory
Abstract. Accurate estimation of soil organic carbon (SOC) in forest ecosystems is essential for quantifying their contribution as carbon sinks and improving management strategies in the face of climate change. The objective of this study was to model SOC in Pinus halepensis Mill. stands using structural metrics derived from LiDAR data from the National Aerial Orthophotography Plan (PNOA). The study area covered 46.8 hectares located in the municipality of Ampudia, Palencia (Spain). To carry out the work, systematic soil sampling and a forest inventory were conducted. LiDAR technology was also applied and 87 structural metrics were obtained. These metrics were integrated with edaphic variables and above-ground biomass data to build predictive models of carbon stock using multivariate regression techniques.
Among the models evaluated, the Random Forest algorithm performed best in cross-validation (R² = 0.81; RMSE = 7.73 Mg/ha), showing adequate predictive capacity relative to the other models. The proposed approach made it possible to evaluate the potential of LiDAR data from airborne laser scanning (ALS), acquired within the framework of general mapping programmes, as an effective tool for the spatial estimation of SOC. This empirically validated procedure provides a useful methodological basis for advancing the estimation of SOC through remote sensing, and it contributes to improving the quantification of soil-related ecosystem services.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3669', Shang Wang, 12 Oct 2025
AC2: 'Reply on RC1', Celia Herrero de Aza, 09 Feb 2026
Author response to Referee #1
We thank Referee #1 for the careful reading of our manuscript and for their constructive, detailed comments. We appreciate the positive assessment of the relevance of the topic and the modeling approach. The following is our response to each of the referee’s general and specific comments. All suggested changes have been applied to the revised manuscript.
General comments
Comment:
The Introduction and Discussion are too long, the Materials and Methods section lacks sufficient detail, and the Results and Figures require substantial improvement.
Response:
We thank the referee for this overall assessment. We have improved the clarity and structure of the manuscript. In the revised version, the Introduction has been shortened (it now ends on line 103) by trimming excessively descriptive literature references. We have reorganized the topics to focus on the most relevant studies of SOC modeling using LiDAR. The Discussion section (lines 288-394) has been streamlined in the same way to highlight the most important findings (SOC approach, LiDAR application, RF evaluation, limitations and further studies).
The referee’s suggestions have also been integrated into the revised Materials and Methods and Results sections. We revised the description of the soil carbon determination and modeling procedures in the Materials and Methods section to ensure full reproducibility. The Results section has been rewritten and the number of tables and figures has been reduced. The Supplementary Material has also been edited.
Specific comments
Title
Comment:
“lidar” → “LiDAR”
Response:
This has been corrected.
Abstract
Comment:
Please explain the meaning of “LiDAR” when it first appears. Include key SOC stock results and briefly discuss management practices or climate events affecting SOC.
Response:
These suggestions have been incorporated into the new manuscript. The meaning of “LiDAR” was added in lines 14 and 42, SOC stock results appear in lines 19-20, and a brief comment about management practices appears in line 26.
Introduction
Comment:
The Introduction is relatively long; focus on relevant studies and clearly state scientific hypotheses.
Response:
The Introduction was revised and shortened to focus on relevant and related topics. Two hypotheses were added in line 97.
Materials and Methods – Soil carbon determination
Comment:
Provide sufficient methodological detail; the techniques used are not the most common.
Response:
A paragraph has been added to the Discussion section (lines 374-389) to clarify this issue.
Materials and Methods – Modeling approach
Comment:
The modeling description lacks clarity and justification for model selection.
Response:
We have reworked the modeling description. It now clarifies the models used, and we have added two new references in line 226 (Odebiri et al., 2021; Beisekenov et al., 2025). The Materials and Methods section has been thoroughly revised and is substantially improved in the new manuscript.
Results
Comment:
Avoid restating what is apparent in tables and figures.
Response:
The Results section has been revised, repetitive information eliminated, and the number of tables and figures reduced to five each.
Discussion
Comment:
The Discussion is overly long and should be condensed.
Response:
The Discussion section has been shortened and streamlined to focus on the main topics.
Tables and Figures
Comment:
The number of figures and tables is excessive; improve quality and design.
Response:
The number of figures and tables has been reduced and the Supplementary Material has been revised.
Line-specific comments
L33:
Clarify what soil condition or SOC status is expected to be achieved by 2050.
Response:
The suggestion has been incorporated and appears in line 34.
L45:
SOC already defined in the Abstract.
Response:
Agreed. The redundant definition has been removed.
L46–47:
“Soil facilitates photosynthesis” is inaccurate.
Response:
We agree. The corresponding paragraph has been removed from the revised Introduction.
L52:
“reducing the effects of climate change” → “reducing the negative effects of climate change”
Response:
We agree. The corresponding paragraph has been removed from the revised Introduction.
L192:
Why 34 plots? When was sampling conducted? Temporal consistency of data?
Response:
The number of plots was set by the systematic sampling design and by logistical constraints, so that the plots covered the study area representatively. Soil sampling and the forest inventory were carried out within a short, consistent time window. LiDAR data were acquired within a compatible timeframe to minimize temporal mismatch effects.
L206:
Consider elemental analyzer measurements for total carbon.
Response:
A paragraph has been added to the Discussion section (lines 374-389) to clarify this issue.
L213:
Clarify how “organic C content of the fine soil fraction” was determined.
Response:
The samples were sieved; the term “fine” has been removed to avoid confusion.
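Organic C is measured on the sieved soil, but stocks are expressed per hectare. A short sketch of one common conversion may help readers connect the two. It assumes the formulation SOC_i = OC_i · BD_i · d_i · (1 − CF_i), summed over horizons i. Here OC is the organic C concentration in %, BD the bulk density in g cm⁻³, d the horizon thickness in cm, and CF the volumetric coarse-fragment fraction, and the result is in Mg C ha⁻¹. This is not necessarily the manuscript's Equation 2, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Horizon:
    """One sampled soil horizon (hypothetical field names)."""
    oc_percent: float          # organic C of the sieved (< 2 mm) fraction, % by mass
    bulk_density: float        # g cm^-3
    thickness_cm: float        # horizon thickness d_i, cm
    coarse_frac: float = 0.0   # volumetric coarse-fragment fraction, 0-1


def horizon_soc_stock(h: Horizon) -> float:
    """SOC stock of one horizon in Mg C ha^-1.

    (OC / 100) * BD * d gives g C per cm^2, and 1 g cm^-2 = 100 Mg ha^-1,
    so the two factors of 100 cancel.
    """
    return h.oc_percent * h.bulk_density * h.thickness_cm * (1.0 - h.coarse_frac)


def profile_soc_stock(horizons: list) -> float:
    """Total stock summed over horizons i = 1..n."""
    return sum(horizon_soc_stock(h) for h in horizons)


profile = [Horizon(2.0, 1.2, 10, 0.1), Horizon(1.0, 1.4, 20, 0.2)]
print(round(profile_soc_stock(profile), 2))  # 44.0
```

In this toy profile the two horizons contribute 21.6 and 22.4 Mg C ha⁻¹. Under this formulation, dropping the (1 − CF) term would overestimate stocks in stony horizons.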
RC2: 'Comment on egusphere-2025-3669', Anonymous Referee #2, 14 Dec 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3669/egusphere-2025-3669-RC2-supplement.pdf
AC1: 'Reply on RC2', Celia Herrero de Aza, 09 Feb 2026
Author response to Referee #2
We would like to thank Referee #2 for the insightful review of our manuscript.
We appreciate the positive assessment of the relevance of the topic and the modeling framework, as well as the detailed and constructive suggestions.
The Referee’s suggestions have been incorporated into the revised manuscript, as explained in the following responses to each of the general and specific comments.
General comments
Comment:
The manuscript emphasizes technical descriptions with limited ecological interpretation. The added value of LiDAR compared to non-remote sensing approaches is unclear, validation is limited to cross-validation, and key edaphic controls are not explicitly quantified.
Response:
We thank the referee for raising these important conceptual and methodological points. We agree that the original version placed excessive emphasis on technical descriptions. We also agree that the ecological interpretation and the broader relevance of LiDAR-based approaches needed clarification.
In the revised manuscript, we have strengthened the ecological interpretation. LiDAR-derived forest structural attributes (e.g., canopy height variability, vertical complexity, stand density) are now linked to key SOC-related processes, including litter inputs, root biomass distribution, and soil microclimate regulation. Text has been added to the Introduction (lines 54-63) and the Discussion (lines 294–305; 330-336) to explain these mechanistic links. The added text also stresses that LiDAR metrics act mainly as indirect proxies for vegetation-mediated and site-integrating processes, not as direct drivers of SOC accumulation.
We have also clarified the added value of LiDAR compared to approaches that do not use remote sensing. Several paragraphs have been expanded (lines 47-51; 290-293; 342-344) to contrast SOC estimates based only on forest and soil inventory variables with models that also integrate LiDAR-derived structural information. The comparison shows that LiDAR improves the spatial representativeness of SOC predictions because it captures spatial heterogeneity in forest structure, which conventional field inventories alone cannot represent. This point has also been added to the Conclusion (line 383).
It is important to note that the new manuscript emphasizes the operational and strategic relevance of LiDAR in countries with systematic and comprehensive aerial laser scanning programs. In Spain, for example, national LiDAR coverage is acquired at regular intervals (lines 47 – 51). Establishing robust relationships between forest structure and SOC makes it possible to generate spatially continuous SOC estimates at regional and national scales. This approach could substantially reduce the number of soil samples required for validation while providing reliable information for large-scale soil monitoring, land use planning, and sustainable soil management – an aspect that is currently underrepresented in national soil assessment frameworks.
Finally, we have expanded the discussion on the model limitations (lines 350-356). The revised text explicitly acknowledges the absence of an independent validation dataset and justifies the use of cross-validation due to sample size constraints. In addition, we note that the lack of certain edaphic variables (e.g., soil texture, pH, biological indicators) constitutes an important limitation and discuss how their inclusion could improve model performance in future studies (lines 327; 334; 346; 373).
Specific comments
1) Feature selection and dimensionality reduction
Comment:
97 structural metrics were used; was feature selection applied to avoid overfitting?
Response:
Yes. Feature selection and dimensionality reduction were applied before model fitting to reduce redundancy, control multicollinearity, and mitigate overfitting. The revised section on SOC modeling (line 223) describes the three sequential steps of the pre-processing workflow.
First, predictor variables with near-zero variance (variance < 0.01) were removed, because they contributed no meaningful information to the models. Second, multicollinearity among predictors was assessed with a Pearson correlation matrix. Highly correlated variables (|r| > 0.90) were excluded, and only one variable from each correlated pair was retained. This step substantially reduced redundancy among the original LiDAR-derived metrics. Third, all retained predictors were standardized (mean = 0, standard deviation = 1) before model fitting.
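The three pre-processing steps can be sketched in plain Python as follows. The thresholds (variance < 0.01, |r| > 0.90) come from the response above. The greedy rule for choosing which member of a correlated pair to keep (the one encountered first) is an assumption, because the response does not say how the retained variable was chosen. The metric names and values in the example are hypothetical. In R, recipes steps such as step_corr() and step_normalize() cover similar ground, although their selection heuristics differ from this sketch.

```python
import math
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def preprocess(columns, var_min=0.01, r_max=0.90):
    """Filter and standardize a dict {metric_name: values_per_plot}.

    1) drop predictors whose variance is below var_min;
    2) walk the survivors in order and keep one only if |r| <= r_max with every
       predictor already kept, so each dropped predictor has a kept correlated partner;
    3) z-score the kept predictors (mean 0, population sd 1).
    """
    # Step 1: near-zero variance filter
    kept = {name: v for name, v in columns.items() if pvariance(v) >= var_min}

    # Step 2: greedy correlation filter (earlier columns take precedence)
    selected = []
    for name in kept:
        if all(abs(pearson(kept[name], kept[s])) <= r_max for s in selected):
            selected.append(name)

    # Step 3: standardization
    out = {}
    for name in selected:
        m, sd = mean(kept[name]), math.sqrt(pvariance(kept[name]))
        out[name] = [(x - m) / sd for x in kept[name]]
    return out


lidar_metrics = {
    "p95": [10.0, 12.0, 14.0, 16.0],
    "p99": [10.5, 12.5, 14.5, 16.5],  # perfectly correlated with p95
    "flat": [1.0, 1.0, 1.0, 1.01],    # variance far below 0.01
    "cover": [0.8, 0.3, 0.9, 0.1],
}
print(sorted(preprocess(lidar_metrics)))  # ['cover', 'p95']
```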
In addition, the use of Random Forest provided model-level safeguards against overfitting. The algorithm is inherently robust in high-dimensional predictor spaces because of bootstrap aggregation and random feature selection at each split (lines 232-237). Model performance was further checked with 10-fold cross-validation (v = 10), so that predictive accuracy was always evaluated on unseen data.
Together, these steps ensured that the final models were trained using a reduced and informative subset of LiDAR metrics, thereby minimizing overfitting and preserving the structural information necessary for SOC estimation.
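The sketch below shows 10-fold cross-validation that pools out-of-fold predictions and scores them with R², RMSE and MAE. The model is passed in as a callable. The one-predictor least-squares line (ols_1d) is a hypothetical stand-in, not the study's Random Forest. The toy data and the fold-assignment seed are arbitrary.

```python
import math
import random


def kfold_indices(n, k=10, seed=42):
    """Shuffle sample indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_predictions(X, y, fit_predict, k=10, seed=42):
    """Out-of-fold prediction for every sample.

    fit_predict(X_train, y_train, X_test) must return one prediction per
    test row; in the study this role is played by the Random Forest.
    """
    pred = [None] * len(y)
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        fold_pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
        for i, p in zip(test, fold_pred):
            pred[i] = p
    return pred


def metrics(obs, pred):
    """R² (1 - SS_res / SS_tot), RMSE and MAE; RMSE and MAE keep the units of obs."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
    }


def ols_1d(X_train, y_train, X_test):
    """Hypothetical stand-in model (NOT a Random Forest): least-squares line on predictor 0."""
    xs = [row[0] for row in X_train]
    mx, my = sum(xs) / len(xs), sum(y_train) / len(y_train)
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, y_train))
             / sum((x - mx) ** 2 for x in xs))
    return [my + slope * (row[0] - mx) for row in X_test]


rng = random.Random(0)
X = [[float(i)] for i in range(34)]                 # 34 toy plots, one predictor
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]   # toy SOC values
print(metrics(y, cv_predictions(X, y, ols_1d, k=10)))
```

Here R² is computed as 1 − SS_res/SS_tot on the pooled predictions. Some tools report the squared Pearson correlation instead (yardstick's rsq(), for example, with rsq_trad() giving the traditional form). R² is also relative to the variance of the observed stocks, whereas RMSE and MAE are absolute (Mg/ha). For both reasons, the metrics need not rank competing models identically.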
2) Novelty and articulation of the research gap
Comment:
The gap regarding LiDAR-based SOC modeling is not articulated strongly enough.
Response:
Two paragraphs have been added, one in the Introduction and one in the Discussion, to emphasize this point (lines 64-84; 330-336).
3) Scale mismatches and generalisability
Comment:
Challenges are identified but not clearly addressed.
Response:
Text has been added to the Discussion section to explain the limitations of the research and the directions for further study (lines 334, 373).
4) Mechanistic link between ALS metrics and SOC
Comment:
Why are ALS metrics effective predictors of SOC?
Response:
ALS-derived metrics are effective predictors of SOC because they capture structural attributes of the forest. These attributes are mechanistically linked to the vegetation-mediated processes that control carbon inputs to the soil and carbon stabilization there, even though the metrics do not measure soil properties directly.
In the revised Discussion (lines 298-303), we clarify that LiDAR metrics serve as indirect proxies for ecosystem structure and functioning, because they integrate multiple processes that influence SOC accumulation. Metrics related to canopy height, vertical complexity, canopy density, and return distribution reflect above-ground biomass, litter production, and root system development, which are the main sources of organic carbon inputs to the soil. Structural complexity also influences microclimatic conditions such as soil temperature and moisture. These conditions regulate microbial activity and decomposition rates, and thereby affect SOC persistence.
We further emphasize that ALS metrics integrate spatial variability in vegetation structure at scales that are difficult to capture through conventional forest inventories alone. As a result, they indirectly encode information about site productivity, forest development stage, and management history, all of which are known to influence SOC dynamics in Mediterranean forest ecosystems.
Importantly, the new version clarifies that ALS metrics should not be interpreted as direct drivers of SOC but as integrative indicators of vegetation-soil interactions and site conditions (lines 332-336). In the Discussion we also acknowledge that the absence of certain edaphic variables (e.g., soil texture, pH, biological activity) limits the direct attribution of causality and that ALS-based predictions benefit from combining structural proxies with soil measurements, when available.
5) SOC and bulk density relationship (Table 5)
Comment:
The positive relationship between SOC and bulk density is counterintuitive.
Response:
This has been removed from the revised Results section.
6) Divergence between R² and RMSE/MAE (Table 7)
Comment:
Please interpret the discrepancy between performance metrics.
Response:
Text has been added to address this issue in lines 272-274.
7) Figure 4 caption
Comment:
Add descriptions of OM, BD, CD, and C data.
Response:
A note describing these terms has been added.
8) Equation 2 notation and depth information
Comment:
Use lowercase i and specify horizon thickness.
Response:
This has been addressed.
9) SAS citation placement
Response:
This has been remedied; see line 224.
10)
Comment:
“In line 259, it is better to add the information of the percentage of data availability.”
Response:
This information is clearer in the improved Materials and Methods section.11)
Comment:
“In line 260-261, could you please clarify why you use the criterion of the heights of less than 2 m as terrain? Do you validate against your ground truth?”
Response:
We thank the reviewer for this comment and agree that this point needed clarification.
In the revised manuscript (lines 187-190), we clarify that the < 2 m height limit was applied only when generating the Digital Terrain Model (DTM), not when deriving vegetation structural metrics. The limit was chosen to suit the vegetation of the study area. There, the understory is sparse and dominated by Quercus regeneration that is generally no taller than 2 m. As a result, returns below this height correspond mainly to the ground surface or to low-to-medium-height vegetation.
The revised manuscript also states clearly that the LiDAR data used in this study come from Spain’s National Aerial Orthophotography Plan (PNOA). The PNOA point clouds undergo established classification and quality controls, including ground classification, under standardized national protocols (lines 167-168). The < 2 m criterion was used as an additional filtering step during DTM interpolation to ensure consistency at the parcel scale. It was not used as a primary ground classification method.
12)
Comment:
“In line 263, could you briefly specify the tree segmentation algorithm and parameters used?”
Response:
We thank the reviewer for this comment and agree that the description of the tree segmentation procedure needed more detail.
In the revised manuscript (line 200), we describe how individual trees were segmented using the algorithm proposed by Dalponte et al. (2016). The algorithm analyses the canopy height model (CHM), detects local maxima, and then applies region-growing segmentation. We have also updated the information on the DTM and CHM, both of which were generated at a spatial resolution of 0.5 m. This resolution was chosen as a compromise between computational efficiency and the point density available in the study area, and it proved adequate for reliably delineating individual tree crowns.
Given the LiDAR point density of the PNOA dataset in the study area, the resulting CHM provided sufficient spatial detail to identify individual trees, particularly in the dominant and co-dominant canopy layers. These are the most relevant layers for deriving the forest structural metrics used in subsequent analyses.
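The first stage of the procedure, treetop detection as local maxima of the CHM, is illustrated by the simplified pure-Python sketch below. The window size and the 2 m minimum height are hypothetical. Plateaus of equal height yield several candidates, and the subsequent region-growing step of Dalponte et al. (2016) is omitted. In recent versions of the lidR package, the full workflow is usually run with locate_trees() followed by segment_trees() with the dalponte2016() algorithm.

```python
def local_maxima(chm, window=3, min_height=2.0):
    """Treetop candidates: cells that are the highest in their window x window
    neighbourhood (edges use a truncated window) and at least min_height tall.

    chm: list of rows of canopy heights in m. Returns (row, col, height).
    Equal-height plateaus produce several candidates; real tools break ties.
    """
    half = window // 2
    nrows, ncols = len(chm), len(chm[0])
    tops = []
    for r in range(nrows):
        for c in range(ncols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbourhood = [
                chm[rr][cc]
                for rr in range(max(0, r - half), min(nrows, r + half + 1))
                for cc in range(max(0, c - half), min(ncols, c + half + 1))
            ]
            if h >= max(neighbourhood):
                tops.append((r, c, h))
    return tops


chm = [
    [0, 0, 0, 0, 0],
    [0, 5, 4, 0, 0],
    [0, 4, 3, 0, 0],
    [0, 0, 0, 1, 8],
    [0, 0, 0, 1, 7],
]
print(local_maxima(chm))  # [(1, 1, 5), (3, 4, 8)]
```

Because the window is counted in cells, a 3 × 3 window on a 0.5 m CHM spans only 1.5 m. In practice the window is normally matched to the expected crown size.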
13)
Comment:
“In line 265, could you clarify the reason why you use the IDW method rather than the TIN-based method for generating DTM?”
Response:
We thank the reviewer for this comment and agree that the text should explain our choice of interpolation method.
In the revised manuscript (line 191), we explain that we first tested several methods for DTM generation in preliminary trials: TIN-based interpolation, Inverse Distance Weighting (IDW), and Cloth Simulation Filtering (CSF). We visually inspected and compared the resulting DTMs to assess their overall consistency and their suitability for the objectives of this study. These comparisons showed that IDW produced terrain surfaces that were visually and structurally comparable to those generated with TIN and CSF. None of the differences was large enough to affect subsequent canopy normalization or metric extraction.
Given the similarity in results, IDW was selected due to its robustness, computational simplicity, and stable performance across plots with variable point distributions. This choice aligns with previous studies showing that in some contexts TIN-based interpolation can produce higher interpolation errors than other methods. For example, Susetyo (2016) reported that TIN yielded higher mean error and RMSE values than IDW and kriging when constructing digital elevation models from the same dataset.
We also acknowledge that other studies have reported favorable performance of TIN-based approaches under different terrain and data conditions. However, the main objective of this study was forest structural characterization and SOC estimation rather than detailed topographic analysis, and the differences among interpolation methods were considered negligible in relation to these goals. IDW was therefore adopted as a reliable and pragmatic solution that did not compromise the accuracy of the LiDAR-derived metrics or the subsequent modeling results.
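For readers unfamiliar with IDW, here is a brute-force sketch of k-nearest-neighbour inverse distance weighting onto a 0.5 m grid. The values of k and the power parameter are hypothetical defaults, since the response does not state the parameters used in the study. The ground points in the example are invented. In lidR, the equivalent step is typically rasterize_terrain() with the knnidw() algorithm, and tin() and kriging() are the usual alternatives.

```python
import math


def idw_grid(ground_pts, xmin, ymin, ncols, nrows, res=0.5, k=8, power=2.0):
    """Rasterize ground returns (x, y, z) into a DTM by k-nearest-neighbour IDW.

    Row 0 is the southernmost row (starting at ymin). Neighbours are found by
    brute force; production tools use a spatial index instead.
    """
    dtm = []
    for r in range(nrows):
        yc = ymin + (r + 0.5) * res          # cell-centre northing
        row = []
        for c in range(ncols):
            xc = xmin + (c + 0.5) * res      # cell-centre easting
            nearest = sorted(ground_pts,
                             key=lambda p: (p[0] - xc) ** 2 + (p[1] - yc) ** 2)[:k]
            num = den = 0.0
            value = None
            for x, y, z in nearest:
                d = math.hypot(x - xc, y - yc)
                if d == 0.0:                  # a return sits exactly on the cell centre
                    value = z
                    break
                w = 1.0 / d ** power          # weight decays with distance
                num += w * z
                den += w
            row.append(value if value is not None else num / den)
        dtm.append(row)
    return dtm


ground = [(0.0, 0.0, 100.0), (2.0, 0.0, 101.0), (0.0, 2.0, 100.5), (2.0, 2.0, 101.5)]
dtm = idw_grid(ground, xmin=0.0, ymin=0.0, ncols=4, nrows=4, res=0.5, k=4)
```

Each cell value is a weighted mean of nearby ground elevations. IDW therefore never extrapolates beyond the range of the neighbouring returns, which is one reason it behaves predictably on plots with uneven point distributions.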
14)
Comment:
“In line 269, could you please briefly describe how negative normalized heights (e.g., due to interpolation artifacts) were handled and specify the spatial resolution of the DTM in this study?”
Response:
We thank the reviewer for this comment.
In the revised manuscript (line 178), we clarify that height normalization was applied only to LiDAR returns not classified as ground. Interpolation artifacts in the DTM therefore had a very limited effect on the normalized vegetation point cloud. Negative normalized heights occurred in only a tiny fraction of the points, with a median magnitude of approximately 7 cm (Appendix A, Supplementary Material). They were negligible relative to the vertical scale of the vegetation metrics used in this study.
The spatial resolution of the DTM and CHM (0.5 m) is reported with the LiDAR pre-processing steps, as described in the response to Comment 12.
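The normalization step and the check for negative heights can be sketched as follows. The sketch assumes a nearest-cell DTM lookup, a grid whose row 0 lies at ymin, and points that all fall inside the grid. It only reports negative heights, because the response does not say whether they were clipped to zero, which is a common alternative. The example points and DTM values are hypothetical.

```python
from statistics import median


def normalize_heights(points, dtm, xmin, ymin, res=0.5):
    """Height above ground for every non-ground return.

    points: iterable of (x, y, z, is_ground); dtm[row][col] with row 0 at ymin.
    Assumes every point falls inside the DTM extent (nearest-cell lookup).
    Returns the normalized heights and a summary of negative values.
    """
    heights = []
    for x, y, z, is_ground in points:
        if is_ground:
            continue                          # ground returns are not normalized
        col = int((x - xmin) // res)
        row = int((y - ymin) // res)
        heights.append(z - dtm[row][col])
    below = [-h for h in heights if h < 0]    # depth below the DTM surface, m
    summary = {
        "fraction_negative": len(below) / len(heights) if heights else 0.0,
        "median_negative_m": median(below) if below else 0.0,
    }
    return heights, summary


dtm = [[100.0, 100.0], [101.0, 101.0]]
pts = [(0.1, 0.1, 110.0, False), (0.6, 0.6, 100.93, False), (0.2, 0.2, 100.0, True)]
heights, summary = normalize_heights(pts, dtm, 0.0, 0.0)
```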
15)
Comment:
“In line 270, could you please indicate the spatial resolution of the CHM, because this greatly affects tree crown detection accuracy.”
Response:
The spatial resolution of the CHM (0.5 m) is reported with the LiDAR pre-processing steps, as described in the response to Comment 12.
16)
Comment:
“In line 286, please specify how LAD metrics were derived (e.g., which algorithm or model was used to estimate LAD from LiDAR returns), as different methods can yield substantially different results.”
Response:
We thank the reviewer for this comment.
In the revised manuscript (line 178), we specify that all LiDAR-derived metrics were extracted with the functions of the lidRmetrics R package (Tompalski, 2025), which works with the lidR package. The Leaf Area Density (LAD)-related metrics follow established LiDAR-based formulations for characterizing vertical canopy structure, as implemented in lidRmetrics. These implementations draw on several methodological sources in the literature, including the foundational work of Lefsky et al. (1999).
Accordingly, the LAD metrics in this study correspond to relative, LiDAR-based proxies for vertical canopy structure, derived from the normalized point cloud, rather than direct physiological measurements of leaf area. We also acknowledge in the text that different LAD estimation approaches may yield different results, and this source of methodological uncertainty is now noted in the Methods section.
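One common gap-fraction formulation of LAD, of the MacArthur-Horn type, is sketched below to show how such profiles are estimated and why their values depend on the method. The layer thickness dz, the extinction coefficient k = 0.5 and the starting height z0 are hypothetical values. lidR's LAD() and the lidRmetrics functions may use different formulations or defaults, which is exactly the source of methodological uncertainty noted above.

```python
import math


def lad_profile(heights, dz=1.0, k=0.5, z0=1.0):
    """Leaf area density (m^2 m^-3) per layer from normalized return heights.

    For the layer [z, z + dz], P = n(h < z) / n(h < z + dz) is the fraction of
    returns recorded below the layer top that also lie below its bottom, and
    LAD = -ln(P) / (k * dz), with k an assumed extinction coefficient.
    Layers with no returns below them get None (the ratio is undefined).
    """
    top = max(heights)
    profile = []
    z = z0
    while z < top:
        entering = sum(1 for h in heights if h < z + dz)
        passing = sum(1 for h in heights if h < z)
        lad = None if passing == 0 else -math.log(passing / entering) / (k * dz)
        profile.append((z, lad))
        z += dz
    return profile


heights = [0.0] * 50 + [1.5] * 25 + [2.5] * 25   # toy normalized heights, m
for bottom, lad in lad_profile(heights):
    print(bottom, lad)
```

Changing k or dz rescales every layer, so LAD-based predictors are best treated as relative indices of vertical structure, as the response indicates.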
17)
Comment:
“In line 288, the author mentioned all metrics are used as predictors, please briefly describe if there are redundancy and multicollinearity and how the author handles this?”
Response:
We thank the reviewer for this comment and offer the following explanation.
The full set of LiDAR-derived metrics was considered at the outset, but redundancy and multicollinearity were addressed before model fitting, as described in Section 2.6. Several pre-processing steps removed variables that contributed no meaningful information to the models. These steps were the removal of predictors with near-zero variance, the exclusion of highly correlated variables based on Pearson correlation analysis, and the standardization of the remaining predictors. The model was built exclusively from the variables that remained after these steps.
18)
Comment:
“In line 280-289, please add citations for commonly used metric families (percentiles, LAD, Lmoments).”
Response:
We thank the reviewer for this suggestion and agree that conceptual references for commonly used LiDAR metric families improve clarity.
In the revised manuscript (line 178), we clarify that all structural metrics were computed with the lidRmetrics package (Tompalski, 2025). We have also added representative references describing the theoretical basis and common use of the key metric families. Height percentiles are referenced to Næsset (2002), and LAD-related metrics follow early formulations of LiDAR-based characterization of vertical canopy structure (Lefsky et al., 1999). The L-moment-based metrics are supported by Hosking (1990) and by their more recent use in forest LiDAR studies (e.g., McRoberts et al., 2018).
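Two of the metric families cited above, height percentiles and L-moments, can be computed as in the sketch below. The L-moments use Hosking's (1990) probability-weighted-moment estimators. The choice of percentiles is hypothetical. The "inclusive" quantile method, which matches R's default type 7, may differ from the interpolation used in lidRmetrics.

```python
from statistics import quantiles


def sample_l_moments(values):
    """Sample L-moments from probability-weighted moments (Hosking, 1990).

    Returns (l1, l2, t3, t4): mean, L-scale, L-skewness, L-kurtosis.
    Needs at least four values that are not all equal.
    """
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


def height_percentiles(values, probs=(25, 50, 75, 95)):
    """Height percentiles by linear interpolation ('inclusive' method)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {p: cuts[p - 1] for p in probs}


print(sample_l_moments([1, 2, 3, 4, 5]))  # (3.0, 1.0, 0.0, 0.0)
```

L-skewness and L-kurtosis are ratios to the L-scale. They are therefore bounded and less sensitive to a few extreme returns than conventional skewness and kurtosis, which is why they are attractive for height distributions.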
19)
Comment:
“In line 305, does the author mean that only mtry and min_n were tuned, how about other hyperparameters (e.g., number of trees, node depth, sampling scheme), which can also influence model performance. Please clarify whether these values were kept at defaults or fixed manually.”
Response:
We thank the reviewer for this comment and agree that this point required clarification.
In the revised manuscript (line 232), we explain that only the mtry and min_n hyperparameters were tuned. All other Random Forest hyperparameters (e.g., number of trees, maximum node depth, and sampling scheme) were kept at the default values of the modeling framework used. Given the relatively small sample size, we made this decision to limit model complexity and reduce the risk of overfitting.
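The tuning strategy of searching two hyperparameters while fixing everything else is illustrated below with a deliberately small pure-Python random forest. This is a toy stand-in, not the implementation used in the study. Following the tidymodels meaning, min_n is treated as the minimum number of samples a node needs before a split is attempted, and mtry as the number of predictors sampled at each split. The number of trees, bootstrap sampling and unlimited depth stay fixed. The grid values, the toy data and the use of cross-validated RMSE as the selection criterion are assumptions.

```python
import random


def _sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def build_tree(rows, y, mtry, min_n, rng):
    """Grow a regression tree. Nodes with fewer than min_n samples become
    leaves; at each split only mtry randomly drawn predictors are tried."""
    if len(y) < min_n or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(rows[0])), mtry):
        for t in sorted({r[f] for r in rows})[:-1]:  # keeps both sides non-empty
            left = [v for r, v in zip(rows, y) if r[f] <= t]
            right = [v for r, v in zip(rows, y) if r[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled predictor is constant in this node
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [y[i] for i in li], mtry, min_n, rng),
            build_tree([rows[i] for i in ri], [y[i] for i in ri], mtry, min_n, rng))


def predict_tree(node, row):
    """Walk down internal (feature, threshold, left, right) tuples to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, mtry, min_n, ntree=50, seed=1):
    """Bagged trees; ntree, bootstrap sampling and unlimited depth stay fixed."""
    rng = random.Random(seed)
    trees = []
    for _ in range(ntree):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, min_n, rng))
    return lambda row: sum(predict_tree(t, row) for t in trees) / len(trees)


def tune(X, y, grid, k=10, ntree=50, seed=1):
    """Cross-validated RMSE for each (mtry, min_n) pair; returns the best pair."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmse = {}
    for mtry, min_n in grid:
        sq_err = []
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            model = fit_forest([X[i] for i in train], [y[i] for i in train],
                               mtry, min_n, ntree, seed)
            sq_err += [(model(X[i]) - y[i]) ** 2 for i in test]
        rmse[(mtry, min_n)] = (sum(sq_err) / len(sq_err)) ** 0.5
    return min(rmse, key=rmse.get), rmse


rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(34)]  # toy predictors
y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]                # toy SOC values
best, scores = tune(X, y, grid=[(1, 5), (2, 5)], k=10, ntree=10)
print(best, scores[best])
```

With only a few dozen plots, keeping the grid to two parameters limits both computation and the number of candidate models compared on the same small dataset.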
20)
Comment:
“In line 365 ‘total carbon tree biomass’ is unclear. Does root biomass account for a portion of total carbon biomass, or are you implying that root biomass equals total carbon biomass? Rephrase for accuracy.”
Response:
The sentence was corrected.
21)
Comment:
“In Table 7 please add units for RMSE and MAE.”
Response:
The units were added.
AC3: 'Comment on egusphere-2025-3669', Celia Herrero de Aza, 09 Feb 2026
Dear Editor,
We are pleased to submit the revised version of our manuscript titled "Estimating soil organic carbon stocks in Pinus halepensis Mill. stands using LiDAR data and field inventories" for further consideration in EGUsphere.
We would like to express our sincere gratitude to you and to the reviewers for the time, expertise, and constructive feedback provided during the evaluation of our previous submission. We have carefully addressed all comments and suggestions, which have greatly contributed to improving the clarity, structure, and scientific robustness of our work.
In the revised manuscript, we have incorporated all requested modifications, and we provide a detailed point‑by‑point response document outlining how each comment has been addressed. All changes made in the manuscript are clearly marked to facilitate your review.
Although it was not explicitly required, we also arranged for a professional English language editor to proofread the entire manuscript. We believe this additional step has substantially improved the readability and overall quality of the text.
We hope that the revised version now meets the journal’s standards and expectations. Thank you again for your thoughtful comments and for considering our work for publication.
Please do not hesitate to contact us if any further clarification is needed.
Kind regards
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 640 | 148 | 25 | 813 | 67 | 23 | 22 |
This study presents a valuable modeling analysis of forest SOC stocks based on systematic soil sampling and forest inventory data. The topic is interesting and relevant to the journal’s scope. However, the manuscript has several weaknesses. In its current form, the Introduction and Discussion are too long, the Materials and Methods section lacks sufficient detail, and the Results and Figures require substantial improvement. Below are my general and specific comments:
Title: “lidar” → “LiDAR”
Abstract: Please explain the meaning of “LiDAR” when it first appears. Then, I suggest including some key SOC stock results from your modeling (e.g., the estimated SOC stock in the study region). Moreover, briefly discuss which management practices or climate events could significantly affect SOC stock in this area.
Introduction: This section is relatively long. Please focus on previous studies that are directly relevant to your research and summarize them, rather than listing individual findings (e.g., “who found…”, “who indicated…”). In addition, I strongly recommend that the authors clearly state their scientific hypotheses, as expected in a high-level research article.
Materials and Methods: The methods for soil carbon measurement are critical, especially since the techniques used are not the most common approaches. Please provide sufficient methodological detail to ensure reproducibility. Moreover, I suggest carefully reviewing the modeling methods. While I am not a modeling expert, the description appears to lack clarity and justification for model selection.
Results: Avoid simply restating what is visually apparent in tables or figures. Instead, highlight key comparisons between treatments and emphasize the most interesting findings.
Discussion: This section is overly long and should be condensed to focus on major interpretations and implications. As modeling is not my main expertise, I cannot provide detailed comments here, but greater clarity and focus would strengthen this section.
Tables and Figures: The number of figures and tables seems excessive. Please consider combining related figures or moving some to the Supplementary Material. By the way, figures should be improved in quality and design to meet the publication standards of an SCI journal.
L33: Please clarify what specific soil condition or SOC status is expected to be achieved by 2050.
L45: SOC has already been defined in the Abstract; no need to redefine it here.
L46–47: The statement that “soil facilitates photosynthesis” is inaccurate. Soil does not directly facilitate photosynthesis; it acts as a medium for water cycling and carbon storage. Please revise or clarify this statement.
L52: “reducing the effects of climate change” → “reducing the negative effects of climate change.”
L192: Why were 34 plots established? What distinguishes them? Also, specify when the sampling was conducted. It would be ideal if tree, soil, and remote sensing data were collected within a similar time frame.
L206: The method for soil total carbon (TC) measurement or conversion is crucial. If possible, consider including measurements made with an elemental analyzer.
L213: How was the “organic C content of the fine soil fraction” determined? Please specify the analytical method used.