the Creative Commons Attribution 4.0 License.
Multi-Source Remote Sensing for large-scale biomass estimation in mediterranean olive orchards using GEDI LiDAR and Machine Learning
Abstract. Accurate estimation of Above-Ground Biomass Density (AGBD) is essential for assessing carbon stocks and promoting sustainable agricultural practices. This study integrates multi-source remote sensing data, including GEDI LiDAR, optical, SAR, and topographic variables, to predict AGBD in Mediterranean olive orchards using a Random Forest regression model implemented on Google Earth Engine (GEE). The volumetric approach, based on GEDI L2A canopy height and dendrometric parameters, provided more accurate predictions than the GEDI L4A product, which is limited by its global stratification methodology. The model’s predictive performance varied depending on data combinations, with the fully multi-source configuration achieving the highest accuracy (R² = 0.62, RMSE = 5.95 Mg·ha⁻¹). NDWI, slope, and NDVI were identified as the most influential predictors. The spatial analysis revealed that Spain exhibited the highest total AGBD among the studied countries, followed by Italy and Greece, reflecting their dominance in olive production. The model effectively captured biomass variability across different regions, demonstrating its suitability for large-scale applications. This study highlights the potential of integrating LiDAR, optical, and SAR data for biomass estimation, offering a scalable and cost-effective approach for monitoring carbon stocks and optimizing agricultural resource management. By providing accurate AGBD predictions, this methodology supports climate-smart agriculture and facilitates data-driven decision-making for both farmers and policymakers, contributing to the advancement of sustainable agricultural systems in Mediterranean olive orchards.
Status: open (until 14 May 2025)
RC1: 'Comment on egusphere-2025-917', Anonymous Referee #1, 29 Apr 2025
General comments:
This paper proposes a method for estimating aboveground biomass density (AGBD) in olive orchards by combining GEDI L2A height data with orthophoto-derived canopy cover through a volumetric approach. Specifically, crown volume is defined as the overlap between the GEDI L2A footprint and canopy cover, and this volume is then converted to AGBD based on a series of assumptions regarding tree density, stem-to-canopy height ratio, and wood density. Finally, the GEDI L2A-derived AGBD is further predicted using satellite imagery and compared against the GEDI L4A AGBD product as a form of "validation". I think the use of a volumetric method for AGBD estimation is relevant, particularly in plantation settings such as olive orchards, where tree density, stem-to-canopy height ratio, and wood density can be reliably estimated or sourced from the literature. However, there are three major concerns with how the method is applied in this study:
- First, the authors did not validate their product against field measurements or a more robust method for AGBD estimation (e.g., airborne LiDAR combined with allometric equations). Comparing results only against the GEDI L4A AGBD product is insufficient, as both products could share similar biases or errors.
- Second, the use of a volumetric approach at the footprint level based on GEDI L2A data is questionable. The L2A product has a ~25 m footprint, a positional accuracy of ~10 m, and a vertical accuracy of ~5 m. Combining this with orthophoto-derived canopy cover to estimate biomass is ambitious but fundamentally flawed, as it risks substantial spatial mismatches. Given the number of assumptions involved and the lack of field validation, it is difficult to assess the reliability of the resulting AGBD estimates.
- Third, it is unclear why the GEDI L2A-derived AGBD is subsequently predicted using satellite imagery at the footprint level. If the goal is to generate wall-to-wall maps, the issue of spatial mismatch remains. Moreover, the manuscript currently reads as if the L2A product itself was used as a predictor for estimating the GEDI L2A-derived AGBD, which, if true, would introduce severe data leakage and invalidate the approach.
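To give a sense of scale for the spatial-mismatch concern raised in the second and third points, the following is a minimal geometric sketch, not an analysis of the manuscript's data. It assumes the footprint is an ideal 25 m circle and treats the ~10 m positional accuracy as a fixed horizontal shift (a simplification; the real error is a distribution). The function name and constants are illustrative only.

```python
import math


def circle_overlap_fraction(radius_m: float, offset_m: float) -> float:
    """Fraction of a circle's area shared with an identical circle
    whose centre is displaced by offset_m."""
    if offset_m <= 0:
        return 1.0
    if offset_m >= 2 * radius_m:
        return 0.0
    r, d = radius_m, offset_m
    # Area of the lens formed by two equal circles with centre distance d.
    lens_area = 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)
    return lens_area / (math.pi * r**2)


FOOTPRINT_DIAMETER_M = 25.0  # footprint size quoted in the comment above
GEOLOCATION_OFFSET_M = 10.0  # assumption: positional accuracy treated as a fixed shift

fraction = circle_overlap_fraction(FOOTPRINT_DIAMETER_M / 2, GEOLOCATION_OFFSET_M)
print(f"Area shared by nominal and shifted footprint: {fraction:.0%}")
```

Under these assumptions only about half of the nominal footprint area coincides with the area actually sampled, so the canopy cover extracted from the orthophoto may describe a substantially different set of trees than the one that produced the GEDI waveform.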
Unfortunately, the paper also reads as an incoherent combination of machine learning approaches and derived results that are neither sufficiently explained nor properly validated. For example:
- The tree density prediction using Gaussian Process Regression lacks a clear explanation of the training data, validation process, or accuracy assessment.
- The canopy cover prediction from aerial imagery using linear regression appears to be based on 250 points derived from orthophotos, but there is insufficient methodological detail or accuracy evaluation.
- The AGBD estimation via the volumetric method seems to depend on unvalidated tree density and canopy cover products.
- The prediction of AGBD estimates from the volumetric method using Random Forest applied to satellite imagery lacks a clear justification, especially since a direct comparison between the GEDI L2A-derived and L4A AGBD products would have been more straightforward.
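The concern about stacking unvalidated products can be made concrete with a first-order error-propagation sketch. It assumes, purely for illustration, that the volumetric AGBD behaves like a product of independent factors, in which case relative uncertainties combine in quadrature. The factor names and the relative uncertainties below are hypothetical placeholders, not values from the manuscript.

```python
import math


def relative_error_of_product(relative_errors):
    """First-order relative uncertainty of a product of independent factors."""
    return math.sqrt(sum(e**2 for e in relative_errors))


# Hypothetical relative uncertainties (1-sigma), chosen only to illustrate the effect.
assumed_relative_errors = {
    "tree_density": 0.20,
    "canopy_cover": 0.20,
    "canopy_height": 0.50,
    "wood_density": 0.10,
}

combined = relative_error_of_product(assumed_relative_errors.values())
print(f"Combined relative uncertainty: {combined:.0%}")
```

Even when every intermediate product looks tolerable on its own, the combined uncertainty is dominated by the weakest input, which is why an accuracy assessment of each layer is needed before the final AGBD can be interpreted.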
Finally, the paper also lacks a clear flow, and many concepts are referenced as common knowledge without proper introduction or explanation. There’s a lot of guesswork involved, as key details seem to be assumed rather than explained. It appears that there is a misunderstanding or lack of clarity regarding how the L4A product was derived. Additionally, the resolution at which you are working is unclear. I didn't proceed with the discussion section, as my concerns haven't been addressed earlier in the paper, making it difficult to engage with that part meaningfully.
In short, while the study introduces interesting ideas, the lack of rigorous validation and the reliance on stacked, unverified models undermine confidence in the results.
Specific comments:
P1, L15: The statement "provided more accurate predictions than the GEDI L4A product" is misleading without field data for validation.
P1, L19: Is olive tree biomass correlated with olive production? It would be useful to clarify this relationship.
P2, L1: The background on remote sensing methods for biomass estimation seems insufficient. Could you expand on the approaches typically used?
P2, L40-41: To my knowledge, SAR, especially L-band, can measure standing dead trees. Do you have a reference to support this?
P2, L46-47: Most natural forests are more structurally complex than a plantation. This statement contradicts your assumptions about tree density and structure. Could you clarify this?
P2, L57-58 and P3, L63-64: There are existing allometric databases for trees, including olive trees (e.g. Tallo). Have you considered these?
P3, L64-66: Isn't it the opposite? If you know the volume of each tree, there should be less uncertainty, assuming wood density remains constant. Could you clarify?
P3, L83: I'm still unclear on the main challenges of biomass estimation. Is it data scarcity, scale, or something else?
P3, L88: How exactly do you ensure spatial consistency? Given the misalignment between GEDI, satellite imagery, and orthophotos, this is a critical issue to address.
P3, L89: The jump to carbon sequestration seems sudden. Could you link this to biomass first? Are olive orchards typically used for carbon sequestration? This feels somewhat out of context.
P4, L93: The first sentence should be deleted, as it doesn’t seem relevant or necessary.
P4, L105: The proper notation should be Mg/ha or t/ha, not Ton/ha. Please apply this notation consistently throughout the text.
P5, L08: I'm confused about the contents of your training and validation dataset. Are you using GEDI, SAR, and optical data? This hasn’t been clearly mentioned. Or do you mean that your coverage provides a robust and variable area for training and validating your model?
P5, L113-118: This seems more suited for the introduction rather than the methods section. Could you move it accordingly?
P5, L120: The motivation for not using the L4A product seems to be missing in the introduction. Could you elaborate on why it was not used instead of developing a new product?
P5, L122: The second approach seems more like a comparison rather than an actual methodology. What is your ground truth here?
P5, L123-124: Why integrate with remote sensing data? If it's for creating wall-to-wall maps, this isn’t clear. Could you explain the reasoning?
P5, L125: At this stage, it's still unclear what optical and SAR data you are using. Could you clarify?
P7, L147: You still haven’t explained why you are linking this to remote sensing data. Could you provide more context?
P7, L149: This is confusing. Are you using Random Forest to scale up biomass estimates, or for variable importance? Please clarify.
P7, L151-153: Delete the last sentence as it doesn’t seem to add value.
P7, L163-165: This is not correct. I suggest reviewing how the L4A product is actually derived, as this part is misleading.
P7, L167: You can compare against L4A, but you cannot use it for validation. Validation implies true biomass measurements, which are not available here.
P7, L167-169: This is incorrect. The GEDI L4A product does not directly relate GEDI L2A metrics to field-measured biomass. Instead, it uses a model inversion approach based on airborne LiDAR-derived AGBD estimates, which were previously calibrated with field plot data.
P8, L181-182: Why use SAVI and BSI? Are they known to correlate with olive tree biomass? Could you provide more justification for this choice?
P8, L183: Could you clarify the resolution at which you're processing the HLS imagery?
P8, L184: Ground truth for what exactly? Is this intended for tree cover estimation? This needs to be clarified.
P8, L184-185: This is confusing. Is your canopy cover product based on aerial imagery or Sentinel-2 data? This needs to be clearer.
P8, L189: Why PALSAR and not Sentinel-1? Can you explain your choice of SAR data?
P8, L204: Where are the details of canopy cover prediction? How do you distinguish olive tree canopy cover from other types of cover? This needs more explanation.
P8, L206: Given the positional accuracy of the GEDI 25m footprint (~10m), does it make sense to use it at this scale? Could you clarify?
P9, L215: Please clarify that crown diameter (Cdiam) refers to the average crown diameter per tree within the footprint. This needs to be more explicit.
P9, L218: Olive tree heights range from 3m to 8m, and the RMSE of the L2A height product is about 5m. This creates a significant discrepancy. How do you address this uncertainty in tree height?
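The size of this discrepancy follows from simple arithmetic on the two figures quoted in the comment above; the variable names in this short sketch are illustrative.

```python
HEIGHT_RMSE_M = 5.0                # L2A height RMSE quoted in the comment
OLIVE_HEIGHT_RANGE_M = (3.0, 8.0)  # olive tree height range quoted in the comment

# Height error expressed as a fraction of the tree height being measured.
relative_errors = {h: HEIGHT_RMSE_M / h for h in OLIVE_HEIGHT_RANGE_M}

for height_m, rel in relative_errors.items():
    print(f"tree height {height_m:.0f} m: RMSE is {rel:.0%} of the height")
```

The height error is therefore between roughly 60% and 170% of the quantity being measured, and in a volumetric formulation it carries directly into the crown volume.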
P9, L227-229: This needs to be introduced earlier in the paper. How exactly have you derived tree numbers? What ground truth and predictors did you use for your supervised learning?
P9, L235-236: L4A is not your approach; you are simply using it for comparison. Could you rephrase this section?
P10, L248-249: What do you mean by “improve”? Are you using olive trees identified in high-resolution imagery to train a model for olive canopy cover prediction in HLS data?
P10, L258: I don’t think the positional accuracy of GEDI L2A is suited for this type of work. Could you address this limitation?
P10, L272: I’m confused. Are you using both L2A and L4A as predictors? What’s your response variable? Or are you using your volumetric approach (based on L2A and canopy cover) as the response? If so, this creates issues with data leakage, as you cannot use L2A as both the predictor and the response.
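The leakage issue can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions, not a reconstruction of the manuscript's workflow: heights are drawn uniformly from 3 to 8 m, the height error is assumed Gaussian with a 5 m standard deviation, and `K` is a hypothetical height-to-AGBD factor. The response is built from the measured height, and the same measured height is then used as a predictor.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R² of a simple linear fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy**2 / (sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N = 2000
K = 4.0  # hypothetical height-to-AGBD factor, illustrative only

true_height = [rng.uniform(3.0, 8.0) for _ in range(N)]           # synthetic "real" heights
measured_height = [h + rng.gauss(0.0, 5.0) for h in true_height]  # noisy height (assumed 5 m error)

true_agbd = [K * h for h in true_height]
# "Volumetric" response derived from the measured height, plus minor unrelated noise.
response = [K * h + rng.gauss(0.0, 1.0) for h in measured_height]

r2_leaky = r_squared(measured_height, response)  # predictor shares its error with the response
r2_truth = r_squared(response, true_agbd)        # agreement with the synthetic truth

print(f"R² of response vs. leaked predictor: {r2_leaky:.2f}")
print(f"R² of response vs. synthetic truth:  {r2_truth:.2f}")
```

In this setup the model scores a near-perfect fit because predictor and response share the same measurement error, while agreement with the synthetic truth stays low. A high R² obtained this way says nothing about how well biomass is estimated.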
P11, Table 1: What are the predictor and response variables here? This table needs clarification.
P13, L321: What’s RVI? Please define this abbreviation clearly.
P14, Figure 3: The frequency of GEDI L2A estimates is 10 times less than that of L4A. Why is this the case? Could you clarify this discrepancy?
P15, L355: In this section, it’s unclear what exactly you’re comparing and what your ground truth is. I assume you are comparing L2A-derived AGBD predicted using remote sensing data against the L4A AGBD product. However, this comparison is problematic without true biomass measurements for validation.
P16, Figure 4: You’re comparing two highly inaccurate models. Without field measurements, this comparison is not meaningful. I suggest reconsidering this analysis.
Technical corrections:
P4, L107: Consider using "key producer" instead of "key reference"
P4, L107: Consider replacing "geometric normalization" with "co-registration", which is the more appropriate term in this context.
Citation: https://doi.org/10.5194/egusphere-2025-917-RC1