the Creative Commons Attribution 4.0 License.
Spatiotemporal Characterization of Wheat Development Using UAV LiDAR Structure–Intensity Fusion with Multispectral and Thermal Data
Abstract. This study presents the first integration of UAV LiDAR structure (canopy height (CH), multi-layer gap fraction (GF)) and intensity features with multispectral (MS) and thermal infrared (TIR) data for aboveground biomass (AGB) estimation in winter wheat. A shallow artificial neural network (ANN), trained on a limited but high-quality destructive dataset, enabled direct integration of multi-sensor features without complex parameterization, supporting systematic evaluation of individual and combined sensor performance. Among single-sensor inputs, LiDAR features were most effective. LiDAR alone, combining all of its features (CH, multi-layer GF, and intensity (INT)), achieved a testing RMSE of 1.73 t/ha (18.27% error) and R² = 0.87, surpassing the common reliance on CH or MS features in UAV-based AGB studies. Multi-layer GF also improved accuracy compared to conventional ground-return GF and was successfully used as a direct ANN input. Fusion with other sensors further enhanced performance, with the best model (LiDAR INT + MS + TIR) reaching a testing RMSE of 1.47 t/ha (16.3% error) and R² = 0.91. Notably, this outperformed fusion models that included LiDAR CH or GF, indicating that INT is a particularly information-rich predictor likely encoding both structural and physiological canopy properties. Furthermore, sensor contributions varied seasonally, with CH and GF most informative during early growth and canopy closure, while MS and TIR became dominant during senescence and stress, with rankings providing practical guidance for sensor selection based on monitoring periods or economic constraints. Results from nitrogen treatments indicated that UAV data captured management effects more effectively than destructive sampling, highlighting the value of spatially comprehensive observations, an advantage that can be further enhanced through the fusion of emerging UAV sensor products.
Overall, the findings position LiDAR’s dual structural and spectral information, particularly INT, as a promising breakthrough for improving UAV-based AGB monitoring, with strong potential to advance multi-sensor fusion approaches as algorithms and crop applications broaden.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-5336', Jonti Shepherd, 03 Dec 2025
AC1: 'Reply on RC1', Jordan Bates, 18 Apr 2026
Reviewer 1 Responses
We thank the reviewer for the detailed and constructive evaluation of our manuscript. The comments provided have been instrumental in improving the clarity, methodological rigor, and interpretation of the study.
In response, we plan to revise the manuscript to better align the claims with the supporting evidence, strengthen the statistical evaluation of the modelling framework, and refine the discussion of key findings. We believe these revisions would result in a more balanced and robust presentation of the work.
Below, we provide the reviewer’s full comments in bold and italicized text, followed by our responses to each comment.
- The manuscript repeatedly claims to be the first to integrate LiDAR CH, multi-layer GF, LiDAR intensity, MS, and TIR for biomass estimation. This is not accurate. Multi-sensor fusion combining LiDAR and MS has been done, including in wheat. LiDAR + TIR exists in related cereal studies. The unique contribution here is the specific configuration and systematic comparison, not the existence of the fusion itself. The novelty claim must be rewritten. As written, it will not pass expert scrutiny.
We would like to clarify that it was not our intention to claim that multi-sensor fusion of LiDAR, multispectral, and thermal data has not been previously explored. We agree that such approaches have been applied in crop monitoring, including in wheat.
Our intention was to emphasize the specific combination and systematic evaluation of multiple LiDAR-derived features, particularly multi-layer gap fraction and normalized intensity, together with multispectral and thermal data within a unified framework. We acknowledge that the original wording may have given the impression of a broader novelty claim than intended.
In response, we plan to revise the manuscript to remove or rephrase statements that could be interpreted as claiming novelty in the fusion concept itself. The revised text will more clearly position this study as an extension of existing multi-sensor approaches, focusing on the integration and comparative assessment of underused LiDAR features rather than on the fusion concept alone. In addition, we will revise the manuscript more broadly to reduce overly strong wording and ensure a more neutral and scientifically precise tone throughout.
- The paper makes strong claims about LiDAR intensity encoding physiological and structural canopy traits. These claims are not sufficiently supported.
- Intensity is highly sensitive to range, angle, moisture, and instrument drift.
- Only ground-return normalization is applied, which is insufficient for cross-date physiological interpretation.
- No radiometric calibration or angular correction is performed.
- No independent evidence is presented linking intensity to nitrogen status, pigment changes, or biochemical traits.
The conclusions around LiDAR intensity are overstated and require substantial tempering. At present, the manuscript treats INT as if it were a calibrated spectral variable, which it is not.
We fully agree that LiDAR intensity is influenced by multiple factors, including range, scan angle, surface moisture, and that it should not be interpreted as a radiometrically calibrated spectral measurement. Our intention was not to treat intensity as calibrated reflectance, but rather as an empirical signal that, when appropriately normalized, can provide useful information related to canopy structure and, indirectly, vegetation condition. We acknowledge that the current manuscript may convey a stronger interpretation than intended and will revise the text to ensure a more cautious and scientifically appropriate framing.
This interpretation is supported by previous studies demonstrating relationships between LiDAR-derived signals and vegetation condition, including green canopy dynamics (e.g., GAI/GLAI; Liu et al., 2017), plant water status (Junttila et al., 2021), responses to nitrogen treatment (Hütt et al., 2023), and broader bio-/geophysical parameters (Scaioni et al., 2018).
In this study, we apply ground-return normalization to reduce temporal variability in the intensity signal, following approaches used in previous work (e.g., You et al., 2017), which demonstrated improved LAI estimation using normalized LiDAR signals. While this does not constitute full radiometric calibration, it represents a practical baseline method to improve temporal consistency under typical UAV acquisition conditions. We will clarify this distinction more explicitly in the revised manuscript.
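To make the ground-return normalization concrete, the following minimal Python sketch divides each return's intensity by the mean intensity of ground-classified returns from the same flight date. Function and variable names are hypothetical illustrations, not the manuscript's actual code, and the sketch assumes a point cloud that already carries a ground/non-ground classification:

```python
import numpy as np

def normalize_intensity(intensity, is_ground):
    """Normalize per-return LiDAR intensity by the mean intensity of
    ground-classified returns from the same flight date (illustrative
    sketch; full radiometric calibration would also correct for range
    and incidence angle)."""
    intensity = np.asarray(intensity, dtype=float)
    is_ground = np.asarray(is_ground, dtype=bool)
    ground_mean = intensity[is_ground].mean()  # per-date reference level
    return intensity / ground_mean
```

Because the reference level is recomputed per acquisition date, canopy intensities become relative to bare soil on that date, which reduces (but does not eliminate) instrument and atmospheric drift between campaigns.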
We will also expand the Discussion (currently briefly addressed in Lines 443–444) to acknowledge that additional corrections (e.g., scan angle, range, and atmospheric effects) are often required in LiDAR intensity analysis. Although these effects may be reduced in our case due to low flight altitude (~50 m), relatively small scan angles (~25°), double-grid flight patterns, and flat terrain, they are not fully eliminated. Previous studies have shown that incidence angle effects become more pronounced at higher angles, while remaining limited at lower angles (e.g., Hütt et al., 2024), and that range effects may be less influential under UAV acquisition conditions compared to airborne systems (e.g., Bakuła et al., 2020). We will emphasize that incorporating such corrections could further improve intensity-based analyses, particularly in more complex scenarios.
To further support the interpretation of LiDAR intensity, we will include additional analysis comparing normalized intensity with independently measured plant area index (PAI), leaf area index (LAI), green leaf area index (GLAI), and brown leaf area index (BLAI) derived from destructive sampling. This will allow us to assess whether intensity is more closely related to canopy structure or to green foliage components.
We will present these results to demonstrate that LiDAR intensity may reflect aspects of both canopy structure and greenness, which are themselves linked to vegetation condition. Accordingly, we will position it as an empirical structural–spectral proxy that complements purely structural metrics such as crop height and gap fraction. Overall, we will revise the manuscript to temper its interpretation, explicitly state its limitations, and strengthen the supporting evidence to ensure a more accurate and balanced presentation.
- The ANN modelling framework lacks statistical rigor
The modelling approach is a weak point of the paper.
- Only 86 destructive samples are available, which is extremely limited for training multi-feature ANN models.
- A simple 70/30 split is insufficient for robust validation; no cross-validation, no repeated sampling, and no uncertainty estimates are provided.
- There is no demonstration that spatial autocorrelation was controlled. The model may be learning subplot-level patterns, not generalizable relationships.
- Feature dimensionality is high relative to sample size, yet model selection is described loosely and without formal procedure.
As it stands, the modelling is not statistically robust enough to support the strong performance claims made throughout the manuscript.
We agree that the original evaluation of the ANN models did not sufficiently demonstrate statistical robustness, and we appreciate the emphasis on strengthening the validation framework. In the revised manuscript, we will substantially improve the modelling evaluation by implementing repeated k-fold cross-validation in addition to the original 70/30 holdout approach. We will report mean performance metrics along with measures of variability (e.g., standard deviation) across folds to provide uncertainty estimates. We will also clarify the model selection procedure and constraints applied to limit overfitting given the relatively small sample size.
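The planned repeated k-fold procedure can be sketched as follows. This is a hypothetical, self-contained example in which a linear least-squares model stands in for the ANN purely for brevity; function names and parameters are illustrative, not the manuscript's implementation:

```python
import numpy as np

def repeated_kfold_rmse(X, y, k=5, repeats=10, seed=0):
    """Repeated k-fold cross-validation, returning the mean and standard
    deviation of test RMSE across all folds and repeats (the variability
    metric used as an uncertainty estimate). A linear least-squares
    model replaces the ANN in this sketch."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(repeats):
        idx = rng.permutation(n)          # fresh shuffle each repeat
        for fold in np.array_split(idx, k):
            test = np.zeros(n, dtype=bool)
            test[fold] = True
            # fit y ~ X + intercept on the training folds
            A = np.c_[X[~test], np.ones((~test).sum())]
            coef, *_ = np.linalg.lstsq(A, y[~test], rcond=None)
            pred = np.c_[X[test], np.ones(test.sum())] @ coef
            rmses.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))
```

Reporting the mean together with the across-fold standard deviation makes performance differences between sensor feature sets interpretable relative to sampling variability, rather than resting on a single 70/30 split.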
To provide a clearer and more structured evaluation, we will adopt a three-step framework: (1) initial comparison of sensor feature combinations using the holdout dataset; (2) robustness assessment of top-performing and commonly used feature sets using cross-validation; and (3) where possible, independent validation using an external dataset.
In this context, we will include an additional analysis step by applying cross-validation to the best-performing and representative sensor configurations identified in the initial comparison. This will provide stronger statistical support for the observed performance differences between sensor feature sets.
If feasible within the revision timeline, we will further evaluate model generalizability by training models on the full 2021 dataset and testing them on an independent dataset collected in 2023 over a separate winter wheat field with identical sensor configurations and destructive measurements. This additional validation will help assess transferability and provide insight into potential spatial dependence effects beyond subplot-level sampling.
We will also expand the Discussion to explicitly acknowledge limitations related to sample size, feature dimensionality, and potential spatial autocorrelation, noting that the primary objective of this study is to compare sensor feature contributions rather than to develop a fully generalizable predictive model. These additions will provide a more robust, transparent, and statistically supported evaluation of model performance while maintaining the focus on comparative analysis of sensor-derived features.
- Multi-layer GF method needs deeper justification
The multi-layer GF analysis is a valuable idea, but the implementation and interpretation need more discipline.
- The segmentation thresholds (10/20/30 cm) appear arbitrary.
- No empirical or theoretical justification is given for why these scales represent meaningful canopy stratification.
- The comparison to 3DPI and classical GF methods is underdeveloped.
- The model’s sensitivity to point density and scan geometry is not addressed.
The method shows potential, but the manuscript overstates its generality and does not provide enough evidence for the proposed optimal configuration beyond this specific dataset.
We agree that the justification and interpretation of the multi-layer gap fraction (GF) approach require further clarification and more careful framing. In the revised manuscript, we will clarify that the selected vertical layer depths (10–30 cm) and horizontal resolutions (10–30 cm) were chosen as practical discretization levels based on LiDAR range precision (~4 cm), UAV point density, wheat canopy height (~1–1.2 m), and plot dimensions, rather than representing theoretically optimal canopy stratification. Layers thinner than 10 cm approach the sensor precision and lead to sparse voxel representation at the given flight altitude, while layers thicker than 30 cm would result in too few strata to meaningfully describe canopy structure. Similarly, smaller grid cells would contain insufficient returns for stable gap fraction estimation, whereas larger cells would not fully exploit the spatial resolution of the data.
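To make the layered discretization concrete, the following simplified Python sketch computes one gap fraction value per vertical layer from a height-normalized point cloud, defining GF as the share of horizontal grid cells with no returns in that layer. This occupancy-based definition, the function name, and the grid handling are illustrative assumptions, not the manuscript's exact implementation:

```python
import numpy as np

def layer_gap_fraction(x, y, z, cell=0.3, layer=0.2,
                       extent=(0.0, 1.2, 0.0, 1.2), top=1.2):
    """Per-layer gap fraction over a rectangular plot (illustrative
    sketch): voxelize returns into horizontal cells of size `cell` and
    vertical layers of depth `layer`, then report the empty-cell
    fraction of each layer, bottom layer first."""
    x0, x1, y0, y1 = extent
    nx = int(np.ceil((x1 - x0) / cell))
    ny = int(np.ceil((y1 - y0) / cell))
    nz = int(np.ceil(top / layer))
    ix = np.clip(((np.asarray(x) - x0) / cell).astype(int), 0, nx - 1)
    iy = np.clip(((np.asarray(y) - y0) / cell).astype(int), 0, ny - 1)
    iz = np.clip((np.asarray(z) / layer).astype(int), 0, nz - 1)
    occupied = np.zeros((nz, ny, nx), dtype=bool)
    occupied[iz, iy, ix] = True
    return 1.0 - occupied.reshape(nz, -1).mean(axis=1)  # one GF per layer
```

The sketch also makes the stated constraints visible: shrinking `layer` below the ~4 cm range precision or `cell` below the return spacing produces mostly empty voxels, while coarser settings collapse the vertical profile into too few strata.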
We will revise the manuscript to emphasize that the identified configuration (e.g., 5 layers at 20 cm with 30 cm GSD) is specific to this dataset and sensor setup, and we will avoid presenting it as a universally optimal solution. Instead, we will frame the analysis as an exploration of parameter sensitivity and practical implementation constraints.
We will also clarify that the 3D Plant Index (3DPI) is cited as conceptual background for multi-layer canopy representation rather than as a direct benchmark. Although this is briefly addressed in the current manuscript, we will further expand the discussion to more clearly distinguish our approach from classical gap fraction methods and previous 3DPI-based implementations. In addition, we note that Line 385 already highlights a key distinction, where we state that our approach directly incorporates multi-layer GF inputs into the ANN, allowing the model to learn their relative importance throughout the season without additional parameterization, in contrast to 3DPI-based methods. We will revise this section to clarify that this is intended to explain methodological differences rather than serve as a direct comparison. If deemed necessary, and depending on the scope of the revision, we would also consider including a direct 3DPI-based analysis to further contextualize our approach within existing methods.
Regarding scan geometry and point density, we acknowledge that these factors were not sufficiently addressed. In the revised manuscript, we will expand this section to describe the acquisition conditions, including relatively low scan angles, double-grid flight patterns with high overlap, and flat terrain, which reduce variability in scan geometry and point density. We will also include additional analysis to evaluate the influence of scan angle, including its relationship with AGB and its effect on model performance, noting that its inclusion in preliminary modelling did not improve predictive accuracy. Finally, we will explicitly acknowledge that the sensitivity of the GF approach to point density and scan geometry may become more significant under different acquisition conditions, and that further investigation across crop types, sensor configurations, and flight parameters is needed.
- Sensor dominance over time is overstated
The temporal analysis suggests shifts in sensor utility across growth stages. While the general trends are plausible, the manuscript repeatedly makes categorical statements (e.g., “MS dominates during senescence,” “CH dominates early”) without rigorous statistical backing.
These conclusions need to be presented as observations from this dataset, not generalizable statements about sensor behavior
We agree that the description of temporal “sensor dominance” was phrased too broadly in the original manuscript. In the revised manuscript, we will reframe these results as observations specific to this dataset rather than generalizable statements about sensor behavior. We will also moderate the language to avoid categorical terms such as “dominates,” and instead describe relative performance trends across growth stages. Where appropriate, we will support these observations with references to prior studies that report similar patterns. Finally, we will expand the Discussion to emphasize that these trends are context-dependent and may vary with crop type, site conditions, and season, thereby avoiding overgeneralization.
- Nitrogen treatment analysis draws conclusions not supported by data
Figure 12 shows expected spatial smoothing from UAV-based predictions compared to subplot-level destructive samples. This does not prove that UAV “captures management effects more effectively.” It only shows that UAV sampling is spatially denser.
The manuscript conflates spatial resolution advantages with biological sensitivity. This needs correction.
We agree that the original wording may have conflated spatial sampling advantages with biological sensitivity. Our intention was not to imply that UAV-based estimates possess greater biological sensitivity than destructive sampling, which is not possible. In the revised manuscript, we will clarify that UAV-derived biomass maps provide spatially continuous coverage, allowing management-related patterns (e.g., nitrogen gradients) to be visualized more comprehensively than with sparse destructive measurements.
We will revise the text to explicitly distinguish between spatial completeness and biological sensitivity, emphasizing that the observed differences reflect sampling density rather than an inherent improvement in physiological or biological detection capability. These revisions will ensure a more accurate and appropriately constrained interpretation of the results.
Other Critical Points
- The abstract uses promotional language (“breakthrough,” “promising tool”) that is not supported by the analysis.
We will revise the Abstract to remove promotional language and ensure that all statements are aligned with the strength of the presented analysis and supported by the results.
- Figures are overly dense, especially Figures 7–9; interpretation is difficult.
We agree that the original figures were overly dense and could hinder interpretation. We will improve figure clarity by reorganizing the visual content. Specifically, the scatter plots will be moved to the Appendix, where they will be referenced in the main text. These plots are useful for illustrating bias and feature behavior across the growing season. However, their density makes them less suitable for the main body of the manuscript.
In the main text, we will retain and emphasize bar plot comparisons, which more clearly summarize model performance across sensor features and combinations. Following the initial 70/30 training/testing comparison, we will also include bar plots of cross-validation results for the best-performing and commonly used feature sets, providing a clearer and more structured presentation of model performance.
- No error bars or confidence intervals are provided anywhere; this weakens all conclusions.
We agree that the original manuscript lacked appropriate uncertainty representation. As part of the revised modelling framework, we will include variability metrics derived from cross-validation (e.g., standard deviation across folds), providing a clearer and more robust assessment of model uncertainty.
- The discussion section extrapolates beyond the evidence, particularly regarding the physiological meaning of intensity and the claimed operational advantages.
We will revise the Discussion to reduce overinterpretation and ensure that all conclusions are directly supported by the presented results. In particular, we will temper statements related to LiDAR intensity and operational implications to reflect a more cautious and evidence-based interpretation. We will also emphasize that the observed benefits are specific to the conditions and use case investigated in this study, and should be interpreted as demonstrations within this experimental context rather than generalizable conclusions.
References
Bakuła, K., Pilarska, M., Ostrowski, W., Nowicki, A., & Kurczyński, Z. (2020). UAV LiDAR data processing: Influence of flight height on geometric accuracy, radiometric information and parameter setting in DTM production. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B1-2020, 21–26. https://doi.org/10.5194/isprs-archives-XLIII-B1-2020-21-2020
Hütt, C., Bolten, A., Hüging, H., & Bareth, G. (2023). UAV LiDAR Metrics for Monitoring Crop Height, Biomass and Nitrogen Uptake: A Case Study on a Winter Wheat Field Trial. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 91(2), 65–76. https://doi.org/10.1007/s41064-022-00228-6
Hütt, C., Bolten, A., Firl, H., Hüging, H., Jenal, A., & Reddig, F. (2024). LiDAR intensity variability in UAV-based agricultural monitoring: Insights from a winter wheat field trial. DGPF-Jahrestagung 2024 - Stadt, Land, Fluss - Daten Vernetzen, 334–341. https://doi.org/10.24407/KXP:1885708025
Junttila, S., Hölttä, T., Puttonen, E., Katoh, M., Vastaranta, M., Kaartinen, H., Holopainen, M., & Hyyppä, H. (2021). Terrestrial laser scanning intensity captures diurnal variation in leaf water potential. Remote Sensing of Environment, 255, 112274. https://doi.org/10.1016/j.rse.2020.112274
Liu, S., Baret, F., Abichou, M., Boudon, F., Thomas, S., Zhao, K., Fournier, C., Andrieu, B., Irfan, K., Hemmerlé, M., & Solan, B. de. (2017). Estimating wheat green area index from ground-based LiDAR measurement using a 3D canopy structure model. Agricultural and Forest Meteorology, 247, 12–20. https://doi.org/10.1016/j.agrformet.2017.07.007
Scaioni, M., Höfle, B., Baungarten Kersting, A. P., Barazzetti, L., Previtali, M., & Wujanz, D. (2018). Methods for information extraction from LiDAR intensity data and multispectral LiDAR technology. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-3, 1503–1510. https://doi.org/10.5194/isprs-archives-XLII-3-1503-2018
You, H., Wang, T., Skidmore, A. K., & Xing, Y. (2017). Quantifying the effects of normalisation of airborne LiDAR intensity on coniferous forest leaf area index estimations. Remote Sensing, 9(2), 163. https://doi.org/10.3390/rs9020163
RC2: 'Comment on egusphere-2025-5336', Anonymous Referee #2, 16 Mar 2026
This manuscript investigates the use of multi-sensor UAV data to estimate above-ground biomass in winter wheat. The dataset and seasonal UAV observations appear valuable, and the comparison of different sensor configurations across the growing season is interesting. The study performs extensive experiments combining different sensors and data modalities, providing a comprehensive overview of their performance. However, I have several concerns regarding the novelty claim, the practical benefit of combining multiple sensors compared to LiDAR alone, and some methodological and presentation issues. For these reasons, I recommend a major revision.
Major comments:
Lines 15–25: The manuscript states that this study presents the “first integration” of UAV LiDAR structure and intensity with multispectral and thermal data for AGB estimation. This claim appears quite strong and may need to be positioned more carefully relative to existing studies on UAV multi-sensor crop monitoring.
Results and Discussion sections: LiDAR alone already achieves strong performance; while the fusion model improves the RMSE, I am wondering whether this improvement is worth the additional complexity of combining LiDAR, MS, and TIR sensors in terms of cost, data acquisition, and processing, and above all, modelling and computing costs.
TIR DATA: Collecting TIR data can be challenging because sudden changes in wind or cloud cover may cause significant temperature variations even within a single campaign, and this becomes even more critical when collecting time-series data. I did not find a clear explanation of how the TIR data were collected or how the authors ensured that measurements were obtained under relatively comparable or stable weather conditions.
Machine Learning Method: Multi-sensor ensemble approaches can improve model performance, but they also introduce additional complexity in data collection, preprocessing, and computational requirements. Given these considerations, I wonder whether a simpler setup, for example, using LiDAR alone combined with more advanced machine learning approaches (transformers), could provide comparable performance. A brief discussion of this point in the discussion, as a possible future direction, might be useful.
Minor comments
Line 122: I would suggest removing this information from here and citing the figure at the end of the sentence (Fig. 1).
Line 115: Make sure that the figure caption stays together with the Figure.
Line 138: Why LiDAR flight height 50 m and MS 100 m? Maybe a short explanation here would help?
Line 226: CWSI: Please explain this abbreviation.
Lines 239–241: "All sensor features were normalized to a 0–1 range": I suggest removing this sentence, as it does not seem to belong in this paragraph; the normalization of the input data is already mentioned twice in the following paragraph, which is sufficient.
Terminology (e.g., Line 247): The terminology occasionally switches between UAV data, input data, sensor features, and predictor rasters. More consistent terminology could improve clarity.
Figure 6: The resolution of the plots could be improved, as the text is difficult to read. In the last column, green dots overlap with the text labels, which further reduces readability. It might help to bring the text in front of the plotted points.
Lines 438–439: The text suggests that LiDAR may be more beneficial than other data types?
Recommendation
Overall, the dataset and experimental design appear valuable, and the study addresses an interesting topic. However, due to concerns regarding the novelty claim, the practical benefit of the multi-sensor approach compared with LiDAR alone, and several methodological and presentation issues, a major revision is needed.
Citation: https://doi.org/10.5194/egusphere-2025-5336-RC2
AC2: 'Reply on RC2', Jordan Bates, 18 Apr 2026
Reviewer 2 Responses
We sincerely thank Reviewer 2 for taking the time to thoroughly review our manuscript and for providing detailed and constructive comments. We greatly appreciate the thoughtful suggestions, particularly those highlighting areas that required further clarification or were previously overlooked. These comments have been invaluable in improving the clarity, rigor, and overall quality of the manuscript.
Below, we provide the reviewer’s full comments in bold and italicized text, followed by our responses to each comment.
Major Comments
- Lines 15–25: The manuscript states that this study presents the “first integration” of UAV LiDAR structure and intensity with multispectral and thermal data for AGB estimation. This claim appears quite strong and may need to be positioned more carefully relative to existing studies on UAV multi-sensor crop monitoring.
We agree that the phrasing of “first integration” is too strong. Our intention was not to claim that such multi-sensor approaches have not been explored, but rather to highlight the specific combination and systematic evaluation of multiple LiDAR-derived features, particularly the inclusion of intensity and multi-layer gap fraction alongside crop height, in combination with multispectral and thermal data. While studies exist that combine subsets of these sensors, fewer have explicitly evaluated this full set of LiDAR-derived structural and signal-based features together within a unified framework across an entire growing season.
We acknowledge that our original wording overemphasized novelty, partly out of concern that the specific contributions of the study might be overlooked. However, we agree that this resulted in overly strong and non-standard phrasing. Therefore, we plan to revise the manuscript to remove such statements and to adopt more precise, neutral scientific language throughout.
In the revised version, we will position this work as an extension of existing multi-sensor studies, emphasizing the systematic comparison and integration of underused LiDAR features rather than claiming a first-ever integration. These changes will be applied in the Abstract, Introduction, and relevant sections of the manuscript.
- Results and Discussion sections: LiDAR alone already achieves strong performance; while the fusion model improves the RMSE, I am wondering whether this improvement is worth the additional complexity of combining LiDAR, MS, and TIR sensors in terms of cost, data acquisition, and processing, and above all, modelling and computing costs.
We agree that the additional complexity associated with multi-sensor fusion must be justified in terms of practical benefit, which was not sufficiently emphasized in the original manuscript. As the reviewer correctly notes, LiDAR-only models already achieved strong performance in our study, indicating that LiDAR-derived features capture a large portion of AGB variability. This is partly due to the complementary nature of LiDAR metrics, which include canopy height, canopy density (e.g., gap fraction), and signal intensity. Together, these features provide a multi-dimensional representation of the canopy, where height and density describe structural properties, and intensity provides an empirical signal related to canopy structure and, indirectly, vegetation condition.
While the inclusion of multispectral (MS) and thermal infrared (TIR) data led to further improvements in RMSE, we agree that these gains are relatively modest and may not always justify the added costs and complexity associated with additional sensors, data acquisition, processing workflows, and modelling. In response, we will revise the Discussion to explicitly frame multi-sensor fusion as a trade-off between accuracy and operational complexity. In particular, we will emphasize that, if a single sensor must be selected for biomass estimation, LiDAR provides the most comprehensive information in this dataset.
At the same time, we will clarify that MS and TIR sensors can still play important complementary roles in broader agricultural monitoring. For example, TIR data are commonly used for evapotranspiration and water stress analysis, while MS data provide insights into canopy reflectance and pigment-related dynamics. When these sensors are already being collected for other purposes, their integration into biomass estimation may offer additional, albeit incremental, improvements and contribute to a more complete understanding of crop condition, yield potential, and stress responses.
We therefore position multi-sensor fusion not as a universally necessary approach, but as a context-dependent strategy. In research or precision agriculture contexts where multiple crop processes are monitored simultaneously, the integration of MS and TIR data may provide added value beyond biomass estimation alone. These points will be more clearly emphasized in the revised Discussion.
- TIR DATA: Collecting TIR data can be challenging because sudden changes in wind or cloud cover may cause significant temperature variations even within a single campaign, and this becomes even more critical when collecting time-series data. I did not find a clear explanation of how the TIR data were collected or how the authors ensured that measurements were obtained under relatively comparable or stable weather conditions.
We agree that UAV-based thermal measurements are sensitive to environmental conditions and sensor instabilities, including microbolometer drift, wind effects, and cloud variability.
In the original manuscript, the thermal data acquisition was described too briefly. We now plan to expand this section to clarify that all flights were conducted near solar noon under predominantly clear-sky conditions. To improve radiometric stability, an external heated shutter was used to perform regular non-uniformity corrections (NUC) and reduce microbolometer drift during each mission. Flight parameters were selected to minimize variability, including relatively low flight speeds (~6 m s⁻¹) to reduce wind and directional effects. Radiometric parameters (air temperature, humidity, emissivity, and object distance) were consistently defined prior to each flight, and in-field thermal reference targets were used for calibration and quality control. A constant emissivity value was applied, as measurements were restricted to dense winter wheat canopies with minimal soil influence.
We acknowledge that some environmental variability is unavoidable and have added this clarification to the Discussion to encourage cautious interpretation of TIR-derived results. These details are now included in the revised Methods and Discussion sections.
- Machine Learning Method: Multi-sensor ensemble approaches can improve model performance, but they also introduce additional complexity in data collection, preprocessing, and computational requirements. Given these considerations, I wonder whether a simpler setup, for example, using LiDAR alone combined with more advanced machine learning approaches (transformers), could provide comparable performance. A brief discussion of this point in the discussion, as a possible future direction, might be useful.
We agree that more advanced machine learning approaches, such as transformer-based models, may improve predictive performance by better capturing feature interactions and temporal dynamics.
In this study, our objective was to evaluate sensor contributions rather than optimize model architecture. Therefore, we used a consistent ANN framework to ensure that differences in performance were driven by the input data rather than the model.
We also note that transformer-based approaches typically require substantially larger training datasets to avoid overfitting. Given the limited number of samples in this study, we considered simpler models more appropriate.
We will add a brief discussion highlighting advanced models as a promising direction for future work, particularly for building on the insights identified here using LiDAR-only or reduced-input datasets, provided that sufficiently large training datasets are available.
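To make the "fixed model, varying inputs" comparison concrete, the following is a minimal, hypothetical sketch of such a setup: one shallow ANN architecture reused across feature subsets, so that performance differences reflect the input data rather than the model. The feature names and data below are illustrative only, not the study's actual dataset or pipeline.

```python
# Hypothetical sketch: a shallow ANN regression setup of the kind described,
# where the same architecture is reused for every sensor-feature subset.
# Data and feature roles are synthetic/illustrative, not the study's dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 120  # small, destructive-sampling-sized dataset
X = rng.random((n, 5))  # e.g. CH, GF layers, INT, an MS index, canopy temperature
agb = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0, 0.2, n)  # synthetic AGB (t/ha)

X_tr, X_te, y_tr, y_te = train_test_split(X, agb, test_size=0.3, random_state=0)

# One fixed, shallow architecture keeps comparisons across input subsets fair.
model = make_pipeline(
    MinMaxScaler(),  # normalize all features to a 0-1 range
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0),
)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"test RMSE: {rmse:.3f}")
```

Swapping the columns of `X` for different sensor subsets while holding `model` fixed reproduces the comparison logic; a transformer would replace only the regressor, at the cost of a much larger appetite for training data.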
Minor Comments
Line 122: I would suggest removing this information from here and citing the figure at the end of the sentence (Fig. 1).
We will revise the sentence accordingly by removing the information and placing the figure reference at the end of the sentence (Fig. 1).
Line 115: Make sure that the figure caption stays together with the Figure.
We will revise the manuscript formatting to ensure that the figure caption remains together with the corresponding figure.
Lines 138: Why LiDAR flight height 50 m and MS 100 m? Maybe a short explanation here would help?
We thank the reviewer for pointing this out. We agree that the difference in flight altitudes was not sufficiently explained in the original manuscript and will clarify this in the data acquisition section. In general, with our particular sensor, LiDAR data are collected at a lower altitude (e.g., 50 m) to ensure sufficient point density and signal strength for reliable estimation of structural metrics such as gap fraction and crop height. In contrast, multispectral (MS) and thermal (TIR) data are typically acquired at higher altitudes (e.g., ~100 m) to improve spatial coverage and operational efficiency while maintaining adequate spatial resolution for canopy reflectance analysis.
However, upon re-examination of the dataset, we identified an error in the manuscript: the passive sensors (MS and TIR) were also acquired at 50 m in this experiment, rather than 100 m as originally stated. This deviation from our typical acquisition strategy was due to the relatively small size of the experimental field. In most applications, we operate over larger areas and therefore fly passive sensors at higher altitudes to increase efficiency while maintaining sufficiently high spatial resolution. We will correct this inconsistency in the revised manuscript.
Finally, we note that newer LiDAR systems may support higher-altitude acquisitions while maintaining sufficient point density; however, the flight parameters used in this study were selected to match the performance characteristics of the specific sensor employed.
Lines 226: CWSI: Please explain this abbreviation.
We will now define CWSI as Crop Water Stress Index at its first occurrence in the manuscript. In addition, we plan to expand the description to provide more context on how CWSI is used and its formulation, along with supporting references to relevant literature.
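For readers unfamiliar with the index, a common empirical baseline formulation (one of several in the literature; the manuscript's exact formulation may differ) normalizes the canopy temperature between wet and dry reference temperatures. The sketch below uses hypothetical temperature values for illustration.

```python
# Illustrative computation of the Crop Water Stress Index (CWSI) in its common
# empirical baseline form; the manuscript's exact formulation may differ.
def cwsi(t_canopy: float, t_wet: float, t_dry: float) -> float:
    """CWSI = (Tc - Twet) / (Tdry - Twet); 0 = unstressed, 1 = fully stressed."""
    if t_dry <= t_wet:
        raise ValueError("t_dry must exceed t_wet")
    return (t_canopy - t_wet) / (t_dry - t_wet)

# Hypothetical temperatures (deg C): canopy 28, wet reference 24, dry reference 34
value = cwsi(t_canopy=28.0, t_wet=24.0, t_dry=34.0)
print(round(value, 2))  # -> 0.4
```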
Lines 239–241: "All sensor features were normalized to a 0–1 range": I suggest removing this sentence, as it does not seem to belong in this paragraph, and you mention twice in the following paragraph that the input data are normalized, which is sufficient.
We agree that this statement was redundant and not optimally placed. The sentence will be removed from this paragraph, and the description of data normalization will be retained in the subsequent paragraph where it is more appropriately discussed.
Terminology (e.g., Line 247): The terminology occasionally switches between UAV data, input data, sensor features, and predictor rasters. More consistent terminology could improve clarity.
We agree that the lack of consistent terminology in the original manuscript could reduce clarity. We plan to revise the text throughout to improve clarity and consistency, using more standardized terms to distinguish between raw UAV-derived data, processed sensor features, and model input variables.
Figure 6: The resolution of the plots could be improved, as the text is difficult to read. In the last column, green dots overlap with the text labels, which further reduces readability. It might help to bring the text in front of the plotted points.
We will try to improve the overall resolution and visual quality of the figures to enhance readability. In addition, we plan to reorganize the figure content to improve clarity. The scatter plots, which provide detailed insight into feature behavior and bias across the growing season, will be moved to the Appendix and referenced in the main text. This will reduce visual complexity in the main figures, allowing the bar plots to more clearly present the key results.
The overlap between data points and text labels occurs only in a limited number of cases and does not significantly hinder interpretation. However, if deemed necessary, we will adjust the plotting order to ensure that text labels are rendered above the data points.
Line 438-439: The text suggests that LiDAR may be more beneficial than other data types?
We agree that the current phrasing may imply that LiDAR is universally more beneficial, which was not our intention. Our aim was to highlight a potential advantage of LiDAR-derived structural features in terms of model transferability, as they may be less sensitive to variations in illumination conditions and background reflectance compared to passive multispectral data. However, we acknowledge that this is not definitively established and depends on several factors, including crop type, canopy structure (row and crop spacing), sensor configuration, and flight parameters.
We will revise the manuscript to soften this statement and clarify that this represents a possible advantage that warrants further investigation, rather than a confirmed outcome. We will also emphasize that multispectral and thermal data provide complementary information, particularly for capturing physiological processes, and remain important depending on the application.
AC2: 'Reply on RC2', Jordan Bates, 18 Apr 2026
This manuscript addresses UAV-based multi-sensor fusion for biomass estimation in winter wheat. The dataset is valuable and the full-season evaluation is useful. However, several scientific, methodological, and interpretive issues limit the strength of the conclusions. Many claims are overstated or insufficiently supported by the analysis presented. The paper requires significant refinement before it meets high scientific standards.
Major Issues
1. Novelty is overstated
The manuscript repeatedly claims to be the first to integrate LiDAR CH, multi-layer GF, LiDAR intensity, MS, and TIR for biomass estimation. This is not accurate. Multi-sensor fusion combining LiDAR and MS has been done, including in wheat. LiDAR + TIR exists in related cereal studies. The unique contribution here is the specific configuration and systematic comparison, not the existence of the fusion itself.
The novelty claim must be rewritten. As written, it will not pass expert scrutiny.
2. Interpretation of LiDAR intensity is not adequately justified
The paper makes strong claims about LiDAR intensity encoding physiological and structural canopy traits. These claims are not sufficiently supported.
The conclusions around LiDAR intensity are overstated and require substantial tempering. At present, the manuscript treats INT as if it were a calibrated spectral variable, which it is not.
3. The ANN modelling framework lacks statistical rigor
The modelling approach is a weak point of the paper.
As it stands, the modelling is not statistically robust enough to support the strong performance claims made throughout the manuscript.
4. Multi-layer GF method needs deeper justification
The multi-layer GF analysis is a valuable idea, but the implementation and interpretation need more discipline.
The method shows potential, but the manuscript overstates its generality and does not provide enough evidence for the proposed optimal configuration beyond this specific dataset.
5. Sensor dominance over time is overstated
The temporal analysis suggests shifts in sensor utility across growth stages. While the general trends are plausible, the manuscript repeatedly makes categorical statements (e.g., “MS dominates during senescence,” “CH dominates early”) without rigorous statistical backing.
These conclusions need to be presented as observations from this dataset, not generalizable statements about sensor behaviour.
6. Nitrogen treatment analysis draws conclusions not supported by data
Figure 12 shows expected spatial smoothing from UAV-based predictions compared to subplot-level destructive samples. This does not prove that UAV “captures management effects more effectively.” It only shows that UAV sampling is spatially denser.
The manuscript conflates spatial resolution advantages with biological sensitivity. This needs correction.
Other Critical Points
Overall Recommendation: Major Revision
The study contains useful data and a potentially meaningful contribution, but the manuscript currently over-interprets several findings and lacks the methodological rigor needed to substantiate its strongest claims. The modelling framework must be strengthened, novelty claims must be corrected, and conclusions around LiDAR intensity and temporal sensor dominance must be tempered and grounded.
With substantial revision, this paper could be publishable, but it does not meet the standard required in its current form.