This work is distributed under the Creative Commons Attribution 4.0 License.
Uncertainty Assessment in Deep Learning-Based Plant Trait Retrievals from Hyperspectral Data
Abstract. The large-scale mapping of plant biophysical and biochemical traits is essential for ecological and environmental applications. Given its fine spectral resolution and unprecedented data availability, hyperspectral data has emerged as a promising, non-destructive tool for accurately retrieving these traits. Machine and particularly deep learning models have shown strong potential in retrieving plant traits from hyperspectral data. However, when deploying these methods at large scales, reliably quantifying the associated uncertainty remains a critical challenge, especially when models encounter out-of-domain (OOD) data, such as unseen geographic regions, species, biomes, or data acquisition modalities. Traditional uncertainty quantification methods for deep learning models, including deep ensembles (Ens_UN) and Monte Carlo dropout (MCdrop_UN), rely on the variance of predictions but often fail to capture uncertainty in OOD scenarios, leading to overoptimistic and potentially misleading uncertainty estimates. To address this limitation, we propose a distance-based uncertainty estimation method (Dis_UN) that quantifies prediction uncertainty by measuring the dissimilarity between training and test data in the predictor and embedding spaces. Dis_UN leverages residuals as a proxy for uncertainty and employs dissimilarity indices in data manifolds to estimate worst-case errors via 95th-quantile regression. We evaluate Dis_UN on a pre-trained deep learning model for predicting multiple plant traits from hyperspectral images, analyzing its performance on OOD data such as pixels containing spectral variation from urban surfaces, bare ground, clouds, or open surface waters. For this study we target six leaf and canopy traits: leaf mass per area (LMA), chlorophyll (Chl), carotenoids (Car), nitrogen (N) content, leaf area index (LAI), and equivalent water thickness (EWT). Results indicate that Dis_UN effectively differentiates between OOD components and provides more reliable uncertainty estimates than traditional methods, which tend to underestimate the range of uncertainty (on average across traits, by 26.7 % for Ens_UN and 6.5 % for MCdrop_UN). However, challenges remain for traits affected by spectral saturation. These findings highlight the advantages of distance-aware uncertainty quantification methods and underscore the necessity of diverse training datasets to minimize sampling biases and enhance model robustness. The proposed framework improves the reliability of uncertainty estimation in vegetation monitoring and offers a promising approach for broader applications.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1284', Anonymous Referee #1, 22 Jul 2025
GENERAL COMMENTS
This is an interesting work that deals with the capability of inferring the uncertainty of machine (i.e., deep) learning models’ predictions based on the dissimilarity between the data seen by the model during training and its input data. The proposed methodology is tested using a complete dataset of observations, and the results suggest that, particularly for unseen data, the uncertainty estimates are more accurate or at least more conservative than those provided by other methods, which appear to underestimate uncertainty systematically. The results depend on the biophysical variable predicted by the model, with large differences in performance or behaviour in some cases. The manuscript and the results are well presented, and the discussion is consistent with them; still, some relevant questions remain to be clarified. Overall, the work is relevant to the domain of remote sensing and machine learning, and the proposed method appears to improve upon the state-of-the-art alternatives.
The methodology could be more clearly presented (e.g., including equations and a flowchart). Additionally, some results, particularly those regarding LAI, require a deeper inspection that supports the hypotheses the authors present to explain their findings. Some points made in the discussion should be reviewed or linked to the methodological choices made by the authors.
SPECIFIC COMMENTS
Lines 162-166: Clarify to the reader that spectral data correspond to proximal sensing and airborne canopy reflectance factors so that it is not necessary to access Table S3 to know this detail. Also, specify whether these datasets were gathered from open-access repositories or were privately lent by the producers for this study.
Lines 170-172: While the approach is generally accepted (e.g., the ASD field spectroradiometers interpolate to one nm step in their output), I wonder how the authors pondered this choice. Overall, interpolation will not be able to improve the information of the datasets with the coarsest spectral resolution. What impact do the authors expect from this imbalance in the information rendered in each dataset? Do they expect that the information contained in the narrow spectral features, which remain only in the datasets with the highest spectral performance, will be learned by the model, even if not present in all the datasets, or would this mixture just be a source of confusion and uncertainty for the training? In the second case, wouldn’t it be more robust to downgrade the spectral resolution to the lowest among the datasets?
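For illustration, a minimal sketch of what downgrading to the coarsest common resolution could look like, assuming Gaussian band responses; the band centres, FWHM value, and placeholder spectrum below are hypothetical, not taken from the manuscript:

```python
import numpy as np

def resample_to_coarser(wl_fine, refl_fine, centers_coarse, fwhm_coarse):
    """Convolve a fine-resolution spectrum with Gaussian spectral response
    functions approximating a coarser instrument."""
    sigma = fwhm_coarse / 2.355  # convert FWHM to Gaussian sigma
    out = np.empty(centers_coarse.size)
    for i, c in enumerate(centers_coarse):
        w = np.exp(-0.5 * ((wl_fine - c) / sigma) ** 2)
        out[i] = np.sum(w * refl_fine) / np.sum(w)
    return out

# e.g., downgrade a 1 nm spectrum to hypothetical 10 nm FWHM bands
wl = np.arange(400.0, 2501.0)                               # fine grid (nm)
refl = np.random.default_rng(0).uniform(0.0, 0.6, wl.size)  # placeholder spectrum
centers = np.arange(405.0, 2500.0, 10.0)
coarse = resample_to_coarser(wl, refl, centers, fwhm_coarse=10.0)
```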
Lines 172-174: Spectra with different noise levels are smoothed with the same window width. Likely, the field spectroradiometers are less noisy than airborne imagers; thus, there’s a risk of over-smoothing already smooth data. Overall, the aim should be to achieve a comparable noise level for each dataset. Could processing data according to their noise levels improve the learning process? Considering this further, I understand that the levels of uncertainty are unknown, but perhaps different degrees of reliability could be provided for field spectroscopy and airborne imagery. Would giving different weights to each type of data improve the results?
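Along these lines, a minimal sketch of noise-adaptive smoothing; the window widths, polynomial order, and noise threshold below are purely illustrative, not values from the manuscript:

```python
import numpy as np
from scipy.signal import savgol_filter

def adaptive_smooth(refl, quiet_window=7, noisy_window=15, polyorder=2,
                    noise_threshold=0.002):
    """Pick a Savitzky-Golay window from a rough per-band noise estimate
    (std of first differences), so smooth field spectra are filtered less
    aggressively than noisy airborne spectra."""
    noise = np.std(np.diff(refl)) / np.sqrt(2)
    window = noisy_window if noise > noise_threshold else quiet_window
    return savgol_filter(refl, window_length=window, polyorder=polyorder)

smoothed = adaptive_smooth(np.random.default_rng(1).uniform(0.0, 0.6, 200))
```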
Section 2.1.2: This section’s clarity could benefit from some equations. Additionally, a flowchart summarizing the methodology would provide a clearer view for the reader at an early stage of the manuscript.
Lines 205-207, Section 4.4: The absolute value of the errors is taken. Do the authors expect the errors to be symmetric and centered on zero, or was this checked? Would any knowledge be gained if the signed errors and the 2.5th- and 97.5th-quantile regressions were used instead (e.g., revealing biases in specific directions for specific vegetation types)? In the discussion (Section 4.4), they precisely raise the issue of assuming or forcing symmetric error distributions, yet their analyses start from absolute error values. Could they comment on the potential impact of this choice in the context of symmetric distributions, and perhaps at least foresee future lines of research?
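As a sketch of the suggested check, one could fit the 2.5th- and 97.5th-quantile regressions on signed residuals and inspect their symmetry; the features and errors below are synthetic stand-ins for the study's actual dissimilarity indices and residuals:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, (500, 3))    # stand-in dissimilarity features
err = rng.normal(0.1 * X[:, 0], 0.2)   # synthetic, slightly biased errors

lo = QuantileRegressor(quantile=0.025, alpha=0.0).fit(X, err)
hi = QuantileRegressor(quantile=0.975, alpha=0.0).fit(X, err)

# Under a symmetric, zero-centred error distribution the two bounds are
# near mirror images; a systematic offset indicates directional bias.
print(np.mean(lo.predict(X) + hi.predict(X)))
```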
Dis_UN model and training, and lines 440-445: The performance of this model’s training (and test) is not presented; therefore, the reader cannot know whether the predictions were expected to be accurate or precise when applied. Despite being more “conservative” than the other methods, Dis_UN predictions are also uncertain. Fig. 3 compares the absolute value of the error with the expected 95th percentile of its distribution predicted by Dis_UN, but with the 68 % level (one standard deviation) for the others, which may not be the most appropriate comparison. The statement in lines 440-445 raises the question of whether this comparison is then fair, or whether, comparing the same uncertainty coverages, the difference between the uncertainty predictions would become smaller. Perhaps the 95 % tail should be calculated and compared for all methods (e.g., multiplying the Ens_UN and MCdrop_UN estimates by 1.96), or the 68th-quantile regression (one standard deviation) used for Dis_UN.
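To make the comparison concrete, a minimal sketch of the recalibration idea — empirical coverage of one-sigma versus 1.96-sigma bounds — using synthetic placeholders for the residuals and the ensemble/dropout standard deviations:

```python
import numpy as np

def empirical_coverage(abs_err, bound):
    """Fraction of absolute errors that fall within the predicted bound."""
    return np.mean(abs_err <= bound)

rng = np.random.default_rng(3)
std_un = rng.uniform(0.05, 0.3, 1000)      # stand-in Ens_UN/MCdrop_UN sigmas
abs_err = np.abs(rng.normal(0.0, std_un))  # synthetic matching residuals

print(empirical_coverage(abs_err, std_un))         # ~0.68 if Gaussian
print(empirical_coverage(abs_err, 1.96 * std_un))  # ~0.95 if Gaussian
```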
Lines 431-433: Perhaps “correlation” is not the most representative term for the problem; for example, Ens_UN might feature higher Pearson correlation coefficients than the other methods (Fig. 3). Nor might the coefficient of determination represent the achievement the authors report; thus, another term should be used instead.
Lines 456-457: If the embedded space misses some spectral information that might be different between vegetated and non-vegetated surfaces, do these differences matter when the traits are predicted?
Lines 460-466: Grasslands are not so simple; they can include non-green elements, such as standing senescent material (e.g., Pacheco-Labrador et al., 2021) and flowers (Perrone et al., 2024), and pixels usually mix numerous species (Darvishzadeh et al., 2008), which hampers the relationships between spectral and biophysical properties. Phenology during sampling is not reported in the manuscript, but if all the datasets correspond to the green-peak period, grasses will cover most of the soil, and, unlike in the shrublands, the background contribution will be minimized. Lower uncertainties might result from a bias in the sampling time of the grasslands towards the green peak, if this is indeed the case (which could be confirmed). Unlike forests and shrublands, grasslands exhibit a significantly lower geometrical BRDF component, which may explain the differences between cover types. Forests, in addition to shrublands, will present a more complex vertical profile with a distinct understory of vegetation. There are arguments to justify the findings, but grasslands should not be regarded as “simple”. The issue of the phenology bias is, in fact, commented on in the discussion (Section 4.3).
References:
Darvishzadeh, R., Skidmore, A., Schlerf, M., and Atzberger, C.: Inversion of a radiative transfer model for estimating vegetation LAI and chlorophyll in a heterogeneous grassland, Remote Sensing of Environment, 112, 2592-2604, https://doi.org/10.1016/j.rse.2007.12.003, 2008.
Pacheco-Labrador, J., El-Madany, T. S., van der Tol, C., Martin, M. P., Gonzalez-Cascon, R., Perez-Priego, O., Guan, J., Moreno, G., Carrara, A., Reichstein, M., and Migliavacca, M.: senSCOPE: Modeling mixed canopies combining green and brown senesced leaves. Evaluation in a Mediterranean Grassland, Remote Sensing of Environment, 257, 112352, https://doi.org/10.1016/j.rse.2021.112352, 2021.
Perrone, M., Conti, L., Galland, T., Komárek, J., Lagner, O., Torresani, M., Rossi, C., Carmona, C. P., de Bello, F., Rocchini, D., Moudrý, V., Šímová, P., Bagella, S., and Malavasi, M.: “Flower power”: How flowering affects spectral diversity metrics and their relationship with plant diversity, Ecological Informatics, 81, 102589, https://doi.org/10.1016/j.ecoinf.2024.102589, 2024.
LAI and saturation: The authors argue that the different performance of Dis_UN for LAI is due to saturation; however, in the training datasets, maximum LAI barely reaches 6. I think this hypothesis should be more robustly explored, which may have been done but not presented to the reader. I am not sure saturation can justify the negative coefficients presented in Table S6. Usually, LAI and canopy-scale vegetation variables are easier to retrieve from remote sensing than leaf-level measurements, which should raise some flags.
The authors could start by checking how well the DL model performs in predicting LAI compared to the other variables; i.e., the training and test statistics of the model could be presented in the supplementary material. I would expect LAI to be more predictable than foliar traits.
If there is saturation, where does it happen? The authors could plot, for example, NDVI (or other index, e.g., NIRv) vs. LAI from their training datasets and explore above which LAI level their dataset saturates. Then, could they check whether the problems for Dis_UN to predict uncertainty occur above the threshold?
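A minimal sketch of such a saturation check; the spectra and LAI values below are toy data constructed so that the NIR signal saturates with LAI, standing in for the study's actual training datasets:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
wl = np.arange(400.0, 1001.0, 10.0)
lai = rng.uniform(0.0, 8.0, 400)
spectra = np.full((lai.size, wl.size), 0.05)
# toy NIR response that saturates with LAI (Beer-Lambert-like)
spectra[:, wl >= 750] += 0.45 * (1.0 - np.exp(-0.8 * lai))[:, None]

def band(spectra, wl, center, width=10.0):
    """Mean reflectance within a window around a band centre."""
    sel = np.abs(wl - center) <= width / 2
    return spectra[:, sel].mean(axis=1)

red, nir = band(spectra, wl, 670.0), band(spectra, wl, 800.0)
ndvi = (nir - red) / (nir + red)

plt.scatter(lai, ndvi, s=4, alpha=0.4)
plt.xlabel("LAI"); plt.ylabel("NDVI")
plt.title("NDVI flattens above the saturation threshold")
plt.show()
```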
In the case of non-vegetated surfaces, the DL model might have learnt to predict low LAI for clouds, buildings, or soils even without having seen them. Under this hypothesis, it would also be worth checking whether the estimation uncertainty is low because, indeed, the LAI prediction is accurate. Therefore, the authors may also want to explore the maps of predicted variables to confirm whether the predicted values are reasonable for any of the variables (i.e., LAI being close to 0) for the OOD pixels.
Uncertainty modeling: The variable-dependence of Dis_UN's capability to predict the uncertainty might be alleviated by computing the dissimilarity in different spectral regions (e.g., visible, red-edge, NIR, and SWIR). While I do not ask the authors to apply this approach, they might want to consider it in the Outlook section (4.5).
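A minimal sketch of the per-region idea, using a nearest-neighbour Euclidean distance as a generic stand-in for whatever dissimilarity index Dis_UN employs; the region boundaries are indicative only:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

REGIONS = {"VIS": (400, 680), "Red-Edge": (680, 750),
           "NIR": (750, 1300), "SWIR": (1300, 2500)}  # indicative bounds (nm)

def region_dissimilarity(wl, X_train, X_test):
    """Distance of each test spectrum to its nearest training spectrum,
    computed separately within each spectral region."""
    out = {}
    for name, (lo, hi) in REGIONS.items():
        sel = (wl >= lo) & (wl < hi)
        nn = NearestNeighbors(n_neighbors=1).fit(X_train[:, sel])
        dist, _ = nn.kneighbors(X_test[:, sel])
        out[name] = dist.ravel()
    return out

rng = np.random.default_rng(5)
wl = np.arange(400.0, 2501.0, 10.0)
d = region_dissimilarity(wl, rng.uniform(0, 0.6, (200, wl.size)),
                         rng.uniform(0, 0.6, (20, wl.size)))
```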
TECHNICAL CORRECTIONS
Supplementary materials’ citations: I am unsure whether the journal requires them to be presented in the order of appearance in the main text, but doing so might make more sense and facilitate the reader’s search.
Line 245: For readers less familiar with the method, define what a “dropout rate of 0.5” means and maybe justify why this rate is chosen.
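For reference, a minimal sketch of what a dropout rate of 0.5 means in the usual inverted-dropout formulation (a generic illustration, not the authors' implementation): each unit is zeroed with probability 0.5 during training, and the survivors are rescaled so the expected activation is unchanged; keeping dropout active at inference over repeated forward passes is what yields the MCdrop_UN spread.

```python
import numpy as np

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale
    survivors by 1/(1 - rate) to preserve the expected activation."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

print(dropout(np.ones(10)))   # ~half the units zeroed, the rest doubled
```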
Line 296: Maybe better: “For cloud delineation we used…”
Table S2, and overall, figures and tables: Provide the units of the variables presented.
Table S4: Indicate that the ratio is expressed in percentage.
Supplementary material: Enhance the presentation and ensure that units and symbols are properly introduced.
Terminology: Review and homogenize terminology in the main text, tables, and figures. For example, in the paper and figures, the terms “Ens_UN” and “ens_UN” can be found.
Citation: https://doi.org/10.5194/egusphere-2025-1284-RC1
AC1: 'Reply on RC1', Eya Cherif, 27 Aug 2025
Dear Reviewer, dear Editor,
Thank you for your time reviewing our manuscript and for your constructive and helpful comments. In response to the comments and suggestions provided, the main changes to the manuscript will include:
- Methodology presentation: We will improve the methodological section by adding equations describing the dissimilarity indices and by including a clearer and more comprehensive workflow diagram.
- Spectral data description and preprocessing: We will clarify the description of datasets and provide an explanation of the spectral resampling and smoothing strategy, including its rationale.
- Uncertainty modeling and fairness of comparison: We will clarify in the discussion section our rationale for comparing different uncertainty estimation methods as they are typically applied in literature, while also providing recalibrated results in the appendix for additional context.
- Vegetation type interpretation: We will revise the discussion of grasslands to avoid ambiguity around the term “simple,” now framing them in terms of structural homogeneity, and will also include additional references.
- Technical corrections: We will address all technical comments, including supplementary material citations, terminology consistency, unit reporting in figures and tables, and clarification of dropout rate usage.
In addition to addressing the reviewer’s comments, we have enriched the comparison to state-of-the-art uncertainty estimation methods. In the literature, both probabilistic and deterministic deep ensemble approaches are used; we now explicitly include both methods.
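For illustration, a minimal numpy sketch of the two ensemble flavours (synthetic placeholder arrays, not the study's actual implementation): a deterministic ensemble takes the spread of the member means, while a probabilistic ensemble combines the mean predicted variance with the variance of the means.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = rng.normal(2.0, 0.1, (5, 1000))      # member mean predictions
var = rng.uniform(0.01, 0.05, (5, 1000))  # member-predicted variances

det_un = mu.std(axis=0)                               # deterministic ensemble
prob_un = np.sqrt(var.mean(axis=0) + mu.var(axis=0))  # probabilistic ensemble
```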
A detailed, point-by-point response, including the proposed changes to the manuscript, is attached.
Kind regards,
Eya Cherif (on behalf of all co-authors).
RC2: 'Comment on egusphere-2025-1284', Anonymous Referee #2, 26 Jul 2025
This study introduces a distance-based uncertainty quantification method (Dis_UN) that improves the reliability of plant trait retrievals from hyperspectral data, particularly in out-of-domain (OOD) scenarios. This represents an advancement for robust vegetation monitoring and ecological applications. However, a few issues still need to be considered, as follows.
- The Novelty and Comparisons with Existing Work: The manuscript highlights the limitations of traditional uncertainty quantification methods (Ens_UN and MCdrop_UN) in OOD scenarios, and positions Dis_UN as a solution to these challenges. To further strengthen the claim of novelty, it is suggested to provide a more detailed discussion on how Dis_UN specifically differentiates itself from other distance-based uncertainty methods mentioned (e.g., Silvan-Cardenas et al., 2008; Khatami et al., 2017; Feilhauer et al., 2021). A brief table summarizing key differences could be considered.
- Clarification of Ecological Applications: While the manuscript lists several ecological applications (e.g., biodiversity monitoring, Earth system modeling, vegetation health assessment), it is suggested to clarify which specific aspects of these applications benefit most from reliable uncertainty quantification. For instance, this could involve identifying areas where model predictions are less trustworthy, or guiding more targeted field campaigns.
- Mathematical Formulation for Dissimilarity Index: To enhance the self-contained clarity of the methodology, please include the mathematical formulation for the dissimilarity index (DI) directly within Section 2.1.2.
- Sensitivity Analysis for 95-Quantile Regression: The choice of 95th-quantile regression is justified as estimating worst-case errors. However, please include a brief discussion or a supplementary analysis on the sensitivity of Dis_UN's performance to this specific quantile choice (e.g., how results might differ with the 90th or 99th percentile); a sketch of such a sweep follows this list. This would strengthen the methodological rigor.
- Dataset Diversity and Bias Handling: The manuscript describes a "heterogeneous training set" compiled from 50 datasets across various ecosystems, but also acknowledges inherent biases and a lack of fully global representation. Please expand on the specific strategies employed during dataset curation to minimize known sampling biases across regions, species, and biomes. Even if complete mitigation was not possible, detailing the efforts made would be beneficial. Please also explicitly refer to Tables S2 and S3 in the main text when discussing data sources and their representativeness.
- Strategies for Spectral Saturation Challenges: The manuscript identifies spectral saturation as a remaining challenge for certain traits. In the discussion or future work section, please propose specific experimental avenues or mitigation strategies to address this limitation for affected traits. For example, exploring alternative spectral indices, radiative transfer models, or machine learning architectures less susceptible to saturation.
- Additional Potential Limitations: Consider briefly discussing the computational cost associated with the training phase of the Dis_UN model itself. While inference is computationally efficient, the training cost could be a factor for extremely large datasets.
- Abstract Terminology Simplification: While terms are defined later in the manuscript, consider simplifying technical jargon in the abstract, such as "predictor and embedding space", for broader accessibility. Phrases like "data characteristics and model features" might be more immediately understandable.
- Figure Caption Enhancement: Figure captions (e.g., Figures 3, 4, 5, 6, and 7) could be more descriptive. For instance, for Figure 3, explicitly state what the X and Y axes represent (e.g., "Predicted Uncertainty vs. Observed Residuals"). For Figures 4, 5, 6, and 7, please ensure the captions clearly explain the relationship between the spatial maps, the box/violin plots, and the JM/KS values to guide the reader through the interpretation of uncertainty comparisons.
- Early Definition of OOD Data: Although the concept of OOD data is contextualized well (Lines 29-30, 48-53), for immediate comprehension, please provide a concise and explicit definition of "OOD data" with concrete examples (e.g., "unseen geographical regions, species, biomes, different sensors, or scene components like clouds and water bodies") at its first introduction.
- Writing Flow and Conciseness: The manuscript is generally well-organized and readable. However, some sentences could be rephrased for improved conciseness or flow. For example, the sentence describing the two phases of the method (Lines 129-130) could be streamlined for better readability.
- Detailed Preprocessing in Supplementary Material: While some preprocessing steps are mentioned (Lines 145-147), for full reproducibility, please include a dedicated supplementary section with more detailed preprocessing steps, including specific interpolation methods, filtering parameters, and precise band exclusions for each dataset.
- Data Availability Clarification: The GitHub link for code availability (Line 1043) is excellent. Please explicitly state whether the full processed datasets used for training and testing are also available or how they can be accessed (e.g., if too large for GitHub, mention a data repository).
- Applications in Other Remote Sensing Domains: Expand the discussion on potential applications of Dis_UN beyond vegetation monitoring. For instance, discuss how it could be applied to uncertainty quantification in other remote sensing domains, such as land cover classification, deforestation detection, or urban mapping, where OOD conditions (e.g., new urban structures, novel land cover types) are common.
- Integration with Other Data Modalities: In the future work section, suggest avenues for integrating Dis_UN with other data modalities beyond hyperspectral (e.g., LiDAR or SAR data) for a more comprehensive uncertainty assessment, particularly for structural traits. This would further enhance the model's generalizability and impact.
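As a sketch of the sensitivity sweep suggested in the quantile-regression comment above — synthetic data and a generic quantile regressor standing in for Dis_UN:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, (2000, 3))                    # stand-in features
abs_err = np.abs(rng.normal(0.0, 0.1 + 0.3 * X[:, 0]))  # synthetic residuals
X_tr, X_te = X[:1500], X[1500:]
e_tr, e_te = abs_err[:1500], abs_err[1500:]

for tau in (0.75, 0.90, 0.95, 0.99):
    qr = GradientBoostingRegressor(loss="quantile", alpha=tau).fit(X_tr, e_tr)
    coverage = np.mean(e_te <= qr.predict(X_te))  # fraction under the bound
    print(f"tau={tau:.2f}  empirical coverage={coverage:.3f}")
```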
Citation: https://doi.org/10.5194/egusphere-2025-1284-RC2
AC2: 'Reply on RC2', Eya Cherif, 27 Aug 2025
Dear Reviewer, dear Editor,
Thank you for your time reviewing our manuscript and for your constructive and helpful comments. In response to the comments and suggestions provided, the main changes to the manuscript will include:
- Methodology presentation: We will improve the methodological section by adding equations describing the dissimilarity indices.
- Spectral data description and preprocessing: We will add a dedicated supplementary section describing preprocessing in full detail, including band exclusions, interpolation methods, and smoothing parameters.
- Sensitivity analysis: We conducted and will include a supplementary sensitivity analysis of Dis_UN across different quantile levels (τ = 0.75–0.99), to justify our choice of the 95th quantile.
- Novelty and Future Directions: We extended the introduction and discussion to better distinguish Dis_UN from prior methods and to position it for future research in uncertainty quantification.
In addition to addressing the reviewer’s comments, we have enriched the comparison to state-of-the-art uncertainty estimation methods. In the literature, both probabilistic and deterministic deep ensemble approaches are used; we now explicitly include both methods.
A detailed, point-by-point response, including the proposed changes to the manuscript, is attached.
Kind regards,
Eya Cherif (on behalf of all co-authors).