the Creative Commons Attribution 4.0 License.
Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands
Abstract. Vegetation indices (VIs) are widely used to assess forest properties, but deriving VIs for attributes not mechanistically linked to forests’ solar reflectance is challenging. Here, data-driven VIs could help, which yield information based on correlations identified in large datasets of forest and reflectance data. However, data-driven VIs are prone to bias and overfitting if data is limited and the functional form and wavelengths used for the VIs are not sensibly constrained. In this study, we facilitate the development of data-driven VIs by systematically analyzing VIs with two wavelengths (400 nm–2400 nm) and evaluating their correlations to biomass, leaf area index (LAI), gross primary production (GPP), and net primary production (NPP) subject to different sources of environmental and physiological uncertainty. Considering 900,000 forest stands simulated via a forest and radiative transfer modelling approach, we introduced a new class of VIs and found that data-driven VIs can provide highly accurate estimates. Particularly VIs combining near and shortwave infrared light yielded promising results, with biomass, LAI, and GPP often being well estimable from the same wavelength combinations; visible light gained importance in less dense and structurally heterogeneous forests. Both the functional form of the VIs and the considered uncertainty factors did not primarily reduce the achievable accuracy, but instead constrained the range of wavelengths from which good indices could be constructed. This suggests that data-driven vegetation indices can yield valuable results if the wavelength choice is optimized. This opens new pathways for utilizing recent hyperspectral satellite missions such as EnMAP.
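As a rough illustration of the two-wavelength search described in the abstract, the sketch below scores every wavelength pair by the squared Pearson correlation between a normalized-difference index and a forest property. The normalized-difference form, the five wavelengths, and the synthetic reflectance values are assumptions made purely for illustration; they are not the functional forms, spectral sampling, or simulated data used in the study.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def normalized_difference(r1, r2):
    """NDVI-style index (r1 - r2) / (r1 + r2); an assumed functional form."""
    return (r1 - r2) / (r1 + r2)


def best_wavelength_pair(spectra, wavelengths, target):
    """Score every wavelength pair by r^2 between the index and the target.

    spectra: one reflectance list per stand, aligned with `wavelengths`.
    Returns (r_squared, wavelength_1, wavelength_2) for the best pair.
    """
    best = (-1.0, None, None)
    for i, j in combinations(range(len(wavelengths)), 2):
        index = [normalized_difference(s[i], s[j]) for s in spectra]
        r2 = pearson_r(index, target) ** 2
        if r2 > best[0]:
            best = (r2, wavelengths[i], wavelengths[j])
    return best


def toy_spectrum(lai_value, rng):
    """Hypothetical reflectances; only 850 nm and 1650 nm respond to LAI."""
    def noise():
        return rng.gauss(0.0, 0.005)

    return [
        0.05 + noise(),                     # 450 nm: no LAI signal
        0.06 + noise(),                     # 650 nm: no LAI signal
        0.30 + 0.03 * lai_value + noise(),  # 850 nm: rises with LAI
        0.25 - 0.02 * lai_value + noise(),  # 1650 nm: falls with LAI
        0.10 + noise(),                     # 2200 nm: no LAI signal
    ]


rng = random.Random(0)
wavelengths = [450, 650, 850, 1650, 2200]  # nm, illustrative only
lai = [rng.uniform(0.5, 6.0) for _ in range(200)]
spectra = [toy_spectrum(value, rng) for value in lai]

r2, w1, w2 = best_wavelength_pair(spectra, wavelengths, lai)
print(f"best pair: {w1} nm / {w2} nm, r^2 = {r2:.3f}")
```

The exhaustive loop grows quadratically with the number of wavelengths, so a real hyperspectral grid would likely call for a vectorized implementation; the plain-Python version is kept here only for readability.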
Status: open (until 02 Jan 2026)
- RC1: 'Comment on egusphere-2025-5198', Colin Bloom, 01 Dec 2025
- RC2: 'Comment on egusphere-2025-5198', Anonymous Referee #2, 16 Dec 2025
Peer review report on “Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands”
Comments to Authors:
Overview and Major Comments:
This manuscript presents a comprehensive and methodologically innovative study that systematically explores the potential of data-driven vegetation indices (VIs) to estimate key forest properties (biomass, LAI, GPP, and NPP). By coupling an individual-based forest dynamics model (FORMIND) with a multilayer radiative transfer model (mSCOPE) and applying a Monte Carlo sampling strategy, the authors generate an exceptionally large and well-structured synthetic dataset. The analysis of all possible two-wavelength combinations across the 400–2400 nm range, combined with an explicit treatment of multiple uncertainty sources, represents a substantial advance over existing studies.
Overall, the manuscript is scientifically sound, clearly written, and highly relevant to the remote sensing and forest ecology communities, particularly in the context of emerging hyperspectral satellite missions such as EnMAP. The study offers valuable conceptual insights into wavelength selection, index design, and uncertainty robustness. I consider the manuscript suitable for publication after some revisions, mainly aimed at clarifying applicability and ensuring reproducibility.
Major comments
- The introduction could be strengthened by improving accessibility for a broad audience. A greater emphasis on the ecological motivation would be beneficial. The authors could clarify why forest parameters such as forest biomass, LAI, GPP, and NPP are critical variables and why their large-scale estimation remains challenging.
- While the comprehensive coupling of the forest model and the radiative transfer model is impressive, the manuscript would be more convincing if the authors could demonstrate that FORMIND reasonably represents real forest conditions. For example, how do the observation-based estimates of forest properties at sites such as “Hohes Holz” (i.e., biomass, LAI, GPP, NPP) compare with the simulated ranges generated by FORMIND? Are all observed values captured within the model’s simulated distributions?
Minor comments:
Line 5: “two wavelengths (400 nm-2400 nm)” → “two wavelengths (within 400 nm-2400 nm)”, to avoid confusion.
Line 116: Please specify the meaning of ODM and the correct units for the different variables.
Lines 117-118: Consider briefly explaining the interpretation of DBH entropy values—for example, does a more negative value indicate lower heterogeneity?
For Table 1: It would be helpful to add references or explanations regarding the rationale for selecting the wavelengths used in classical indices.
Lines 179-180: What are the criteria for selecting the “separating thresholds” for biomass and DBH entropy?
Lines 341-342: The study would be further strengthened if the developed hybrid model could be validated against sites with hyperspectral observations and observation-based estimates of forest properties in future work.
Data sets
Forest characteristics and reflectance spectra of simulated temperate forests under different uncertainty regimes Samuel M. Fischer, Rico Fischer, Andreas Huth https://zenodo.org/doi/10.5281/zenodo.16748241
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 269 | 113 | 23 | 405 | 38 | 18 | 20 |
Review: Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands
This manuscript examines the effectiveness of existing vegetation indices and novel hyperspectral indices for estimating forest properties: biomass, leaf area index (LAI), gross primary production (GPP) and net primary production (NPP). The methods combine individual-based forest and radiative transfer models to simulate many forest stands, their estimated biomass, LAI, GPP, NPP and 250 canopy wavelengths. Results suggest that unique wavelength pairs often offer stronger estimates of NPP, GPP and biomass than common vegetation indices. A data driven approach also offers strong correlations.
In general, I found the applied methodology interesting and compelling. The manuscript offers several valuable insights, for example, Table 2 could serve as a nice reference when considering existing vegetation indices for estimating forest properties. This said, in my opinion, the manuscript also offers significant room for improvement, particularly in its focus and clarity. I apologize in advance for my many comments below, but I hope they will aid the authors in their revisions. Overarching comments are followed by line-by-line comments.
Introduction: A systematic and data-driven identification of important reflectance wavelengths for forest properties is absolutely warranted. Few conventional vegetation indices have been systematically developed to target important forest parameters like NPP or GPP. This is a compelling argument for conducting this analysis. Hyperspectral data and data-driven indices derived from synthetic samples are only one way to investigate the problem. Much of the introduction, however, is taken up by a methodological background (Lines 33 to 74), which attempts to set the stage for methodological novelty. I find this argument less compelling. As mentioned, Henniger et al. (2023) have already applied the novel combined methodology used in the analysis. Tweaking the approach is just an application of the methods. In my opinion, the methodological background could be substantially condensed and placed at the start of the methods. In its place, a clearer introduction to the overarching problem and research question would greatly improve the clarity of the work. Some relevant questions which I think need to be addressed therein: What are NPP, GPP, and biomass and why are they important? Why do we need to monitor these parameters? How have vegetation indices traditionally been developed? Have vegetation indices been produced for NPP or GPP specifically? What platforms exist (or will exist soon) to monitor changes? I believe more complete answers to these questions will make it abundantly clear why a new systematic approach to vegetation indices is needed.
Methods and Results: Understanding the manuscript in its entirety took several reads and more effort than I think is necessary. While I appreciate and understand the sensitivity testing of the radiative transfer model, this is, in my opinion, secondary to the overall intent of the main manuscript. Moving this portion of the analysis into the Supplement would make it much easier to read the manuscript through and to understand the methods and results.
Methods and Discussion: A lot of effort was made to evaluate different forest structures and sensitivity test the influence of different model parameters. However, it seems like the models are all based on the environmental conditions at one site in Germany (e.g., line 114). Does this also warrant a sensitivity test? At a minimum, it requires justification. Is this site representative of a broad range of sites? Is the model going to be overfit to the conditions at Hohes Holz? If not, why not?
Methods and Discussion: It seems that all wavelengths are an average of the plot, but I cannot find this mentioned. This needs to be discussed. Higher and higher spatial resolution multi- and hyperspectral data are commonplace. Are the model results still relevant on different scales? EnMAP, for example, has 30 m resolution. What is the influence of variability across the plot?
Methods and Discussion: Could a simple correlation analysis between parameters and single wavelengths provide some additional insight into appropriate wavelengths before looking at pairs?
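A minimal sketch of what such a single-wavelength screening could look like is given here: each band's reflectance is correlated with one forest property and the bands are ranked by the absolute Pearson correlation. The wavelengths, the biomass range, and the synthetic reflectance model are hypothetical stand-ins, not values from the manuscript or its dataset.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_single_wavelengths(spectra, wavelengths, target):
    """Return (wavelength, r) pairs sorted by |r|, strongest first."""
    scores = []
    for k, wavelength in enumerate(wavelengths):
        band = [stand[k] for stand in spectra]
        scores.append((wavelength, pearson_r(band, target)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: 200 stands, three bands, biomass in arbitrary units.
rng = random.Random(1)
wavelengths = [550, 850, 1650]  # nm, illustrative only
biomass = [rng.uniform(50.0, 400.0) for _ in range(200)]
spectra = [
    [
        0.08 + rng.gauss(0.0, 0.01),               # 550 nm: no signal
        0.30 + 0.0005 * b + rng.gauss(0.0, 0.01),  # 850 nm: rises with biomass
        0.25 - 0.0002 * b + rng.gauss(0.0, 0.01),  # 1650 nm: falls with biomass
    ]
    for b in biomass
]

ranked = rank_single_wavelengths(spectra, wavelengths, biomass)
for wavelength, r in ranked:
    print(f"{wavelength} nm: r = {r:+.2f}")
```

A ranking of this kind only captures linear, single-band relationships, so it can suggest candidate wavelengths but cannot replace the evaluation of pairs.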
Discussion: I believe this analysis would benefit immensely from (even a very small) application to real-world data. I realize that it is an additional effort which the authors may consider beyond the scope of the analysis, but it is relevant to the level of trust that can be placed in the primarily theoretical findings. Could several derived indices simply be applied to data from Hohes Holz to estimate biomass and compare with survey data? At a bare minimum, some concrete recommendations for next steps are required in the discussion. What models should we try with real-world data? For example, a short table of high performing wavelength combinations which match common satellite data could be very helpful. A huge amount of detailed effort has gone into the analysis, so it seems a bit of a shame for the discussion to end on high-level findings about general regions of the light spectrum that are good for GPP and NPP.
Line-by-line Comments:
Line 5: “with two wavelengths (400 nm- 2400 nm)” suggests only two wavelengths are evaluated. I think this should read something like: “using wavelengths between 400 nm and 2400 nm”
Line 9: “provide highly accurate estimates” of what?
Lines 9-11: This sentence is generally unclear, and ‘estimable’ (which means ‘worthy of great respect’) was likely not the intended word choice.
Line 12: “did not primarily reduce the achievable accuracy” I am not sure what this means. The accuracy of what?
Line 21: “Here” implies that remote sensing is part of the analysis.
Line 24: Just because hyperspectral data captures more data does not make it inherently more useful than traditional multi-spectral data. See my comment on the direction of the introduction above.
Line 55: 2500 nm does not match the rest of the text.
Line 133: This seems somewhat circular but also makes a good case for applying new vegetation indices to the actual site to see how it performs.
Line 134 to 149: Moving this (and associated results) to the Supplement would help smooth out the main text.
Line 151 to 152: How much was removed by this filtering?
Line 161: sp. value
Table 1: Different sensors capture different wavelengths, why were these chosen for classical indices? I think it deserves an explanation in the text.
Line 182: “in the absence of noise” Is this the uncertainty analysis? It is not clear.
Line 195: Can this include a percent difference?
Line 202: “estimable” again.
Line 280: Yes, but it was not the original intent of most VIs to determine NPP or GPP. This should be clarified here and in the introduction.
Thank you for the opportunity to evaluate this work and I wish the authors the best of luck in their revisions.
Colin Bloom