Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands
Abstract. Vegetation indices (VIs) are widely used to assess forest properties, but deriving VIs for attributes not mechanistically linked to forests’ solar reflectance is challenging. Here, data-driven VIs could help, which yield information based on correlations identified in large datasets of forest and reflectance data. However, data-driven VIs are prone to bias and overfitting if data is limited and the functional form and wavelengths used for the VIs are not sensibly constrained. In this study, we facilitate the development of data-driven VIs by systematically analyzing VIs with two wavelengths (400 nm–2400 nm) and evaluating their correlations to biomass, leaf area index (LAI), gross primary production (GPP), and net primary production (NPP) subject to different sources of environmental and physiological uncertainty. Considering 900,000 forest stands simulated via a forest and radiative transfer modelling approach, we introduced a new class of VIs and found that data-driven VIs can provide highly accurate estimates. Particularly VIs combining near and shortwave infrared light yielded promising results, with biomass, LAI, and GPP often being well estimable from the same wavelength combinations; visible light gained importance in less dense and structurally heterogeneous forests. Both the functional form of the VIs and the considered uncertainty factors did not primarily reduce the achievable accuracy, but instead constrained the range of wavelengths from which good indices could be constructed. This suggests that data-driven vegetation indices can yield valuable results if the wavelength choice is optimized. This opens new pathways for utilizing recent hyperspectral satellite missions such as EnMAP.
Review: Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands
This manuscript examines the effectiveness of existing vegetation indices and novel hyperspectral indices for estimating four forest properties: biomass, leaf area index (LAI), gross primary production (GPP), and net primary production (NPP). The methods combine an individual-based forest model with a radiative transfer model to simulate many forest stands, along with their biomass, LAI, GPP, NPP, and canopy reflectance at 250 wavelengths. Results suggest that specific wavelength pairs often yield stronger estimates of NPP, GPP, and biomass than common vegetation indices. A data-driven approach also yields strong correlations.
In general, I found the applied methodology interesting and compelling. The manuscript offers several valuable insights; for example, Table 2 could serve as a useful reference when considering existing vegetation indices for estimating forest properties. That said, in my opinion the manuscript also leaves significant room for improvement, particularly in its focus and clarity. I apologize in advance for my many comments below, but I hope they will aid the authors in their revisions. Overarching comments are followed by line-by-line comments.
Introduction: A systematic and data-driven identification of important reflectance wavelengths for forest properties is absolutely warranted. Few conventional vegetation indices have been systematically developed to target important forest parameters like NPP or GPP. This is a compelling argument for conducting this analysis. Hyperspectral data and data-driven indices derived from synthetic samples are only one way to investigate the problem. Much of the introduction, however, is taken up by a methodological background (Lines 33 to 74), which attempts to set the stage for methodological novelty. I find this argument less compelling. As mentioned, Henniger et al. (2023) have already applied the combined methodology used in the analysis, so tweaking the approach is an application of existing methods rather than a novelty in itself. In my opinion, the methodological background could be substantially condensed and placed at the start of the methods. In its place, a clearer introduction to the overarching problem and research question would greatly improve the clarity of the work. Some relevant questions which I think need to be addressed therein: What are NPP, GPP, and biomass, and why are they important? Why do we need to monitor these parameters? How have vegetation indices traditionally been developed? Have vegetation indices been produced for NPP or GPP specifically? What platforms exist (or will exist soon) to monitor changes? I believe more complete answers to these questions will make it abundantly clear why a new systematic approach to vegetation indices is needed.
Methods and Results: Understanding the manuscript in its entirety took several reads and more effort than I think is necessary. While I appreciate and understand the sensitivity testing of the radiative transfer model, this is, in my opinion, secondary to the overall intent of the main manuscript. Moving this portion of the analysis into the Supplement would make it much easier to read the manuscript through and to understand the methods and results.
Methods and Discussion: A lot of effort was made to evaluate different forest structures and to test the sensitivity of the results to different model parameters. However, it seems that the models are all based on the environmental conditions at one site in Germany (e.g., line 114). Does this also warrant a sensitivity test? At a minimum, it requires justification. Is this site representative of a broad range of sites? Is the model going to be overfit to the conditions at Hohes Holz? If not, why not?
Methods and Discussion: It seems that the reflectance at every wavelength is averaged over the plot, but I cannot find this stated anywhere. This needs to be discussed. Multi- and hyperspectral data at ever higher spatial resolution are becoming commonplace. Are the model results still relevant at different scales? EnMAP, for example, has a 30 m resolution. What is the influence of variability across the plot?
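To illustrate why the averaging scale may matter, here is a minimal sketch using made-up reflectance values for a hypothetical heterogeneous plot (not taken from the manuscript). Because a normalized-difference index is a nonlinear function of reflectance, the index of the plot-mean reflectance generally differs from the mean of the sub-plot indices. The function name and all numbers are assumptions for illustration only.

```python
import numpy as np

def normalized_difference(r1, r2):
    """Generic two-band normalized-difference index: (r1 - r2) / (r1 + r2)."""
    return (r1 - r2) / (r1 + r2)

# Hypothetical plot split into 9 sub-cells: 5 with dense canopy, 4 with gaps.
# Reflectance values are illustrative placeholders, not simulated or measured.
nir = np.array([0.45, 0.45, 0.45, 0.45, 0.45, 0.20, 0.20, 0.20, 0.20])
red = np.array([0.04, 0.04, 0.04, 0.04, 0.04, 0.15, 0.15, 0.15, 0.15])

# Index computed from plot-averaged reflectance (what a coarse pixel would see).
index_of_mean = normalized_difference(nir.mean(), red.mean())

# Mean of indices computed per sub-cell (what finer-resolution data would give).
mean_of_index = normalized_difference(nir, red).mean()

print(f"index of plot-mean reflectance: {index_of_mean:.3f}")
print(f"mean of sub-cell indices:       {mean_of_index:.3f}")
```

In a homogeneous plot the two quantities coincide, so the gap between them is one simple way to quantify the influence of within-plot variability.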
Methods and Discussion: Could a simple correlation analysis between parameters and single wavelengths provide some additional insight into appropriate wavelengths before looking at pairs?
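A rough sketch of such a screening step is given below. It computes the Pearson correlation between one forest attribute and the reflectance at each single wavelength. The data are synthetic stand-ins, and the array shapes, the 250-wavelength grid, and the function name are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def single_band_correlations(reflectance, attribute):
    """Pearson r between each wavelength column and one forest attribute.

    reflectance: array of shape (n_stands, n_wavelengths)
    attribute:   array of shape (n_stands,)
    """
    x = reflectance - reflectance.mean(axis=0)
    y = attribute - attribute.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return (x * y[:, None]).sum(axis=0) / denom

# Synthetic stand-in data: 500 stands, 250 wavelengths between 400 and 2400 nm.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 2400, 250)
reflectance = rng.uniform(0.01, 0.6, size=(500, wavelengths.size))

# Hypothetical attribute driven by a single band (column 100) plus noise.
lai = 5.0 * reflectance[:, 100] + rng.normal(0.0, 0.1, size=500)

r = single_band_correlations(reflectance, lai)
best = int(np.argmax(np.abs(r)))
print(f"strongest single wavelength: {wavelengths[best]:.0f} nm (r = {r[best]:.2f})")
```

Plotting `r` against `wavelengths` for each attribute would show which spectral regions carry signal on their own, which could then help interpret the pairwise results.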
Discussion: I believe this analysis would benefit immensely from (even a very small) application to real-world data. I realize that this is an additional effort which the authors may consider beyond the scope of the analysis, but it is relevant to the level of trust that can be placed in the primarily theoretical findings. Could several derived indices simply be applied to data from Hohes Holz to estimate biomass and compare with survey data? At a bare minimum, some concrete recommendations for next steps are required in the discussion. What models should we try with real-world data? For example, a short table of high-performing wavelength combinations which match the bands of common satellite sensors could be very helpful. A huge amount of detailed effort has gone into the analysis, so it seems a shame for the discussion to end on high-level findings about general regions of the light spectrum that are good for GPP and NPP.
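The suggested table could be produced by a simple filter along the following lines. This is only a sketch: the band names, band centres, tolerance, wavelength pairs, and R² values are all placeholders invented for illustration, and the helper names are hypothetical. Real sensor band definitions and the authors' own results would need to be substituted.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    wl1_nm: float
    wl2_nm: float
    r2: float

def match_to_bands(pairs, band_centers_nm, tolerance_nm=15.0):
    """Keep pairs whose two wavelengths each fall within tolerance_nm of a
    distinct band centre; return (band1, band2, r2) rows sorted by r2."""
    def nearest(wl):
        name, centre = min(band_centers_nm.items(), key=lambda kv: abs(kv[1] - wl))
        return name if abs(centre - wl) <= tolerance_nm else None

    rows = []
    for p in pairs:
        b1, b2 = nearest(p.wl1_nm), nearest(p.wl2_nm)
        if b1 and b2 and b1 != b2:
            rows.append((b1, b2, p.r2))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Placeholder band centres for a hypothetical multispectral sensor.
bands = {"band_A": 665.0, "band_B": 842.0, "band_C": 1610.0}

# Placeholder wavelength pairs with invented R^2 values.
pairs = [
    Pair(840.0, 1605.0, 0.80),
    Pair(670.0, 845.0, 0.60),
    Pair(1200.0, 1610.0, 0.90),  # 1200 nm matches no band, so it is dropped
]

table = match_to_bands(pairs, bands)
for b1, b2, r2 in table:
    print(f"{b1} / {b2}: R^2 = {r2:.2f}")
```

A stricter version would compare each wavelength against the band's full spectral response rather than its centre, but a centre-plus-tolerance match is probably sufficient for a first recommendation table.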
Line-by-line Comments:
Line 5: “with two wavelengths (400 nm- 2400 nm)” suggests only two wavelengths are evaluated. I think this should read something like: “using wavelengths between 400 nm and 2400 nm”
Line 9: “provide highly accurate estimates” of what?
Lines 9-11: This sentence is generally unclear, and ‘estimable’ (which means ‘worthy of great respect’) was probably not the intended word choice.
Line 12: “did not primarily reduce the achievable accuracy” I am not sure what this means. The accuracy of what?
Line 21: “Here” implies that remote sensing is part of the analysis.
Line 24: Just because hyperspectral data captures more data does not make it inherently more useful than traditional multi-spectral data. See my comment on the direction of the introduction above.
Line 55: 2500 nm does not match the rest of the text.
Line 133: This seems somewhat circular but also makes a good case for applying the new vegetation indices to the actual site to see how they perform.
Lines 134 to 149: Moving this (and associated results) to the Supplement would help smooth out the main text.
Lines 151 to 152: How much was removed by this filtering?
Line 161: sp. value
Table 1: Different sensors capture different wavelengths, so why were these wavelengths chosen for the classical indices? I think this deserves an explanation in the text.
Line 182: “in the absence of noise” Is this the uncertainty analysis? It is not clear.
Line 195: Can this include a percent difference?
Line 202: “estimable” again.
Line 280: Yes, but it was not the original intent of most VIs to determine NPP or GPP. This should be clarified here and in the introduction.
Thank you for the opportunity to evaluate this work. I wish the authors the best of luck with their revisions.
Colin Bloom