Assessing forest properties with data-driven vegetation indices: insights from 900,000 forest stands
Abstract. Vegetation indices (VIs) are widely used to assess forest properties, but deriving VIs for attributes that are not mechanistically linked to a forest's solar reflectance is challenging. Data-driven VIs could help here: they yield information based on correlations identified in large datasets of forest and reflectance data. However, data-driven VIs are prone to bias and overfitting if data is limited and if the functional form and wavelengths used for the VIs are not sensibly constrained. In this study, we facilitate the development of data-driven VIs by systematically analyzing VIs with two wavelengths (400 nm–2400 nm) and evaluating their correlations to biomass, leaf area index (LAI), gross primary production (GPP), and net primary production (NPP), subject to different sources of environmental and physiological uncertainty. Considering 900,000 forest stands simulated via a forest and radiative transfer modelling approach, we introduced a new class of VIs and found that data-driven VIs can provide highly accurate estimates. In particular, VIs combining near and shortwave infrared light yielded promising results, with biomass, LAI, and GPP often being well estimable from the same wavelength combinations; visible light gained importance in less dense and structurally heterogeneous forests. Neither the functional form of the VIs nor the considered uncertainty factors primarily reduced the achievable accuracy; instead, they constrained the range of wavelengths from which good indices could be constructed. This suggests that data-driven VIs can yield valuable results if the wavelength choice is optimized, which opens new pathways for utilizing recent hyperspectral satellite missions such as EnMAP.
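To make the idea of systematically scanning two-wavelength VIs concrete, the following minimal sketch evaluates every band pair of a toy dataset and ranks the pairs by the squared Pearson correlation between the index and a stand attribute. Everything specific in it is an assumption made for illustration: the normalized-difference form (the new class of VIs introduced in the study is not specified here), the five band centres, the synthetic reflectance model standing in for the forest and radiative transfer simulations, and the helper names `pearson_r`, `normalized_difference`, and `scan_band_pairs`.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def normalized_difference(r1, r2):
    """Assumed functional form: (r1 - r2) / (r1 + r2)."""
    return (r1 - r2) / (r1 + r2)


def scan_band_pairs(wavelengths, spectra, target):
    """Return {(wl_i, wl_j): R^2} for every unordered pair of bands."""
    scores = {}
    for i in range(len(wavelengths)):
        for j in range(i + 1, len(wavelengths)):
            vi = [normalized_difference(s[i], s[j]) for s in spectra]
            scores[(wavelengths[i], wavelengths[j])] = pearson_r(vi, target) ** 2
    return scores


# Hypothetical band centres in nm (visible, NIR, SWIR); not taken from the study.
wavelengths = [450, 650, 850, 1650, 2200]

# Synthetic stands: only the 850 nm and 1650 nm bands respond to LAI here.
random.seed(0)
lai, spectra = [], []
for _ in range(200):
    value = random.uniform(0.5, 6.0)

    def noise():
        return random.uniform(-0.01, 0.01)

    spectra.append([
        0.05 + noise(),                 # 450 nm: noise only
        0.05 + noise(),                 # 650 nm: noise only
        0.20 + 0.05 * value + noise(),  # 850 nm: rises with LAI
        0.30 - 0.03 * value + noise(),  # 1650 nm: falls with LAI
        0.05 + noise(),                 # 2200 nm: noise only
    ])
    lai.append(value)

scores = scan_band_pairs(wavelengths, spectra, lai)
best_pair, best_r2 = max(scores.items(), key=lambda item: item[1])
print(best_pair, round(best_r2, 3))
```

In this toy setup the scan singles out the 850 nm / 1650 nm pair because it is the only combination in which both bands carry an LAI signal. With real or simulated hyperspectral data, the same loop would run over many more bands, and a held-out set of stands would be needed to guard against the overfitting noted above.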