the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Contextualizing Pan-Tropical Allometric Models for Biomass Estimation
Abstract. Allometric Models (AMs) play a central role in monitoring and mitigating climate change, as they provide accurate estimates of the biomass and carbon sequestered by trees from non-destructive, easy-to-obtain physical measurements. Unfortunately, practitioners spend considerable effort researching, qualifying and choosing AMs for specific growth conditions. To address this, Chave et al. (2014) developed a pan-tropical AM with accuracy equivalent to that of local, site-specific AMs. We improve on this result by incorporating contextual information about growth conditions into a Machine Learning (ML) model, achieving a 17 % reduction in Mean Absolute Error (MAE) as measured on hold-out data. This breakthrough should have an important impact on applications such as national forest inventories, carbon certifications and the calibration of satellite-based biomass maps to field data. Finally, we propose a principled method to estimate how much additional error one can expect when applying a given AM under shifting conditions, and provide practitioners with a data-driven safety check.
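For readers unfamiliar with the baseline, the pan-tropical model of Chave et al. (2014) is commonly quoted as AGB = 0.0673 × (ρD²H)^0.976, with aboveground biomass AGB in kg, wood density ρ in g cm⁻³, trunk diameter D in cm and tree height H in m. The sketch below simply evaluates that formula, assuming those coefficients and units; the function name `chave2014_agb` is hypothetical and this is not the preprint's ML model.

```python
def chave2014_agb(wood_density: float, dbh_cm: float, height_m: float) -> float:
    """Aboveground biomass (kg) from the pan-tropical model of Chave et al. (2014).

    Assumed units: wood_density in g/cm^3, dbh_cm in cm, height_m in m.
    """
    if wood_density <= 0 or dbh_cm <= 0 or height_m <= 0:
        raise ValueError("all measurements must be positive")
    # AGB = 0.0673 * (rho * D^2 * H)^0.976
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


# Example: a hypothetical tree with rho = 0.6 g/cm^3, D = 30 cm, H = 25 m
print(round(chave2014_agb(0.6, 30.0, 25.0), 1))
```

The single compound predictor ρD²H is what makes the baseline easy to apply in the field; any contextual ML model is competing against this one-line formula.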
Status: open (until 28 Mar 2026)
- RC1: 'Comment on egusphere-2025-6341', Anonymous Referee #1, 22 Feb 2026
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 71 | 0 | 1 | 72 | 0 | 0 |
Biomass estimation models are used to estimate tree biomass from simple physical measurements, but choosing the right model for specific conditions is difficult. This paper reanalyzes a global dataset by developing a machine learning model. The resulting model is claimed to be much improved because it reduces prediction error by 17%. Improved biomass estimation models should be based on (1) larger sample sizes and (2) a better representation of a wider range of real-life conditions. Developers should also ensure that (3) the resulting models are correctly evaluated using proper goodness-of-fit methods (Sileshi 2014), and that (4) the models achieve a better fit and show significant improvements over the previous generation of models. Finally, they should ensure that (5) the models are easy to implement for a wide range of practitioners (including private owners, forestry consultants, academics, and businesses), and that the conditions of their use are clearly stated.
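Points (3) and (4) hinge on how fit is measured. As a minimal illustration (the function names are hypothetical, and this is not presented as the set of criteria recommended by Sileshi 2014), the two error metrics discussed in this report, MAE and RMSE, and the arithmetic behind a "17% reduction in MAE" can be written as:

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return their common length."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(observed)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average of |observed - predicted|."""
    n = _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: penalises large residuals more heavily than MAE."""
    n = _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional reduction of an error metric vs. a baseline (0.17 means 17 %)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error
```

For example, `relative_reduction(100.0, 83.0)` returns 0.17: a baseline MAE of 100 kg falling to 83 kg is a 17% reduction, which by itself says nothing about whether the added model complexity is justified.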
This paper addresses only one of these five goals, namely goal (4). It does nothing to address the crucial aspects (1) and (2), and the practical implementation of the method (goal 5) is likely to become more complex rather than simpler. The availability of code, whether in Python or in R, is not reported, and the authors' competing interests in this development may explain this situation. Unfortunately, it is unlikely that this manuscript will make a valuable addition to the academic literature. Below are the major issues with this study, none of which can be fixed with a minor revision of the text.
First, the argument that including more predictors in a statistical model generally improves its fit to observed data is as old as regression theory. Because the problem at hand is simple (non-linear regression of a single response variable), it is to be expected that adding environmental variables as predictors improves the fit. The fact that model (3), which is vastly more complex than model (1), reduces the MAE by only 17% should raise the question of whether this tremendous increase in complexity is worth the effort. The manuscript gives no clear answer to this question, because it is predicated on the assumption that the 4004-tree dataset encapsulates the full universe of possibilities. This is a serious shortcoming. The main result, the 17% reduction in MAE, is reported in Table 3, which shows that in a cross-validation test none of these complex models performs markedly better than the baseline reported in the first column. Gains in RMSE and MAE are modest at best, so this method is better seen as a proof of concept than as a breakthrough for biomass regression models.
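The first point rests on a standard property of least squares: for nested models, adding predictors can never increase the training residual sum of squares, even when the added predictors are pure noise, so only hold-out or cross-validated error can show whether extra complexity pays off. The sketch below demonstrates this on synthetic data; the noise columns stand in, hypothetically, for additional contextual covariates, and nothing here reproduces the manuscript's data or model.

```python
import numpy as np


def design(X: np.ndarray) -> np.ndarray:
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients (intercept first)."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef


def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


rng = np.random.default_rng(42)
n_train, n_test, n_noise = 60, 2000, 20

# Synthetic data: one informative predictor plus pure-noise columns.
x_train, x_test = rng.normal(size=(n_train, 1)), rng.normal(size=(n_test, 1))
z_train, z_test = rng.normal(size=(n_train, n_noise)), rng.normal(size=(n_test, n_noise))
y_train = 2.0 * x_train[:, 0] + rng.normal(scale=0.5, size=n_train)
y_test = 2.0 * x_test[:, 0] + rng.normal(scale=0.5, size=n_test)

# Model (a): informative predictor only; model (b): same plus noise columns.
Xb_train, Xb_test = np.hstack([x_train, z_train]), np.hstack([x_test, z_test])
coef_a, coef_b = fit_ols(x_train, y_train), fit_ols(Xb_train, y_train)

train_rmse_a = rmse(y_train, design(x_train) @ coef_a)
train_rmse_b = rmse(y_train, design(Xb_train) @ coef_b)
test_rmse_a = rmse(y_test, design(x_test) @ coef_a)
test_rmse_b = rmse(y_test, design(Xb_test) @ coef_b)

print(f"training RMSE: simple {train_rmse_a:.3f}, with noise columns {train_rmse_b:.3f}")
print(f"hold-out RMSE: simple {test_rmse_a:.3f}, with noise columns {test_rmse_b:.3f}")
```

The training comparison is guaranteed by construction (model (b) can always reproduce model (a) by zeroing the extra coefficients); the hold-out comparison is not, which is why cross-validated results such as those in Table 3 are the relevant evidence.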
Second, the text is written for an audience of data scientists and thus misses its potential audience. For a readership trained in data science, this study is an application of established methods. It may be of interest to the data science community precisely because it is so simple, in which case the manuscript should be submitted to a statistical learning journal. The decision to submit to Biogeosciences is presumably motivated by the wish to reach out to the user community (and also to clients, given the competing interests). Users (foresters or private actors), however, will likely find this text totally opaque. Section 2.2.1 is a case in point ("context-agnostic baselines", "we adjunct a L2 regularization", "hyper-parameter optimization", "target encoding"), but the whole text is full of technicalities that serve no apparent purpose other than to impress the non-specialist reader. If the goal is to reach out to the user community, the recommendation is to take a radically different approach and explain each and every step, assuming no prior knowledge of statistical learning methods. This would mean drastically cutting down the material presented, providing worked examples of applications and, most importantly, making all methods and scripts fully open access (in both Python and R, the latter being more commonly used in the forestry community).
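As an example of the plain-language explanation this point calls for, two of the quoted terms can be shown in a few lines. "Target encoding" replaces a category (for instance a hypothetical biome label) by the average value of the quantity being predicted for that category; "L2 regularization" (ridge regression) adds a penalty λ‖β‖² that shrinks coefficients toward zero. This is a generic sketch: the smoothed encoding and the function names are assumptions, not necessarily what the manuscript implements.

```python
from collections import defaultdict

import numpy as np


def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Rare categories are pulled toward the global mean; `smoothing` is the
    number of pseudo-observations given to the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }


def ridge(X, y, lam):
    """L2-regularized least squares: minimises ||y - X b||^2 + lam * ||b||^2.

    No intercept is fitted here, for simplicity.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


print(target_encode(["a", "a", "b"], [1.0, 3.0, 10.0], smoothing=0.0))
print(ridge([[1, 0], [0, 1], [1, 1]], [1, 2, 3], lam=1.0))
```

With `smoothing=0.0` the encoding is just the per-category mean ({'a': 2.0, 'b': 10.0}); with `lam=0` the ridge solution reduces to ordinary least squares. In practice the encoding should be computed on training folds only, otherwise information about the hold-out targets leaks into the predictors.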
Third and last, Section 2.3 seeks to explore situations where a biomass regression model may be applied outside the conditions under which it was calibrated. In principle, this is an important problem. In practice, Table 2 only demonstrates that some covariates are significant predictors in the regression exercise, which was the assumption at the outset. Notably, none of this is reported in the abstract. More practical case studies would be needed for this theoretical section to be a convincing addition to the literature on biomass estimation models.
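A practical case study of this kind could start from a very simple question: does a new site fall inside the range of conditions covered by the calibration data? The sketch below is one elementary diagnostic, not the manuscript's method; the covariate names and values are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def calibration_ranges(
    calibration: Sequence[Mapping[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Min and max of every covariate observed in the calibration data."""
    if not calibration:
        raise ValueError("calibration data must not be empty")
    keys = calibration[0].keys()
    return {k: (min(r[k] for r in calibration), max(r[k] for r in calibration)) for k in keys}


def out_of_range(
    site: Mapping[str, float], ranges: Mapping[str, Tuple[float, float]]
) -> List[str]:
    """Names of covariates for which a new site lies outside the calibration range."""
    return [k for k, (lo, hi) in ranges.items() if not lo <= site[k] <= hi]


# Hypothetical covariates and values, for illustration only.
calibration = [
    {"mean_annual_temp_c": 22.0, "annual_precip_mm": 1500.0},
    {"mean_annual_temp_c": 27.0, "annual_precip_mm": 3000.0},
    {"mean_annual_temp_c": 25.0, "annual_precip_mm": 2200.0},
]
ranges = calibration_ranges(calibration)
print(out_of_range({"mean_annual_temp_c": 29.0, "annual_precip_mm": 2000.0}, ranges))
# → ['mean_annual_temp_c']
```

A per-covariate range check misses sites whose covariates are individually within range but jointly unusual; a multivariate distance to the calibration data would be a natural next step.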