This work is distributed under the Creative Commons Attribution 4.0 License.
2001–2022 global gross primary productivity dataset using an ensemble model based on random forest
Abstract. The continuous advancement of remote sensing technology has been instrumental in improving models for estimating terrestrial gross primary productivity (GPP). However, inconsistent spatial distributions and interannual variations across GPP datasets impede a comprehensive understanding of the terrestrial carbon cycle. In contrast to previous models relying on remote sensing and environmental variables, we developed an ensemble model based on random forest, named GPPERF. This model uses the GPP outputs of established remote sensing-based models (EC-LUE, GPP-kNDVI, GPP-NIRv, Revised-EC-LUE) as inputs for GPP estimation. GPPERF performed well, explaining 83.7 % of the monthly variation in GPP across 171 sites. This performance surpassed the selected remote sensing models (72.4 %–77.1 %) and an independent random forest model using remote sensing and environmental variables (77.7 %). Over 2001–2022, the global GPP estimated by the ensemble model was 131.2 PgC yr-1, with a trend of 0.45 PgC yr-2. Furthermore, evaluation against flux sites from ChinaFlux indicated that the dataset generalizes well. In summary, the machine learning-based ensemble method helps reduce the uncertainty of single remote sensing model estimates and provides a more reliable estimate of global GPP.
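To make the ensemble idea concrete, here is a minimal sketch of the stacking step described in the abstract, assuming scikit-learn and a hypothetical table of site-month records; the file name, column names, and hyperparameters are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch: a random forest whose only predictors are the GPP
# estimates of four remote sensing-based models, trained against
# flux-tower GPP. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("site_monthly_gpp.csv")  # hypothetical site-month table
predictors = ["gpp_eclue", "gpp_kndvi", "gpp_nirv", "gpp_rec_eclue"]

# The four model outputs are the inputs; tower-observed GPP is the target.
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(df[predictors], df["gpp_tower"])
df["gpp_erf"] = rf.predict(df[predictors])
```

Stacking the model outputs this way lets the forest learn where each input model is reliable, which is the mechanism the abstract credits for the reduced uncertainty.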
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2024-114', Anonymous Referee #1, 06 Mar 2024
In their study the authors created two new datasets of gross primary productivity (GPP): one based on remote sensing and environmental predictors, and one an ensemble of four existing GPP models. Both connect predictors and observed GPP using random forests. To test the practicality of their approach, the authors compared their two products and the four existing models to FLUXNET site observations. Additionally, they created a global gridded GPP estimate using the ensemble-based approach and performed an independent evaluation using site observations from ChinaFlux. Improving estimates of global GPP is indeed an important scientific challenge. However, while the reported model metrics suggest a substantial improvement, in particular for the ensemble-based model compared to existing models, I am not convinced of the novelty, nor that there is indeed a real improvement. My main concerns are the following:
- The methodology behind the model evaluation is unclear. It seems that all models “saw” the full FLUXNET data during parameter calibration, and the final model evaluation (Figs. 1-4) was then computed on the full dataset? Model evaluation should be done on a separate test dataset. If no separate test dataset existed, the ensemble approach might just have learned the typical GPP values of each site and its fluctuations from the patterns in the four other models (see the grouped cross-validation sketch after this list). The paper does include an independent evaluation that does not suffer from this issue (ChinaFlux); however, only 12 sites are included, and other existing models show comparable prediction skills there.
- Even for the evaluation performed on a separate test dataset (i.e. ChinaFlux), I wonder whether the good prediction skill of GPPERF is mostly a result of spatial autocorrelation, i.e. by learning the patterns from the four GPP products RFERF basically finds the correct region and predicts the GPP values of the nearest FLUXNET site?
- The authors’ remote sensing and environmental predictors model seems to be similar to the FLUXCOM approach. I wonder what is the advantage and why FLUXCOM is not included in the comparison?
- The authors recalibrated the parameters underlying the four existing models but the justification for this action is unclear. I would like to see a comparison with the original models to see whether this indeed led to improvements in model performance.
- Several existing GPP datasets are only shown in the comparison to ChinaFlux but were not included in the ensemble-based product. Vice versa, two of the models used in the FLUXNET comparison were omitted from the ChinaFlux comparison. I wonder why the authors selected these four models (EC-LUE, Revised-EC-LUE, GPP-kNDVI, GPP-NIRv) in the ensemble approach even though the comparison in Fig. 6 suggests other products perform much better? If the reason is the spatial resolution this should be better explained.
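A hedged sketch of the site-held-out evaluation requested above, assuming scikit-learn and the same hypothetical site-month table as in the earlier sketch: GroupKFold keeps all months of a flux site in a single fold, so the ensemble is always scored on sites it never saw during training, which also limits the spatial-autocorrelation shortcut raised in the second concern.

```python
# Sketch of grouped (leave-sites-out) cross-validation; names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_csv("site_monthly_gpp.csv")  # hypothetical site-month table
predictors = ["gpp_eclue", "gpp_kndvi", "gpp_nirv", "gpp_rec_eclue"]

cv = GroupKFold(n_splits=10)  # every site's months stay in a single fold
scores = cross_val_score(
    RandomForestRegressor(n_estimators=500, random_state=0),
    df[predictors], df["gpp_tower"],
    groups=df["site_id"], cv=cv, scoring="r2",
)
print(f"held-out-site R2: {scores.mean():.3f} +/- {scores.std():.3f}")
```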
Minor comments:
L18: Remove “a”.
L33: I think you mean “to the terrestrial carbon cycle”.
L38: Unclear, is this about remote sensing-based estimates or GPP estimates in general? Also it is unclear how the approach applied in this study helps with the problems mentioned in the following sentences. Overall the introduction lacks connectivity.
L48: Unclear, do you mean the models assume a positive relationship between CO2 and GPP while it is actually negative? Or that CO2 fertilization started to saturate?
L55: Is this for the same region?
L73: “low”?
L85: “ERA”. Also references are missing.
L108: How were they resampled?
L115: Why only 171 sites? Did the other sites not contain any high-quality years?
L120: The paper often mentions “remote sensing models” but the atmospheric data is actually from a reanalysis (ERA5) or FLUXNET.
L121: What is “traditional random forest model”? The authors often mix the nature of the data (e.g. remote sensing) and modelling approach (e.g. random forests).
L125: Table 1 says EC-LUE also considers CO2.
L127: SIF was not mentioned previously.
L129: A brief summary of random forests is needed. Also, why did you choose these four predictors? I assume adding more variables would increase model performance.
L132: “multi-model”.
L137: Provide information about data source. If I understood correctly, e.g. FPAR is from MODIS (500m) while AT from FLUXNET? And ERA5 AT is only used for the global prediction? This is confusing. Also where is the NIR data from?
L140: What differences do you mean?
L155: The model overestimates or underestimates.
L160: How many? Again, references are missing.
L166: Lack of consistency, GPPERF, ERF_GPP or “random forest-based ensemble model”? Or does GPPERF refer to the site predictions while ERF_GPP to the global ones? Again, why are some models thrown out in this step while others are included for the first time?
L185: What do you mean by changes in cropland? Do you mean seasonal changes in cropland GPP?
Fig. 2 + Fig. S3: Why are the metrics different? Is Fig. S3 the mean of the individual sites while Fig. 2 the mean of all data?
L207: “models”. This error occurs several times in the manuscript.
L215: What do you mean by extreme? The highest values (>10 gC/m2/d)? Does this represent 33% of all data?
Fig. S2: Why is there an extra panel for site 1? Why don’t you also show the FLUXNET sites?
In general, having a native English speaker review the text would enhance its quality.
Citation: https://doi.org/10.5194/egusphere-2024-114-RC1
- AC1: 'Reply on RC1', Tiexi Chen, 07 Apr 2024
-
RC2: 'Comment on egusphere-2024-114', Anonymous Referee #2, 12 Mar 2024
Publisher’s note: this comment was edited on 13 March 2024. The following text is not identical to the original comment, but the adjustments were minor without effect on the scientific meaning.
This study offers a contribution to global gross primary production (GPP) mapping by developing an ensemble model based on the random forest algorithm. The model takes the GPP estimates of several remote sensing-based models as inputs and shows superior accuracy, explaining 83.7% of GPP variations across 171 sites and outperforming traditional models. It estimates global GPP at 131.2 PgC yr-1 for 2001–2022, with an increasing trend. While the authors have done a lot of significant work, the paper could benefit from a more comprehensive consideration of certain details and improvements in writing clarity.
- In Section 2.3, the authors selected specific models as input variables for the ERF model. However, other widely applied models such as the P model, VPM model, MODIS GPP algorithm, and NIRvP for vegetation indices have not been considered. What was the rationale behind selecting these four models? Furthermore, in comparing global results, why were certain products chosen, such as VPM, MODIS, and FLUXCOM data, especially considering FLUXCOM also employs machine learning methods and has released a new version of its data (FLUXCOMX)? Additionally, it appears the ECGC has only recently been launched and may not be as "widely used" as mentioned in the manuscript.
- The authors compare the ERF model with a traditional random forest (RF) model. Table 2 indicates that the traditional RF model used only 4 variables, while the ERF model incorporates several GPP estimation models. However, it actually includes even more variables, such as kNDVI, NIRv, FPAR, CO2, dif/dir SR, etc. The ERF model contains more variables than the RF model, but for a fair comparison, the same data should be used. Would the accuracy of the ERF model still surpass that of the RF model if an RF model were constructed using all data inputs from the ERF model?
- Why did the authors opt to estimate monthly GPP instead of daily? Are the estimation results from different models in the ERF model aggregated from daily to monthly, or are they directly estimating monthly GPP? If monthly, how are parameters like Solar Zenith Angle adjusted when optimizing the rECLUE model?
- In Table 2, the EC-LUE model considers VPD and CO2, which the original model does not. The supplementary documents indicate that the authors modified the EC-LUE model, thus it is no longer the original EC-LUE model. The only difference between it and the rECLUE model seems to be the consideration of sunlit and shaded leaves. Given that Figure 1 shows minimal differences between them, does including it as an input for the ERF model result in redundancy with rECLUE?
- The introduction requires careful revision as many uncertainties or current issues listed by the authors seem not to be addressed in this manuscript.
Some detailed comments:
- L41: The authors suggest poor estimation accuracy partly because remote sensing models cannot fully represent photosynthesis. Does the ERF model overcome this limitation?
- L46-47: What does "this process may be missing" refer to? Is it the CO2 fertilization effect or a negative trend influenced by CO2? If it's the fertilization effect, many models already consider its impact. If it refers to a negative trend, what improvements have been made in the ERF model? I think this negative trend might not be incorporated into the model.
- L52: The authors note significant differences in the same vegetation types across different regions, but it seems the ERF model did not address this variability when optimizing parameters and developing the model.
- L54-55: It's unclear what this typical example refers to. Parameters for C3 and C4 vegetation inherently need to be considered separately, representing two different vegetation types.
- L56-60: Environmental factors add to GPP estimation uncertainty. How have the authors improved or reduced this uncertainty, given that most models already account for environmental factors?
- L69-70: Tian et al. (2023) also applied ML models to multi-model ensembles. What are the innovative aspects of this study compared to their research?
- L85: How is ERA5-LAND data processed in coastal regions? What is the reason for choosing temperature and radiation data from ERA5-Land and ERA5 respectively (this distinction should be made clear in Table 1)?
- L104: What does "reference year" mean? How are different datasets aggregated to 0.05 degrees?
- Section 2.5: Why not utilize all available Fluxnet sites for validation instead of limiting to only Chinese sites? Would this not lead to a smaller dataset and reduce the representativeness for validating a global product?
- Figures 1 and 2: It's recommended to include units for GPP, and RMSE should also specify units.
- Figure 3: Adding seasonal variation for representative sites of different vegetation types could better highlight the model's advantages.
- L228: Does ERF_GPP refer to the global product, while GPPERF denotes site estimation values?
- L257: NIRV should be corrected to NIRv.
- In Figure S4, discrepancies with Figure 6 are noted. Is it reasonable to directly average accuracy across various sites, given differences in data quantity and the range of GPP values at different sites?
- L275: What does "representative" refer to in this context?
- L280-282: Some models and products already utilize dynamic temperature parameters, which the authors have not mentioned or compared.
- L283-293: Could the overestimation of low values be due to scale issues, even at the site scale, considering the used LAI is 500 m?
- In the ERF model, is it possible to output the importance of the different input models during the estimation process? (See the sketch after this list.)
- Section 4.2: Supplementing the spatial distribution of product uncertainty is recommended.
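On the question of model importances above: a fitted scikit-learn random forest exposes impurity-based importances directly, so a minimal sketch, reusing the hypothetical column names from the earlier sketches, could look like this. sklearn.inspection.permutation_importance would be a less split-biased alternative.

```python
# Sketch: report how much each input model contributes to the ensemble's
# splits via impurity-based feature importances. Names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("site_monthly_gpp.csv")  # hypothetical site-month table
predictors = ["gpp_eclue", "gpp_kndvi", "gpp_nirv", "gpp_rec_eclue"]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(df[predictors], df["gpp_tower"])

for name, importance in zip(predictors, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")  # share of each input model
```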
Citation: https://doi.org/10.5194/egusphere-2024-114-RC2
- AC2: 'Reply on RC2', Tiexi Chen, 07 Apr 2024
Viewed
- HTML: 414
- PDF: 134
- XML: 33
- Total: 581
- Supplement: 67
- BibTeX: 17
- EndNote: 23
Xin Chen
Tiexi Chen
Xiaodong Li
Yuanfang Chai
Shengjie Zhou
Renjie Guo
Jie Dai