the Creative Commons Attribution 4.0 License.
Improving Terrestrial Carbon Flux Simulations With Machine Learning and Global Earth Observations
Abstract. The land carbon cycle can act as both a negative and positive climate feedback. Currently, it serves as a negative feedback, absorbing about one-third of anthropogenic CO2 emissions. However, multi-model studies project a weakening of this sink, with the potential for a future shift to a carbon source. Significant inter-model differences persist, limiting confidence in these projections. Some of these discrepancies may arise from parameter uncertainty. Advances in artificial intelligence, computing, and Earth observations now offer new opportunities to better constrain key model parameters. While previous studies have shown that parameter optimization can substantially improve model performance, they have not explored its impact on the future carbon balance. To address this gap, I use a machine learning algorithm to optimize 28 model parameters based on 13 global Earth observation datasets. The resulting parameter set is then applied in carbon cycle simulations under historical conditions and a high-emissions future scenario. Results show that optimization significantly improves model performance, particularly for gross primary productivity (GPP), leaf area index, and sensible heat flux. Globally, optimized net biome productivity is lower than in the default simulation (33 % lower from 1960 to 2022 and 43 % lower from 2015 to 2100) due to reduced GPP and increased autotrophic respiration. Regionally, optimization tends to weaken both carbon sinks and sources, reducing the contrast between them. In conclusion, parameter tuning can substantially alter historical and future carbon fluxes, with effects comparable to adding new processes. To reduce inter-model spread, modeling groups should integrate advanced parameter optimization frameworks into their model development cycle.
Status: open (until 19 Sep 2025)
RC1: 'Comment on egusphere-2025-2517', Anonymous Referee #1, 01 Aug 2025
This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.
Citation: https://doi.org/10.5194/egusphere-2025-2517-RC1
AC1: 'Reply on RC1', Christian Seiler, 20 Aug 2025
I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
REVIEWER: This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
ANSWER: I completed the optimization for a single set of randomly selected grid cells. Whether a different selection of grid cells would lead to substantially different parameter values depends on how representative the sample is. The sample size was set by computational limits rather than by representativeness. I will address this comment by conducting additional optimizations using a different selection of grid cells. Given the computational expense, I will only be able to provide a few additional optimization experiments.
REVIEWER: Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
ANSWER: The optimization is performed for 160 grid cells, while the evaluation shown in Figure 6 includes all 2,444 grid cells. Thus, only about 7% of the grid cells used in the evaluation were also included in the tuning process. Therefore, the evaluation results are largely driven by grid cells that were not part of the optimization.
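The split between tuning and evaluation cells, and the effect of the random seed raised in the previous comment, can be illustrated with a minimal sketch. The cell counts (160 and 2,444) come from the reply above; the helper `sample_tuning_cells`, the seeds, and the use of integer indices for grid cells are assumptions made only for illustration, not the sampling procedure used in the manuscript.

```python
import random

N_CELLS = 2444   # grid cells in the global evaluation (from the reply)
N_TUNING = 160   # grid cells used in the optimization (from the reply)

def sample_tuning_cells(seed, n_cells=N_CELLS, n_tuning=N_TUNING):
    """Draw a random subset of grid-cell indices for tuning (hypothetical helper)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_cells), n_tuning))

# Two independent draws, as one might use to test sensitivity to the seed.
tuning_a = sample_tuning_cells(seed=1)
tuning_b = sample_tuning_cells(seed=2)

# Cells never seen during tuning act as an independent evaluation set.
held_out_a = set(range(N_CELLS)) - tuning_a

tuned_fraction = N_TUNING / N_CELLS   # 160 / 2444, roughly 0.065
shared = len(tuning_a & tuning_b)     # overlap between the two draws

print(f"tuned fraction: {tuned_fraction:.3f}")
print(f"held-out cells: {len(held_out_a)}")
print(f"cells shared by both draws: {shared}")
```

Restricting the evaluation to `held_out_a` would give a fully independent check, whereas evaluating on all cells mixes in the small tuned fraction.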
REVIEWER: L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
ANSWER: This information is shown in Figure 4 and described in the text (L304). The figure indicates that I used 25 generations with a population size of 100 chromosomes. This corresponds to 25 × 100 = 2,500 simulations, each run for the 160 grid cells. I will add this information to the text to make it more explicit.
The improvement in performance decreases from generation to generation, and Figure 4 illustrates that very little gain can be expected after generation 25. One might argue that computational time could have been saved by stopping the optimization after generation 15. However, this is not evident unless additional simulations are conducted that demonstrate diminishing progress. While I am confident that the solution could be improved by adding more iterations, I believe that the cost–benefit ratio would become too large.
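The simulation budget and the per-generation gain discussed here can be sketched with a toy genetic algorithm. The population size (100), the number of generations (25), and the number of parameters (28) are taken from the discussion; everything else is an assumption for illustration: `toy_score` stands in for a model run plus evaluation, and the tournament selection, uniform crossover, Gaussian mutation, and elitism shown here are generic choices, not necessarily the operators used in the manuscript.

```python
import random

N_PARAMS = 28        # number of tuned parameters (from the manuscript)
POP_SIZE = 100       # chromosomes per generation (from the reply)
N_GENERATIONS = 25   # generations (from the reply)

def toy_score(chromosome):
    """Stand-in for a model run plus evaluation; higher is better (assumption)."""
    return -sum((x - 0.5) ** 2 for x in chromosome)

def tournament(population, scores, rng, k=3):
    """Return the best of k randomly chosen chromosomes."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def run_ga(seed=0):
    rng = random.Random(seed)
    # Parameters are normalized to [0, 1] in this sketch.
    population = [[rng.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
    best_per_generation = []
    n_evaluations = 0
    for _ in range(N_GENERATIONS):
        scores = [toy_score(c) for c in population]
        n_evaluations += len(population)
        best_index = max(range(POP_SIZE), key=lambda i: scores[i])
        best_per_generation.append(scores[best_index])
        # Elitism: carry the best chromosome over unchanged.
        next_population = [population[best_index][:]]
        while len(next_population) < POP_SIZE:
            mother = tournament(population, scores, rng)
            father = tournament(population, scores, rng)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            child = [
                min(1.0, max(0.0, x + rng.gauss(0.0, 0.05))) if rng.random() < 0.1 else x
                for x in child
            ]
            next_population.append(child)
        population = next_population
    return best_per_generation, n_evaluations

history, n_evaluations = run_ga()
# Gain in the best score from one generation to the next.
gains = [later - earlier for earlier, later in zip(history, history[1:])]
print(n_evaluations)  # 25 generations x 100 chromosomes
```

Inspecting `gains` is one way to judge whether further generations are worth their cost; a stopping rule based on a minimum gain would be a natural extension, although no such rule is described in the reply.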
It is possible that the solution represents a local rather than a global optimum. However, I would like to emphasize that the method I chose is less prone to being trapped in local optima because it works with populations. Even if the result does reflect a local optimum, it is still superior to the default solution. Finally, if systematic parameter optimization is not conducted, parameter values must be hand-tuned, a cumbersome approach that is far more likely to result in a suboptimal solution.
REVIEWER: L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
ANSWER: Yes, I assign all variables equal weight. I have considered weighting them differently, but that immediately raises the question of which criteria should determine the weights. One could argue that GPP is more critical for the carbon cycle, but the carbon, energy, and water cycles are all coupled and must remain consistent. It could also be argued that larger weights should be assigned to variables with lower observational uncertainty, but such uncertainties are difficult to quantify. In my view, defining weights opens the door to very subjective discussions that I would prefer to avoid. From my perspective, all aspects of the carbon, water, and energy fluxes should be considered equally important. I will add this argument to the text.
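The difference between equal and unequal weighting can be made concrete with a minimal sketch. It assumes that each variable yields a skill score in [0, 1] and that the overall score is a weighted mean; the function `overall_score`, the example scores, and the alternative weight of 3 for GPP are hypothetical and are not taken from the manuscript.

```python
VARIABLES = ["ALBS", "GPP", "HFLS", "HFSS", "LAI", "LST"]

def overall_score(scores, weights=None):
    """Weighted mean of per-variable scores; equal weights when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total_weight

# Hypothetical per-variable skill scores in [0, 1]; not values from the manuscript.
example_scores = {
    "ALBS": 0.70, "GPP": 0.60, "HFLS": 0.65,
    "HFSS": 0.55, "LAI": 0.50, "LST": 0.80,
}

# Equal weighting, as described in the reply.
equal = overall_score(example_scores)

# A subjective alternative that triples the weight of GPP.
gpp_weights = {name: 1.0 for name in VARIABLES}
gpp_weights["GPP"] = 3.0
gpp_heavy = overall_score(example_scores, gpp_weights)

print(f"equal weights: {equal:.3f}, GPP-heavy weights: {gpp_heavy:.3f}")
```

Any non-uniform choice such as `gpp_weights` changes which parameter sets the optimizer prefers, which is the subjectivity the reply seeks to avoid.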
REVIEWER: L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
ANSWER: Agreed. I will either raise this limitation in the discussion section or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above.
REVIEWER: L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
ANSWER: Optimizing the optimization process is challenging given the large number of different possible combinations of selection, crossover, and mutation functions and corresponding hyperparameters. I briefly raise the issue in Line 425 and will expand on this in the revised version of the manuscript.
REVIEWER: L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
ANSWER: Agreed. I will include this in the revisions.
REVIEWER: L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
ANSWER: Please note that the optimization significantly improves model performance, particularly for gross primary productivity, leaf area index, and sensible heat flux. The model was not optimized for NBP, as no reliable globally gridded observational NBP data sets are available. My hope was that improving other surface variables would lead to global NBP values more consistent with global observations (i.e. globally accumulated NBP, which is reasonably well constrained). This hope was only partly fulfilled. Interestingly, the NBP from the optimized run differs considerably from that of the default run, yet its agreement with observations improves too little to be meaningful. The limited improvement arises because NBP was not included in the optimization. I address how this limitation could be overcome in future studies (L419), namely by replacing the model with a much faster statistical emulator and by optimizing this emulator for global NBP. I will expand on this point in the revised manuscript.
REVIEWER: L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
ANSWER: I agree. I will either discuss the differences or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above. The analysis shown in Figure 11 is based on a much shorter optimization period and a smaller sample of grid cells. If I conduct multiple optimizations with the same settings as in the final optimization, then the results from that analysis will be directly comparable.
REVIEWER: In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.
ANSWER: I will carefully revisit which figures and tables should remain in the main text.
Citation: https://doi.org/10.5194/egusphere-2025-2517-AC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
345 | 38 | 14 | 397 | 6 | 21 |