Improving Terrestrial Carbon Flux Simulations With Machine Learning and Global Earth Observations

Seiler, Christian

doi:10.5194/egusphere-2025-2517

16 Jun 2025

Abstract. The land carbon cycle can act as both a negative and positive climate feedback. Currently, it serves as a negative feedback, absorbing about one-third of anthropogenic CO₂ emissions. However, multi-model studies project a weakening of this sink, with the potential for a future shift to a carbon source. Significant inter-model differences persist, limiting confidence in these projections. Some of these discrepancies may arise from parameter uncertainty. Advances in artificial intelligence, computing, and Earth observations now offer new opportunities to better constrain key model parameters. While previous studies have shown that parameter optimization can substantially improve model performance, they have not explored its impact on the future carbon balance. To address this gap, I use a machine learning algorithm to optimize 28 model parameters based on 13 global Earth observation datasets. The resulting parameter set is then applied in carbon cycle simulations under historical conditions and a high-emissions future scenario. Results show that optimization significantly improves model performance, particularly for gross primary productivity (GPP), leaf area index, and sensible heat flux. Globally, optimized net biome productivity is lower than in the default simulation (33 % lower from 1960 to 2022 and 43 % lower from 2015 to 2100) due to reduced GPP and increased autotrophic respiration. Regionally, optimization tends to weaken both carbon sinks and sources, reducing the contrast between them. In conclusion, parameter tuning can substantially alter historical and future carbon fluxes, with effects comparable to adding new processes. To reduce inter-model spread, modeling groups should integrate advanced parameter optimization frameworks into their model development cycle.

Received: 28 May 2025 – Discussion started: 16 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

01 Jun 2026

Improving terrestrial carbon flux simulations with machine learning and global Earth observations

Christian Seiler

Earth Syst. Dynam., 17, 651–671, https://doi.org/10.5194/esd-17-651-2026, 2026

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-2517', Anonymous Referee #1, 01 Aug 2025

This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC1
- AC1: 'Reply on RC1', Christian Seiler, 20 Aug 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
  L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
  ANSWER: I completed the optimization for a single set of randomly selected grid cells. Whether a different selection of grid cells would lead to substantially different parameter values depends on how representative the sample is. The sample size was set by computational limits rather than by representativeness. I will address this comment by conducting additional optimizations using a different selection of grid cells. Given the computational expense, I will only be able to provide a few additional optimization experiments.
  REVIEWER: Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
  ANSWER: The optimization is performed for 160 grid cells, while the evaluation shown in Figure 6 includes all 2,444 grid cells. Thus, only about 7% of the grid cells used in the evaluation were also included in the tuning process. Therefore, the evaluation results are largely driven by grid cells that were not part of the optimization.
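The overlap between tuning and evaluation cells can be illustrated with a minimal Python sketch. It assumes the 2,444 grid cells are simply indexed from 0 to 2,443 and uses a hypothetical helper, `split_cells`, with an arbitrary seed; the cell selection actually used in the study is not reproduced here.

```python
import random

N_TOTAL = 2444   # grid cells used in the evaluation (from the reply above)
N_TUNING = 160   # grid cells used in the optimization (from the reply above)


def split_cells(n_total=N_TOTAL, n_tuning=N_TUNING, seed=0):
    """Randomly pick tuning cells and return them with the held-out cells.

    Hypothetical helper: cell indices and the seed are illustrative only.
    """
    rng = random.Random(seed)
    tuning = set(rng.sample(range(n_total), n_tuning))
    held_out = [cell for cell in range(n_total) if cell not in tuning]
    return tuning, held_out


tuning, held_out = split_cells()
overlap_fraction = len(tuning) / N_TOTAL
print(f"tuning cells: {len(tuning)}, held-out cells: {len(held_out)}")
print(f"share of evaluation cells also used for tuning: {overlap_fraction:.1%}")
```

Calling `split_cells` with a different `seed` yields a different set of 160 cells, which is the kind of variation that repeated optimizations with a new grid-cell selection would probe.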
  REVIEWER: L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
  ANSWER: This information is shown in Figure 4 and described in the text (L304). The figure indicates that I used 25 generations with a population size of 100 chromosomes. This corresponds to 25 × 100 = 2,500 simulations for 160 grid cells. I will add this information to the text to make it more explicit.
  
  The improvement in performance decreases from generation to generation, and Figure 4 illustrates that very little gain can be expected after generation 25. One might argue that computational time could have been saved by stopping the optimization after generation 15. However, this is not evident unless additional simulations are conducted that demonstrate diminishing progress. While I am confident that the solution could be improved by adding more iterations, I believe that the cost–benefit ratio would become too large.
  
  It is possible that the solution represents a local rather than a global optimum. However, I would like to emphasize that the method I chose is less prone to being trapped in local optima because it works with populations of candidate solutions. Even if the result does reflect a local optimum, it is still superior to the default solution. Finally, if systematic parameter optimization is not conducted, parameter values must be hand-tuned, a cumbersome approach that is far more likely to result in a suboptimal solution.
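The evaluation budget and the population-based search described in this answer can be sketched as a toy genetic algorithm in Python. Only the population size (100), the number of generations (25), and the number of parameters (28) come from the discussion; the score function `toy_score`, the tournament selection, uniform crossover, Gaussian mutation, elitism, and all rates are assumptions chosen for illustration and are not the operators or settings used in the manuscript.

```python
import random

N_PARAMS = 28        # number of optimized parameters (from the abstract)
POP_SIZE = 100       # chromosomes per generation (from the reply above)
N_GENERATIONS = 25   # generations (from the reply above)


def toy_score(params):
    """Stand-in for one model simulation; higher is better (hypothetical)."""
    return -sum((p - 0.5) ** 2 for p in params)


def run_ga(seed=0, mutation_rate=0.1, mutation_sd=0.05):
    rng = random.Random(seed)
    # Parameters are normalized to [0, 1] in this sketch.
    population = [[rng.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
    best_per_generation = []
    n_evaluations = 0

    def tournament(scores):
        # Pick the best of three randomly drawn individuals.
        return max(rng.sample(range(POP_SIZE), 3), key=lambda i: scores[i])

    for _ in range(N_GENERATIONS):
        scores = [toy_score(ind) for ind in population]
        n_evaluations += len(population)
        best_index = max(range(POP_SIZE), key=lambda i: scores[i])
        best_per_generation.append(scores[best_index])

        # Elitism: carry the best individual over unchanged.
        children = [population[best_index][:]]
        while len(children) < POP_SIZE:
            mother = population[tournament(scores)]
            father = population[tournament(scores)]
            # Uniform crossover: each gene comes from either parent.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Gaussian mutation, clipped to the normalized range.
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, mutation_sd)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = children

    return best_per_generation, n_evaluations


best_per_generation, n_evaluations = run_ga()
print("evaluations:", n_evaluations)
print("first and last best score:", best_per_generation[0], best_per_generation[-1])
```

In this sketch each evaluation is a cheap function call, whereas in the study each one is a model simulation over 160 grid cells, which is why the budget of 25 × 100 evaluations dominates the cost. Because the best individual is carried over unchanged, the best score per generation cannot decrease, so a flattening curve indicates diminishing returns rather than a proven global optimum.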
  REVIEWER: L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
  ANSWER: Yes, I assign all variables equal weight. I have considered weighting them differently, but that immediately raises the question of which criteria should determine the weights. One could argue that GPP is more critical for the carbon cycle, but the carbon, energy, and water cycles are all coupled and must remain consistent. It could also be argued that larger weights should be assigned to variables with lower observational uncertainty, but such uncertainties are difficult to quantify. In my view, defining weights opens the door to very subjective discussions that I would prefer to avoid. From my perspective, all aspects of the carbon, water, and energy fluxes should be considered equally important. I will add this argument to the text.
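A minimal Python sketch of what equal weighting means in practice is given below. It assumes, hypothetically, that each variable has one skill score per observational dataset on a common 0 to 1 scale, that scores are first averaged across datasets and then across variables, and that the function name `overall_score` and all numbers are invented for illustration; the manuscript's actual score formulation is not reproduced here.

```python
from statistics import fmean


def overall_score(scores_by_variable, weights=None):
    """Combine per-variable scores into one number.

    scores_by_variable maps a variable name to a list of scores, one per
    observational dataset. With weights=None every variable counts equally.
    """
    per_variable = {name: fmean(vals) for name, vals in scores_by_variable.items()}
    if weights is None:
        weights = {name: 1.0 for name in per_variable}
    total_weight = sum(weights[name] for name in per_variable)
    return sum(weights[name] * per_variable[name] for name in per_variable) / total_weight


# Made-up scores for the six variables named in the reviewer comment.
example_scores = {
    "ALBS": [0.8],
    "GPP": [0.4, 0.6],
    "HFLS": [0.7],
    "HFSS": [0.6],
    "LAI": [0.5, 0.7],
    "LST": [0.8],
}

equal = overall_score(example_scores)
gpp_doubled = overall_score(
    example_scores,
    weights={"ALBS": 1, "GPP": 2, "HFLS": 1, "HFSS": 1, "LAI": 1, "LST": 1},
)
print(f"equal weights: {equal:.3f}, GPP weighted double: {gpp_doubled:.3f}")
```

The second call shows how any unequal weighting shifts the target of the optimization, which is the subjective choice the answer above prefers to avoid.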
  REVIEWER: L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
  ANSWER: I agree. I will either raise this limitation in the Discussion section or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above.
  REVIEWER: L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
  ANSWER: Optimizing the optimization process is challenging given the large number of different possible combinations of selection, crossover, and mutation functions and corresponding hyperparameters. I briefly raise the issue in Line 425 and will expand on this in the revised version of the manuscript.
  REVIEWER: L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
  ANSWER: I agree and will include this in the revised manuscript.
  REVIEWER: L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
  ANSWER: Please note that the optimization significantly improves model performance, particularly for gross primary productivity, leaf area index, and sensible heat flux. The model was not optimized for NBP, as no reliable globally gridded observational NBP datasets are available. My hope was that improving other surface variables would lead to global NBP values more consistent with global observations (i.e., globally accumulated NBP, which is reasonably well constrained). This hope was only partly fulfilled. Interestingly, the NBP from the optimized run differs considerably from that of the default run, yet the overall improvement is too minor to be meaningful. The limited improvement arises because NBP was not included in the optimization. I address how this limitation could be overcome in future studies (L419), namely by replacing the model with a much faster statistical emulator and by optimizing this emulator for global NBP. I will expand on this point in the revised manuscript.
  REVIEWER: L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
  ANSWER: I agree. I will either discuss the differences or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above. The analysis shown in Figure 11 is based on a much shorter optimization period and a smaller sample of grid cells. If I conduct multiple optimizations with the same settings as in the final optimization, then the results from that analysis will be directly comparable.
  REVIEWER: In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.
  ANSWER: I will carefully revisit which figures and tables should be in the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC1
RC2: 'Comment on egusphere-2025-2517', Anonymous Referee #2, 20 Sep 2025

The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
Major comments
1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
Minor comments
L300: Figure 5a
L304: Figure 5b
Figure 10: caption does not mention (g) and (h)
Maybe Figures 2 and 7 can be moved to supplementary materials.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC2
- AC2: 'Reply on RC2', Christian Seiler, 29 Sep 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
  ANSWER: Thank you for your positive evaluation of the manuscript.
  Major comments
  REVIEWER: 1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
  ANSWER: There are several sources of uncertainty relevant to this study, including uncertainties in the model forcing data, model configuration, parameter ranges, grid-cell selection, optimization period, optimization algorithm, and hyperparameters. Observational uncertainty is therefore only one of many contributing factors. A particular challenge is that the uncertainty of observation-based products is often poorly documented, as highlighted in many parameter optimization studies. Moreover, there is no community-wide consensus on how best to represent observational error. To address your comment, I propose to discuss the different methods that have been used and the strengths and weaknesses of the approach adopted in my manuscript.
  REVIEWER: 2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
  ANSWER: I think it is important to acknowledge that whenever a new parameterization is introduced in a model, developers typically select parameter values within an uncertainty range so that the model output matches observations from the modern climate. This kind of ad hoc tuning is common practice, and your criticism applies equally to it. Replacing ad hoc tuning with a more systematic approach is not different in principle; it is simply far more effective. I suggest emphasizing this point more strongly in the Discussion section.
  REVIEWER: 3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
  ANSWER: I agree and will elaborate on this in the Discussion section.
  Minor comments
  REVIEWER: L300: Figure 5a
  ANSWER: Yes, thank you. I will change 4a to 5a.
  REVIEWER: L304: Figure 5b
  ANSWER: Yes, thank you. I will change 4b to 5b.
  REVIEWER: Figure 10: caption does not mention (g) and (h)
  ANSWER: Yes, I will add the description of (g) and (h) in the caption.
  REVIEWER: Maybe Figures 2 and 7 can be moved to supplementary materials.
  ANSWER: I will carefully revisit the selection of the figures that will go into the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-2517', Anonymous Referee #1, 01 Aug 2025

This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC1
- AC1: 'Reply on RC1', Christian Seiler, 20 Aug 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: This study applies a machine learning-based Genetic Algorithm (GA) and multiple global Earth observation datasets to systematically optimize poorly constrained parameters in the CLASSIC land surface model. The optimization is conducted over a long historical period (1701–2020), simultaneously targeting multiple variables and using multiple observational data streams, aiming to improve historical simulation performance and assess future terrestrial carbon fluxes under the SSP5-8.5 scenario. Despite these strengths, several issues may limit the scientific impact and clarity of the manuscript. My detailed comments are as follows:
  L233: The global representativeness of the randomly selected 160 grid cells should be evaluated. These cells may not capture regional differences or small-scale processes, and if the selected grids differ substantially from the target regions, the optimized parameters may not be suitable for local applications. While the 160 grids were randomly selected, it is not stated whether multiple random samplings were performed to test the stability of results. Different random seeds could lead to different optimal parameter sets.
  ANSWER: I completed the optimization for a single set of randomly selected grid cells. Whether a different selection of grid cells will lead to substantially different parameters values depends on how representative the sample size is. The sample size is based on computational limits rather than representativity. I will address this comment by conducting additional optimizations using a different selection of grid cells. Given the computational expense, I will only be able to provide few additional optimization experiments.
  REVIEWER: Using the same set of observational data for both fitness evaluation and parameter optimization lacks an independent validation set or cross-validation. This may result in good performance on the training data but poor generalization capability.
  ANSWER: The optimization is performed for 160 grid cells, while the evaluation shown in Figure 6 includes all 2,444 grid cells. Thus, only about 7% of the grid cells used in the evaluation were also included in the tuning process. Therefore, the evaluation results are largely driven by grid cells that were not part of the optimization.
  REVIEWER: L235: The computation time of two weeks is substantial, yet the manuscript does not specify the convergence criteria, number of iterations, or early stopping strategy, raising concerns about potential waste of computational resources. If the solution space is large, GA may still remain trapped in suboptimal solutions.
  ANSWER: This information is shown in Figure 4 and described in the text (L304). The figure indicates that I used 25 generations with a population size of 100 chromosomes. This corresponds to 25 x 100 = 2500 simulations for 160 grid cells. I will add this information to the text to make it more explicit.
  
  The improvement in performance decreases from generation to generation, and Figure 4 illustrates that very little gain can be expected after generation 25. One might argue that computational time could have been saved by stopping the optimization after generation 15. However, this is not evident unless additional simulations are conducted that demonstrate diminishing progress. While I am confident that the solution could be improved by adding more iterations, I believe that the cost–benefit ratio would become too large.
  
  It is possible that the solution represents a local rather than a global optimum. However, I would like to emphasize that the method I chose is less prone to being trapped in local optima due to the use of populations. Even if the result does reflect a local optimum, it is still superior to the default solution. Finally, if systematic parameter optimization is not conducted, parameter values must be hand-tuned - a cumbersome approach that is far more likely to result in a suboptimal solution.
  REVIEWER: L253–258: Are the six land surface variables (ALBS, GPP, HFLS, HFSS, LAI, LST) weighted equally in the cost function? Different variables may differ greatly in importance (e.g., GPP is more critical for the carbon cycle), but the manuscript does not explain how weights were assigned.
  ANSWER: Yes, I assign all variables equal weight. I have considered weighting them differently, but that immediately raises the question of which criteria should determine the weights. One could argue that GPP is more critical for the carbon cycle, but the carbon, energy, and water cycles are all coupled and must remain consistent. It could also be argued that larger weights should be assigned to variables with lower observational uncertainty, but such uncertainties are difficult to quantify. In my view, defining weights opens the door to very subjective discussions that I would prefer to avoid. From my perspective, all aspects of the carbon, water, and energy fluxes should be considered equally important. I will add this argument to the text.
  REVIEWER: L270–272: The robustness analysis was conducted with fewer grid cells, a shorter time period, and fewer generations. The representativeness of these reduced settings should be discussed in the manuscript.
  ANSWER: Agree, I will either raise this limitation in the discussion section, or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above.
  REVIEWER: L299: The finding that model performance stops improving after 25 generations may be due to GA parameter settings. This should be considered and discussed.
  ANSWER: Optimizing the optimization process is challenging given the large number of different possible combinations of selection, crossover, and mutation functions and corresponding hyperparameters. I briefly raise the issue in Line 425 and will expand on this in the revised version of the manuscript.
  REVIEWER: L315: The statement that “some variables did not improve” is made without analyzing the possible causes. This could be due to structural model errors rather than parameter settings, or uncertainties in the observational datasets. The discussion should include potential reasons and possible future improvements.
  ANSWER: Agree, I will include this in the revisions.
  REVIEWER: L338: Although the optimized simulation is slightly better than the default in some statistical metrics, the differences are described as “too minor to be considered meaningful.” The manuscript should discuss why optimizing 28 parameters results in only limited improvement in NBP, which may be related to observation errors, insufficient parameter representativeness, or model structural deficiencies.
  ANSWER: Please note that the optimization significantly improves model performance, particularly for gross primary productivity, leaf area index, and sensible heat flux. The model was not optimized for NBP, as no reliable globally gridded observational NBP data sets are available. My hope was that improving other surface variables would lead to global NBP values more consistent with global observations (i.e., globally accumulated NBP, which is reasonably well constrained). This hope was only partly fulfilled. Although the NBP from the optimized run differs considerably from that of the default run, the resulting improvement in NBP is too minor to be meaningful. The limited improvement arises because NBP was not included in the optimization. I address how this limitation could be overcome in future studies (L419), namely by replacing the model with a much faster statistical emulator and by optimizing this emulator for global NBP. I will expand on this point in the revised manuscript.
  REVIEWER: L385: While two GA configurations were found to perform better than the default, the manuscript does not analyze their characteristics (e.g., differences in selection/crossover/mutation strategies) or why they perform better. Such analysis would help in better understanding the influence of GA settings on optimization results.
  ANSWER: I agree. I will either discuss the differences or replace this part of the analysis with the additional experiments using a different selection of grid cells, as outlined above. The analysis shown in Figure 11 is based on a much shorter optimization period and a smaller sample of grid cells. If I conduct multiple optimizations with the same settings as in the final optimization, then the results from that analysis will be directly comparable.
  REVIEWER: In the main text, some figures and tables could be moved to the supplementary materials to improve readability, such as Figures 1, 2, 7 and Tables 1, 2.
  ANSWER: I will carefully revisit what figures and tables should be in the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC1
RC2:
'Comment on egusphere-2025-2517', Anonymous Referee #2, 20 Sep 2025

The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
Major comments
1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
Minor comments
L300: Figure 5a
L304: Figure 5b
Figure 10: caption does not mention (g) and (h)
Maybe Figures 2 and 7 can be moved to supplementary materials.

Citation: https://doi.org/10.5194/egusphere-2025-2517-RC2
- AC2: 'Reply on RC2', Christian Seiler, 29 Sep 2025
  
  I thank the reviewer for their thoughtful and constructive feedback on my manuscript. Please find my point-by-point responses below.
  REVIEWER: The paper proposes a Genetic Algorithm-based framework for optimizing parameters in the CLASSIC land surface model, using multiple global Earth observation datasets. It finds that the optimized parameters significantly improve key variables including GPP, LAI, and sensible heat fluxes. The paper is generally well-written and is suitable for publication after addressing the following comments.
  ANSWER: Thank you for your positive evaluation of the manuscript.
  Major comments
  REVIEWER: 1. The author notes that multiple datasets are used per variable "to reduce the risk of overfitting" and "help account for observational uncertainty". However, it seems like the paper does not rigorously incorporate observational uncertainties into the optimization. A more rigorous treatment, or discussion on this, of observational uncertainty would strengthen the robustness of the conclusions.
  ANSWER: There are several sources of uncertainty relevant to this study, including uncertainties in the model forcing data, model configuration, parameter ranges, grid-cell selection, optimization period, optimization algorithm, and hyperparameters. Observational uncertainty is therefore only one of many contributing factors. A particular challenge is that the uncertainty of observation-based products is often poorly documented, as highlighted in many parameter optimization studies. Moreover, there is no community-wide consensus on how best to represent observational error. To address your comment, I propose to discuss the different methods that have been used and the strengths and weaknesses of the approach adopted in my manuscript.
  REVIEWER: 2. I am particularly concerned about the generalizability of the optimized parameters, which the paper does not fully address. Since the optimization uses Earth observations from the modern climate, it remains unclear whether these parameter values will remain valid under future climate conditions, potentially limiting the robustness of the projections. A discussion of this limitation can strengthen the manuscript.
  ANSWER: I think it is important to acknowledge that whenever a new parameterization is introduced in a model, developers typically select parameter values within an uncertainty range so that the model output matches observations from the modern climate. This kind of ad hoc tuning is common practice, and your criticism applies equally to it. Replacing ad hoc tuning with a more systematic approach is not different in principle; it is simply far more effective. I suggest emphasizing this point more strongly in the discussion section.
  REVIEWER: 3. The author acknowledges that the optimization is evaluated only in offline mode, with prescribed CO2 and meteorological forcing, and notes that a fully coupled setup would alter NBP feedbacks. It would strengthen the paper if this limitation can be emphasized more clearly in the conclusions, with a brief discussion of how coupled feedbacks might influence the results.
  ANSWER: I agree and will elaborate on this in the Discussion section.
  Minor comments
  REVIEWER: L300: Figure 5a
  ANSWER: Yes, thank you. I will change 4a to 5a.
  REVIEWER: L304: Figure 5b
  ANSWER: Yes, thank you. I will change 4b to 5b.
  REVIEWER: Figure 10: caption does not mention (g) and (h)
  ANSWER: Yes, I will add the description of (g) and (h) in the caption.
  REVIEWER: Maybe Figures 2 and 7 can be moved to supplementary materials.
  ANSWER: I will carefully revisit the selection of the figures that will go into the main text.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2517-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (13 Oct 2025) by Anping Chen

AR by Christian Seiler on behalf of the Authors (24 Jan 2026): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (13 Feb 2026) by Anping Chen

RR by Anonymous Referee #1 (26 Feb 2026)

RR by Anonymous Referee #2 (04 Apr 2026)

ED: Publish subject to minor revisions (review by editor) (10 Apr 2026) by Anping Chen

AR by Christian Seiler on behalf of the Authors (13 Apr 2026): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (15 May 2026) by Anping Chen

AR by Christian Seiler on behalf of the Authors (15 May 2026)

Journal article(s) based on this preprint

01 Jun 2026

Improving terrestrial carbon flux simulations with machine learning and global Earth observations

Christian Seiler

Earth Syst. Dynam., 17, 651–671, https://doi.org/10.5194/esd-17-651-2026, 2026

Viewed

Total article views: 6,325 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
5,084	1,048	193	6,325	146	244

Views and downloads (calculated since 16 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	410	85	25	520
Jul 2025	265	24	25	314
Aug 2025	609	70	20	699
Sep 2025	2,487	48	24	2,559
Oct 2025	230	85	15	330
Nov 2025	196	60	10	266
Dec 2025	174	30	0	204
Jan 2026	190	263	40	493
Feb 2026	153	167	18	338
Mar 2026	231	136	7	374
Apr 2026	76	32	4	112
May 2026	37	39	3	79
Jun 2026	20	4	1	25
Jul 2026	6	5	1	12

Viewed (geographical distribution)

Total article views: 6,285 (including HTML, PDF, and XML). Thereof 6,285 with geography defined and 0 with unknown origin.

Latest update: 20 Jul 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

This study demonstrates how machine learning and global Earth observations can enhance simulations of the land carbon cycle. Optimizing key model parameters improves the accuracy of historical carbon fluxes and has a substantial impact on future projections. Results suggest that future carbon uptake may be weaker than previously estimated, underscoring the importance of improved parameter optimization in climate models.
