H2CM (v1.0): hybrid modeling of global water–carbon cycles constrained by atmospheric and land observations
Abstract. We present the Hybrid Hydrological Carbon Cycle Model (H2CM), a global model that couples the terrestrial water and carbon cycles by integrating a process-informed deep learning approach with observational constraints on both cycles. H2CM extends the hybrid hydrological model with vegetation (H2MV) to represent key terrestrial carbon fluxes, including gross primary productivity (GPP) and autotrophic and heterotrophic respiration, at daily resolution and 1-degree spatial resolution. H2CM uses neural networks to learn and predict ecosystem properties governing water and carbon fluxes, such as carbon and water use efficiencies and the basal respiration rate. H2CM uniquely combines multiple observational constraints synergistically: in addition to hydrological and vegetation data constraints on terrestrial water storage variations, snow water equivalent, evapotranspiration, runoff, and the fraction of absorbed photosynthetically active radiation, the carbon cycle is informed by an observation-based GPP product and by net ecosystem exchange (NEE) from satellite- and in situ-based atmospheric CO2 inversion datasets. H2CM reproduces the seasonal and interannual dynamics of carbon fluxes well. It outperforms both purely data-driven models and state-of-the-art process-based model ensembles in capturing NEE seasonality, especially in challenging regions such as the South American tropics and Southern Africa. Moreover, H2CM reveals emergent spatial patterns in precipitation use efficiency, light use efficiency, and water-carbon coupling that are consistent with empirical ecological understanding. Notably, we show that H2CM learns to represent the rain pulse effect on respiration in dry regions, which global models often do not reproduce well. H2CM represents a key step toward a new generation of hybrid land surface models, with planned extensions to include the energy cycle.
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-3123 - No compliance with the policy of the journal', Juan Antonio Añel, 28 Jul 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you have not shared the input data used in your work, both for simulations and comparisons. It is necessary that you share such data to ensure the replicability of your work.
Also, you have not shared the full output of your simulations, but only aggregated monthly data. You must share the full daily data resulting from your simulations.
Therefore, please publish the mentioned data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please remember to include a modified 'Code and Data Availability' section in any potentially reviewed manuscript, containing the information of the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-3123-CEC1
AC1: 'Reply on CEC1', Zavud Baghirov, 30 Jul 2025
Dear Juan A. Añel,
Thank you very much for pointing this out.
Please find below the links to the relevant datasets:
- H2CM – Model inputs and targets (e.g., constraints): https://doi.org/10.5281/zenodo.16575309
- H2CM – Daily simulations (carbon and water cycle parameters): https://doi.org/10.5281/zenodo.16572166
We will ensure that this data is properly referenced in the Code and Data Availability section of the potentially revised version.
Best regards,
Zavud Baghirov
Citation: https://doi.org/10.5194/egusphere-2025-3123-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 31 Jul 2025
Dear authors,
Many thanks for addressing this issue so quickly. We can consider now the current version of your manuscript in compliance with the policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-3123-CEC2
RC1: 'Comment on egusphere-2025-3123', Anonymous Referee #1, 08 Aug 2025
Review of H2CM (v1.0): hybrid modeling of global water-carbon cycles constrained by atmospheric and land observations
The authors present a new global hybrid model (H2CM) that couples terrestrial water and carbon cycles by blending physically based equations with neural network components, constrained by multiple observational data streams. The study is timely and potentially significant, given growing interest in machine learning-augmented Earth system models. The authors clearly describe the model design, data constraints, and evaluation. The integration of a hybrid hydrological model (H2MV) with a conceptual carbon cycle model is new. The results demonstrate strong performance, notably in capturing seasonal carbon flux patterns that some process models miss. I find the work scientifically interesting and largely well executed. However, several clarifications and improvements are recommended:
- Scientific significance and novelty
The authors propose the first global hybrid model explicitly coupling water and carbon cycles with ML-guided parameters. This addresses a recognized gap, and the integration of observational constraints on both hydrology and carbon is novel. The model's ability to reveal patterns (e.g., precipitation-use efficiency, water-use efficiency) demonstrates added value beyond traditional models. The work thus represents a significant advance toward next-generation hybrid land-surface models. I suggest that the authors highlight more explicitly how H2CM differs from and advances prior approaches. Similarly, highlight that hybrid modeling is still "young and evolving" (l.47) and that most previous work was at the proof-of-concept stage, underscoring H2CM's novelty. If there are any other related models (even sub-global studies), a brief comparison would strengthen the novelty claim.
- Methodology and model design
The model architecture is generally well described. H2CM extends H2MV hydrology by adding a carbon cycle (Eqs. 1-4). Transpiration is computed from FAPAR, potential ET, and a parameter alpha_T (Eq. 1). GPP is linked to transpiration via an NN-learned WUE and a CO2-fertilization term beta (Eq. 2). NPP uses an NN-learned CUE (Eq. 3), and heterotrophic respiration (Rh) follows a Q10 function (Eq. 4) with an NN-learned basal respiration rate Rb. The modeling choices are physically plausible, and the coupling (via WUE linking T and GPP) is reasonable. Table 2 clarifies how each neural network is guided by selecting meaningful inputs (e.g., WUE depends on soil moisture, VPD, radiation). This guided-NN strategy improves interpretability.
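For concreteness, my reading of Eqs. 1-4 can be summarized in a few lines of Python (all names, the exact CO2 term, and the reference temperature are my own assumptions, not the authors' code):

```python
def carbon_fluxes(fapar, et_pot, co2, temp,
                  alpha_t, wue, cue, r_b, beta, q10, t_ref=15.0):
    """Sketch of Eqs. 1-4 as I understand them: alpha_t, wue, cue and r_b are
    the NN-predicted quantities; beta and q10 are learnable global scalars."""
    transpiration = alpha_t * fapar * et_pot       # Eq. 1
    gpp = wue * transpiration * beta * co2         # Eq. 2 (exact CO2/beta form unclear to me)
    npp = cue * gpp                                # Eq. 3
    rh = r_b * q10 ** ((temp - t_ref) / 10.0)      # Eq. 4
    # NEE presumably follows as rh - npp (plus fire emissions), though this is
    # not spelled out in Eqs. 1-4.
    return transpiration, gpp, npp, rh
```

If this reading is incorrect, that in itself suggests the equations and variable definitions need more explanation in the text.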
- Is there one NN for each output variable (ll. 60)? Why was it better to use several models? Did you perform experiments using one model for several outputs with various inputs? Please be more detailed here and explain why your approach was best.
- You call the target variables of the ML tasks "model constraints", but strictly speaking there are no constraints in the model: the "constraints" are target variables predicted by an ML algorithm and are only controlled by the performance of the ML model. What about out-of-sample inputs? They are not constrained and depend on the generality of your model.
- The Greek variables (e.g., alpha_T) were trained by NNs, but it is not clear how you trained these parameters. Which target variable was used? These parameters seem to be hidden variables in the NNs, not target variables. Please be more precise about your ML architecture and provide a detailed ML model description. How is alpha_T integrated in your NN?
- Please also clarify the CO2 dependency using beta in Eq. (2) so readers can understand how fertilization enters the model.
- Also, how was the WUE learned in the model? At which spatial and temporal resolution are these parameters learned? I feel I do not have enough information to fully understand your underlying ML architecture.
- 145ff.: Are you using time series or only single time steps as input for your LSTM? I assume you were using time series, as the latter would not make sense, but this is not clearly described and the description invites misunderstanding.
- I understand that you used a simplified overview of your model architecture, but more detailed information about the network architecture of your NN components is still missing. It would be helpful to have another figure especially for these components, as your model relies on the hybrid approach. How many layers and neurons, how many training epochs, what learning rate, and what dropout or weight decay were used? How are the different NNs connected?
- You also mention an FCNN for data compression: what kind of architecture was used here? Was it an unsupervised approach?
- I do not see any hyperparameter tuning in the manuscript. How were model hyperparameters chosen and/or validated?
- In Tab. 3, WUE and CUE are defined as ratios, but in Tab. 2 these variables are defined as functions depending on multiple variables, trained by an NN. Please be more precise here on how the definitions are meant in your approach.
- The results are well presented. I am missing a short paragraph on the evaluation of the several trained NNs, for example on the performance of the WUE, CUE, etc. predictions alone. To increase confidence in the performance of H2CM, a brief description of the performance of these sub-variables would be helpful.
- The NNs are trained by an MSE loss (Eq. 5) averaged equally over all data constraints. This implies that all constraints (TWS, SWE, ET, runoff, FAPAR, GPP, NEE, etc.) are treated with the same priority, regardless of their units or uncertainties. The authors should comment on this: might some constraints dominate the loss? Have the authors normalized each variable or adjusted for data uncertainty (see the sketch after this list)? Some acknowledgment of observational errors (and how they might affect the loss weighting) is appropriate.
- The 10-fold CV is spatial only. Thus, it is not clear how well the model would predict an unseen year (e.g., a future year). I encourage the authors to comment on this limitation. If possible, as a future step, holding out later years for testing could provide insight into model stability under changing climate.
- Have you considered using, e.g., physics-informed neural networks (PINNs) instead of simple FCNNs to better control and constrain the underlying physical processes?
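To illustrate the kind of normalization and uncertainty weighting I have in mind for the multi-constraint loss, here is a minimal sketch (purely illustrative, not the authors' implementation; per-constraint uncertainties are assumed to be available or approximated by the data spread):

```python
import torch

def weighted_multi_constraint_loss(preds, obs, obs_sigma=None):
    """Per-constraint MSE in standardized units, down-weighted by an assumed
    observational uncertainty so no constraint dominates purely by magnitude."""
    obs_sigma = obs_sigma or {}
    terms = []
    for name, y_hat in preds.items():
        y = obs[name]
        sigma = obs_sigma.get(name, y.std() + 1e-8)  # fall back to the data spread
        terms.append(torch.mean(((y_hat - y) / sigma) ** 2))
    return torch.stack(terms).mean()
```

Whether equal averaging of raw MSEs or something along these lines was used should be stated explicitly.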
Overall, the methodology is sound and described in good detail. Small clarifications and additional details (especially on the neural-network implementation) would improve reader understanding and reproducibility.
The authors treat global parameters, e.g., beta, as learnable. Section 3.1.5 shows the learned Q10 is about 1.24, which is lower than typical literature values (1.4-2). Similarly, the learned beta values greatly exceed observational estimates. The authors rightly note this discrepancy and attribute it to equifinality and insufficient constraints. Please briefly discuss the implications: e.g., a high beta means the model might overestimate CO2 sensitivity if used for future scenarios. Emphasize that these global parameters are effectively unconstrained by data and could be fixed based on independent knowledge.
- Model evaluation and benchmarking
- Correlation and RMSE are mentioned, but it would help to provide bias or error values in the text or supplementary tables. E.g., “small RMSE for NEE IAV” (l.236), but exact numbers or global bias would be useful. A table summarizing global or zonal RMSE and bias for GPP, NEE, etc., in comparison to benchmarks would complement the discussion.
- Reproducibility and transparency
It would be helpful to have additional documentation (a README, installation instructions) and example scripts/notebooks to run the model. The full daily outputs are now shared.
- Interpretation and discussion of results
The authors could strengthen the interpretation by commenting on potential future applications. E.g., since the model currently lacks an energy cycle (mentioned as future work), are there plans to incorporate dynamic vegetation or disturbances (aside from fire emissions)?
Minor stuff
- 104: Please write Transpiration T to introduce the variable.
- Figure 4: Too small, and the solid black background is confusing. I suggest making clean figures on a white background. The title is also too small and does not match the explanations given in the caption. Please double-check that the presented data fit the titles shown in the figure.
- In Tab. 1 the meteorological forcing data are described. Please briefly explain why you decided on this mixture of data sources.
- 100: You use Greek letters for globally constant parameters. Does "globally constant" also mean spatially and temporally constant?
- As the various datasets span different periods, the manuscript should explicitly state the time period used for training/evaluation. Ensure it is clear how these are aligned.
- The authors may note that dynamic vegetation changes are not included due to static land use input, though FAPAR input does implicitly capture some phenological variability.
- The median and range of Q10 across folds are mentioned. It may be useful to similarly report the spread of prediction metrics across the 10 CV models. This would indicate robustness.
- The conclusion asserts that H2CM "accurately reproduces the monthly patterns" and "global patterns" of GPP and NEE. While this is supported by the results, it may sound slightly overconfident given some known biases. Perhaps soften to "reproduces major features of the seasonal and spatial patterns…"
- Overall, the writing is professional and detailed, with only minor edits needed for polish.
Recommendation
I recommend major revisions before acceptance, based on the recommendations above. The suggested revisions will strengthen the paper's clarity and reproducibility but do not undermine the core findings.
Citation: https://doi.org/10.5194/egusphere-2025-3123-RC1
RC2: 'Comment on egusphere-2025-3123', Anonymous Referee #2, 11 Aug 2025
The manuscript "H2CM (v1.0): hybrid modelling of global water-carbon cycles..." by Baghirov et al. addresses a relevant and timely topic: the hybrid modelling of the land surface and terrestrial biosphere. It reports on the architecture, training and evaluation of a hybrid prototype. In principle, I consider the paper suitable for the journal.
However, I also have a substantial number of general and specific questions that the current version leaves open. In my opinion, the manuscript would be much clearer and more useful if these were addressed in the general framing and writing.
General points
I find it hard to understand to what extent this model can actually be considered "hybrid". I hardly see any process-based components in the model description. There are equations 1-4, but they are highly simplistic and high-level multiplicative relationships, far simpler than the complexity of the machine learning components, or typical components of process-based land surface models.
Moreover, the model supposedly captures the "water cycle" and "carbon cycle". Besides the fact that cycles would include atmosphere and ocean (otherwise the cycle is not closed), the model does not seem to simulate any carbon pools - only fluxes. If this model is supposed to be a step toward hybrid land surface modelling (that’s how I understand the framing and motivation), what should be the approach to model differential equations where state variables have memory? How would one implement a similar model into an Earth system model, and what conclusions do the authors draw from their results to this end? What is it in the results that allows conclusions about the best approaches to such hybrid modelling?
It is also unclear to me how soil moisture is modelled. There is reference to another recent study on what is called H2MV (Baghirov et al., 2025). I had a look there, but it seems to follow a similar approach in the sense that the model’s mechanistic complexity and structure is rather simple, while model results seem to be mainly determined by the machine learning components.
Achieving a good match with observations with such a model is of course beneficial, but I wonder how well the model is able to extrapolate to different climates. For example, will it generate realistic trends when forced with data from the historical period over several decades, including the global warming trend? If not, why do we need a hybrid approach at all? To what extent do the process-based parts in the model contribute to the good performance? What makes H2CM better than Fluxcom-X-base in some cases – is it really the process-based part or is it a better machine learning approach or data? And whatever the answer is: Can the authors show this somehow? They say that a hybrid model is not a "black box" like ML models, so this may be possible? If the performance overall is largely determined by the data-driven parts (including the way different neural networks are combined), I wonder whether the framing of "hybrid modelling" is really helpful, in contrast to pure data-driven modelling with a specific architecture.
Regarding the general architecture of the model, Fig. 2 is helpful, but it is difficult for me to understand how the model is actually trained. The neural networks seem to generate inputs to what the authors call the "process-based water-carbon cycle model", which then generates observable variables. When the loss function is minimised during training, in what way is the process-based component used? Does it not need to backpropagate information somehow in order to feed back to the neural networks and let them learn? Also, how do the authors use information on observational errors, specifically where different datasets on the same quantity (the two atmospheric CO2 inversions) are used at the same time?
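My working assumption, which the authors should confirm or correct, is that the process equations are written as differentiable tensor operations so that gradients flow through them back into the neural networks, roughly as in the following toy sketch (all names, shapes and hyperparameters are invented by me and only illustrate the training pattern, not H2CM itself):

```python
import torch
import torch.nn as nn

class CoeffNet(nn.Module):
    """Stand-in for the NNs predicting ecosystem properties (alpha_T, WUE, CUE, Rb)."""
    def __init__(self, n_in, n_out=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 32), nn.Tanh(),
                                 nn.Linear(32, n_out), nn.Softplus())  # positive outputs
    def forward(self, x):
        return self.net(x)

def process_model(forcing, coeffs, beta, q10, t_ref=15.0):
    """Differentiable stand-in for Eqs. 1-4 (tensor ops only, so autograd passes through)."""
    fapar, et_pot, co2, temp = forcing.unbind(dim=-1)
    alpha_t, wue, cue, r_b = coeffs.unbind(dim=-1)
    transp = alpha_t * fapar * et_pot
    gpp = wue * transp * beta * co2
    rh = r_b * q10 ** ((temp - t_ref) / 10.0)
    return {"T": transp, "GPP": gpp, "NEE": rh - cue * gpp}

net = CoeffNet(n_in=4)
beta = torch.tensor(1.0, requires_grad=True)     # learnable global scalars
q10 = torch.tensor(1.5, requires_grad=True)
opt = torch.optim.Adam(list(net.parameters()) + [beta, q10], lr=1e-3)

forcing = torch.rand(64, 4)                               # toy forcing batch
obs = {"GPP": torch.rand(64), "NEE": torch.rand(64)}      # toy observational constraints
for _ in range(10):
    sim = process_model(forcing, net(forcing), beta, q10)  # NN step, then "physics" step
    loss = sum(torch.mean((sim[k] - obs[k]) ** 2) for k in obs)
    opt.zero_grad()
    loss.backward()   # gradients pass through the process equations into the NNs
    opt.step()
```

If this is indeed how H2CM is trained, please say so explicitly; if the process component enters differently (e.g. not differentiably), the training procedure needs a much clearer description.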
All observational datasets seem to always be used at the same time to train the model? Some parameters seem to be overconstrained. Which training data is actually important? How are physical constraints regarded, e.g. the conservation of mass? And why do the authors only train on a subset of grid cells but not time points?
Lastly, Section 3 in general shows several metrics, variables and regions, and evaluates H2CM. The choices of what to show here felt somewhat arbitrary to me, for example Sect. 3.3 and also Fig. 5. Why pick these examples? What is the key message that these results support? It would help if the authors presented clear arguments and criteria, and connected the results in an argumentative way.
More detailed points
- The authors say that H2CM is a "global" model. What does this mean? As far as I see, it is a local (grid cell specific) model without any spatial interactions, hence the domain and grid are arbitrary.
- Use of vocabulary: Note that the term grid refers to the spatial structuring of all grid cells. A grid cell refers to one spatial point. The authors often use "grid" even where they actually mean grid cell.
- There are some typos; I suggest the authors read carefully before the next submission. Examples: lines 65-66: "the the", "objectives" (omit the s), "withhold" (withheld); line 152: "compress"(es); line 263: "in in"; Fig. B8 caption: "Runoff" should be lower case.
- Table 1: shortwave and longwave radiation seem to not be distinguished. But in practice, this will matter much for GPP and other fluxes. What is the underlying assumption here? Also, what is "short-term" versus "long-term" in the last two lines of the table? It could make sense to add a column showing the time period available for each dataset.
- line 105 (Eq. 1): How is ETpot computed?
- line 114-117, incl. Eq. 2: beta is supposed to capture the CO2 fertilisation effect, but it is just a constant, independent of CO2. The fertilisation effect is captured already by the linear dependence of GPP on CO2. What does this linear dependence imply when using the model for a transient situation with strongly increasing CO2? When considering all factors of Eq. 2, does the model generate a similar relationship as e.g. typical DGVMs?
- line 140: make clearer what you mean with labels "dynamic (recurrent)" and "static (fully connected)". Even though it may not be possible to draw the true architecture in Fig. 2, it would help to show different (idealised) icons for the NNs where these NNs have different architecture. If the figure becomes too busy: I don’t think one actually needs to show global maps for all variables (which are too small to see results anyway). This figure is about the structure not the actual data values.
- line 184: I did not understand what the authors mean with "blocks". Are blocks the samples of 5x5 connected grid cells that are selected for training?
- line 187-189: It is not really clear to me why validation on left-out time periods should not be possible.
- line 191 and elsewhere: The authors cite Baghirov et al., 2025, but four references of that form are listed in the reference list.
- line 196-197: Parameters theta and beta are adjusted – but how (see above)? How does training work involving the "process-based" model (whatever that is, also see above)?
- line 199-201: What does it mean that the loss function is applied for each data constraint?! Isn’t there one loss function where all different variables contribute? Or several loss terms? Then how to decide how important each loss is? Additionally, I don’t understand why the Carboscope dataset is treated differently from all others.
- line 204: perhaps briefly mention what a z-transformation is.
- line 207: What is a "CV fold"?
- line 208-209: If all input is z-transformed, that means that all means are zero and standard deviation is 1? How then can the model be calibrated to respond to the correct mean values? For instance, how would the model respond to input temperature data that is 2°C higher than observed? This question also relates to the generalisability question above, and the question how the model responds to climate trends.
- line 223 (Eq. 7): This seems to be monthly anomalies. I would then not call that "interannual variability"! And: if IAV is actually monthly variability, what then are the "monthly" values shown in Fig. 3? What is the difference? Is "monthly" the absolute data including seasonality, and is "IAV" the monthly anomalies (the decomposition I have in mind is sketched after this list)?
- Fig. 3: (i) Why does the monthly data have much larger error than the monthly anomalies (IAV), whereas the other metrics look very good? (ii) Please make vertical axis ranges identical where possible. (iii) There is a lot of empty space in the figure, e.g. between bars. (iv) I don’t understand the difference between the columns. The training data is always the same, and the authors evaluate different variables? Why then two columns for NEE? Does the training data differ? (v) What determines the range covered by the boxes? Maximum and minimum error from what distribution?
- Fig. 4: (i) The grey colour makes it too hard to see the text. (ii) Titles per panel or column would help. (iii) Absolute GPP values are hard to compare between columns; perhaps add difference plots. (iv) What is meant by "members" in each case? Members from the 10 subsamples of grid cells when training H2CM? And in the case of TRENDY, are the members the individual models? Does the map then show the median from all models at each grid cell, i.e. each grid cell comes from a different vegetation model?
- Fig. 5: (i) Too grey (see above). (ii) "Emerging global patterns" in what data? The trained model, I guess? (iii) What are "folds"? Is this figure meant to show how realistic the H2CM output is? Then we would need to see observations as a comparison. Or is this result meant to offer new insights into land-atmosphere physics? Then this should be a clearer part of the framing in the abstract, introduction and conclusions.
- Sect 3.2: What is it that makes H2CM better than Fluxcom? Can the authors demonstrate this? What are the implications for hybrid land modelling in general?
- line 407: "the information is available" – which information?
- line 408: "the model’s process formulations permit it" – which process formulations? Is there evidence that they restrict the results in some way?
- line 411-414: How is the spread of results evidence for equifinality? Doesn’t equifinality mean the opposite, i.e. that different parameters (different models) lead to the same result?
- Fig. A1: (i) What are k1, k2, ...? Why "k"? Are these the samples used for training different versions of the model? Is every block here 5x5 grid cells? (ii) The testing set seems to be 1/11th of the data, i.e. not 10%. (iii) And what about the evaluation set mentioned in the text, which should be another ~10%? (iv) What is a "fold"?
- Figs. B1-B8: (i) Is the MSC the time mean over the entire period? (ii) Do the time series show a spatial average? Over what region? (iii) Why not the same period in all figures? Due to limited training data? (iv) Which of these time series are actually from the identical model and should be physically consistent? Maybe these can be put into one figure with several panels. (v) What is TWS in Fig. B5? Total water storage? Why does it differ so much from GRACE? Because of the low resolution of GRACE? (vi) In Fig. B6, is SWE snow water equivalent?
- Fig. B9: Again, remove white space to condense the figure. "Water cycle constraints" sounds like different constraints are used here compared to the other applications? Which data were used here? This should become clearer.
- Appendix C: Why is there a "prior" and a "posterior" parameter which sounds like Bayesian statistics language? The method described in the appendix rather seems to nudge a parameter toward a specific target value, instead of calibrating it after starting from an initial value.
- Some more info on the parameter calibration method would help.
- line 466-468: The fact that the posterior equals the prior could also imply that the nudging (loss term) is just too strong? Why is it evidence for an underdetermined problem? Would it make sense to put a factor 0<f<1 in the definition of the loss term? Then the final parameters could be different?
- Fig. D1: What are TransCom regions?
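For reference, the decomposition I have in mind when distinguishing the "monthly" values, the mean seasonal cycle (MSC) and "IAV" is the standard one sketched below (made-up names; whether this matches Eq. 7 and Figs. B1-B8 is exactly what should be clarified):

```python
import pandas as pd

def decompose(monthly: pd.Series):
    """monthly: monthly-mean time series with a DatetimeIndex.
    Returns the mean seasonal cycle (12 climatological values) and the
    monthly anomalies, i.e. what I would call interannual variability."""
    by_month = monthly.groupby(monthly.index.month)
    msc = by_month.mean()                              # mean seasonal cycle
    anomalies = monthly - by_month.transform("mean")   # deviations from the MSC
    return msc, anomalies
```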
Citation: https://doi.org/10.5194/egusphere-2025-3123-RC2
Data sets
- H2CM - model simulations, Zavud Baghirov, https://doi.org/10.5281/zenodo.15785260
Model code and software
- H2CM - model code, Zavud Baghirov, https://doi.org/10.5281/zenodo.15784689