Hybrid Lake Model (HyLake) v1.0: unifying deep learning and physical principles for simulating lake-atmosphere interactions
Abstract. Lake surface temperature (LST) is a crucial indicator of climate change in Earth systems. However, improving predictions of LST and heat fluxes remains challenging because of the simplified physical principles inherent in traditional process-based models and the "black-box" structure of purely data-driven models. Accurate modeling of lake-atmosphere interactions, which is essential for predicting LST and the associated changes in latent heat (LE) and sensible heat (HE) fluxes, has yet to fully benefit from the integration of process-based and deep learning-based models. This study proposes the Hybrid Lake Model v1.0 (HyLake v1.0), which integrates a Bayesian-optimized bidirectional Long Short-Term Memory (BO-BLSTM) surrogate, trained at the Meiliangwan (MLW) site in Lake Taihu, with surface energy balance equations to approximate LST changes. The performance of HyLake v1.0 was intercompared with FLake and with hybrid lake models built on different surrogates. Results demonstrate that HyLake v1.0 outperformed the others, with an R of 0.99 and an RMSE of 1.08 °C for LST, an R of 0.94 and an RMSE of 24.65 W/m2 for LE, and an R of 0.93 and an RMSE of 7.15 W/m2 for HE. When assessed for generalization and transferability at ungauged lake sites, HyLake v1.0 outperformed FLake across all sites, with MAEs of 0.85 °C, 21.56 W/m2 and 6.63 W/m2 for LST, LE and HE, respectively. When driven by ERA5 reanalysis data, HyLake v1.0 performed better for 14 of 15 variables (i.e., LST, LE and HE at each of five lake sites), with MAEs of 0.90 °C, 35.02 W/m2 and 7.97 W/m2, respectively, indicating strong generalization and transferability. These results show that HyLake v1.0 can estimate lake-atmosphere interactions at sites not used for training and can reasonably be extended to other ungauged lakes, laying a solid basis for future improvements in predicting lake-atmosphere interactions.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-1983', Anonymous Referee #1, 31 May 2025
The manuscript presents HyLake v1.0, a hard-coupled hybrid lake model in which an LSTM surrogate replaces the implicit-Euler surface-temperature solver embedded within an in-house one-dimensional physical backbone. The surrogate is trained at the MLW site on Lake Taihu and then applied to five other sites that differ in both biological characteristics and meteorological forcing. Although the hybrid framework outperforms several process-based and deep-learning-based benchmarks, its validation strategy and treatment of uncertainty require further refinement. Overall, the paper is clearly written and could be suitable for publication after moderate revision.
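For orientation, the hard coupling described above can be sketched roughly as follows; this is a minimal sketch under assumed names and inputs (the function, the feature vector passed to the surrogate, and the toy numbers are illustrative, not the authors' implementation).

```python
import numpy as np

def hybrid_surface_step(T_s, fluxes, met, surrogate_fn):
    """One hypothetical coupling step: the physical backbone supplies the
    surface energy-balance terms, while the surrogate replaces the
    implicit-Euler solve for the surface-temperature update."""
    Rn, LE, HE = fluxes                      # net radiation, latent, sensible heat (W/m2)
    G = Rn - LE - HE                         # residual heat entering the water column
    x = np.array([Rn, LE, HE, G, T_s, met["wind"], met["air_temp"]], dtype=float)
    dT = float(surrogate_fn(x))              # surrogate-predicted LST increment (deg C)
    return T_s + dT, G

# Usage with a stand-in surrogate; a trained BO-BLSTM would be wrapped the same way.
toy_surrogate = lambda x: 0.005 * x[3]       # toy rule: increment proportional to stored heat
T_s, G = hybrid_surface_step(28.5, fluxes=(450.0, 120.0, 35.0),
                             met={"wind": 3.2, "air_temp": 27.0},
                             surrogate_fn=toy_surrogate)
print(f"updated LST = {T_s:.2f} deg C, heat into column = {G:.1f} W/m2")
```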
Major comments
- To address the model’s generality, the authors should apply HyLake to at least one morphologically distinct lake or extend the simulation period to include additional years.
- The study employs Bayesian optimization to tune network depth, width, optimizer, and learning rate, but omits critical details. Please provide the search ranges of the hyperparameters, the objective function, the stopping criterion, and the computational cost; a minimal sketch of such a search is given after this list.
- HyLake performs well at MLW, PTS, and XLS, whereas TaihuScene outperforms it at BFG and DPK. Discuss possible causes and advise when multi-site versus single-site training is preferable.
- The discussion regarding computational efficiency of HyLake is inadequate. Provide a detailed table comparing training time and wall-clock simulation speed for HyLake, FLake, and any other relevant models.
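As an illustration of the kind of information requested in the hyperparameter comment above, a minimal Bayesian-optimization sketch is given below, using Optuna as one possible tool; the search ranges, trial budget, and the toy objective are assumptions for illustration and are not taken from the manuscript.

```python
import optuna

def objective(trial):
    # Hypothetical search space for the BO-BLSTM surrogate (illustrative ranges only)
    n_layers      = trial.suggest_int("n_layers", 1, 3)
    n_units       = trial.suggest_int("n_units", 16, 256, log=True)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    optimizer     = trial.suggest_categorical("optimizer", ["adam", "rmsprop", "sgd"])

    # In the real workflow the BLSTM would be built with these settings, trained on the
    # MLW training split, and scored on the validation split. A toy score stands in here
    # so that the sketch runs on its own.
    val_rmse = (n_layers - 2) ** 2 + abs(n_units - 64) / 64 + 100 * learning_rate
    return val_rmse

study = optuna.create_study(direction="minimize")      # objective: minimize validation RMSE
study.optimize(objective, n_trials=50, timeout=3600)   # stopping criteria: trial budget / wall clock
print(study.best_params, study.best_value)
```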
Minor comments
- Line 164: Define the acronym LWT on first use.
- Figure 3: The text is crowded in panels d-f. Consider summarizing the model accuracy in a table.
- Figures 4-6: Add the relevant validation statistics directly to the LST, LE, and HE plots for clarity.
- Figure 9: Panels a-c and d-i represent distinct data types and should not share a single figure label.
- Line 847: Make the citation format for code repositories consistent with the rest of the reference list.
Citation: https://doi.org/10.5194/egusphere-2025-1983-RC1
- AC1: 'Reply on RC1', Yuan He, 05 Aug 2025
- RC2: 'Comment on egusphere-2025-1983', Anonymous Referee #2, 20 Jun 2025
General comment:
The manuscript entitled “Hybrid Lake Model (HyLake) v1.0: unifying deep learning and physical principles for simulating lake-atmosphere interactions” by Yuan He and Xiaofan Yang (egusphere-2025-1983) presents the HyLake v1.0 hybrid model, which performed better than the other models considered. The manuscript is generally well written and falls within the scope of GMD. Please clarify the following points before possible publication.
Major comments:
- Line 117-119 (and Table 1): This may reflect trial and error by the authors that was not presented explicitly in the manuscript, but why was only the MLW site used for training, with the other sites used for validation? I may have missed the information, but why was cross-validation not attempted in the process? I am concerned about the robustness of a model developed from training data at a single site.
Specific comments:
- Line 40-42: A reference for each process-based model would be helpful.
- Line 116-117 and 120-125: How the gaps were filled with the ERA5 reanalysis dataset is ambiguous. For example, what was the rate of missing data from 2012 to 2015? Please rewrite this explanation.
- Line 120-125: In addition to the above comment, when the ERA5 reanalysis fills the data gaps at the MLW site, does this amount to self-validation? Please clarify.
- Line 345: The legend “HyLake-baseline” may be confusing. I recommend labeling it “Baseline” instead.
Technical comments:
- Line 29: “surface water temperature” does not match the abbreviation “LST”. Should this be “lake surface temperature”? Please confirm.
- Line 110: No need to repeat these abbreviations.
Citation: https://doi.org/10.5194/egusphere-2025-1983-RC2
- AC2: 'Reply on RC2', Yuan He, 05 Aug 2025
- RC3: 'Comment on egusphere-2025-1983', Anonymous Referee #3, 21 Jun 2025
General Comments
This manuscript presents HyLake v1.0, a hybrid lake-atmosphere model that embeds a Bayesian-optimized bidirectional LSTM surrogate within a process-based 1-D vertical transport framework to simulate lake surface temperature and surface fluxes. The work addresses a key challenge in environmental modeling: integrating data-driven surrogates with physical principles. The extensive validation on Lake Taihu (2012–2015) against FLake demonstrates clear performance gains, and the hybrid approach represents a meaningful methodological advance for lake modeling. While the methodology is sound and the Lake Taihu validation is comprehensive, the authors should more clearly discuss the requirements and limitations for applying this approach to other lake systems. The current multi-site validation within Lake Taihu provides good evidence of transferability, but broader applicability claims should be more cautiously framed.
Specific Comments
- The multi-site validation within Lake Taihu is convincing, but please add a discussion of what adaptations would be needed for different lake types (e.g., deeper lakes, different climate zones, varying trophic states). Consider outlining a framework for applying the methodology to new lake systems.
- Better justify the choice of BO-BLSTM over simpler alternatives, and provide a clearer explanation of why Bayesian optimization and a bidirectional LSTM architecture were chosen over deterministic alternatives.
- Discuss how the surrogate maintains physical consistency and whether the energy balance is preserved through the hybrid coupling (a simple closure check is sketched after this list). Consider briefly addressing this in the discussion section.
- While full uncertainty quantification may be beyond the current scope, briefly discuss the uncertainty implications of the Bayesian surrogate and how this could be leveraged in future applications.
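As a concrete version of the energy-balance check suggested above, a minimal closure diagnostic could look like the following; the sign convention (Rn = G + LE + HE) and the synthetic series are illustrative assumptions and may differ from the manuscript's exact formulation.

```python
import numpy as np

def energy_closure_residual(Rn, G, LE, HE):
    """Residual of the surface energy balance, Rn - G - LE - HE, in W/m2.
    A hard-coupled surrogate that only replaces the LST solve can still be
    checked this way at every output step."""
    return np.asarray(Rn) - np.asarray(G) - np.asarray(LE) - np.asarray(HE)

# Toy half-hourly series standing in for model output
rng = np.random.default_rng(0)
Rn = rng.normal(400.0, 50.0, 48)
LE, HE = 0.55 * Rn, 0.15 * Rn
G = Rn - LE - HE + rng.normal(0.0, 2.0, 48)    # small imbalance injected for illustration

res = energy_closure_residual(Rn, G, LE, HE)
print(f"mean residual = {res.mean():.2f} W/m2, max |residual| = {np.abs(res).max():.2f} W/m2")
```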
Minor Comments
- Terminology: Define LE (latent heat) and HE (sensible heat) at first mention.
- References: Standardize citation formats (e.g., “Hersbach et al. (2020)” vs. “Hersbach et al., 2020”).
- Section Organization: Consider moving deep implementation details (e.g., GUI remarks) into a Supplement or Code & Data Availability section.
- Caption Detail: Enhance figure captions to specify whether plotted values are observed or simulated and note dataset origins (real vs. semi-synthetic).
Citation: https://doi.org/10.5194/egusphere-2025-1983-RC3
- AC3: 'Reply on RC3', Yuan He, 05 Aug 2025
We sincerely thank Reviewer #3 for the constructive comments. In the revision, we have discussed in particular the requirements and limitations of HyLake v1.0 and presented an example with another morphologically distinct lake to demonstrate its transferability. The response is attached in the Supplement.
- RC4: 'Comment on egusphere-2025-1983', Anonymous Referee #4, 26 Jun 2025
He and Yang present a new model, the Hybrid Lake Model v1.0, with which they aim to approximate LST changes, a crucial indicator of climate change in the Earth system. Their model combines process-based and deep learning methods, and their results show that HyLake outperforms other models. The study is interesting and may be published, but the manuscript needs major revisions before it can be considered for publication. It is essential that the content of the study and the presented results become clearer and that the study be understandable for a broader readership; without that, it is quite difficult to assess the quality of the presented results.
General comments:
- There are many awkward or unclear sentences in the text that do not make sense or are misleading. I provide several examples in the specific comments.
- The abstract should be significantly improved and should clearly state what has been done in this study and what the major results are.
- The entire manuscript needs clearer writing and thus needs to be rewritten. On the one hand there are many repetitions; on the other hand, terminology is used inconsistently (e.g., evaluation vs. validation; model, model results, and model experiments; surrogates), so that it does not become clear to the reader what has been used, what exactly has been done, and which models/data sets are compared.
Specific comments:
P1, L13: What exactly do you mean with “has yet to fully benefit from the integration of process-based and deep learning based models.”? Do you mean these processes need to be still integrated in these models? Please rephrase the sentence to be more clear.
P1, L16: What is “FLake”? Is this a ML model or a process-based model?
P1, L17-19: Compared to what does HyLake outperform other models? What has been used as reference?
P1, L21: What do you mean with “Under ERA5 reanalysis datasets”? This does not make any sense and needs to be rephrased.
P1, L21-23: What is meant with “generalization and transferability”? Concerning what is HyLake indicating a strong generalization and transferability?
P1, L26-27: The last sentence is in my opinion a repetition of what has been said before and is thus obsolete.
P3, L76-77: The abbreviations HE and LE should be introduced here once again.
P3, L82-82: “where differ significantly in its biological characteristics” is not clear and the sentence should be rephrased.
P3, L88-89: What is the difference here between “validate” and “evaluate”. To which data sets has HyLake been evaluated or validated?
P3, L90: What do you mean with “under ERA5 reanalysis datasets”? This does not make any sense. Please rephrase the sentence.
P3, L90-91: Why? Is this a result from your validation/evaluation?
P3, L95: Rapid increase of what? The water temperature? Please be clearer. I also wonder whether this increase is based on observations and whether it is induced by climate change or by other factors.
P3, L96-97: Sentence grammatically not correct, please improve.
P4, L110: The introduction of the abbreviations LE and HE should be done already in L76 (see my comment above).
P4, L117: “using ERA5 reanalysis data sets”. Which parameters are used from ERA5? HE and LE? How accurate is the data?
P4, L124-125: If ERA5 is used to fill data gaps, it should not be used for evaluation.
P5, L134: What is meant with “variants”? Do you mean variables? What exactly are you doing here? Are you using different set-ups, thus performing sensitivity simulations?
P6, L143-148: Hasn't what is written here already been said, in slightly different wording, in the previous paragraph? Please avoid repetitions.
P8, L197-198: This sentence does not make sense. “LST” is a parameter while the “Euler scheme” is a method.
P8, L199: What is meant with “increments in LST”? Do you mean components that affect LST?
P8, L200: What is G(0)?
P8, L204-205: The sentence is not clear and needs to be rephrased. What do you mean with different models? To my understanding you are not using different models, these are rather different model runs.
P10, L225: For readers not familiar with LSTMs, it should be explained what the “forget gate” is.
P11, L250: What is an “Adam optimizer”?
P11, L259-261: 10% and 10% — is that correct? I think it would be easier for the reader if the information listed here were put into a table.
P11, L266-267: Rephrase/Improve sentence “The briefly introduction……”
P11, L275: Rather “used” than “proposed”. To what is the improvement of HyLake being compared?
P12, L278: For me it is not clear what the difference to HyLake v1.0 is. Please clarify and improve the text.
P12, L282: What exactly has been intercompared? For me it is still not clear for what ERA5 has been used.
P12, L283: “almost validated” does not make any sense. Either you have validated your experiments or not. In addition to this unclear phrasing, this sentence is a repetition of what has been said at the beginning of the paragraph.
P12, L284: The specification of what can be found in Table 2?
P12,Table 2: It is still not clear if these are different “models” or “model experiments” since throughout the manuscript different wording is used and what exactly has been done has not properly been explained.
P12, L286-287: “validation” or “evaluation”? Only one of the terms should be used. Since you assess the quality of models (or model experiments) “evaluate” would be the correct term.
P12, L290: It should rather read “assess whether the models over- or underestimate the observations”.
P12, L292-294: If you calculate the difference between model and observations it should also read in the equations for RMSE and ME x_i-y_i. Please check and correct.
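For reference, the corrected definitions would then read as follows, assuming x_i denotes the simulated values, y_i the corresponding observations, and n the number of samples:

```latex
\mathrm{ME}   = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right), \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^{2}}
```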
P13, L305-307: Improve sentence and make clear if these are different models or model experiments.
P13, L313: Here it is still not clear to what the comparisons are done. Are you comparing these to observations or to other models/model experiments?
P13, L324: What is meant with “feasible” and “reasonable and robust way”? This terminology does not make any sense here and the text should be rewritten.
P13, L328-329: Which are the surrogates? I still cannot follow. Here it sounds as if FLake and HyLake have been integrated into Baseline and HyLake v1.0. However, aren't Baseline and HyLake v1.0 model experiments?
P14, L330-341: It still did not become clear to me to which data set the model has been compared.
P14, Figure 4: I would suggest making two figures instead of one.
P15, L349: Which experiments? Model experiments? Are FLake, Baseline and HyLake v1.0 model experiments or models?
P15, L351: Are you referring here to one period or several periods? Which period exactly is considered?
P15, L356: What is meant with daily scale? On a daily basis or the daily cycle? What exactly is shown in Figure 4 needs to be better explained.
P15, L374: What is meant with “peak and valley values”? Please rephrase.
P15, L376-377: Also this sentence needs to be revised.
P18, L403: Developed? Isn't that a model experiment?
P18, L410ff: It would be much more concise if these ME and RMSE values would be listed in a table.
P19, L442: Transferability and generalization are mentioned too often without explaining what is actually meant by them.
P19, Figure 8 caption: Which overall datasets? What is meant with each variable? Are these HE, LE and LST? If yes, please write this clearly.
P19, Figure 8 caption: There seem to be some repetitions. Please check and improve the figure caption.
P19, L455: This is mentioned too often. Please avoid too many repetitions.
P19, L459: using ERA5 for what? As forcing data set? If yes, what has then been used in the other results presented?
P20, L479: Repetition -> sentence obsolete.
P20, L486: Same as for L479.
P21, L502: Same here.
P22, L528: Which are the different forcing data sets? Only ERA5 is mentioned.
P24, L578: Not clear; before, it was always stated that HyLake v1.0 outperformed the other models, whereas here the opposite is stated.
P25, L624: Are these forcing data sets observations or model simulations?
P26, L640: Which 15 variables have been used? These have nowhere been mentioned.
Technical corrections:
P1, L13: proposed -> proposes
P4, L104: with -> has
P5, Figure 1 caption: valid -> validate
P5, L133: couple a LSTM-based -> coupled to a LSTM-based
P5, L133: which is shown -> as schematically shown
P8, L184 and L208: Equation -> Eq.
P10, L233: same here as for L184 and L208.
P11, L262: designment -> design
P12, L279: using larger train datasets against -> using a larger training dataset compared to a
P12, L287: a -> the
P13, L305: Delete “After that” and start sentence with “To evaluate”.
P13, L308: train -> training
P13, L313: train set -> training data set
P13, L313: validation set -> validation data set
P13, L323: relatively -> somewhat (?). Check wording and improve.
P14, Figure 4 caption: train -> training
P18, L420: it is worthy -> it is worth noting (?)
P18, L404: train -> training
P18, L429: appeared -> apparent
P22, L527: proposed to intercompare -> proposed for intercomparison
Citation: https://doi.org/10.5194/egusphere-2025-1983-RC4
- AC4: 'Reply on RC4', Yuan He, 05 Aug 2025