This work is distributed under the Creative Commons Attribution 4.0 License.
Enhancing Parameter Calibration in Land Surface Models Using a Multi-Task Surrogate Model within a Differentiable Parameter Learning Framework
Abstract. Land surface models (LSMs) are essential for simulating terrestrial processes and their interactions with the atmosphere. However, parameter calibration in LSMs remains a major challenge owing to complex process coupling and parameter uncertainty. For example, key parameters, such as those associated with plant functional type (PFT), are often estimated from field measurements or empirical relationships of limited accuracy, resulting in systematic biases and inconsistencies. In this study, we introduce multi-task differentiable parameter learning (MdPL), a deep learning framework that combines a multi-task surrogate model with a differentiable parameter generator for more accurate and efficient LSM parameter calibration. The multi-task surrogate learns both shared and task-specific features to predict multiple fluxes, and the differentiable generator infers site-specific parameters from meteorological forcings and land surface attributes. Calibrated across 20 sites spanning four PFTs, the MdPL-calibrated Integrated Land Simulator (ILS) achieved a 15 % decrease in RMSE for both sensible and latent heat flux simulations. Furthermore, benchmarking against the PLUMBER2 dataset showed that the MdPL-calibrated ILS outperformed standard LSMs (CLM5, JULES, Noah, and GFDL), and its accuracy matched or exceeded that of LSTM-based approaches. Assessing its transferability via leave-one-out cross-validation for evergreen forest, woodland, and cultivation sites showed reasonable transfer performance for evergreen forests and woodlands, with parameter sets yielding close-to-optimal flux simulations even without site-specific calibration. For cultivation sites, however, PFT parameters exhibited strong site specificity, and parameter sets from the same PFT could not be transferred reliably. Despite the framework's reduced effectiveness for cultivation sites under fixed PFT settings, it offers a scalable and physically grounded approach to enhancing parameter calibration in complex LSMs.
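To make the framework described in the abstract more concrete, below is a minimal, illustrative PyTorch sketch of the two components: a multi-task surrogate with a shared trunk and task-specific heads for sensible (H) and latent (LE) heat flux, and a differentiable parameter generator that maps forcings and site attributes to calibratable parameters. All module names, layer sizes, input dimensions, and the toy calibration loop are assumptions made here for illustration only; they are not taken from the authors' ILS/MdPL code archived on Zenodo (linked below).

```python
# Illustrative sketch of a multi-task differentiable parameter learning setup.
# Shapes, names, and data are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

N_FORCING = 8   # assumed number of meteorological forcing variables
N_ATTRS = 6     # assumed number of land-surface attributes
N_PARAMS = 5    # assumed number of calibratable parameters


class MultiTaskSurrogate(nn.Module):
    """Shared trunk with task-specific heads for sensible (H) and latent (LE) heat."""

    def __init__(self, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(N_FORCING + N_PARAMS, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head_h = nn.Linear(hidden, 1)   # sensible heat flux head
        self.head_le = nn.Linear(hidden, 1)  # latent heat flux head

    def forward(self, forcing, params):
        z = self.shared(torch.cat([forcing, params], dim=-1))
        return self.head_h(z), self.head_le(z)


class ParameterGenerator(nn.Module):
    """Maps forcings and site attributes to parameters in (0, 1); rescale to physical ranges as needed."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FORCING + N_ATTRS, hidden), nn.ReLU(),
            nn.Linear(hidden, N_PARAMS), nn.Sigmoid(),
        )

    def forward(self, forcing, attrs):
        return self.net(torch.cat([forcing, attrs], dim=-1))


# Toy calibration loop: the surrogate is frozen (pre-trained on LSM runs), and
# gradients of the flux mismatch flow through it into the parameter generator.
surrogate, generator = MultiTaskSurrogate(), ParameterGenerator()
surrogate.requires_grad_(False)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

forcing = torch.randn(128, N_FORCING)            # placeholder forcing data
attrs = torch.randn(128, N_ATTRS)                # placeholder site attributes
obs_h, obs_le = torch.randn(128, 1), torch.randn(128, 1)  # placeholder observations

for step in range(200):
    params = generator(forcing, attrs)
    pred_h, pred_le = surrogate(forcing, params)
    loss = nn.functional.mse_loss(pred_h, obs_h) + nn.functional.mse_loss(pred_le, obs_le)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point this sketch illustrates is that the pre-trained surrogate stays frozen during calibration, so the gradients of the flux mismatch flow through it into the parameter generator; the physical LSM itself is never differentiated, which is also the basis of the referee's first major concern below.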
Status: open (until 26 Nov 2025)
RC1: 'Comment on egusphere-2025-3301', Anonymous Referee #1, 21 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3301/egusphere-2025-3301-RC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-3301-RC1
CC1: 'Comment on egusphere-2025-3301', Shanning Bao, 07 Nov 2025
Publisher’s note: this comment is a copy of RC2 and its content was therefore removed on 11 November 2025.
Citation: https://doi.org/10.5194/egusphere-2025-3301-CC1
RC2: 'Comment on egusphere-2025-3301', Anonymous Referee #2, 10 Nov 2025
The manuscript entitled “Enhancing Parameter Calibration in Land Surface Models Using a Multi-Task Surrogate Model within a Differentiable Parameter Learning Framework” aimed to calibrate parameters of the ILS model by leveraging the differentiability of neural networks. The study demonstrates that the proposed multi-task differentiable parameter learning (MdPL) framework achieves better sensible and latent heat simulation performance than the default parameter set and outperforms the single-task version.
While the topic is relevant and the approach technically interesting, I have serious concerns about the validity and scientific contribution of the study in its current form. Specifically, there are two major issues that critically affect the plausibility and impact of the work:
- Lack of direct physical connection between calibration and model parameters:
The ILS model parameters were not directly optimized through differentiable learning. Instead, the learning process was conducted through a surrogate model. This inevitably introduces fitting errors and weakens the physical interpretability of the results. The surrogate model may not faithfully represent the true relationships between model parameters and physical processes within the ILS framework.
- Insufficient evaluation and benchmarking:
The study assesses calibrated model outputs only against simulations using the default parameter set. Without comparison against other calibration methods (e.g., PFT-specific parameter optimization), it is difficult to judge the value or robustness of the proposed approach. Given the many existing parameter calibration techniques, it is essential to demonstrate that MdPL performs competitively with, or superior to, conventional parameter calibration methods.
These two issues fundamentally limit the scientific credibility and general applicability of the work. Nonetheless, the authors’ efforts are appreciated, and I encourage them to consider these points in future work to strengthen the study’s methodological and physical rigor.
Minor Comments
- Introduction: The literature review on differentiable parameter learning is incomplete. Please include more relevant studies (e.g., Bao et al., JAMES, 2023) to better contextualize the contribution.
- Line 110: The manuscript claims to mitigate the impact of sparse and noisy observational data, but this is not clearly demonstrated. Please elaborate or revise this claim.
- Lines 112–114: The notation for section references (“Section 2,” “Sect. 3,” “Sect. 4”) should be unified.
- Figure 1: This figure should appear after the corresponding description. As currently presented, it is difficult to interpret without prior explanation of the framework.
- Line 158: Please define the symbol ‘L’.
- Section 2.3: Consider summarizing the evaluation metrics in a concise table instead of repeating similar textual descriptions.
- Table 1: Add horizontal lines to clearly separate plant functional types.
- Line 225: The term “Mediterranean climate” is more appropriate than “subtropical climate” for ‘Csa’ and ‘Csb’.
- Line 230: Provide details on parameter sampling: Was it random? How many samples were drawn per range? Did the authors account for potential nonlinear parameter–output relationships (e.g., exponential)?
- Table 2: Explain how the key parameters influence model outputs, and consider providing response curves to illustrate these relationships.
- Line 241: Justify the use of only one pre-training dataset and provide its distribution in a figure.
- Line 257: Correct grammatical errors.
- Line 266: Explain the rationale for using different hidden layer sizes in single-task and multi-task surrogate models, or directly use the same size.
- Section 2.4.3: The three experiments appear sequential rather than parallel. Consider dividing them into three sub-sections—e.g., (2.4.3) Comparison between Multi-Task and Single-Task Models, (2.4.4) Benchmarking, and (2.4.5) Transferability Evaluation.
- Line 300: Clarify why only three plant functional types were evaluated.
- Results and Discussion: This section should be streamlined to highlight key findings rather than listing results exhaustively. Focus on the major insights and implications.
- Line 371: ‘LSM’?
- Lines 376–379: The text mentions identical RMSEs between LSTM and ILS_MdPL and an increasing difference at larger timescales, but this is not supported by Figure 5. Please check for consistency.
- Table 4: The PFT-calibrated parameters perform worse than the mean of site-specific parameters, which is counterintuitive (LSTM should be more flexible than mean and should have better performance). Please investigate potential causes (e.g., missing features or model structure issues).
- Figure 6: The figure caption does not match the content—it does not directly compare KGE but instead presents temporal LE and H. Please revise accordingly.
Citation: https://doi.org/10.5194/egusphere-2025-3301-RC2
Data sets
Dataset and results for dPL and MdPL Experiments Wenpeng Xie https://doi.org/10.5281/zenodo.15753067
Model code and software
model code, figure and table reproduction Wenpeng Xie https://doi.org/10.5281/zenodo.15748737
Interactive computing environment
ILS environment and Pytorch environment Wenpeng Xie https://doi.org/10.5281/zenodo.15748737
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,873 | 32 | 12 | 1,917 | 27 | 30 |