A physics-informed machine learning (PIML) framework for projecting 21st-century permafrost extent in Northeast China

Huang, Shuai; Kong, Xiangbing; Yang, Xue; Jin, Xiaoying; Li, Shanzhen; Yang, Lin; Zhang, Yaodan; Gao, Kai; Wang, Hongwei; Li, Xiaoying; He, Ruixia; Lü, Lanzhi; Cheng, Guodong; Jin, Huijun

doi:10.5194/egusphere-2025-4544

Preprints

https://doi.org/10.5194/egusphere-2025-4544

17 Nov 2025

Abstract. The degradation of marginal permafrost is a sensitive indicator of climate change, with far-reaching implications for regional ecosystems, hydrology, and infrastructure. Located near the southern limit of latitudinal permafrost (SLLP) in Eastern Asia, Northeast China has experienced pronounced permafrost retreat and persistent ground warming in recent decades. This study develops a physics-informed machine learning (PIML) framework that integrates the Temperature at the Top of Permafrost (TTOP) model, observed changes in land use and land cover (LULC), and climate projections from the Coupled Model Intercomparison Project 6 (CMIP6) to improve the understanding and prediction of permafrost dynamics in the region. Results indicate that, under the SSP5-8.5 scenario, permafrost extent may decline by more than 90 % by the end of the 21st century, primarily driven by a sharp reduction in the air freezing index (AFI), especially in high-latitude and high-elevation zones. Land use and cover changes (LUCC), particularly urban expansion and deforestation, further exacerbate ground thermal disturbances. Spatially, mountainous forested areas, such as the Da Xing’anling Mountains, exhibit relatively greater resilience to warming due to dense vegetation and complex topography that help buffer surface energy fluxes. Feature attribution analysis identifies surface temperature, snow cover duration, and vegetation as dominant drivers of permafrost stability, while Uniform Manifold Approximation and Projection (UMAP) clustering reveals distinct degradation trajectories across different land cover types. This study highlights the complex interplay of climatic and anthropogenic factors in permafrost evolution and demonstrates the utility of integrating physical modelling with machine learning to support ecological conservation and infrastructure risk management in cold-region environments.

Received: 16 Sep 2025 – Discussion started: 17 Nov 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

CC1:
'Comment on egusphere-2025-4544', Xianglong Li, 20 Nov 2025

I believe the author has done an excellent job, but I think it would be preferable to compare such predictions with permafrost mapping or field survey results from the observed time period.

Citation: https://doi.org/10.5194/egusphere-2025-4544-CC1
- AC1:
  'Reply on CC1', Shuai Huang, 03 Dec 2025
  
  Comment:
  I believe the author has done an excellent job, but I think it would be preferable to compare such predictions with permafrost mapping or field survey results from the observed time period.
  Response:
  Dear Dr. Li:
  Thank you very much for your constructive suggestion. We fully agree that comparing our simulation results with existing permafrost maps and field-based evidence is essential for strengthening the reliability of model outputs. In response to your comment, we have added a comprehensive comparison between our simulated permafrost distribution during 2001–2020 and two recently published Northern Hemisphere permafrost maps (Ran et al., 2022; Obu et al., 2019). The newly added content is presented in the revised manuscript (Lines 343–364) and the comparison is illustrated in the newly added Figure 5. Revision as below:
  L343-364:
  In addition, we compared the permafrost distribution simulated by the MLP model in this study during 2001–2020 with the recently published Northern Hemisphere permafrost maps (as shown in Fig. 5). Across the three permafrost maps, we observed a consistent representation of the widespread permafrost distribution in the Da Xing’anling Mountains, with the SLLP located approximately in the Arxan Mountains. However, notable discrepancies occur among studies for the permafrost distribution in the Xiao Xing’anling Mountains, the Hulunbuir Plateau, and the southern mountainous regions (Huanggangliang Mountains and Changbai Mountains). For the Xiao Xing’anling region, our results are more consistent with those of Ran et al. (2022), but differ significantly from Obu et al. (2019). According to Huang et al. (2025), the SLLP in the Xiao Xing’anling Mountains is located approximately between Heihe and Bei’an, which agrees well with our simulation. For the Hulunbuir Plateau, our estimation lies between the results of Ran et al. (2022) and Obu et al. (2019). However, due to the limited availability of field observations in this area, further verification is required. Regarding SLLP characteristics, the simulated permafrost distribution near the southern boundary in this study appears more scattered, reflecting the presence of isolated permafrost patches near the SLLP. This pattern is consistent with the actual conditions. With respect to the permafrost in the southern mountainous regions of Northeast China, our results and those of Ran et al. (2022) and Obu et al. (2019) all indicate the presence of permafrost. However, Obu et al. suggest a more extensive permafrost area in the Huanggangliang Mountains, whereas both our study and Ran et al. (2022) show a more sporadic distribution. Based on the synthesis by Jin et al. (2025) and field surveys, permafrost in the southern mountainous regions of Northeast China may indeed exist but is difficult to detect; its occurrence is likely controlled by local factors. These findings further support the results of this study.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-AC1
  - CC3: 'Reply on AC1', Xianglong Li, 03 Dec 2025
    
    I’m very interested in this article, and these additional analyses will greatly enhance the reliability of the predictions. This work is truly meaningful.
    
    Citation: https://doi.org/10.5194/egusphere-2025-4544-CC3
CC2:
'Comment on egusphere-2025-4544', Guojie Hu, 01 Dec 2025
This study develops a PIML framework that integrates physically based modeling with machine learning and incorporates dynamic land-use/land-cover changes to simulate and project permafrost evolution in Northeast China. The methodology demonstrates a certain degree of innovation. Since the study area is located near the SLLP in East Asia, its permafrost characteristics are regionally representative, and the findings provide valuable regional applicability and scientific insight. The manuscript is overall well written, but minor improvements can be made regarding clarity of expression as well as figure and text descriptions. The specific comments are as follows:
The abstract predominantly provides qualitative descriptions. It is recommended to include more quantitative results to enhance informativeness.

In Section 3.2, it is suggested to add comparisons with existing permafrost maps developed for the same region.

In Figure 7, please indicate the spatial extents corresponding to the Da Xing’anling Mountains, Xiao Xing’anling Mountains, the northern Song-Nen rivers Plain, and the Hulun Buir Plateau.

Lines 327–328 and 564–565 contain inaccurate wording, as the predictive accuracies of MLP and CatBoost differ depending on the metric used; thus, it is inappropriate to state that both models simultaneously exhibit the best performance.

In Lines 564–565, MAE is mentioned without prior reference, which seems to be a typographical error where MSE was mistakenly written as MAE.

The unit of MSE in the manuscript should be °C² instead of °C.
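
A minimal Python sketch of why the units differ, assuming paired observed and predicted MAGT values in °C. The function name `error_metrics` and the sample values are hypothetical and are not taken from the manuscript; the sketch only illustrates that MSE carries squared units (°C²), whereas RMSE and MAE are in °C.

```python
import math


def error_metrics(observed, predicted):
    """Return MSE (°C²), RMSE (°C) and MAE (°C) for paired temperature series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n    # squared errors, hence °C²
    mae = sum(abs(e) for e in errors) / n   # absolute errors, hence °C
    return {"MSE_degC2": mse, "RMSE_degC": math.sqrt(mse), "MAE_degC": mae}


# Hypothetical MAGT values in °C, for illustration only.
observed_magt = [-1.0, 0.5, 2.0]
predicted_magt = [-0.5, 0.5, 1.0]
metrics = error_metrics(observed_magt, predicted_magt)
print(metrics)
```

Because the metrics have different units and penalise large errors differently, a model can rank first on one and second on another, which is the point made about MLP and CatBoost above.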
Citation: https://doi.org/10.5194/egusphere-2025-4544-CC2
- AC2: 'Reply on CC2', Shuai Huang, 05 Dec 2025
  
  We sincerely thank Dr. Hu for the careful review and support of our manuscript. Our response letter is provided in the attachment; please kindly check it.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-AC2
RC1:
'Comment on egusphere-2025-4544', Anonymous Referee #1, 13 Dec 2025
The combination of dynamic land-use projections (PLUS), CMIP6 forcing, the TTOP model, and a machine-learning enhancement labelled as “physics-informed” offers a promising framework. The writing is clear and the regional focus is valuable. However, several critical methodological details, the treatment of uncertainty, and the quantitative attribution need to be improved. Therefore, I recommend that the manuscript be accepted after major revision. My concerns are listed as follows:
Major concerns:
The PIML component is introduced as the central novelty but remains inadequately described. Readers are not told which ML algorithm was used, how physical constraints from TTOP were explicitly embedded in the training process, what the training/target data were, or how performance was assessed. A schematic and the mathematical formulation of the physics-informed part are needed for transparency and reproducibility.

Projections are presented without quantitative uncertainty. The headline result (>90 % permafrost loss by 2100 under SSP5-8.5) lacks ensemble spread from the 14 CMIP6 models and confidence intervals. At minimum, ensemble mean ±1σ and results for SSP1-2.6 and SSP2-4.5 should be added.

The role of land-use/land-cover change is emphasised throughout the introduction but not quantified in the results. A simple control experiment (dynamic PLUS scenario versus fixed 2020 land cover) is required to show how much additional permafrost loss is attributable to LUCC.

Independent validation of the full modelling chain against recent observations is missing. Present-day skill metrics are essential to support confidence in century-scale projections.
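
The uncertainty and attribution requests above could be sketched as follows, assuming one projected permafrost area per CMIP6 model for each land-cover experiment. The model names, area values, and function names are hypothetical placeholders, not results from the manuscript; `statistics.stdev` gives the sample standard deviation, which is one possible choice for the ±1σ spread.

```python
from statistics import mean, stdev


def ensemble_summary(values):
    """Return (ensemble mean, sample standard deviation) across models."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("at least two ensemble members are needed for a spread")
    return mean(values), stdev(values)


def lucc_attribution(dynamic_area, fixed_area):
    """Summarise the per-model area difference (fixed minus dynamic land cover).

    A positive difference means less permafrost survives with dynamic land
    cover, i.e. additional loss attributable to LUCC in that model.
    """
    models = sorted(set(dynamic_area) & set(fixed_area))
    diffs = [fixed_area[m] - dynamic_area[m] for m in models]
    return ensemble_summary(diffs)


# Hypothetical end-of-century permafrost areas (10^4 km²) for one scenario.
area_dynamic_lulc = {"model_a": 2.0, "model_b": 3.0, "model_c": 4.0}
area_fixed_2020_lulc = {"model_a": 2.5, "model_b": 3.5, "model_c": 5.0}

mean_area, sigma_area = ensemble_summary(area_dynamic_lulc.values())
mean_lucc, sigma_lucc = lucc_attribution(area_dynamic_lulc, area_fixed_2020_lulc)
print(f"dynamic LULC: {mean_area:.2f} ± {sigma_area:.2f}")
print(f"loss attributable to LUCC: {mean_lucc:.2f} ± {sigma_lucc:.2f}")
```

Pairing the two experiments model by model keeps the climate forcing identical within each difference, so the spread of the differences reflects how the LUCC effect varies across models rather than the overall climate spread.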

Minor comments:
Please streamline some lengthy sentences and ensure consistent terminology (LULC/LUCC).

Please briefly describe the content of repeatedly cited supplementary figures and tables when first mentioned.

Please consider adding at least one lower-emission scenario to increase policy relevance.

Please clarify or update citations listed as 2024/2025 that are still in review.
Citation: https://doi.org/10.5194/egusphere-2025-4544-RC1
- AC3: 'Reply on RC1', Shuai Huang, 09 Feb 2026
  
  Response Letter to Reviewers’ Comments
  The combination of dynamic land-use projections (PLUS), CMIP6 forcing, the TTOP model, and a machine-learning enhancement labelled as “physics-informed” offers a promising framework. The writing is clear and the regional focus is valuable. However, several critical methodological details, the treatment of uncertainty, and the quantitative attribution need to be improved. Therefore, I recommend that the manuscript be accepted after major revision. My concerns are listed as follows:
  
  Major concerns:
  Q1: The PIML component is introduced as the central novelty but remains inadequately described. Readers are not told which ML algorithm was used, how physical constraints from TTOP were explicitly embedded in the training process, what the training/target data were, or how performance was assessed. A schematic and the mathematical formulation of the physics-informed part are needed for transparency and reproducibility.
  
  R1: We thank the reviewer for this constructive comment. In response, we will substantially expand the description of the PIML component in Section 2 (Methods). Specifically, we will clarify the machine-learning algorithm employed, detail how physical constraints derived from the TTOP model are explicitly incorporated into the training process, and describe the training data, target variables, and performance evaluation metrics. A schematic overview of the PIML workflow is already provided in the Supplementary Materials (Fig. S2). Accordingly, we will revise and better integrate this schematic with the main text and improve the methodological explanation to enhance transparency and reproducibility.
  
  Q2: Projections are presented without quantitative uncertainty. The headline result (>90% permafrost loss by 2100 under SSP5-8.5) lacks ensemble spread from the 14 CMIP6 models and confidence intervals. At minimum, ensemble mean ±1σ and results for SSP1-2.6 and SSP2-4.5 should be added.
  The role of land-use/land-cover change is emphasised throughout the introduction but not quantified in the results. A simple control experiment (dynamic PLUS scenario versus fixed 2020 land cover) is required to show how much additional permafrost loss is attributable to LUCC.
  Independent validation of the full modelling chain against recent observations is missing. Present-day skill metrics are essential to support confidence in century-scale projections.
  
  R2: Thank you for your insightful and constructive comments. We address each point as follows.
  (1) Uncertainty quantification of projections.
  We agree that quantitative uncertainty information is essential. In the revised manuscript, we will add an explicit uncertainty analysis based on the CMIP6 multi-model ensemble, reporting the ensemble mean together with ±1σ for projected permafrost extent. Results for SSP1-2.6 and SSP2-4.5 will also be included alongside SSP5-8.5 to provide a more complete scenario comparison.
  (2) Quantification of the effects of land-use/land-cover change.
  We acknowledge that the role of land-use/land-cover change (LUCC) was not sufficiently quantified in the original results. To address this, we will introduce a control experiment comparing simulations driven by dynamic PLUS-projected land use with those using fixed 2020 land-cover conditions. This comparison will allow us to explicitly quantify the additional permafrost loss attributable to LUCC.
  (3) Independent validation of the modelling framework.
  We agree that validation of the full modelling chain is necessary to support confidence in long-term projections. In the revised version, we will add a model evaluation based on present-day (2020s) conditions, including quantitative skill metrics, to assess model performance against available observations.
  
  Minor comments:
  
  Q3: Please streamline some lengthy sentences and ensure consistent terminology (LULC/LUCC).
  
  R3: Thanks for your helpful suggestion. We will streamline several lengthy sentences to improve clarity and readability, and we will ensure consistent use of terminology throughout the text.
  
  Q4: Please briefly describe the content of repeatedly cited supplementary figures and tables when first mentioned.
  
  R4: Thank you for your reminder. In the revised manuscript, we will briefly describe the content of supplementary figures and tables when they are first mentioned in the main text, particularly for those that are cited repeatedly, to improve clarity and readability.
  
  Q5: Please consider adding at least one lower-emission scenario to increase policy relevance.
  
  R5: Thanks for your valuable suggestion. In the revised manuscript, we will include at least one lower-emission scenario (e.g., SSP1-2.6) in the analysis to enhance the policy relevance of the projected permafrost changes and to provide a more comprehensive comparison across emission pathways.
  
  Q6: Please clarify or update citations listed as 2024/2025 that are still in review.
  
  R6: Thanks for your comment. In the revised manuscript, we will carefully review all references listed as 2024/2025 and verify their publication status.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-AC3
RC2:
'Comment on egusphere-2025-4544', Orgogozo Laurent, 17 Dec 2025

The present work pioneers the inclusion of land use and land cover changes in the simulation of climate change impacts on permafrost. It develops a new workflow that combines CMIP6-based climate projections and cellular automata-based land use projections to build the input data for a simple and classic permafrost model, the TTOP model; machine learning procedures are then combined with this modelling chain to improve its predictive capability and to help study the uncertainties. This new workflow is then applied to the simulation of permafrost evolution in Northeast China under different climate change scenarios. The work is novel and timely, and the results are interesting. However, some information about the modelling procedure is lacking, and some additional discussion is needed regarding the links between the data-driven and process-oriented approaches. I recommend that the manuscript be accepted for publication after the appropriate revisions have been made. My general and specific comments may be found below.

General comments:
The authors do not provide a finalized PIML model, but rather a family of six PIML models with sometimes significantly different performances. I think that the authors should take responsibility for recommending the procedure to be used in future works that adopt the methodology developed here.
The calibration / validation of the whole procedure should be better explained, and may need some improvements (see my specific comments on l 156-157 below).
I think that the performance of the proposed PIML approach depends greatly on the merits of the process-based model used. This point should be discussed more thoroughly, and it could be the subject of future comparative works.
Finally, the possibilities of applying the developed modelling chain in other permafrost contexts should be better discussed in the conclusions.

Specific comments:
l 87 : Error in the reference ; Tubini et al., 2021
l 90-91 : “Moreover, uncertainties in parameterization and incomplete representation of sub-grid heterogeneity can result in substantial variability in model projections (Groenke et al., 2023; Wang et al., 2024b).” I do agree with this statement, but the references provided for grounding it are not really relevant in this paragraph I think, since they mainly deal with results obtained with empirical or equilibrium models. Recent reviews in physical modelling of permafrost could be consulted for strengthening this point (e.g.: Yang et al., 2021, Hu et al, 2023).
https://doi.org/10.3389/feart.2021.721009
https://doi.org/10.1016/j.catena.2022.106844
l 93, l 97 : “Luo et al., 2024” : I can’t find this reference ? 10.1016/B978-0-323-85242-5.00013-0
l 140-141 : “14 global climate models (GCMs) were selected from the CMIP6 ensemble”. How was this selection made? If all the CMIP6 models are included, this should be stated; if not, the rationale behind the choice of this specific subset should be provided.
l 103 : Citing a preprint is problematic, a regular reference should be provided for this work. Maybe the one below would be the relevant one? To be checked.
P. Pilyugina et al., "A Physics-Informed Machine Learning Framework for Permafrost Stability Assessment," in IEEE Access, vol. 13, pp. 96423-96433, 2025, doi: 10.1109/ACCESS.2025.3573072.
l 156-157 : Section 2.2 presents a very interesting approach for LUCC projection. I have two questions regarding the calibration / validation process. First, it seems that the 2000 and 2020 slices have been used both for calibrating and for validating the LUCC projection method. It seems to me that calibration and validation should be done on two different pairs of time slices, e.g.: 2000-2010 for calibration and 2010-2020 for validation. So why consider only two dates for this calibration/validation process? Second, the approach used seems to rely on the assumption that the LUCC dynamics are stationary, so that a projection workflow calibrated on a given period (here 2000-2020) may be used for future projections. How can the strength of this stationarity assumption be evaluated?
l 191 : “The empirical parameters n_f, n_t and r_k serve to account for snow insulation, surface energy exchange, and the ratio of soil thermal conductivity in frozen to thawed states, respectively. The model parameters n_f, n_t and r_k were assigned based on LULC classifications.” It seems to me that these parameters should depend not only on LULC, but also on climate (e.g.: mean annual snowfall for snow insulation), topography (e.g.: plain vs mountain for surface energy exchange) and hydrology (e.g.: soil moisture for the ratio of soil thermal conductivity in frozen to thawed states). For instance, in all LULC projections there are forested areas at both the southeastern and northwestern limits of the region of interest, for which climate, topography and hydrology may be significantly different. Do all these forested areas share the same n_f, n_t and r_k empirical parameters? If yes, how can the biases in projected MAGT related to climatic, topographic and hydrological variability be evaluated?
l 231-232 : “We then constructed a training dataset using the TTOP-estimated MAGT as the target variable, allowing the supervised learning models to capture the relationships between environmental predictors and ground temperature.” I understand that the proposed PIML approach aims to take into account the variability mentioned in my previous comment. However, if TTOP results are used as the target variable, which I presume means the variable that the ML should reproduce based on the other ones (I am not an ML specialist, sorry if I misunderstood this), I cannot see how the shortcomings of the TTOP estimation itself, as put forward in my previous comment, could be dealt with.
l 284-285, Figure 1 : “The indices represent the arithmetic averages computed from site-level downscaled data at 225 meteorological stations, using the delta downscale method applied across 14 CMIP6 models.” I think that what ‘averages’ means exactly here should be clarified. I guess that the model-specific curves present averages across the 225 sites, while the scenario averages are averages across the 14 model-specific averages (averages of averages). Exactly what the different averages presented in Figure 1 are should be explained without ambiguity.
l 288 : “To further examine the spatial dynamics of freeze–thaw responses, we have conducted a case analysis under the low-emission scenario SSP126 (Fig. 2).” Why this one? This should be explained. In fact I would be interested in the same analysis for the other scenarios, at least for scenario SSP585 as well.
l 300 : Using an absolute variation visualization rather than a negative one for AFI and a positive one for ATI may improve the readability of the Figure. I would also say that the figure is too information-rich; I would recommend showing only the 2020-2100 deltas, since the others are not discussed in the body of the text. These are only presentation suggestions.
l 323 : “To evaluate the capability of different ML algorithms in simulating TSP, we have compared model-predicted MAGTs with observed values using six ML methods”. What exactly are the observed values here? I guess each blue dot in Figure 4 corresponds to a given site, for a 1961-2020 multi-annual average, right? I would also be interested in the performances analyzed by year, i.e. the same graphics but with each dot representing the average of the MAGT across the 225 sites for a given year between 1961 and 2020, in order to visualize the temporal variability of performances as well.
l 335-336 : “The high-performing models are subsequently employed to simulate future MAGT patterns and permafrost extent under projected scenarios.” Please list these high-performing models.
l 351: “Projected mean annual ground temperatures (MAGTs) across Northeast China under four SSPs scenarios based multilayer perceptron model.” What exactly are these projections? Averages of the best-performing PIML models selected in the previous section? Maybe the ‘multilayer perceptron model’ is the answer, but I don’t know what it is, and I think that many readers of TC won’t know either.
l 360 : “the MLP-TTOP model” First occurrence of this acronym/name. It should be introduced earlier in the method section.
l 365 : What is DXAM?
l 366 : What is XXAM?
l 375-376 : “(a) total Northeast China; (b) Xiao Xing'anling Mountains; (c) Da Xing'anling Mountains; (d) northern Songhua-Nen rivers Plain, and; (e) Hulun Buir Plateau” Figure 7 would be much more interesting if it were accompanied by a map showing the location and extent of the four sub-regions of Northeast China considered.
Figure 7 : In 7a, why are there increases in discontinuous permafrost area between 2040 and 2080? In 7d, why is there more permafrost in 2040 in SSP370 than in SSP126 and SSP245?
l 379 – 380 : “Fig. 8 summarizes the relative importance of 15 environmental variables across six ML models.” Why not consider here only the best-performing PIML models as selected in section 3.2?
l 389-391 : “In contrast, topographic and edaphic variables, such as slope angle, slope aspect, soil organic matter contents (SOC), bulk density (BD), and sand and clay contents, generally rank lower, though they may modulate soil thermal properties and hydrological processes as secondary controls.” I think it must be said here that these topographic and edaphic variables do exert strong controls on the two most influential variables, mean annual land surface temperature (MALST) and land use and land cover (LULC). So in fact a (large?) part of their intrinsic influence is most likely encompassed in the influence of MALST and LULC.
l 423 : “thereby underestimating the scale and impact of anthropogenic disturbances.” LUCC may also derive solely from climate change. Deciphering the scale and impact of anthropogenic disturbances would require separating the climate change-induced land use effects from the anthropogenic land use change effects.
l 443 : “a PIML model” In fact several PIML models have been used, and in its present form the study does not make a clear choice about which is the best one that could be preferred to the others. I think that this choice should be made, or this sentence should be reformulated.
l 473-474 : “Rather, such signals reflect the thermal inertia of geocryological system prior to abrupt transitions.” The study convincingly grounds this statement for Northeast China permafrost, but its generalization to other permafrost contexts may not be straightforward. I would soften the wording here to avoid over-interpretation of the results.
l 477 : “4.3 Spatial heterogeneity and resilience of XAP” Please avoid acronyms in titles as much as possible.
l 501-503 : “Hydrogeological conditions further affect thermal stability: long-term drying and surface drainage can increase ground albedo and reduce soil thermal conductivity, thereby cooling the ground.” I would highlight that the effect of forest on soil hydrology is rather complex, since for instance the root network enhances infiltration while at the same time drying the soil through evapotranspiration. Moreover, in permafrost contexts, these competing effects interact in a complex manner with active layer dynamics (e.g. : Orgogozo et al., 2019, sorry for the self citation).
https://doi.org/10.1002/ppp.1995
l 566 : “Compared to traditional physically based models, the PIML framework yields superior performance” This study grounds this statement solely for the TTOP model, and this should be made clear in this paragraph.
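
To make the concern raised for l 191 and l 231-232 concrete, here is a minimal sketch of a TTOP calculation with parameters looked up by LULC class. It assumes the commonly cited TTOP formulation with r_k taken as the thawed-to-frozen conductivity ratio; the manuscript's exact convention is not reproduced in this discussion. All parameter values, class names, and function names are hypothetical.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365.0


@dataclass(frozen=True)
class TtopParams:
    n_f: float  # freezing n-factor (accounts for snow insulation)
    n_t: float  # thawing n-factor (accounts for surface energy exchange)
    r_k: float  # conductivity ratio, assumed here to be k_thawed / k_frozen


# Hypothetical, illustrative values only; not taken from the manuscript.
PARAMS_BY_LULC = {
    "forest": TtopParams(n_f=0.4, n_t=0.8, r_k=0.7),
    "cropland": TtopParams(n_f=0.6, n_t=1.0, r_k=0.9),
}


def ttop(afi, ati, params):
    """Estimate TTOP (°C) from air freezing/thawing indices (°C·day).

    Assumed form: the frozen branch applies when the scaled thawing index
    does not exceed the scaled freezing index; otherwise the thawed branch.
    """
    thaw = params.r_k * params.n_t * ati
    freeze = params.n_f * afi
    if thaw - freeze <= 0:
        return (thaw - freeze) / DAYS_PER_YEAR
    return (params.n_t * ati - freeze / params.r_k) / DAYS_PER_YEAR


# Two forested cells with equal indices get the same estimate, whatever
# their snowfall, topography or soil moisture.
north_west_forest = ttop(3000.0, 2000.0, PARAMS_BY_LULC["forest"])
south_east_forest = ttop(3000.0, 2000.0, PARAMS_BY_LULC["forest"])
print(north_west_forest, south_east_forest)
```

Under this lookup, any ML model trained to reproduce these values as its target would inherit the same class-level parameters, which is the limitation the two comments point to.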

Citation: https://doi.org/10.5194/egusphere-2025-4544-RC2
- AC4:
  'Reply on RC2', Shuai Huang, 09 Feb 2026
  Response Letter to Reviewers’ Comments
  The present work pioneers the inclusion of land use and land cover changes in the simulation of climate change impacts on permafrost. It develops a new workflow that combines CMIP6-based climate projections and cellular automata-based land use projections to build the input data for a simple and classic permafrost model, the TTOP model; machine learning procedures are then combined with this modelling chain to improve its predictive capability and to help study the uncertainties. This new workflow is then applied to the simulation of permafrost evolution in Northeast China under different climate change scenarios. The work is novel and timely, and the results are interesting. However, some information about the modelling procedure is lacking, and some additional discussion is needed regarding the links between the data-driven and process-oriented approaches. I recommend that the manuscript be accepted for publication after the appropriate revisions have been made. My general and specific comments may be found below.
  General comments:
  Q1: The authors do not provide a finalized PIML model, but rather a family of six PIML models with sometimes significantly different performances. I think that the authors should take responsibility for recommending the procedure to be used in future works that adopt the methodology developed here.
  
  R1: Thanks for your insightful comment and constructive advice. In the revised manuscript, the rationale behind the six PIML configurations will be clarified, and their performances will be compared in a more structured and transparent manner. A clear recommendation regarding the procedure to be adopted within this methodological framework will also be provided.
  
  Q2: The calibration / validation of the whole procedure should be better explained, and may need some improvements (see my specific comments on L156-157 below).
  
  R2: Thanks for your important comment. The calibration and validation procedure will be clarified in the revised manuscript. A detailed response and explanation addressing this point are provided below in response to the specific comments at L156–157.
  
  Q3: I think that the performance of the proposed PIML approach depends greatly on the merits of the process-based model used. This point should be discussed more thoroughly, and it could be the subject of future comparative works.
  
  R3: Thanks for your professional and visionary comment. The dependence of the proposed PIML framework on the underlying process-based model will be discussed in more detail in the Discussion section of the revised manuscript, including its implications and potential relevance for future comparative studies.
  
  Q4: Finally, the possibilities of applying the developed modelling chain in other permafrost contexts should be better discussed in the conclusions.
  
  R4: Thanks for your important comment. The applicability of the proposed modelling chain to other permafrost contexts will be discussed in more detail in the Discussion section of the revised manuscript, including its potential transferability and limitations.
  
  Specific comments:
  
  Q5: L87 : Error in the reference ; Tubini et al., 2021
  
  R5: Thanks for pointing this out. The reference to Tubini et al., 2021 will be checked carefully and corrected in the revised manuscript.
  
  Q6: L90-91 : “Moreover, uncertainties in parameterization and incomplete representation of sub-grid heterogeneity can result in substantial variability in model projections (Groenke et al., 2023; Wang et al., 2024b).” I do agree with this statement, but the references provided for grounding it are not really relevant in this paragraph I think, since they mainly deal with results obtained with empirical or equilibrium models. Recent reviews in physical modelling of permafrost could be consulted for strengthening this point (e.g.: Yang et al., 2021, Hu et al, 2023).
  https://doi.org/10.3389/feart.2021.721009
  https://doi.org/10.1016/j.catena.2022.106844
  
  R6: Thanks for your important comment. The references supporting this statement will be reviewed and revised in the updated manuscript.
  
  Q7: L93, L97: “Luo et al., 2024” : I can’t find this reference ? 10.1016/B978-0-323-85242-5.00013-0
  
  R7: Thanks for pointing this out. The reference to Luo et al., 2024 will be carefully checked, and the citation information will be corrected or updated as appropriate in the revised manuscript.
  
  Q8: L140-141: “14 global climate models (GCMs) were selected from the CMIP6 ensemble”. How was this selection made? If all the CMIP6 models are included, this should be stated; if not, the rationale behind the choice of this specific subset should be provided.
  
  R8: Thanks for your valuable advice. The selection of the 14 CMIP6 GCMs will be clarified in the revised manuscript.
  
  Q9: L103: Citing a preprint is problematic, a regular reference should be provided for this work. Maybe the one below would be the relevant one? To be checked.
  Pilyugina et al., "A Physics-Informed Machine Learning Framework for Permafrost Stability Assessment," in IEEE Access, vol. 13, pp. 96423-96433, 2025, doi: 10.1109/ACCESS.2025.3573072.
  
  R9: Thanks for your kind suggestion and for pointing out the appropriate peer-reviewed reference. The citation will be updated accordingly in the revised manuscript, and the correct published version (Pilyugina et al., 2025, IEEE Access) will replace the preprint.
  
  Q10: L156-157 : Section 2.2 very interesting approach for LUCC projection. Two questions regarding the calibration / validation process. First, it seems that the 2000 and 2020 slices have been used both for calibrating and for validating the LUCC projection method. It seems to me that calibration and validation should be done on two different pairs of time slices, e.g., 2000-2010 for calibration and 2010-2020 for validation. So why consider only two dates for this calibration/validation process? Second, the approach used seems to rely on the assumption that the LUCC dynamics are stationary, so that a projection workflow calibrated on a given period (here 2000-2020) may be used for future projections. How can the strength of this stationarity assumption be evaluated?
  
  R10: Thanks for your important comment. It is acknowledged that the description of the calibration and validation procedure in the original manuscript may cause confusion, particularly regarding the roles of the 2000, 2010, and 2020 LULC maps. This ambiguity arises from the way the validation step was described, rather than from the modelling procedure itself.
  In practice, the validation is conducted as a fully independent process. Specifically, land use in 2020 is simulated using the CLCD maps from 2000 and 2010 as inputs, and the simulated 2020 result is then compared with the observed CLCD 2020 map for validation. Therefore, the 2020 LULC map is not used simultaneously as both an input and a validation target, and no overlap exists between the computation and validation steps.
  Regarding the assumption of stationary LUCC dynamics, it is acknowledged that the PLUS-based projection framework, similar to other CA–Markov approaches, relies on the assumption that the dominant land-use transition mechanisms and their relationships with driving factors remain approximately stationary over time.
  In the present study, this assumption is explicitly embedded in the definition of the future land-use scenario, which is formulated as a “natural development” scenario representing a continuation of the observed 2000–2020 LUCC trends without additional policy interventions. The projection is therefore intended to provide a baseline scenario rather than a deterministic prediction of future land-use changes.
  The strength and implications of this stationarity assumption will be evaluated and discussed in the revised manuscript by quantifying the impact of LUCC through a control experiment comparing simulations driven by dynamic PLUS-based land-use projections and those using a fixed 2020 land-cover map. This comparison allows the contribution of LUCC, and hence the sensitivity of permafrost projections to the assumed LUCC dynamics, to be explicitly assessed.
  In addition, the limitations associated with the stationarity assumption will be discussed more clearly, and the LUCC-related uncertainty will be interpreted in the context of the larger climate-driven uncertainty derived from the CMIP6 multi-model ensemble.
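The independent validation step described above (simulating 2020 from the 2000 and 2010 maps, then comparing the result with the observed 2020 map) can be illustrated with a minimal sketch. The response does not state which agreement metrics are used; overall accuracy and Cohen's kappa are assumed here as common choices for categorical maps. The function name `map_agreement` and the representation of each map as a flat list of class codes are hypothetical.

```python
from collections import Counter


def map_agreement(observed, simulated):
    """Return (overall accuracy, Cohen's kappa) for two categorical maps.

    Both maps are flat sequences of class codes for the same cells,
    e.g. the observed 2020 map and the 2020 map simulated from 2000 and 2010.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("maps must be non-empty and cover the same cells")
    n = len(observed)
    # Observed agreement: fraction of cells with identical class.
    p_o = sum(o == s for o, s in zip(observed, simulated)) / n
    # Chance agreement from the marginal class frequencies of each map.
    obs_counts, sim_counts = Counter(observed), Counter(simulated)
    p_e = sum(obs_counts[c] * sim_counts[c] for c in obs_counts) / (n * n)
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else 1.0
    return p_o, kappa


# Toy example with two classes (1 = forest, 2 = cropland, labels assumed).
accuracy, kappa = map_agreement([1, 1, 2, 2], [1, 1, 2, 1])
print(accuracy, kappa)  # 0.75 0.5
```

The same function could be applied to the dynamic-LUCC and fixed-2020 experiments to summarize how far the two land-cover inputs diverge by a given year.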
  
  Q11: L191 : “The empirical parameters nf, nt and rk serve to account for snow insulation, surface energy exchange, and the ratio of soil thermal conductivity in frozen to thawed states, respectively. The model parameters nf, nt and rk were assigned based on LULC classifications.” It seems to me that these parameters should not only depends on LULC, but also on climate (e.g.: mean annual snowfall for snow insulation), topography (e.g.: plain vs mountain for surface energy exchange) and hydrology (e.g.: soil moisture for the ratio of soil thermal conductivity in frozen to thawed states). For instance in all LULC projections there are forested areas both at the South East and North West limits of the region of interest, for which may be climate, topography and hydrology are significantly different. Do all these forested areas share the same nf, nt and rk empirical parameters? If yes, how to evaluate the biases in projected MAGT related to climatic, topographic and hydrological variability?
  
  R11: Thanks for your kind suggestion. This concern is well taken. It is acknowledged that deriving spatially varying n_t, n_f, and r_k fields that explicitly account for climate, topography, and hydrology would ideally require dense observational constraints (e.g., multi-depth ground temperature/TTOP measurements and long-term surface energy–snow–moisture observations). However, such observational data are sparse across Northeast China, which limits the feasibility of reliably calibrating fully spatially distributed parameter fields.
  In the revised manuscript, this limitation will be stated explicitly, and the role of n_t, n_f, and r_k will be clarified as effective parameters used for a first-order representation of surface controls. The climate signal is primarily introduced through the CMIP6-driven FDD/TDD forcing in the TTOP framework, while the LULC-based parameter assignment represents the dominant land-surface modulation under data limitations.
  To address the potential bias associated with within-class heterogeneity (e.g., soil/topographic/hydrological conditions), an uncertainty/sensitivity analysis will be added by perturbing n_t, n_f, and r_k within plausible ranges (informed by the literature) and propagating this variability into MAGT projections. Where feasible, a simple sub-classification within major LULC categories using static proxies (e.g., elevation/terrain or wetness indicators) will also be considered to reduce the assumption of uniform parameters across the region. The implications of these assumptions and the resulting uncertainty will be discussed in the Discussion section.
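The perturbation analysis outlined above can be sketched as a simple Monte Carlo experiment. This is an illustration under stated assumptions, not the implementation used in the study: it uses the commonly cited two-branch form of the TTOP equation, takes r_k as the thawed-to-frozen conductivity ratio (the manuscript's exact convention should be checked against its own equation), and samples each parameter uniformly. The parameter ranges in the usage example are placeholders, not literature values, and the names `ttop` and `ttop_spread` are hypothetical.

```python
import random
import statistics

PERIOD_DAYS = 365.0  # annual period used to convert degree-days to deg C


def ttop(tdd, fdd, n_t, n_f, r_k):
    """Equilibrium TTOP (deg C) from thawing/freezing degree-days.

    Assumes r_k = k_thawed / k_frozen (an assumed convention).
    """
    numerator = r_k * n_t * tdd - n_f * fdd
    if numerator <= 0.0:
        # Permafrost branch: thermal offset acts on the thawing term.
        return numerator / PERIOD_DAYS
    # Seasonally frozen branch: offset acts on the freezing term.
    return (n_t * tdd - n_f * fdd / r_k) / PERIOD_DAYS


def ttop_spread(tdd, fdd, ranges, n_samples=1000, seed=0):
    """Propagate uniform parameter perturbations into TTOP.

    ranges maps "n_t", "n_f", "r_k" to (low, high) tuples.
    Returns the sample mean and the 5th/95th percentiles.
    """
    rng = random.Random(seed)
    samples = sorted(
        ttop(
            tdd,
            fdd,
            rng.uniform(*ranges["n_t"]),
            rng.uniform(*ranges["n_f"]),
            rng.uniform(*ranges["r_k"]),
        )
        for _ in range(n_samples)
    )
    return {
        "mean": statistics.fmean(samples),
        "p05": samples[int(0.05 * n_samples)],
        "p95": samples[int(0.95 * n_samples)],
    }


# Placeholder ranges for one land-cover class (illustrative only).
forest_ranges = {"n_t": (0.8, 1.0), "n_f": (0.3, 0.6), "r_k": (0.6, 1.0)}
print(ttop_spread(tdd=2000.0, fdd=3000.0, ranges=forest_ranges))
```

Running this per grid cell or per LULC class would give a spread in TTOP that can be set against the spread across the CMIP6 ensemble.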
  
  Q12: L231-232 : “We then constructed a training dataset using the TTOP-estimated MAGT as the target variable, allowing the supervised learning models to capture the relationships between environmental predictors and ground temperature.” I understand that the proposed PIML approach aims at taking into account the variability mentioned in my previous comment. However, using TTOP results as target variable, which means the variable that the ML should reproduce based on the other ones I presume (I am not a ML specialist, sorry if I misunderstood this), I cannot see how the shortcomings of TTOP-estimation itself as put forward in my previous comment could be dealt with.
  
  R12: Thank you for your valuable comment. We agree that, as originally phrased, the description could be interpreted as training the machine-learning (ML) models to reproduce the TTOP-estimated MAGT, which would indeed limit the ability of the framework to address the intrinsic shortcomings of the TTOP model itself.
  In the revised implementation, TTOP-estimated MAGT is no longer treated as the sole target variable that the ML component aims to reproduce. Instead, TTOP is used as a physically motivated baseline, while independent permafrost survey data (permafrost presence/absence) are introduced as an external supervisory constraint. Specifically, the learning objective is reformulated such that the ML model learns a correction term to the TTOP-based ground thermal state, guided by observational evidence of permafrost occurrence.
  Practically, this is achieved by first constructing spatially continuous climate forcing fields from meteorological station observations using regression-based methods that incorporate latitude, longitude, and topographic controls. These fields are then used to drive the TTOP model to obtain a baseline estimate of MAGT. The ML component does not aim to reproduce this baseline, but rather to adjust it so that the resulting thermal state is consistent with observed permafrost presence or absence at survey locations. In this way, the supervisory signal for ML training is provided by independent field observations rather than by TTOP itself.
  This reformulation allows the framework to explicitly correct systematic biases in TTOP arising from simplified parameterizations or unresolved surface heterogeneity, while retaining the physical interpretability of a process-based baseline. Consequently, the proposed approach no longer propagates TTOP uncertainties uncritically, but instead constrains the modeled ground thermal state using independent observational evidence, thereby directly addressing the concern.
  In addition, the choice of permafrost survey data (presence/absence) as the primary observational constraint, rather than borehole ground temperature measurements, is motivated by the availability and spatial representativeness of observations in Northeast China. In this region, boreholes providing reliable long-term ground temperature records or mean annual ground temperature (MAGT) estimates are extremely sparse, with only several tens of sites available, and their spatial distribution is highly uneven. Most boreholes are concentrated along transportation corridors or at a limited number of experimental sites, which makes them insufficient to support regional-scale statistical learning and spatial generalization.
  By contrast, the permafrost survey dataset compiled in this study consists of nearly 1,000 observation points with substantially broader spatial coverage. These survey data originate from multiple independent sources, including geotechnical investigations conducted for road and infrastructure construction, permafrost records reported in the published literature, and permafrost-related observations associated with meteorological stations and their surroundings. Although such survey data are typically limited to binary information on permafrost presence or absence and do not provide continuous ground temperature measurements, they offer a much more comprehensive representation of regional permafrost occurrence patterns.
  Under these data constraints, incorporating permafrost survey information as a weak supervisory signal provides an effective means of constraining permafrost occurrence at the regional scale, while avoiding overreliance on a small number of spatially clustered boreholes. Borehole observations, where available, are more suitably used as local-scale strong constraints or independent validation data. This strategy balances observational reliability with spatial coverage and reflects the practical realities of data availability in Northeast China.
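To make the reformulated learning objective concrete, the following minimal sketch fits a correction term to a TTOP baseline using only binary presence/absence labels. It is an assumption-laden illustration, not the revised framework itself: the correction is a linear function of the predictors (standing in for whatever ML model is finally selected), the probability of permafrost is taken as a logistic function of the corrected temperature, and the names `fit_correction`, `accuracy`, and the `scale` parameter are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_correction(ttop, features, present, scale=1.0, lr=0.2, n_iter=1000):
    """Learn delta(x) = w.x + b so that sigmoid(-(ttop + delta) / scale)
    matches observed permafrost presence (1) or absence (0).

    Trained by gradient descent on the mean binary cross-entropy.
    """
    n, d = len(ttop), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for t, x, y in zip(ttop, features, present):
            corrected = t + b + sum(wj * xj for wj, xj in zip(w, x))
            p = sigmoid(-corrected / scale)
            # d(loss)/d(corrected); colder corrected state -> higher p.
            g = -(p - y) / scale
            grad_b += g
            for j in range(d):
                grad_w[j] += g * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def accuracy(ttop, features, present, w, b):
    """Share of sites where (corrected temperature <= 0) matches the survey."""
    hits = 0
    for t, x, y in zip(ttop, features, present):
        corrected = t + b + sum(wj * xj for wj, xj in zip(w, x))
        hits += int((corrected <= 0.0) == bool(y))
    return hits / len(ttop)
```

Calling `accuracy` with zero weights and zero bias gives the skill of the uncorrected TTOP baseline, so the gain from the survey-constrained correction can be reported directly.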
  
  Q13: L284-285, Figure 1 : “The indices represent the arithmetic averages computed from site-level downscaled data at 225 meteorological stations, using the delta downscale method applied across 14 CMIP6 models.” I think that what ‘averages’ means exactly here should be clarified. I guess that the model specific curves present averages across the 225 sites, while scenario averages are averages across the previous 14 climate models-specific averages (averages of averages). What are exactly the different averages presented in Figure 1 should be explained without ambiguity.
  
  R13: Thanks for your helpful advice. Your understanding is correct: the model-specific curves represent arithmetic averages across the 225 meteorological stations, while the scenario means are calculated as averages across the 14 CMIP6 model-specific means. This hierarchical averaging procedure will be clarified explicitly in the revised manuscript to avoid any ambiguity.
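The two-level averaging confirmed above can be written out as a short sketch. The data layout (a dictionary mapping each model to a list of per-station annual series) and the function names are assumptions made for illustration; the study's actual data structures are not described in this discussion.

```python
from statistics import fmean


def station_mean_per_model(series_by_model):
    """First level: for each model, average each year across stations.

    series_by_model: {model_name: [station_series, ...]}, where every
    station_series is a list of annual index values of equal length.
    """
    return {
        model: [fmean(year_values) for year_values in zip(*stations)]
        for model, stations in series_by_model.items()
    }


def scenario_mean(model_means):
    """Second level: average the model-specific means across models."""
    return [fmean(year_values) for year_values in zip(*model_means.values())]


# Toy example: 2 models, 2 stations, 2 years (real case: 14, 225, many).
afi = {
    "model_a": [[1.0, 2.0], [3.0, 4.0]],
    "model_b": [[5.0, 6.0], [7.0, 8.0]],
}
per_model = station_mean_per_model(afi)
print(per_model)                 # {'model_a': [2.0, 3.0], 'model_b': [6.0, 7.0]}
print(scenario_mean(per_model))  # [4.0, 5.0]
```

Because every model is evaluated at the same set of stations, this average of averages equals the pooled mean over all model and station pairs.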
  
  Q14: L288 : “To further examine the spatial dynamics of freeze–thaw responses, we have conducted a case analysis under the low-emission scenario SSP126 (Fig. 2).” Why this one ? This should be explained. In fact I would be interested by the same analysis for the other scenarios, at least also for scenario SSP585.
  
  R14: Thanks for your helpful comment. We agree that the rationale for selecting SSP126 should be clarified. In the revised manuscript, corresponding results under the high-emission scenario SSP585 will be added to provide a comparative perspective and better illustrate scenario-dependent freeze–thaw dynamics.
  
  Q15: L300 : Using an absolute variation visualization rather than a negative one for AFI and a positive one for ATI may improve the readability of the Figure. I would also say that the figure is too information-rich, I would recommend to show only the 2020-2100 deltas, since the other ones are not commented in the body of the text. These are only presentation suggestions.
  
  R15: Thanks for your helpful suggestions. We agree that showing absolute variations, rather than negative values for AFI and positive values for ATI, can improve the readability of the figure. In the revised manuscript, we will simplify the figure by focusing on the 2020–2100 deltas and adjust the visualization accordingly.
  
  Q16: L323 : “To evaluate the capability of different ML algorithms in simulating TSP, we have compared model-predicted MAGTs with observed values using six ML methods”. What is exactly observed values here ? I guess each blue dot in Figure 4 corresponds to a given site, for a 1961-2020 multi-annual average, right ? I would also be interested by the performances analyzed by year, the same graphics but with each dot representing the average of the MAGT across the 225 sites for a given year between 1961-2020, in order to visualize temporal variability of performances as well.
  
  R16: Thanks for this comment. In the original formulation, the term “observed values” referred to site-level multi-annual mean MAGT estimates for the period 1961–2020, with each point in Figure 4 representing a single site. However, as clarified in the revised framework, the modeling strategy has been fundamentally adjusted.
  Specifically, the machine-learning component is no longer trained or evaluated against site-level MAGT values. Instead, it is weakly constrained by independent permafrost survey data (presence/absence), which are used to calibrate the physically consistent relationship between environmental conditions and permafrost occurrence. Accordingly, model evaluation is reformulated to focus on classification-based validation using survey observations (e.g., ROC/AUC and related metrics), rather than direct comparisons between predicted and observed MAGT.
  Under this revised validation scheme, performance is assessed in terms of the model’s ability to correctly discriminate permafrost presence and absence at survey locations, while the corrected MAGT is treated as an intermediate state variable ensuring physical consistency rather than a quantity subject to direct temporal validation. Therefore, year-by-year evaluations of MAGT across sites are no longer central to the validation strategy and are not included in the revised analysis.
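As an illustration of the classification-based validation mentioned above, the area under the ROC curve can be computed directly from survey labels and model scores. This is a minimal sketch assuming the pairwise (Mann-Whitney) definition of AUC; the function name `roc_auc` is hypothetical, and the toy labels and scores are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for permafrost present, 0 for absent at each survey point.
    scores: model score for presence (higher = more likely present).
    Equals the probability that a random 'present' site is scored above
    a random 'absent' site, with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be represented")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form costs O(n²) comparisons, which remains cheap for a survey dataset on the order of 1,000 points.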
  
  Q17: 335-336 : “The high-performing models are subsequently employed to simulate future MAGT patterns and permafrost extent under projected scenarios.” Please list these high-performing models.
  
  R17: Thanks for this comment. We agree that the specific high-performing models should be explicitly identified. In the revised manuscript, we will clearly list the selected models and clarify the criteria used to define their performance.
  
  Q18: L351: “Projected mean annual ground temperatures (MAGTs) across Northeast China under four SSPs scenarios based multilayer perceptron model.” What are exactly these projections ? Averages of the best performing PIML models selected in the previous section? May be the ‘multilayer perceptron model’ is the answer, but I don’t know what is it, and I think that many readers of TC won’t know either.
  
  R18: Thanks for this comment. We agree that the description of these projections and the role of the multilayer perceptron model were not sufficiently clear. In the revised manuscript, we will clarify how the projected MAGTs are derived, explicitly identify the model used, and provide a brief explanation of the multilayer perceptron approach to ensure accessibility for a broad readership.
  
  Q19: L360: “the MLP-TTOP model” First occurrence of this acronym/name. It should be introduced earlier in the method section.
  
  R19: Thanks for pointing this out. We will carefully review the revised manuscript to ensure that all acronyms and model names, including MLP-TTOP, are clearly introduced at their first occurrence and used consistently throughout the text.
  
  Q20: L365 : What is DXAM? L366 : What is XXAM?
  
  R20: Thanks for raising this point. DXAM and XXAM refer to the Da Xing’anling Mountains and the Xiao Xing’anling Mountains, respectively. In the revised manuscript, these abbreviations will be replaced by their full names at first mention to avoid any potential confusion.
  
  Q21: L375-376 : “(a) total Northeast China; (b) Xiao Xing'anling Mountains; (c) Da Xing'anling Mountains; (d) northern Songhua-Nen rivers Plain, and; (e) Hulun Buir Plateau” The Figure 7 would be much more interesting if it was providing alongside a map that show the localization and extent of the considered four sub-regions of Northeast China.
  
  R21: Thanks for this helpful suggestion. In the revised manuscript, we will add a map illustrating the locations and spatial extents of the four sub-regions of Northeast China to improve the clarity of Figure 7.
  
  Q22: Figure 7: In 7a, why are there increases in discontinuous permafrost area between 2040 and 2080? In 7d, why are there more permafrost in 2040 in SSP370 than in SSP126 and SSP245?
  
  R22: Thanks for this comment. These features may partly reflect uncertainties associated with land-use and land-cover changes, particularly in transitional permafrost zones. As discussed earlier, in the revised manuscript we implement a weakly constrained PIML framework calibrated with permafrost survey data, which is expected to better constrain such uncertainties and improve the robustness of the projected permafrost dynamics.
  
  Q23: L379 – 380 : “Fig. 8 summarizes the relative importance of 15 environmental variables across six ML models.” Why not considering here only the best performing PIML-models as selected in section 3.2?
  
  R23: Thanks for this comment. The original intention of Figure 8 was to compare the variability in driver importance across different ML models and to highlight the overall patterns of environmental controls. However, we agree that focusing on the best-performing PIML model would provide a clearer and more consistent interpretation. Accordingly, in the revised manuscript, this analysis will be revised to present variable importance results derived only from the selected optimal model.
  
  Q24: L389-391 : “In contrast, topographic and edaphic variables, such as slope angle, slope aspect, soil organic matter contents (SOC), bulk density (BD), and sand and clay contents, generally rank lower, though they may modulate soil thermal properties and hydrological processes as secondary controls.” I think it must be said here that these topographic and edaphic variables do exert strong control on the two most influential variables, mean annual land surface temperature (MALST) and land use and land cover (LULC). So in fact most likely a (large?) part of their intrinsic influence is encompassed in the influence of MALST and LULC.
  
  R24: Thanks for this insightful comment. We agree that topographic and edaphic variables can exert strong indirect controls on MALST and LULC, and that a substantial part of their influence may therefore be implicitly reflected through these dominant variables. This point will be clarified explicitly in the revised manuscript.
  
  Q25: L423 : “thereby underestimating the scale and impact of anthropogenic disturbances.” LUCC may also derive solely from climate change. Deciphering the scale and impact of anthropogenic disturbances would require here to separate the climate change-induced land use effects and the anthropogenic land use change effects.
  
  R25: Thanks for this important comment. We agree that land-use and land-cover change may also be driven by climate change, and that our original statement did not sufficiently distinguish between climate-induced and anthropogenic effects. This point will be revised and clarified in the revised manuscript.
  
  Q26: L443 : “a PIML model” In fact several PIML models have been used, and in its present form the study does not make a clear choice about which is the best one that could be preferred to the others. I think that this choice should be made, or this sentence should be reformulated.
  
  R26: Thanks for this comment. We agree that the wording is ambiguous given the use of multiple PIML models. In the revised manuscript, we will clarify the model selection and reformulate this sentence accordingly.
  
  Q27: L473-474 : “Rather, such signals reflect the thermal inertia of geocryological system prior to abrupt transitions.” The study convincingly grounds this statement for Northeast China permafrost, but its generalization to other permafrost contexts maybe not straightforward. I would attenuate the wording here for avoiding over interpretation of the results.
  
  R27: Thanks for this thoughtful comment. We agree that the original wording may overgeneralize the interpretation beyond Northeast China. Accordingly, the statement will be attenuated and revised in the manuscript to better reflect the regional scope of the results.
  
  Q28: L477 : “4.3 Spatial heterogeneity and resilience of XAP” Please avoid acronyms in titles as much as possible.
  
  R28: Thanks for this suggestion. We will carefully review the revised manuscript to ensure that acronyms are avoided in figure titles, table titles, and subsection headings whenever possible, and that full names are used instead to improve clarity.
  
  Q29: L501-503 : “Hydrogeological conditions further affect thermal stability: long-term drying and surface drainage can increase ground albedo and reduce soil thermal conductivity, thereby cooling the ground.” I would highlight that the effect of forest on soil hydrology is rather complex, since for instance the root network enhances infiltration while at the same time drying the soil through evapotranspiration. Moreover, in permafrost contexts, these competing effects interact in a complex manner with active layer dynamics (e.g., Orgogozo et al., 2019, sorry for the self-citation).
  https://doi.org/10.1002/ppp.1995
  
  R29: Thanks for this helpful suggestion. We appreciate your highlighting of the complexity of forest effects on soil hydrology in permafrost environments. We will carefully review the reference and revise this statement accordingly.
  
  Q30: L566 : “Compared to traditional physically based models, the PIML framework yields superior performance” This study grounds this statement solely for TTOP model, and this should be made clear in this paragraph.
  
  R30: Thanks for this comment. We agree that the comparison is specific to the TTOP model. This will be clarified explicitly in the revised manuscript to avoid overgeneralization.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-AC4
RC3: 'Comment on egusphere-2025-4544', Anonymous Referee #3, 05 Jan 2026

General Comments

I carefully reviewed this manuscript. Unfortunately, it suffers from critical conceptual and physical flaws. The core of the PIML framework is built upon the TTOP model, which is being used outside of its physical validity range and fails to account for the transient hydrological and thermal dynamics necessary for century-scale projections.

1 The study uses TTOP-estimated values as the ground truth for the ML models. Therefore, the ML is not learning the physics of permafrost; it is simply learning to approximate the TTOP equation. Any inherent biases in the TTOP model are inherited and potentially amplified. The accuracy reported in the manuscript is merely a measure of how well the ML fits the TTOP formula, not how well it predicts real-world permafrost.

2 TTOP is an equilibrium model that does not calculate soil moisture dynamics. By the end of the 21st century, precipitation patterns are projected to vary significantly. These changes will fundamentally alter the latent heat exchange and the energy balance, which TTOP ignores.

3 The manuscript treats soil thermal conductivity as a semi-static parameter. However, thermal conductivity is highly dependent on water/ice content. As soil moisture varies over a century, the ratio of frozen to thawed conductivity will shift nonlinearly. Treating this as a constant or a narrow random sample invalidates the long-term reliability of the projection.

4 In geocryology, MAGT refers to the temperature at the depth of zero annual amplitude. The TTOP calculates the temperature at the permafrost table. These are not interchangeable.

5 The manuscript fails to identify its primary engine. Line 333 mentions CatBoost, but Figure 5 captions suggest a Multilayer Perceptron (MLP).

6 While dynamic LUCC is included, there is no quantitative attribution. It is impossible to determine if the projected 90% loss of permafrost is driven by climate warming, land-use change, or simply the mathematical sensitivity of the PIML framework.

Some specific comments:

7 Clearly explain the specific source of "observed MAGT" in Figure 4.

8 The discussion must compare these equilibrium results with existing transient studies. Permafrost degradation is a slow process; a 90% loss by 2100 seems physically improbable when considering the thermal inertia of deep ground ice.

9 Figure 6: The color mapping is poor; the spatial continuity differences of permafrost are indistinguishable.

10 Figure 7: Provide a spatial reference map for the four sub-regions; currently, the regional analysis lacks geographic context.

11 Provide a physical justification for how r_k is expected to evolve as soil moisture regimes change under SSP5-8.5.

Citation: https://doi.org/10.5194/egusphere-2025-4544-RC3
- EC1: 'Reply on RC3', Heather Reese, 08 Jan 2026
  
  As the Editor, I strongly encourage the authors to consider all reviewer comments, of course, but this comment in particular. Several publications from 2025 address the limitations of TTOP, and the major revision required for this manuscript must take seriously both the intended function of TTOP models and the results of using them.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-EC1
- AC5: 'Reply on RC3', Shuai Huang, 09 Feb 2026
  
  Response Letter to Reviewers’ Comments
  General Comments
  
  I carefully reviewed this manuscript. Unfortunately, it suffers from critical conceptual and physical flaws. The core of the PIML framework is built upon the TTOP model, which is being used outside of its physical validity range and fails to account for the transient hydrological and thermal dynamics necessary for century-scale projections.
  
  Q1: The study uses TTOP-estimated values as the ground truth for the ML models. Therefore, the ML is not learning the physics of permafrost; it is simply learning to approximate the TTOP equation. Any inherent biases in the TTOP model are inherited and potentially amplified. The accuracy reported in the manuscript is merely a measure of how well the ML fits the TTOP formula, not how well it predicts real-world permafrost.
  
  R1: Thanks for your critical and constructive comment. We acknowledge that, in the original formulation, using TTOP-estimated MAGT as the sole target variable could indeed be interpreted as training the machine-learning (ML) component to approximate the TTOP formulation rather than to learn physically meaningful permafrost behavior. We agree that such a setup would limit the ability of the framework to address intrinsic biases and physical simplifications in the TTOP model.
  In the revised framework, this limitation is explicitly addressed by fundamentally reformulating the role of both TTOP and ML. TTOP is no longer treated as ground truth or as the primary supervisory signal. Instead, it is used strictly as a physically motivated baseline that provides a first-order estimate of the ground thermal state under given climatic forcing. The ML component is no longer trained to reproduce TTOP outputs.
  Crucially, independent permafrost survey data (the presence or absence of permafrost), compiled from engineering investigations, published records, and station-based observations, are introduced as an external observational constraint. These data provide the primary supervision during model calibration, forcing the learning process to reconcile the TTOP-based thermal baseline with real-world permafrost occurrence. In this way, the ML model learns a correction term that compensates for systematic biases in TTOP arising from simplified parameterizations and unresolved surface and hydrological heterogeneity.
  Under this revised formulation, the learning objective is no longer to fit the TTOP equation, but to enforce physical consistency between modeled thermal states of permafrost and independent observations of permafrost existence. As a result, the predictive skill of the framework is evaluated against survey-based permafrost presence/absence data using classification-based metrics, rather than against TTOP-derived quantities. This change ensures that model performance reflects agreement with observational evidence rather than fidelity to the TTOP formulation itself.
  While we acknowledge that the framework does not resolve fully transient hydrothermal processes at depth, the revised approach represents a pragmatic strategy for regional- to continental-scale permafrost projections under data-scarce conditions. By combining a physically interpretable baseline with observation-constrained learning, the updated PIML framework avoids circular learning and provides a more realistic representation of permafrost occurrence than either purely process-based or purely data-driven approaches alone.
  
  Q2: TTOP is an equilibrium model that does not calculate soil moisture dynamics. By the end of the 21st century, precipitation patterns are projected to vary significantly. These changes will fundamentally alter the latent heat exchange and the energy balance, which TTOP ignores.
  
  R2: Thanks for your helpful comment highlighting this important physical limitation of the TTOP model. We fully agree that TTOP is an equilibrium model and does not explicitly represent transient soil moisture dynamics or associated latent heat exchanges. Under future climate change, projected alterations in precipitation regimes may indeed affect soil thermal properties and energy balance in ways that cannot be directly resolved by TTOP alone.
  This limitation is explicitly acknowledged in the revised manuscript, and it motivates the revised design of our PIML framework. Rather than relying on TTOP as a complete physical representation of future hydrothermal processes, TTOP is used only as a first-order thermal baseline. The ML component is then constrained by independent permafrost survey observations and environmental predictors (e.g., land cover, soil parameters, and topographic controls), which implicitly capture the integrated effects of soil moisture and surface–subsurface interactions on permafrost occurrence.
  While we do not claim that the revised framework explicitly simulates transient hydrological processes, the weakly constrained PIML approach provides a pragmatic means to account for their net influence at the regional scale, where detailed hydrothermal observations and fully coupled process-based models remain unavailable. We emphasize that future extensions incorporating dynamic hydrological modeling would further improve physical realism, and this limitation is now clearly discussed in the revised manuscript.
  
  Q3: The manuscript treats soil thermal conductivity as a semi-static parameter. However, thermal conductivity is highly dependent on water/ice content. As soil moisture varies over a century, the ratio of frozen to thawed conductivity will shift nonlinearly. Treating this as a constant or a narrow random sample invalidates the long-term reliability of the projection.
  
  R3: Thanks for raising this key issue. In our framework, soil thermal conductivity is represented through the parameter rₖ in the TTOP formulation, which reflects the effective ratio between frozen and thawed soil thermal conductivity. We totally agree that rₖ is strongly influenced by soil water and ice content, and that long-term changes in soil moisture may induce nonlinear variations that are not explicitly resolved by an equilibrium model such as TTOP.
  In the original implementation, rₖ was treated as a semi-static parameter to maintain consistency with the standard TTOP formulation and to avoid introducing poorly constrained degrees of freedom at the regional scale. We acknowledge that this simplification limits the ability of PIML to explicitly represent transient soil moisture dynamics and associated nonlinear changes in thermal conductivity.
  In the revised framework, this limitation is partially mitigated by the weakly constrained PIML approach. Rather than assuming rₖ alone can represent the full variability of soil thermal properties, the ML component learns correction terms conditioned on environmental predictors (e.g., land cover, soil parameters, and topographic controls) and constrained by independent permafrost survey observations. These predictors implicitly capture the integrated effects of soil moisture regimes and surface conditions on ground thermal behavior, thereby reducing the sensitivity of long-term projections to a fixed or narrowly sampled rₖ.
  We emphasize that the revised framework does not explicitly simulate the nonlinear evolution of soil thermal conductivity under changing hydrological conditions. This limitation is now clearly acknowledged in the revised manuscript, and incorporating dynamically varying soil thermal properties remains an important direction for future work.
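The sensitivity to rₖ discussed here can be illustrated with a minimal sketch of the standard TTOP formulation from the literature, in which rₖ is the thawed-to-frozen conductivity ratio and n_t and n_f are the thawing and freezing n-factors. The manuscript's own notation and parameter values may differ; the function name and all numbers below are illustrative assumptions, not values from the study.

```python
def ttop(tdd: float, fdd: float, n_t: float, n_f: float,
         r_k: float, period_days: float = 365.0) -> float:
    """Equilibrium temperature at the top of permafrost (deg C).

    tdd, fdd : air thawing / freezing degree-days (deg C day, both positive)
    n_t, n_f : thawing / freezing n-factors (surface-to-air scaling)
    r_k      : thawed-to-frozen thermal conductivity ratio (standard convention)
    """
    numerator = r_k * n_t * tdd - n_f * fdd
    if numerator <= 0:
        # Permafrost case: the thermal offset scales the thawing term.
        return numerator / period_days
    # Seasonally frozen case: the offset scales the freezing term instead.
    return (n_t * tdd - n_f * fdd / r_k) / period_days


# Hypothetical site: same climate and surface, only r_k varies.
for r_k in (0.6, 0.8, 1.0):
    value = ttop(tdd=1800.0, fdd=3000.0, n_t=0.9, n_f=0.5, r_k=r_k)
    print(f"r_k = {r_k:.1f} -> TTOP = {value:+.2f} deg C")
```

In this made-up case the sign of TTOP changes as rₖ moves from 0.6 to 1.0, which shows why a fixed or narrowly sampled rₖ can matter near the permafrost margin.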
  
  Q4: In geocryology, MAGT refers to the temperature at the depth of zero annual amplitude. The TTOP calculates the temperature at the permafrost table. These are not interchangeable.
  
  R4: Thanks for this important clarification. We agree that, in geocryology, MAGT conventionally refers to the temperature at the depth of zero annual amplitude (D_ZAA), whereas T_TOP estimates the temperature at the table of attached permafrost, and these quantities are not strictly interchangeable. In this study, the term MAGT_TP was used to denote the temperature at the permafrost table as derived from the T_TOP framework. TTOP also differs from MAGT_PT when permafrost is detached owing to the formation of a supra-permafrost subaerial talik (SST) as a result of persistent, climate-induced permafrost degradation. Although MAGT can be physically meaningful at any depth, in geocryology the term is conventionally reserved for the mean annual ground/soil temperature at D_ZAA. We will clarify this distinction explicitly in the revised manuscript and revise the notation accordingly to avoid ambiguity.
  
  Q5: The manuscript fails to identify its primary engine. Line 333 mentions CatBoost, but Figure 5 captions suggest a Multilayer Perceptron (MLP).
  
  R5: Thanks for this comment. We agree that the primary modeling engine was not clearly identified. In the revised manuscript, we will explicitly identify and present the selected optimal model, and we will ensure that its description is consistent throughout the text and figures.
  
  Q6: While dynamic LUCC is included, there is no quantitative attribution. It is impossible to determine if the projected 90% loss of permafrost is driven by climate warming, land-use change, or simply the mathematical sensitivity of the PIML framework.
  
  R6: Thanks for this important comment. We agree that quantitative attribution is necessary to disentangle the effects of climate warming and land-use change, or the interaction of the two. In the revised manuscript, we will introduce additional comparative experiments, with and without dynamic LUCC forcing, to explicitly quantify the contribution of LUCC to projected permafrost changes and to assess its relative influence within the PIML framework.
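As an illustration of how such comparative experiments could be turned into an attribution, the sketch below decomposes a projected change in permafrost extent into a climate term, a LUCC term, and an interaction term. It assumes a 2×2 factorial design (climate and LUCC each either held at baseline or allowed to evolve), which goes beyond the with/without-LUCC comparison described in the response; the function name and all extents are hypothetical.

```python
from typing import Dict


def attribute_change(baseline: float, climate_only: float,
                     lucc_only: float, combined: float) -> Dict[str, float]:
    """Split the total change in permafrost extent into additive parts.

    baseline     : extent with present-day climate and present-day land cover
    climate_only : extent with future climate, land cover held static
    lucc_only    : extent with future land cover, climate held static
    combined     : extent with both forcings evolving
    """
    total = combined - baseline
    climate = climate_only - baseline
    lucc = lucc_only - baseline
    # Whatever the two single-factor runs do not explain is the interaction.
    interaction = total - climate - lucc
    return {"total": total, "climate": climate,
            "lucc": lucc, "interaction": interaction}


# Made-up extents in arbitrary units (baseline scaled to 100).
parts = attribute_change(baseline=100.0, climate_only=15.0,
                         lucc_only=92.0, combined=10.0)
```

With only the two runs named in the response (with and without dynamic LUCC), the LUCC contribution under future climate reduces to `combined - climate_only`, and the interaction cannot be separated from it.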
  
  Some specific comments:
  
  Q7: Clearly explain the specific source of "observed MAGT" in Figure 4.
  
  R7: Thanks for this keen and kind reminder. We agree that the source of the “observed MAGT” in Figure 4 was not clearly explained. In the revised manuscript, we will explicitly clarify the data source and revise the figure and related text to ensure consistency with the updated validation strategy.
  
  Q8: The discussion must compare these equilibrium results with existing transient studies. Permafrost degradation is a slow process; a 90% loss by 2100 seems physically improbable when considering the thermal inertia of deep ground ice.
  
  R8: Thanks for this important and insightful comment. We agree that the TTOP-based framework primarily reflects the response of near-surface permafrost, and that the thermal inertia of deep ground ice implies a much slower degradation of permafrost layers at depth. Consequently, the actual total loss of permafrost by 2100 is likely substantially smaller than the near-surface estimate suggested by equilibrium results. In the revised manuscript, we will systematically compare our findings with existing transient permafrost studies and revise the discussion to better contextualize these results and their physical implications.
  
  Q9: Figure 6: The color mapping is poor; the spatial continuity differences of permafrost are indistinguishable.
  
  R9: Thanks for this kind remark. We agree that the current color mapping/contrast does not clearly convey spatial continuity differences. In the revised manuscript, we will redesign the color scheme of Figure 6 to improve visual contrast and interpretability.
  
  Q10: Figure 7: Provide a spatial reference map for the four sub-regions; currently, the regional analysis lacks geographic context.
  
  R10: Thanks for this suggestion. In the revised manuscript, we will add a spatial reference map showing the locations and extents of the four sub-regions to provide clearer geographic context for Figure 7.
  
  Q11: Provide a physical justification for how r_k is expected to evolve as soil moisture regimes change under SSP5-8.5.
  
  R11: Thanks for this important question. We agree that the parameter r_k, which represents the effective ratio between frozen and thawed soil thermal conductivity, is physically expected to evolve with changing soil-moisture regimes under SSP5-8.5. However, given the lack of spatially explicit and long-term projections of soil moisture and ice content at the regional scale, r_k is treated as an effective parameter rather than a prognostic variable in the current framework. In the revised manuscript, we will clarify this physical interpretation, explicitly discuss the expected direction of change in r_k under wetter or drier conditions, and acknowledge this limitation as an important source of uncertainty to be addressed in future work.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4544-AC5

Supplement

https://doi.org/10.5194/egusphere-2025-4544-supplement

Short summary

Permafrost in Northeast China is rapidly degrading due to climate warming and land use changes, threatening ecosystems and infrastructure. We developed a physics-informed machine learning framework that integrates climate and land cover data with physical models to predict permafrost evolution. Results show that up to 97 % of near-surface permafrost may disappear by 2100 under high emissions, while forests and mountains provide partial resilience.

