Assessing Climate Modeling Uncertainties in the Siberian Frozen Soil Regions by Contrasting CMIP6 and LS3MIP
Abstract. Climate models and their land components still show pervasive discrepancies in frozen soil simulations. Contrasting the historical runs of seven land-only models of the Land Surface, Snow, and Soil Moisture Model Intercomparison Project (LS3MIP) with their Coupled Model Intercomparison Project Phase 6 (CMIP6) counterparts allows quantifying the contributions of the land surface parameterization scheme and the atmospheric forcing to these discrepancies. The simulation capabilities are assessed using observational data from 152 sites in Siberia and reanalysis data. On average, the 0.2-m soil temperatures in the CMIP6 simulations are 5.4 °C colder than the observations when the simulated soil temperature drops below -5 °C. The LS3MIP simulations are even colder, with a bias of -6.7 °C. In the winter months (December, January, and February), the LS3MIP ensemble diversity in 2-m temperature is smaller than the CMIP6 diversity (0.8 °C vs. 3.2 °C). In contrast, the diversity of winter 0.2-m soil temperatures is larger in the LS3MIP ensemble (5.7 °C) than in the CMIP6 ensemble (3.6 °C). For permafrost sites, the spatial correlation of the simulated winter soil temperature against observations does not exceed 0.7, and the spring/autumn spatial correlations of snow depth are below 0.75 for the CMIP6 models. The biases of 2-m temperature have a different sign and are amplified in magnitude compared to the biases of the soil temperatures, especially below 0 °C. Four of the climate models and their land components underestimate the snow insulation effect. We conclude that land surface models struggle to simulate soil temperatures and snow depth accurately under low-temperature conditions. The CMIP6 models tend to compensate for errors in their land component with errors in the atmospheric model component. For shallow snow depths (0 to 0.2 m), all models show 1 to 8 °C less air-soil temperature difference than the in situ data. Therefore, a better representation of surface-soil insulation is essential for improvements in frozen soil land modeling.
RC1: 'Comment on egusphere-2025-389', Anonymous Referee #1, 13 Feb 2025
This manuscript presents an analysis of four key variables from the LS3MIP simulations in comparison to their counterpart CMIP6 fully coupled simulations. Detailed analysis of this very important MIP has been lacking so far, and it is very encouraging to see the simulations used. Restriction of the analysis to the eastern Arctic permafrost region is well argued with the availability of comprehensive observations. The manuscript presents sound methodological analysis, and can become an important contribution to understanding the intricacies of land model performance in coupled model setups.
The methodology used in the manuscript is appropriate, well applied and mostly well presented. However, the rest of the paper lacks structure, seems written rather carelessly (citations that don’t contain the information they claim to give, sentences that do not work) and does not present conclusions promised in the introduction in a comprehensive way. While I think in general this manuscript can become a valuable contribution to the interpretation of the LS3MIP simulations, it needs substantial improvement in many parts.
I have two general comments concerning the use of reanalysis and concerning the lack of a proper discussion that can be found below, followed by a large number of specific comments to the text.
General comments:
ERA5-Land: The presentation of the ERA5 data in the manuscript seems pointless. The purpose of the manuscript is not to evaluate ERA5 Land against station data. ERA5 Land is also not used for additional validation of the model simulations on a larger spatial scale than the station observations allow. I suggest removing the ERA5 Land contributions from the manuscript, or actually making use of them within their limitations (which would then warrant their evaluation against the station data).
Discussion: The manuscript lacks a proper discussion of its results in a comprehensive way, which also leads to conclusions that seem to have no basis. In the introduction, you state that: (1) "We will analyze the discrepancies between the same model in CMIP6 and LS3MIP to quantify the bias and uncertainty present in frozen soil regions, attributing them to land surface models versus those resulting from atmospheric forcings. With identical and more realistic atmospheric conditions, we anticipate that the LS3MIP models will more accurately simulate soil conditions. If these models fail to produce soil variable outputs that align better with observed data than the CMIP6 simulations, it is regarded as an error in the land surface models." (2) "We will discuss the variations among different models of LS3MIP and try to establish a connection between model performance and their specific features." However, the manuscript ends after the presentation of the results, without coming back to the analysis you promise in a comprehensive way. Since you do not have a discussion section in the manuscript, the conclusions need to contain this discussion (or you need to make a discussion section). Please come back to both points from the introduction, and establish conclusions for both points rooted in your results in an understandable way. Right now, some conclusions and discussion are scattered throughout the results, but it is hard to puzzle them together to a coherent picture.
Specific comments:
1 Introduction
The introduction loosely strings together statements on Arctic climate change and its impacts on the Siberian permafrost region, then jumps to factors that determine permafrost thermal state, and finally barely introduces CMIP6 and LS3MIP. Without knowing all of these things already, and how they are related to the uncertainties, e.g., in the permafrost carbon feedback from climate model projections of the future, it does not tell the reader much, and does not coherently argue the importance of this study. Please outline a clear relation between the facts mentioned in the introduction and the statements about what the paper means to do from the last two paragraphs. I suggest rewriting the introduction completely. In addition, I have a number of specific comments on the introduction below.
Line 20: This is rather vague, please give specific numbers to the magnitude of Arctic Amplification, and cite their sources.
Line 20: While I don't doubt the numbers for Arctic climate change cited from the two papers, and I acknowledge that they are permafrost related publications, these aren't the papers that produced the numbers, and they are seriously outdated. Please cite more recent publications on climate change projections for the Arctic, and cite the direct sources.
Line 22: While it is certainly true that the most distinct impacts occur in the permafrost areas where temperatures are already close to zero, the statement seemingly has no relation to your manuscript, since you focus your detailed analysis on cold regions, not the warmer edge of the permafrost zone, so I don't see the relevance of that statement.
Line 27: Again, this is very vague and lacks an appropriate reference. Please clarify.
Line 28: The point about abrupt thaw is that, from models, we can usually only estimate the carbon emission effects of gradual thaw, but the effects of abrupt thaw are expected to be substantially larger than those of gradual thaw. However, instead of saying that, you simply line up facts with no connection or argument. Please rephrase, and cite appropriate sources.
Line 33: There needs to be at least one general, bridging sentence on how heat transfer through the soil is simulated, and that the following paragraph speaks about modelling.
Line 35: “There are differences in the time scales of major physical processes between the soil and the atmosphere.” Vague, please clarify what you mean.
Lines 33-42: These two paragraphs are a weird mix of processes and conditions controlling heat transfer through the soil, and how these are represented in models. Please separate clearly.
Line 43: Please state what CMIP6 means.
Line 46: There are a number of papers that describe these advances that should be cited here.
Line 47: Please state what LS3MIP means. Also, introduce what LS3MIP aims to do before you dive into the protocol.
2.1 CMIP6 and LS3MIP Simulations
Line 75: Land models treat input data differently, and may require different forcing data sets per se. A table would be nice, in particular since you look at tas, which can be close to/identical to the forcing, or quite different, depending on model setup.
Line 89: Ménard et al. only show snow properties in their paper. The way you cite the paper implies all information in your table can be found there, which is not the case. Please clarify.
Line 89-90: This is very vague again, and the table misses some of the processes mentioned here. E.g., how is vegetation represented, are there Arctic-specific vegetation types, are there shrubs? Please clarify and expand your table.
Line 96: I find this sentence misleading, it implies that models that consider the impact of surface organic matter with a focus on hydro-thermodynamics don't include a carbon cycle, which is for example wrong for CLM5. Please rephrase.
Table 2: Power Function and Quadratic Equation: What does that mean? Either explain somewhere, or use a more descriptive term. What does snow conductivity depend on in these equations?
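For reference, these labels presumably refer to the two common density-dependent families; an illustrative sketch (the coefficients are assumptions that should be checked against each model's documentation, e.g. Yen 1981 for the power law and Anderson 1976 for the quadratic) would be:

```latex
% Power-law form (e.g. Yen, 1981), snow density in kg m^-3:
k_{\mathrm{snow}} \approx 2.22\,\left(\frac{\rho_{\mathrm{snow}}}{1000}\right)^{1.88}\ \mathrm{W\,m^{-1}\,K^{-1}}
% Quadratic form (e.g. Anderson, 1976):
k_{\mathrm{snow}} \approx 0.021 + 2.5\times 10^{-6}\,\rho_{\mathrm{snow}}^{2}\ \mathrm{W\,m^{-1}\,K^{-1}}
```

In both families the conductivity depends only on snow density, which is exactly the information the table should state explicitly, ideally with the parent reference for each scheme.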
3 Results and Discussions
3.1 Winter 2-m Temperature in Target Area
Line 152: The definition used in Lawrence and Slater is the generally accepted definition of permafrost. Quite a number of the stations denoted as circles are actually situated on permafrost. Please explain potential reasons why they are not categorized as permafrost using this definition on the station data.
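To make the requested explanation easier, it would help to show the classification rule as applied to the station data. A minimal sketch of the standard two-consecutive-years criterion (variable names are hypothetical, and the authors should state whether the 0.2-m series or deeper levels are used) could look like:

```python
import pandas as pd

def is_permafrost(tsl_monthly: pd.Series, threshold=0.0, n_months=24) -> bool:
    """Classify a station as underlain by permafrost if the soil temperature
    stays at or below `threshold` (deg C) for at least `n_months` consecutive
    months (24 months = two consecutive years)."""
    frozen = (tsl_monthly <= threshold).astype(int)
    # label consecutive runs of frozen/unfrozen months, then take the
    # length of the longest frozen run
    run_id = (frozen != frozen.shift()).cumsum()
    longest_run = frozen.groupby(run_id).cumsum().max()
    return bool(longest_run >= n_months)
```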
Figure 1: The two triangle stations are very hard to see. In general, the figure would convey more information if the stations were colored by their bias relative to the modelled data instead of their own mean states. Also, it would be useful to show the permafrost boundaries, either from Brown et al. or Obu et al., in the map.
Figure 2: Bars need to be broader, median positions are not visible. Also, the labels have no positions, which makes them meaningless. For precipitation and tas, we could learn a lot from seeing where GSWP3 is, since it is the forcing data.
3.2 Model climatologies
Line 160: Looking at figure 2, I don't see that.
Line 162: This statement is only true for pr. The LSMs compute their own tas. How close that actually is to the forcing depends a lot on what forcing is used (eg temperature at a reference height, or 2m air temperature itself), and on how complex the calculation within the LSM is.
Line 169: What about snow, soil moisture, vegetation? There is a distinct difference between soil temperatures in general and TTOP, which refers to (1) mean annual temperatures and (2) the top of the permafrost table.
Line 171: What does model family mean? Is it based on similarity of the atmospheres, or based on the atmosphere and land components? In your example, both land and atmosphere components of the models you put into one family actually share code history, but since you do not even state if you refer to the LS3MIP or the CMIP6 simulations, the statement is unclear.
Line 179: What is the reason for this difference in snow? Precipitation is similar, at least for winter, and air temperatures differ, but are so far below zero that the difference seems irrelevant. What drives this? Precipitation and temperature in autumn? And why does it only occur for this one model?
Line 180: ±10 cm translates into a relative error of around 33%, which is massive! Please put into context.
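To spell out the arithmetic behind this (assuming the roughly 30 cm climatological mean snow depth that the 33% figure implies):

```latex
\frac{\pm 10\ \mathrm{cm}}{\approx 30\ \mathrm{cm}} \approx \pm 33\,\%
```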
3.2.1 Relative Spread and Relative Bias
Figure 3: Caption states you show all seasons, yet there is only winter and summer in the figure. Also, why is snow in winter similar between L and C even though precipitation differs considerably? Because the medians are the same, and that is what drives snow variability? Please expand.
Figure 4: There is no shading. Correct the caption. Also, as Figure 3, this is not showing all seasons.
Line 186: In general, it is really hard to understand the summer parts of Figures 3 and 4 without an equivalent to figure 2. Maybe provide a summer version of figure 2 in the supplement. Specifically, I think this is meant to read "contrary to JJA where" or something similar. The sentence does not make sense as it is.
Line 193: “The pr in Group C exhibits more extensive group diversity than in Group L.” Which is because in group L, the only difference between the different models is different interpolation of the forcing data set, which makes this statement meaningless.
Line 198: “the model’s bias is considered relatively small” I would suggest to rephrase that into something like "the model's performance is considered adequate", because if the IQR is big enough, very big relative biases could still lead to RBs around 1. In terms of model performance, because you only look at 30 years of data, I agree that this means model performance is adequate, however, the bias would not be small.
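To make the concern concrete (assuming RB is the absolute median bias normalized by the observed interquartile range; the manuscript should confirm the exact definition):

```latex
\mathrm{RB} = \frac{\left|\tilde{x}_{\mathrm{model}} - \tilde{x}_{\mathrm{obs}}\right|}{\mathrm{IQR}_{\mathrm{obs}}}
\quad\Longrightarrow\quad
\mathrm{RB} \approx 1 \ \text{with}\ \mathrm{IQR}_{\mathrm{obs}} = 4\,^{\circ}\mathrm{C}
\ \text{still permits a bias of}\ \approx 4\,^{\circ}\mathrm{C}.
```

In other words, RB only bounds the bias by the spread of the data, not by an absolute accuracy target.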
Line 200: “Almost all CMIP6 and LS3MIP models have a positive pr-bias but a smaller relative and non-systematic snd-bias in winter.” Since snow is not a purely winter phenomenon and snow build-up starts in autumn, I am not sure how much meaning this comparison has. This analysis needs to be extended to snow build-up in autumn.
3.2.2 Spatial Heterogeneity
Figure 5: I find this figure extremely irritating. Figure out the orientation, and re-sort so that maybe there are eight rows and two columns, so that the figure can be read. Also, for tas, the spread in the CMIP ensemble is bigger than the spread in the LS3MIP ensemble. For tsl, it is the other way around. Why? Please expand in the manuscript text.
Line 218: Why would there be a compensating effect like that? The ensemble spread is not particularly strong in your figure. Please explain.
3.3 Permafrost Region
Figure 6: It is impossible to read the labels. If all variables are to be presented in one Taylor diagram, they need to be distinguishable.
3.4 Climate Dependency of Modeled Temperatures
Line 245: In the figure caption, it says 50th quantile, i.e., the median, instead of the mean, which actually makes more sense. Please check.
Line 268: I think this needs to read “Four models …”
Line 270: The reference is misleading, Dutch et al 2022 only discuss simulations with CLM. Please correct.
Line 272: “There is an excessively low tsl shown in Fig.8, possibly due to insufficient geothermal (functions as upward energy flux from the bottom of soil columns). As the decrease in tas has a limited influence on the tsl through high snd, the main source of error is likely from the other side of energy transportation (thermal conditions in the bottom of the soil column).” If that were true, models that consider a non-zero flux condition at the lower boundary would have to perform better than those with zero flux conditions, which is not the case. The depth of the column plays an important role here, as discussed, e.g., in Alexeev et al. (2007), https://doi.org/10.1029/2007GL029536, and more recently Hermoso de Mendoza et al. (2020), https://doi.org/10.5194/gmd-13-1663-2020.
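A back-of-the-envelope estimate illustrates the role of column depth: the damping depth of a temperature cycle of angular frequency ω in soil with thermal diffusivity κ is

```latex
d = \sqrt{\frac{2\kappa}{\omega}}
  \approx \sqrt{\frac{2\times 10^{-6}\ \mathrm{m^{2}\,s^{-1}}}{2\pi / (3.15\times 10^{7}\ \mathrm{s})}}
  \approx 3\ \mathrm{m}
```

for the annual cycle with a typical κ ≈ 10⁻⁶ m² s⁻¹, and much larger for decadal to centennial forcing, so a soil column of only a few metres distorts the deep thermal state regardless of whether a zero or non-zero bottom flux is prescribed.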
Line 275/276: What about the strong underestimation of variability in summer? What is the reason for that?
Line 285: “In contrast, …” I don’t understand that sentence. Please reformulate.
Line 292: That is a really important statement, it should be explicitly taken up in the conclusion, and the implications should be discussed!
3.5 Snow Insulation
Figure 10: It would be really useful to have horizontal grid lines (maybe in light grey) in the figures so the reader can better understand how close to the observed values models are.
Figure 10: CESM: This actually looks a lot better than what is shown in Burke et al. (2020) for just winter. I wonder why.
Line 297: From your figure caption, I assume that you use monthly mean values from all months, not just the winter months, for your plot. However, I assume the classification is still based on the DJF 30 year average of the station?
Line 303: “under sufficiently thick snow, the tsl gradually convergences near 0 ◦C and is primarily impacted by tas in a limited manner.” I don’t understand that statement. Please reformulate.
Line 325: “UKESM1.0-LL consistently demonstrated similar snow insulation effects in both ensembles” From just looking at the figure, so does HadGEM, which is not surprising since the land models are similar. MIROC and IPSL also have very similar curves regardless of the forcing. Please quantify your distinction in model performance.
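One way to quantify the distinction (a sketch only; the binning follows the ΔT-versus-snow-depth curves of Wang et al. 2016, and all variable names are placeholders) would be an RMSE between each model's insulation curve and the observed one:

```python
import numpy as np

def insulation_curve(snd, delta_t, bin_edges=np.arange(0.0, 0.65, 0.05)):
    """Mean air-soil temperature offset (delta_t = tsl - tas, deg C) per
    snow-depth bin (snd in m); snd and delta_t are 1-D numpy arrays."""
    idx = np.digitize(snd, bin_edges)
    return np.array([delta_t[idx == i].mean() if np.any(idx == i) else np.nan
                     for i in range(1, len(bin_edges))])

def curve_rmse(snd_obs, dt_obs, snd_mod, dt_mod):
    """RMSE between the modelled and observed insulation curves,
    evaluated only over bins populated in both datasets."""
    c_obs = insulation_curve(snd_obs, dt_obs)
    c_mod = insulation_curve(snd_mod, dt_mod)
    ok = ~np.isnan(c_obs) & ~np.isnan(c_mod)
    return float(np.sqrt(np.mean((c_mod[ok] - c_obs[ok]) ** 2)))
```

A single number per model (and per ensemble) would make statements such as "consistently similar snow insulation effects" verifiable.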
Line 329: “Despite cold conditions, an increase in snd still affects the snow insulation effect of LS3MIP CESM2.” I don’t understand the statement. Please reformulate.
Line 336: I cannot follow this statement. Both in Wang et al. 2016 and Burke et al. 2020, previous versions of CLM5 (CLM4.5 stand-alone in Wang et al., CLM4 in the CMIP5 analysis of CESM1 in the supplement of Burke et al.) clearly outperform CLM5 with regard to the snow insulation curve. Please explain further what your statement is based on.
3.6 Impact of Land Model Features on Performance
Line 343: “show good performance in reproducing accurate snd” Actually, in Figure 2, the observed median value for snow depth is within the interquartile range of 1!! model in the LS3MIP forced simulations that supposedly do not suffer from biased precipitation. I would not call that good performance. Please add context.
Line 345: “Although IPSL-CM6A-LR employs a simpler spectral averaged albedo scheme than other land surface models, it does not have an observable impact on its tsl simulation.” What data in your analysis is this statement based on?
Line 348: While vegetation is certainly important for accurately calculating albedo, in terms of the surface energy balance in general, the timing of snow cover is important. Please discuss the impact of a wrong timing of the onset of snow cover and melt.
Line 349: “Considering snow conductivity, the Power Function could be why CNRM-CM6.1 and CNRM-ESM2.1 have a negative bias of larger than -6 ◦C in the SON (figure not shown)” In Table 2, both the models with the best snow insulation performance (the versions of JULES) and the model with the worst performance (Surfex) employ a power function, so this seems unlikely as the reason for the difference in performance. Please explain your conclusion in more detail.
Line 352: Especially in autumn, this could also be an effect of incorrect timing in snow. If snow cover is late in the models, the soil will release heat to the atmosphere for a prolonged time, which could also explain an underestimation of soil temperatures. Since you have not looked at the timing of snow cover, and snow rmse is large for all models in autumn at least in comparison to the stations considered in Figure 6, I think you need to extend your statement.
Line 363: Since you cannot compare the performance of these models to versions that do not contain organic matter, I don’t see how you can draw that conclusion. Please explain further.
4 Conclusions
Please see my general comment on what the conclusion should contain. Specific comments below.
Line 373: “Except in summer months, inaccurate inter-annual variability in the simulation of soil temperature by CMIP6 models is mainly caused by deficiencies in the land surface models and less inherited from atmospheric components.” What is the reasoning behind this conclusion?
Line 378: “The largest model biases of tas and tsl are witnessed under -5 ◦C.” What does this refer to? Winter, summer, LS3MIP or CMIP6 models? And to what do the -5 ◦C refer? Climatological mean of winter temperature? MAGT? Please provide more context to explain the statement.
Line 379: “These indicate a weakness for models reproducing the tsl relationship with tas in freezing conditions” Which could point to deficiencies in soil moisture, which you have not discussed at all, even though it has a profound impact on latent heat during freeze and thaw. Please extend the discussion accordingly.
Line 381: “Land models tend to simulate lower tsl when overlying snow exists.” Do you mean lower than observed? Because as a general statement, that is wrong. Please explain more clearly.
Line 383: “Note that the scope of this study is limited to soil depths down to 0.2 m” You never state anywhere that all tsl metrics you show only refer to tsl at 20cm. Since the RosHydroMet data provides temperatures at 20, 40, 80, 160 and 320 cm depth, I assumed all metrics referred to comparisons of all depths, and that only the snow insulation analysis is restricted to tsl in 20cm depth as proposed in Wang et al., 2016. This would have to be clearly stated in the data description, but actually, I don't see any good reason for excluding the other depths from the general analysis, especially because you argue the relevance of the soil temperature analysis with the climate change impacts on permafrost, and 20cm depth is above the active layer thickness in large parts of the northern hemisphere permafrost area. Please extend the tsl analysis using all depths from the station data.
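A sketch of how the multi-depth extension could look (column names and data layout are hypothetical; model output would first need to be interpolated vertically to the station depths):

```python
import pandas as pd

STATION_DEPTHS_CM = [20, 40, 80, 160, 320]  # RosHydroMet measurement depths

def bias_by_depth(obs: pd.DataFrame, model: pd.DataFrame) -> pd.Series:
    """Mean model-minus-observation soil-temperature bias per depth.
    Both frames are assumed to share an index (station, time) and to hold
    one column per depth, e.g. 'tsl_20cm' ... 'tsl_320cm'."""
    biases = {}
    for depth in STATION_DEPTHS_CM:
        col = f"tsl_{depth}cm"
        biases[depth] = (model[col] - obs[col]).dropna().mean()
    return pd.Series(biases, name="bias_degC").rename_axis("depth_cm")
```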
Citation: https://doi.org/10.5194/egusphere-2025-389-RC1
RC2: 'Comment on egusphere-2025-389', Anonymous Referee #2, 04 Mar 2025
Review of "Assessing Climate Modeling Uncertainties in the Siberian Frozen Soil Regions by Contrasting CMIP6 and LS3MIP"
Firstly, I would like to commend the first reviewer for their thorough and detailed evaluation of this manuscript. The major strengths and particularly the weaknesses of the study have been clearly identified, leaving little room for additional comments. I fully agree with the assessment that this work would greatly benefit from major revisions to enhance its methodological rigour, clarity, and overall structure. Below, I provide a few additional comments that address secondary but potentially relevant concerns.
While this study provides valuable insights into the performance of CMIP6 and LS3MIP models in frozen soil regions, it would benefit from revisions to improve its methodological rigour, clarity, and overall structure. Removing redundant models, excluding ERA5-Land, providing a clearer discussion of results, and improving the presentation of figures would significantly enhance the manuscript's quality. I encourage the authors to consider these points in their revisions.
1. Inclusion of Two Nearly Identical Models (CNRM-CM6-1 and CNRM-ESM2-1)
One methodological issue that warrants attention is the inclusion of both CNRM-CM6-1 and CNRM-ESM2-1 in the analysis. These two models share an extremely similar structure and code base, making them effectively redundant. Their simultaneous inclusion biases the statistical assessment of model diversity and artificially reinforces certain trends. The authors should consider removing one of these models to ensure a more balanced and independent analysis.
2. Questionable Use of ERA5-Land Data
ERA5-Land is a reanalysis product, not an independent observational dataset. While reanalysis can sometimes provide useful large-scale validation, its role in this study appears unjustified. The manuscript already includes direct observations, which are far more suitable for model evaluation. Furthermore, comparing models against another model-based dataset (ERA5-Land) does not provide meaningful validation or evaluation. Removing ERA5-Land from the analysis would streamline the results and improve the manuscript's focus on actual observations.
3. Overly Complex and Unreadable Figure 6
Figure 6 is too dense and difficult to interpret, as it combines multiple variables (tas, tsl, pr, snd) in a single diagram. This makes it hard for the reader to extract meaningful insights. A better approach would be to separate this into multiple figures, each focusing on a single variable. For example, sub-figures or distinct panels could be used for each variable, with clear titles and well-defined legends. Additionally, clearer labelling and improved visual representation would greatly enhance readability.
A more effective approach would be perhaps to create a separate figure for each season, with four distinct panels for tas, tsl, pr, and snd. This structure would allow for a clear comparison of model performance across different seasons and variables, enhancing readability and facilitating the identification of trends and anomalies. Using box plots or violin plots in each panel would effectively display the distribution of data, making the figures less cluttered and more insightful.
4. Lack of Discussion Section and Unstructured Conclusions
As previously noted by the first reviewer, the manuscript lacks a dedicated discussion section, and its conclusions do not sufficiently synthesise the results in relation to the stated objectives. In particular, the authors should:
- Revisit the key research questions outlined in the introduction and explicitly address them in the conclusions.
- Provide a clear synthesis of the main findings, rather than scattering them throughout the results.
- Offer a more structured discussion, especially accounting for the following comments 5.
5. Lack of In-Depth Understanding of Model Processes and Literature Review
One of the most concerning aspects of this manuscript is the apparent lack of in-depth understanding of the processes simulated by the six analysed models. Throughout the text, the authors make causal claims about model behaviour that are either too vague or lacking sufficient references, sometimes even incorrect, suggesting that they have not thoroughly studied the literature on these models. A deeper engagement with existing research would improve the accuracy of the study and prevent misleading conclusions.
Before attempting to diagnose model biases and uncertainties, the authors should conduct a more comprehensive literature review on each of the models they analyse. This would allow them to:
- Properly attribute biases to the correct physical processes,
- Avoid making incorrect causal inferences,
- Provide a more nuanced discussion of model differences.
A clear example of this issue is the discussion of CNRM-CM6-1 and CNRM-ESM2-1 in lines 350-353. The authors claim that the cold bias in these models is due to snow conductivity, when in reality, it is mainly caused by the way these models simulate snow cover fraction as a function of vegetation (see section Snowpack Processes and Appendix B in Decharme et al. 2019). Unlike observations, which assume a fully snow-covered ground, these models allow for a snow-free fraction where soil is directly exposed to atmospheric forcing, leading to an artificially cold soil temperature. This is well-documented in the literature (e.g., Wang et al. 2016). For example, Decharme et al. (2019) states: "In addition, the specific snow fraction over tall vegetation is generally very low, annihilating the soil insulation effect of the snowpack." This explanation is completely absent from the manuscript, despite being a critical factor in the model's behaviour. Wang et al. (2016) also provide a robust and clear discussion of this problem in their "Model Processes" section.
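The mechanism can be summarized with a simple area weighting (illustrative notation only, not the exact ISBA/SURFEX formulation): the ground heat exchange is a mix of the snow-covered and snow-free fractions,

```latex
G \;\approx\; f_{\mathrm{snow}}\,G_{\mathrm{snow}} \;+\; \left(1 - f_{\mathrm{snow}}\right)G_{\mathrm{bare}},
```

so a small f_snow over tall vegetation leaves the soil coupled almost directly to the cold winter atmosphere through the bare fraction, largely cancelling the insulating effect of even a deep snowpack.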
In summary, the authors would benefit from reviewing Wang et al. (2016), which provides an excellent discussion of snow/temperature processes in models (see the "Model Processes" section, page 1733). Writing a discussion of equivalent quality about model processes is essential if this paper is ever to be accepted.
Citation: https://doi.org/10.5194/egusphere-2025-389-RC2
RC3: 'Comment on egusphere-2025-389', Adrien Damseaux, 06 Mar 2025
The manuscript addresses the persistent discrepancies in frozen soil simulations within climate models and their land components. By comparing historical runs from seven land-only models participating in the LS3MIP with their coupled counterparts in CMIP6, the study aims to disentangle the contributions of land surface parameterisations and atmospheric forcing to these discrepancies. Given the importance of accurate frozen soil simulations for climate projections and land-atmosphere interactions, this study aims to explore valuable insights into the strengths and limitations of current land surface models and their coupling with atmospheric components.
However, while the study's ambitions hold significant value for the scientific community, the execution falls short in several key areas. The manuscript lacks the clarity and rigor expected in an academic publication, with issues in writing style, structure, and methodological justification. Furthermore, some interpretations of model results are overly speculative, requiring a more cautious and evidence-based approach. Addressing these shortcomings through major revisions will be essential to ensure the study’s findings are both robust and impactful.
Major revisions
- The writing style does not meet the standards of a scientific publication. Many sentences are overly long and vague, and overall consistency is lacking. Several sections—particularly parts of the introduction—are less rigorous and require significant clarification and development. To ensure the paper is accessible and acceptable to its readership, many statements need to be refined and expanded.
- The introduction requires a complete overhaul. I recommend restructuring it into distinct segments that provide:
- Global/local context: Outline the broader and specific contexts relevant to the paper.
- Identification of issues: Clearly identify and discuss the current challenges in frozen soil simulations.
- Research question: Present a specific, well-defined question that the study will address.
- Study approach: Detail how the paper intends to answer this question through its methods and analyses.
- The manuscript would benefit from a clear separation between the results and discussion sections. This division would enable readers to first digest the novel findings in the results section and then understand their interpretation and comparison with previous studies in the discussion. Currently, comparisons with existing literature are scant; the manuscript should incorporate and reference a broader range of studies relevant to the subject matter.
- Regarding the use of ERA5-Land data, the paper’s central question is not directly related to the quality of this dataset. The statement “this proves that ERA5-Land can be a solid benchmark that supports observation as gridded data” is somewhat misleading. Providing “tas” values that closely match observations does not inherently qualify ERA5-Land as an appropriate benchmark for studies focusing on soil temperatures. Instead of justifying its use, the authors should expand the “Data and Methods” section and include a more thorough comparison with LS3MIP simulations to support their choice.
- When reporting results or observations, such as biases or qualitative values (e.g., high, low, warm, cold), include specific numerical values. This practice will enhance clarity and allow for a more precise interpretation of the data.
- Most importantly, the authors’ attempt to link the biases observed in this manuscript to the physical processes represented or the parameterizations used in the models is largely misleading and lacks sufficient nuance. To make such claims, the authors could rely on sensitivity experiments, as they themselves acknowledge: “It is challenging to identify how model features influence the vertical energy transportation process without conducting sensitivity experiments.” Without such analyses, the attribution of biases to specific model processes remains speculative. The manuscript would benefit from a more cautious interpretation of results, clearly distinguishing between observed discrepancies and their potential causes. Additionally, a more thorough review of existing literature on model parameterizations and process representations would provide a stronger foundation for discussing the origins of biases. Instead of drawing causal conclusions prematurely, the authors should consider discussing alternative explanations, acknowledging uncertainties, and explicitly stating the limitations of their approach. For now, the results provided and their analysis are simply not sufficient to support the interpretations and conclusions.
Figures and tables revisions
- Verify and, if necessary, adjust the color palette to ensure it is color-blind friendly for all figures (I doubt Figures 2 and 3 are, for example).
Table 1:
- Specify which version of CLM5 is being used (e.g., CLM5.0).
Table 2:
- Replace “snow conductivity” with “snow thermal conductivity.” Note that all values are density-dependent except for MATSIRO. It would be more informative to share the default scheme used (e.g., Yen 1981) and reference the relevant publication rather than only providing the mathematical formulation.
Figure 1:
- Clearly indicate that “perma” refers to frozen soil at -20 cm, as opposed to permafrost at deeper layers elsewhere.
- Specify what is displayed by the color bar adjacent to the figure.
- Consider adopting a visualization approach similar to that in Figure 5 for representing color classes—despite the current use of a linear color scale, note that employing too many classes (a maximum of 7×2 is recommended) can be problematic. If a diverging color scale is preferred, it should be centered around 0.
Figure 2:
- Ensure consistency across figures with respect to units, labels, and color schemes (e.g., using either “Celsius” or “°C” uniformly). The ERA5 Land color scheme should be consistent (currently it varies from gray to black), and model labels should be uniform. Choose ERA5 Land instead of ERA5_Land.
- The model labels and colors are not clearly aligned. It would be preferable for the legend to explicitly represent the color assignments and to differentiate offline land with a distinct style, as demonstrated in Figure 6.
- Clarify (in the figure caption and potentially also elsewhere in the text) that the site-average corresponds to the average of all selected station locations.
Figure 3:
- Ensure that the colors match exactly those used in Figure 2.
- The colors assigned to HadGEM and UKESM are too similar, making them difficult to distinguish.
Figure 4:
- There are no shaded areas.
- Include ERA5 Land in the legend.
Figure 5:
- Align the orientation of both the figure and its caption.
- The current figure is challenging to interpret due to an overload of information. It would be beneficial to label each column or row directly within the figure, rather than forcing the reader to refer continuously to the caption.
- Include a legend that explains what the circle radii represent. Reducing the maximum radii is advisable to avoid overlapping circles, as both color and circle size are important for interpretation.
- Maintain a consistent cartographic projection for all maps; the projection used in Figure 5 currently differs from that in Figure 1.
- Some data points are not legible. Although this may have been intended to portray small biases and standard deviations, it could be misinterpreted as insufficient data. Using an edge color (e.g., black) for the points may enhance clarity.
Figure 6:
- The volume of information in this single figure could be overwhelming for readers. It is recommended to separate the diagrams by variable—focusing on the most relevant ones in the main text, while relegating the less critical diagrams to supplementary material.
- For enhanced clarity, consider dividing the legend into distinct sections: one indicating color (for different models), another for symbols (to distinguish between ESM and stand-alone LSM), and a third for numerical indicators (indicating variable significance).
- The “REF” is not adequately described; it should be clearly explained in the figure caption or within the main text.
Figures 7 and 8:
- Although the manuscript states that “the model simulation outputs are binned at 2°C intervals,” the x-axis labels and ticks reflect a 5°C interval.
Figure 9:
- The significance of the violin plots is not explained. The authors should clarify what the areas, dots, and black lines represent.
- It remains unclear what the x-axis represents. Specify which “temperature” is being shown (e.g., air temperature or soil temperature, observations versus model results, and which dimensions are averaged).
- The current style of the figure does not adequately support the subsequent discussion. For example, statements such as “Set Frozen and Set Warm tas data sample sizes are more than twice as large as in Set Intermediate” and “the difference in LS3MIP runs is negligible, suggesting that the climate model will likely have a cold and small deviation under these temperature states” are not easily decipherable. Focusing more on the standard deviation as a visual might improve interpretability.
Figure 10:
- It is surprising that this figure includes data from all seasons. Isn’t it only winter data? If the intent is to follow the approach of Wang et al. (2016), the data should be restricted to winter conditions.
- Position the model names at the top of each panel to enhance readability.
Minor revisions
General
- Some terms need to be more explicitly defined:
- Models: Clearly define which type of models is under discussion (e.g., Earth System Models, land surface models, etc.) in the introduction.
- Model Ensembles: When introducing model ensembles, particularly in reference to CMIP6 and LS3MIP, specify this term explicitly.
- Land-Only vs. Offline land surface models: Adopt and consistently use one term throughout the manuscript.
- Permafrost: At line 152, the manuscript classifies certain locations as permafrost, which may be misleading because it may imply that permafrost is present exclusively at these sites. The explanation provided in lines 223–227, clarifying that these locations exhibit permafrost soils at a depth of –20 cm, should be introduced earlier and applied consistently to avoid ambiguity.
- Use negative numbers consistently when referring to depth and cold bias values. For example, if cold biases are sometimes shown as negative values, ensure that all such instances follow that convention.
- The authors are encouraged to provide the code used to produce the figures in addition to the underlying data. This will improve transparency, reproducibility, and the ability for readers to further explore and validate the results.
Introduction
- Lines 20, 44, 46, 78, 94–97: The authors should include additional references at these lines.
- Lines 21, 171: Update the references with more recent publications to reflect the current state of research.
- Line 24: The sentence is overly long and should be rewritten into two or more concise sentences to improve clarity and readability.
- Line 28: If abrupt thaw is mentioned, a clarification is needed to explain its relevance to this study. The authors should clearly delineate the connection between abrupt thaw processes and the objectives of the manuscript.
- Line 29: Clarify the phrase “such carbon emissions” by specifying what these emissions refer to, ensuring that the meaning is unambiguous.
- Lines 30–32: The implications of “changes in surface vegetation types” should be further developed. While the focus on permafrost thaw is noted, the authors need to expand the discussion to include other consequences of permafrost thaw on Earth’s ecosystems, not just those related to vegetation.
- Lines 34–35: Revise the sentence for clarity. The term “frequently varying” is ambiguous, and the concept of “thermal offset” should be explicitly defined and contextualized.
- Lines 35–37: The statement regarding differences in time scales between soil and atmospheric processes and the role of the soil surface as the interaction window needs further refinement. A clearer explanation of these dynamics, with concrete examples if possible, will help strengthen the argument.
- Lines 36–38: Should examples be mentioned here, the authors are advised to include at least a couple of specific cases. For instance, emphasizing “excess ice” conditions (as noted by Burke et al. (2020) and other studies) would help illustrate the point effectively.
- Line 41: The statement is currently too vague. The authors should detail what each mentioned characteristic does and how it impacts the study.
- Line 42: Specify which conditions are being referred to (including the time-scale, any specific event, and the relevant soil depth), providing the reader with necessary context.
- Line 44: The phrase “the most suitable” is subjective and should be replaced with more objective language.
- Line 45: The text mentions horizontal resolutions; it should be clarified why high resolution is necessary to distinguish between different frozen soil regions. A brief explanation or supporting evidence is recommended.
- Line 47: Provide further justification for the necessity of high-resolution data at this point in the paper.
- Line 56: Describe the potential consequences (or cite relevant studies) that are being discussed. It is advisable to move this sentence to an earlier position in the introduction, before the datasets are introduced, to frame the context properly.
- Line 59: This sentence is overly long.
- Lines 61–62: The delineation of which “characteristics” are being assessed is too vague. In addition, the concept of a “benchmark” is not developed. The authors should specify which benchmark is being applied.
- Lines 62–64: Although the general concept is sound, this section should be expanded. The authors need to elaborate on the differences between CMIP6 and LS3MIP that could lead to the observed biases and uncertainties in frozen soil regions. In particular, clarify how these differences might be attributed to discrepancies arising from the land surface models versus those caused by atmospheric forcings.
- Line 64: The phrase “with identical and more realistic atmospheric conditions” is ambiguous. Clarification is needed regarding what is meant by this and why, under such conditions, LS3MIP models are anticipated to simulate soil conditions more accurately.
- Lines 65–66: This sentence, which is key to the study, requires further development. The rationale behind regarding certain discrepancies as “errors in the land surface models” needs to be clearly explained, with supporting arguments that make the rationale accessible to all readers.
- Line 67: Clearly specify which features are being referred to. Providing examples where applicable will help avoid ambiguity.
Data and methods
- Line 76: Consider introducing the concept of climate “feedback” in the introduction, as it is a key component of this study. This will help set the stage for its later use in the analysis.
- Lines 78–80: The sentence in these lines is unclear. A revision is needed to improve clarity and ensure that the intended meaning is conveyed unambiguously.
- Line 81: Replace “climate models/earth system models” with “Earth System Models (ESMs)” to maintain consistency and accuracy in terminology.
- Lines 82–83: Rephrase “cannot be considered” to clarify whether the models lack a freeze option when turned off or if they do not adequately represent frozen soil processes.
- Line 84: Remove any repetitive wording to improve the flow of the section.
- Line 91: Use the term “snow thermal conductivity”.
- Lines 91–92: The current description is somewhat misleading. It should be clarified that all formulations are either (1) empirically derived and density-dependent or (2) assigned fixed values. This nuance is important for understanding the parameterizations.
- Line 100: Replace the vague phrase “assist our assessment” with a more precise description of how the method contributes to the analysis. Additionally, the term “numerical” is too vague; the authors need to clarify and explain the differences between ERA5-Land and other land surface models, including a brief definition of what reanalyses entail.
- Line 103: Provide details on how the available depth data are interpolated to the target depth of the study. This clarification is necessary for understanding the data processing methodology.
- Line 107: Specify which quality flag is being referenced and explain what it represents regarding data quality or processing.
- Line 108: Reconsider the rationale for using user-defined values of longitude and latitude to determine warmer climates. Note that areas east of 120°E do not correspond to Siberia. It may be preferable to use average air surface temperature measurements to define these regions.
- Line 128: The phrase “in the central tendency of the data” is ambiguous.
- Line 139: Clearly define the seasons used in the analysis. For example, if DJF is employed, specify whether it covers December 1st to February 28/29th or follows another seasonal definition.
Results
- Line 142: Instead of beginning every sentence with “Fig. x…”, the authors should directly present the scientific point. This repetition occurs multiple times and could be streamlined to improve readability.
- Line 143: The term “matching” is superfluous and should be removed.
- Lines 147–148: Clarify the rationale behind the observations made here. Consider moving this explanation to an earlier portion of the section so that the context is established before the results are discussed.
- Line 150: The term “outcomes” is vague.
- Line 162: The term “interpolated” is ambiguous. The authors should specify the interpolation method used and indicate how this process might affect the results.
- Lines 162–163: The sentence stating, “Fig. 2 shows slight differences between different land models because of interpolation uncertainties using different model grids with different setups,” uses the terms “differences” and “different” in a repetitive way. The authors should (a) avoid rushing to conclusions by providing supporting quantitative evidence (e.g., specific values or statistical measures), and (b) rephrase the sentence to clearly explain the potential impact of grid differences and interpolation uncertainties.
- Lines 163–164: The statement, “This illustrates how carefully a comparison of coarse-grid model output against point-like station data has to be interpreted,” requires further explanation. The authors should provide concrete evidence or reasoning to demonstrate how this conclusion was reached.
- Lines 170–171: Rephrase and clarify the material within the brackets to ensure that it is concise and informative.
- Line 171: The phrase “same family” should be explicitly defined. The authors need to clarify what criteria determine the grouping of models into the “same family.”
- Line 172: The phrase “their ability to simulate tsl” is too vague. The authors should detail why a particular performance in simulating soil temperature (tsl) is expected and what underlying processes or parameterizations support this expectation.
- Line 174: I am not a statistician, but the term “diversity” does not seem adequate to me in this context or in the rest of the manuscript. Could it be replaced with a more specific term such as “variability” or “spread” in model performance to better describe the differences observed?
- Line 175: Clarify what is meant by “with most sites lacking insulating snow.” The authors should specify the criteria or observations underpinning this statement and discuss how this influences the results.
- Line 177: Specify which models are being referred to at this point to avoid ambiguity.
- Lines 207–208: The sentence is redundant or not essential to the discussion.
- Line 211: The assertion that “Differences in grid cell scale among models can lead to biases in the tas state over the grid” is unclear. The authors should expand on this point—explaining how grid cell scale differences can affect biases in near-surface air temperature (tas)—and support the statement with references to the literature or numerical values.
- Lines 218–220: While the discussion of differences in tas EB and tsl EB between LS3MIP and CMIP6 simulations offers an interesting perspective regarding the compensation between land surface and atmospheric processes, this section needs further clarification. The authors should:
- Provide explicit numerical values or clear graphical support for the claim.
- Reconcile this discussion with other observations in the manuscript (e.g., the statement in line 234 regarding Group L versus Group C performance).
- Elaborate on the physical reasoning behind the offsetting errors observed in the CMIP6 ensemble.
- Line 223: The focus on the “shallow soil response” should be clarified and introduced earlier in the section or even in the introduction to offer better context to the reader.
- Line 230: The use of the term “better” is vague. A more precise descriptor or quantitative measure of comparative performance should replace it.
- Line 234: It is unclear whether the reported tsl values pertain to all seasons. The authors should clearly state which seasons are included and, if possible, quantify the differences observed.
- “Climate Dependency of Modeled Temperatures” Section: This entire section could be confusing as it references two kinds of “temperature values”:
- Cold/warm biases (often without specific numerical values)
- Temperature values from observations or model outputs (without clear designation)
It is recommended that the authors adopt a consistent approach similar to that used by Wang et al. (2016) and later in the manuscript itself, where temperature regimes are clearly defined and specific numerical ranges are provided for each regime.
- Line 243: The term “state” needs to be clearly defined. Furthermore, introductory words such as “So,” (and “And” further in the manuscript) should be removed in favor of a more formal tone. This writing style is very concerning.
- Line 247: The phrase “simulate similar histograms” is misleading because histograms represent sample counts.
- Line 248: The statement “However, a slight cold bias below -30 °C exists in Group L” requires clarification. If such a bias is observed, the authors should provide the exact numerical value, discuss its significance, and ensure the figure clearly demonstrates this bias.
- Line 261: The phrase “more likely distributed” is unclear. The authors should:
- Specify how the distribution of tsl was characterized.
- Indicate the specific temperature range of the cold bias.
- Lines 266–268: This passage needs to be rewritten for clarity. For instance:
- Clearly specify that the values refer to the minimum tas (if that is the case).
- Rewrite “than that of the land-only run.”
- Explicitly describe how differences in the lower extremes of tas translate into corresponding gaps in tsl values, supporting this statement with numerical evidence.
- Line 269: There is a typographical error: “underestimate” should be corrected to “underestimating.”
- Lines 269–270: The sentence is overly long and ambiguous. It should be restructured to clarify the differences between the “snow insulation effect” and the “surface insulation effect”. Also, note that Dutch et al. (2022) discuss only CLM5.0. The authors should clearly explain whether the two effects are distinct or closely linked, and provide additional literature to support any claims regarding deficiencies in the modeled snow insulation effect.
- Line 272: Missing the term “flux”.
- Lines 273–274: The claim—that a decrease in tas has a limited influence on tsl due to high snd, implying that the primary source of error stems from thermal conditions at the bottom of the soil column—requires stronger substantiation. More data, evidence, or references should be provided to support the assertion regarding the role of geothermal flux.
- Lines 277–280: The categorization of states (i.e., the thawed state, the freeze–thaw transition state, and the frozen state) needs further clarification. It is recommended to (a) reference literature that has adopted a similar categorization technique and (b) ensure consistency in nomenclature between the text and figures—for example, aligning terms like “Set Frozen, Set Intermediate, Set Warm” with the descriptive categories.
- Line 282: Remove “Results are shown in Fig. 9”.
- Line 287: Clarify whether the observation that “the tsl samples are mainly concentrated in Set Intermediate and Set Warm” is directly evident from Fig. 9. If not, provide either numerical summaries or additional explanation within the text or figure caption.
- Lines 290–292: The claim that “the mean and minimum value of tsl bias is much lower than that of tas bias” and that this negative bias appears across all sets requires further explanation. The authors should provide precise numerical values and discuss how these figures support the conclusion that the land models simulate tsl as being too cold relative to expectations.
- Lines 292–293: The key point that improved tas accuracy in Group L models (in Set Frozen and Set Intermediate) does not necessarily yield better tsl simulations, and that tsl variability below –5 °C is higher in Group L than in Group C, needs to be expanded.
- Line 303: The statement that “the tsl gradually convergences near 0 °C” is not clearly supported by the observation figure. In addition, the phrase “and is primarily impacted by tas in a limited manner” appears contradictory. The authors should re-examine the data, reconcile these inconsistencies (especially in light of their earlier remark in line 273), and clearly articulate the influence of tas on tsl with supporting evidence.
- Lines 305–307: The discussion of snow shielding effects—claiming that thicker snow (snd > 0.3 m) strongly relates ∆T and tas—needs to be clarified. The authors should elaborate on how thicker snow modifies the impact of overlying tas and provide additional data or references to substantiate this relationship.
- Lines 316–317: The claim that “all other models fail to reproduce the observation-like curve, underestimating the snow insulation effect under most conditions” seems overly generalized. For instance, CESM2 appears to capture the curve reasonably well. The discussion should differentiate among models and provide detailed evidence to support such claims.
- Lines 319–321: The explanation that low-resolution land surface models hinder accurate determination of surface organic matter distribution—thereby leading to errors in calculating the surface insulation effect—requires further detail. The authors should clarify the underlying processes, reference additional studies (e.g. 10.1175/JCLI-D-24-0267.1), and distinguish this issue from concerns related to snow insulation.
- Lines 319–324: This section appears to focus on surface insulation rather than directly addressing snow insulation. Given that other studies (e.g., 10.1038/s41467-019-11103-1) have highlighted that the warming effect of soil organic matter is less significant in winter because of insulating snow cover, it might be advisable to revise or remove this passage.
- Line 325: The statement “Similar conclusions can be made to HadGEM3” should be clarified by explicitly detailing which aspects of the analysis are similar and providing supporting evidence for this comparison.
- Lines 334–340: More numerical evidence is needed to support the claims made in this portion of the discussion. The authors should include specific numbers, statistical measures, or comparisons to strengthen their argument.
- Line 340: The phrasing “where a substantial reduction in the lack of snow insulation is seen” contains a confusing double negative.
- Lines 341–368: There is a lack of quantitative support throughout this portion. It is recommended to supplement the discussion with numerical values that back up the claims. In particular, when addressing snow thermal conductivity, compare the performance of different parameterization schemes (rather than solely presenting their mathematical formulations) and clarify the impact of low snow depth on thermal behavior.
Conclusions
- Lines 373–382: The authors need to substantially develop and clarify nearly every sentence in this section. Several statements are not consistently supported by the manuscript. For example:
- The last figure concerning CESM shows results that contradict the claim that “inaccurate inter-annual variability in the simulation of soil temperature by CMIP6 models is mainly caused by deficiencies in the land surface models and less inherited from atmospheric components.” This discrepancy is not adequately addressed in the discussion.
- The statement that “biases in the land surface model even partially compensate for the influence of air temperature biases” lacks sufficient evidence or numerical backing.
- The claim that “better precipitation simulation does not ensure snow depth results improve, especially in winter and spring” is not clearly linked to the rest of the manuscript.
- Terms such as “weakness,” “near-surface energy transport process,” and “snow amount” are imprecise.
- The recommendation for “further improvement of parameterization” is vague. The authors should identify which specific parameterizations (e.g., those related to snow dynamics, soil thermal properties, or energy exchange processes) require refinement.
Citation: https://doi.org/10.5194/egusphere-2025-389-RC3