Sea ice in the Barents and Kara Seas: models versus reanalyses
Abstract. Satellite measurements indicate that the Barents and Kara Seas have experienced some of the most significant declines in sea ice concentration and an increase in sea surface temperature in the Arctic. Understanding the processes driving these changes is of utmost importance to assess the implications of climate change in the Arctic and globally. Given the observational limitations in these high-latitude areas, coupled models and reanalysis products play a key role. However, it is essential to better understand the strengths and limitations of such data products. Models are often evaluated on a pan-Arctic scale, but results might differ when considering sub-regions. This study examines two regional coupled ocean-atmosphere models (HIRHAM-NAOSIM 2.2 and RASM), two coupled model intercomparison project, phase 6 (CMIP6) models (CNRM-ESM2_1, NorESM2-MM), two regional ocean reanalyses (RARE 1.15.2 and TOPAZ 4B), one global reanalysis (ORAS5) and OSI-SAF satellite observations over the period 1991–2020. We focus on spring (March–April–May) and autumn (August–September–October) sea ice area and sea ice concentration, the edge of the sea ice, and the temperature of the mixed layer. The results show a statistically significant decrease in sea ice area and sea ice concentration in the satellite measurements and in all reanalyses. However, only one of the coupled models, HIRHAM–NAOSIM, exhibits a statistically significant sea ice decline in the Barents Sea, albeit weaker than in the reanalyses, and no coupled model shows a significant trend in the Kara Sea. Additionally, the reanalyses show a more pronounced warming of the mixed layer than the coupled models, particularly in autumn and in the Kara Sea.
General comment:
There is great interest worldwide in monitoring the evolution of sea ice and mixed layer temperatures in the Arctic Ocean. Throughout the satellite observational record, we have seen a consistent decline in Arctic sea ice extent, with far-reaching implications for the global climate system. A range of models, reanalyses, and satellite products is available to evaluate these trends; however, critical evaluations of these products are often lacking. The submitted manuscript presents data from two regional models, two global climate models, and three reanalyses, and evaluates them against satellite measurements. The figures are generally clear, the metrics used to assess sea ice trends are appropriate, and the topic is within the scope of the journal.
However, the manuscript suffers from a lack of clearly defined research questions and objectives, which limits its focus and analytical depth. The abstract and introduction do not state specific aims, the justification for the dataset selection is incomplete, and the discussion summarises the results without connecting the findings to well-defined objectives. Mechanistic explanations for the identified biases are also lacking.
This lack of clear direction makes it difficult to interpret the results, assess the significance of the findings, or understand how the results could inform the suitability of different models for specific applications in Arctic sea ice and mixed layer research. Even if the primary purpose of the study is a very broad intercomparison, the manuscript could still provide concrete guidance, such as performance metrics or recommendations, that would help the community identify which models are most appropriate for particular scientific questions.
Additionally, the manuscript reads somewhat like a first draft, with recurring scientific and formatting issues, including inconsistent acronym usage, misplaced citations, and the inclusion of DOIs and URLs in the main text. While each of these issues is minor individually, their repetition throughout the manuscript negatively affects readability and even the overall professionalism of the work.
Given these issues, I recommend rejection of the manuscript in its current form. However, with careful revision, it has the potential to make a meaningful contribution. The authors should explicitly define research questions, justify the dataset selection, restructure the discussion to address those questions with mechanistic interpretation, and correct the recurring scientific writing and formatting issues. Addressing these points would strengthen the manuscript and make it suitable for resubmission.
Major comments:
Abstract
The abstract provides a clear overview of the study and concisely summarises the main results, but it does not fully convey the purpose or the significance of the work. While the opening sentences frame the general context and the importance of understanding model and reanalysis performance in the Arctic, no clear research question is stated, only “better understand the strengths and limitations [...]”, which is far too generic to serve as a research question. Furthermore, although the results are summarised, there is no evaluation or discussion of these strengths and limitations, or of the mechanisms behind the observed trends. For example, the abstract reports that, of the coupled models, only HIRHAM-NAOSIM shows a significant sea ice decline, but it does not explain why these differences occur or what they imply for model performance.
To improve clarity and focus, I recommend that the authors: (i) state specific research objectives, e.g., assessing the ability of models and reanalyses to reproduce sea ice variability and identifying the underlying causes of potential biases; (ii) summarise the main results in the context of these objectives, i.e., not only stating the results but also what they reveal about the models’ strengths and weaknesses; and (iii) include a brief statement on the broader impact of the results, i.e., how these findings inform our understanding of, e.g., sea ice variability, model development, or future projections.
Introduction
The introduction provides useful context on the Barents and Kara Seas, but several aspects could be improved for clarity and focus: (i) The detailed geographic description of the region (L. 30-42) is somewhat excessive. I would recommend condensing it to highlight only the features most relevant to modelling and sea ice variability; as it stands, it delays the reader’s entry into the topic and motivation of the paper. (ii) The concept of “storylines” (Levine et al., 2024) is introduced but not developed in the study. Unless this concept is explicitly connected to the results, I would recommend removing it. (iii) The study’s aim is not stated clearly (L. 64-66); phrases such as “investigate the capability [...]” and “understanding the performance [...]” are too generic, and this vagueness carries through the manuscript, which consequently lacks a clear direction. In particular, it is not clear whether the focus is on long-term trends, interannual variability, seasonal variability, or mean state biases. The manuscript currently attempts to address all of these, but without a clearly defined focus, the analysis remains broad and does not explore any one aspect in depth. I recommend stating specific research questions (e.g., why do models and reanalyses fail to capture Barents-Kara sea ice variability or trends?) and clearly defining the scope of the analysis. The authors should also clarify the expected contribution of the study (e.g., identifying the mechanisms driving biases and informing improvements to specific model components that control sea ice and mixed layer properties in the Barents-Kara region).
Data and Methods
The methodology lacks a clear explanation for why these particular models and reanalyses were chosen. The only justification concerns the two CMIP6 models, where it is stated: “These models were chosen following the analysis described in (Levine et al., 2024), where the future simulation of CNRM represents a warm humid Arctic Ocean and cold dry continents, while NorESM represents a cold dry Arctic Ocean and warm humid continents, as described in (Mottram et. al (submitted)).” However, it is not entirely clear why these models are appropriate for evaluating historical sea ice trends, since the stated motivation – that they represent extreme future Kara/Barents warming scenarios – relates to projections rather than the historical period analysed. The discussion reiterates the “warm/cold Arctic storyline” without linking it to past sea ice variability, which may be confusing to readers.
Even more importantly, no justification is provided at all for the selection of the regional models or reanalyses. It would be helpful to understand whether these datasets were chosen based on prior evaluation, spatial or temporal resolution, regional performance, or indeed any criteria at all. Without a clear explanation for all datasets, I think it is difficult to assess the robustness of the comparisons or whether the results might simply reflect arbitrary choices rather than meaningful scientific patterns. I recommend that the authors provide criteria for the selection of all datasets, and either justify why the future-oriented storylines are relevant for assessing historical trends or remove that discussion from the manuscript.
Discussion
The discussion is detailed and descriptive, but mainly restates the results. The authors identify relevant differences but do not sufficiently explore their causes or implications, so the discussion remains shallow and offers few novel insights. A further issue is structural: the authors do not tie the discussion of their sea ice analysis to their analysis of mixed layer properties. Instead, it reads like two separate parts with no connection between them.
On the discussion of reanalyses and sea ice representation (L. 415-427), the manuscript builds up to the conclusion that data assimilation improves sea ice representation, but that deficiencies remain and that “they can not be used for everything”. While technically correct, this is not a particularly novel insight and does not substantially advance understanding. Opportunities for deeper analysis could include: (i) exploring why TOPAZ and ORAS5, both of which assimilate satellite data, exhibit opposite biases in sea ice concentration (overestimation vs underestimation). This suggests that factors beyond assimilation itself, such as model physics or the assimilation scheme, play an important role, but this is not explored at all; and (ii) explaining why RARE, despite not assimilating satellite data, captures the negative trend while largely failing to reproduce the mean state. More importantly, the paragraph should explicitly connect the sea ice biases to the mixed layer properties discussed later. At present, sea ice variability is discussed in isolation, without relating it to the ocean state, and therefore lacks a mechanistic explanation for the identified biases. I think that establishing this connection is necessary to move beyond a descriptive comparison.
The discussion of regional models and sea ice representation (L. 428-448) is largely descriptive and offers no additional insights from the results; it does not reach the analytical depth expected in a discussion. Statements such as “This might have to do with the differences between the basins [...]” are too general and vague, and above all are not supported by any analysis. The authors should substantiate such claims or formulate concrete, process-based interpretations of their results. Additionally, the second paragraph (starting at L. 441) reads as general background and is not explicitly tied to the results of this study; as it stands, it belongs in the introduction rather than the discussion. In the discussion, these mechanisms should be used to interpret the authors’ own findings. Furthermore, these paragraphs still do not relate the sea ice biases to the ocean state, atmospheric variability, or any other metric, and therefore no explanation is offered for why, e.g., RASM has the smallest IIEE or why HIRHAM-NAOSIM performs better for sea ice trends in the Barents Sea.
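To make this suggestion concrete: assuming the IIEE in the manuscript follows the common definition (the area where model and observations disagree on whether sea ice concentration exceeds 15%), it can be split into an overestimation component (model has ice, observations do not) and an underestimation component (the reverse). Reporting both would show whether a model’s edge error stems mainly from too much or too little ice, which could then be related to, e.g., ocean heat near the ice edge. The sketch below is illustrative only; function and argument names are hypothetical, concentrations are assumed to be fractions (0–1) on a common grid, and NaN (land) cells are counted as ice-free in both fields.

```python
import numpy as np

def iiee_components(sic_model, sic_obs, cell_area, threshold=0.15):
    """Integrated ice-edge error split into overestimation and underestimation.

    sic_model, sic_obs: sea ice concentration as fractions on the same grid.
    cell_area: area of each grid cell (same shape), e.g. in km^2.
    NaN cells compare as False, i.e. they are treated as ice-free.
    Returns (iiee, over, under) in the units of cell_area.
    """
    ice_model = np.asarray(sic_model, dtype=float) > threshold
    ice_obs = np.asarray(sic_obs, dtype=float) > threshold
    area = np.asarray(cell_area, dtype=float)
    over = float(np.sum(area[ice_model & ~ice_obs]))   # model ice, no observed ice
    under = float(np.sum(area[~ice_model & ice_obs]))  # observed ice, no model ice
    return over + under, over, under
```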
Starting at L. 449, the discussion of the mixed layer temperature is mainly descriptive and restates results rather than interpreting them. High correlations between mixed layer temperature and sea ice area are noted, along with differences between datasets and regions, but these differences are not explored further. For example, the authors note that RARE shows higher correlations than the other reanalyses, but the discussion ends there with “more research would be needed [...]”. Similarly, differences in the magnitude and spatial pattern of ML temperature trends between reanalyses are reported but not investigated. Mechanistic interpretation is also lacking, for example, how ML temperature anomalies might influence sea ice melt or inhibit ice formation. The discussion could be strengthened by investigating the differences between the reanalysis products and why RARE shows higher correlations, by exploring the spatial patterns in ML temperature anomalies, and by explicitly linking ML temperatures to the observed sea ice variability rather than only reporting high correlations.
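If the higher correlations in RARE are to be discussed, their difference from the other reanalyses could first be tested. A minimal sketch, assuming the Fisher z-transformation with a two-sided normal approximation (the function name and the example values are hypothetical, not taken from the manuscript); note that it treats the two correlations as independent, which may not strictly hold for products sharing forcing or assimilated data.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test for a difference between two correlation coefficients.

    Uses the Fisher z-transformation z = atanh(r), whose standard error is
    approximately 1 / sqrt(n - 3). Assumes independent samples.
    Returns (z statistic, two-sided p-value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Illustrative values only: two correlations, each from 30 annual values.
z_example, p_example = compare_correlations(0.9, 30, 0.5, 30)
```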
The section on CMIP6 models and Barents-Kara sea ice (L. 461-477) adds some useful context, e.g., by linking to Yamagami et al. (2022) and highlighting limitations due to model domain coverage and resolution. These insights are mostly descriptive, however, and mechanistic interpretation is again missing: how do factors such as North Atlantic warming dynamically and quantitatively influence the mixed layer temperature and sea ice anomalies identified in this study? The discussion could also be expanded by noting that HIRHAM-NAOSIM does not cover the Gulf Stream region, yet it performs better than RASM across most metrics. Does this mean that an explicit representation of the Gulf Stream is not important? Or that other factors, such as local ocean state variability or atmospheric variability, are more important? Finally, references to the “storyline analysis” of Levine et al. (2024) are underdeveloped and do not contribute to the interpretation of the results. Storylines are mentioned briefly in the introduction, methods, and discussion, but are never analysed in relation to observed historical trends, sea ice biases, or ML anomalies. As presented, this concept adds little analytical value, and I think the text would be clearer without it.
The final paragraph reiterates known challenges regarding the representation of the Barents and Kara Seas and model resolution but does not provide new insight or specific recommendations. Statements such as “further model development is required” are too vague, especially after 24 pages of detailed analysis, and should be linked to concrete processes or mechanisms identified in this study. While noting that resolution alone does not resolve the biases is useful, the manuscript does not analyse why higher resolution fails to address the discrepancies: are there specific ocean processes that are poorly represented, such as sea ice–ocean coupling or vertical mixing? The conclusions would benefit from more specific suggestions, for example: (i) Which processes require improved representation (e.g., Atlantic water inflow, mixed layer heat exchange, ice-edge dynamics)? (ii) Are there model parameterisations, coupling approaches, or observational constraints that could be targeted for improvement? (iii) Could regional modelling experiments or sensitivity studies help disentangle the drivers of bias? Making these points explicit would strengthen the conclusions and provide actionable steps for future work.
Scientific writing
This manuscript contains a number of recurring style and formatting issues that affect readability. Acronyms are often given before their full names, DOIs and direct URLs are included in the main text rather than in the reference list, and citations are frequently placed mid-sentence in ways that disrupt the flow. Table captions are consistently placed below the tables instead of above. While each of these issues is minor individually, their repetition throughout the manuscript makes the text difficult to read and follow, and their cumulative effect reduces the clarity and even the professionalism of the manuscript. I recommend a careful and thorough editorial pass to correct these issues and ensure consistent adherence to standard scientific writing conventions. Since many of these errors occur throughout the manuscript, I list only a few examples:
Acronyms are often introduced before their full names (e.g., L. 103: “HYCOM (Hybrid Coordinate Ocean Model) [...]”), which is inconsistent with conventional scientific writing. Standard practice is to present the full name first, followed by the acronym in parentheses (e.g., “Hybrid Coordinate Ocean Model (HYCOM) [...]”). This occurs in Sec. 2.1, 2.2, among others.
DOIs should not appear directly in the main text, as they do, e.g., at L. 102. Similarly, direct URLs (e.g., L. 119: “Arctic Great Rivers Observatory, https://arcticgreatrivers.org/”) should not appear in the main text. All DOIs and URLs should be moved to the reference list or the data availability section. In-text citations should only include author(s) and year to maintain readability and a consistent citation style.
Citations often appear immediately after acronyms or mid-sentence, which interrupts the flow of the text (e.g., L. 91: “We use two CMIP6 (Eyring et al., 2016) Earth system models, […]”). A clearer style is to place citations at the end of the sentence or clause describing the relevant data or concept (e.g., “We use two CMIP6 Earth system models (Eyring et al., 2016), […]”). This ensures the text is more readable and not interrupted mid-sentence.
In some cases, in-text citations are incorrectly used instead of parenthetical citations (e.g., L. 84: “RASM Cassano et al., 2017 is a fully coupled atmosphere–land–ocean–sea ice model.”), and these should be rewritten as “RASM is a fully coupled atmosphere–land–ocean–sea ice model (Cassano et al., 2017).” Conversely, parenthetical citations are sometimes used where in-text (narrative) citations are appropriate (e.g., L. 92: “These models were chosen following the analysis described in (Levine et al., 2024) […]”), which should be rewritten as “These models were chosen following the analysis described in Levine et al. (2024) […]”.
The manuscript shows inconsistent use of parentheses in two contexts: (i) Acronyms with references: examples such as L. 103: “HYCOM (Hybrid Coordinate Ocean Model) (Chassignet et al., 2003)” use separate parentheses for the acronym and the reference. For clarity, these should be combined in a single set of parentheses, separated by a semicolon: “Hybrid Coordinate Ocean Model version 2.2 (HYCOM; Chassignet et al., 2003).” (ii) Multiple references within the text: references are at times incorrectly formatted, e.g., L. 43: “(e.g., Li et al. (2022) and Dörr et al. (2024), among others)”. Standard formatting combines such references in a single set of parentheses, separated by semicolons: “(e.g., Li et al., 2022; Dörr et al., 2024)”.
Minor comments:
The title could be improved for clarity. The phrase “models versus reanalyses” suggests a competition, when in reality observations, reanalyses, and models are complementary tools. A more neutral and, I think, more scientifically appropriate phrasing (e.g., “evaluation” or “comparison”) would better reflect the study’s aim. Beyond this minimal change, the title could be made more informative by indicating what is being evaluated (e.g., long-term trends, interannual variability, seasonal variability, mean state biases) and, potentially, why it matters (e.g., performance, processes).
Throughout the text, standardise “Barents-Kara Seas” vs “Barents and Kara Seas”. Either is fine as long as it is used consistently; however, since the authors explicitly distinguish between the two regions, I would use “Barents and Kara Seas”.
The manuscript uses season names inconsistently. At times “spring/MAM” and “autumn/ASO” are used, elsewhere just “spring”/“autumn,” elsewhere just “MAM”/”ASO”, and sometimes even “winter”/“summer.” Considering the already large number of acronyms in the manuscript, I recommend defining the seasons once in the Methods (e.g., spring = MAM, autumn = ASO) and then consistently using the terms “spring” and “autumn” throughout. This would improve readability, especially when referring to sea ice maxima and minima.
I recommend using Oxford commas throughout the manuscript where appropriate to improve clarity, for example, in L. 18 and L. 44.
Wherever appropriate, I recommend adding a comma after parenthetical references when they occur at the end of a clause, to clearly signal the clause boundary. L. 24 would be changed to: “These observed changes characterise the region as a warming hotspot (Lind et al., 2018), and as […]”.
E.g., L. 20, 44: I think the term “ocean basins” is not appropriate for the Barents and Kara Seas, which are two shallow shelf seas. Perhaps replace it with “shelf seas” or “marginal seas”.
L. 41: The phrase “disproportionately large influence” is somewhat unclear: disproportionately large compared to what? I would consider removing the word “disproportionately” or explicitly stating what this is being compared to.
L. 50: European observations in the high Arctic go back at least to Nansen’s Fram expedition some 130 years ago, so the phrase “Nowadays, observations [...]” is inaccurate. Additionally, the sentence is somewhat circular and redundant, as observations, reanalyses, and models are the standard tools for studying the Arctic, and indeed the rest of the World Ocean. Consider rewriting it to highlight that they are complementary tools for understanding recent changes in the Arctic.
L. 55: CMIP6 has already been introduced as an acronym earlier in the text, so there is no need to reintroduce it.
L. 68, L. 75: Consider swapping the order of “coupled” and “regional” to emphasise that these are regional models that have coupled components.
L. 153: σ typically denotes density anomaly, not density, which is denoted by ρ. Add the word “anomaly” after “density”.
L. 153: Given that we might expect model properties to be biased, are the results sensitive to the choice of density criterion? Do the results change qualitatively with other criteria?
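Such a sensitivity test could be as simple as recomputing the mixed layer depth (MLD) for a few thresholds. A minimal sketch, assuming one density anomaly profile on an increasing depth grid, a 10 m reference depth, and thresholds of 0.01, 0.03, and 0.125 kg m-3 chosen only as examples (the manuscript’s own criterion may differ); the function name and the synthetic profile are hypothetical.

```python
import numpy as np

def mld_density_threshold(depth, sigma, delta_sigma=0.03, ref_depth=10.0):
    """Mixed layer depth from a density-threshold criterion.

    Returns the shallowest depth below ref_depth at which sigma exceeds
    sigma(ref_depth) + delta_sigma, linearly interpolated between levels.
    depth must be increasing (m, positive down). If the criterion is never
    met, the deepest level of the profile is returned.
    """
    depth = np.asarray(depth, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    sigma_ref = np.interp(ref_depth, depth, sigma)
    target = sigma_ref + delta_sigma
    below = depth > ref_depth
    d, s = depth[below], sigma[below]
    hits = np.nonzero(s >= target)[0]
    if hits.size == 0:
        return float(depth[-1])
    k = hits[0]
    # Interpolate between the level above the crossing and the crossing level.
    d0, s0 = (ref_depth, sigma_ref) if k == 0 else (d[k - 1], s[k - 1])
    return float(d0 + (target - s0) * (d[k] - d0) / (s[k] - s0))

# Synthetic profile (illustrative only, not model output).
depth = np.array([0.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0])
sigma = np.array([27.0, 27.0, 27.0, 27.0, 27.02, 27.1, 27.3])
mld_by_threshold = {d: mld_density_threshold(depth, sigma, d) for d in (0.01, 0.03, 0.125)}
```

Comparing mixed layer temperatures and their trends across such thresholds would show whether the conclusions depend on the criterion.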
L. 155: In “The monthly mean temperature for each grid cell is calculated for this depth”, it is unclear whether the temperature is taken at the mixed layer base only or averaged (integrated) from the surface down to the mixed layer base. The authors should clarify this, as it significantly affects the interpretation of the mixed layer temperature. The appropriate approach would be to average from the surface down to the base of the mixed layer.
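To illustrate the distinction, a minimal sketch of a thickness-weighted mixed layer temperature, assuming layer-mean temperatures on layers bounded by known interface depths; the function name and the synthetic values are hypothetical. A layer that straddles the mixed layer base contributes only its portion above the base.

```python
import numpy as np

def mixed_layer_mean_temperature(z_interfaces, temp, mld):
    """Thickness-weighted mean temperature from the surface down to depth mld.

    z_interfaces: layer interface depths in m (positive down), length n + 1,
                  starting at 0 (the surface).
    temp:         layer-mean temperatures, length n.
    mld:          mixed layer depth in m (assumed > 0).
    """
    z = np.asarray(z_interfaces, dtype=float)
    t = np.asarray(temp, dtype=float)
    top = z[:-1]
    bottom = np.minimum(z[1:], mld)
    thickness = np.clip(bottom - top, 0.0, None)  # zero for layers below the MLD
    return float(np.sum(thickness * t) / np.sum(thickness))

# Synthetic example: the layer-averaged value (2/3 degC) differs clearly
# from the temperature of the layer at the mixed layer base (-1 degC).
z_interfaces = [0.0, 10.0, 20.0, 40.0]
temp = [2.0, 1.0, -1.0]
t_ml = mixed_layer_mean_temperature(z_interfaces, temp, mld=30.0)
```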
L. 158-165: The manuscript uses a significance threshold of p < 0.1 for trends but p < 0.05 for differences between correlations. Please clarify why different thresholds are used and whether the choice of p < 0.1 for trends affects the interpretation.
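One way to address this would be to report trend significance at both thresholds side by side, which flags trends that are significant at p < 0.1 but not at p < 0.05. The trend test used in the manuscript is not restated here; purely as an illustration, the sketch below assumes a Mann–Kendall test (normal approximation, without tie correction) with Sen’s slope. An ordinary least-squares trend with a t-test would be an equally valid choice. The function name and the synthetic series are hypothetical.

```python
import math
from statistics import median

def mann_kendall(values, alphas=(0.05, 0.1)):
    """Mann-Kendall trend test with Sen's slope (units per time step).

    Uses the normal approximation without tie correction. Returns a dict
    with Sen's slope, the two-sided p-value, and whether the trend is
    significant at each level in alphas.
    """
    x = list(values)
    n = len(x)
    pairs = [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
    # S statistic: increasing pairs minus decreasing pairs.
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    slope = median((x[j] - x[i]) / (j - i) for i, j in pairs)
    return {"slope": slope, "p": p, "significant": {a: p < a for a in alphas}}

# Synthetic series (illustrative only): a steady decline over 30 years.
result = mann_kendall([1.0 - 0.02 * t for t in range(30)])
```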
L. 191: The term “temporal pattern” is somewhat vague. I would specify whether this refers to interannual variability, seasonal timing, or long-term trends, as the paragraph also discusses mean values and decadal trends.
Fig. 1: I would add bathymetry to Figure 1. This would help illustrate key features, reducing the need for a lengthy description in the introduction.
Fig. 2, 7: Figures 2 and 7 show the full seasonal cycle, but the manuscript mainly focuses on spring and autumn, and the other seasons are not discussed in detail. Consider whether these figures could be simplified to time series showing only spring and autumn values, which might make the results clearer and more directly relevant to the study’s focus. Alternatively, the manuscript could be expanded to consider the full seasonal cycle.
Additionally, the year tick labels are not aligned with the start of the grid cells.
Adding titles above the left and right panels showing “Barents Sea” and “Kara Sea” would improve readability and help orient the reader.
The figures could be simplified by showing only one colorbar instead of two, which would reduce clutter.
Finally, the figure could be simplified further by removing the year tick labels from the Kara Sea panels.
Fig. 3: All subplots use the same legend, so only one is necessary. I suggest keeping the legend in the top-left subplot and removing it from the others to reduce redundancy. Additionally, the x- and y-axis ranges should be the same for the Barents Sea and Kara Sea panels to make it easier to compare the two regions.
Table 5. The correlations are calculated for summer (which you define as Apr–Sep) and winter (which you define as Oct–Mar), which include spring (MAM) and autumn (ASO) months that the rest of the manuscript focuses on. This shift in seasonal definition is somewhat confusing and not really explained in the manuscript. I recommend either aligning the correlation periods with the spring and autumn focus of the manuscript, or alternatively clarifying why a different seasonal grouping is used here.
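Aligning the correlation periods with the rest of the manuscript should be straightforward. A minimal sketch, assuming monthly fields already reduced to regional time series and arranged as (years × 12 months) arrays; all names are hypothetical. It computes seasonal means for any set of months, handles the Oct–Mar season (which spans the year boundary and therefore yields one value fewer than the number of years), and correlates spring or autumn means directly.

```python
import numpy as np

SPRING_MAM = (3, 4, 5)
AUTUMN_ASO = (8, 9, 10)

def seasonal_means(monthly, months):
    """Mean over the given calendar months (1-12) for each year.

    monthly: array of shape (n_years, 12), columns January..December.
    """
    m = np.asarray(monthly, dtype=float)
    return m[:, [month - 1 for month in months]].mean(axis=1)

def oct_mar_means(monthly):
    """Oct-Mar means pairing Oct-Dec of year y with Jan-Mar of year y + 1.

    Returns n_years - 1 values, because the last season is incomplete.
    """
    m = np.asarray(monthly, dtype=float)
    return np.concatenate([m[:-1, 9:12], m[1:, 0:3]], axis=1).mean(axis=1)

def seasonal_correlation(ml_temp, ice_area, months):
    """Pearson correlation of seasonal-mean ML temperature and sea ice area."""
    a = seasonal_means(ml_temp, months)
    b = seasonal_means(ice_area, months)
    return float(np.corrcoef(a, b)[0, 1])
```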