the Creative Commons Attribution 4.0 License.
CMIP7 Data Request: Ocean and Sea Ice Priorities and Opportunities
Abstract. The ocean and sea ice are central to Earth's climate system, influencing global heat and carbon cycles, weather patterns, and sea level rise. Recent decades have seen rapid advances in Earth System Models (ESMs), but limitations remain in simulating and comparing key oceanic and cryospheric processes across models. A recurring challenge in model intercomparison efforts like the Coupled Model Intercomparison Project (CMIP) is determining the output variables that best represent essential mechanisms while remaining manageable in volume and complexity. Here we present the CMIP7 ocean and sea ice data request, developed through an international, community-based process to prioritize variables for model output. We identify seven opportunities—science-based use cases spanning ocean and cryosphere drivers and responses, paleoclimate, polar amplification, extremes, wind waves, and rapid model evaluation—to guide variable selection and temporal resolution. To address these opportunities we request new high-frequency and depth-integrated variables, support improved diagnostics of ocean heat uptake, sea ice processes, and model-observation comparison, and build on lessons from CMIP6. Our approach enables targeted, efficient, and transparent data curation to support a wide range of users, from model developers to policymakers. This effort reflects a growing need for more sophisticated, integrative model outputs that address pressing climate questions, including regional extremes and tipping points, while laying the groundwork for future modeling developments.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3083', Anonymous Referee #1, 02 Aug 2025
In this paper, Baylor Fox-Kemper and his co-authors present an overview of the CMIP7 variable request for the ocean and sea ice.
Before outlining some comments on this paper, I first want to express my gratitude to the authors for taking on this huge effort, which I see as a truly outstanding service to our community. Thank you!
As regards the paper itself, I find this a bit difficult to review as it's not a classical scientific paper, but instead more of an outline of the result of a community effort to define overarching questions that could/should be addressed with CMIP7 model output. As such, I'm a bit unsure as to which level of criticism/comment is warranted: The paper is scientifically formally correct and can thus in my view be published as is (with some very minor edits as outlined below).
However, as a reader, I also felt that a few important questions remained open, but I would like to leave it to the authors to decide to which degree these can be answered within the scope of this publication. I therefore just list these here and look forward to seeing maybe at least some level of discussion in a revised version of this paper.
1. I would have wished for more "lessons learned" from CMIP6, and for some discussion as to how these lessons were addressed in CMIP7. In particular, I feel that the data request for CMIP6 was already overwhelming, and now even more variables are added to that list. What did we learn more concretely about the variable usage and efficiency of CMIP6 output, and how was this considered here? Which variables or groups of variables are dropped for CMIP7, or are we simply requesting more from the modelling centers?
2. What are the pros and cons of the approach of defining "opportunities"? Do the authors feel it is warranted that 2 out of 7 of these are related to waves? Given that the two overarching opportunities of the ocean and the sea ice variables seem to mirror the approach taken in CMIP6, the introduction of opportunities seems to make the call wider rather than more concrete, which seems to go against the intention of opportunities. Some reflection on this would be helpful. Is this approach too broad/too limiting/helpful?
3. What is special about "paleo" variables, why are they different from the standard variables, or is the idea here to define a subset given the length of the simulations?
4. Some discussion of on-line analysis versus the storage of huge amounts of data would be helpful, I find. To which degree do the authors think that we should focus more on high-level analyses that are calculated on the fly while the simulations are running, versus storing petabytes of output variables that researchers then use to calculate the same integrated metrics again and again? (I think this indeed is an open question.)
5. Some reflection on the time line for CMIPs would have been nice to have included here from a specific sea-ice and ocean perspective. From the discussion, I understand that we might be overwhelming ourselves with the current pace of CMIP activities - would the authors have concrete suggestions for improving this situation for our communities?
6. Which concrete criteria were used to define the priorities for variable requests? How are they different from those in CMIP6?
7. Are there variables that are requested to be supplied as a group, given that, for example, supplying just 60 % of budget variables often means that even those 60 % cannot be used if the other 40 % of variables are not supplied?
8. While the term "tipping" is mentioned in the abstract, there is no dedicated opportunity related to the stability of e.g. the AMOC or other large-scale ocean features. Is there a reason for why this is not targeted explicitly?
9. The inclusion of physical vs. non-physical variables is unclear. I first thought that this paper only dealt with physical variables but then saw that, for example, chl200 is requested, which relates to chlorophyll. Some discussion of physics vs. biogeochemistry vs. biology of this variable request would be helpful.
10. I was surprised to find that lakes are included in this request. Was this also the case in CMIP6? Some discussion on lakes vs. seas vs. oceans would be helpful, also related to the modeling of the hydrological cycle over land grid cells.
Minor comments:
l.71: One could also cite IPCC AR6 WG1 cross-chapter box 10.1 here
l.75: It is unclear to me which data set the v2.2 refers to
l.76: Something seems wrong with this sentence
l.120: What about ice-sheet--ocean interactions, was ISMIP involved in these discussions? This seems like a topic that I found surprising to not be covered as a major topic/opportunity in this paper
Table 1 (and other tables): Please check column width; in this table the first column is too narrow and so all IDs are spread over multiple lines
l.297: Please spell out AFT here, the abbreviation is rarely used in this paper and there are many pages after its definition in l.90. The mentioning of the 127k simulations is surprising, I don't understand what these refer to - maybe provide a bit more background information for the non-paleo readership of this paper?
l.492: The style of this paragraph is different to the ones before, in particular owing to the usage of "we request". Might be good to streamline the style of all opportunities one way or the other.
l.501: Again, please spell out AFT I suggest.
Appendix B: sisnmassn and sisnmasss could use same wording for their titles
Citation: https://doi.org/10.5194/egusphere-2025-3083-RC1
RC2: 'Comment on egusphere-2025-3083', Brandon Reichl, 14 Aug 2025
Review of “CMIP7 Data Request: Ocean and Sea Ice Priorities and Opportunities”
Reviewed by Brandon Reichl
This paper proposes the framework for ocean and sea-ice focused data output standards for CMIP7 models. The text provides a summary of this project’s origination, structure, and recommendations. The objective of this manuscript is to document this committee’s task to create a high-level overview for the recommended marine based CMIP7 data requests and summarize the process. This objective is achieved. Since the paper lays out an important set of recommendations for the CMIP7 community, it is appropriate for publishing. My comments can be viewed as suggestions that the authors may consider for revision rather than concerns that need to be addressed to warrant publication.
General Comments
- The authors clearly devoted considerable time and effort into generating this data request, and it is a monumental feat to achieve this summary. This paper could easily have been 2-3x its current length depending on the granularity of presentation of the opportunities and the discussion of output variables associated with each. Distilling it down to something more manageable is a positive outcome, but this does sacrifice the ability of this paper to be self-contained. As such, there is much jargon and detail that assumes a reader is already intimately familiar with CMIP and the CMIP7 data requests. The reader is also assumed to know where to find the details that were omitted. That is probably fine for the intended audience and for the purpose of this work to fit within a collection of related manuscripts. It is a very different paper from the OMIP protocol and diagnostics paper (Griffies et al., 2016), which seems fine given a more high-level purpose.
- It took me a considerable amount of time to figure out how to find the data variables for this request. I found the Airtable website (https://airtable.com/appOcSa4gXyzHThmm/shrkayKObes58Zu45/tbljoSaMlK7m0DunX/viw0evRBr0vqp658c) after some exploring through the provided Zenodo link, GitHub, and the Binder software implementation, and eventually realized this was all organized on a website (https://wcrp-cmip.org/cmip7-data-request-v1-2-2/). Is the website address intentionally omitted from this text? The Airtable seemed to me the most direct way to scan through the details of the data request without installing special software and spending some time learning to navigate the data structures. Maybe it was omitted because it is not a permanent resource? Could some of the information be exported to PDF supplementary materials?
- I do not notice any obvious omissions of ocean and sea-ice model output that would prevent CMIP7 models from serving their primary purposes. There are a number of non-baseline variables included whose necessity could be debated at length, and probably already was debated within this large author group. The discussion about why these variables are needed is laid out well scientifically. However, I do find the data justification of this request lacking in detail, focusing on these science questions that can always be better addressed with more data, but not reporting much detail on the consideration of specific data demands. Of course, judging the appropriateness of certain data volume requests will vary significantly depending on one’s specific interests and the resources available to a specific modeling institution. If possible, providing some quantitative estimates related to the added data cost of various opportunities, variable groups, etc. would be helpful. This would especially help convey that this real (and significant) implication of the cost to institutions of serving the data request was weighed against the science it will support.
- Prioritization of outputs (including variables or frequency) could also have been discussed more, if a goal is to ensure more institutions provide certain specific model output. There are some “high”, “medium”, and “low” priority cases, but what gives a variable its priority is not discussed in detail. Is it expected that modeling centers under numerous pressures from deadlines will be able to save and publish the “low” priority output? My takeaway from Section 4.1 is that a proper prioritization process was very difficult and likely demanded more resources than were available.
- A discussion on spatial coarsening of model output could be useful somewhere. Some variables (e.g., heat content, certain scalars, MOC) can likely be coarsened with the savings in data serving demands outweighing the loss in information (e.g., from a ¼ degree grid to a 1 degree grid). But other data should presumably not be coarsened beyond the model native resolution if possible (e.g., extremes; winds if the purpose is for driving a wave model).
- I wonder if “Waves” belongs as its own topic alongside ocean and sea-ice? E.g., the title could be “ocean, sea-ice, and wave priorities and opportunities”, given the attention to waves in this data request. This emphasis on waves would then better justify why two of the more data-intensive opportunities are associated with wave parameters.
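The trade-off raised in the coarsening comment above can be illustrated with a minimal sketch. It assumes a toy rectangular grid stored as a list of lists and an unweighted block mean; the function name `coarsen` and the toy values are hypothetical, and a real ocean grid would need area weighting and land masking. Going from a ¼ degree grid to a 1 degree grid corresponds to a factor of 4 in each horizontal direction.

```python
def coarsen(field, factor):
    """Block-average a 2D list of lists by an integer factor (unweighted mean).

    Assumption: rectangular grid, no land mask, equal cell areas.
    """
    ny, nx = len(field), len(field[0])
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    out = []
    for j in range(0, ny, factor):
        row = []
        for i in range(0, nx, factor):
            block = [field[jj][ii]
                     for jj in range(j, j + factor)
                     for ii in range(i, i + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Toy 4 x 8 "fine" grid with values 0..31 (hypothetical data).
fine = [[float(j * 8 + i) for i in range(8)] for j in range(4)]
coarse = coarsen(fine, 4)  # factor 4 mimics 1/4 degree -> 1 degree

# Horizontal points are reduced by factor**2.
reduction = (len(fine) * len(fine[0])) / (len(coarse) * len(coarse[0]))

# The domain maximum is smoothed away, which is why extremes suffer.
fine_max = max(max(row) for row in fine)
coarse_max = max(max(row) for row in coarse)

print(f"reduction in horizontal points: {reduction:.0f}x")
print(f"max on fine grid: {fine_max}, max on coarse grid: {coarse_max}")
```

The sixteen-fold saving in horizontal points is the gain for smooth, integrated quantities, while the drop in the maximum shows the kind of information loss that matters for extremes.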
Specific Comments
- Introduction
L77: Check this sentence, maybe replace “has” with “and”.
L82: Multiple time [and spatial] scales. (is frequency needed here?)
L86: The terms in italics could be defined somewhere (perhaps in a table). E.g., after reading this paper a few times I’m still not entirely clear on what is meant by “opportunities”.
L107: “The accompanying tables…” This reference seems to only include a small spreadsheet of the REF variables. Is it meant to be to the full opportunities/groups/variables discussed in this text?
- Approach and Methodology
This essentially reads like a technical report; there is not much to comment on regarding the methodology.
Table 1: I wonder if there is a way to add more granularity to the breakdown of the variables. E.g., ID 47 has 240 variables, but if they are 2d monthly variables it would be a completely different request than if they are 240 3d daily mean variables. Grouping by time period and/or 2d vs 3d could help clarify the data demand. If data volumes were part of the decision making, it could also help to explicitly include some quantitative data estimates (e.g., many data volume estimates are given by Juckes et al., 2024). I eventually figured out that this information is contained in the AirTable (if that is reliable?), but without significant experience using AirTable that wasn’t obvious to me and so could be summarized here.
Table 1: “Experiment Groups” might be useful to define.
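To make the point about granularity in Table 1 concrete, the following back-of-envelope sketch compares 240 two-dimensional monthly variables with 240 three-dimensional daily variables. All numbers are assumptions chosen for illustration only (a 360 x 180 grid, 50 vertical levels, 100 simulated years with 365-day years, 4-byte values, no compression); none of them come from the data request itself.

```python
def volume_bytes(nx, ny, nz, n_times, bytes_per_value=4):
    """Uncompressed size of one variable (assumed single-precision values)."""
    return nx * ny * nz * n_times * bytes_per_value


# Hypothetical model configuration, not taken from the data request.
NX, NY, NZ = 360, 180, 50
YEARS = 100
N_VARS = 240

monthly_2d = volume_bytes(NX, NY, 1, 12 * YEARS)   # one 2D monthly variable
daily_3d = volume_bytes(NX, NY, NZ, 365 * YEARS)   # one 3D daily variable
ratio = daily_3d / monthly_2d

print(f"240 x 2D monthly: {N_VARS * monthly_2d / 1e9:.1f} GB")
print(f"240 x 3D daily:   {N_VARS * daily_3d / 1e12:.1f} TB")
print(f"per-variable ratio: {ratio:.0f}x")
```

Under these assumptions the same variable count differs in volume by more than three orders of magnitude, which is why a breakdown by frequency and dimensionality would say more than the count alone.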
3.1, Ocean Changes, Drivers and Impacts
ENSO is briefly mentioned, but the science questions referenced here largely neglect tropical topics. Maybe that reflects the state of the scientific interests now, but perhaps some ENSO implications could be mentioned?
No glaring variable omissions. The total estimate of data volume on AirTable is about 23 TB, which seems manageable. The ocean_mesoscale addition is a fairly substantial fraction of this; I think that its cost could be acknowledged.
L183: What is the practical benefit of this clustering? What is the benefit of a variable group?
Section 3.2 (Sea Ice Changes, Drivers, and Impacts)
No comments; the data estimate from AirTable was 16.1 TB.
Section 3.3 (Paleoclimate)
The data estimate from AirTable was 22.4 TB
L297: Some elaboration could help here; the assumption now is that a reader is familiar with paleoclimate experiments and knows what the abrupt-127k simulation means (I had to look it up).
Section 3.4 (Polar Amplification):
The data estimate from AirTable was 35.8 TB
Section 3.5 (Extremes):
The data estimate from AirTable was 54 TB; this opportunity will likely be an important one from CMIP7 and will be well utilized by groups performing model analysis.
L386: I’m surprised to see BGC data also included in this request, it seems somewhat out of the stated scope.
Section 3.6 (Wind waves)
The total size of this data request on AirTable is 64 TB, or almost 3x that of the Ocean Changes opportunity. That significant potential overhead is probably worth discussing, or at least explaining how much of the data is an additional cost vs atmospheric variables that would already have been stored as part of other opportunities.
L405: Those that do are often at rather coarse grid spacing (this is sort of mentioned in other places).
L407: “high-resolution data” -> This is subjective, it could be elaborated what time frequency and spatial resolution are preferred.
L409: I’m unclear what is meant by “independent of ESM outputs”. My understanding was that the ESM outputs are to be used to drive the wave model?
L411: Should it be obvious why this offers computational efficiency and spatial detail? I’m not sure that it is.
Section 3.7 (wave coupling)
This is the most expensive opportunity in this topic area according to AirTable at 97 TB, or roughly 4x the size estimate of Ocean Changes. If the AirTable estimates are reliable and these are not just associated with otherwise collected variables I think some discussion is warranted. This opportunity being much more data intensive than the extreme impacts opportunity does make it feel like a substantial new request.
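The relative sizes quoted in this review can be checked with a short calculation. The sketch below uses only the per-opportunity AirTable estimates reported in the comments above (the Ocean Changes figure is "about 23 TB", so the ratios are approximate); the dictionary keys are shorthand labels. The simple sum is an upper bound under the assumption that no variables are shared between opportunities, which is exactly the overlap question raised above.

```python
# AirTable estimates in TB, as quoted in the section comments of this review.
estimates_tb = {
    "3.1 Ocean changes": 23.0,        # "about 23 TB"
    "3.2 Sea ice changes": 16.1,
    "3.3 Paleoclimate": 22.4,
    "3.4 Polar amplification": 35.8,
    "3.5 Extremes": 54.0,
    "3.6 Wind waves": 64.0,
    "3.7 Wave coupling": 97.0,
}

baseline = estimates_tb["3.1 Ocean changes"]
ratios = {name: tb / baseline for name, tb in estimates_tb.items()}

# Upper bound only: shared variables would be counted more than once.
total_tb = sum(estimates_tb.values())

for name, tb in estimates_tb.items():
    print(f"{name:26s} {tb:6.1f} TB  ({ratios[name]:.1f}x Ocean changes)")
print(f"Sum (upper bound, ignoring overlap): {total_tb:.1f} TB")
```

The ratios come out at about 2.8 for wind waves and about 4.2 for wave coupling, consistent with the "almost 3x" and "roughly 4x" stated in the comments.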
Appendix B: I’m unsure what the need is for choosing this subset of variables to define more formally (aside from being new to CMIP7?). It is a rather long/unwieldy list, but is incomplete and thus requires intimate knowledge of CMIP6 variables. A lot of non-ocean and non-cryosphere variables are also included; is that intentional? I generally like the inclusion of some key variable information that can be accessed without surveying the AirTable (could be this appendix, or supplementary materials).
Typos/Word choice:
L163: As already noted [in the introduction and references within]
L208: resolution -> grid-spacing
L208: higher -> finer
Citation: https://doi.org/10.5194/egusphere-2025-3083-RC2
RC3: 'Comment on egusphere-2025-3083', David Bailey, 14 Aug 2025
This manuscript summarizes the CMIP7 data request focussing on the ocean and sea ice fields. This manuscript is very well written and thoroughly details the opportunities and needs for new variables to be added for CMIP7. I do like the way this has broken things down into different opportunities for ocean, sea ice, paleoclimate, polar amplification, extremes, and so on. However, I do have some significant concerns here.
1. This is not a traditional scientific paper, which might be fine for this particular audience. I understand there is a need to have this documented somewhere. However, this really just reads like a workshop report. It is just summarizing outcomes of several discussions. In fact, Appendix A - Opportunity Processing is a table of the outcomes from all of the meetings. I really feel like this table adds nothing to the manuscript even though it is simply an appendix.
2. I get the feeling overall in this summary that the CMIP6 data request was perfect and we just need to add more output. I would argue that this paper could have provided a critical retrospective of the CMIP6 data request. How many people actually used the variables? What are the lessons learned here? For example, we had extensive discussion about the difficulty of sea ice albedo (sialb) and how to standardize this calculation. Really, the only accurate way to do this is to compute the ratio of outgoing shortwave over the incoming shortwave. This is a true broadband albedo and this should be done daily. I'm sure there are numerous other variables both from the ocean and sea ice perspective that could be tweaked or even removed from the data request. A little bit of this is discussed in the key reflections section, but I don't feel this is enough.
3. Also, I am not as clear how these opportunities map to the set of experiments that are being done for CMIP7. There is mention of the Fast Track experiments, but are we supposed to include all of these for DECK, Scenario, PAMIP, ...? We are being challenged at big modeling centers to reduce our carbon footprint and only do essential experiments. This sort of data request is going to be hard to manage. We have to be very smart and efficient with our data output from the model. Creating petabytes of data is just not feasible.
4. We are being asked to rank the scientific significance, quality, and reproducibility. These are really not applicable to this manuscript. I can see aspects here that are significant for potential science down the road.
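The broadband albedo described in point 2 is the ratio of outgoing to incoming shortwave radiation. A minimal sketch of that ratio is given below; the function name, the argument names, and the 1 W m-2 cutoff used to skip polar-night days are hypothetical choices for illustration, not CMIP variable definitions or anything specified in the manuscript.

```python
def broadband_albedo(sw_down, sw_up, min_sw_down=1.0):
    """Ratio of outgoing to incoming shortwave flux (both in W m-2).

    Returns None when the incoming flux is below min_sw_down, an assumed
    cutoff that avoids dividing by near-zero values during polar night.
    """
    if sw_down < min_sw_down:
        return None
    return sw_up / sw_down


# Hypothetical daily-mean fluxes (incoming, outgoing) for three days.
daily_fluxes = [(200.0, 130.0), (150.0, 90.0), (0.0, 0.0)]
daily_albedo = [broadband_albedo(down, up) for down, up in daily_fluxes]

for (down, up), alb in zip(daily_fluxes, daily_albedo):
    label = "undefined" if alb is None else f"{alb:.2f}"
    print(f"SW down {down:6.1f}, SW up {up:6.1f} -> albedo {label}")
```

Computing the ratio from daily fluxes, rather than averaging a stored albedo field, keeps the result consistent with the energy actually reflected on each day.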
Citation: https://doi.org/10.5194/egusphere-2025-3083-RC3