Baseline Climate Variables for Earth System Modelling
Abstract. The Baseline Climate Variables for Earth System Modelling (ESM-BCVs) are defined as a list of 132 variables which have high utility for the evaluation and exploitation of climate simulations. The list reflects the most heavily used elements of the Coupled Model Intercomparison Project phase 6 (CMIP6) archive. Successive phases of CMIP have supported strong results in science and substantial influence in international climate policy formulation. This paper responds both to interest in exploiting CMIP data standards in a broader range of climate modelling activities and to the need for greater clarity about the significance and intention of variables in the CMIP Data Request. As Earth System Modelling (ESM) archives grow in scale and complexity, there are emerging problems associated with weak standardisation at the variable collection level. That is, there are good standards covering how specific variables should be archived, but this paper fills a gap in the standardisation of which variables should be archived. The ESM-BCV list is intended as a resource for ESM Model Intercomparison Projects (MIPs) developing requests, to enable greater consistency among MIPs, and as a reference for modelling centres to enhance consistency within MIPs. Provisional planning for the CMIP7 Data Request exploits the ESM-BCVs as a core element. The baseline variables list includes 98 variables which have modest or minor data volume footprints and could be generated systematically when simulations are produced and archived for exploitation by the WCRP community. A further 34 variables are classed as high volume and are only suitable for production when the resource implications are justified.
Status: final response (author comments only)
CC1: 'Comment on egusphere-2024-2363', Anne Marie Treguier, 30 Aug 2024
Dear authors,
Congratulations on this manuscript! I would like to share a suggestion. Some of the variables proposed in the list are not simple physical parameters like temperature. An example is Omon.mlotst, the ocean mixed layer depth. Its computation requires making nontrivial choices. It would be useful to add, for each such variable, a reference to the paper that documents the method, for example Griffies et al., 2016 (https://doi.org/10.5194/gmd-9-3231-2016) for ocean variables. If a change in method is decided relative to the existing reference, this change should also be documented and referenced (this may be the case for Omon.mlotst).
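To illustrate why the choices are nontrivial, here is a minimal single-profile sketch of one common density-threshold definition (a sketch only, assuming Python with the gsw TEOS-10 toolbox; the 0.03 kg m-3 threshold referenced to the level nearest 10 m, the function name and the inputs are illustrative, not the method used by any particular model):

```python
import numpy as np
import gsw  # TEOS-10 Gibbs SeaWater toolbox

def mld_sigma_theta(sa, ct, depth, threshold=0.03, ref_depth=10.0):
    """Mixed layer depth (m) for one profile: first depth at which the potential
    density anomaly exceeds its value at the level closest to ref_depth by
    `threshold` (kg m-3). Inputs: Absolute Salinity sa (g/kg), Conservative
    Temperature ct (degC), depth (m, positive down), all 1-D arrays."""
    sigma0 = gsw.sigma0(sa, ct)                    # potential density anomaly (kg m-3)
    i_ref = np.argmin(np.abs(depth - ref_depth))   # level closest to ~10 m
    exceeds = sigma0 > sigma0[i_ref] + threshold
    if not exceeds.any():
        return depth[-1]                           # mixed down to the deepest level available
    return depth[np.argmax(exceeds)]               # first level exceeding the threshold
```

Every element of this (threshold value, reference level, choice of density variable) is a methodological decision that a documenting reference would pin down.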
Best regards,
Anne Marie Treguier
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC1 -
CC2: 'Comment on egusphere-2024-2363', Isla Simpson, 05 Sep 2024
I have just a minor comment. As I was using this paper while preparing some opportunities for the CMIP7 data request, I couldn't find the information on what the pressure levels actually are for the various options, i.e., the 19-, 8- and 3-pressure-level options for the atmosphere. I think it would be helpful to list what those pressure levels actually are, so that this could be a stand-alone resource for people to find out about the options available to them from these baseline variables. Sorry if I've missed it somewhere.
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC2 -
CC3: 'Comment on egusphere-2024-2363', Alistair Adcroft, 06 Sep 2024
I'm surprised to see Oday.sos (surface salinity) but not Oday.zos (table A6). I'm unclear on what the purpose of sos is at such high frequency. I believe daily zos (and zostoga) would be more widely used (e.g. for local sea-level analysis, mesoscale activity, ...) and should be a baseline variable.
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC3 -
CC4: 'Comment on egusphere-2024-2363', Baylor Fox-Kemper, 06 Sep 2024
This is a critically important topic, and it will inform all of the CMIP7 results. I have two suggestions (at this moment) for alterations.
1) Omon.zos should be converted to Oday.zos. The daily sea level is important for extreme event diagnosis (as tos and sos are). This variable is a critical one for both impacts and input for downscaling, and is particularly revealing in showing the *failures* of coarse resolution models to reproduce SSH variance as high resolution models do (see Fig. 9.12 of AR6 WGI, panels g-i, which had to be created using resources outside of CMIP6 ones because Oday.zos was not included).
2) There is an issue with only collecting bigthetao, in that most ocean models do not use the TEOS-10 equation of state. McDougall et al. made a recommendation to address this point (https://doi.org/10.5194/gmd-14-6445-202), but the data details for thetao and bigthetao presently do not allow this option (i.e., use whichever is the model "native" variable to calculate OHCA). Thus, at a bare minimum, *either* bigthetao or thetao, whichever is the model native variable, should be in Omon here. Furthermore, there is an ongoing assessment within the OMIP team noting that bigthetao cannot easily be compared to observations, which are presently mostly categorized in observational climatologies via thetao. Thus, even if a model is using bigthetao, a comparison to observations will still be needed (e.g., the many AR6 figures comparing temperature and OHCA to observed temperatures and OHCA in Chps 2, 7, 9, 10, 11, ...).
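For context, here is a minimal sketch of the kind of post-hoc conversion users otherwise face when only bigthetao is archived (a sketch only, assuming Python with the gsw TEOS-10 toolbox; the function and variable names are illustrative):

```python
import gsw  # TEOS-10 Gibbs SeaWater toolbox

def thetao_from_bigthetao(bigthetao, so, p, lon, lat):
    """Potential temperature (degC) from Conservative Temperature bigthetao (degC),
    Practical Salinity so (PSS-78), sea pressure p (dbar), and position."""
    sa = gsw.SA_from_SP(so, p, lon, lat)   # Absolute Salinity (g/kg)
    return gsw.pt_from_CT(sa, bigthetao)   # potential temperature referenced to 0 dbar
```

The conversion itself is cheap, but it has to be applied to a high-volume 4D field and still depends on assumptions about salinity, which is why archiving the model-native temperature variable matters.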
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC4 -
CC5: 'Comment on egusphere-2024-2363', Nathan Gillett, 19 Sep 2024
Congratulations on this manuscript! I have one suggestion. We expect that emissions-driven simulations will play a bigger role in CMIP7, and expect that more models in CMIP7 will include coupled carbon cycles than in CMIP6. If groups submit emissions-driven simulations, it will be essential to know the simulated CO2 concentration in order to interpret the results; and if groups submit concentration-driven simulations it would be very helpful to be able to diagnose compatible CO2 emissions, for example to calculate remaining carbon budgets. Also, a calculation of compatible emissions in the 1pctCO2 simulations would be needed to diagnose Transient Climate Response to Emissions (TCRE). These calculations would require monthly mean atmosphere-ocean CO2 flux, atmosphere-land CO2 flux, and atmospheric CO2 concentration or mass. These variables are included in the “Constructing a Global Carbon Budget” opportunity, but that opportunity includes a large number of other variables, and it is possible that some modelling centres would decide not to output these variables. I suggest adding this minimal set of carbon cycle variables to the baseline – with the understanding that of course these can only be provided for models with a carbon cycle.
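To make the requirement concrete, here is a minimal sketch of the compatible-emissions diagnostic this would enable (a sketch only, assuming Python, global totals in consistent units, and sign conventions matching the CMIP definitions of the underlying fluxes; all names are illustrative):

```python
import numpy as np

def compatible_emissions(co2mass_pgc, ocean_uptake_pgc_yr, land_uptake_pgc_yr, time_yr):
    """Compatible fossil-fuel emissions (PgC/yr) = atmospheric CO2 growth rate
    plus net ocean and land carbon uptake, all as global totals."""
    d_atm = np.gradient(co2mass_pgc, time_yr)               # atmospheric CO2 growth rate (PgC/yr)
    return d_atm + ocean_uptake_pgc_yr + land_uptake_pgc_yr  # emissions implied by the simulation
```

This only works if the atmosphere-ocean flux, atmosphere-land flux, and atmospheric CO2 mass are all available at a common frequency, which is the minimal set suggested above.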
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC5 -
CC6: 'Comment on egusphere-2024-2363', Christopher Danek, 19 Sep 2024
Hi
Thanks a lot for your efforts! Please see the following comments.
1) In 2.2 it's not clear to me how r1 and r2 are defined, i.e. how "downloads" are measured. It is certainly possible to count the number of download-clicks in a browser or the number of wget-scripts generated via a browser. But what about direct data usage via ssh access to an ESGF node, which I assume a lot of scientific users have? That cannot be counted I guess? Also, can the (successful) execution of a wget command be counted? If yes, that means I could tweak the download statistics by running a trillion wget-cronjobs of an unpopular variable? I would like to see a sentence more about this technical aspect (I could not find any details on this in the two given references Fiore et al. 2021 and the ESGF dashboard).
2) In my view it would make sense to add seawater density to the baseline variables. It's an important variable but does not get much attention in the literature, at least that is my impression. At the same time it's rather cumbersome to post-process seawater density (see the sketch after these points): 1) downloading the two high-volume 4D variables thetao and so is time consuming; 2) utilizing seawater equation-of-state software (e.g. gsw from TEOS-10) on this large amount of data is time consuming as well; 3) some ocean model output is not provided on its native grid (`gn`) but horizontally and/or vertically interpolated (`gr`), and hence, if I post-process seawater density from such interpolated thetao and so, the obtained result is a less accurate (?) representation of the actual density during ocean model runtime. I am aware that seawater density would yield a high-volume 4D variable, but I wonder if it's worth including due to the above points.
3) I would find it useful to add global averages/sums of important variables to the baseline variables (e.g. tosga, sosga, siarean, siareas, siextentn, siextents, sivoln, sivols), as they 1) are easy to compute for the modeling centers but not for the user (downloading a lot of data is necessary) and 2) need only a tiny amount of resources.
4) I would find it user-friendly if the utilized potential density threshold and reference level were added to the title and/or CF standard name of the mixed layer depth (mlotst), e.g. "... Defined by Sigma T of 0.03 kg m-3 wrt model level closest to 10 m depth" or such.
5) In the appendix tables, why is "Radiation" a realm, and what does "Weighted Time-Mean" mean (e.g. SImon.siconc)?
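On point 2, here is a minimal sketch of the post-processing a user currently has to do (a sketch only, assuming Python with the gsw TEOS-10 toolbox; the function and variable names are illustrative):

```python
import gsw  # TEOS-10 Gibbs SeaWater toolbox

def rho_from_thetao_so(thetao, so, p, lon, lat):
    """In-situ density (kg m-3) from potential temperature thetao (degC),
    Practical Salinity so (PSS-78), sea pressure p (dbar), and position."""
    sa = gsw.SA_from_SP(so, p, lon, lat)   # Absolute Salinity (g/kg)
    ct = gsw.CT_from_pt(sa, thetao)        # Conservative Temperature (degC)
    return gsw.rho(sa, ct, p)              # TEOS-10 in-situ density
```

The code is short, but applying it requires first downloading both 4D fields, and on interpolated (`gr`) output it can only approximate the density the model actually used.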
Thanks a lot and cheers,
Chris
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC6 -
RC1: 'Comment on egusphere-2024-2363', Claire Macintosh, 23 Sep 2024
General comments
This paper represents a substantial and important step forward for the CMIP community looking towards CMIP7. The presented BCV list will form the core of the CMIP7 data request, with the underlying groundwork and philosophy having wide ranging implications across the ESM community.
There is some tension in the paper between the concept of a BCV list as it applies to the WCRP modelling multiverse generally, and the specific implementation of this list as the core of the CMIP7 Data Request and its associated tight timescale. I have tried to make clear in this review which aspect is being addressed by each comment.
In addition to the carefully considered results presented here, the author team should be acknowledged for their approach to the transparency of process in the development of the BCVs, which is an excellent example of good practice in the field.
Specific comments
Please note that I have been asked to give this review in part to provide perspective from the observational community. Some comments reflect that request.
Table 1. Stakeholders of the CMIP DR. Row 1. The examples of “communities studying the global climate” are currently restricted to MIP communities. Other direct users of CMIP data also exist outside of the MIP framework, not least a large number of scientific researchers using CMIP to elucidate specific processes or aspects of the climate system outside a specific MIP.
Section 3 Line 288, Line 303 – see Section 5 comment.
Section 3.4 Role from the data user’s perspective
The BCV list as a whole is aimed primarily at modelling centres. However, the manuscript would benefit from more careful consideration of the wider CMIP user and associated observational communities.
The example of the need of some users for high temporal resolution data presented here is important, but by no means the only consideration from the perspective of the wider CMIP user community.
For context: a search of Scopus lists 4189 papers containing “CMIP6” in the title or abstract. Of these, 1371 (33%) also contain at least one observational keyword (observations OR satellite OR in-situ OR reanalysis)[1], increasing to 1720 (41%) if the word “evaluation” is also included. This inexhaustive list of keywords therefore represents a lower bound on the fraction of the CMIP6 community that is using at least one auxiliary dataset alongside CMIP6 data.
The implications for the BCV list are clear. Given that more than a third of the CMIP community is using some kind of observational data, a key role of the BCV list must be not only that it is common across CMIP modelling centres, but also that it provides enough information to downstream users for observational comparisons and evaluation to be possible. This includes e.g. information on pressure levels, variable names that are consistent across the ECV-BCV boundary, variable choices that are suitable for observational evaluation, considerations of relevant observing resolutions, and clear information on methodological choices to generate BCVs. In short, it must ensure that it is externally facing such that it is sufficient for these analyses.
Some discussion of implications and additional requirements for the BCV list for external communities would be beneficial –
- In the general case: What are the implications for exploitation of the BCVs with and without coordination/interoperability with equivalent observational parameter lists (ECVs, GCIs etc.).
- Do these differ for direct vs indirect users (the latter being more likely to be using derived metrics, where the original form and nuance of both the BCV DR and observational data may be obscured).
- In this phase of CMIP: What input is needed from observational or other auxiliary data communities to maximise the interoperability aims of the BCV DR (for example, development of variables that are more directly comparable with model output – e.g. trivially skin vs layer temperature - or techniques and documentation where comparisons are nuanced e.g. vertical integration to a small number of layers vs observing resolution, pitfalls for regional analysis).
- What actions can the BCV DR take aimed at maximising the uptake of BCV DR across this interface and therefore achieving the overarching aims of the exercise, both for this phase and beyond.
- What gaps exist at the interface that should be filled?
- How might future iterations of the list more systematically address the widespread use of auxiliary datasets in analysis of CMIP or ESM MIP data? What is needed in the longer term?
Section 5 Conclusions: The BCV list has wide implications for ESM MIPs generally, but will also in the near future form the core of the CMIP7 data request. Given that there will be immediate and substantial CMIP community interest in the practical implementation of the list, and that numerous downstream communities will begin to make decisions on their respective implementations in preparation for CMIP7, Section 5 would benefit from some discussion on immediate and future next steps, and an aggregation and expansion of relevant issues identified elsewhere in the paper.
Please note that it is not necessarily for the authors to answer in detail all aspects of the implementation phase, but rather to highlight in this paper issues that must be addressed by next steps, any potential pitfalls, and further community engagement that is needed in order to maximally exploit the careful and detailed work presented here. For example-
- Governance: how will the list be managed and updated? What issues must be addressed?
- For this phase of CMIP: How will any updates or amendments be transparently curated, deployed and communicated to the community.
- For this phase of CMIP: How will this list interact with the wider CMIP7 data request. For example the passing on of specific variable requests to the wider DR communities, where they are assessed as not part of the BCVDR. What action is needed from within and without the BCV community.
- For future phases: Line 303. How might new or emerging variables be fairly and transparently assessed for inclusion (e.g. new land surface or biosphere variables, that may be disproportionately important in the climate services and impacts communities, but do not appear prominently in the CMIP6 data request, or variables that have an easily assessable observational counterpart but may not be essential for model intercomparisons). How might user groups such as those illustrated by the high-resolution example in Sec 3.4 be identified systematically, rather than ad-hoc[2]?
- Future phases: By definition, the existence of this list will create a feedback effect on the most downloaded variables, a core component of its initial derivation. What are the implications for the methodology to update the list going forward? What other issues must be addressed for evolution of the list in the longer term.
- Curation of the list for this phase of CMIP
- What auxiliary information that is not described in this paper is needed for the full implementation of the BCV list. Where will it be available?
- e.g. Table A2, A3 details on pressure levels if needed, any other methodological details required for derivation of BCVs. (I would also strongly suggest some version control and numbering).
- Line 288 Section 3.1 How will new naming conventions be developed and disseminated to the community, or what is needed to address this. Does this need to happen before the AR7 Fast Track runs begin.
- Implications from external/adjacent communities on maximum exploitation of the BCVDR in this phase of CMIP
- Modelling centres and working groups: Are there issues arising from e.g. methodological choices of modelling centres, that are not the responsibility of the BCV list, but that may directly impact its utility (for example, do definitions of mixed layer depth affect how these variables can be intercompared). Are there additional engagement and documentation needs directly relating to BCVs, are there implications for the BCVs from a lack of this engagement, and how can the respective communities collaborate including across the wider CMIP7DR
- Observational community – addressed in earlier comment
- Other neighbouring auxiliary data communities – e.g. downstream modelling exercises, communities using CMIP as boundary conditions, etc. As for observational community, what is needed in terms of engagement on both sides to maximally exploit the BCVDR.
- Immediate next steps of the BCV community.
Technical/minor comments
Ln125: “from”-> “to”?
Footnote 3 on ECVs. The GCOS ECVs span all observation types including in-situ observations, they are not restricted to Earth Observation.
Section 3 title: Second “and” should be “of”?
Table A3 Omon.masscello is missing descriptors in its row
With thanks to the author team,
Claire Macintosh, ESA.
[1] Equivalent numbers from Dimensions.ai (free to access): 5944 articles mention CMIP6 in the title + abstract, of which 2099 (35%) include an observational keyword. This increases to 2538 (43%) if the word ‘evaluation’ is included, which typically implies some kind of auxiliary data source. Searches conducted 16-Sept-24.
[2] For illustration: Dimensions.ai search “CORDEX” returns 2364 title + abstract results, “CMIP5” returns 6439, but the overlap (CMIP5 AND CORDEX) is only 262, as the majority of the CORDEX community are indirect users of CMIP data. This community will not show up in the methodology as described but is very large and currently not accounted for except via user engagement surveys. The principle of assessment of indirect users is more widely applicable to the BCV concept.
Citation: https://doi.org/10.5194/egusphere-2024-2363-RC1 -
CC7: 'Comment on egusphere-2024-2363', Gaëlle Rigoudy, 24 Oct 2024
Congratulations on this reference paper and the impressive work behind it!
Here are suggestions from people in the CNRM-Cerfacs modeling group for some adjustments to the BCV list:
- add sfcWind at 3hr along with uas, vas for var association coherency
- add hurs at 3hr along with huss, tas for var association coherency
- remove hurs at 6hr frequency (since now added at 3hr - see previous point)
- useful to have ta daily on P19 along with ua, va, zg, hus for var association coherency
- remove hus, ua, va, ta at daily frequency, on P8 as it is redundant to have them both on P19 and P8 (P8 included on P19)
- add monthly msftyz (MOC) since it is a basic variable not easy to compute offline
- remove pr at 3hr frequency since already requested at 1hr frequency
- add od550aer at monthly frequency to have minimum information about aerosols (integrated content for all species, important to estimate aerosol radiative forcing) at a low cost (2D monthly variable)
- add hus and zg at 6hrPt (along with ta, ua, va), useful to feed the RCM statistical emulators; provide them on P7h instead of P3 (to have 950 hPa and 700 hPa)
And a general comment: Would be useful to have a table with the list of pressure levels for each pressure level set.
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC7 -
CC8: 'Comment on egusphere-2024-2363', Gavin A. Schmidt, 25 Oct 2024
I am very conscious of the work that goes into defining these variables and the struggle to keep everyone as happy as possible. Nonetheless, I think there are some important 'meta' considerations that should be informing these choices a little more strongly. These principles come from the notions that a) we are trying to inter-compare models, and b) (where possible) we should be able to compare to observations on a like-for-like basis. At minimum, the authors need to address how these considerations inform the choices, and if they want to continue with these variables (due to inertia, or other reasons) these should be stated. These principles lead to a number of consequences:
First, diagnostics that are specific to a single model should be discarded. They are (by definition) not comparable to other models or observations. Things I would include here are cloud variables (or really anything) defined on atmospheric model levels - since each model has different levels, these are incommensurate (without doing a lot of work, which might be impossible to do correctly post-hoc). This goes as well for ocean variables on model levels - these should be defined on fixed depths (or, more technically, fixed pressure levels).
Secondly, variables that are differently defined in different models and observations are just a recipe for confusion. I would include in this cloud fraction, or cloud cover variables. In the observations, there are observational constraints that define a minimum optical depth that 'counts' for a cloud (which could be variable in space and time) that is not used in the models (or it might be, and might differ across different models too).
Finally, there should be a greater emphasis on derived variables (i.e. variables for which observations exist, but that aren't prognostic variables in the models).
More specific points:
Cloud ice and cloud water: These are model conceptions that do not exist in the real world nor in the observations. Any observation of either of these quantities cannot distinguish in-cloud variables from falling precipitation. There is a real danger that naive comparisons of these variables with remotely sensed quantities can lead groups to 'overfit' to biased data which could have important consequences for cloud feedbacks and climate sensitivity. These variables need to be forward modeled using remote-sensing lenses (see next point).
Cloud-related forward models: Consistent comparisons of cloud properties (ice/water content, fraction, etc.) should be performed using observation-based forward models such as the COSP package. Most groups have implemented this for CFMIP and this should now be standard for the CMIP variables. They have the benefit of standardizing the diagnostics across models for whatever experiment, and in the historical simulations they provide direct comparisons to the satellite record. This should be a no-brainer.
AMSU/MSU/SSU atmospheric temperatures. These exist as climate data records since 1979, and yet comparison with models is much harder than it needs to be. These diagnostics can be coded as a relatively simple global weighting (with possibly some variation over land and ocean and high topography - but these are minor issues for the trends).
Ocean heat content. Observations have been sufficient to provide time series over the top 700m and 2000m since the 1960s. These 2D fields should be added to the data request for easier comparison to the observations.
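As a sketch of how little is involved (a sketch only, assuming Python, a thetao field with the vertical dimension first, known layer thicknesses, and nominal reference constants; all names and values are illustrative):

```python
import numpy as np

RHO0 = 1026.0  # nominal reference density (kg m-3); illustrative value
CP = 3990.0    # nominal specific heat capacity (J kg-1 K-1); illustrative value

def ohc_to_700m(thetao, dz, depth):
    """2D ocean heat content (J m-2, relative to 0 degC if thetao is in degC):
    vertical integral over the top 700 m. thetao has dims (lev, y, x);
    dz and depth are 1-D layer thicknesses and layer-centre depths (m)."""
    weights = np.where(depth <= 700.0, dz, 0.0)[:, None, None]  # keep only the top 700 m
    return RHO0 * CP * np.nansum(thetao * weights, axis=0)      # integrate vertically
```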
Derived indices: whether this is done by the model groups, or automatically when the data is ingested, we need to have easy access to key indices (Nino3.4, NAO index, NAM/SAM, IOD, GMST, etc.). These are a tiny amount of data compared to the rest of the request, and it's frankly ridiculous that these need to be calculated independently by any researcher.
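For instance, a Nino3.4 index is only a few lines on top of monthly tos (a sketch only, assuming Python with xarray, a regular 0-360 longitude grid with ascending latitude, and coordinates named time/lat/lon; all names are illustrative):

```python
import numpy as np
import xarray as xr

def nino34_index(tos: xr.DataArray) -> xr.DataArray:
    """Nino3.4 SST index: area-weighted mean anomaly over 5S-5N, 170W-120W."""
    box = tos.sel(lat=slice(-5, 5), lon=slice(190, 240))   # 170W-120W on a 0-360 grid
    weights = np.cos(np.deg2rad(box.lat))                  # simple latitude weighting
    mean = box.weighted(weights).mean(dim=("lat", "lon"))
    clim = mean.groupby("time.month").mean("time")         # monthly climatology
    return mean.groupby("time.month") - clim               # anomaly time series
```

Trivial as it is, requiring every user to repeat it (and to re-download tos to do so) is exactly the overhead that a small set of centrally provided indices would remove.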
Citation: https://doi.org/10.5194/egusphere-2024-2363-CC8 -
RC2: 'Comment on egusphere-2024-2363', Young Ho Kim, 28 Oct 2024
This paper proposes a list of Baseline Climate Variables for Earth System Modelling (ESM-BCV), aimed at enhancing consistency across various modeling projects. With 132 variables derived from the most frequently used elements in the CMIP6 data request, this list promotes the evaluation and utilization of climate simulations, supporting data consistency in future modeling projects, including CMIP7. This paper offers a valuable resource for the climate modeling community and strengthens data consistency. However, it could benefit from additional detail on the selection criteria, weighting, and the importance of high-volume variables. Such additions would enhance the list’s practicality and scope of application. With these revisions, this paper could serve as an essential tool for Earth system modeling research and policy-making. My detailed comments are as follows:
Comments in Detail:
- While the paper explains the process for selecting the 132 variables, providing more detail on why other significant climate variables were excluded and outlining criteria for future updates would be beneficial. This additional clarity would assist researchers in expanding or adapting the list.
- For example, including 10m surface eastward and northward winds, 2m air temperature, and 2m specific humidity in the 3-hourly data provides valuable meteorological parameters essential for analyzing near-surface dynamics. However, the absence of downwelling shortwave radiation and cloud fraction in this dataset limits the ability to comprehensively assess ocean-atmosphere interactions. Both downwelling shortwave radiation and cloud fraction are critical for understanding surface energy fluxes and cloud-mediated radiation effects, which directly impact sea surface temperatures and mixed-layer dynamics. Including these parameters would significantly enhance the utility of the 3-hourly data for accurately evaluating heat exchange processes and cloud-related feedbacks in ocean-atmosphere interactions, providing a more complete picture of the surface energy budget.
- Additionally, including ocean mixed layer thickness in the dataset would greatly enhance the ability to analyze ocean-atmosphere interactions. Mixed layer thickness is a key parameter that influences and responds to surface heat fluxes, wind forcing, and freshwater input, all of which are essential for understanding energy and momentum exchange between the ocean and atmosphere. This variable is also crucial for interpreting subsurface thermal dynamics and stratification changes that affect upper-ocean mixing and biogeochemical processes. Adding mixed layer thickness to the dataset would provide a more comprehensive framework for evaluating how surface conditions drive ocean responses, thereby supporting a holistic approach to studying coupled ocean-atmosphere processes.
- The lack of weighting or prioritization criteria for each selection indicator is noted. Providing specifics on how each criterion influenced the final list would support researchers in developing similar data requests.
- While this list has the potential to enhance interoperability across models, discussing plans to expand it with additional variables necessary for regional modeling or high-resolution climate predictions would be helpful.
- Some variables are marked as "high volume" and can be selectively produced based on available resources. Providing more insight into the critical importance of these high-volume variables would guide users in determining when to prioritize these variables.
Citation: https://doi.org/10.5194/egusphere-2024-2363-RC2