CMIP6 data usage: Lessons learned from more than 200 million downloads
Abstract. Earth system simulations from the Coupled Model Intercomparison Project (CMIP) are considered the gold standard for representing the Earth’s climate system, its past and present states, and its future evolution. As CMIP moves into its seventh phase, the increasing complexity of Earth system models (ESMs) means that more infrastructure resources are needed to store, distribute and use CMIP simulations. Statistics on data usage during CMIP6 can offer guidance in preparing for CMIP7. Here, we analyse the usage of CMIP6 data and propose recommendations for optimising the production and accessibility of future CMIP data. Our analysis focuses on usage statistics from the Earth System Grid Federation (ESGF), the main database for CMIP and other ESM simulation data, and examines the usage of variables, experiments, individual thematic Model Intercomparison Projects (MIPs), sources and institutions, together with related geographical usage trends. We further include usage statistics from other sources hosting CMIP6 data, including data curated by community portals (Pangeo) through commercial clouds (Google Cloud and Amazon Web Services) and by climate services (Copernicus Climate Change Service). We conclude with recommendations for centres involved in producing and distributing data on how to optimise resources based on usage statistics and how to implement improved approaches to tracking usage.