New insights on the suitability of NetCDF/HDF5 as storage format for climate cloud repositories
Abstract. Climate data analysis increasingly relies on cloud infrastructures to provide new and efficient ways of accessing climate data. Cloud repositories offer remote data access, which lets users retrieve and manipulate data without downloading files; this reduces storage costs and results in more efficient systems. Climate cloud repositories and remote data access are evolving fast, driven by the need for collaboration among the diverse communities that address challenges in climate science. In recent years, a prevailing discourse has emerged suggesting that traditional climate data storage formats are inherently unsuitable for remote data access and cloud-based workflows. In this work, we present new insights that challenge this discourse and demonstrate that established storage formats such as NetCDF/HDF5 can continue to operate efficiently in cloud environments when accessed remotely. With the onset of CMIP7, these insights have the potential to substantially enhance climate data access and analysis for the broader research community without incurring major maintenance burdens.