https://doi.org/10.5194/egusphere-2026-60
09 Feb 2026
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

ClimateBenchPress (v1.0): A Benchmark for Lossy Compression of Climate Data

Tim Reichelt, Juniper Tyree, Milan Klöwer, Peter Dueben, Bryan N. Lawrence, Allison H. Baker, Sara Faghih-Naini, Torsten Hoefler, and Philip Stier

Abstract. The rapidly growing volume of weather and climate data, both from models and observations, is increasing the pressure on data centers and restricting scientific analysis and data distribution. For example, kilometre-scale climate models can generate petabytes of data per simulated month, making it generally infeasible to store all output. To address this challenge, numerous novel compression techniques have been proposed to ease data storage requirements. However, there exist no well-defined benchmarks for rigorously evaluating and comparing the performance of these compressors, including their impact on the data's properties. The lack of benchmarks makes it difficult to design and standardize compressors for weather and climate data, and for scientists to trust that compression errors have no significant impact on their analysis. Here, we address this gap by presenting ClimateBenchPress, a benchmark suite for lossy compression of climate data, which defines both data sets and evaluation techniques. The benchmark covers climate variables following various statistical distributions at medium to very high resolution in time and space, from both numerical models and satellite observations. To ensure a fair comparison between different compressors, each variable comes with a set of maximum error bound checks that the lossy compressors need to pass. By evaluating an initial set of baseline compressors on the benchmark, we gather practical insights for effective application of lossy compression. Our benchmark is open source and extensible: users can easily add new compressors, data sources, and evaluation metrics depending on their own specific use cases.
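
As an illustration of what a maximum error bound check involves, the following minimal Python sketch quantizes a synthetic field, compresses it losslessly, and verifies the pointwise absolute error. It is not taken from ClimateBenchPress: the function names (`compress`, `decompress`, `passes_error_bound`), the uniform quantizer, the use of `zlib`, the synthetic temperature-like data, and the bound of 0.01 are all assumptions chosen for illustration only.

```python
import math
import struct
import zlib

# Hypothetical safety margin so floating-point rounding cannot push the
# reconstruction error past the requested bound (an assumption of this sketch).
SAFETY = 1.0 - 1e-6


def compress(values, abs_error_bound):
    """Uniformly quantize values, then losslessly compress the integer codes."""
    step = 2.0 * abs_error_bound * SAFETY
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}i", *codes)  # 32-bit little-endian ints
    return zlib.compress(raw)


def decompress(blob, abs_error_bound):
    """Invert compress(): recover the codes and map them back to the grid."""
    step = 2.0 * abs_error_bound * SAFETY
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [c * step for c in codes]


def passes_error_bound(original, reconstructed, abs_error_bound):
    """Check that the maximum pointwise absolute error stays within the bound."""
    max_err = max(abs(a - b) for a, b in zip(original, reconstructed))
    return max_err <= abs_error_bound


# Synthetic, temperature-like field in kelvin (illustrative data only).
field = [280.0 + 10.0 * math.sin(i / 50.0) for i in range(1000)]
bound = 0.01

blob = compress(field, bound)
restored = decompress(blob, bound)

ok = passes_error_bound(field, restored, bound)
# Ratio relative to storing the field as 64-bit floats.
ratio = (8 * len(field)) / len(blob)
print(f"error bound satisfied: {ok}, compression ratio: {ratio:.1f}")
```

In this sketch a compressor is only considered comparable to others once the check passes; the compression ratio is then reported for that fixed bound. Real compressors and the benchmark's actual checks may differ substantially from this simplified scheme.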

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 06 Apr 2026)

Model code and software

ClimateBenchPress data-loader Tim Reichelt and Juniper Tyree https://doi.org/10.5281/zenodo.18015682

ClimateBenchPress Compressors Tim Reichelt and Juniper Tyree https://doi.org/10.5281/zenodo.18152639

Latest update: 09 Feb 2026
Short summary
The growing size of datasets used in climate science makes it difficult to store, analyze, and distribute them. Lossy compression algorithms can significantly reduce the disk space required to store datasets, but it can be difficult to understand and compare the behavior of different compression algorithms. ClimateBenchPress provides a benchmark to standardize comparisons between lossy compression algorithms and guide development of novel algorithms specifically targeted towards climate data.