the Creative Commons Attribution 4.0 License.
ClimateBenchPress (v1.0): A Benchmark for Lossy Compression of Climate Data
Abstract. The rapidly growing volume of weather and climate data, from both models and observations, is putting increasing pressure on data centers and restricting both scientific analysis and data distribution. For example, kilometre-scale climate models can generate petabytes of data per simulated month, making it generally infeasible to store all output. To address this challenge, numerous novel compression techniques have been proposed to ease data storage requirements. However, no well-defined benchmarks exist for rigorously evaluating and comparing the performance of these compressors, including their impact on the data's properties. This lack of benchmarks makes it difficult to design and standardize compressors for weather and climate data, and for scientists to trust that compression errors have no significant impact on their analyses. Here, we address this gap by presenting ClimateBenchPress, a benchmark suite for lossy compression of climate data, which defines both data sets and evaluation techniques. The benchmark covers climate variables following various statistical distributions at medium to very high resolution in time and space, from both numerical models and satellite observations. To ensure a fair comparison between compressors, each variable comes with a set of maximum error bound checks that the lossy compressors must pass. By evaluating an initial set of baseline compressors on the benchmark, we gather practical insights for the effective application of lossy compression. Our benchmark is open source and extensible: users can easily add new compressors, data sources, and evaluation metrics depending on their own specific use cases.
Status: open (until 06 Apr 2026)
- RC1: 'Comment on egusphere-2026-60', Anonymous Referee #1, 03 Mar 2026
Model code and software
ClimateBenchPress data-loader Tim Reichelt and Juniper Tyree https://doi.org/10.5281/zenodo.18015682
ClimateBenchPress Compressors Tim Reichelt and Juniper Tyree https://doi.org/10.5281/zenodo.18152639
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 254 | 145 | 10 | 409 | 19 | 33 |
Review of “ClimateBenchPress (v1.0): A Benchmark for Lossy Compression of Climate Data” by Tim Reichelt et al.
General comments
This new paper presents ClimateBenchPress, an open-source benchmark designed to standardize the evaluation of lossy compression algorithms for climate and weather data, addressing the current lack of consistent comparison frameworks in the field. As climate datasets grow to petabyte scale, lossy compression becomes essential, yet existing studies differ widely in datasets, error tolerances, and evaluation metrics. ClimateBenchPress includes diverse model and observational datasets, defines error bounds derived from uncertainty estimates, and provides distortion and compression metrics to enable fair comparison across compressors.
Testing several state-of-the-art methods (e.g., SZ3, ZFP, SPERR, JPEG2000, and rounding-based approaches), the authors show that while some achieve very high compression ratios, they may violate error bounds or mishandle edge cases such as NaNs. Simpler methods, such as bit rounding combined with optimized lossless compression, offer competitive and often more robust performance. Overall, the results demonstrate that no single compressor dominates across all variables and metrics, highlighting key trade-offs and the need for a standardized benchmark.
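To make the bit-rounding strategy referenced above concrete, here is a minimal numpy sketch (illustrative only; the function name and round-to-nearest details are assumptions, not taken from the manuscript):

```python
import numpy as np

def bit_round(a: np.ndarray, keepbits: int) -> np.ndarray:
    """Keep only `keepbits` mantissa bits of a float32 array (round to nearest).

    The zeroed trailing mantissa bits compress very well with a lossless
    backend. NaN/inf handling is omitted for brevity.
    """
    assert a.dtype == np.float32 and 0 <= keepbits <= 23
    bits = a.view(np.uint32)
    drop = 23 - keepbits
    half = np.uint32(1 << (drop - 1)) if drop > 0 else np.uint32(0)
    # add half an ULP of the kept precision, then truncate the dropped bits
    rounded = (bits + half) & ~np.uint32((1 << drop) - 1)
    return rounded.view(np.float32)

x = np.array([3.14159, 2.71828], dtype=np.float32)
y = bit_round(x, 10)  # relative error bounded by roughly 2**-11
```

A lossless codec (e.g., Zstandard) applied afterwards exploits the runs of zeroed bits, which is the essence of the "bit rounding combined with optimized lossless compression" approach mentioned above.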
In my view, this is a scientifically sound study with robust and well-substantiated results. The benchmark is thoughtfully designed, the methodology transparent, and the evaluation framework rigorous and reproducible. The manuscript is well written, clearly structured, and accessible to both compression specialists and climate scientists. I recommend publication in GMD, subject to the minor comments below.
Specific comments
Lines 26–28: The example given can be justified conceptually but would benefit from clarification. Global mean temperature is statistically robust to small, spatially uncorrelated compression errors that may cancel upon averaging. In contrast, local wind power estimates depend on nonlinear relationships and fine-scale variability, making them potentially more sensitive to small local errors. Clarifying this reasoning would avoid confusion.
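The distinction can also be illustrated numerically; the following sketch (entirely synthetic data, all values illustrative) shows uncorrelated errors cancelling in a global mean while being amplified pointwise by the cubic wind-power relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
noise = rng.uniform(-0.1, 0.1, n)        # uncorrelated "compression error"

# Global mean temperature: errors largely cancel upon averaging.
temp = 288.0 + rng.standard_normal(n)    # synthetic temperature field [K]
mean_err = abs((temp + noise).mean() - temp.mean())

# Wind power density scales as v**3, so local relative errors are roughly
# tripled and do not cancel pointwise.
v = 8.0 + rng.standard_normal(n).clip(-5.0, 5.0)  # synthetic wind speed [m/s]
power_rel_err = np.abs(((v + noise) ** 3 - v**3) / v**3)
```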
Lines 43–48: The term “variable characteristics” is vague. Please clarify which properties are meant (e.g., statistical distribution, intermittency/sparsity, smoothness vs. sharp gradients, spatial/temporal correlation scales, dynamic range, NaNs, extremes, etc.). A few examples would improve clarity.
Line 70: “Actionable insights” sounds generic. Consider specifying the practical guidance provided (e.g., recommendations for particular variable types or error tolerances).
Table 1: Cloud-related variables (e.g., liquid water content) are not included, although they represent a challenging case due to their 3-D structure, sharp gradients, and large near-zero regions. While sparsity and NaNs are partly represented by precipitation and SST, a brief comment on the exclusion of 3-D cloud condensates would be useful, possibly as a future extension.
Table 1: Since V is mostly 1 here, variables are effectively treated independently. While reasonable, it would help to state explicitly that multivariate compression is beyond the current scope. In practice, many variables are physically correlated (e.g., atmospheric chemistry tracers), and advanced methods may exploit such structure. A brief acknowledgment would strengthen the discussion.
Lines 79–88: The regridding discussion focuses on model output, but similar issues apply to satellite swath data provided in along-track/across-track coordinates. Regridding can alter statistical properties relevant for compression. A brief acknowledgment would clarify that restricting to regular grids simplifies comparability but does not reflect all real-world cases.
Line 101: “Linear packing” is mentioned in the context of the ERA5 data but not explained. A short definition or reference would be helpful.
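For reference, linear packing (as used for archived GRIB/ERA5 fields) maps each field linearly onto a fixed-width integer range via an offset and a scale; a minimal sketch (function names and values are illustrative assumptions):

```python
import numpy as np

def pack_linear(a: np.ndarray, nbits: int = 16):
    """Linear packing: map [min, max] of `a` onto nbits-wide integers."""
    lo, hi = float(a.min()), float(a.max())
    scale = (hi - lo) / (2**nbits - 1) or 1.0  # avoid div-by-zero for constants
    packed = np.round((a - lo) / scale).astype(np.uint16)
    return packed, lo, scale

def unpack_linear(packed: np.ndarray, offset: float, scale: float) -> np.ndarray:
    return packed.astype(np.float32) * scale + offset

x = np.linspace(250.0, 310.0, 1000, dtype=np.float32)  # e.g. temperature [K]
packed, offset, scale = pack_linear(x)
max_abs_err = float(np.abs(unpack_linear(packed, offset, scale) - x).max())
# the absolute error is bounded by about scale / 2
```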
Lines 128–129: The manuscript states that the absence of standard error bounds is “partly” due to application dependence. “Largely” may be more appropriate, as tolerable error levels are typically driven by downstream applications.
Lines 175–178 and Table 2: For several variables, the gap between the low (100th percentile) and mid (99th percentile) bounds significantly exceeds that between the mid (99th) and high (95th) bounds. This suggests sensitivity of the low bound to extreme outliers or heavy-tailed spread distributions. Please comment on this sensitivity and whether slightly lower percentiles (e.g., 99.9%) were considered.
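The sensitivity in question is easy to reproduce with a heavy-tailed sample (purely synthetic, not data from the manuscript):

```python
import numpy as np

rng = np.random.default_rng(1)
spread = rng.lognormal(mean=0.0, sigma=1.5, size=100_000)  # heavy-tailed spread

# Percentile-derived bounds: the 100th percentile (the sample maximum) sits
# far above the 99th, while the 99th and 95th remain comparatively close.
p95, p99, p999, p100 = np.quantile(spread, [0.95, 0.99, 0.999, 1.0])
```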
Lines 226–234: Instruction count is a useful reproducible metric, but wall-clock runtime remains highly relevant in practice. Parallelization (multi-threading, GPU support) and peak memory footprint can significantly affect scalability for large datasets. A brief discussion of these practical aspects would be valuable.
Figure 2: The scorecards are helpful. For example, SZ3 often achieves higher compression ratios but also larger error metrics and occasional bound violations. Although discussed later, more explicit guidance in the figure interpretation would help readers assess comparability across methods.
Figure 7: This figure shows that compressors can produce markedly different error distributions, even under identical nominal absolute bounds. While discussed, this reinforces that methods are not strictly comparable under a single bound alone. Future work could consider complementing the current protocol with additional tail metrics (e.g., p99/p99.9) or joint criteria on maximum and distributional error properties.
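The suggested tail metrics could be computed cheaply alongside the maximum; a sketch with two synthetic error distributions that satisfy the same nominal absolute bound but differ sharply in their tails (hypothetical compressors, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
bound = 0.1

# Two hypothetical compressors, both respecting |error| <= bound:
err_a = rng.uniform(-bound, bound, 100_000)                      # errors pushed to the bound
err_b = rng.normal(0.0, bound / 5, 100_000).clip(-bound, bound)  # concentrated near zero

# Same nominal bound, very different tails:
tail_a = np.quantile(np.abs(err_a), [0.99, 0.999])
tail_b = np.quantile(np.abs(err_b), [0.99, 0.999])
```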
Lines 468–471: Benchmarking full high-resolution model outputs (terabyte scale) would be highly valuable. Such tests would better reflect modern data volumes and assess compressors under realistic storage, I/O, and scalability constraints, complementing the current laptop-scale setup for HPC environments.
Technical corrections
Line 99: Remove extra “.”
Line 136: Index v runs from 0 to V, implying V+1 elements; is this intended?
Line 318: Rephrase as “This ensures …”
Figures 3 and 6: Adding distinct marker symbols alongside colors and linestyles would improve clarity.