the Creative Commons Attribution 4.0 License.
Compression of ERA5 meteorological reanalysis data and their application to simulations with the Lagrangian model for Massive Parallel Trajectory Calculations (MPTRAC v2.7)
Abstract. Computer performance has increased immensely in recent years, but the ability to store data has increased only slightly. The storage requirements for the current version of the ERA5 meteorological reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) have increased by a factor of ∼80 compared to its predecessor ERA-Interim. This presents scientists with major challenges, especially if data covering several decades is to be stored on local computer systems. Accordingly, many lossless and lossy compression methods have been developed in recent years. Here we test three of these methods: two lossy compression methods, ZFP and Layer Packing (PCK), and the lossless compressor ZStandard (ZSTD). We investigate how the use of these compressed data affects the results of Lagrangian air parcel trajectory calculations with the Lagrangian model for Massive-Parallel Trajectory Calculations (MPTRAC). We analyzed 10-day forward trajectories that were globally distributed over the free troposphere and stratosphere. The largest transport deviations (up to 1600 km) were found when using ZFP with the strongest compression (CR=25). With weaker compression, we could reduce the transport deviations (up to 100 km) and still obtain significant compression (CR=7). Since ZSTD is a lossless compressor, we find no transport deviations when using these compressed files for trajectory calculations, but this compressor does not reduce disk usage significantly (reduction of ∼30 %, CR=1.5). The best compromise between compression efficiency and transport deviations is obtained with the layer packing method PCK. The data is compressed by about 50 % (CR=2), but horizontal transport deviations do not exceed 40 km.
Thus, our study shows that the PCK compression method would be valuable for applications in atmospheric sciences and that compressing the ERA5 meteorological reanalysis data can overcome the challenge posed by the high disk space demand of this data set.
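The compression ratios quoted in the abstract can be translated into disk-space savings with a one-line formula. The sketch below assumes that CR is defined as original size divided by compressed size, so that the saving is 1 − 1/CR; the function name is illustrative and not taken from the paper.

```python
def space_reduction_percent(cr: float) -> float:
    """Return the disk-space saving in percent for a compression ratio cr."""
    if cr <= 0:
        raise ValueError("compression ratio must be positive")
    return (1.0 - 1.0 / cr) * 100.0


# Compression ratios quoted in the abstract
cases = [("ZSTD", 1.5), ("PCK", 2.0), ("ZFP, weaker", 7.0), ("ZFP, strongest", 25.0)]
for label, cr in cases:
    print(f"{label:15s} CR={cr:4.1f} -> {space_reduction_percent(cr):.1f} % smaller")
```

Under this definition, CR=2 corresponds to exactly 50 % and CR=1.5 to about 33 %, which is in line with the "∼30 %" stated for ZSTD.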
Competing interests: Lars Hoffmann is an Editor of Geoscientific Model Development.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3147', Anonymous Referee #1, 08 Aug 2025
AC1: 'Reply on RC1', Farahnaz Khosrawi, 17 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3147/egusphere-2025-3147-AC1-supplement.pdf
EC1: 'Reply on AC1', Peter Caldwell, 17 Sep 2025
To aid the authors with producing a manuscript revision which will quickly be accepted, I'd like to point out that I found responses to reviewer 1's major concerns 2 and 3 to be insufficient. The goal of GMD is to provide readers with an understanding of *cutting-edge* innovations in geophysics. Evaluating against old/suboptimal implementations of rival schemes is at best not state-of-the-art and at worst misleading. I'm sympathetic with the fact that adding new compression methods would be a lot of work, but I will need to see much better justification for keeping things as-is and/or big changes to the manuscript to feel comfortable publishing this work without including more modern compression methods in the evaluation.
Citation: https://doi.org/10.5194/egusphere-2025-3147-EC1
RC2: 'Comment on egusphere-2025-3147', Anonymous Referee #2, 17 Aug 2025
This study evaluated various compression models at different compression rates applied to ERA5 reanalysis data. The compressed ERA5 output was then used to drive MPTRAC for trajectory calculations. The paper investigates how both the choice of compression scheme and compression rate affect the accuracy of trajectories computed by MPTRAC.
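One way to picture the trajectory accuracy discussed here is as the mean horizontal distance between paired air parcels from a reference run and from a run driven by compressed data. The sketch below is an illustration under that assumption. It uses the great-circle (haversine) distance on a spherical Earth, and the function names, the sample positions, and the radius value are choices made for the example rather than MPTRAC's own implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_horizontal_deviation_km(reference, test):
    """Mean distance between paired (lon, lat) parcel positions at one time step."""
    if len(reference) != len(test) or not reference:
        raise ValueError("need two non-empty position lists of equal length")
    total = sum(great_circle_km(*r, *t) for r, t in zip(reference, test))
    return total / len(reference)


# Hypothetical positions of three parcels after the same travel time
ref_run = [(10.0, 50.0), (120.0, -30.0), (-75.0, 0.0)]
compressed_run = [(10.2, 50.1), (120.0, -30.0), (-74.5, 0.3)]
print(f"{mean_horizontal_deviation_km(ref_run, compressed_run):.1f} km")
```

Evaluating this quantity at each output time step would give a deviation curve over the 10-day trajectories.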
The manuscript is well-structured overall, but the writing could benefit from some revision for clarity and flow, as it’s currently a bit difficult to read. Below are my comments:
Major Comments
Figs. 5 and 6
- Do you have an explanation for why the deviation is larger near the surface?
- I am curious to know why the deviation is negligible for days < 4, regardless of the compression method and compression level. Why does the deviation grow in time? A similar thing is also observed in Fig. 7.
Minor Comments
Lines 149-151: Is there a reason for choosing the tolerance option for geopotential height and temperature while choosing the precision mode for all other variables?
Lines 224-226: It was mentioned that the correlation coefficient reached 0.99999 when the precision mode was used. What is the reason for choosing the tolerance option for T?
Lines 233-235: According to Table 1, the file size after compression with PCK is larger than the file size after compression with ZFP. Why do you think reading the input file takes the shortest time for PCK-compressed files and not for ZFP-compressed files?
Fig. 4: Lines are labeled as physics, input, output, and total, but neither the caption nor the main text explains these labels.
I would suggest labelling each panel in each figure.
Are Figs. 5-6 ensemble means, or are they from a single trajectory?
Line 285-287: Do you mean that the maximum deviation frequencies are similar between the trajectories started at tropospheric and stratospheric altitudes?
Citation: https://doi.org/10.5194/egusphere-2025-3147-RC2
AC2: 'Reply on RC2', Farahnaz Khosrawi, 17 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3147/egusphere-2025-3147-AC2-supplement.pdf
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
922 | 40 | 11 | 973 | 19 | 17 |
The manuscript investigates the lossy compression of ERA5 reanalysis data and its impact on trajectory calculations using the Lagrangian model for Massive-Parallel Trajectory Calculations (MPTRAC). The topic is timely and relevant, with clear implications for data storage optimisation, I/O performance, and reproducibility in geoscience workflows. ERA5 is one of the most widely used reanalysis datasets in the atmospheric sciences, with applications ranging from climate research to operational forecasting, and it is increasingly important as a training and validation source for AI models. The authors assess two lossy compression methods (ZFP and Layer Packing, PCK) and one lossless compressor (ZSTD), examining their influence on 10-day forward trajectories distributed globally in the free troposphere and stratosphere.
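For readers unfamiliar with layer packing, the idea can be sketched as quantizing each vertical level separately to 16-bit integers with a per-level offset and scale factor. The example below is a simplified illustration under that assumption and is not the MPTRAC code; the function names and sample values are hypothetical.

```python
def pack_layer(values, nbits=16):
    """Quantize one model level to unsigned integers; return (ints, offset, scale)."""
    vmin, vmax = min(values), max(values)
    levels = (1 << nbits) - 1
    # A constant layer needs no scaling; this also avoids division by zero
    scale = (vmax - vmin) / levels if vmax > vmin else 1.0
    packed = [round((v - vmin) / scale) for v in values]
    return packed, vmin, scale


def unpack_layer(packed, offset, scale):
    """Reconstruct approximate floating-point values from a packed level."""
    return [offset + p * scale for p in packed]


# Hypothetical temperatures (K) on two levels with very different ranges
field = [[288.1, 290.4, 285.7, 301.2], [216.6, 216.9, 217.3, 216.5]]
for level in field:
    packed, offset, scale = pack_layer(level)
    restored = unpack_layer(packed, offset, scale)
    max_err = max(abs(a - b) for a, b in zip(level, restored))
    # The error on each level is bounded by half its quantization step
    print(f"step={scale:.2e} K, max error={max_err:.2e} K")
```

Because the offset and scale are chosen per level, a level with a narrow value range is stored with a finer quantization step than a level with a wide range. Storing 16-bit integers in place of 32-bit floats would roughly halve the file size, which would be consistent with the CR=2 reported for PCK in the abstract.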
The study addresses an important problem and contributes to the relatively underexplored area of quantifying the impact of lossy compression on scientific analyses. Work along these lines can help the community better understand these impacts and support the adoption of compression methods that offer substantial data reduction without compromising scientific integrity.
However, the current version omits substantial parts of the relevant literature and does not sufficiently engage with the state of the art in scientific lossy compression. Several methodological choices also weaken the strength of the conclusions. The following points require major attention:
1. Omission of relevant literature – Key works are missing, including Tinto et al. (2024, GMD, https://doi.org/10.5194/gmd-17-8909-2024), which also deals with the impact of lossy compression of geoscientific data, and other publications that define the current state of the art.
2. Ignoring state-of-the-art compressors – The evaluation is limited to ZFP, PCK, and ZSTD, omitting widely recognised high-performance compressors such as SZ and MGARD. Without including these methods, the results cannot be considered representative of current capabilities of lossy compression for ERA5 data compression.
3. Suboptimal use of ZFP – ZFP is applied in precision mode, which the literature reports as less efficient and with poorer rate–distortion performance than accuracy mode. This disadvantages ZFP in the comparisons and may bias the conclusions.
4. Unsupported claim that PCK is the “best choice” – The conclusion that PCK is the most suitable compressor lacks supporting evidence, as state-of-the-art methods are not included and ZFP is used in a suboptimal configuration. This risks misleading readers about PCK’s competitiveness.
Recommendation: I recommend major revisions. The authors should (1) expand the literature review to include key recent works and provide proper context on the state of the art, (2) revisit the ZFP configuration to use competitive modes reported in the literature, and (3) either include tests with state-of-the-art compressors such as SZ or explicitly limit their claims, providing a clear justification for the exclusion of these methods.