psit 1.0: A System to Compress Lagrangian Flows
Abstract. Meteorological simulations produce large amounts of data, which can be challenging to store, share, and analyze. As weather and climate models increasingly simulate the atmosphere at higher spatio-temporal resolution, it becomes imperative to compress the data effectively. While compression algorithms exist for weather data stored in a gridded Eulerian frame, there are, to date, no specialized alternatives for data stored in the Lagrangian frame. In this study, we present psit, a system to compress weather data stored in the Lagrangian frame. The system works by mapping the trajectories to a grid structure, performing additional encodings on these grids, and passing them to either the JPEG 2000 image compression algorithm or SZ3. The specialty of the algorithm is the mapping phase and the subsequent encodings, which generate the grids in a way that allows the aforementioned compression algorithms to perform well. To gauge the performance of psit, we test a variety of metrics. We demonstrate that, in the majority of cases, psit attains equivalent or superior compression performance compared to naive compression with ZFP. We also compare compression errors with measurement inaccuracies. Here, we show that the density of 168-hour-long trajectories compressed with a ratio in the range of 30 to 40 behaves similarly to trajectories calculated from uncompressed wind fields with additional random perturbations of magnitude 0.1 m s⁻¹ in the horizontal and around 6·10⁻³ Pa s⁻¹ in the vertical component. Additionally, we conduct two case studies in which we discuss the impact of compression on the study of warm conveyor belts associated with extratropical cyclones and on the radioactive plume prediction of the Fukushima incident in 2011.
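For readers who want the shape of the pipeline at a glance, the following is a minimal, illustrative sketch, not the paper's actual mapping or encodings (which are its core contribution). It assumes toy trajectories stacked one per row into a 2-D grid, 8-bit quantization, and Pillow's OpenJPEG-backed JPEG 2000 writer; the array sizes and the ratio-30 quality layer are arbitrary choices for the example.

```python
import numpy as np
from PIL import Image  # requires a Pillow build with OpenJPEG (JPEG 2000) support

# Toy Lagrangian data: 512 trajectories, 168 hourly samples of one variable.
traj = np.random.rand(512, 168).astype(np.float32) * 180.0 - 90.0

# Step 1: map trajectories onto a grid -- here naively, one row per trajectory.
# psit's mapping phase and additional encodings reorder the data so that the
# image compressor performs well; this sketch skips that and keeps the raw layout.
grid = traj

# Step 2: quantize to 8 bits so an image codec can consume the grid
# (the paper works at higher precision; 8 bits keeps the example short).
lo, hi = float(grid.min()), float(grid.max())
q = np.round((grid - lo) / (hi - lo) * 255).astype(np.uint8)

# Step 3: lossy JPEG 2000 at a target compression ratio of roughly 30.
Image.fromarray(q, mode="L").save(
    "grid.jp2", quality_mode="rates", quality_layers=[30], irreversible=True
)

# Decode and undo the quantization to recover approximate trajectories.
rec = np.asarray(Image.open("grid.jp2"), dtype=np.float32) / 255 * (hi - lo) + lo
```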
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-793', Robert Underwood, 19 Nov 2025
- RC2: 'Comment on egusphere-2025-793', Anonymous Referee #2, 18 Dec 2025
"psit 1.0: A System to Compress Lagrangian Flows"Alexander Pietak, Langwen Huang, Luigi Fusco, Michael Sprenger, Sebastian Schemm, and Torsten Hoefler
SUMMARY: This manuscript concerns compression algorithms for weather and climate data stored in Lagrangian frames. The process involves mapping the trajectories to a grid structure where they can then be compressed by a standard lossy compressor. Results from various experiments are provided and indicate that the proposed method is generally useful.
GENERAL:
This paper is a solid start with some interesting results for compressing data in the Lagrangian frame of reference, which I haven't seen before. I believe the approach for transforming and mapping the trajectories was innovative. However, I think improvements are needed in a few areas, particularly regarding the experiments and their explanations. Here I'll list some general comments, with more specific details below.
- I'd like to see more justification/explanation for different experimental choices. It's not clear to me why certain compressor choices were made for different experiments. For example, in Section 3.2, the text says that Listing 1 led to "good results". Based on what? Was it optimized somehow? Then bitrounding is used for one variable later on. Also, why is the comparison between psit and zfp? Can sz3 only be used within psit? Can zfp not be used within psit as well on the grids? I would think zfp could be more effective than jpeg ...
- It is not always clear why a specific amount of compression was chosen.
- Many of the figure and table captions could be improved with more text / explanations.
SPECIFIC COMMENTS AND QUESTIONS:
- Line 286: Why is sz3 used for pressure? Did other approaches not work for this variable? Were other approaches superior to sz3 on the other variables?
- Line 337: Not all lossy compressors result in Gaussian error distributions. Please see Peter Lindstrom's paper from 2017 (https://www.osti.gov/servlets/purl/1526183)
- Line 336: Why did you choose a factor of 30?
- Line 351: Why did you choose to use bitrounding on the wind field? Bitrounding was not mentioned previously, so it feels like a surprise.
- Line 374-375: For bitrounding, the lossless compressor that it is paired with can make a big difference (e.g., the newish Pcodec, or Pco, compressor can be quite an improvement). Which lossless method was used here? (A toy sketch of this pairing follows this list.)
- What versions of zfp and sz3 did you use? For zfp, it looks like you used the absolute tolerance mode (hence the "tolerance" in the tables in the appendix), but this is not specified.
- Line 441, re: "..the input data trajectories need to be uniformly distributed." : How common is this in practice? I don't have a sense on whether this requirement is restrictive or not.
- Section 4.2 (3rd paragraph): There is a lot of existing work that argues that simple metrics are not sufficient for evaluating the effects of lossy compression on weather and climate data. At least some other work should be cited. Here are a few earlier works that come to mind:
Baker 2014: doi:10.1145/2600212.2600217
Baker 2016: doi:10.5194/gmd-9-4381-2016
Poppick 2020: doi:10.1016/j.cageo.2020.104599
- Tables B1-B6: I don't particularly care for the choice to scale some of the values by 100 (indicated by a "*"). It makes it harder to glance down the column. Maybe use more digits, or round?
- Table B2-B3: "appears to be related to how ZFP works". Can you provide a more meaningful explanation?
- Figure 12, 13: Have you considered normalizing these error metrics for plotting so that the y-axis extents could potentially be the same for each error metric across a variable?
- I'd like to know what the min/max values are for the variables that are being compressed so I have an idea of what an error tolerance of 20 means, for example.
- Appendix data: There is a lot of data in the appendices (especially B), which isn't necessarily a problem, but it should be there for a reason (i.e., referred to with some discussion in the paper or appendix itself). And the volume makes it harder to make meaningful comparisons. For example, for tables B1 and B2, what is interesting to me is to compare the error values for psit versus zfp at factors/tolerances which yield a similar compression ratio. So, listing psit with factor 25 next to zfp with tolerance 2.0 is informative because both yield a compression ratio of ~10. That helps me consider which is better quality-wise for the same data reduction.
Also, I'm skeptical that the amount of compression in the lower rows of these tables (e.g., B1 and B2) is something that would ever be used in practice for climate and weather data, but feel free to argue otherwise.
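For concreteness on the bitrounding question above, here is a toy sketch of the technique and of why the paired lossless coder matters (zlib stands in for whatever backend the paper used; Pco or zstd would be drop-in alternatives). The keepbits value and the round-half-up rounding are illustrative simplifications; Klöwer-style bitrounding uses round-to-nearest-even.

```python
import numpy as np
import zlib

def bitround(a, keepbits):
    """Keep only `keepbits` of the 23 float32 mantissa bits, rounding to nearest."""
    b = np.ascontiguousarray(a, dtype=np.float32).view(np.uint32)
    drop = 23 - keepbits
    half = np.uint32(1 << (drop - 1))           # 0.5 ulp at the kept precision
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    return ((b + half) & mask).view(np.float32)

wind = (np.random.randn(256, 256) * 10.0).astype(np.float32)  # toy wind field
rounded = bitround(wind, keepbits=7)

# The trailing zero bits are what the paired lossless coder exploits,
# so the choice of coder directly sets the achieved compression ratio.
raw = zlib.compress(wind.tobytes(), 9)
cut = zlib.compress(rounded.tobytes(), 9)
print("gain from bitrounding:", len(raw) / len(cut))
```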
MINOR ISSUES:
- Consider using the same block font for psit that you have used for zfp, sz3, jpeg2000, etc.
- Line 250: Awkward phrasing
- Line 251: "ZFP" is in a block font in other occurrences in the paper
- Most (if not all) opening quotes appear to be backwards (e.g., line 118)
- line 156: "denotes if" => "denotes whether"
- Figure 5: consider making this caption more descriptive
- Figure 16-18: color bars are not labeled
Citation: https://doi.org/10.5194/egusphere-2025-793-RC2
RC1: 'Comment on egusphere-2025-793', Robert Underwood, 19 Nov 2025
Summary:
The paper presents a family of compression routines called PSIT for unstructured grid data and evaluates them on various weather datasets against ZFP and bitrounding. It is competitive with ZFP in some circumstances. The paper contains serious methodological flaws compared to the state of the art for compression research, generally and in this domain specifically, and needs a major revision before it should be considered for acceptance.
Novelty:
+ The proposed approach of converting the input data to gridded images appears to be somewhat novel but is not completely unprecedented. Similar approaches have been used with TEZip for compressing arbitrary floating-point data instead of image data.
+ The exact application of this technique to unstructured grid data appears to be novel.
Self-Contained:
+ On line 336, the authors choose a "compression factor of 30". Why was this value chosen? Additionally, the authors should consider multiple values of CRs for this analysis.
+ The authors present code for the proposed method, allowing reproducibility of their method
Correct:
+ This paper misses key references to work on the compression of unstructured grid data, which would be more appropriate comparisons for this work:
+ https://dl.acm.org/doi/pdf/10.1145/3733104 has a survey that identifies 3 compressors that support unstructured grids (MGARD, zMESH, and AMR-COMP)
+ https://doi.org/10.1109/PacificVis48177.2020.6431 discusses an approach for 2d and 3d vector field compression including on unstructured meshes
+ https://doi.org/10.1109/TVCG.2022.3214821 discusses an approach for preserving topology in unstructured mesh data.
+ https://doi.org/10.1111/cgf.15097 identifies a prediction-traversal method for compressing unstructured meshes
+ https://doi.org/10.1109/IPDPS64566.2025.00040 discusses a preprocessing-based approach that uses a prioritized depth-first search and adaptations to the predictors to perform unstructured grid compression
+ The paper should reference the notion of rate distortion in the discussion of Figure 12, and include the PSNR to be more consistent with the standard presentation of rate distortion results, as done in the ZFP and SZ line of papers
+ Some statements in the paper are not insightful, such as "In general, the L-infinity error is larger than the L1 and RMSE errors." By definition, it is always true that $L_\infty \geq L_1$ because the max upper bounds the average.
+ What justifies the authors' "[expectation that] the error distribution ... [follows] a Gaussian"? This actually depends on the design of the compressor and the error bound chosen. See "Error Distributions of Lossy Floating-Point Compressors" by Lindstrom 2017.
+ The limitations of bit rounding-based compressors are well documented in prior work for both SZ and ZFP-based compressors, including papers cited here. It is unclear why this was used as a comparison in section 3.3 of the paper.
+ It is not clear why the authors compared against ZFP and JPEG and not SZ3, when they use SZ3 as part of their pipeline in some configurations. It is well known that SZ3 tends to get much higher compression ratios at each quality threshold.
+ There is extensive work on quantifying appropriate error thresholds by Milan Klöwer and Allison Baker using metrics such as SSIM and dSSIM. Why did the authors not use these metrics in their analysis?
+ The authors should specify the versions of ZFP and SZ3 used in their work, as newer versions have updated the default algorithm to higher-performing versions. For ZFP, there are versions with much higher parallel performance or support for additional modes with parallel compression.
+ The authors should specify the error bounding type (runtime) and rounding mode (compile time) used with ZFP for their results. (A sketch of ZFP's fixed-accuracy mode follows this list.)
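For concreteness, a toy sketch of ZFP's fixed-accuracy (absolute tolerance) mode through the zfpy Python bindings, together with the PSNR and Baker-style dSSIM checks suggested above. The field, the tolerance of 0.5, and the dSSIM = (1 - SSIM)/2 convention are illustrative assumptions, not the paper's setup.

```python
import numpy as np
import zfpy  # Python bindings shipped with ZFP
from skimage.metrics import structural_similarity

field = (np.random.randn(180, 360) * 20.0).astype(np.float64)  # toy 2-D field

# Fixed-accuracy mode: `tolerance` is an absolute error bound, which is what
# the "tolerance" columns in the paper's appendix tables presumably refer to.
buf = zfpy.compress_numpy(field, tolerance=0.5)
recon = zfpy.decompress_numpy(buf)

print("compression ratio:", field.nbytes / len(buf))
print("max abs error:", np.abs(field - recon).max())  # should stay within 0.5

# Rate-distortion style summary: PSNR over the data range, plus
# dSSIM = (1 - SSIM) / 2 as used in the Baker line of work.
rng = field.max() - field.min()
rmse = np.sqrt(np.mean((field - recon) ** 2))
print("PSNR [dB]:", 20.0 * np.log10(rng / rmse))
ssim = structural_similarity(field, recon, data_range=rng)
print("dSSIM:", (1.0 - ssim) / 2.0)
```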
Writing:
+ The writing is verbose
+ E.g., "While this approach is conceptually sound, it quickly, [sic] becomes computationally infeasible, so it found little use in the finished pipeline. However, we still include it here for its theoretical insights." could have been written "solving an LP is computational infeasible for large images, but included for theoretical analysis."
+ Additionally, there are many run-on sentences that span 3 or more lines of text.
+ Lastly, 21 figures (not including subfigures) seems excessive
Questions:
On line 330, the authors state, "For longer time ranges, the performance of psit starts to degrade." Do the authors have an explanation for this discrepancy?