psit 1.0: A System to Compress Lagrangian Flows
Abstract. Meteorological simulations produce large amounts of data, which can be challenging to store, share, and analyze. As weather and climate models increasingly simulate the atmosphere at higher spatio-temporal resolution, it becomes imperative to compress the data effectively. While compression algorithms exist for weather data stored in a gridded Eulerian frame, there are, to date, no specialized alternatives for data stored in the Lagrangian frame. In this study, we present psit, a system to compress weather data stored in the Lagrangian frame. The system maps the trajectories to a grid structure, applies additional encodings to these grids, and passes them to either the JPEG 2000 image compression algorithm or SZ3. The distinctive part of the algorithm is the mapping phase and the subsequent encodings, which generate the grids in a way that allows these compression algorithms to perform well. To gauge the performance of psit, we evaluate a variety of metrics. We demonstrate that, in the majority of cases, psit attains equivalent or superior compression performance compared with naive compression using ZFP. We also compare compression errors with measurement inaccuracies. Here, we show that the density of 168-hour trajectories compressed with a ratio in the range of 30 to 40 behaves similarly to that of trajectories calculated from uncompressed wind fields with additional random perturbations of magnitude 0.1 m s⁻¹ in the horizontal and around 6⋅10⁻³ Pa s⁻¹ in the vertical component. Additionally, we conduct two case studies in which we discuss the impact of compression on the study of warm conveyor belts associated with extratropical cyclones and on the radioactive plume prediction for the Fukushima incident in 2011.
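To make the "mapping the trajectories to a grid structure" step concrete, the sketch below shows one generic way to lay trajectory data out as a 2D grid and prepare it for an integer-sample image codec such as JPEG 2000. Trajectories are stacked row-wise (one row per trajectory, one column per time step), and each variable is linearly scaled to 16-bit integers. This is an illustrative assumption, not psit's actual mapping or encoding, which the abstract does not specify; the function names `trajectories_to_grid`, `to_uint16`, and `from_uint16` and the synthetic data are hypothetical.

```python
import numpy as np

def trajectories_to_grid(traj: np.ndarray, var: int) -> np.ndarray:
    """Hypothetical layout: traj has shape (n_traj, n_steps, n_vars).
    Return one variable as a 2D grid: rows = trajectories, columns = time steps."""
    return np.ascontiguousarray(traj[:, :, var])

def to_uint16(grid: np.ndarray):
    """Linearly map a float grid onto the full uint16 range 0..65535.
    Returns the integer image and the (lo, hi) range needed to invert it."""
    lo, hi = float(grid.min()), float(grid.max())
    scale = (hi - lo) or 1.0  # avoid division by zero for constant grids
    img = np.rint((grid - lo) / scale * 65535.0).astype(np.uint16)
    return img, (lo, hi)

def from_uint16(img: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Invert to_uint16; the error is at most (hi - lo) / (2 * 65535) up to float rounding."""
    scale = (hi - lo) or 1.0
    return lo + img.astype(np.float64) / 65535.0 * scale

# Synthetic example: 100 trajectories, 169 hourly positions, 3 variables.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(100, 169, 3)), axis=1)
grid = trajectories_to_grid(traj, var=2)
img, (lo, hi) = to_uint16(grid)
recon = from_uint16(img, lo, hi)
max_err = np.abs(recon - grid).max()
print(img.shape, img.dtype, max_err)
```

The 16-bit quantization alone already fixes an absolute error floor per variable, so any image codec applied afterwards can only add error on top of it.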
Summary:
The paper presents a family of compression routines called psit for unstructured grid data and evaluates them on various weather datasets against ZFP and bit rounding. It is competitive with ZFP in some circumstances. Compared to the state of the art in compression research generally, and in this domain specifically, the paper contains serious methodological flaws and needs a major revision before it should be considered for acceptance.
Novelty:
+ The proposed approach of converting the input data to gridded images appears to be somewhat novel but is not unprecedented. Similar approaches have been used in TEZip, which compresses arbitrary floating-point data, rather than natural images, with image-based techniques.
+ The exact application of this technique to unstructured grid data appears to be novel.
Self-Contained:
+ On line 336, the authors choose a "compression factor of 30". Why was this value chosen? Additionally, the authors should evaluate multiple compression ratios (CRs) for this analysis.
+ The authors provide code for the proposed method, which allows their results to be reproduced.
Correct:
+ This paper omits key references to work on compressing unstructured grid data, which would provide more appropriate comparisons for this work:
+ https://dl.acm.org/doi/pdf/10.1145/3733104 presents a survey that identifies three compressors that support unstructured grids (MGARD, zMESH, and AMR-COMP)
+ https://doi.org/10.1109/PacificVis48177.2020.6431 discusses an approach for 2D and 3D vector field compression, including on unstructured meshes
+ https://doi.org/10.1109/TVCG.2022.3214821 discusses an approach for preserving topology in unstructured mesh data.
+ https://doi.org/10.1111/cgf.15097 proposes a prediction-traversal method for compressing unstructured meshes
+ https://doi.org/10.1109/IPDPS64566.2025.00040 discusses a preprocessing-based approach that uses a prioritized depth-first search and adaptations to the predictors to perform unstructured grid compression
+ The discussion of Figure 12 should reference the notion of rate distortion and include the peak signal-to-noise ratio (PSNR), to be consistent with the standard presentation of rate-distortion results in the ZFP and SZ line of papers.
+ Some statements in the paper are not insightful, such as "In general, the L-infinity error is larger than the L1 and RMSE errors." When L1 denotes the mean absolute error, it is always true that $L_\infty \geq L_1$ (and likewise $L_\infty \geq \mathrm{RMSE}$), because the maximum upper-bounds the average.
+ What justifies the authors' "[expectation that] the error distribution ... [follows] a Gaussian"? The error distribution depends on the design of the compressor and the chosen error bound; see "Error Distributions of Lossy Floating-Point Compressors" by Lindstrom (2017).
+ The limitations of bit-rounding-based compressors are well documented in prior work on both SZ- and ZFP-based compressors, including papers cited here. It is unclear why bit rounding was used as a comparison in Section 3.3 of the paper.
+ It is not clear why the authors compared against ZFP and not SZ3, given that SZ3 is part of their own pipeline in some configurations. It is well known that SZ3 tends to achieve much higher compression ratios at each quality threshold.
+ There is extensive work by Milan Klöwer and Allison Baker on quantifying appropriate error thresholds using metrics such as SSIM and dSSIM. Why did the authors not use these metrics in their analysis?
+ The authors should specify the versions of ZFP and SZ3 used in their work, as newer releases have changed the default algorithm to higher-performing variants. For ZFP, some releases offer much higher parallel performance or support additional modes with parallel compression.
+ The authors should specify the error-bounding mode (selected at runtime, e.g., fixed-accuracy or fixed-precision) and the rounding mode (selected at compile time) used with ZFP for their results.
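For reference on the rate-distortion comment, rate-distortion plots in the SZ and ZFP literature typically show PSNR against bit rate (compressed bits per value). The sketch below uses the common range-based definition PSNR = 20 log10(value range / RMSE); the function names and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Range-based PSNR in dB: 20 * log10(value_range / RMSE)."""
    original = original.astype(np.float64)
    err = reconstructed.astype(np.float64) - original
    rmse = np.sqrt(np.mean(err ** 2))
    if rmse == 0.0:
        return float("inf")  # lossless reconstruction
    value_range = original.max() - original.min()
    return float(20.0 * np.log10(value_range / rmse))

def bit_rate(n_values: int, compressed_bytes: int) -> float:
    """Average number of compressed bits per value."""
    return 8.0 * compressed_bytes / n_values

rng = np.random.default_rng(1)
x = rng.normal(size=10_000).astype(np.float32)
x_hat = x + rng.normal(scale=1e-3, size=x.shape).astype(np.float32)  # stand-in for a lossy reconstruction
print(psnr(x, x_hat))
print(bit_rate(x.size, 4 * x.size // 30))  # float32 at a compression ratio of 30: ≈ 1.07 bits per value
```

Plotting one (bit rate, PSNR) point per error bound for each compressor gives the standard rate-distortion curve.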
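The $L_\infty$ comment follows in one line for an error vector $e = (e_1, \dots, e_n)$, taking L1 as the mean absolute error (MAE). If the paper's L1 is instead the unnormalized sum $\sum_i |e_i|$, the inequality can fail, so the definition matters. The first step below is Jensen's (or the Cauchy-Schwarz) inequality:

```latex
\frac{1}{n}\sum_{i=1}^{n} |e_i|
\;\le\; \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}
\;\le\; \sqrt{\max_i e_i^2}
\;=\; \max_i |e_i|,
\qquad\text{i.e.}\qquad
\mathrm{MAE} \le \mathrm{RMSE} \le L_\infty .
```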
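As an illustration of why a Gaussian error distribution should not be assumed, the sketch below applies a uniform scalar quantizer with absolute error bound `eb` (bin width `2 * eb`, the kind of error-bounded quantization used in SZ-style compressors, with the prediction step omitted) directly to a synthetic random-walk signal. The resulting errors stay within `eb` and are close to uniformly distributed, with excess kurtosis near -1.2 rather than the 0 of a Gaussian. The function names and data are illustrative assumptions; actual compressors can produce other distributions, as Lindstrom (2017) shows.

```python
import numpy as np

def quantize(x: np.ndarray, eb: float) -> np.ndarray:
    """Uniform scalar quantization with bin width 2*eb, so |x - x_hat| <= eb."""
    return np.rint(x / (2.0 * eb)) * (2.0 * eb)

def excess_kurtosis(e: np.ndarray) -> float:
    """Excess kurtosis: 0 for a Gaussian, -1.2 for a uniform distribution."""
    z = (e - e.mean()) / e.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=200_000))  # synthetic signal (a random walk)
eb = 1e-2
err = quantize(x, eb) - x
print(np.abs(err).max())    # at most eb, up to floating-point rounding
print(excess_kurtosis(err)) # near -1.2: uniform-like, not Gaussian
```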
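For context on the bit-rounding comparison in Section 3.3, bit rounding only rounds each value to a fixed number of mantissa bits (round-to-nearest, ties-to-even) and zeroes the remaining bits. This bounds the relative error by 2^-(keepbits+1), and any size reduction comes from a subsequent lossless compressor, with no prediction or transform step like those in SZ or ZFP. The sketch below is a minimal float32 version, similar in spirit to numcodecs' `BitRound` filter. It assumes finite inputs, and `bitround` is a hypothetical name rather than any library's API.

```python
import numpy as np

def bitround(x: np.ndarray, keepbits: int) -> np.ndarray:
    """Round float32 values to `keepbits` explicit mantissa bits
    (round-to-nearest, ties-to-even) and zero the dropped bits.
    Assumes finite inputs and 1 <= keepbits < 23."""
    drop = 23 - keepbits  # float32 has 23 explicit mantissa bits
    bits = np.asarray(x, dtype=np.float32).copy().view(np.uint32)
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)     # just below half a unit in the last kept place
    last_kept = (bits >> np.uint32(drop)) & np.uint32(1)  # parity of the last kept bit, for ties-to-even
    bits += half_minus_one + last_kept                    # a carry may propagate into the exponent (correct)
    bits &= np.uint32((0xFFFFFFFF >> drop) << drop)       # clear the dropped bits
    return bits.view(np.float32)

x = np.array([np.pi, -2.0 / 3.0, 1234.5678, 1e-5], dtype=np.float32)
keepbits = 7
y = bitround(x, keepbits)
rel_err = np.abs(y - x) / np.abs(x)
print(rel_err.max() <= 2.0 ** -(keepbits + 1))  # True
```

Because the error bound is relative rather than absolute, bit rounding cannot directly honor the absolute error bounds commonly used with SZ and ZFP.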
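For reference on the SSIM comment, SSIM combines the means, variances, and covariance of two fields. The sketch below computes a simplified single-window (global) SSIM with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, where L is the data range. The standard metric averages this quantity over local windows, and dSSIM adapts it further for floating-point data, so this is an illustrative simplification rather than either published metric. `global_ssim` and the synthetic fields are hypothetical.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray) -> float:
    """Single-window SSIM between two fields, using the range of x as L."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    data_range = x.max() - x.min()
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

rng = np.random.default_rng(3)
field = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)  # smooth synthetic 2D field
noisy = field + rng.normal(scale=0.05 * field.std(), size=field.shape)   # stand-in for compression error
print(global_ssim(field, field))  # 1 up to floating-point rounding
print(global_ssim(field, noisy))  # below 1
```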
Writing:
+ The writing is verbose.
+ For example, "While this approach is conceptually sound, it quickly, [sic] becomes computationally infeasible, so it found little use in the finished pipeline. However, we still include it here for its theoretical insights." could have been written as "Solving an LP is computationally infeasible for large images, but it is included for theoretical analysis."
+ Additionally, many run-on sentences span three or more lines of text.
+ Lastly, 21 figures (not including subfigures) seems excessive.
Questions:
On line 330, the authors state, "For longer time ranges, the performance of psit starts to degrade." Do the authors have an explanation for this discrepancy?