A Comparison of Lossless Compression Algorithms for Altimeter Data

Thevenin, Mathieu; Pigoury, Stephane; Thomine, Olivier; Gouillon, Flavien

doi:https://doi.org/10.5194/egusphere-2022-1094

Preprints

20 Dec 2022

Abstract. Satellite data transmission rates typically range from hundreds of kilobits per second (kb/s) to several megabits per second (Mb/s). Bandwidth remains limited while the space-to-ground data volume grows as instrument resolution increases. The Surface Water and Ocean Topography (SWOT) altimetry mission is a partnership between the National Aeronautics and Space Administration (NASA) and the Centre National d'Études Spatiales (CNES). It uses the innovative KaRIn instrument, a Ka-band (35.75 GHz) synthetic aperture radar combined with an interferometer. Its launch is expected in 2022; it will measure oceanographic and hydrological levels and generate 7 terabytes per day, for a lifetime total of 20 petabytes. That is why data compression needs to be implemented at both ends of satellite communications. This study compares the compression results obtained with 672 algorithms, mostly based on the Huffman coding approach, which constitutes the state of the art for scientific data manipulation, including Computational Fluid Dynamics (CFD). We have also incorporated data preprocessing such as shuffle and bitshuffle, and a novel algorithm named SL6.
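The shuffle preprocessing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `shuffle` and `unshuffle`, the synthetic 32-bit samples, and the choice of zlib as the downstream compressor are all assumptions made for illustration. The idea is to regroup the bytes of fixed-size values by byte position, so that slowly varying high-order bytes end up next to each other before a general-purpose compressor sees them.

```python
import struct
import zlib


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Byte shuffle: group byte 0 of every item, then byte 1, and so on."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle: interleave the byte planes back into items."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    n_items = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n_items:(i + 1) * n_items]
    return bytes(out)


# Synthetic, slowly varying 32-bit samples (an assumption, not SWOT data).
samples = [100_000 + 3 * i for i in range(4096)]
raw = struct.pack(f"<{len(samples)}I", *samples)

plain = zlib.compress(raw, 6)
shuffled = zlib.compress(shuffle(raw, 4), 6)

# The transform is lossless: decompressing and unshuffling restores the input.
restored = unshuffle(zlib.decompress(shuffled), 4)
assert restored == raw

print(f"raw: {len(raw)} B, zlib: {len(plain)} B, shuffle+zlib: {len(shuffled)} B")
```

On smooth numerical data the shuffled stream typically compresses better, but the gain depends on the data and on the compressor. Bitshuffle applies the same regrouping at the level of individual bits rather than bytes.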

Received: 13 Oct 2022 – Discussion started: 20 Dec 2022

Competing interests: Stephane Pigoury is the CEA of Subnet, which holds the license of the SL6 algorithm. Mathieu Thevenin holds 5 % of the shares of Subnet.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

CEC1:
'Comment on egusphere-2022-1094', Juan Antonio Añel, 13 Jan 2023

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy" on many levels. Indeed, it should never have been published in Discussions before the issues listed below were solved.
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html

First, the little code that you have shared is archived on GitHub. However, GitHub is not a suitable repository. GitHub itself instructs authors to use other alternatives for long-term archival and publishing, such as Zenodo. Therefore, please publish your code in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage. Also, please include the relevant primary input/output data. You must therefore include, in any revised version of your manuscript, the modified 'Code and Data Availability' section and the DOI of the code (and another DOI for the dataset if necessary). Also, the GitHub repository does not contain a license. If you do not include a license, despite what you state, the code is not "open-source/libre"; it continues to be your property. Therefore, when uploading the model's code to Zenodo, you may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. You can also choose other options that Zenodo provides: GPLv2, Apache License, MIT License, etc.

Also, we cannot accept that it is necessary to contact the authors or request permission to get access to code or data. Both kinds of assets must be published in a permanent repository from which the authors cannot remove them, and this must be done before submitting the manuscript.

You must therefore reply to this comment with the link to the repository used in your manuscript, with its DOI. The reply and the repository must be available well before the Discussions stage is closed (they should already be available), so that anyone has access to them for review purposes.

Please, be aware that failing to comply promptly with this request will result in desk rejection of your manuscript for publication.

Juan A. Añel
Geosci. Model Dev. Exec. Editor

Citation: https://doi.org/10.5194/egusphere-2022-1094-CEC1
- AC1: 'Reply on CEC1', Mathieu Thevenin, 26 Jan 2023
  
  Dear Juan,
  Thank you for your comment.
  We have carefully read the conditions concerning the code used in the writing of articles. Of course, we can provide most of the code on a viable repository.
  However, the SL6 code is under a license that does not allow open-source release.
  Unfortunately, we cannot provide all the code needed for a full reproduction of our study. Indeed, some codes and tools are not open source.
  We understand the importance of validating our work as well as possible. However, as you know, not all experiments are necessarily reproducible, for reasons of equipment, skills or even time; and I imagine that you do not limit your review to the basic thing.
  
  Now, our question is: is it possible to derogate from this rule for legitimate reasons?
  
  Thank you for your reply and your interest in our work.
  
  Mathieu THEVENIN
  
  Citation: https://doi.org/10.5194/egusphere-2022-1094-AC1
RC1:
'Comment on egusphere-2022-1094', Anonymous Referee #1, 10 Mar 2023

Please see my comments in the file attached.

Citation: https://doi.org/10.5194/egusphere-2022-1094-RC1
- CC1: 'Reply on RC1', Stephane Pigoury, 02 Apr 2023
  
  Thank you for taking the time to review our study and for your comments.
  
  To respond on data-related matters, I emphasize that the data was not selected to favor SL6. Indeed, as specified, our study aims to continue the study "Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files", published here (https://gmd.copernicus.org/articles/12/4099/2019/). These are the same data. Note also that the data in that publication is likewise only available on request, and that there was no problem of reproducibility for that study.
  
  Citation: https://doi.org/10.5194/egusphere-2022-1094-CC1
RC2:
'Comment on egusphere-2022-1094', Anonymous Referee #2, 15 Mar 2023

This article compares the authors' patented SL6 algorithm with existing Huffman-based lossless compression algorithms, and stresses the homogeneous results and constant compression time of the proposed SL6, which seem to be in demand for satellite missions. The structure looks good. The methods and results sound promising. However, I still have some concerns regarding the contents.

1. Since the SL6 is patented and shows desirable properties, and might be suitable for space missions and satellite communications, the reason why it excels should be highlighted, which might be helpful for scientific research communities.

2. Compression algorithms are actively studied, and there are public competitions such as those at CVPR. Although the onboard SL6 algorithm is purpose-built, I still wonder how it would compare with CNN/GAN-based compression.

3. The Abstract is loosely organized and should be improved.

4. The citations should be correctly and consistently formatted.

Citation: https://doi.org/10.5194/egusphere-2022-1094-RC2
- CC2: 'Reply on RC2', Stephane Pigoury, 02 Apr 2023
  
  Thank you for your interest in our study.
  
  The reason why we have not detailed how the SL6 algorithm works is that this is not a study dedicated to the operation of SL6 technology, but an analysis of the state of the art in lossless data compression. It seemed to us more coherent and more interesting to complete the previous study by introducing a new metric allowing better analysis.
  
  Citation: https://doi.org/10.5194/egusphere-2022-1094-CC2
- AC3: 'Reply on RC2', Mathieu Thevenin, 11 Apr 2023
  
  The reviewer raises an interesting point.
  
  Since the aim of the study was to compare with the previous study cited in the introduction, we focused only on the compression algorithms that were previously considered for the SWOT mission. However, adding more compression algorithms would be very interesting. It would be the subject of another article or a conference paper.
  Thanks
  
  Citation: https://doi.org/10.5194/egusphere-2022-1094-AC3
RC3:
'Comment on egusphere-2022-1094', H. Xu, 05 Apr 2023
I thank the authors for presenting this comparison of compression algorithms to address the limited bandwidth of data transmission from satellite to ground.

There are several good points presented to us.

1. Creating the H-score metric to measure compression ratio and compression throughput with a single value.

2. Finding the SL6 compressor to test.

However, there are several concerns that need to be addressed as well.

1. The time spent on each variable is too small and the standard deviation is too large, so the measurement of the compression/decompression time may not be reliable.

2. Since the compression time is so small, can the authors describe the time measurement tool they used?

3. The SL6 compressor is shown to have the best performance among all tested compressors, but the authors do not explain why. It is not chunk based and does not use any Huffman or entropy encoder, so what makes it compress so fast? In my experience, the FPZIP compressor has a compression scheme similar to SL6, but as I recall FPZIP does not show the same compression performance. Is SL6 a lossy or a lossless compressor?

4. There are several typos in the paper. In Table 5, the values of the last two columns have no space between them and are hard to read. Figure 9 states that the case marked in red is the most interesting, but several others also show poor SL6 performance.

5. In Table 3, most time data are around 0.24 seconds to 1.7 seconds. Why do the authors display time data as such large numbers in ns?

6. Can we know the average compression rate obtained with SL6 for the whole SWOT dataset instead of only some fields?
Citation: https://doi.org/10.5194/egusphere-2022-1094-RC3
- AC2: 'Reply on RC3', Mathieu Thevenin, 11 Apr 2023
  
  Dear reviewer,
  We are grateful for your comments, we will address them.
  
  The time measurement is based on the Linux kernel, through time.h and the standard library.
  You are right, the ns unit could be replaced by the microsecond, which would be easier to read and would not really impact the accuracy. We chose to keep the same unit (ns, or possibly µs in a revision) for consistency reasons.
  
  Thanks
  Mathieu
  
  Citation: https://doi.org/10.5194/egusphere-2022-1094-AC2
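The concern about very short timings with a large standard deviation can be illustrated with a small sketch. It is a Python analogue given for illustration only, not the authors' C measurement code based on time.h; the helper name `time_call`, the loop counts, and the synthetic payload are assumptions. The sketch batches several calls per clock reading so that the measured interval is well above the clock resolution, then repeats the batch and reports the median and the spread.

```python
import statistics
import time
import zlib


def time_call(func, payload, inner_loops=50, repeats=15):
    """Return one average per-call duration in ns for each repeat."""
    per_call_ns = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(inner_loops):
            func(payload)
        elapsed = time.perf_counter_ns() - start
        # Averaging over the inner loop reduces the effect of clock granularity.
        per_call_ns.append(elapsed / inner_loops)
    return per_call_ns


# Synthetic 16 KiB payload (an assumption, not altimeter data).
payload = bytes(range(256)) * 64

timings = time_call(zlib.compress, payload)
median_ns = statistics.median(timings)
spread_ns = statistics.stdev(timings)

print(
    f"median {median_ns / 1000:.1f} us, "
    f"stdev {spread_ns / 1000:.1f} us over {len(timings)} repeats"
)
```

Reporting the median together with the spread across repeats makes it easier to judge whether differences between compressors exceed the measurement noise.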

Viewed

Total article views: 1,303 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
590	666	47	1,303	29	43

Views and downloads (calculated since 20 Dec 2022)

Month	HTML	PDF	XML	Total
Dec 2022	64	23	4	91
Jan 2023	50	28	4	82
Feb 2023	37	28	0	65
Mar 2023	51	38	7	96
Apr 2023	91	44	10	145
May 2023	15	13	0	28
Jun 2023	16	32	2	50
Jul 2023	32	28	1	61
Aug 2023	12	18	0	30
Sep 2023	17	29	0	46
Oct 2023	12	32	0	44
Nov 2023	6	14	1	21
Dec 2023	8	23	1	32
Jan 2024	4	12	0	16
Feb 2024	3	11	1	15
Mar 2024	8	22	0	30
Apr 2024	5	19	3	27
May 2024	6	12	2	20
Jun 2024	23	8	2	33
Jul 2024	8	11	4	23
Aug 2024	12	7	1	20
Sep 2024	9	9	0	18
Oct 2024	2	13	0	15
Nov 2024	7	13	0	20
Dec 2024	2	7	0	9
Jan 2025	8	14	0	22
Feb 2025	9	10	0	19
Mar 2025	13	12	0	25
Apr 2025	4	12	0	16
May 2025	7	17	2	26
Jun 2025	12	36	0	48
Jul 2025	11	18	0	29
Aug 2025	8	26	1	35
Sep 2025	17	22	1	40
Oct 2025	1	5	0	6

Viewed (geographical distribution)

Total article views: 1,291 (including HTML, PDF, and XML), of which 1,291 have a defined geography and 0 are of unknown origin.

Latest update: 09 Oct 2025

Short summary

As an extension of the work presented in "Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files" (Delaunay), https://gmd.copernicus.org/articles/12/4099/2019/, this paper presents a detailed benchmark of lossless, mostly LZ-based, compression algorithms that could be used for space-to-Earth communication or data storage. The work is conducted on SWOT altimetry data.

