the Creative Commons Attribution 4.0 License.
Compressing high-resolution data through latent representation encoding for downscaling a large-scale AI weather forecast model
Abstract. The rapid advancement of artificial intelligence (AI) in weather research has been driven by the ability to learn from large, high-dimensional datasets. However, this progress also poses significant challenges, particularly the substantial costs of processing extensive data and the limits of computational resources. Inspired by the Neural Image Compression (NIC) task in computer vision, this study compresses weather data to address these challenges and improve the efficiency of downstream applications. Specifically, we propose a variational autoencoder (VAE) framework tailored for compressing high-resolution datasets, namely the High-Resolution China Meteorological Administration Land Data Assimilation System (HRCLDAS) dataset with a spatial resolution of 1 km. Our framework reduced the storage size of 3 years of HRCLDAS data from 8.61 TB to just 204 GB while preserving essential information. In addition, we demonstrated the utility of the compressed data through a downscaling task, in which a model trained on the compressed dataset achieved accuracy comparable to that of a model trained on the original data. These results highlight the effectiveness and potential of compressed data for future weather research.
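The reported reduction from 8.61 TB to 204 GB corresponds to the roughly 43-fold compression ratio discussed in the review below. A quick back-of-the-envelope check, assuming binary units (1 TB = 1024 GB), can be sketched as:

```python
# Back-of-the-envelope check of the reported compression ratio.
# Figures are taken from the abstract; binary units (1 TB = 1024 GB) are assumed.
original_tb = 8.61      # 3 years of HRCLDAS data, in TB
compressed_gb = 204.0   # compressed latent representation, in GB

original_gb = original_tb * 1024        # convert TB -> GB
ratio = original_gb / compressed_gb     # overall compression factor

print(f"{ratio:.1f}x")  # -> 43.2x
```

With decimal units (1 TB = 1000 GB) the ratio would instead be about 42.2x, so the ~43x figure is consistent with binary units.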
Status: open (until 27 Dec 2024)
RC1: 'Comment on egusphere-2024-3183', Anonymous Referee #1, 15 Nov 2024
This paper introduces a VAE-based data compression method for high-resolution weather data and shows its potential application in training a downscaling model on the latent representation. The method in this paper is hardly novel, since both the VAE and the UNet are commonly used in related fields. The claimed 43x compression ratio comes purely from the downsampling CNN in the VAE. Usually a neural image compression method would use vector quantization and/or entropy coding in combination with a VAE. Interestingly, none of the neural image compression methods is used as a baseline for compression; in fact, there is no baseline in the compression part at all. The authors are advised to use at least one established compression method as a baseline (some can be found in this repo: https://interdigitalinc.github.io/CompressAI/).
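The reviewer's point that the compression ratio comes entirely from the downsampling CNN can be illustrated with a simple shape calculation. The variable count, grid size, downsampling depth, and latent channel count below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch: how strided downsampling alone determines a
# convolutional VAE's compression ratio. All shapes are hypothetical.
def latent_compression_ratio(n_vars, height, width, n_down, latent_channels):
    """Ratio of input grid size to latent size after n_down stride-2 stages."""
    input_size = n_vars * height * width
    latent_h, latent_w = height // 2**n_down, width // 2**n_down
    latent_size = latent_channels * latent_h * latent_w
    return input_size / latent_size

# e.g. 8 surface variables on a 1024x1024 grid, 4 stride-2 stages,
# 48 latent channels: 8 * 4**4 / 48
print(latent_compression_ratio(8, 1024, 1024, 4, 48))  # ~42.7, roughly the paper's 43x
```

Vector quantization or entropy coding of the latents, as the reviewer suggests, would add further compression on top of this purely geometric factor.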
Minor points
- The HRCLDAS data is not openly available, so the results cannot be reproduced.
- It would be nice to have a power spectrum plot for the compression part (like Fig. 6).
- The evaluation only considers t2m, u10, and v10. Including other variables, especially in Table 3, would be better.
- ERA5 should be much larger than the claimed 226 TB: the pressure-level data alone is at least 2 PB, and the model-level data at least 5 PB.
Citation: https://doi.org/10.5194/egusphere-2024-3183-RC1