https://doi.org/10.5194/egusphere-2025-28
06 Feb 2025
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Statistical summaries for streamed data from climate simulations: One-pass algorithms (v0.6.2)

Katherine Grayson, Stephan Thober, Aleksander Lacima-Nadolnik, Ehsan Sharifi, Llorenç Lledó, and Francisco Doblas-Reyes

Abstract. Projections from global climate models (GCMs) are a fundamental information source for climate adaptation policies and socio-economic decisions. As such, these models are being progressively run at finer spatio-temporal resolutions to resolve smaller-scale dynamics and consequently reduce uncertainty associated with parameterizations. Yet even with increased capacity from high-performance computing (HPC), the resulting size of the data output (which can be on the order of terabytes to petabytes) means that native-resolution data cannot feasibly be stored for long time periods. Lower-resolution archives containing a reduced set of variables are often all that is kept, limiting data consumers from harnessing the full potential of these models. To overcome this growing challenge, the climate modelling community is investigating data streaming, a novel way of processing GCM output without having to store a limited set of variables on disk. In this paper we present a detailed analysis of the use of one-pass algorithms from the 'one-pass' package for streamed climate data. These intelligent data reduction techniques allow for the computation of statistics on the fly, enabling climate workflows to temporally aggregate the data output from GCMs into meaningful statistics for the end user without having to store the full time series. We present these algorithms for four different statistics: mean, standard deviation, percentiles and histograms. Each statistic is presented in the context of a use case, showing the statistic applied to a relevant variable. For statistics that can be represented by a single floating-point value (i.e., mean, standard deviation, variance), the accuracy is on the order of the numerical precision of the machine and the memory savings scale linearly with the period of time covered by the statistic.
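As an illustration of what a one-pass update for the mean and standard deviation can look like, the following is a minimal sketch using the well-known Welford-style recurrence. It is not taken from the 'one-pass' package: the class name `RunningMeanStd` and its methods are hypothetical, and the synthetic temperature-like field only stands in for streamed model output.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical helper: one-pass mean and standard deviation.

    Uses a Welford-style update, so only the running state (count, mean,
    sum of squared deviations) is kept in memory, never the time series.
    """

    def __init__(self):
        self.n = 0        # number of time steps seen so far
        self.mean = None  # running mean, same shape as one time step
        self.m2 = None    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new time step (scalar or gridded field) into the state."""
        x = np.asarray(x, dtype=np.float64)
        if self.n == 0:
            self.mean = np.zeros_like(x)
            self.m2 = np.zeros_like(x)
        self.n += 1
        delta = x - self.mean                     # deviation from the old mean
        self.mean = self.mean + delta / self.n
        self.m2 = self.m2 + delta * (x - self.mean)  # uses the updated mean

    def std(self, ddof=0):
        """Standard deviation over all time steps seen so far."""
        return np.sqrt(self.m2 / (self.n - ddof))


# Usage: stream synthetic "time steps" of a small 4 x 3 grid one at a time.
rng = np.random.default_rng(42)
stats = RunningMeanStd()
for _ in range(240):
    stats.update(rng.normal(loc=280.0, scale=5.0, size=(4, 3)))
print(stats.mean.shape, stats.std().shape)
```

The state held in memory has the size of three fields regardless of how many time steps are streamed, which is the source of the linear memory saving described above.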
For the statistics that require a distribution (percentiles and histograms), we present an algorithm that reduces the full time series to a set of key clusters that represent the distribution. Using this algorithm, we find that the accuracy provided is well within the acceptable bounds for the climate variables examined, while still providing memory savings that bypass the infeasible storage requirements of high-resolution data.
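To make the idea of a compact distribution summary concrete, here is a deliberately simpler stand-in: a fixed-bin streaming histogram from which percentiles are estimated by interpolation. This is not the cluster-based algorithm presented in the paper. The class `StreamingHistogram` is hypothetical, and the sketch assumes the value range (`lo`, `hi`) is known before streaming starts.

```python
import numpy as np


class StreamingHistogram:
    """Hypothetical fixed-bin histogram updated one chunk at a time.

    Assumption: the value range [lo, hi] is known in advance. Values
    outside this range are silently ignored by np.histogram.
    """

    def __init__(self, lo, hi, nbins):
        self.edges = np.linspace(lo, hi, nbins + 1)
        self.counts = np.zeros(nbins, dtype=np.int64)

    def update(self, chunk):
        """Add the values of one streamed chunk to the bin counts."""
        counts, _ = np.histogram(np.ravel(chunk), bins=self.edges)
        self.counts += counts

    def percentile(self, q):
        """Estimate the q-th percentile (0 to 100) from the bin counts.

        Interpolates linearly in the empirical CDF evaluated at bin edges,
        so the error is bounded by roughly one bin width.
        """
        cdf = np.concatenate(([0.0], np.cumsum(self.counts) / self.counts.sum()))
        return float(np.interp(q / 100.0, cdf, self.edges))


# Usage: stream 100 chunks of synthetic values in [0, 1).
rng = np.random.default_rng(1)
hist = StreamingHistogram(lo=0.0, hi=1.0, nbins=100)
for _ in range(100):
    hist.update(rng.random(1000))
print(hist.percentile(50), hist.percentile(95))
```

Memory use is fixed by the number of bins rather than the length of the time series. The trade-off is that accuracy depends on bin width and on choosing the range well, which is one reason adaptive, cluster-based summaries are attractive for variables with long tails.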

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 03 Apr 2025)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-28', Anonymous Referee #1, 17 Feb 2025
  • AC1: 'Comment on egusphere-2025-28: Preliminary reply to RC1', Katherine Grayson, 27 Feb 2025
  • RC2: 'Comment on egusphere-2025-28', Anonymous Referee #2, 06 Mar 2025

Data sets

nextGEMS cycle3 datasets: statistical summaries for streamed data from climate simulations nextGEMS, K. Grayson https://doi.org/10.5281/zenodo.12533197

Model code and software

DestinE-Climate-DT/one_pass: v0.6.2 K. Grayson https://doi.org/10.5281/zenodo.14591828

Interactive computing environment

kat-grayson/one_pass_algorithms_paper K. Grayson https://doi.org/10.5281/zenodo.12533064

Viewed

Total article views: 159 (including HTML, PDF, and XML)
  • HTML: 124
  • PDF: 26
  • XML: 9
  • Total: 159
  • BibTeX: 8
  • EndNote: 5
Views and downloads are calculated since 06 Feb 2025.

Viewed (geographical distribution)

Total article views: 173 (including HTML, PDF, and XML), thereof 173 with geography defined and 0 with unknown origin.

Latest update: 17 Mar 2025
Short summary
To provide the most accurate climate adaptation information, climate models are being run with finer grid resolution, resulting in larger data output. This paper presents intelligent data reduction algorithms that act on streamed data, a novel way of processing climate data as soon as it is produced. Using these algorithms to calculate statistics, we show that the accuracy provided is well within acceptable bounds while still providing memory savings that bypass unfeasible storage requirements.