https://doi.org/10.5194/egusphere-2025-28
06 Feb 2025

Statistical summaries for streamed data from climate simulations: One-pass algorithms (v0.6.2)

Katherine Grayson, Stephan Thober, Aleksander Lacima-Nadolnik, Ehsan Sharifi, Llorenç Lledó, and Francisco Doblas-Reyes

Abstract. Projections from global climate models (GCMs) are a fundamental information source for climate adaptation policies and socio-economic decisions. As such, these models are increasingly run at finer spatio-temporal resolutions to resolve smaller-scale dynamics and consequently reduce the uncertainty associated with parameterizations. Yet even with the increased capacity of High Performance Computing (HPC), the resulting volume of data output (on the order of terabytes to petabytes) means that native-resolution data cannot feasibly be stored for long periods. Often only lower-resolution archives containing a reduced set of variables are kept, which prevents data consumers from harnessing the full potential of these models. To overcome this growing challenge, the climate modelling community is investigating data streaming: a novel way of processing GCM output as it is produced, without first storing it on disk and therefore without being restricted to a limited set of variables. In this paper we present a detailed analysis of the one-pass algorithms in the 'one-pass' package applied to streamed climate data. These intelligent data reduction techniques compute statistics on the fly, enabling climate workflows to temporally aggregate GCM output into meaningful statistics for the end user without storing the full time series. We present these algorithms for four statistics: mean, standard deviation, percentiles and histograms. Each statistic is presented in the context of a use case, applied to a relevant variable. For statistics that can be represented by a single floating-point value (i.e., mean, standard deviation, variance), the accuracy is on the order of the machine's numerical precision, and the memory savings scale linearly with the period of time covered by the statistic.
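The abstract states that single-value statistics can be updated on the fly to machine precision, with memory that no longer grows with the averaging period. As an illustration only, the sketch below keeps a running mean and variance for one grid point's time series, folding in each chunk of new time steps with the pairwise merge formula of Chan, Golub and LeVeque (a chunked form of Welford's update). The class name `RunningMoments`, the chunking and the sample values are hypothetical and are not the 'one-pass' package's API; in a gridded setting the same update would be applied element-wise to every grid cell.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Hypothetical one-pass mean/variance state for a single grid point."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk):
        """Fold a chunk of new time steps into the state using the pairwise
        merge of Chan, Golub and LeVeque; the chunk can then be discarded."""
        n_b = len(chunk)
        if n_b == 0:
            return
        mean_b = sum(chunk) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in chunk)
        n = self.count + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / n
        # self.count still holds the old count here, as the formula requires.
        self.m2 += m2_b + delta ** 2 * self.count * n_b / n
        self.count = n

    @property
    def variance(self):
        # Population variance; divide by (count - 1) for the sample variance.
        return self.m2 / self.count if self.count else math.nan

    @property
    def std(self):
        return math.sqrt(self.variance)


# Usage: stream a short, illustrative series two time steps at a time.
series = [271.3, 272.1, 274.8, 276.0, 275.2, 273.9]  # e.g. 2 m temperature (K)
stats = RunningMoments()
for start in range(0, len(series), 2):
    stats.update(series[start:start + 2])

# Compare against two-pass results computed from the full series.
print(stats.mean, statistics.fmean(series))
print(stats.std, statistics.pstdev(series))
```

Only `count`, `mean` and `m2` are stored, so the state is three numbers per grid point however many time steps have been streamed, whereas keeping the raw series needs one value per time step; this is the linear memory saving the abstract refers to.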
For the statistics that require a distribution (percentiles and histograms), we present an algorithm that reduces the full time series to a set of key clusters that represent the distribution. Using this algorithm, we find that the accuracy is well within acceptable bounds for the climate variables examined, while the memory savings avoid the infeasible storage requirements of high-resolution data.
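The abstract describes reducing the time series to a set of key clusters but does not spell out the clustering rule. The sketch below is a deliberately simplified stand-in, not the package's method: it keeps at most `max_clusters` (mean, weight) pairs and, whenever the limit is exceeded, merges the neighbouring pair whose merge least increases the within-cluster sum of squares (Ward's criterion). Production algorithms of this kind, such as the t-digest, additionally bound cluster sizes near the tails to keep extreme percentiles accurate. `ClusterSketch` and its methods are hypothetical.

```python
import bisect
import random


class ClusterSketch:
    """Hypothetical bounded-memory summary of a streamed distribution."""

    def __init__(self, max_clusters=50):
        self.max_clusters = max_clusters
        self.means = []    # cluster centres, kept sorted
        self.weights = []  # number of streamed values in each cluster

    def add(self, value):
        # Insert the new value as a singleton cluster at its sorted position.
        i = bisect.bisect_left(self.means, value)
        self.means.insert(i, value)
        self.weights.insert(i, 1)
        if len(self.means) > self.max_clusters:
            self._merge_cheapest_pair()

    def _merge_cheapest_pair(self):
        # Ward's criterion: merging neighbours a and b raises the within-cluster
        # sum of squares by w_a * w_b / (w_a + w_b) * (m_b - m_a) ** 2.
        m, w = self.means, self.weights
        costs = [w[k] * w[k + 1] / (w[k] + w[k + 1]) * (m[k + 1] - m[k]) ** 2
                 for k in range(len(m) - 1)]
        j = costs.index(min(costs))
        total = w[j] + w[j + 1]
        centre = (m[j] * w[j] + m[j + 1] * w[j + 1]) / total
        # The weighted mean lies between the two centres, so order is preserved.
        m[j:j + 2] = [centre]
        w[j:j + 2] = [total]

    def quantile(self, q):
        """Approximate the q-quantile (0 <= q <= 1) by linear interpolation
        between cluster centres placed at their cumulative mid-weights."""
        if not self.weights:
            raise ValueError("no data has been added")
        target = q * sum(self.weights)
        positions, cum = [], 0
        for w in self.weights:
            positions.append(cum + w / 2)
            cum += w
        if target <= positions[0]:
            return float(self.means[0])
        if target >= positions[-1]:
            return float(self.means[-1])
        k = bisect.bisect_right(positions, target)
        frac = (target - positions[k - 1]) / (positions[k] - positions[k - 1])
        return self.means[k - 1] + frac * (self.means[k] - self.means[k - 1])

    def histogram(self, edges):
        """Approximate counts in right-open bins [edges[b], edges[b + 1]) by
        assigning each cluster's whole weight to the bin holding its centre."""
        counts = [0] * (len(edges) - 1)
        for centre, w in zip(self.means, self.weights):
            b = bisect.bisect_right(edges, centre) - 1
            if 0 <= b < len(counts):
                counts[b] += w
        return counts


# Usage: stream 10,000 values in random order, keeping at most 50 clusters.
random.seed(0)
values = list(range(10_000))
random.shuffle(values)
sketch = ClusterSketch(max_clusters=50)
for v in values:
    sketch.add(v)
print(len(sketch.means), sketch.quantile(0.5), sketch.quantile(0.9))
print(sketch.histogram([0, 2500, 5000, 7500, 10_000]))
```

Memory is bounded by `max_clusters` however many values arrive; the price is that percentiles and bin counts are approximations whose error depends on the cluster budget and the merge rule.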

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Journal article(s) based on this preprint

10 Sep 2025
Statistical summaries for streamed data from climate simulations: one-pass algorithms
Katherine Grayson, Stephan Thober, Aleksander Lacima-Nadolnik, Ivan Alsina-Ferrer, Llorenç Lledó, Ehsan Sharifi, and Francisco Doblas-Reyes
Geosci. Model Dev., 18, 5873–5890, https://doi.org/10.5194/gmd-18-5873-2025, 2025

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-28', Anonymous Referee #1, 17 Feb 2025
  • AC1: 'Comment on egusphere-2025-28: Preliminary reply to RC1', Katherine Grayson, 27 Feb 2025
    • AC2: 'Further reply on AC1', Katherine Grayson, 21 Mar 2025
  • RC2: 'Comment on egusphere-2025-28', Anonymous Referee #2, 06 Mar 2025
    • AC3: 'Reply on RC2', Katherine Grayson, 21 Mar 2025
  • AC4: 'Final response RC1', Katherine Grayson, 02 May 2025
  • AC5: 'Final response RC2', Katherine Grayson, 02 May 2025

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Katherine Grayson on behalf of the Authors (16 May 2025): Author's response, Author's tracked changes, Manuscript
ED: Referee Nomination & Report Request started (22 May 2025) by Po-Lun Ma
RR by Anonymous Referee #1 (22 May 2025)
RR by Lucas Harris (06 Jun 2025)
ED: Publish as is (20 Jun 2025) by Po-Lun Ma
AR by Katherine Grayson on behalf of the Authors (27 Jun 2025): Manuscript

Data sets

nextGEMS cycle3 datasets: statistical summaries for streamed data from climate simulations nextGEMS, K. Grayson https://doi.org/10.5281/zenodo.12533197

Model code and software

DestinE-Climate-DT/one_pass: v0.6.2 K. Grayson https://doi.org/10.5281/zenodo.14591828

Interactive computing environment

kat-grayson/one_pass_algorithms_paper K. Grayson https://doi.org/10.5281/zenodo.12533064

Viewed

Total article views: 593 (including HTML, PDF, and XML)
  • HTML: 426
  • PDF: 138
  • XML: 29
  • Total: 593
  • BibTeX: 23
  • EndNote: 35
Views and downloads (calculated since 06 Feb 2025)

Latest update: 10 Sep 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
To provide the most accurate climate adaptation information, climate models are being run at finer grid resolution, resulting in larger data output. This paper presents intelligent data reduction algorithms that act on streamed data, a novel way of processing climate data as soon as it is produced. Using these algorithms to calculate statistics, we show that the accuracy is well within acceptable bounds while the memory savings avoid infeasible storage requirements.