https://doi.org/10.5194/egusphere-2025-5533
21 Dec 2025
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Optimizing Gaussian Process Emulation and Generalized Additive Model Fitting for Rapid, Reproducible Earth System Model Analysis

Kunal Ghosh and Leighton A. Regayre

Abstract. Causes of model uncertainty in complex modeling systems can be identified using large perturbed-parameter ensembles (PPEs), combined with statistical emulators to increase sample size and enable variance-based sensitivity analyses and observational constraint. In global climate models such as the UK Earth System Model (UKESM), these approaches are typically applied at global or regional mean scales for a limited set of variables. Accelerating progress in understanding the multi-faceted causes of climate model uncertainty requires implementing such workflows at the model grid box scale, to enable analyses across variables that reveal how uncertainties propagate and interact spatially. However, this approach requires training millions of Gaussian process (GP) emulators and fitting an equal number of generalized additive models (GAMs), which is a major computational bottleneck. We present a high-performance, open-source pipeline that introduces optimisations for this workflow. For GP emulation, we implement task-level parallelism and streamlined data handling on high-performance computing systems. For GAM fitting, we integrate a parallelized pyGAM interface with R's mgcv::bam() back end, using fast fREML estimation with discrete smoothing, memory-efficient batching, and improved input–output routines. These changes reduce GP training time by 97.5 % (6177 → 154 s) and GAM fitting time by 95.2 % (10623 → 511 s), yielding a ~25 times faster end-to-end workflow (96 % total runtime reduction) and cutting peak memory use by a factor of 12. Outputs are numerically identical to the baseline implementation (Pearson correlation = 1.00 for both GP and GAM predictions).
We demonstrate the approach using a UKESM PPE comprising 221 members scaled up to 1 million using GP emulators, and GAM fits applied to output for a single target variable, to show that the improved performance enables multi-variable, higher-resolution, and potentially multi-model analyses that were previously impractical. These improvements pave the way for PPE studies to scale in scope without compromising statistical fidelity, enabling more comprehensive exploration of model parameter uncertainty within feasible HPC budgets.
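The runtime figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the four timings are taken from the abstract, while the helper names and the assumption that the end-to-end runtime is simply the sum of the GP and GAM stages are ours and are not stated by the authors.

```python
# Timings in seconds, as quoted in the abstract.
gp_before, gp_after = 6177, 154      # GP emulator training
gam_before, gam_after = 10623, 511   # GAM fitting


def percent_reduction(before, after):
    """Runtime reduction as a percentage of the baseline runtime."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the optimised run is than the baseline."""
    return before / after


gp_reduction = percent_reduction(gp_before, gp_after)
gam_reduction = percent_reduction(gam_before, gam_after)

# ASSUMPTION (ours): end-to-end runtime is the sum of the two stages.
total_before = gp_before + gam_before
total_after = gp_after + gam_after
total_reduction = percent_reduction(total_before, total_after)
total_speedup = speedup(total_before, total_after)

print(f"GP reduction:    {gp_reduction:.1f} %")
print(f"GAM reduction:   {gam_reduction:.1f} %")
print(f"Total reduction: {total_reduction:.1f} %")
print(f"Total speedup:   {total_speedup:.1f}x")
```

Under that summation assumption, the per-stage reductions round to the 97.5 % and 95.2 % given in the abstract, and the combined figures are consistent with the reported 96 % total reduction and roughly 25 times faster workflow.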

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 15 Feb 2026)


Data sets

gp-gam-optimisation-dataset: Example input and output data for Gaussian Process and GAM workflow (H₂SO₄, January 2017) Kunal Ghosh and Leighton A. Regayre https://doi.org/10.5281/zenodo.17543623

Model code and software

gp-gam-optimisation-pipeline: High-performance workflow for Gaussian Process emulation and GAM fitting Kunal Ghosh and Leighton A. Regayre https://github.com/Kunal198/gp-gam-optimisation-pipeline


Viewed

Total article views: 22 (including HTML, PDF, and XML)
  • HTML: 17
  • PDF: 3
  • XML: 2
  • Total: 22
  • BibTeX: 2
  • EndNote: 2
Views and downloads (calculated since 21 Dec 2025)

Viewed (geographical distribution)

Total article views: 22 (including HTML, PDF, and XML) Thereof 22 with geography defined and 0 with unknown origin.
Latest update: 22 Dec 2025
Short summary
Understanding which parts of climate models cause uncertainty requires many large computer experiments. We developed a new workflow that greatly improves the speed and efficiency of these studies. It can analyse millions of model variations up to 25 times faster without losing accuracy, allowing scientists to explore uncertainty in more detail and make climate predictions more reliable.