Optimizing Gaussian Process Emulation and Generalized Additive Model Fitting for Rapid, Reproducible Earth System Model Analysis
Abstract. Causes of model uncertainty in complex modeling systems can be identified using large perturbed-parameter ensembles (PPEs) combined with statistical emulators, which increase effective sample size and enable variance-based sensitivity analyses and observational constraint. In global climate models such as the UK Earth System Model (UKESM), these approaches are typically applied at global or regional mean scales for a limited set of variables. Accelerating progress in understanding the multi-faceted causes of climate model uncertainty requires implementing such workflows at the model grid-box scale, enabling cross-variable analyses that reveal how uncertainties propagate and interact spatially. However, this approach requires training millions of Gaussian process (GP) emulators and fitting an equal number of generalized additive models (GAMs) – a major computational bottleneck. We present a high-performance, open-source pipeline that introduces optimisations for this workflow. For GP emulation, we implement task-level parallelism and streamlined data handling on high-performance computing (HPC) systems. For GAM fitting, we integrate a parallelized pyGAM interface with R's mgcv::bam() back end, using fast fREML estimation with discrete smoothing, memory-efficient batching, and improved input–output routines. These changes reduce GP training time by 97.5 % (6177 → 154 s) and GAM fitting time by 95.2 % (10623 → 511 s), yielding a ~ 25 times faster end-to-end workflow (96 % total runtime reduction) and cutting peak memory use by a factor of 12. Outputs are numerically identical to the baseline implementation (Pearson correlation = 1.00 for both GP and GAM predictions).
We demonstrate the approach using a UKESM PPE comprising 221 members scaled up to 1 million using GP emulators, and GAM fits applied to output for a single target variable, to show that the improved performance enables multi-variable, higher-resolution, and potentially multi-model analyses that were previously impractical. These improvements pave the way for PPE studies to scale in scope without compromising statistical fidelity, enabling more comprehensive exploration of model parameter uncertainty within feasible HPC budgets.
This manuscript describes an approach to computer model emulation and sensitivity analysis using parallel and distributed computing on high-performance computing (HPC) systems. Specifically, the authors fit scalar Gaussian process (GP) emulators to grid-cell-level Earth system model output in an embarrassingly parallel fashion, and use the fitted emulators to conduct main-effects-style sensitivity analysis using generalized additive models (GAMs). Each model fit occurs on a core independently of the other fits, and speedup is realized by running many such fits simultaneously on many cores, with no between-node communication required.
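The embarrassingly parallel structure can be made concrete with a toy sketch. This is not the authors' code: the data shapes, the hand-rolled GP posterior mean, and the thread pool (standing in for HPC task scheduling across cores) are all illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel between row vectors of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def fit_predict_gp(args):
    # One independent task: fit a GP to one grid cell's ensemble
    # output and predict at a large sample of parameter settings.
    X, y, Xs = args
    K = rbf(X, X) + 1e-4 * np.eye(len(X))   # jitter/noise term
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha               # posterior mean at Xs

# Synthetic stand-in for a PPE: 8 grid cells, 40 members, 3 parameters
rng = np.random.default_rng(0)
n_cells, n_members, n_params = 8, 40, 3
X = rng.uniform(size=(n_members, n_params))
Y = np.stack([np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=n_members)
              for _ in range(n_cells)])
Xs = rng.uniform(size=(100, n_params))      # emulator sample points

# Each grid cell is one task; no communication between tasks.
with ThreadPoolExecutor(max_workers=4) as ex:
    preds = np.stack(list(ex.map(fit_predict_gp,
                                 [(X, Y[i], Xs) for i in range(n_cells)])))
```

On an HPC system the same pattern maps each grid cell (millions of them, per the abstract) to an independent task, which is why no between-node communication is needed.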
The resulting parallel speedups seem reasonable and are significant in magnitude. Parallel efficiency might be easier to see if plotted differently – e.g. on a log-log plot, with a theoretical maximum speedup curve (equal to the number of tasks) superimposed, or with speedup expressed as a fraction of the theoretical maximum. I am not sure I would say that optimized walltime is nearly constant up to three orders of magnitude in task count: we would expect it to be constant while the task count is below the concurrency cap of 200 cores, but it should (and does) increase beyond that. The speedup graphs are less informative than they could be because the baseline was never run (and the speedup calculated) for task counts exceeding the number of cores, where different speedup behavior is expected – understandably, given the increasing expense of running the baseline. Perhaps at least one long baseline analysis at (say) 400 cores could be run?
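To illustrate the expected saturation, here is a hypothetical walltime model (assumed numbers, not the manuscript's measurements) for n equal-cost tasks under a 200-core concurrency cap:

```python
import math

cap = 200    # hypothetical concurrency cap (cores)
cost = 1.0   # per-task cost, arbitrary units
tasks = [10, 50, 100, 200, 400, 800, 1600]

serial   = [n * cost for n in tasks]                    # baseline walltime
parallel = [math.ceil(n / cap) * cost for n in tasks]   # capped parallel walltime
speedup  = [s / p for s, p in zip(serial, parallel)]
ideal    = [min(n, cap) for n in tasks]                 # theoretical max speedup
efficiency = [s / i for s, i in zip(speedup, ideal)]    # fraction of theoretical
```

In this idealized model, parallel walltime is flat only below the cap and then grows in steps, and speedup saturates at 200. Real runs add scheduling overhead and load imbalance, pushing efficiency below 1 – which is exactly what a speedup-as-fraction-of-theoretical plot would surface.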
Other than that, a few other points to consider:
Could the GAM have been fit directly to the data instead of to GP emulator samples? The GAM is essentially a simple emulator itself (neglecting parameter interactions).
It may be worth pointing out that the GAM sensitivity analysis is akin to an (unnormalized) first-order Sobol' sensitivity analysis under the assumption of no between-parameter interactions.
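For readers unfamiliar with the connection: the unnormalized first-order index for parameter x_i is Var(E[y | x_i]), which is what a one-dimensional smooth of y against x_i estimates. A minimal numpy sketch with toy data, using binning in place of a spline smooth:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 3
X = rng.uniform(size=(n, p))
# Toy response with additive main effects only (no interactions);
# x2 has no effect at all.
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1]

def main_effect_variance(x, y, bins=50):
    """Estimate Var_x( E[y | x] ): an unnormalized first-order Sobol' index."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    means = np.array([y[idx == b].mean() for b in range(bins)])
    weights = np.bincount(idx, minlength=bins) / len(x)
    grand = (weights * means).sum()
    return ((means - grand) ** 2 * weights).sum()

V = [main_effect_variance(X[:, j], y) for j in range(p)]
# Analytically: Var(sin(2*pi*x0)) = 0.5, Var(0.5*x1) = 1/48, and 0 for x2.
```

Dividing each index by Var(y) would give the normalized first-order Sobol' indices; with no interactions present, these sum to 1.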
Could this software be extended to analyze time series at a grid cell? This would require a different form of emulator and different distribution of data onto tasks. (Obviously a spatial analysis would be harder since communication across boundaries would become inevitable, although spatially-partitioned Gaussian processes can be applied.)