Optimizing Gaussian Process Emulation and Generalized Additive Model Fitting for Rapid, Reproducible Earth System Model Analysis
Abstract. Causes of model uncertainty in complex modeling systems can be identified using large perturbed-parameter ensembles (PPEs) combined with statistical emulators, which increase effective sample size and enable variance-based sensitivity analyses and observational constraint. In global climate models such as the UK Earth System Model (UKESM), these approaches are typically applied at global or regional mean scales for a limited set of variables. Accelerating progress in understanding the multi-faceted causes of climate model uncertainty requires implementing such workflows at the model grid-box scale, enabling cross-variable analyses that reveal how uncertainties propagate and interact spatially. However, this approach requires training millions of Gaussian process (GP) emulators and fitting an equal number of generalized additive models (GAMs) – a major computational bottleneck. We present a high-performance, open-source pipeline that introduces optimisations for this workflow. For GP emulation, we implement task-level parallelism and streamlined data handling on high-performance computing (HPC) systems. For GAM fitting, we integrate a parallelized pyGAM interface with R's mgcv::bam() back end, using fast fREML estimation with discrete smoothing, memory-efficient batching, and improved input–output routines. These changes reduce GP training time by 97.5 % (6177 → 154 s) and GAM fitting time by 95.2 % (10623 → 511 s), yielding a ~ 25 times faster end-to-end workflow (96 % total runtime reduction) and cutting peak memory use by a factor of 12. Outputs are numerically identical to the baseline implementation (Pearson correlation = 1.00 for both GP and GAM predictions).
We demonstrate the approach using a UKESM PPE comprising 221 members scaled up to 1 million using GP emulators, and GAM fits applied to output for a single target variable, to show that the improved performance enables multi-variable, higher-resolution, and potentially multi-model analyses that were previously impractical. These improvements pave the way for PPE studies to scale in scope without compromising statistical fidelity, enabling more comprehensive exploration of model parameter uncertainty within feasible HPC budgets.
This manuscript describes an approach to computer model emulation and sensitivity analysis using parallel and distributed computing on high-performance computing (HPC) systems. Specifically, the authors fit scalar Gaussian process (GP) emulators to grid-cell-level Earth system model output in an embarrassingly parallel fashion, and use the fitted emulators to conduct main-effects-style sensitivity analysis using generalized additive models (GAMs). Each model fit occurs on a core independently of the other fits, and speedup is realized by running many such fits simultaneously on many cores, with no between-node communication required.
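The embarrassingly parallel structure can be made concrete with a toy sketch. This is not the authors' code: the data shapes, the hand-rolled GP posterior mean, and the thread pool (standing in for HPC task scheduling across cores) are all illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel between row vectors of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def fit_predict_gp(args):
    # One independent task: fit a GP to one grid cell's ensemble
    # output and predict at a large sample of parameter settings.
    X, y, Xs = args
    K = rbf(X, X) + 1e-4 * np.eye(len(X))   # jitter/noise term
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha               # posterior mean at Xs

# Synthetic stand-in for a PPE: 8 grid cells, 40 members, 3 parameters
rng = np.random.default_rng(0)
n_cells, n_members, n_params = 8, 40, 3
X = rng.uniform(size=(n_members, n_params))
Y = np.stack([np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=n_members)
              for _ in range(n_cells)])
Xs = rng.uniform(size=(100, n_params))      # emulator sample points

# Each grid cell is one task; no communication between tasks.
with ThreadPoolExecutor(max_workers=4) as ex:
    preds = np.stack(list(ex.map(fit_predict_gp,
                                 [(X, Y[i], Xs) for i in range(n_cells)])))
```

On an HPC system the same pattern maps each grid cell (millions of them, per the abstract) to an independent task, which is why no between-node communication is needed.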
The resulting parallel speedups seem reasonable and are significant in magnitude. Parallel efficiency might be easier to see if plotted differently – e.g. on a log-log plot, with a theoretical maximum speedup curve (equal to the number of tasks) superimposed, or with speedup expressed as a fraction of the theoretical maximum. I am not sure I would say that optimized walltime is nearly constant up to three orders of magnitude in task count: we would expect it to be constant while the task count is below the concurrency cap of 200 cores, but it should (and does) increase beyond that. The speedup graphs are less informative than they could be because the baseline was never run (and the speedup calculated) for task counts exceeding the number of cores, where different speedup behavior is expected – understandably, given the increasing expense of running the baseline. Perhaps at least one long baseline analysis at (say) 400 cores could be run?
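To illustrate the expected saturation, here is a hypothetical walltime model (assumed numbers, not the manuscript's measurements) for n equal-cost tasks under a 200-core concurrency cap:

```python
import math

cap = 200    # hypothetical concurrency cap (cores)
cost = 1.0   # per-task cost, arbitrary units
tasks = [10, 50, 100, 200, 400, 800, 1600]

serial   = [n * cost for n in tasks]                    # baseline walltime
parallel = [math.ceil(n / cap) * cost for n in tasks]   # capped parallel walltime
speedup  = [s / p for s, p in zip(serial, parallel)]
ideal    = [min(n, cap) for n in tasks]                 # theoretical max speedup
efficiency = [s / i for s, i in zip(speedup, ideal)]    # fraction of theoretical
```

In this idealized model, parallel walltime is flat only below the cap and then grows in steps, and speedup saturates at 200. Real runs add scheduling overhead and load imbalance, pushing efficiency below 1 – which is exactly what a speedup-as-fraction-of-theoretical plot would surface.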
Other than that, a few other points to consider:
Could the GAM have been fit directly to the data instead of to GP emulator samples? The GAM is essentially a simple emulator itself (neglecting parameter interactions).
It may be worth pointing out that the GAM sensitivity analysis is akin to an (unnormalized) first-order Sobol' sensitivity analysis under the assumption of no between-parameter interactions.
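For readers unfamiliar with the connection: the unnormalized first-order index for parameter x_i is Var(E[y | x_i]), which is what a one-dimensional smooth of y against x_i estimates. A minimal numpy sketch with toy data, using binning in place of a spline smooth:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 3
X = rng.uniform(size=(n, p))
# Toy response with additive main effects only (no interactions);
# x2 has no effect at all.
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1]

def main_effect_variance(x, y, bins=50):
    """Estimate Var_x( E[y | x] ): an unnormalized first-order Sobol' index."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    means = np.array([y[idx == b].mean() for b in range(bins)])
    weights = np.bincount(idx, minlength=bins) / len(x)
    grand = (weights * means).sum()
    return ((means - grand) ** 2 * weights).sum()

V = [main_effect_variance(X[:, j], y) for j in range(p)]
# Analytically: Var(sin(2*pi*x0)) = 0.5, Var(0.5*x1) = 1/48, and 0 for x2.
```

Dividing each index by Var(y) would give the normalized first-order Sobol' indices; with no interactions present, these sum to 1.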
Could this software be extended to analyze time series at a grid cell? This would require a different form of emulator and different distribution of data onto tasks. (Obviously a spatial analysis would be harder since communication across boundaries would become inevitable, although spatially-partitioned Gaussian processes can be applied.)