Actionable reporting of CPU-GPU performance comparisons: Insights from a CLUBB case study
Abstract. Graphics Processing Units (GPUs) are becoming increasingly central to high-performance computing (HPC), but fair comparison with central processing units (CPUs) remains challenging, particularly for applications that can be subdivided into smaller workloads. Traditional metrics such as speedup ratios can overstate GPU advantages and obscure the conditions under which CPUs are competitive, as they depend strongly on workload choice. We introduce two peak-based performance metrics, the Peak Ratio Crossover (PRC) and the Peak-to-Peak Ratio (PPR), which provide clearer comparisons by accounting for the best achievable performance of each device. Using a case study into the performance of the Cloud Layers Unified by Binormals (CLUBB) standalone model, we demonstrate these metrics in practice, show how they can guide execution strategy, and examine how they shift under factors that affect workload. We further analyze how implementation choices and code structure influence these metrics, showing how they enable performance comparisons to be expressed in a concise and actionable way, while also helping identify which optimization efforts should be prioritized to meet different performance goals.
The paper advances the science by introducing two new metrics, Peak Ratio Crossover (PRC) and Peak-to-Peak Ratio (PPR), which allow for a better comparison between CPU and GPU performance of a given application. The authors demonstrate the benefits of using these metrics on a single-column parametrisation of turbulence and clouds, Cloud Layers Unified by Binormals (CLUBB), which exposes several levels of parallelism that can be exploited differently by heterogeneous architectures. Several use cases show the impact of batch size, precision, asynchronous execution, device type and coding optimisations on the application's throughput, expressed in columns per second. The paper is well written, and the claims are well supported by experiments and profiling data, which naturally drive the conclusions. I would recommend the publication of the manuscript.
A few comments, listed below, could be addressed by the authors in a minor revision: