Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
Abstract. The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. At the same time, calculating the UTCI value from the relevant environmental parameters is nominally not straightforward, which is why using a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. At the same time, although it is computationally efficient, the error of this polynomial approximation can be substantial. The goal of this study was to develop an improved version of the polynomial approximation – one that retains comparable computational efficiency but is more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was successfully achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models over only 20 % of the data, with the testing performed over the remaining 80 %, highlights successful generalization, with the results also being robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense.