Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
Abstract. The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. However, calculating the UTCI value from the relevant environmental parameters is not straightforward, which is why a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although this polynomial approximation is computationally efficient, its error can be substantial. The goal of this study was to develop an improved polynomial approximation, one that retains comparable computational efficiency but is more numerically stable and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models on only 20 % of the data, with testing performed on the remaining 80 %, demonstrates successful generalization, and the results are also robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense.
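The abstract does not state which sparse solver is used. As a minimal sketch of the general idea only, assuming a total-degree tensor-product Legendre basis and sequentially thresholded least squares (STLSQ) as a stand-in sparse solver, sparse orthogonal regression with a 20 %/80 % train/test split might look as follows. The input box, polynomial degree, threshold, and synthetic target are all hypothetical; this is not the authors' implementation and uses no UTCI data.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def scale_to_unit(x, lo, hi):
    """Map each column of x from [lo, hi] onto [-1, 1], where Legendre polynomials are orthogonal."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def legendre_design(z, degree):
    """Tensor-product features P_e1(z1)*...*P_ed(zd) with total degree e1+...+ed <= degree."""
    d = z.shape[1]
    # vander[k][:, j] holds P_j evaluated on column k of z, for j = 0..degree
    vander = [legendre.legvander(z[:, k], degree) for k in range(d)]
    exps = [e for e in itertools.product(range(degree + 1), repeat=d) if sum(e) <= degree]
    cols = [np.prod([vander[k][:, e[k]] for k in range(d)], axis=0) for e in exps]
    return np.column_stack(cols), exps


def stlsq(A, y, threshold, n_iter=10):
    """Sequentially thresholded least squares: a simple stand-in sparse solver (assumption)."""
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0          # prune negligible basis terms
        big = ~small
        if not big.any():
            break
        # refit only the surviving terms
        coef[big] = np.linalg.lstsq(A[:, big], y, rcond=None)[0]
    return coef


rng = np.random.default_rng(0)
lo = np.array([-50.0, -30.0, 0.5, 0.0])   # hypothetical input box, 4 predictors
hi = np.array([50.0, 70.0, 30.0, 50.0])
x = rng.uniform(lo, hi, size=(5000, 4))
A, exps = legendre_design(scale_to_unit(x, lo, hi), degree=4)

# Synthetic target that is exactly sparse in the Legendre basis, plus small noise
true = {(0, 0, 0, 0): 1.0, (1, 0, 0, 0): 2.0, (0, 2, 1, 0): -0.7, (0, 0, 0, 3): 0.3}
beta = np.array([true.get(e, 0.0) for e in exps])
y = A @ beta + rng.normal(scale=1e-3, size=len(A))

n_train = len(A) // 5                      # 20 % for training, 80 % held out
coef = stlsq(A[:n_train], y[:n_train], threshold=0.05)
rmse = np.sqrt(np.mean((A[n_train:] @ coef - y[n_train:]) ** 2))
support = {exps[i] for i in np.flatnonzero(coef)}
print(sorted(support), rmse)
```

Because the basis is close to orthogonal under uniform sampling, the retained coefficients change little when terms are pruned, which is one way to read the "stable, hierarchical coefficient structures" mentioned above.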
Summary:
The Universal Thermal Climate Index is a measure of thermal comfort or discomfort perceived by humans, and it is estimated by an environmental model from measured values of air temperature, radiation, humidity, etc. The model is complex to run, so polynomial approximations have been developed for quick, albeit not fully accurate, estimation. The standard polynomial approximation incurs errors that are deemed too large. The present study proposes another approximation method, based on sparse regression with orthogonal polynomials, which appears to provide more accurate results.
Recommendation: The manuscript is well written and the study does not appear to have technical flaws. From that standpoint I have very few comments. However, I do have a more general question on the motivation of the study, which I think the authors should address or justify more thoroughly.
Main point
1) The manuscript mentions an alternative method, namely interpolation from an available look-up table that contains about 100 thousand values. This is also the approach recommended by Bröde (2021a). The manuscript argues that storing 100 thousand values makes the calculation cumbersome, but I disagree. This storage would amount to roughly 1 MB of data (about 0.8 MB at 8 bytes per value), which is very little. Intuitively, I would expect that interpolating that table with a simple spline or linear scheme can produce very accurate values. The question therefore arises as to what the advantages of the algorithm presented in this manuscript would be relative to look-up table interpolation.
I am not arguing that the study is not valuable, as it presents a possible way of producing more accurate estimates of the index, but readers may ask themselves whether it is really worth the effort.
Bröde (2021a) argues that "This chapter provides hints and guidelines on how to handle these issues, and especially encourages the application of the hardly used look-up table approach, which will help avoiding many, if not all concerns related to UTCI calculation via the regression polynomial".
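To make the storage and interpolation argument concrete, the following minimal sketch uses SciPy's `RegularGridInterpolator` for multilinear interpolation on a regular 4-D grid. The grid sizes (21 x 21 x 15 x 15 = 99,225 nodes) and the smooth stand-in function `toy_index` are hypothetical; they are not the layout or values of the published UTCI look-up table, so the error printed here says nothing about UTCI accuracy by itself.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical 4-D grid with 21*21*15*15 = 99,225 nodes (about 100 thousand values).
# This is NOT the layout of the published UTCI look-up table.
axes = [np.linspace(0.0, 1.0, n) for n in (21, 21, 15, 15)]


def toy_index(*coords):
    """Smooth stand-in for the tabulated index: sum of squared coordinates."""
    return sum(c ** 2 for c in coords)


grid = np.meshgrid(*axes, indexing="ij")
table = toy_index(*grid)                 # float64 array, shape (21, 21, 15, 15)
print(table.nbytes)                      # 99,225 values * 8 bytes = 793,800 bytes (about 0.8 MB)

# Multilinear interpolation between the tabulated nodes
interp = RegularGridInterpolator(axes, table, method="linear")

rng = np.random.default_rng(1)
queries = rng.uniform(0.0, 1.0, size=(10_000, 4))
err = np.max(np.abs(interp(queries) - toy_index(*queries.T)))
print(err)
```

For this particular separable toy function, the multilinear interpolation error is bounded by the sum of h_k^2/4 over the four axes (about 0.0038 with grid spacings 1/20 and 1/14); for the actual table, the interpolation error would have to be measured against the full UTCI model.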
Minor points
2) The labels in Figure 1 are too small. This is also the case, to a lesser degree, in other figures. Figure 3 is fine, so I would recommend homogenizing the font size across all figures.
3) In Table 2, the reader has to infer which value is the train loss and which is the test loss. It seems that the train loss is the upper number, but this could be indicated more explicitly. It also seems that the train loss values should be enclosed in square brackets ([ ]).