Approximating the universal thermal climate index using sparse regression with orthogonal polynomials

Roman, Sabin; Skok, Gregor; Todorovski, Ljupčo; Džeroski, Sašo

doi:10.48550/arXiv.2508.11307

Preprints

https://doi.org/10.48550/arXiv.2508.11307

06 Jan 2026

Abstract. The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Owing to its robustness and versatility as a bioclimatic indicator, it has been employed extensively in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. Calculating the UTCI value from the relevant environmental parameters is not straightforward, however, which is why a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although this polynomial approximation is computationally efficient, its error can be substantial. The goal of this study was to develop an improved version of the polynomial approximation: one that retains comparable computational efficiency but is more numerically stable and substantially more accurate, particularly in reducing the frequency of large errors. This goal was achieved using sparse orthogonal regression, that is, sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models on only 20 % of the data and testing them on the remaining 80 % demonstrates successful generalization, and the results are also robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L₂ (least squares) sense.
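
To make the idea of sparse regression in an orthogonal basis concrete, the following is a minimal sketch and not the authors' code. It assumes a noise-free, two-variable synthetic target standing in for the UTCI, inputs already rescaled to [-1, 1], a tensor-product Legendre basis of total degree at most 4, and sequentially thresholded least squares as the sparsifier. The abstract does not say which sparse solver was used, so that choice and all names (`legendre_design`, `stlsq`, `threshold`) are illustrative assumptions. The 20 %/80 % train/test split mirrors the one described in the abstract.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_design(X, max_degree):
    """Tensor-product Legendre features with total degree <= max_degree.

    X has shape (n_samples, 2) with entries in [-1, 1].
    Returns the design matrix and the list of (i, j) degree pairs.
    """
    # legvander(x, d)[:, k] holds P_k(x) for k = 0..d
    P0 = legendre.legvander(X[:, 0], max_degree)
    P1 = legendre.legvander(X[:, 1], max_degree)
    terms = [(i, j) for i in range(max_degree + 1)
             for j in range(max_degree + 1 - i)]
    A = np.column_stack([P0[:, i] * P1[:, j] for i, j in terms])
    return A, terms


def stlsq(A, y, threshold, n_iter=10):
    """Sequentially thresholded least squares (an assumed sparsifier)."""
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        active = np.abs(coef) >= threshold   # keep only large coefficients
        coef = np.zeros_like(coef)
        if not active.any():
            break
        # Refit using the surviving basis functions only
        coef[active] = np.linalg.lstsq(A[:, active], y, rcond=None)[0]
    return coef


# Hypothetical target: 1.5*P1(x0) - 0.8*P2(x0)*P1(x1) + 0.3*P3(x1)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
x0, x1 = X[:, 0], X[:, 1]
y = (1.5 * x0
     - 0.8 * 0.5 * (3 * x0**2 - 1) * x1
     + 0.3 * 0.5 * (5 * x1**3 - 3 * x1))

n_train = len(X) // 5                        # train on 20 %, test on 80 %
A_train, terms = legendre_design(X[:n_train], max_degree=4)
A_test, _ = legendre_design(X[n_train:], max_degree=4)

coef = stlsq(A_train, y[:n_train], threshold=0.05)
rmse_test = float(np.sqrt(np.mean((A_test @ coef - y[n_train:]) ** 2)))
selected = {terms[k]: round(float(c), 6)
            for k, c in enumerate(coef) if c != 0.0}
print(selected, rmse_test)
```

In this toy setting the candidate basis has 15 terms but only three are retained, which illustrates how a sparse fit can trade model complexity against accuracy. With real, non-polynomial data the threshold would need tuning, and each threshold value would yield one point on an accuracy-versus-complexity curve.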

Received: 12 Nov 2025 – Discussion started: 06 Jan 2026

Status: closed

RC1:
'Comment on egusphere-2025-5461', Anonymous Referee #1, 15 Feb 2026

Summary:
The Universal Thermal Climate Index is a measure of the thermal comfort or discomfort perceived by humans, and is estimated by an environmental model from measured values of air temperature, radiation, humidity, etc. The model is complex to run, and so polynomial approximations have been developed for quick, albeit not totally accurate, estimates. The standard polynomial approximation incurs errors that are deemed too large. The present study presents another approximation method, based on orthogonal polynomial regression, that seems to provide more accurate results.

Recommendation: The manuscript is well written and the study seems to have no technical flaws. From that standpoint I have very few comments. However, I do have a more general question on the motivation of the study, which I think the authors should address or justify more thoroughly.

Main point

1) The manuscript mentions an alternative method, namely interpolation from an available look-up table that contains about 100 thousand values. This is also the approach recommended by Bröde (2021a). The manuscript argues that storing 100 thousand values makes the calculation cumbersome, but I clearly disagree. This storage would amount to roughly 1 MB of data, which is very little space. Intuitively, I would argue that interpolating that table with a simple spline or linear algorithm can produce very accurate values. So the question arises as to what advantages the algorithm presented in this manuscript has over look-up table interpolation.
I am not arguing that the study is not valuable, as it presents a possible way of producing more accurate estimates of the index, but readers may ask themselves whether it is really worth the effort.
Bröde (2021a) argues that "This chapter provides hints and guidelines on how to handle these issues, and especially encourages the application of the hardly used look-up table approach, which will help avoiding many, if not all concerns related to UTCI calculation via the regression polynomial".
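
The storage estimate and the interpolation idea in this comment can be illustrated with a short sketch. It assumes double-precision storage and uses a hypothetical one-dimensional table of a smooth stand-in function on a unit-spaced grid; the actual UTCI look-up table is multi-dimensional and is not reproduced here. `np.interp` performs piecewise-linear interpolation.

```python
import numpy as np

# Storage for about 100 thousand tabulated values in double precision
n_values = 100_000
table_bytes = n_values * np.dtype(np.float64).itemsize  # 8 bytes per value
print(f"{table_bytes / 1e6:.1f} MB")

# Hypothetical 1-D look-up table: a smooth stand-in function on a unit grid
grid = np.linspace(-50.0, 50.0, 101)
table = np.sin(grid / 10.0)

# Piecewise-linear interpolation at off-grid query points
query = np.array([-12.3, 0.25, 37.8])
approx = np.interp(query, grid, table)
max_err = float(np.max(np.abs(approx - np.sin(query / 10.0))))
print(max_err)
```

For linear interpolation the error is bounded by h²/8 times the largest second derivative, so the accuracy of a table-based approach depends on the grid spacing h and on the smoothness of the tabulated function along each axis.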

Minor points
2) The labels in Figure 1 are too small. This is also the case, to a lesser degree, in other figures. Figure 3 is fine, so I would recommend homogenizing the font size across all figures.
3) In Table 2, the reader has to infer which is the train loss and which is the test loss. It seems that the train loss is the upper number, but this could be indicated more explicitly. It seems that the train loss numbers need to be wrapped in [].

Citation: https://doi.org/10.5194/egusphere-2025-5461-RC1
- AC1: 'Reply on RC1', Sabin Roman, 16 Apr 2026
  
  Please find our responses to the comments from the reviewers in the attached supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5461-AC1
RC2:
'Comment on egusphere-2025-5461', Anonymous Referee #2, 26 Mar 2026

The article presents a new approximation for the Universal Thermal Climate Index by applying sparse regression with orthogonal polynomials to the Fiala thermo-physiological model. The proposed approach improves predictive accuracy and numerical stability over the existing standards (particularly in extrapolation), while maintaining comparable computational efficiency.

The manuscript is very well written and well illustrated.
I understand that the techniques (sparse model discovery) used for the regression are standard and that no novelty is claimed on that front. However, I would add at least the kind of equations that these methods are aiming at. This will help in interpreting the results (e.g. Table 1 or Figure 3: why does the number of parameters change for a given polynomial degree?).

Is the proposed function a linear combination of Legendre polynomials? Are Ta, va, Tr−Ta and rH its input variables?

Please present the shape of the polynomial basis expansions that you are fitting.
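
For orientation, one generic form such an expansion could take is written out below. This is an assumption for illustration, not the authors' stated model: it supposes that the four inputs named in the question above (Ta, va, Tr−Ta and rH) are rescaled to variables x_1, ..., x_4 in [-1, 1], that P_n denotes the Legendre polynomial of degree n, that d is the maximum total degree, and that S is the subset of index tuples retained by the sparse regression.

```latex
\widehat{\mathrm{UTCI}}(x_1, x_2, x_3, x_4)
  = \sum_{(i,j,k,l) \in S} c_{ijkl}\, P_i(x_1)\, P_j(x_2)\, P_k(x_3)\, P_l(x_4),
\qquad
S \subseteq \{(i,j,k,l) : i + j + k + l \le d\}
```

Under this assumed form, the number of parameters |S| can differ between models of the same degree d, because the sparsity step keeps only some of the admissible terms.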

Minor comments

- In the introduction, when it is explained that the water vapor is not included, I would add that the relative humidity is included to account for its effect.

- "Training is conducted on only 20% of the available data, while performance is assessed on the remaining 80%"

Are these sets taken randomly?

Is this needed for a better fitting?

Is the number of points a limitation of the regression method?

Citation: https://doi.org/10.5194/egusphere-2025-5461-RC2
- AC2: 'Reply on RC2', Sabin Roman, 16 Apr 2026
  
  Please find our responses to the comments from the reviewers in the attached supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5461-AC2

Data sets

ESM 4 Peter Bröde et al. https://static-content.springer.com/esm/art%3A10.1007%2Fs00484-011-0454-1/MediaObjects/484_2011_454_MOESM2_ESM.zip

Model code and software

Code for Approximating the universal thermal climate index (UTCI) using sparse regression with orthogonal polynomials Sabin Roman https://zenodo.org/records/17465548

Viewed

Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.

Total article views: 1,017 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,002	0	15	1,017	0	0

Views and downloads (calculated since 06 Jan 2026)

Month	HTML	XML	Total
Jan 2026	448	0	448
Feb 2026	232	4	236
Mar 2026	241	6	247
Apr 2026	72	5	77
May 2026	9	0	9

Viewed (geographical distribution)

Total article views: 995 (including HTML, PDF, and XML) Thereof 995 with geography defined and 0 with unknown origin.

Latest update: 13 May 2026

Short summary

This study aimed to improve how the Universal Thermal Climate Index, a key measure of human thermal comfort, is calculated. Existing methods use a simplified polynomial approximation that is straightforward to apply but can introduce errors. We developed a new version using sparse regression with orthogonal polynomials, which keeps computational efficiency while improving accuracy and stability. The results enable more reliable assessments of outdoor thermal comfort and climate analyses.

