Spatial Predictor Selection for Next-Day Minimum Temperature Forecasting: An Automated Machine Learning Framework Applied Across European Climate Regimes
Abstract. Accurate prediction of near-surface air temperature remains a central challenge in geoscientific modeling, particularly when integrating high-dimensional spatial predictors derived from reanalysis datasets. Although Model Output Statistics (MOS) approaches are widely used, the systematic selection of spatially distributed predictors is still an open methodological issue.
This study proposes a genetic algorithm (GA) framework for automated predictor selection in daily minimum temperature forecasting. The method operates on spatially structured inputs derived from ERA5 reanalysis and is evaluated using observed temperature data from multiple European locations. The GA is designed to explore high-dimensional predictor spaces while controlling model complexity and ensuring compatibility with non-linear learning algorithms.
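To make the idea concrete, the following is a minimal sketch of GA-based predictor selection over binary masks. It is not the configuration used in this study: the fitness function (validation MAE plus a size penalty), the truncation selection, uniform crossover, bit-flip mutation, all parameter values, and the synthetic data are illustrative assumptions, and a linear least-squares fit stands in for whatever learning algorithm is actually used.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Validation MAE of a least-squares fit on the selected columns,
    plus a penalty per selected predictor (assumed form of complexity control)."""
    if not mask.any():
        return np.inf  # an empty predictor set is never acceptable
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    mae = np.mean(np.abs(A_va @ coef - y_va))
    return mae + penalty * mask.sum()

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=30, n_gen=40,
              p_mut=0.02, penalty=0.01, seed=0):
    """Return the best boolean predictor mask found by a simple GA."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.1  # sparse random initial masks

    def score(population):
        return np.array([fitness(m, X_tr, y_tr, X_va, y_va, penalty)
                         for m in population])

    for _ in range(n_gen):
        order = np.argsort(score(pop))
        elite = pop[order[: pop_size // 2]]      # truncation selection, elites survive
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cross = rng.random(n_feat) < 0.5     # uniform crossover
            child = np.where(cross, a, b)
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    return pop[np.argmin(score(pop))]

# Synthetic stand-in for gridded predictors: 40 columns, 2 of them informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 40))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.3 * rng.normal(size=400)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

mask = ga_select(X_tr, y_tr, X_va, y_va)
print("selected columns:", np.flatnonzero(mask))
```

The size penalty is one simple way to trade accuracy against subset size; swapping the least-squares fit for a non-linear regressor only requires changing `fitness`.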
The approach is assessed using a one-day-ahead forecasting setup and compared against a LASSO-based baseline. Results show that the GA identifies compact predictor subsets whose predictive performance is comparable to, or slightly better than, that of the baseline. Mean absolute error varies little across test locations, which indicates robust generalization.
Analysis of selected predictors highlights the existence of stable variable categories, although individual spatial selections exhibit variability across runs, reflecting the stochastic nature of the optimization process. These results suggest that predictor relevance should be interpreted in terms of distributions rather than fixed sets.
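One way to read selections as distributions is to count how often each predictor appears across independent GA runs, both at the level of individual grid points and at the level of variables. The sketch below assumes each run returns a set of (variable, latitude, longitude) tuples; the example runs, coordinates, and the function name `selection_frequencies` are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter

def selection_frequencies(runs):
    """Fraction of runs in which each grid-point predictor and each variable appears.

    runs: list of sets of (variable, lat, lon) tuples, one set per GA run.
    """
    n = len(runs)
    by_point = Counter(p for run in runs for p in set(run))
    by_var = Counter(v for run in runs for v in {p[0] for p in run})
    point_freq = {k: c / n for k, c in by_point.items()}
    var_freq = {k: c / n for k, c in by_var.items()}
    return point_freq, var_freq

# Hypothetical output of three GA runs (illustrative values only).
runs = [
    {("t2m", 48.0, 2.0), ("msl", 50.0, 0.0)},
    {("t2m", 48.25, 2.25), ("msl", 50.0, 0.0)},
    {("t2m", 47.75, 2.0), ("tcc", 48.0, 2.0)},
]

point_freq, var_freq = selection_frequencies(runs)
print(var_freq)  # "t2m" appears in every run, although at a different grid point each time
```

In this toy case the variable-level frequency of `t2m` is 1.0 while no single `t2m` grid point is chosen in more than one run, which mirrors the distinction between stable variable categories and variable spatial selections.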
The proposed framework provides a flexible and reproducible approach to spatial feature selection in geoscientific applications. Its compatibility with complex models and high-dimensional inputs makes it a promising tool for improving forecasting systems based on reanalysis data. A key finding of this study is that spatial predictor selection is inherently non-unique, yet exhibits stable statistical structure at the variable level.