High-latitude auroral and cloudiness occurrence from automatic image classification
Abstract. We have investigated auroral and cloudiness occurrence over Kjell Henriksen Observatory (KHO) in Svalbard using full-colour all-sky images from 2016-2025. Our approach focused on constructing a high-quality, manually labelled training set. Images were classified as ClearAurora, ClearNoAurora, CloudyAurora, or CloudyNoAurora based on their content. As there is natural overlap between these classes, we carried out several iterative validation rounds to increase the number of high-quality sample images while removing images with unclear contents. We then evaluated different convolutional neural network topologies and selected the best-performing network to classify all images between January 2016 and December 2025 (over 8 million images in total). In addition to the validation accuracy against the ground truth, we also estimated the classification accuracy from a random selection of classified images. Our final classifier, called KHOnet2026, achieves accuracies from 94% to 98%, depending on the score and image class.
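The accuracy estimate from a random selection of classified images can be illustrated with a minimal sketch. It assumes each spot-checked image is stored as a (predicted label, manual label) pair and reports, per predicted class, the fraction of images on which the reviewer agrees with the classifier. The function name `per_class_accuracy` and the sample data are hypothetical and are not taken from the KHOnet2026 code.

```python
from collections import Counter

# The four image classes named in the abstract.
CLASSES = ("ClearAurora", "ClearNoAurora", "CloudyAurora", "CloudyNoAurora")


def per_class_accuracy(pairs):
    """Return {class: agreement fraction} for spot-checked images.

    pairs: iterable of (predicted_label, manual_label) tuples.
    Classes with no spot-checked images are omitted from the result.
    """
    total = Counter()
    correct = Counter()
    for predicted, manual in pairs:
        total[predicted] += 1
        if predicted == manual:
            correct[predicted] += 1
    return {c: correct[c] / total[c] for c in CLASSES if total[c] > 0}


# Hypothetical spot check of four classified images (illustrative only).
sample = [
    ("ClearAurora", "ClearAurora"),
    ("ClearAurora", "CloudyAurora"),
    ("CloudyNoAurora", "CloudyNoAurora"),
    ("CloudyNoAurora", "CloudyNoAurora"),
]
accuracy = per_class_accuracy(sample)
```

Grouping by the predicted class answers the practical question of how far a label assigned by the classifier can be trusted. Grouping by the manual label instead would measure how many images of a given true class are recovered.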
We investigated auroral occurrence over Kjell Henriksen Observatory in Svalbard in 2016-2025 using automatic classification of full-colour all-sky images (8.2 million images in total). We used a simple-to-use classification algorithm with several rounds of manual labelling of randomly selected individual images into four classes: ClearAurora, ClearNoAurora, CloudyAurora and CloudyNoAurora. In each iteration, images that did not clearly belong to any of the four classes were removed to minimise confusion. We acknowledge that our classes naturally overlap, and that this overlap determines the highest achievable accuracy of our method, in this case 96%.
We found that most of our image data are cloudy (60-70%). We validated the cloud occurrence results against an independent dataset from a co-located cloud sensor and found good agreement between the two datasets at the monthly average level, with a correlation coefficient of 0.86. Auroral occurrence over Svalbard is of the order of 25% of the imaging time; it shows no solar cycle correlation but is instead modulated by cloudiness. The portion of clear skies without aurora is only about 10%. Statistically, the clearest month at KHO is January and the cloudiest is November. The automatic classification routine is set to run in real time and to further expand the database of classified images, helping researchers find images with aurora. Because cloudy data can be excluded, this knowledge allows far more efficient use of computer time in analyses of the structural evolution of the aurora. Furthermore, the automatically classified images provide a very useful proxy for all other optical instruments hosted by KHO.
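The monthly comparison with the cloud sensor can be sketched as follows. The sketch assumes that the cloudy fraction of a month is the share of its images labelled CloudyAurora or CloudyNoAurora, and that the agreement is measured with a Pearson correlation coefficient; the summary does not state which coefficient was used. All names and numbers below are hypothetical and for illustration only.

```python
import math
from collections import defaultdict


def monthly_cloud_fraction(records):
    """Return {month: cloudy fraction} from (month, class_label) pairs."""
    counts = defaultdict(lambda: [0, 0])  # month -> [cloudy images, all images]
    for month, label in records:
        counts[month][1] += 1
        if label.startswith("Cloudy"):
            counts[month][0] += 1
    return {m: cloudy / total for m, (cloudy, total) in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical classified images, four per month (illustrative only).
records = [
    ("2020-01", "ClearAurora"), ("2020-01", "ClearNoAurora"),
    ("2020-01", "CloudyAurora"), ("2020-01", "CloudyNoAurora"),
    ("2020-02", "CloudyNoAurora"), ("2020-02", "CloudyAurora"),
    ("2020-02", "CloudyNoAurora"), ("2020-02", "ClearAurora"),
    ("2020-11", "CloudyNoAurora"), ("2020-11", "CloudyAurora"),
    ("2020-11", "CloudyNoAurora"), ("2020-11", "CloudyNoAurora"),
]
# Hypothetical monthly cloud fractions from a co-located sensor.
sensor = {"2020-01": 0.55, "2020-02": 0.70, "2020-11": 0.95}

fractions = monthly_cloud_fraction(records)
months = sorted(fractions)
r = pearson([fractions[m] for m in months], [sensor[m] for m in months])
```

Averaging to monthly values before correlating smooths out differences in the field of view and sampling cadence of the two instruments, which is why a comparison at this level is less sensitive to individual misclassified images.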