High-latitude auroral and cloudiness occurrence from automatic image classification
Abstract. We have investigated auroral and cloudiness occurrence over Kjell Henriksen Observatory (KHO) in Svalbard using full-colour all-sky images from 2016-2025. Our approach focused on constructing a high-quality manually labelled training set. Images were classified as ClearAurora, ClearNoAurora, CloudyAurora, or CloudyNoAurora based on their content. As there is natural overlap between these classes, we carried out several iterative validation rounds to increase the number of high-quality sample images while removing images with unclear contents. We then evaluated different Convolutional Neural Network topologies and selected the best performing network to classify all images between January 2016 and December 2025 (over 8 million images in total). In addition to the validation accuracy with the ground truth, we also estimated the classification accuracy based on a random selection of classified images. Our final classifier, called KHOnet2026, results in accuracies from 94% to 98% depending on the score and image class.
We investigated auroral occurrence over Kjell Henriksen Observatory in Svalbard in 2016-2025 with data based on automatic classification of full-colour all-sky images (8.2 million images in total). We used a simple-to-use classification algorithm with several rounds of manual labelling of randomly selected individual images in 4 classes: ClearAurora, ClearNoAurora, CloudyAurora, CloudyNoAurora. In each iteration, images which were not obviously belonging to any of the four classes were removed to minimise the confusion. We therefore acknowledge that our classes naturally overlap, and that the overlap determines the highest achievable accuracy of our method, in this case 96%.
We found that most of our image data is cloudy (60-70%). A validation of the cloud occurrence results was performed with an independent dataset from a co-located cloud sensor. We found a good agreement between the two datasets at a monthly average level with a correlation coefficient of 0.86. Auroral occurrence over Svalbard is of the order of 25% of the imaging time, and it shows no solar cycle correlation but is rather modulated by the cloudiness. The portion of clear skies without aurora is only about 10%. The statistically clearest month at KHO is January, and the cloudiest is November. This automatic classification routine is set to run in real-time and further expand the database of classified images to aid researchers in finding images with aurora. This knowledge allows for a far more efficient use of computer time in analysis of the structural evolution of the aurora, when cloudy data can be excluded. Furthermore, the automatically classified images provide a very useful proxy for all other optical instruments hosted by KHO.
Review of "High-latitude auroral and cloudiness occurrence from automatic image classification" (egusphere-2026-2388) by Partamies and Syrjäsuo
The paper reports the construction of a dataset of sky conditions at KHO in Svalbard. Although the method itself is not new, the content is important and should be published in a scientific journal like AnnGeo, because (1) the paper correctly points out that the key to supervised machine learning (ML) is how to select correct training images of highly variable aurora/cloud activity, and shows how to do it (the explanation of Fig 3 is excellent), and (2) Svalbard is a special location for auroral observation (it differs from other locations) and the resulting database will be useful for all related studies of aurora, such as solar cycle dependence. To achieve (2), the authors note the importance of constructing a "ground truth" dataset (by "interactive manual labelling"), which is needed now that many supervised ML methods have been presented.
I have several points (major and minor) to be addressed before this paper is published. Currently the readability, particularly of Section 3, is not sufficient.
-----
Location of KHO: please add the latitude (and longitude).
Wording: "carefully construct" => Although the author, who has long worked on aurora images, is one of the most reliable scientists (in terms of carefulness) to construct the "ground truth" dataset, the word "careful" is unfortunately subjective. So, it is better to use "objective" numbers, such as the number of iterations and the number of years the author has spent classifying aurora. Similarly, best => best possible
Section 1
Solar cycle dependence: At Svalbard, which is located far north of the nightside aurora, the absence of an SSN dependence is reasonable. For nightside aurora, the average size of the auroral oval (which depends on the solar cycle) changes the probability of diffuse/discrete aurora at one fixed station. This point should already be mentioned in the introduction and discussion.
Section 3.2
"aurora expert": Did only Partamies check the images manually, or did anyone else as well? Since Partamies is known as an expert (readers can guess this from the reference list), I recommend writing explicitly "The author, as the aurora expert".
In the example in Fig 2, only the last column shows the moon case, and in Fig A1-A4, only one or two moon cases are found. When I looked at many aurora images for judging, the moon appeared as often as cloud because the moon at its second and third quarters is high during winter (dark months) at high latitude. How often is the moon included in the training set and in the ML-labelled result? I do not require an exact number (I am not asking the authors to classify all samples in Table 2), just a rough estimate.
Line 151: "chose random images": from which years? Or are they the unclear cases from 2019-2020 that were classified in the first manual labelling? Since this is not clear, I could not understand how to read Table 2. Is the explanation in Section 4.1 (lines 224-234) meant for Table 2?
Table 2 and 3 "title" (technical issue): Please start a new paragraph after the title (unlike figures, tables do not have caption sentences). If you need a caption, it should be moved to a footnote of the table (with *1, *2, *3, ... markings).
Table 2
Also, where are the "parentheses"? Are the images in the shaded part all originally from the 5000 images in the third row? How did you define the ground truth? (The classifier's result for which samples, manually checked one last time?)
Line 179 and Table 3: "2000 random images per class": If each category has 2000 samples, why do the sums in Table 3 not equal 2000? Could you explain further?
Line 183-187: Unclear and incomplete sentences. Please fix. Errors are also found in Section 4.
It would also be better to switch the order of explanation (cloudy<->clear confusion is not surprising and better to explain first).
Figure 4: Although this is a good example of "reason for wrong classification", I was a bit surprised that you ran your classification even for the twilight (noon) case in the figure. The classification scheme improves drastically if you remove the twilight cases (SZA = 90-102 degrees or 90-96 degrees). Could you explain why you include twilight cases?
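To make the suggested twilight exclusion concrete, here is a minimal sketch in Python of the kind of filter I have in mind. It assumes that a solar zenith angle (SZA) in degrees is already available for each image. The function names, the record layout and the sample values are hypothetical and are not taken from the manuscript.

```python
# Minimal sketch of a twilight filter based on solar zenith angle (SZA).
# Assumption: each record is a (label, sza_deg) pair; names are hypothetical.

def is_twilight(sza_deg, upper_deg=102.0):
    """Return True when the SZA lies in the twilight band [90, upper_deg]."""
    return 90.0 <= sza_deg <= upper_deg


def drop_twilight(records, upper_deg=102.0):
    """Keep only the records whose SZA is outside the twilight band."""
    return [rec for rec in records if not is_twilight(rec[1], upper_deg)]


# Made-up sample records, for illustration only.
records = [
    ("ClearAurora", 110.0),
    ("CloudyNoAurora", 95.0),
    ("ClearNoAurora", 100.0),
    ("CloudyAurora", 104.0),
]

dark_only_102 = drop_twilight(records, upper_deg=102.0)  # stricter limit
dark_only_96 = drop_twilight(records, upper_deg=96.0)    # looser limit
```

Running the statistics once with each limit would show how sensitive the occurrence rates are to the twilight data.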
Line 194 "exposure time": The aurora judging scheme depends on exposure time. Isn't the exposure time just before the abrupt appearance of aurora a good value to stop operation?
In the aurora quantification method (more explanation below), exposure time is used as one of the parameters to judge twilight time and the moon (to adjust the detected intensity). This case clearly shows that "aurora could have existed but was not detectable". From the user's viewpoint, such "twilight" should not be marked as "no aurora".
Figure 4: The image is strongly affected by the moon after around 20 UT, and I suspect that some percentage of "ClearAurora" (particularly after 2330) could be no aurora. Could you show the original images from around 2240 UT and 2340 UT?
It would be useful to mention the other types of automated aurora classification in addition to the machine learning type. The oldest example is the photometer count (filtered at 5577 Å) used to classify intensity, and recently, pixel-level classification to quantify the aurora activity was introduced (Yamauchi+ https://doi.org/10.5194/gi-12-71-2023). While the ML type provides good classification of morphology, the manual type provides quantified values, and combining both methods will improve the evaluation of aurora activity in the future. This work is a good reference for such work.
Line 210: 0.1 second per image is a good performance, well suited to real-time operation of burst-mode (high time resolution) observation. Please mention this advantage.
Section 4.1
Line 235-247 "Sony and Nikon": The RGB sensitivity and its balance are quite different between Sony and Nikon, which is also a big problem for the automated aurora quantification method (above reference). Could you make a table and figure similar to Table 2 and Figure 3 (an appendix is ok) just for this purpose (1000 samples)? After reading your result, I am quite confident that statistics (like Figure 6) should be obtained separately for the Nikon camera and the Sony camera. In this respect, I could not catch what the authors want to say in this paragraph: is it a justification for why the statistics are limited to 2016-2025 (I agree), or a claim that it is feasible to correct the Nikon camera results in terms of the Sony camera (I personally do not agree)?
Section 4.2
In the statistics (Fig 6), were twilight cases (cf. the centre part of Fig 4) included or excluded?
Figure 6: The "low" probability of AO at 12 MLT (which is close to local noon) and remaining double peak might be due to twilight that prevents recognising the aurora.
Figure 6: It is a useful figure, but please add a comment that this is not a "solid AO probability", because part of the cloudy cases (thick cloud through which the aurora cannot be seen) should be removed from the total number for such a probability. Even ClearAurora/(ClearAurora+ClearNoAurora) will not give the probability, because ClearAurora allows 3/8 of the sky to be covered by cloud.
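The difference between the two ratios discussed in this comment can be illustrated with a short Python sketch. The class counts below are made up for illustration and the function name is hypothetical. Neither ratio is a true occurrence probability, for the reasons given above.

```python
# Sketch: two different "auroral occurrence" ratios from the four class counts.
# Assumption: counts are hypothetical, not values from the manuscript.

def occurrence_ratios(counts):
    """Return aurora fractions relative to all images and to clear images only."""
    total = sum(counts.values())
    aurora_all = counts["ClearAurora"] + counts["CloudyAurora"]
    clear = counts["ClearAurora"] + counts["ClearNoAurora"]
    return {
        # Fraction of all imaging time that shows aurora (cloudy cases included)
        "aurora_of_all_images": aurora_all / total,
        # ClearAurora / (ClearAurora + ClearNoAurora)
        "aurora_of_clear_images": counts["ClearAurora"] / clear,
    }


counts = {
    "ClearAurora": 150,
    "ClearNoAurora": 100,
    "CloudyAurora": 100,
    "CloudyNoAurora": 650,
}

ratios = occurrence_ratios(counts)
```

With these made-up counts the two ratios differ by more than a factor of two, which is why the choice of denominator should be stated explicitly with the figure.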
Figure 7 and 8: It is easier to place 11-12-1-2 from top to bottom.
Figure 8: I worry that the high cloud rate under daylight/twilight means that cloud detection depends on the sky brightness, so you should remove the twilight cases from the ground truth. Could you comment? Please also compare the ClearAurora/ClearNoAurora ratio with CloudyAurora/CloudyNoAurora.
Line 311 "sound agreement": The slope < 1 (the ML method overestimates the cloud) is better stated explicitly rather than just mentioning a "discrepancy".
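As a sketch of what reporting the slope alongside the correlation coefficient could look like, the following Python snippet fits an ordinary least-squares line to monthly cloud fractions. The monthly values are invented, and the choice of the ML result as the independent variable is my assumption, since the axes of the fit are not something I can confirm from the text.

```python
# Sketch: least-squares slope and Pearson correlation for monthly cloud fractions.
# Assumption: data values and axis assignment are hypothetical.
from math import sqrt


def slope_and_r(x, y):
    """Return (slope of y regressed on x, Pearson correlation coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up monthly cloud fractions, for illustration only.
ml_cloud = [0.50, 0.60, 0.70, 0.80]       # from the image classifier
sensor_cloud = [0.45, 0.50, 0.60, 0.65]   # from the co-located cloud sensor

slope, r = slope_and_r(ml_cloud, sensor_cloud)
```

A high correlation coefficient together with a slope clearly below one, as in this invented example, is exactly the situation where quoting only the correlation hides a systematic difference between the two datasets.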
Line 330: For the next step, how about combining intensity of aurora (quantification of aurora using pixel size and integrated intensity of auroral pixels)? A general scheme is already presented in Yamauchi+ 2023.
Line 348 "Computer science point of view": What do you exactly mean? CPU time?
Line 358 "allow the Moon and twilight in": As described above, they cause changes in exposure and drastically decrease the chance of recognising aurora, particularly diffuse aurora. If they are included, they should also be classified as "MoonNoAurora" and "MoonAurora". I am sure that the ratio MoonAurora/MoonNoAurora is smaller than ClearAurora/ClearNoAurora.
Line 375 "This suggest...": Not necessarily true according to Table 2. This illusion comes from the different parent groups, "ClearNoAurora" vs "CloudyNoAurora".
Line 389 "more than 20%": The difference also comes from twilight data (see comment on Figure 6). Also, the purpose of the aurora classification is different from that of Nanjo+ 2022. This paper aims to search for aurora with blind information for the other observation teams, whereas the previous works are rather for aurora scientists (including citizen scientists) to search for more classified aurora.
Line 404 "only one other": Actually, the quantification of aurora activity from ASC (Yamauchi+ 2023) is already in operation, including on the ESA space weather site. Also, Nanjo's TromsoAI includes a "cloudy" probability.