https://doi.org/10.5194/egusphere-2025-1399
04 Apr 2025
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

TOAR-classifier v2: A data-driven classification tool for global air quality stations

Ramiyou Karim Mache, Sabine Schröder, Michael Langguth, Ankit Patnala, and Martin G. Schultz

Abstract. Accurate characterization of station locations is crucial for reliable air quality assessments such as the Tropospheric Ozone Assessment Report (TOAR). This study introduces a machine learning approach to classify 23,974 stations in the unique global TOAR database as urban, suburban, or rural. We test several methods: unsupervised K-means clustering with three clusters, and an ensemble of supervised learning classifiers including random forest, CatBoost, and LightGBM. We further enhance the supervised learning performance by integrating these classifiers into a robust voting model, leveraging their collective predictive power. To address the inherent ambiguity of suburban areas, we implement an adjusted threshold probability technique. Our models, trained on the TOAR station metadata, are evaluated on 1,000 unseen data points. K-means clustering achieves 70.03 % and 71.53 % accuracy for urban and rural areas respectively, but only 26.36 % for suburban zones. Supervised classifiers surpass this performance, reaching over 84 % accuracy for urban and rural categories, and 62–65 % for suburban areas. The adjusted threshold technique significantly enhances overall model accuracy, particularly for suburban classification. The good class separation achieved by our model is confirmed through evaluation with NOx and PM2.5 concentration measurements, which were not included in the training data. Furthermore, manual inspection of 25 individual sites with Google Maps reveals that our method provides a better label for the station type than the labels that were reported by data providers and used in the model evaluation. The objective station classification proposed in this paper therefore provides a robust foundation for type-specific air quality assessments in TOAR and elsewhere.
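The abstract names a voting model and an adjusted threshold probability technique but does not spell out either rule. The following is a minimal sketch of one plausible reading, not the authors' implementation: class probabilities from several classifiers are averaged (soft voting), and a station is labelled suburban whenever its averaged suburban probability reaches a chosen threshold. The function names, the threshold value of 0.35, and the example probabilities are all hypothetical and chosen only for illustration.

```python
CLASSES = ("urban", "suburban", "rural")


def soft_vote(member_probs):
    """Average the class probabilities of all ensemble members.

    member_probs: list of dicts, one per classifier, mapping class -> probability.
    """
    n = len(member_probs)
    return {c: sum(p[c] for p in member_probs) / n for c in CLASSES}


def classify(probs, suburban_threshold=None):
    """Turn averaged probabilities into a station label.

    With suburban_threshold=None this is a plain argmax over all classes.
    Otherwise (an assumed rule, not taken from the paper) the station is
    labelled "suburban" as soon as its suburban probability reaches the
    threshold; if not, the more probable of "urban" and "rural" is used.
    """
    if suburban_threshold is None:
        return max(CLASSES, key=probs.get)
    if probs["suburban"] >= suburban_threshold:
        return "suburban"
    return max(("urban", "rural"), key=probs.get)


# Made-up probabilities for one station from three hypothetical classifiers
# (standing in for random forest, CatBoost, and LightGBM outputs).
members = [
    {"urban": 0.45, "suburban": 0.40, "rural": 0.15},
    {"urban": 0.50, "suburban": 0.35, "rural": 0.15},
    {"urban": 0.40, "suburban": 0.42, "rural": 0.18},
]

avg = soft_vote(members)
plain_label = classify(avg)                               # argmax: "urban"
adjusted_label = classify(avg, suburban_threshold=0.35)   # "suburban"
print(plain_label, adjusted_label)
```

In this toy case the averaged suburban probability (about 0.39) loses to urban (about 0.45) under a plain argmax, yet exceeds the assumed threshold, so the adjusted rule assigns the suburban label. This illustrates how lowering the bar for an ambiguous class can shift borderline stations into it; the actual rule and threshold used by the authors should be taken from the paper and its code repository.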

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 30 May 2025)


Data sets

TOAR-Classifier input and output data. Ramiyou Karim Mache, Sabine Schröder, Michael Langguth, Ankit Patnala, and Martin G. Schultz. https://gitlab.jsc.fz-juelich.de/esde/toar-public/ml_toar_station_classification

Model code and software

TOAR-Classifier model code. Ramiyou Karim Mache, Sabine Schröder, Michael Langguth, Ankit Patnala, and Martin G. Schultz. https://gitlab.jsc.fz-juelich.de/esde/toar-public/ml_toar_station_classification


Viewed

Total article views: 99 (including HTML, PDF, and XML)
  • HTML: 79
  • PDF: 15
  • XML: 5
  • Total: 99
  • BibTeX: 3
  • EndNote: 3
Views and downloads (calculated since 04 Apr 2025)

Viewed (geographical distribution)

Total article views: 101 (including HTML, PDF, and XML). Of these, 101 have a defined geography and 0 have unknown origin.
Latest update: 23 Apr 2025
Short summary
The TOAR-classifier model is a data-driven tool that allows for an objective classification of air quality measuring stations as urban, rural, or suburban. Such classification is important in the analysis of air pollutant trends and regional signatures. The model is employed in the second Tropospheric Ozone Assessment Report but can also be used in other research work.