23 Oct 2023

A close look at using national ground stations for the statistical modeling of NO2

Foeke Boersma and Meng Lu

Abstract. Air pollution causes a multitude of health and societal problems, so it is essential to model and predict air pollution over space. An increasing number of statistical air pollution models have been developed using geospatial variables associated with emission and dispersion processes. However, a growing number of models does not necessarily bring higher prediction accuracy or lower uncertainty. An important aspect that is often disregarded is spatial heterogeneity. In this study, we evaluate and compare various spatial and non-spatial statistical and machine learning methods, paying attention to different spatial groups, which are identified by the predictor variables. We found that prediction accuracy differs substantially between spatial groups. Prediction accuracy is poor in highly populated places close to roads and increases in low population density areas for both local and global models. For global models, prediction accuracy increases further in places that are far from roads. This division into spatial groups also shows that global non-linear methods are capable of higher prediction accuracy than global linear methods. The spatial prediction patterns show that non-linear methods generally predict more smoothly than linear methods. Additionally, clusters of predicted air pollution differ within and between cities. Lastly, applying the same methods to the local dataset yields poor metrics, especially for the non-linear methods.
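As a rough illustration of what evaluating accuracy per spatial group can look like, the sketch below assigns each ground station to a group based on two predictor variables and computes the RMSE within each group. It is a minimal sketch under stated assumptions, not the authors' procedure: the two predictors (distance to road and population density), the thresholds, the group labels, and the station values are all hypothetical and made up for the example.

```python
import math
from collections import defaultdict

# Hypothetical thresholds: the abstract does not state how the groups are cut.
ROAD_DIST_M = 100.0     # stations closer than this count as "near_road"
POP_DENSITY = 1000.0    # stations at or above this count as "high_pop"


def spatial_group(road_dist_m, pop_density):
    """Label a station by two predictor variables (assumed grouping rule)."""
    road = "near_road" if road_dist_m < ROAD_DIST_M else "far_from_road"
    pop = "high_pop" if pop_density >= POP_DENSITY else "low_pop"
    return f"{road}/{pop}"


def rmse_by_group(stations):
    """Return the root-mean-square error of the predictions per spatial group."""
    squared_errors = defaultdict(list)
    for s in stations:
        group = spatial_group(s["road_dist_m"], s["pop_density"])
        squared_errors[group].append((s["observed"] - s["predicted"]) ** 2)
    return {g: math.sqrt(sum(e) / len(e)) for g, e in squared_errors.items()}


# Made-up station records, for illustration only (not results from the study).
stations = [
    {"road_dist_m": 20.0, "pop_density": 5000.0, "observed": 40.0, "predicted": 30.0},
    {"road_dist_m": 50.0, "pop_density": 3000.0, "observed": 35.0, "predicted": 41.0},
    {"road_dist_m": 500.0, "pop_density": 200.0, "observed": 12.0, "predicted": 11.0},
    {"road_dist_m": 800.0, "pop_density": 100.0, "observed": 10.0, "predicted": 13.0},
]

result = rmse_by_group(stations)
for group, rmse in sorted(result.items()):
    print(f"{group}: RMSE = {rmse:.2f}")
```

Reporting one metric per group rather than a single pooled value is what makes differences between, for example, roadside and remote stations visible; the same loop could report other metrics or compare several models.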

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-1260', Anonymous Referee #1, 09 Nov 2023
  • RC2: 'Comment on egusphere-2023-1260', Anonymous Referee #2, 14 Jan 2024
  • AC1: 'Comment on egusphere-2023-1260', Foeke Boersma, 25 Mar 2024


Total article views: 324 (including HTML, PDF, and XML), calculated since 23 Oct 2023
  • HTML: 217
  • PDF: 86
  • XML: 21
  • Total: 324
  • Supplement: 24
  • BibTeX: 13
  • EndNote: 23

Viewed (geographical distribution)

Total article views: 307 (including HTML, PDF, and XML), of which 307 have a defined geography and 0 are of unknown origin.
Latest update: 22 Jun 2024
Short summary
Air pollution harms health and society, so understanding and predicting it is crucial. Various models have been developed for this purpose. However, how consistently a model performs across different areas is commonly neglected. Our study accounts for this and shows lower accuracy near busy roads but higher accuracy in less populated areas. Considering location characteristics in air pollution predictions is important for comparing statistical models and understanding the health-society-space relationship.