the Creative Commons Attribution 4.0 License.
High-resolution air quality maps for Bucharest using Mixed-Effects Modeling Framework
Abstract. Fine-scale mapping of pollutants based on mobile observations facilitates deep understanding of air pollutants distribution within a city and fosters science-based decisions to improve air quality, by adding up to the existing but not optimally distributed permanent monitoring stations. In this study, we developed high-resolution concentration maps of nitrogen dioxide (NO2), particulate matter (PM10) and ultrafine particles (UFP) for Bucharest, Romania, to evaluate the spatial variation of pollutants across the city during the warm and the cold seasons. Maps were generated using a mixed-effect method applied to a Land-use Regression (LUR) model. The approach relies on multiple land-use and traffic predictor variables, and assimilation of data collected by mobile measurements over 30 days in the periods May–July 2022 and January–February 2023. Cross-validation was done against in-situ data extracted from the same collection, while validation was organized by comparison with standard measurements at fixed reference sites. Our study shows that this combined method has a good performance for all pollutants (R2 > 0.65), the highest performance being observed for the cold season. PM10 concentration maps indicate multiple sources of particles during the warm season, the most important source being traffic. During the cold season PM10 concentration maps show a more uniform distribution of sources in Bucharest. The city’s principal roads, particularly the Bucharest ring road, are also highlighted in the NO2 maps, with higher gradient during the warm period.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2930', Anonymous Referee #3, 10 Dec 2024
General comments:
The manuscript presents an innovative approach to fine-scale air quality mapping in Bucharest, leveraging mixed-effects Land-Use Regression models and extensive mobile measurement campaigns. These results provide a novel and robust dataset for Bucharest, offering valuable insights into the spatial heterogeneity of air pollution across seasons and highlighting the potential of mobile monitoring and regression-based methods to address critical gaps in urban air quality assessments. The study’s findings are particularly relevant for informing policymakers and guiding targeted interventions to improve air quality in densely populated urban environments.
While the study is methodologically sound and addresses a pressing public health issue, the preprint would greatly benefit from improved clarity and expanded contextualization throughout the manuscript. The introduction could be strengthened by providing a broader overview of urban air quality modelling approaches and including relevant citations to highlight the significance of their work.
In the methods section, the rationale behind key methodological choices, such as the measurement campaign design, data filtering approaches, and validation strategies, should be elaborated to provide better context and justification for readers. Additional figures to visualize land use heterogeneity and measurement routes would greatly enhance the reader's ability to assess the study's spatial representativeness. Similarly, in the results and discussion section, more clarity is needed to interpret RMSE values and model outputs effectively. Including supporting plots (while not implicitly necessary), like scatter plots or overlaid maps, and addressing inconsistencies in figure captions and data representation will improve the manuscript’s readability.
The authors should also address specific questions about traffic and land use variables, ensuring data sources are up-to-date and well-referenced. Discussions of model limitations, particularly regarding underrepresentation of traffic congestion or the exclusion of vehicle-specific emissions, should be expanded to provide a balanced assessment of the findings.
Lastly, the conclusions could be made more impactful by explicitly outlining actionable insights for local policymakers, such as recommendations for optimizing air quality monitoring station distribution or addressing discrepancies in land use planning. By focusing on these areas, the authors will significantly improve the clarity, rigor, and practical implications of their study.
Specific comments and technical corrections
Abstract
Lines 1-3: Please consider revising the first sentence of your abstract to improve clarity. I would suggest breaking the long sentence into shorter ones. Please consider using a single, consistent term (e.g., "high-resolution mapping" or "fine-scale mapping") throughout the manuscript to ensure uniform terminology and avoid potential confusion. Otherwise, the bulk of the abstract seems to be well constructed.
Introduction
Lines 19 – 21. Please consider expanding this paragraph. Short term exposure to very high concentrations can also be a significant risk factor to human health in addition to prolonged exposure at lower concentrations.
Lines 23 – 25. “Despite its critical impact,” I believe this paragraph may benefit from a relevant citation.
Also line 35, lines 35-36 and 36-39 should be properly referenced.
Lines 39 – 46. “Several models have been developed…” Apart from LUR and dispersion models, some additional well-established models and methods used in urban air quality studies could also be mentioned in this section. The LUR approach is sufficiently detailed in the following paragraphs; however, some other methods may be worth mentioning.
2 Materials and methods
2.1 Study area
Lines 90 – 95: …. “The land use of Bucharest is diverse” ….
Please consider adding an additional figure showcasing the diverse land use types present in the Bucharest metropolitan area. Additionally, one may plot the routes covered in the measurement campaign to showcase the lengths covered in each of the major land use areas. This would allow the readers to better assess the spatial coverage of the study in relation to Bucharest's heterogeneous urban structure.
2.2 Observational data
Lines 104-105. The authors describe conducting 15 measurement routes per campaign, each approximately 100 km in length. While the spatial coverage across different urban typologies is noted, it would be valuable to elaborate on the rationale behind this specific campaign structure. What were the key factors determining the number and configuration of routes? Statistical requirements for the modelling? Logistical constraints (traffic patterns, vehicle range, or time limitations)?
Lines 111-113. Please elaborate on the decision to use the pass-band filter with a window of 3 data points. What advantages are expected for choosing this smaller window? Why not use a larger window of 5 to 10?
2.3 Fine-mapping model
Lines 131 – 137. The authors explicitly address a temporal correction; however, it is not clear to me whether this is enough to account for traffic congestion. I would expect that longer instances of traffic congestion and/or lower than intended travelling speeds would result in skewed measurements for any given 250 m segment. How do the authors address this issue? Are there additional temporal corrections considered?
Lines 139-140: “First, a subset of data collected…”
How large was the subset of the cross-validation data? 20-30%? And what was the reasoning behind this percentage? Please elaborate on this topic.
3 Results and discussions
3.1 Tuning the mixed-effects Land-Use Regression model for Bucharest
Lines 163-165. How was the traffic intensity affected on rainy days? Was there any substantial rainfall during the measurement campaigns? One would expect an increase in traffic intensity leading to an increase in the random effect. Can the authors elaborate on this topic? Was the model performance impacted by higher traffic intensity on rainy days (if such conditions were present)?
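For readers less familiar with the term "random effect", a mixed-effects LUR model is often written in a generic form like the one below. This is a textbook-style sketch, not the manuscript's specification: the grouping index $j$ (for example a measurement day or route), the number of predictors $K$ and all symbols are assumptions made for illustration.

```latex
y_{ij} = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{k,i} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here $y_{ij}$ is the concentration on road segment $i$ within group $j$, the $\beta_k x_{k,i}$ terms are the fixed land-use and traffic effects, and $u_j$ is the random intercept that absorbs group-level shifts such as an unusually congested day.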
Lines 167-168. I see the 85/15% split in training/validation is presented here. How did the authors tackle any overfitting issues that may arise from this split? Were any other splits considered and, if so, what led to this choice? Why is this optimal for your study? Please elaborate.
3.1.1 Spatial predictor variables
Comment: I am aware that the 2018 CORINE land cover data is currently the most recent iteration; however, I suspect that the metropolitan area of Bucharest may have undergone some changes in the six years since this dataset was made public. To this extent, the additional map requested for section 2.1 may be useful for representing the land cover types and the campaign routes. One would expect some inconsistencies in the land cover data if any routes were traversing outside the residential areas.
Line 176: “There is no recent source quantifying the traffic intensity on road segments in Bucharest…”
What does “recent” mean for the authors? To my knowledge, some municipalities and/or local police agencies should have some up to date traffic data, including vehicle types and population data. Is this data not publicly available in Bucharest?
Line 184: “…. in Table 1. The column “direction of effect”…
The specified column is labelled simply as “effect”. For clarity please choose to update either the table column or the text at line 184.
Also, please update the source data column for the traffic intensity variable in table 1.
3.1.2 Predictor variable selection
Line 198: “The direction of effect for all variables was kept as in Table 1”
If this is indeed the case, what is the reasoning behind the +/- effect for “Agricultural areas” and “Water bodies”? Later we find out that water bodies were obvious sinks.
3.2 Evaluation of the model performances
Lines 215-216: Please give some additional detail on the criteria for selecting the 15% cross-validation data set, in order to assess whether any data leakage may be present. Was the data selected from all routes and segments? Was any consideration given to potential data leakage from the training to the cross-validation data?
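One common way to limit the leakage raised here is to hold out whole routes rather than individual segments, so that neighbouring, strongly correlated segments never end up on both sides of the split. The sketch below shows that idea under stated assumptions: it is not the authors' procedure, `split_by_route` and the record fields are hypothetical, and only the figure of 15 routes per campaign comes from the manuscript as quoted above.

```python
import random


def split_by_route(records, validation_fraction=0.15, seed=0):
    """Hold out whole routes so that no route feeds both subsets."""
    route_ids = sorted({record["route"] for record in records})
    rng = random.Random(seed)
    rng.shuffle(route_ids)
    # At least one route is always held out.
    n_validation = max(1, round(validation_fraction * len(route_ids)))
    validation_routes = set(route_ids[:n_validation])
    train = [r for r in records if r["route"] not in validation_routes]
    validation = [r for r in records if r["route"] in validation_routes]
    return train, validation


# Hypothetical data: 15 routes with 40 road segments each.
records = [
    {"route": route, "segment": segment, "no2_ppb": 15.0 + route + 0.1 * segment}
    for route in range(15)
    for segment in range(40)
]

train, validation = split_by_route(records)
print(len(train), len(validation))
```

With 15 routes, a route-level split can only approximate 15%: two held-out routes give about 13% of the segments. Routes that share streets would still leak spatial information, so this addresses only part of the concern.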
Lines 229-231: I am not sure how to interpret the RMSE values without any mean values. This is just a personal preference but for me, a simple scatter plot would have been more helpful.
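The point about RMSE being hard to read without a mean can be shown with a normalized RMSE, that is, RMSE divided by the mean observed concentration. The sketch below is a minimal illustration under assumed inputs: the function names and all concentration values are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import mean


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(mean(squared_errors))


def normalized_rmse(observed, predicted):
    """RMSE expressed as a fraction of the mean observed value."""
    return rmse(observed, predicted) / mean(observed)


# Two hypothetical data sets with identical errors but different mean levels.
observed_low = [10.0, 20.0, 30.0, 40.0]
predicted_low = [12.0, 18.0, 33.0, 37.0]
observed_high = [o + 100.0 for o in observed_low]
predicted_high = [p + 100.0 for p in predicted_low]

rmse_low = rmse(observed_low, predicted_low)
rmse_high = rmse(observed_high, predicted_high)
nrmse_low = normalized_rmse(observed_low, predicted_low)
nrmse_high = normalized_rmse(observed_high, predicted_high)

print(round(rmse_low, 3), round(nrmse_low, 3))
print(round(rmse_high, 3), round(nrmse_high, 3))
```

Both data sets have the same RMSE, yet the error is roughly 10% of the mean in the first case and roughly 2% in the second, which is why reporting the mean (or a scatter plot) alongside the RMSE helps interpretation.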
Figure 1 could be improved by adding an additional layer. An open street map or an RGB satellite layer should improve the interpretation of these relative differences. I see figure 2 has one such layer so maybe keep the same theme.
3.2.2 Validation against independent measurements at fixed observation sites
Line 240 – 242: The authors mention that UFP data is not available in the NAQMN. Some additional information about what type of variables are/can be measured by NAQMN may be helpful for the readers.
Line 244 and Fig.2.
Please update figure 2 according to the type of variable being displayed. Both the upper panel and the lower panel indicate NO2; however, the authors mention PM10 concentrations in the figure description.
Please explain the missing data in figure 2. For example, why are NO2 concentrations missing from B6 – cold period, or from MARS in the warm period. The same for PM10.
Table 2. A general comment also applicable to the remainder of the manuscript
If both PM2.5 and PM10 were modelled, why is PM2.5 missing from the results section? How did the model perform against the independent measurements of the NAQMN? If the data is available, why not at least update table 2 with PM2.5?
3.2.3 Evaluation of the model performance to resolve different types of environment
General comment:
I see that the relative differences seem to be mostly negative for the traffic datasets. To this extent, I believe that the model is not suited to represent traffic congestion, where multiple instances of vehicle stops/starts can result in higher NO2/PM concentrations. I would also point out that vehicle types and ages, if not represented by the model, can also lead to these underestimations. Please update the discussion if this is the case.
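The reading of negative values as underestimation depends on the sign convention. The sketch below assumes the relative difference is defined as (modelled − measured) / measured, expressed in percent; the manuscript's own definition is not quoted here, and the function name and the site values are hypothetical.

```python
def relative_difference_percent(modelled, measured):
    """Relative difference in percent; negative means the model is too low."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return 100.0 * (modelled - measured) / measured


# Hypothetical (modelled, measured) NO2 pairs in ppb for two site types.
sites = {
    "traffic": (30.0, 40.0),
    "urban background": (18.0, 16.0),
}

differences = {
    name: relative_difference_percent(modelled, measured)
    for name, (modelled, measured) in sites.items()
}

for name, value in differences.items():
    print(f"{name}: {value:+.1f}%")
```

Under this convention, a consistently negative value at traffic sites means the model sits below the measurements there, which is the pattern the comment attributes to unresolved congestion and fleet composition.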
Figure 3. Why are the standard deviations missing from the NO2 cold season traffic measurements?
3.3 Mapping atmospheric pollution in Bucharest
Line 307: “Sinks related to the green areas and water bodies regions are identified in green colours…” Green as in all shades of green, or as in a specific shade (e.g. dark green)?
There seems to be some inconsistency between the discussion at lines 307 – 309, “overall NO2 concentration is higher during the warm period”, and lines 312 – 315, “At the level of the city of Bucharest, the average value of the NO2 concentration as estimated by the model for the warm season is 16.66 ± 4.04 ppb, while for the cold season it is 18.75 ± 1.98 ppb” (suggesting higher concentrations in the cold season). Also, the upper panel of figure 4 seems to suggest that NO2 concentrations are indeed higher in the warm period.
Line 316: “anthropocentric agglomerations”? Is it not anthropogenic?
Line 322: The phrase "sources are more homogeneous" is not precise - it's not the sources that are homogeneous, but rather the distribution or concentration of PM10.
Line 328: Please improve the clarity of the sentence, “Traffic sources are less effective…” Maybe “Traffic emissions have less impact” … also consider improving the “reduce under reduced” end part of the sentence.
Please update figure 5 with a larger font size, as seen in figures 3 and 4.
Conclusions
Lines 349-350. Consider replacing "correlated with" with "coupled with" to improve the clarity of the sentence.
Also, the sentence in lines 352-353 could benefit from minor grammatical improvements.
Expanding on the ideas presented in sentences 355-356 would strengthen the article's overall argument. Consider adding a paragraph in the results section to further explore this topic.
The sentence at line 359 can also benefit from minor grammatical improvements.
Line 365 – 367. Consider splitting the paragraph into 2 sentences to improve the clarity of the overall message.
Final comment: I see the authors have identified the advantages of this approach with regard to aiding policymakers in local administrations. To this extent, I believe the study could benefit from one additional conclusion. Based on this novel information, how should the local municipality address the current distribution of air quality monitoring stations? Would the city benefit from additional AQ stations and/or any specific spatial configuration? Based on the model output, have the authors identified any inconsistencies with respect to the current land use configuration in Bucharest? And if so, how can the local administration address these possible issues?
Citation: https://doi.org/10.5194/egusphere-2024-2930-RC1
RC2: 'Comment on egusphere-2024-2930', Anonymous Referee #4, 10 Dec 2024
This paper is a thorough and structured analysis of air quality mapping using a mixed-effects modeling framework for Bucharest. The paper is clearly written, with well-defined sections that make it mostly easy to follow. I recommend that it be accepted for publication with minor changes.
Introduction –
The introduction effectively outlines the problem and objectives but could benefit from a clearer emphasis on how this research fills existing gaps compared to other studies. Adding brief references to similar studies in other European cities could strengthen its relevance.
Methodology –
Explain the rationale behind the 3-point moving average for outlier removal. How does this choice affect the spatial resolution and data reliability?
Results –
The analysis in Figure 5 (PM2.5/PM10 ratio maps) needs refinement.
Borders in Tables: When zooming into the document, gaps are noticeable in the outside borders of tables. This may be due to uneven line weights, misaligned elements, or incomplete formatting. Please address this.
Discussion –
Acknowledge the limitations of the mobile measurement route. For instance, areas with restricted car access might lead to underestimations.
The following suggestions would help the clarity:
Line 97: “...characterized by hot summers and cold winters.” (clarify structure for conciseness).
Line 92-93: “...more industrialized, hosting a variety of manufacturing plants, such as machinery, textiles...” (remove redundancy).
Typos:
- Line 164-165: “aggregated values of the spatial predictor variables calculated in circular buffers with radii between 25 m and 2 km.”
Clearer phrasing: “...aggregated values of spatial predictor variables calculated within circular buffers ranging from 25 m to 2 km in radius.”
- Line 361: “...pin pointed pollutant variability mostly during warm season and higher concentrations...”
Suggestion: “...pinpointed pollutant variability mostly during the warm season and higher concentrations...” Reason: "Pinpointed" should be one word, and "the" is required before "warm season."
RC3: 'Comment on egusphere-2024-2930', Anonymous Referee #5, 10 Dec 2024
"High-resolution air quality maps for Bucharest using Mixed-Effects Modeling Framework"
The study presented a model based on mobile measurements to generate high-resolution air quality maps in Bucharest, using a mixed-effects modeling framework integrated with the Land Use Regression (LUR) model. The authors demonstrated the model's ability to capture the seasonal and spatial variability of major air pollutants – such as NO2, PM10, and ultrafine particles (UFP) – during the warm and cold seasons, employing data collected during two intensive monitoring campaigns. The method showed good performance (R² > 0.65), being more effective during the cold season, when pollutant concentrations exhibited lower variability.
This work provides a significant contribution to the field of air quality, particularly in urban contexts with limited monitoring infrastructure, such as Bucharest. The integration of mobile measurements, land use parameters, and mixed-effects modeling addressed data gaps from fixed monitoring networks and enhanced the spatial resolution of concentration estimates. However, the study could benefit from a broader contextualization of the results, especially in discussing factors leading to model underestimation in dense urban areas and technical challenges related to monitoring in industrial and high-traffic environments.
It is recommended that the manuscript be revised to include more detailed analyses of methodological limitations, such as the impacts of selected buffer sizes and the validation of predictive variables. Additionally, including uncertainty statistics, such as confidence intervals, and more detailed maps of measurement routes and pollution gradients would enrich the conclusions. These improvements could further strengthen the study’s impact and its applicability to cities facing similar challenges.
Introduction
- Expand the explanation of the limitations of alternative models, such as dispersion models. For example, discuss the dependence of dispersion models on detailed meteorological data and high computational capacity, contrasting this with the simplicity and efficiency of LUR.
- Develop a specific section addressing the short-term risks associated with high concentrations of PM10 and NO2. Include information about cardiovascular, respiratory, and even immune system impacts.
- Are there comparative studies with other methods in cities similar to Bucharest that could enrich the justification presented?
Methodology
- Add detailed maps of the study area, highlighting industrial, residential, and commercial zones. Include the routes of mobile measurements and collection points to contextualize the spatial distribution.
- Explain the criteria for buffer size selection and the reasons for using varying sizes.
- Justify the use of the moving average filter, considering its role in removing outliers and enhancing model accuracy.
- Include a comparative table presenting the advantages and disadvantages of alternative methodologies, such as satellite-based models versus hybrid models like LUR/mixed-effects.
- Were sensitivity tests conducted to evaluate the impacts of different combinations of predictive variables?
- How did varying buffer sizes influence the results, and was cross-validation performed to determine the optimal parameters?
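The buffer questions in the list above can be made concrete with a small sketch of how one predictor is aggregated over several buffer radii around a measurement point. This is a simplified illustration, not the authors' implementation: features are reduced to points in a projected metric coordinate system, whereas a real LUR workflow would intersect lines and polygons with the buffers. The function `buffer_sum`, the feature values and the intermediate radii are hypothetical; only the 25 m to 2 km range comes from the manuscript as quoted by Referee #4.

```python
from math import hypot


def buffer_sum(center, features, radius_m):
    """Sum the weights of features lying within `radius_m` of `center`.

    Each feature is (x_m, y_m, weight), with coordinates in metres.
    """
    cx, cy = center
    return sum(
        weight
        for x, y, weight in features
        if hypot(x - cx, y - cy) <= radius_m
    )


# Hypothetical road pieces: (x in m, y in m, road length in m).
features = [
    (10.0, 0.0, 50.0),
    (0.0, 90.0, 120.0),
    (300.0, 200.0, 200.0),
    (1500.0, 0.0, 300.0),
    (3000.0, 0.0, 500.0),
]

radii_m = [25, 100, 500, 1000, 2000]
profile = {r: buffer_sum((0.0, 0.0), features, r) for r in radii_m}

for radius, total in profile.items():
    print(f"{radius:>5} m buffer: {total:.0f} m of road")
```

Because each larger buffer contains the smaller ones, the aggregated values are strongly correlated across radii. That is one reason a sensitivity test or cross-validated choice of buffer size, as asked above, is informative.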
Results and Discussion
- Relocate the methodological descriptions from sections 3.1, 3.1.1, 3.1.2, and 3.2 to the methodology section, facilitating a more focused discussion of the results.
- It would be beneficial to explain in detail why the model underestimates PM10 levels in urban areas. Including maps illustrating the spatial distribution of pollutants in different environments would enhance the section on environmental types.
- What were the main technical challenges in modeling industrial and high-traffic areas?
- Add graphs showing the differences between predictions and measured values, highlighting seasonal variations.
Conclusions
- Detail how citizen involvement could improve data collection, including examples of using bicycles or pedestrians to access restricted areas.
- Discuss how the methods could be adjusted for cities with similar urban characteristics, detailing the data requirements and necessary adjustments.
Citation: https://doi.org/10.5194/egusphere-2024-2930-RC3